I was recently tasked with a data transfer exercise, between a number of existing systems and a new system that went live in March 2015. It gave me the chance to exercise some great Oracle PL/SQL features that I believe would be difficult to match in other database environments. As usual with this kind of operation the following was true:
My preferred overall strategy in this case was as follows:
So here are the features that made this task a pleasure in Oracle:
If linking the Oracle database to the original source is complex then, so long as an extract can be turned into a formatted text file, we can set up an external table.
External Tables allow us to view text files (comma or tab delimited, or with fixed fields lengths) as if they were a table in themselves. This really helps with the repeatability of the load process and with testing (we can add our own records easily). We can get an initial extract to work with and when it is time to load the data for real, just repeat the extract process and replace the formatted file.
Points to watch for..
If the translation process is running for a long time, we will want to know what stage it is at. We can easily store the process and debug information in a table, but we would need to commit regularly to see the results. If we have a large batch processes that we are half way through, we want to minimise the commits. So we’ll need to wait until it finishes to see the results. Unless we use Autonomous Transactions.
By setting up procedures which have the Autonomous Transaction Pragma, the log information we want can be committed without committing the data that we are translating.
Points of note
It is understood (thanks to Oracle Performance Tuning gurus) that processing data one row at a time is a slow process. We can avoid this approach by using a few techniques. The first is BULK COLLECT.
This approach will load the results of a query into memory, where we can perform translation and validation without writing to the database.
This is a dream solution to the performance issues that can be associated with calling a function from a SQL statement. If the function is executed once for every row returned then the time for the statement to run will be significantly impaired. Adding RESULT_CACHE to the function declaration will allow Oracle to store the return value for a specific set of input parameters. The next time is sees those parameters for that function call it will just return the stored result. (But, see notes below).
It is ideal for the translation of reference data. It can be used to first translate the source fields into a common language of the new system, then again for the functions which turn those common names into codes for the new system.
Points to note…
This Pragma statement allows us to name an exception that we can catch later. This feature is very useful given that we want to translate many columns into a new set of columns. We can store the translation logic in functions that will hide the business logic and make the translation process more transparent.
If the function fails because the original source data cannot be translated then an error can be raised by the function. An exception handler in the translation process can then trap that error and record the issue.
The alternative would be to call the function then check the result for each function call. That could result in a massive IF..ELSIF…ELSIF… etc. to control the flow. Raising and catching exceptions allows for specific error messages to be created and makes the translation process more readable…
FORALL with SAVE EXCEPTIONS
With the records for the new dataset stored in memory structures (collections) we can then write them in single batches into the database. The FORALL will traverse a collection and execute one statement (e.g. INSERT) for all entries in the collection.
We don’t want to stop processing ever time there is an error. We want to record any integrity errors in the issues table. The SAVE EXCEPTION option allows us to handle errors in this way.
Points to note…
There are many other features which cheer me up when working with Oracle. For ETL you might also want to consider MERGE andINSERT ALL. Even a simple WITH statement can reduce code complexity and improve performance enormously.
At some stage I’ll compile a list of favourites through the various versions of Oracle. If you have any favourites I’d love to hear them….