The recent discovery of a LONG column in a table used by a business-critical application caused me to stop and consider my approach. The scenario was that the new application was based on communication layers that would not support LONG column data. The old application would not support CLOB data, so I would have to change the column at the point of going live with the new app.
I have a serious dislike for LONG columns. It's not at the level of the total revulsion and outrage I feel for WHEN OTHERS THEN NULL, but it is significant. LONGs are incredibly restrictive in what you can do with them and are difficult to work with as a result.
They can be easily dealt with, however. How about...
ALTER TABLE mytable MODIFY(mycolumn CLOB);
It does have a couple of benefits.
Job done!? Well... it's the LONG column we want to say goodbye to, not our career, so what's the backout plan? What if something goes horribly wrong during the operation? What if the database change is made successfully, but we have to fall back to the old application for other reasons?
That command is a one-way trip! There is no way to alter it back, and Flashback won't help because it's DDL.
So what tools do we have in the box, besides the tempting ALTER command above?
The standard approach would be to create a duplicate table using CREATE TABLE ... AS SELECT (or INSERT INTO ... SELECT) with the TO_LOB() function. (This is a special case: we are only allowed to use TO_LOB() to generate a LOB from a LONG or LONG RAW in that kind of statement.)
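For illustration only, the copy might look something like this (the table and column names here are made up):
CREATE TABLE mytable_clob AS
SELECT id,
       TO_LOB(mycolumn) AS mycolumn  -- TO_LOB() is only permitted in CTAS or INSERT ... SELECT
FROM   mytable;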
We could replicate the table (with a CLOB instead of a LONG) and all the table's orbital objects (indexes, triggers, etc.), then RENAME the original table to a backup name and RENAME the new table to the original name. After a recompile of dependent code, everything should be good again.
(Additional note: The RENAME command prevents the use of a schema name for the object, so must be run as the owner of the object. The ALTER TABLE <schema>.<table> RENAME TO <new name> syntax can be run from a privileged account, making automation simpler - many thanks to the DBA team for this tip).
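Putting the swap itself together, and assuming the replacement table has already been built and its indexes, triggers and grants recreated, something along these lines (names invented) would do it from a privileged account:
ALTER TABLE app_owner.mytable      RENAME TO mytable_long_bkp;  -- keep the original as the backout
ALTER TABLE app_owner.mytable_clob RENAME TO mytable;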
We also have the option of recovering from a Data Pump export file if the process fails. This will require some downtime for operations which depend on the table in question. It's also worth testing the recovery process before altering the database: we need to know that the recovery works reliably, particularly as operations involving LONG columns aren't always supported.
The CREATE TABLE command (with related objects) is still the most appealing approach. While it requires additional space (if the DBAs can provide it), it has the following benefits:
Points Worth Noting
I was recently tasked with a data transfer exercise, between a number of existing systems and a new system that went live in March 2015. It gave me the chance to exercise some great Oracle PL/SQL features that I believe would be difficult to match in other database environments. As usual with this kind of operation the following was true:
My preferred overall strategy in this case was as follows:
So here are the features that made this task a pleasure in Oracle:
If linking the Oracle database to the original source is complex, then so long as an extract can be turned into a formatted text file, we can set up an external table.
External Tables allow us to view text files (comma or tab delimited, or with fixed field lengths) as if they were tables in their own right. This really helps with the repeatability of the load process and with testing (we can add our own records easily). We can get an initial extract to work with and, when it is time to load the data for real, just repeat the extract process and replace the formatted file.
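A minimal sketch of such an external table (the directory object, file name and columns are invented for illustration):
-- data_load_dir is assumed to be a DIRECTORY object pointing at the folder holding the extract
CREATE TABLE src_customers_ext (
  customer_ref   VARCHAR2(20),
  customer_name  VARCHAR2(100),
  country_name   VARCHAR2(60)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_load_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('customers.csv')
)
REJECT LIMIT UNLIMITED;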
Points to watch for…
If the translation process is running for a long time, we will want to know what stage it has reached. We can easily store the progress and debug information in a table, but we would need to commit regularly to see the results. If we have a large batch process that we are halfway through, we want to minimise the commits, so we'd have to wait until it finishes to see the results. Unless we use Autonomous Transactions.
By setting up procedures which have the Autonomous Transaction Pragma, the log information we want can be committed without committing the data that we are translating.
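A skeleton of such a logging procedure might look like this (the migration_log table and procedure name are my own inventions for the sketch):
CREATE OR REPLACE PROCEDURE log_progress (p_text IN VARCHAR2) AS
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  INSERT INTO migration_log (log_time, log_text)
  VALUES (SYSTIMESTAMP, p_text);
  COMMIT;  -- commits only this log entry, not the caller's in-flight translation work
END log_progress;
/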
Points of note
It is understood (thanks to Oracle Performance Tuning gurus) that processing data one row at a time is a slow process. We can avoid this approach by using a few techniques. The first is BULK COLLECT.
This approach will load the results of a query into memory, where we can perform translation and validation without writing to the database.
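In outline, the pattern is to fetch in batches with a LIMIT so memory stays under control. This sketch reuses the invented external table from earlier; the batch size is illustrative:
DECLARE
  TYPE t_src_tab IS TABLE OF src_customers_ext%ROWTYPE;
  l_src  t_src_tab;
  CURSOR c_src IS SELECT * FROM src_customers_ext;
BEGIN
  OPEN c_src;
  LOOP
    FETCH c_src BULK COLLECT INTO l_src LIMIT 1000;  -- pull 1000 rows into memory at a time
    EXIT WHEN l_src.COUNT = 0;
    FOR i IN 1 .. l_src.COUNT LOOP
      NULL;  -- translate and validate each row in memory here
    END LOOP;
  END LOOP;
  CLOSE c_src;
END;
/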
This is a dream solution to the performance issues that can be associated with calling a function from a SQL statement. If the function is executed once for every row returned then the time for the statement to run can increase significantly. Adding RESULT_CACHE to the function declaration will allow Oracle to store the return value for a specific set of input parameters. The next time it sees those parameters for that function call it will just return the stored result. (But see notes below.)
It is ideal for the translation of reference data. It can be used to first translate the source fields into a common language of the new system, then again for the functions which turn those common names into codes for the new system.
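As a sketch (the country_xref lookup table and the function itself are invented for illustration), a cached translation function might be declared like this:
CREATE OR REPLACE FUNCTION get_country_code (p_source_name IN VARCHAR2)
  RETURN VARCHAR2
  RESULT_CACHE
AS
  l_code country_xref.country_code%TYPE;
BEGIN
  SELECT country_code
  INTO   l_code
  FROM   country_xref
  WHERE  source_name = p_source_name;
  RETURN l_code;  -- repeat calls with the same p_source_name are served from the result cache
END get_country_code;
/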
Points to note…
This Pragma statement allows us to name an exception that we can catch later. This feature is very useful given that we want to translate many columns into a new set of columns. We can store the translation logic in functions that will hide the business logic and make the translation process more transparent.
If the function fails because the original source data cannot be translated then an error can be raised by the function. An exception handler in the translation process can then trap that error and record the issue.
The alternative would be to call the function then check the result for each function call. That could result in a massive IF..ELSIF…ELSIF… etc. to control the flow. Raising and catching exceptions allows for specific error messages to be created and makes the translation process more readable…
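As an illustration (the error number, names and helper calls are carried over from the earlier sketches and are not from a real system), the translation function can raise a specific error and the calling process can catch it by name:
DECLARE
  ex_unknown_country EXCEPTION;
  PRAGMA EXCEPTION_INIT(ex_unknown_country, -20001);  -- tie the name to the error number raised below
  l_code VARCHAR2(10);
BEGIN
  -- inside get_country_code, a failed lookup would do something like:
  --   RAISE_APPLICATION_ERROR(-20001, 'Unknown country: ' || p_source_name);
  l_code := get_country_code('Atlantis');
EXCEPTION
  WHEN ex_unknown_country THEN
    log_progress('Row skipped: ' || SQLERRM);  -- record the issue and carry on
END;
/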
FORALL with SAVE EXCEPTIONS
With the records for the new dataset stored in memory structures (collections) we can then write them in single batches into the database. The FORALL will traverse a collection and execute one statement (e.g. INSERT) for all entries in the collection.
We don't want to stop processing every time there is an error. We want to record any integrity errors in the issues table. The SAVE EXCEPTIONS option allows us to handle errors in this way.
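In skeleton form (the target table and collection are illustrative, and the logging call is the invented helper from earlier), ORA-24381 is the error FORALL raises once it has finished if any exceptions were saved:
DECLARE
  TYPE t_new_tab IS TABLE OF new_customers%ROWTYPE;
  l_new         t_new_tab := t_new_tab();  -- populated by the translation step in the real process
  e_bulk_errors EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_bulk_errors, -24381);
BEGIN
  FORALL i IN 1 .. l_new.COUNT SAVE EXCEPTIONS
    INSERT INTO new_customers VALUES l_new(i);
EXCEPTION
  WHEN e_bulk_errors THEN
    FOR j IN 1 .. SQL%BULK_EXCEPTIONS.COUNT LOOP
      log_progress('Insert failed for element '
        || SQL%BULK_EXCEPTIONS(j).ERROR_INDEX || ': '
        || SQLERRM(-SQL%BULK_EXCEPTIONS(j).ERROR_CODE));
    END LOOP;
END;
/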
Points to note…
There are many other features which cheer me up when working with Oracle. For ETL you might also want to consider MERGE and INSERT ALL. Even a simple WITH clause can reduce code complexity and improve performance enormously.
At some stage I’ll compile a list of favourites through the various versions of Oracle. If you have any favourites I’d love to hear them….
When I began developing Project Shaneera, I had been working in and around the Oracle database for my entire career and was spoiled by the rich functionality which had grown impressively over the years. Oracle is an expensive option, so I selected MySQL as a more cost-effective, long-term prospect for my application.
I believed that Object Relational Mapping could take place at the database layer.
The logic behind that architecture is this. If we want to minimise network traffic then the more processing that takes place at the database the better. If we are going to communicate in JSON then why not allow the database itself to present and consume JSON.
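For example (a sketch only, assuming MySQL 5.7 or later and a made-up players table), the database can hand JSON back directly and pick values out of JSON passed in:
SELECT JSON_OBJECT('id', player_id, 'name', player_name) AS player_json
FROM   players
WHERE  player_id = 42;
-- consuming the other way, extracting a value from a JSON document supplied by the caller:
SELECT JSON_UNQUOTE(JSON_EXTRACT('{"id": 42, "name": "Aisha"}', '$.name')) AS player_name;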
I find that model appealing for two reasons.
1. The idea of heavy middle tier development in C#, Java or PHP with complex patterns to essentially rebuild the functionality of the database in the middle tier so that we can present SQL to the database fills me with dread.
2. The database currently feels like the most stable part of the architecture. The middle tier technologies have shifted rapidly over the last decade to a point where anything more than two years old “feels” out of date.
The reality of the gulf between Oracle RDBMS and MySQL became apparent almost immediately.
MySQL has plenty of column types that Oracle does not have. You can deal with Boolean values, for example, directly in SQL statements, which is great.
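A small illustration (the table name is invented); in MySQL, BOOLEAN is shorthand for TINYINT(1), but it reads naturally in both the DDL and the queries:
CREATE TABLE player_flags (
  player_id INT PRIMARY KEY,
  is_active BOOLEAN NOT NULL DEFAULT TRUE
);
SELECT player_id
FROM   player_flags
WHERE  is_active;  -- no need for the is_active = 'Y' style workarounds familiar from Oracle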
The lack of features in the procedural language, such as arrays, default parameter values or the ability to base a variable's type on a column, has been a source of frustration. Initially, progress was slower than I expected, but there has always been a solution.
Testing is simple for both halves.
There is a very thin PHP layer which completes the text exchange between the two, but does nothing else.
I like the simplicity and clarity of that structure. The database layer is now stable and rarely needs any attention.