First come the problems.
Then comes the technology.
Then come the solutions.
The Problems
Two relevant problems have emerged in recent years: difficulty in handling overloaded databases, and the cost of maintaining data for retired applications.
These have become problems due to the following factors:
> increased data retention requirements due to regulatory actions
> larger companies being created through mergers and acquisitions
> electronic data being recognized as discoverable by the law
> upgrading first generation legacy applications to newer technologies
> the natural aging of data
For operational databases, the result has been ever-larger databases kept online. Studies indicate that 60-80% of data in operational databases is typically inactive and does not need to be in the operational setting for business purposes. However, retention requirements mandate that it be kept somewhere. The presence of this large amount of data in the operational database can cause severe difficulties in managing, tuning, recovering, and changing these databases.
A separate problem has emerged regarding data created by applications that are no longer used. The application may be replaced with a newer version whose data is incompatible with that of the original application. The application may be retired in favor of a parallel application acquired as a result of a merger. It may be replaced by a packaged application. The common characteristic is that the data created by the original application is difficult or impossible to move to the operational database of the replacement application. However, there may be a requirement to retain the application data for many years (or decades) into the future. The problem that emerges is the cost of maintaining the database, the application, and knowledgeable staff for many years “just in case” the data is needed.
The Technology
The solution to both problems above is the same: move the inactive data in a manner that makes it independent of the original system, DBMS, and application programs that created it. This is the essence of database archiving.
What is Database Archiving?
Database archiving is the process of moving the data for inactive business records out of the operational databases and into a form better suited to long-term retention. This process does two things. It reduces the cost of running the operational application, and it provides lower-cost and more appropriate data management for inactive data. It also ensures that the inactive data will be available and presentable in a correct form if the need arises within its required life span.
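To make the idea concrete, the following is a minimal sketch of the move described above, not a production archiver. It assumes a hypothetical operational table named orders with columns id, customer, amount, and closed_on (an ISO date string), and treats any record closed before a chosen cutoff date as inactive. The function names, the table, and the JSON Lines file layout are all illustrative assumptions. The point it tries to show is that the archived records carry their own column names, so they can be read back later without the original DBMS or application.

```python
import json
import sqlite3


def archive_inactive_orders(conn, cutoff_date, archive_path):
    """Move orders closed before cutoff_date into a self-describing file.

    The table name, column names, and file layout are hypothetical,
    chosen only for illustration.
    """
    cur = conn.execute(
        "SELECT id, customer, amount, closed_on FROM orders "
        "WHERE closed_on < ? ORDER BY id",
        (cutoff_date,),
    )
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()

    # Write the archive copy first, so nothing is deleted from the
    # operational database unless the archived copy already exists.
    with open(archive_path, "w", encoding="utf-8") as f:
        # The first line records the column names, so the file can be
        # interpreted without the original schema or application.
        f.write(json.dumps({"table": "orders", "columns": columns}) + "\n")
        for row in rows:
            f.write(json.dumps(dict(zip(columns, row))) + "\n")

    # Delete exactly the archived rows, in a single transaction.
    with conn:
        conn.executemany(
            "DELETE FROM orders WHERE id = ?", [(row[0],) for row in rows]
        )
    return len(rows)


def read_archive(archive_path):
    """Read the archive back using only the file itself (no database)."""
    with open(archive_path, encoding="utf-8") as f:
        header = json.loads(f.readline())
        records = [json.loads(line) for line in f]
    return header, records
```

Calling archive_inactive_orders(conn, "2005-01-01", "orders_archive.jsonl") on an open sqlite3 connection would archive and then remove every order closed before 2005, and read_archive("orders_archive.jsonl") would return the stored column names and records. A real archive would also need to address requirements this sketch ignores, such as related tables, data authenticity, and retention-driven disposal.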
What were the first attempts to solve these problems?
Homegrowns.
Most problems get addressed initially with homegrown solutions that stab at the problem. This has certainly occurred in this case. Most IT shops have ventured into homegrown solutions using procedures wrapped around saving image copies of databases, unloading data into LOAD file formats, or copying data to parallel but identically structured databases. Once these artifacts are put away, the inactive data can be deleted from the operational databases.
As with most homegrown attempts at serious problems, the cure is worse than the disease. Homegrown solutions have typically provided relief from the immediate operational problems but create serious problems with the archived objects they produce. Most IT shops that are using these forms will have a difficult and expensive problem to solve if a request for the data is made sometime in the future. In many cases, they will be unable to satisfy the request and will be “not in compliance” with mandated retention requirements.
Emergence of robust technology.
In recent years, the problems of putting very large amounts of data away in a robust archive have come under the microscope. We now see the emergence of best practices and software that give IT managers the tools to archive inactive data effectively and safely from the large operational systems that we have created.
This collection of technology includes:
Trained database archivists. IT shops need to train or hire staff who understand the requirements of a sound database archive and the best practices for creating and maintaining one. Few shops have such staff. More need to be trained. Without competent staff who understand the entire requirements set, there is a high probability that mistakes will be made.
Best Practices. There need to be well-thought-out methodologies for identifying candidate applications, building business justifications, getting approvals, designing the processes needed, implementing those processes, and managing the archive over time. These methodologies are at an early stage but are beginning to emerge. A good start is the book “Database Archiving: How to Keep Lots of Data for a Long Time”.
Best practices can often be acquired by hiring consultants trained in database archiving to help implement the first one or two applications. This can be a solid learning experience for the IT staff. Unfortunately, the number of qualified consultants is limited, and the number of those with experience is even smaller. This will change over the next few years as database archiving gets more recognition as a strong technology for solving IT problems.
Software. It is possible to implement a database archiving application entirely in-house. This assumes that trained archivists have completely specified all of the requirements and processes needed, which is sharply different from the homegrown solutions of the past. Not many IT shops are equipped to do this. However, a complete understanding of the requirements of each process would enable those brave enough to attempt it. Even then, the solution built for one application would not transfer easily to other applications.
A small number of vendors have emerged offering generic software solutions to database archiving problems. Some of these solutions are limited in the scope of their applicability, and some have less than robust implementations. However, the technology companies have their sights on database archiving as the next data management area to address. You can expect more companies, more products, and more solutions that are complete and robust. Every IT shop needs to keep abreast of the offerings in this area and take advantage of those that are appropriate for its situation. Vendor software can and should be a part of most database archiving solutions.