Neon Enterprise Software Blog

Welcome to Neon Enterprise Software Blog Sign in | Join | Help
in Search

Data Management Today by Craig Mullins

News, views, and issues involved in managing data as a valuable corporate asset.

Database Archiving

If you are a regular reader of my other blogs you know that I have written about the topic of database archiving before. But, for the most part, I have written about the topic from a high-level perspective. That is, I've discussed the market forces driving the need to archive, how archiving can improve the performance of operational databases, how legal and regulatory compliance impact database archiving, and even precisely defined the term "database archiving." If you are interested in any of those issues click on the links to reads those entries.

So why blog about database archiving again? Well, there are a lot of issues that I've yet to explore in any depth. This blog entry will start the beginning of a series of entries on database archiving issues and requirements. In this series I will discuss some of the nuances of database archiving that you may never have considered -- that is, until you seriously start to embark upon an archiving project.

With that in mind, consider how you might approach archiving data from your production databases today. Oftentimes organizations believe they are archiving their data for long-term retention, but the “solutions” that are being employed to do so actually are falling short of the mark. This situation typically occurs when insufficient attention is paid to the query requirements against the archived data. Let’s examine some of these “solutions” in a bit more depth.

File and database backups are sometimes viewed as a solution to data archival. The idea proffered is that backups need to be done for recoverability anyway so why not use them for the archive? This is an incorrect assumption for several reasons. Why?

The first problem is that backup copies are not in a format that can be queried. In order to access the data it needs to be recovered back to a database format. Conducting such a recovery to enable queries against data no longer in the database is problematic, though. Do you recover the data directly into the existing operational database? This can cause data integrity problems because the backup data contains not just the data no longer in the database, but perhaps also data currently in the database but in a different state. In other words, although you might put the old data back in the database, you are likely also to corrupt existing data in the database.

OK, maybe you decide to create a "reference" database that looks exactly like the database from which the data was backed up. But will this work? Some DBMSs check internal identifiers to ensure that the recovery is being done to the same database objects. A new reference database, even if it looks exactly the same as the operational database, will be different -- at least in terms of how the DBMS views it. You may be able to manipulate the database structures or deploy utilities that operate at a lower level to get the recovery to work, but it will require additional work and effort to make it happen.

And what if you need to recover several backups? Do you create multiple reference databases or try to recover them all to a single reference database. Of course, the second option sounds better, doesn't it? But it might not be feasible. What if the database schema changed over the course of those backups? Maybe you added a column or changed a data type. Well, then, that recovery will most likely fail because the schema does not match.

Of course, you could use an unload utility to unload the data from the image copy backup (if that technology is available to you). But then you would have to go through that data to determine what is and what isn’t needed before you load it into the production database. And you still would have to jump the hurdle of dealing database schema changes since the backup was made. So you will have to consider that when re-loading the data from the backup. And is the data actually “the same” if it is loaded into a different schema definition? It might seem to be, but technically it is not – and that might be a problem from a legal perspective.

And think about the work involved when you run into an audit situation, where you are asked to produce transaction records spanning multiple years. This requires the very labor-intensive, manual process of slogging through the data and matching it against every database change that was made during the timeframe in question (which could be several decades). It might be (somewhat) feasible for time spans of 2 to 3 years perhaps, but not for archived data that spans 10, 20 or more years.

So there are many reasons why backups are not exactly useful for archival purposes. A similar approach to using backups is to use UNLOAD files. But, once again, this has all of the same problems as using image copy backups, with the additional problem that the UNLOAD files have to be created in addition to the backups.

Another approach might be to create a “reference” copy of the operational database to house the data. But, of course, we still have to deal with the issue of changing database schemata. Furthermore, if the data is maintained in a traditional database system then it is not protected from modification by database language commands such as SQL INSERT, UPDATE, and DELETE statements. And even if you use database security commands to prohibit modification this will not stop super users (e.g. SYSADM) from changing data.

Sometimes organizations choose just to leave the data resident in the operational database. This has all of the problems of the “reference” database approach, along with the additional problem of negatively impacting the performance of production applications and queries against the operational database.

These “solutions” are really no solution to the problem of archiving data for long-term retention. None of them are really useful in today’s environment. Indeed, your long-term data retention plan needs to account for the following:

  • Archival of logically selected data instead of entire files or database structures
  • Policy-based archival rules that automatically trigger data retention based on business and regulatory mandates
  • Deletion of the data from the operational database when it is archived
  • The ability to query directly from the archive to avoid the problems of rebuilding data to support audit requests

I'll discuss the items in this list in more detail in future blog posts. And keep in mind that NEON Enterprise Software will be releasing our solution for database archiving that tackles all of the issues I'm discussing in my blog.

Published Tuesday, January 30, 2007 1:33 PM by cmullins
Filed under:

Comments

 

Data Management Today by Craig Mullins said:

As we continue this series of blog postings on database archiving we need to spend some time discussing

February 19, 2007 9:35 AM
 

Data Management Today by Craig Mullins said:

OK, for the past month or two I've been blogging about database archiving for long-term data retention

February 27, 2007 3:48 PM
 

Data Management Today by Craig Mullins said:

I have been blogging about database archiving and the need to store data for long periods of time. The

March 12, 2007 11:29 AM
 

Data Management Today by Craig Mullins said:

We all read about regulatory compliance and how it impacts IT and data management, but e-discovery will

March 21, 2007 10:38 AM
 

Data Management Today by Craig Mullins said:

According to several analysts I've spoken to recently, database archiving is growing at a compound

March 26, 2007 2:59 PM
 

Data Management Today by Craig Mullins said:

Those of you who've been reading my blog for awhile know that I've been talking about the trends

October 8, 2007 2:14 PM
 

Data Management Today by Craig Mullins said:

I've written frequently on the need to archive your database data for long-term retention . Topics

January 14, 2008 10:16 AM
 

Data Management Today by Craig Mullins said:

Organizations are generating and keeping more data now than at any previous time in history. This is

May 13, 2008 1:14 PM
 

Data Management Today by Craig Mullins said:

Operational databases are growing in size for many reasons. There is the overarching trend of more and

August 8, 2008 1:58 PM
Anonymous comments are disabled

About cmullins

Craig S. Mullins is a data management strategist for NEON Enterprise Software, Inc.. Craig has extensive experience in the field of database management having worked as an application developer, a DBA, and an instructor with multiple database management systems, including working with with DB2 for z/OS since Version 1. Craig is also an IBM gold consultant and is the author of two books: "DB2 Developer’s Guide" and "Database Administration: Practices and Procedures."
Powered by Community Server, by Telligent Systems