Neon Enterprise Software Blog

Data Management Today by Craig Mullins

News, views, and issues involved in managing data as a valuable corporate asset.

The Growth of Database Archiving

According to several analysts I've spoken to recently, database archiving is growing at a compound annual rate of around 35 percent. But why? What is compelling organizations to archive their data?

Fundamentally, there are two reasons why data would ever need to be archived out of an operational database. One reason is to improve the efficiency of the operational environment. This reason has existed for years, but DBAs have not necessarily attacked the problem with sufficient vigor. Let's address the issue first - and then I'll tackle the reasons for the historical lack of attention to the matter.

Operational Performance

We all know that our operational databases are growing in size; some analysts put the rate as high as 125 percent each year. Of course, your rate will vary, but I doubt many of you would dispute that you are managing larger databases today than you were two years ago, right? So what? Well, expanding data volumes impact operational databases in two ways:

1. additional data stresses transaction processing by slowing things down; and

2. database administration tasks are negatively impacted.

The more data that is maintained in the operational databases, the less efficient transactions will be as they access those databases. Table scans must reference more pages of data to return a result. Indexes grow in size to support larger data volumes, causing access by the index to degrade because there are more levels to traverse to return an answer. Such performance impacts are causing many companies to seek solutions that offload older data from the operational database; that is, to archive the data.
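To put a rough number on the "more levels to traverse" point, here is a minimal sketch of how index depth grows with row count. It assumes a simplified B-tree in which every index page holds the same number of entries; the fan-out of 200 and the function name `index_levels` are my own illustrative choices, not figures from any particular DBMS.

```python
def index_levels(row_count: int, fanout: int = 200) -> int:
    """Estimate how many B-tree levels are needed to index row_count keys.

    fanout is an assumed number of entries per index page; real values
    depend on key length, page size, and the DBMS in use.
    """
    if row_count < 1 or fanout < 2:
        raise ValueError("row_count must be >= 1 and fanout must be >= 2")
    levels = 1
    capacity = fanout  # keys reachable through a single level
    while capacity < row_count:
        capacity *= fanout  # each extra level multiplies reach by the fan-out
        levels += 1
    return levels


if __name__ == "__main__":
    for rows in (1_000_000, 100_000_000):
        print(rows, "rows ->", index_levels(rows), "levels")
```

Under these assumptions, growing a table from one million to one hundred million rows adds a level to the index, which means an extra page to visit on every probe. Archiving older rows is one way to keep the index from crossing that threshold.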

Additionally, database administration tasks will run longer as more data is stored in the database. This lengthens both the processing time and the outages required to perform functions such as image copies, unloads, reorganizations, recoveries, and disaster recoveries. In many cases the lengthened outages due to DBA tasks become unacceptable over time, causing organizations to seek ways to remove data from the operational databases and thereby optimize these tasks. Once again, this means archiving database data.

Regulatory Compliance

Operational performance is only one force driving the need to archive database data. Indeed, performance and administration issues are ancillary to regulatory compliance issues. Although both are driving the need to move data from the operational database into an archive data store, it is the legal requirements that have the biggest impact in terms of data volume expansion. The growing number of regulations, and the need for organizations to comply with them, are driving data retention and extending the length of time that data must be retained. Regulations such as the Sarbanes-Oxley Act, HIPAA, and Basel II are among the rules governing how long data must be retained. But this is just the tip of the iceberg. Industry analysts have estimated that there are over 150 federal and state laws that dictate how long data must be retained.

Many of these laws greatly expand the duration over which data must be retained. Until recently, most organizations dealt with mandatory retention periods of only a few years for important business data. And this data was kept around longer mostly to serve business purposes, not to meet legal requirements. But the situation has changed due to the bevy of new regulations at the federal, state, and local levels. Depending on the industry, what were once five- or seven-year retention periods are now expanding to 20, 30, even 70 years. Today, retention periods are determined almost exclusively by government regulations and not by business needs.

To comply with these laws corporations must re-evaluate their established methods and policies for managing and retaining data. What worked in the past to retain data for a few years will no longer be sufficient over a much longer period.

Of course, the exact legal requirements for data retention will vary for each organization based on its business and location. Indeed, data retention requirements will vary from business subject matter to business subject matter. For example, a hospital will have different data retention requirements for its patient records than it will for its employee safety data. The only overarching truism that can be stated is that more and more data is mandated to be retained for longer and longer durations. As such, businesses will need to become more adept at categorizing data to accurately grade it for its mandated retention period. And then businesses must be capable of retaining and accessing that data in accordance with the appropriate regulations, as required.
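As a rough illustration of what "categorizing data to grade it for its retention period" might look like in practice, here is a small sketch. The category names echo the hospital example above, but the retention periods (30 and 7 years), the `RETENTION_YEARS` table, and the function names are hypothetical placeholders of mine, not actual legal requirements.

```python
from datetime import date

# Hypothetical retention schedule: category -> years the data must be kept.
# These numbers are placeholders, not legal guidance.
RETENTION_YEARS = {
    "patient_record": 30,
    "employee_safety": 7,
}


def retain_until(category: str, created: date) -> date:
    """Return the first date on which data of this category may be disposed of."""
    target_year = created.year + RETENTION_YEARS[category]
    try:
        return created.replace(year=target_year)
    except ValueError:
        # created is Feb 29 and the target year is not a leap year
        return created.replace(year=target_year, day=28)


def may_dispose(category: str, created: date, today: date) -> bool:
    """True once the mandated retention period has fully elapsed."""
    return today >= retain_until(category, created)
```

The point of keeping the schedule in one table is that a change in regulation becomes a one-line change, rather than logic scattered across purge jobs.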

Historical Lack of Attention

So why has database archiving been ignored by most organizations? The reasons are many. First of all, keeping all of the data in the operational database is easier for DBAs. In the past, this was known as ignoring purge requirements: that is, once the data gets into the database there is no procedure for removing it. Today, many DBAs still conflate purging with archiving - but the two are distinctly different tasks. Purging is removing data from the operational databases and throwing it away... never to be accessed again. Archiving is removing data from the operational database and storing it in a data store designed for long-term data retention so that the data can be retrieved later if needed.
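The difference between the two is easy to show in a few lines. What follows is only a sketch using Python's built-in sqlite3 module: the `orders` table, its columns, and the attached in-memory database named `archive` are all stand-ins I made up for illustration. A real archive would live in a separate, long-term data store, not alongside the operational data.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a toy operational database plus an attached 'archive' store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS archive")
    conn.execute("CREATE TABLE main.orders (id INTEGER PRIMARY KEY, order_date TEXT)")
    conn.execute("CREATE TABLE archive.orders (id INTEGER PRIMARY KEY, order_date TEXT)")
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?)",
        [(1, "1999-05-01"), (2, "2003-02-10"), (3, "2007-01-15")],
    )
    conn.commit()
    return conn


def archive_rows(conn: sqlite3.Connection, cutoff: str) -> int:
    """Archive: copy old rows to the archive store, then remove them."""
    with conn:  # one transaction: the copy and the delete commit together
        conn.execute(
            "INSERT INTO archive.orders "
            "SELECT * FROM main.orders WHERE order_date < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM main.orders WHERE order_date < ?", (cutoff,))
    return cur.rowcount


def purge_rows(conn: sqlite3.Connection, cutoff: str) -> int:
    """Purge: remove old rows and keep no copy anywhere."""
    with conn:
        cur = conn.execute("DELETE FROM main.orders WHERE order_date < ?", (cutoff,))
    return cur.rowcount
```

Both functions shrink the operational table by the same amount. Only `archive_rows` leaves the data retrievable afterward, which is the property that retention regulations care about. (Dates are stored as ISO-8601 text here so that plain string comparison orders them correctly.)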

Perhaps foremost among the reasons that database archiving has been neglected is the dearth of solutions for solving the problem. Oh, you could purge data using REORG utilities. Or maybe you could cobble together a homegrown program to sift through data and "archive" it, but usually not efficiently or appropriately enough to match the business requirements.

So, when faced with growing databases, DBAs did the expedient thing -- they threw hardware at the problem -- more storage, more memory, more processors. This masked the situation and kept the databases and applications running. But the landscape has changed. Additional hardware cannot ensure that your company is in compliance with data retention regulations!

The final reason for neglect is that the regulatory compliance aspect of database archiving is a somewhat recent phenomenon. The corporate accounting scandals of the past few years (Enron, etc.) have caused an onslaught of new laws to be written. And older laws that have been on the books are being enforced more rigorously than in the past. Basically, government regulations are being adopted to ensure that corporations are "doing the right thing" with their data.

The Bottom Line

The point I'm driving toward with this long-ish post is that database archiving is fast becoming a requirement for most organizations. Analysts at the Enterprise Strategy Group estimate that "organizations will archive over 8,000 Petabytes of database information in the next 5 years." ("Database Archiving: A simple approach to Intelligent Information Management with tangible benefits", Brian Babineau, May 2006).

Now that is a heckuva lot of database data that needs to be archived! So the spotlight is shining on the increasing need to archive database data... is your organization prepared?

Published Monday, March 26, 2007 2:16 PM by cmullins
About cmullins

Craig S. Mullins is a data management strategist for NEON Enterprise Software, Inc. Craig has extensive experience in the field of database management, having worked as an application developer, a DBA, and an instructor with multiple database management systems, including working with DB2 for z/OS since Version 1. Craig is also an IBM gold consultant and is the author of two books: "DB2 Developer’s Guide" and "Database Administration: Practices and Procedures."