Sometimes organizations believe they are archiving their data for long-term retention, but the “solutions” they employ fall well short of the mark. This typically happens when data authenticity and accuracy requirements are ignored.
Why do we archive data from our databases? In general, it is to comply with governmental regulations: laws require us to retain data for long periods of time, and different types and categories of data have different mandated retention periods. But ultimately, data is retained because there may be reasons for it to be accessed in the future. What reasons?
Well, many times it is to support legal actions. Perhaps your company is being sued, so you need to produce data that documents your conduct. Maybe you will be asked to produce product manufacturing components and details to support a product liability case, or human resources information to support an employee hire or dismissal action. When the time comes to litigate, your company will be asked to produce data for discovery. The Federal Rules of Civil Procedure stipulate how and how quickly information must be provided for discovery. The thing to keep in mind here is that most discovery requests these days are e-discovery requests; that is, you will be asked to provide electronic data.
Now it is important that the data you provide for discovery is authentic. It must accurately represent the business activities and transactions as they occurred. The lawyers will ask questions like:
- "Does this data accurately represent what happened?"
- "Is there any way at all that this data could have been changed since the time of the event in question?"
- "If there is, how can the court be sure it has not changed?"
- "You say there is no way it could change, but how can you be sure? Can you prove that, and if so, how?"
If you store your archived data in a database format, data authenticity is suspect. In a relational DBMS, a user with SYSADM authority can always access and change data, and it can be difficult to prove that such changes did not occur.
Your database archive solution must protect the authenticity of the data it archives. When data is moved to the archive, it must be protected against all change. There should be no insert or update capability for archived data. And it should be protected with digital signature and checksum algorithms so that surreptitious changes (i.e., tampering by an intruder) can be detected. Furthermore, a backup (also digitally signed and protected) should be available to recover from such a data breach. Without this level of protection, it is simply not possible to declare (legally) that your data has not been corrupted.
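To make the checksum and signature idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular archiving product: the function names `seal_record` and `verify_entry` and the entry layout are hypothetical, and HMAC-SHA256 (a keyed message authentication code) stands in for a true public-key digital signature. The key is assumed to be held outside the archive, away from anyone who administers the archived data.

```python
import hashlib
import hmac
import json


def seal_record(record: dict, key: bytes) -> dict:
    """Build an archive entry: the serialized record, a checksum, and a keyed signature.

    Hypothetical helper for illustration only.
    """
    # Serialize deterministically so the same record always yields the same bytes.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {
        "payload": payload.decode("utf-8"),
        # Plain checksum: catches accidental corruption.
        "sha256": hashlib.sha256(payload).hexdigest(),
        # Keyed signature (HMAC, standing in for a digital signature):
        # cannot be recomputed by someone who alters the data without the key.
        "hmac_sha256": hmac.new(key, payload, hashlib.sha256).hexdigest(),
    }


def verify_entry(entry: dict, key: bytes) -> bool:
    """Return True only if both the checksum and the keyed signature still match."""
    payload = entry["payload"].encode("utf-8")
    checksum_ok = hmac.compare_digest(
        hashlib.sha256(payload).hexdigest(), entry["sha256"]
    )
    expected_sig = hmac.new(key, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected_sig, entry["hmac_sha256"])
    return checksum_ok and signature_ok
```

A plain checksum alone would not be enough here, because anyone able to change the archived data could also recompute the checksum. The keyed signature is what lets you argue that a change would have been detected, provided the key itself is protected. In practice you would also want the verification results logged and the signed backup verified the same way before it is used for recovery.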