When archiving data from a database, it is imperative that metadata be captured and archived along with the actual data. Indeed, analysts at Gartner corroborate this assertion, saying “Metadata must be preserved when files are requested for evidentiary purposes.” But there are two types of archive metadata. The first type is the metadata that defines the archived data itself, which is needed to enable access to the data. The archive must be able to store multiple versions of this metadata for the same “logical” archived object. As the operational schema changes, the archive metadata changes with it, but the “logical” object remains the same. Because data cannot change once it is stored in the archive, the metadata must also remain unchanged for all prior versions of the archived data.
Furthermore, any query mechanism used to access archived data must be able to handle the differences in metadata across versions of the same “logical” business object. In other words, you need to be able to query across metadata breaks (changes in the schema) when querying your archived data. Without this capability, querying archived data becomes unwieldy at best, and impossible in many cases, depending on how quickly the data must be produced. For example, if you are charged with producing archived data during the discovery phase of a trial, you do not have an unlimited amount of time to comply.
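To make the idea of querying across a metadata break concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `SchemaVersion` and `Archive` classes, the `customer` object, and its column names are hypothetical and do not describe any particular archiving product. Each archived row keeps a reference to the schema version it was written under, and the query resolves columns through that version rather than through the current operational schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SchemaVersion:
    """Immutable description of one version of a logical object's schema."""
    version: int
    columns: tuple  # column names as they existed when the data was archived


@dataclass
class Archive:
    """Hypothetical archive: every row remembers the schema version it was stored under."""
    schemas: dict = field(default_factory=dict)  # (object name, version) -> SchemaVersion
    rows: list = field(default_factory=list)     # (object name, version, values)

    def register_schema(self, name, schema):
        key = (name, schema.version)
        if key in self.schemas:
            # Prior metadata versions must stay unchanged once data depends on them.
            raise ValueError("archived metadata versions are never overwritten")
        self.schemas[key] = schema

    def store(self, name, version, values):
        schema = self.schemas[(name, version)]
        if len(values) != len(schema.columns):
            raise ValueError("row does not match its schema version")
        self.rows.append((name, version, tuple(values)))

    def query(self, name, wanted_columns):
        """Return rows for one logical object across all of its schema versions.

        A column that did not exist in an older version comes back as None.
        """
        results = []
        for obj, version, values in self.rows:
            if obj != name:
                continue
            schema = self.schemas[(obj, version)]
            record = dict(zip(schema.columns, values))
            results.append({col: record.get(col) for col in wanted_columns})
        return results


archive = Archive()
archive.register_schema("customer", SchemaVersion(1, ("id", "name")))
archive.register_schema("customer", SchemaVersion(2, ("id", "name", "email")))  # schema change
archive.store("customer", 1, (1, "Ada"))
archive.store("customer", 2, (2, "Grace", "grace@example.com"))

# One query spans both versions of the same logical object.
rows = archive.query("customer", ["id", "email"])
```

In this sketch the row archived under version 1 is returned with `email` set to `None`, because that column did not yet exist. A real archive would also have to deal with renamed columns and changed data types, which this example leaves out.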
The second type of metadata is the archive metadata that controls when the data is archived, which data is archived, from where, and which transformations are applied along the way. This is the metadata that drives and defines the archive itself. Both types of metadata are needed for the archive to operate.
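This second kind of metadata can be pictured as a small policy record. The following Python sketch is only one possible shape for it: the `ArchivePolicy` class, its field names, the `orders` table, and the `mask_card_number` transformation are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ArchivePolicy:
    """Hypothetical policy record describing how one source is archived."""
    source_table: str            # from where the data is archived
    date_column: str             # column that decides which rows are eligible
    archive_after_days: int      # when a row becomes eligible for archiving
    transformations: tuple = ()  # names of transformations applied during archiving

    def is_eligible(self, row, today):
        """A row is eligible once its date is at least archive_after_days old."""
        cutoff = today - timedelta(days=self.archive_after_days)
        return row[self.date_column] <= cutoff


policy = ArchivePolicy("orders", "closed_on", 365, ("mask_card_number",))
today = date(2024, 1, 1)
old_row = {"order_id": 1, "closed_on": date(2022, 6, 30)}
new_row = {"order_id": 2, "closed_on": date(2023, 12, 1)}

# Only the order closed more than a year ago is selected for archiving.
eligible = [r["order_id"] for r in (old_row, new_row) if policy.is_eligible(r, today)]
```

Keeping the policy as data rather than burying it in scripts means the archive can later show what was archived, when, and under which rules.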
Given all of these considerations, data that is no longer needed for operational purposes must be retained in a secure, durable archive data store, and that store must enable query retrieval of the archived data in a meaningful format until the data is discarded.
Basically, without metadata, an archive is just a big box of bits. And that has no practical use whatsoever...