Threats to data integrity from use of large-scale management environments

Technical Report
Matthew Addis
Compression, Storage services

This Technical Report is a product of the PrestoPRIME Project (D3.2.1), the major project on digital preservation in the audiovisual sector in Europe.

Maintaining data integrity when using IT infrastructure for the long-term storage of audiovisual files is a major challenge. What evidence is there that mass storage technology isn’t ‘safe’ for digital preservation of audiovisual content, in particular at the scale of Europe’s AV archives?

Mass storage technology from the IT industry simply doesn’t have the levels of reliability needed for long-term preservation of large audiovisual data files. The ways in which loss can occur are manifold and hard to predict and, most worryingly, loss can take place silently, even in storage systems explicitly designed to prevent it.

A meaningful strategy for assessing the threats to data preservation from the use of IT storage technology has to consider the risk of loss, the cost of mitigating this risk, and the benefits of doing so. We call this a ‘cost of risk of loss’ approach.
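As an illustration only, one simple way to read a ‘cost of risk of loss’ comparison is to add what is spent on mitigation to the expected value of what might still be lost. The sketch below is not the report's model: the function names, the option labels and all figures are hypothetical, and real loss probabilities and asset values are hard to estimate.

```python
def cost_of_risk_of_loss(mitigation_cost, loss_probability, loss_value):
    """Mitigation spend plus the expected value of loss (a deliberately simple model)."""
    if not 0.0 <= loss_probability <= 1.0:
        raise ValueError("loss_probability must be between 0 and 1")
    return mitigation_cost + loss_probability * loss_value


def cheapest_option(options):
    """Return the name of the option with the lowest total cost.

    options maps a name to (mitigation_cost, loss_probability, loss_value).
    """
    return min(options, key=lambda name: cost_of_risk_of_loss(*options[name]))


# Hypothetical figures, in arbitrary cost units, chosen only to show the trade-off.
options = {
    "single copy": (0.0, 0.10, 1000.0),
    "two copies": (50.0, 0.01, 1000.0),
}
best = cheapest_option(options)
```

In this made-up case the second copy costs more up front but lowers the total, which is the kind of comparison the approach is meant to support.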

Maintaining integrity of digital audiovisual assets is a proactive activity and has to be supported by appropriate corruption detection tools, a quality control process, and a knowledge base of what can go wrong, how likely this is, and what to do about it.
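A common building block for corruption detection is a periodic fixity check: record a checksum for each file at ingest and recompute it later. The following is a minimal sketch using SHA-256 from the Python standard library; the manifest format (a mapping from file path to expected digest) and the function names are assumptions made for illustration, not a tool described in the report.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(manifest):
    """Check every file against its recorded digest.

    manifest maps file path -> expected hex digest (hypothetical format).
    Returns a dict mapping path -> "ok", "corrupt" or "missing".
    """
    report = {}
    for path, expected in manifest.items():
        if not Path(path).is_file():
            report[path] = "missing"
        elif file_sha256(path) == expected:
            report[path] = "ok"
        else:
            report[path] = "corrupt"
    return report
```

A check like this only detects silent corruption. Repair still depends on having another intact copy and a process for acting on the report.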

The trend towards ever higher capacity of media (tapes, hard disks) for the same cost is very attractive to archives for obvious reasons. However, increases in reliability are not keeping pace with this increase in capacity, which makes mass storage cheaper but also more likely to cause large-scale data loss.
Strategies for data distribution across storage media or systems need to evolve to ensure that the risk of loss is kept within acceptable limits. Simple strategies that work today, e.g. direct data tape replication to create two copies, may not be so applicable in the next decade or beyond.
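The effect of capacity outpacing reliability can be sketched with a simple calculation. If each bit read has an independent probability b of an unrecoverable error, the chance of at least one error when reading n bits is 1 - (1 - b)^n, which grows with capacity when b stays fixed. The error rate and capacities below are illustrative assumptions, not measurements from the report, and the independence assumption is a simplification.

```python
import math

BITS_PER_TB = 8 * 10**12  # 1 TB taken as 10^12 bytes


def prob_read_error(capacity_tb, bit_error_rate):
    """Probability of at least one unrecoverable bit error when reading a full medium.

    Computes 1 - (1 - b)^n using log1p/expm1 for numerical stability.
    """
    bits = capacity_tb * BITS_PER_TB
    return -math.expm1(bits * math.log1p(-bit_error_rate))


def prob_all_copies_affected(single_copy_prob, copies=2):
    """Probability that every copy has an error, assuming independent copies."""
    return single_copy_prob ** copies


# Assumed, illustrative per-bit error rate held constant while capacity grows.
ASSUMED_BIT_ERROR_RATE = 1e-14
for capacity_tb in (1, 10):
    p = prob_read_error(capacity_tb, ASSUMED_BIT_ERROR_RATE)
    print(capacity_tb, "TB:", round(p, 4), "both copies:", round(prob_all_copies_affected(p), 4))
```

Under these assumptions, a tenfold increase in capacity raises the chance of an error somewhere on a medium several times over, and the chance that both of two copies are affected rises much faster. This is one reason a two-copy strategy that is adequate today may need rethinking as media grow.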