
Matthew Addis

Tales From the Attic

After rummaging in my attic I resurrected my old Garrard record deck, much to the bemusement of my kids.  They thought it crazy that to play music I had to cue up a big black disc of plastic and drop a needle on the top!  I thought it was fantastic - especially the Pink Floyd records I hadn't played for years. It worked first time. It sounded great. I didn't mind the odd pop and crackle.  A testament to how 'analogue' media and simple engineering can stand the test of time.

I then started to wonder what my children would be bringing down from their attics when they are my age. This reminded me of a story in the UK of a man who'd found the UK's oldest working Seagate hard disk drive (shown below) in his attic. Nearly 30 years old, and it too worked first time. A good sign for the future, or a false hope? More on that in a minute.

But first take a closer look at that hard drive. The storage capacity is a whopping 10MB and it cost about £260 at the time, which according to a handy little inflation calculator is nearer £700 in today's money. Today I can go to the local electronics store and buy a 1TB drive for less than £50. That's 100,000 times the capacity for a fraction of the price - roughly a million times more storage per pound. Or to put it another way, cost per TB has halved every 18 months for the last 30 years.
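A quick sanity check of that trend, sketched in Python. The prices and capacities are the rough figures quoted above, not exact market data:

```python
import math

# Rough figures from the text: a 10 MB drive for ~£700 (inflation-adjusted)
# then, versus a 1 TB drive for ~£50 today.
old_capacity_tb = 10e6 / 1e12            # 10 MB expressed in TB
old_price, new_price = 700.0, 50.0       # pounds

old_cost_per_tb = old_price / old_capacity_tb   # £70 million per TB
new_cost_per_tb = new_price / 1.0               # £50 per TB
improvement = old_cost_per_tb / new_cost_per_tb # ~1.4 million-fold

# How often must cost halve over 30 years to achieve that?
halvings = math.log2(improvement)               # ~20 halvings
months_per_halving = 30 * 12 / halvings         # ~17.6 months

print(f"{improvement:.2e}x cheaper per TB, "
      f"one halving every {months_per_halving:.1f} months")
```

Twenty halvings in 30 years is one every 18 months or so, which is where the figure in the text comes from.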
What if I wind forward another 30 years? If the trend continues, I'll have 1 exabyte of data on a single device. What does an exabyte mean? Well, a mere petabyte (PB) is a stack of fully loaded 16GB iPads twice the height of the Empire State Building. An exabyte (EB) is 1,000 times taller than that. Another way of looking at it is that in a machine the size of my record deck I'll be able to store and play the entire audiovisual record of the 20th century (10 x 1 EB drives loaded with content at an average bit rate of 100 Mbit/s comes to roughly 200 million hours, the approximate size of our collective audiovisual record). From a handful of vinyl records to 200 million hours of AV content in a box small enough to carry - in just one lifetime. That's pretty amazing. Strikingly, it also means that storage becomes effectively free. Wind on another few years and you'll be able to store the entire current contents of today's AV archives for the cost of a memory stick.

There's loads more to say about this, including the economics of 'forever cost' models and what the true long-term total cost of ownership of storage and access really is - and why it's a whole lot more than just the media - but that's all for a future post!
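For anyone who wants to check the arithmetic, here is the 200 million hours estimate as a few lines of Python, using the numbers given above (decimal units assumed throughout):

```python
# Ten 1 EB drives full of AV content at an average bit rate of 100 Mbit/s,
# as described in the text. How many hours of content is that?

total_bits = 10 * 1e18 * 8      # 10 drives x 1 exabyte each, in bits
bitrate_bps = 100e6             # 100 Mbit/s average
hours = total_bits / bitrate_bps / 3600

print(f"{hours / 1e6:.0f} million hours")  # ~222 million, i.e. roughly 200M
```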
Can the march of technology really take us this far? Indications of future capacity based on past performance look good. It's easy to look at trends like these and think we can continue to ride the wave of ever-increasing storage capacity. But there are some other trends that are less well publicised. Over the last 15 years, hard drive capacity has increased 1,000-fold, but the rate at which we can fill a drive hasn't kept up - not even close - the speed increase has only been a factor of 30. It would have taken about a minute to fill up that old Seagate drive. Today it takes the best part of a day to fill a 2TB SATA hard drive. In 30 years' time, if capacity and performance continue to improve at the same rates, I'll be able to buy a 1EB hard drive from my local store, but it will take over a year to fill it up!

The other trend worth looking at is error rates. All hard drives have what is termed a 'Bit Error Rate' (BER): the chance that the drive will have a problem reading the data on it and won't be able to return the correct value. A modern hard drive has a BER of 1 in 10^14. That's amazingly small, but there are about 10^13 bits in a TB, so the chance of seeing these errors is no longer negligible. BER has only improved 10-fold over the last 10 years. At this rate, my 1 exabyte drive will give me hundreds of errors in the data every time I read it all back - if I had the time to read it all back. Storage media has always had errors of course, with error correction being the norm and built into HDDs, data tape, DVDs and pretty much every form of digital storage. This error correction toils away tirelessly behind the scenes to correct a multitude of problems, so we hardly ever see them.
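To make the BER point concrete, here's a small sketch of the expected number of unrecoverable read errors per full read of a drive. It uses the figures from the text - a BER of 1 in 10^14 today, improving tenfold per decade - and assumes decimal capacity units:

```python
def expected_errors(capacity_bytes, bit_error_rate):
    """Mean number of unrecoverable bit errors for one full read of a drive."""
    return capacity_bytes * 8 * bit_error_rate

# Today: a 1 TB drive with a BER of 1 in 10^14.
today = expected_errors(1e12, 1e-14)    # ~0.08 errors per full read

# In 30 years: a 1 EB drive, with BER improved 10x per decade (1 in 10^17).
future = expected_errors(1e18, 1e-17)   # ~80 errors per full read

print(f"today: {today:.2f} errors, future: {future:.0f} errors")
```

Reading today's drive end to end you'd expect an error less than one time in ten; reading the hypothetical exabyte drive you'd expect dozens every single pass, which is the order of magnitude behind the 'hundreds of errors' claim above.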
But as error rates and performance fail to improve as fast as capacity, error correction becomes ever more complex and relatively more errors will inevitably slip through the net - or even be created by bugs in the very systems designed to prevent them.
I said I'd come back to that hard drive from the attic. Is it a miracle that it still worked? To some extent yes, but by today's standards it's a relatively simple and robust bit of engineering. Not so for modern hard drives. As a colleague of mine in the storage industry once said, 'hard drives are designed to be on the edge of not working'. That's perfectly understandable given the competitive nature of the industry and the fact that hard drives today are little more than disposable items, not designed to be used for much more than a few years. Don't get me wrong, hard drives are amazing things that work to tolerances you wouldn't think possible, but if they are on the edge of not working now, what's the chance of them working in 30 years' time? Near zero. Almost as low as the chance of having anything you could plug the drive into to read it anyway.
As Jim Lindner once said to me, 'storage is a solved problem', and it is - if you know there is a problem, if you know what the solution is, and if you can afford that solution. There's plenty of help out there, including from PrestoCentre. David Rosenthal's blog is a great round-up of the challenges of digital preservation using IT storage, which leads nicely to his work on the 'Lots Of Copies Keep Stuff Safe' (LOCKSS) solution, with the big question being 'How few copies?'. But there are many other approaches too, including printing bits to film or the use of more reliable digital media, in addition to hard drives or data tapes. All have their pros and cons, and I presented a way of comparing them at the PrestoCentre Screening the Future conference earlier this year, using a 'cost of risk of loss' approach we've developed in PrestoPRIME.
What seems clear to me is that the traditional approach of archiving media on shelves, simply transferred to digital carriers - be they data tapes, hard drives or optical discs - will really struggle to remain viable, especially with such a large volume of material. The use of automated systems that monitor and manage the replication, integrity and migration of media will need to become the norm. Multiple copies in multiple places held using multiple technologies - all in automated systems. That's the only way we'll be able to do digital preservation at this scale. But it will be a hungry beast. It will need constant attention from skilled professionals and it will need to be fed with power, media and parts. And if it isn't fed or cared for properly, that's when content will be lost - it's not something that can be left alone in the hope that 'it will probably be alright'. The larger archives have the resources to properly care for such preservation machines, but smaller archives will surely lack the budget or expertise and will be worse off as a result - but that's a subject for another post too.
We will surely continue to be seduced by ever-increasing storage capacity. In the coming years we'll move from HD to 4K to SuperHiViz. We'll go from 2D to stereoscopic 3D to holographic 3D and on to who knows what else. All of this will continue to gobble up storage no matter how much capacity is available. Those producing and consuming this content will pay scant attention to the consequences of storing ever vaster amounts of data in our audiovisual archives. And for those archives it will be an ever harder job to keep it all safe. It's not impossible, but those tales from the attic of technology that appears to last forever really don't help.