Media Digitization and Preservation Initiative
Dennis Cromwell, Executive Director
Adam Nickel, Processing and QC Specialist
Patrick Feaster, Media Preservation Specialist
Mike Casey, Director of Technical Operations, Audio/Video
Brian Wheeler, Senior Systems Engineer, IU Libraries
Over the past four years, Indiana University’s Media Digitization and Preservation Initiative has digitized more than 315,000 audio and video recordings. In the meantime, we’ve heard from many other folks who are planning digitization projects that predicting file size and recording length is difficult. Of course, every collection is different, and nobody else will have exactly the same mix of holdings as we do at IU. Still, the general statistics we’ve gathered from the 315,000+ items we’ve reformatted to date may help provide others with at least some rough guidance for project planning.
The winner in the duration contest for video formats is VHS, with an average running time of just about 100 minutes. This is not a surprise given that VHS is a consumer format through and through. Consumers place a higher value on the length of time available for recording than on other characteristics, and so the format was developed with an emphasis on maximizing recording time. That’s why the EP (extended play) mode, which triples the possible recording time while reducing quality, was introduced. Over 3,000 of our VHS tapes are EP. One practical implication is that we must acquire and integrate consumer-level playback machines into our digitization signal chain, since professional decks do not support EP.
It may also be no surprise that the archenemy of VHS—Betamax—is close behind with an average duration of 94 minutes. Betamax is also a consumer format.
The more professional video formats – such as 1-inch, Betacam, and U-matic – tend to have shorter durations, in our experience.
On the audio side, Digital Audio Tape (DAT) is the clear winner with an average duration of 117 minutes. 120 minutes was a common length for the format. The majority of our DATs are recordings of concerts and recitals from the Jacobs School of Music. These events typically lasted on the order of two hours.
These durations, of course, impact the size of the digital files that are created. Tables 3 and 4 below list average file sizes.
In the tables above, the term ‘Package’ refers to the preservation package placed in long-term storage. It includes the preservation master file, all derivatives, metadata documents, and other items. In these tables, ‘Pres’ stands for preservation master file while ‘Prod’ is a production master file. Note that a mezzanine file (video) and a production master (audio) are the same in terms of role or function within the archive.
When planning for storage, video matters most, since video files are much larger than audio files. For VHS, we expect to see an average preservation master file size of around 70 GB. MDPI uses the FFV1 codec wrapped in Matroska for video digitization. Other formats, such as uncompressed (using the v210 codec) or JPEG 2000, would yield different numbers.
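For rough planning, the averages quoted here can be turned into a back-of-the-envelope estimate. The sketch below is illustrative only: the 70 GB and 100-minute figures come from this post, while the count of 1,000 tapes and the function names are hypothetical, and it assumes decimal gigabytes (10^9 bytes). Dividing one average by another gives only an approximate data rate, not a measured one.

```python
# Averages quoted in this post for VHS preservation masters (FFV1 in Matroska).
AVG_PRES_SIZE_GB = 70.0    # average preservation master size
AVG_DURATION_MIN = 100.0   # mean running time


def approx_rate_mbps(size_gb, minutes):
    """Rough average data rate in megabits per second (assumes 1 GB = 10**9 bytes)."""
    bits = size_gb * 1e9 * 8
    seconds = minutes * 60
    return bits / seconds / 1e6


def estimate_storage_tb(item_count, avg_size_gb):
    """Total storage in terabytes for item_count files of the given average size."""
    return item_count * avg_size_gb / 1000


rate = approx_rate_mbps(AVG_PRES_SIZE_GB, AVG_DURATION_MIN)
total_tb = estimate_storage_tb(1000, AVG_PRES_SIZE_GB)  # 1000 is a hypothetical tape count
print(f"~{rate:.0f} Mbit/s, ~{total_tb:.0f} TB for 1,000 tapes")
```

This covers preservation masters only. A full preservation package also holds derivatives and metadata, so the package averages would be the better input for a real storage budget.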
The smallest average video preservation master size is 2.7 GB for the DVD format. The preservation master format in this case is an ISO disk image.
It’s worth bearing in mind that the averages shown above are arithmetic means, calculated as the total duration or size divided by the number of objects. This type of average may be the most useful one for large-scale project planning, but it may not provide the best insight into the “typical” example of a given format because it’s liable to be skewed by outliers at either end of the value range. We also calculated the median duration for comparison. For most of the formats, the median and mean were fairly close (within about 10%). However, several video formats showed larger differences. For example, the 100-minute mean for VHS reflects the fact that our holdings include a number of EP tapes with unusually long durations, as noted earlier. The median VHS duration turns out to be just 63.7 minutes. So if you were going to bet on the duration of any one randomly chosen VHS tape at IU, your best guess would be 63.7 minutes. The 7% of IU’s VHS tapes that were EP significantly skewed the mean.
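The gap between mean and median is easy to reproduce with a toy example. The durations below are invented purely for illustration (they are not MDPI data): eight ordinary tapes plus two long EP recordings. A handful of long items pulls the mean well above the median.

```python
from statistics import mean, median

# Hypothetical durations in minutes: mostly ordinary tapes plus two long EP tapes.
durations = [30, 45, 60, 60, 62, 65, 70, 90, 360, 360]

avg = mean(durations)    # total duration divided by number of tapes
mid = median(durations)  # middle value of the sorted list

print(f"mean = {avg:.1f} min, median = {mid:.1f} min")
```

For budgeting total transfer time or storage, the mean is the figure to multiply by the item count. For guessing what a single tape will look like, the median is usually the safer number.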
But the duration of playback isn’t the only variable that has a bearing on how much time and effort will be needed to complete a reformatting project. Some carriers are more challenging and more time-consuming to deal with than others.
In the case of MDPI, Memnon Archiving Services, a Sony Company, transfers the bulk of our items in an onsite facility using (mostly, but not exclusively) parallel transfer workflows in which one technician digitizes multiple recordings simultaneously. But some items aren’t a good fit for parallel transfers and instead need individualized attention. Take an open reel audio tape with speed changes, for example. Field researchers will sometimes change recording speeds to conserve tape while working in remote areas. The IU record is 34 speed changes on a single reel of tape! Items that fail parallel transfer, or that are known from the start to be fragile or problematic, are sent to IU Media Digitization Studios (IUMDS), where staff can take the time to make one-on-one transfers. The table below breaks down the formats handled by Memnon and by IUMDS.
The next two tables show the percentage of recordings that were initially routed to a parallel transfer workflow but were then kicked out of it. In other words, these recordings needed some intervention after the first digitization attempt before they could be transferred successfully. Typical interventions include tape repair, playback machine reconfiguration, stopping and starting playback repeatedly to fix problems, baking more than once, rehousing into a new shell, and resolving disc tracking problems by hand.
The relatively high failure rate of open reel audio tape can be attributed to the fact that our holdings in this format include large numbers of field recordings. Field collectors were more likely than other users of the format to change speeds, and even track configurations, within a single tape.
The major causes of audiocassette failures include tapes breaking during transfer, transport issues where the tape will not move or play consistently, and tapes recorded at half speed or double the usual speed of 1.875 ips.
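To see why nonstandard speeds disrupt a transfer, it helps to work through the arithmetic. The sketch below assumes only that running time scales inversely with tape speed. The function name and the 30-minute cassette side are hypothetical choices for illustration.

```python
STANDARD_IPS = 1.875  # usual audiocassette speed, in inches per second


def real_duration_min(playback_min, recorded_ips, playback_ips=STANDARD_IPS):
    """Actual program length for a tape recorded at recorded_ips but played at playback_ips."""
    return playback_min * playback_ips / recorded_ips


half = STANDARD_IPS / 2    # 0.9375 ips
double = STANDARD_IPS * 2  # 3.75 ips

# A side that takes 30 minutes to play at the standard speed:
print(real_duration_min(30, half))    # content recorded at half speed
print(real_duration_min(30, double))  # content recorded at double speed
```

Under these assumptions, a half-speed side holds twice as much program as its playing time at standard speed suggests, and a double-speed side holds half as much. Either way, a transfer made at the standard speed plays back at the wrong pitch and length unless it is corrected.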
U-matic video tapes are particularly challenging to digitize at this time due to both degradation and obsolescence issues. They have a high rate of mechanical and tape-related failures.
VHS tapes failed for a number of reasons. Some changed speed in the middle, from SP to EP or LP, for example, and sometimes back again. Others were unexpectedly PAL or SECAM. There was also a range of tape-related issues that required multiple cleaning passes or more than one baking session.
We will continue exploring our data in posts to come, one of which is tentatively titled “Life at the Extremes, or The Long and the Short of It.” We will also welcome guest posts from two other institutions in which they report on their digitization statistics. Will their findings be similar to ours or different? We don’t know yet and are very curious. Stay tuned!