by Mike Casey, Director of Technical Operations, Audio/Video, Media Digitization and Preservation Initiative, Indiana University
At the ARSC conference a few weeks ago, I explored the quality control system we developed for the audio/video side of MDPI, including the different types of QC we use. At the highest level, we recognize two basic QC types: automated and human-intensive. Automated QC is machine-based and carried out by software tools, while human-intensive QC is a manual process that relies on the human senses of sight and hearing as well as on our capacity for logic and reasoning. These may seem like very different strategies, but in truth each depends on the other. For example, human cognition is necessary to interpret the software output of machine-based QC, while machines must render the digital files that people then assess by sight and sound.
Automated machine-based QC routines may use commercial, open source, and/or homegrown applications. For MDPI, we developed our own scripts with which we analyze 100% of the files produced by the project. This is part of the MDPI post-processing system, and it includes checks in the following areas, among others:
- presence/absence of digital provenance metadata
- presence/absence of specific embedded metadata
- directory and file names
- presence of expected file types (preservation master, production master, etc.) for the format digitized
- format and wrapper
- file extension
- audio stream count
- sample rate, bit depth, codec name, frame rate, pixel format
- duration across streams and across file types
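As a rough illustration of how checks like these might be expressed, here is a minimal Python sketch. It is not the MDPI post-processing code. The expected values, the property names, the half-second duration tolerance, and the function names `check_file` and `check_durations` are all assumptions made for the example. It also assumes that technical properties have already been extracted into a plain dictionary by some probing tool.

```python
# Hypothetical expected values for one file type; not MDPI's actual specification.
EXPECTED = {
    "extension": ".wav",
    "audio_stream_count": 1,
    "sample_rate": 96000,
    "bit_depth": 24,
    "codec_name": "pcm_s24le",
}


def check_file(props, expected=EXPECTED):
    """Compare probed properties to expected values.

    Returns a list of failure messages; an empty list means every check passed.
    """
    failures = []
    for key, want in expected.items():
        got = props.get(key)  # a missing property shows up as None
        if got != want:
            failures.append(f"{key}: expected {want!r}, found {got!r}")
    return failures


def check_durations(durations, tolerance_s=0.5):
    """Flag a set of related files whose durations disagree.

    `durations` maps a file-type label to a duration in seconds.
    The tolerance is an assumed value for illustration only.
    """
    if not durations:
        return ["no durations supplied"]
    spread = max(durations.values()) - min(durations.values())
    if spread > tolerance_s:
        return [f"duration spread {spread:.3f}s exceeds {tolerance_s}s"]
    return []
```

Returning a list of messages rather than a single pass/fail flag means one run can report every problem found in a file, which leaves a person with less to untangle when interpreting the output.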
Human-intensive QC, on the other hand, features an MDPI staff member listening to and/or viewing digital files to judge characteristics that machines typically do not assess well. For example, open reel audio tapes recorded in the field occasionally exhibit problems like reversed audio or changes in speed. These issues cannot be accurately discovered by machine. Nor can a mismatch between audio/video content and its metadata, where the two plainly refer to different recordings.
Project resources allow us to undertake human-intensive QC for approximately 10% of the recordings digitized by MDPI. To make the most of these resources, we employ the following strategies, all of which we define as types of human-intensive QC:
- Value-based QC: directing more QC resources to formats, collections, or recordings that are considered of higher value and fewer to those deemed less valuable. For example, the project digitized a limited number of commercial LPs and 45s. IU curators told us that, while these were valuable enough to digitize, they were significantly less valuable than other formats. In their estimation, directing fewer QC resources to these formats in order to free up resources for more in-depth checking of valuable formats was worth the risk.
- Risk-based QC: analyzing digitization workflows and digitized formats to identify where risk is greater and directing more resources to those areas. For example, we treat any period in which something new is started or something is changed as carrying greater risk. Therefore, when the digitizing operation begins a new format, hires a new person, or begins using a new machine, the QC operation will allocate additional QC resources for a specific period of time to mitigate the risk.
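One way to picture how these two strategies could combine is as a per-recording sampling rate that starts from the overall 10% figure and is scaled up or down. The Python sketch below is only an illustration of that idea, not a description of how MDPI actually selects recordings. The value tiers, the multipliers, the risk boost, and the field names `value_tier` and `in_risk_window` are all hypothetical.

```python
import random

BASE_RATE = 0.10  # roughly 10% of recordings overall, per the text

# Hypothetical multipliers chosen only for illustration.
VALUE_WEIGHT = {"high": 1.5, "normal": 1.0, "low": 0.3}
RISK_BOOST = 2.0  # applied while a new format, person, or machine is settling in


def sampling_rate(value_tier, in_risk_window):
    """Return the probability that a recording is selected for human QC."""
    rate = BASE_RATE * VALUE_WEIGHT[value_tier]
    if in_risk_window:
        rate *= RISK_BOOST
    return min(rate, 1.0)  # a probability cannot exceed 1


def select_for_qc(recordings, seed=0):
    """Pick recordings for human QC; a fixed seed makes the choice repeatable."""
    rng = random.Random(seed)
    return [
        r["id"]
        for r in recordings
        if rng.random() < sampling_rate(r["value_tier"], r["in_risk_window"])
    ]
```

With weights like these the overall share checked will drift away from 10% depending on the mix of recordings, so in practice the multipliers would need tuning to stay within the available staff time.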
It is also useful to analyze a digitization workflow to identify steps or procedures that may carry greater risk. The azimuth adjustment step in an open reel audio tape or audiocassette workflow, for example, is somewhat subjective and relies upon the judgement and accuracy of the operator. The only way to check this step is to compare the digital file to playback of the recording that was digitized. This prompted us to implement another type of quality control that we call Direct QC in order to assess this variable. Direct QC compares part of the source recording directly to the corresponding part of the digital file created during digitization to assess the accuracy of the azimuth step as well as other workflow steps and choices.
There is much more to say about quality control for media digitization operations. Look for future posts about the MDPI QC operation.