by Patrick Feaster, Media Preservation Specialist, Media Digitization and Preservation Initiative, Indiana University
If you spend much time exploring Google Books, you’ve surely seen your fair share of anomalies: stray images of workers’ hands, motion-blurred pages caught in the act of being turned, that sort of thing. There’s a whole community of enthusiasts out there dedicated to finding, sharing, and enjoying these quirks, its most famous expression being a Tumblr blog called The Art of Google Books (“TAGB”). Here at MDPI, of course, we’re digitizing time-based audiovisual media rather than books, but some useful analogies still exist between our work and the curiosities TAGB likes to collect.
Some TAGB finds are digitization errors, pure and simple: pages obviously aren’t supposed to be turned in mid-scan (or scanned in mid-turn), and workers’ hands are supposed to be kept discreetly out of sight. Others, such as fold-out plates imaged while still folded, seem to reflect conscious policy—in that case, Google apparently instructs its workers not to unfold plates for scanning, which doesn’t produce a fully satisfactory digital surrogate but might still serve the limited goal of making books text-searchable.
We at MDPI spend much of our time and energy ensuring that the digital files we create don’t have problems like these. Our QC program is designed to catch equivalents to pages scanned in mid-turn, while our digitization policies aim to capture all content optimally—including equivalents to fold-out plates.
But some other TAGB finds are the natural result of photographing the pages of a book with particular characteristics, and it’s this second kind of peculiarity—the “good” kind, if you will—that I’d like to focus on here. The exhibits shown below aren’t mistakes, nor do they seem to be the result of intentionally cutting corners in the interests of efficiency. They’re just part of the nature of the beast itself: not something to be avoided during digitization, but something to be understood, or misunderstood, during use.
What we’re seeing in these three cases might seem obvious, but let’s spell it out anyway:
- An engraved portrait of Abraham Lincoln, protected by semi-opaque tissue paper, facing a title page which the ink from the engraving has discolored over time, creating a mirror image of it.
- Two pages of different sizes bound together, causing parts of both to be visible simultaneously.
- A library circulation slip—not originally part of the book, but later attached to it—stamped with the date due each time the book was checked out.
If these points seem too obvious to need explanation, that’s because we know from long experience how a book “works,” physically speaking: how it opens to display a pair of pages side by side, how it circulates in the context of a lending library, how new things can be pasted into or stamped onto it, and so on. But what if we didn’t know these things? Would our ability to use these digitized sources in research or teaching be diminished?
People have been predicting the demise of the traditional book for years—for just one fun example, see Octave Uzanne’s “The End of Books” (1894). Somehow they’re still with us. Still, it’s not too hard to imagine a future—say a hundred years hence—in which the average person has never handled a “real” book, and yet still has access to digital facsimiles of books through Google (or some distant corporate heir). What’s still obvious to us now might not be obvious to readers then. Would they recognize what’s going on with the portrait of Abraham Lincoln, or with the differently-sized pages, or with the library circulation slip? For that matter, would they even understand what a book cover is, or why page numbers sometimes appear alternately in the left and right corners? Maybe, maybe not—and it might or might not be important, depending on exactly what people are trying to study or teach.
But what might be true of books then is already true of many legacy audiovisual formats now, and we expect it eventually to become true of all the formats on MDPI’s plate. Today, the average person has certainly never played a phonograph cylinder or projected an eight-millimeter film. Can we make similar statements about audiocassettes and VHS tapes? Probably, but even if we can’t now, we’ll be able to soon enough. My point is that we’re pretty sure most future beneficiaries of MDPI’s digital files won’t have had first-hand experience using the source formats. It’s true that researchers, educators, and students won’t always need to understand those formats in order to make effective use of what they’re seeing or hearing. But sometimes they will, just as they’ll sometimes need to know something about books to make sense of the views Google Books throws at them.
Obsolescence, then, doesn’t only threaten the survival of content itself. It can also threaten people’s ability to make intelligent use of content after it’s been safely digitized—to know, among other things, how increasingly unfamiliar technologies are likely to have shaped content through the distinctive mechanisms for recording, editing, duplication, dissemination, presentation, and navigation they offered (or didn’t offer).
The threat obsolescence poses to the survival of content in legacy audio and video formats comes with a looming deadline: experts estimate that we have ten to fifteen years to preserve what we can through digitization. By contrast, the challenge obsolescence poses to intelligibility will last for as long as the content itself does.
But it’s hard to describe that challenge in the abstract, so with this post we’re launching a new series to highlight specific examples of content we’ve digitized that require some media-archaeological background to understand or appreciate. We hope you’ll find them interesting and informative. Stay tuned.