by Julie Hardesty, Metadata Analyst, Digital Collection Services, Indiana University Libraries
The Media Digitization and Preservation Initiative (MDPI) is charged with digitally preserving and providing access to all significant audio and video recordings on all IU campuses by IU’s Bicentennial in 2020. That is only possible with metadata. Metadata, or the information accompanying the audio and video recordings, tells us basic details about each recording so that, as you search or browse online, you can find the IU audio or video recording relevant to your research.
Using metadata we can present detailed information so you’ll know the content of an item without needing to listen to, for example, 32 hours of field recordings. This information also tells us things like how an item was digitized and what preservation activities have occurred to ensure the original digital files are still in good condition and usable. We can use metadata to know how openly we can provide access to MDPI items and how we can group similar items together for access and sharing – by subject, type of recording, date, or other useful categories. Metadata is essential to navigating a collection of hundreds of thousands of items and makes it possible for you to take it all in and understand what we have to offer.
The digital package that represents a single physical item digitized in MDPI can be made up of multiple digital files, and the metadata available for that item can likewise come from multiple sources of information.
Here are the main sources of MDPI metadata:
If the item is already cataloged in Indiana University’s online library catalog (IUCAT), that is fantastic news. We are then able to make use of descriptive information which has been crafted and supplied by a trained cataloger.
We also have the POD, our Physical Object Database. The POD is what we use to track physical objects selected for digitization as they move through the digitization process. It has a limited number of fields for descriptive information, which may be completed using what is written on the recording. This provides a backup descriptive source if there is no library catalog record for the item.
The digitization process itself produces metadata about how the item was digitized and may provide more accurate physical information than what we can learn from the pre-digitization visual inspection.
Finally, when we receive digital files, we use software (ffprobe, a component of FFmpeg) to identify and extract technical information from each file, so we know what kind of file it is along with its size and other details that help us store and use the file. Additionally, the header of any audio or video file can contain metadata. We embed unit, collection, and identifier information that can be extracted to help us connect the digital file to the other metadata files already mentioned, in case they are separated.
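As a rough illustration of this step, here is a minimal Python sketch that asks ffprobe for a JSON report and reduces it to a few fields. It is not MDPI's actual tooling: the function names are made up for this example, and the `identifier` tag in the sample is a hypothetical stand-in for whatever embedded header fields a file really carries.

```python
import json
import subprocess


def build_ffprobe_command(path):
    """Return an ffprobe command that reports container and stream details as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-print_format", "json",
        "-show_format", "-show_streams",
        path,
    ]


def summarize_probe(probe):
    """Reduce a parsed ffprobe JSON report to a few commonly needed fields."""
    fmt = probe.get("format", {})
    streams = probe.get("streams", [])
    return {
        "format_name": fmt.get("format_name"),
        # ffprobe reports size and duration as strings, so convert them.
        "size_bytes": int(fmt["size"]) if "size" in fmt else None,
        "duration_seconds": float(fmt["duration"]) if "duration" in fmt else None,
        "codecs": [(s.get("codec_type"), s.get("codec_name")) for s in streams],
        # Embedded header metadata, if any, appears under the format's tags.
        "tags": fmt.get("tags", {}),
    }


def probe_file(path):
    """Run ffprobe on a file (requires FFmpeg to be installed) and summarize it."""
    result = subprocess.run(
        build_ffprobe_command(path),
        capture_output=True, text=True, check=True,
    )
    return summarize_probe(json.loads(result.stdout))


# Hypothetical report, shaped like ffprobe's JSON output, for trying the summary
# without a real media file. The tag name and value are invented.
SAMPLE_REPORT = {
    "format": {
        "format_name": "wav",
        "size": "1024",
        "duration": "2.5",
        "tags": {"identifier": "EXAMPLE_0001"},
    },
    "streams": [{"codec_type": "audio", "codec_name": "pcm_s24le"}],
}

summary = summarize_probe(SAMPLE_REPORT)
```

Keeping the parsing separate from the ffprobe call means the summary logic can be checked against a saved report, without needing FFmpeg or a media file on hand.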
All of these sources of metadata produce files with information encoded in XML, an open-standard markup language that is both machine- and human-readable, which allows us to use all of this metadata in various ways.
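Because every source is XML, the same small amount of code can read any of them. The sketch below shows one way the catalog-first, POD-as-backup idea described above might look in Python. The element names (`record`, `title`, `object`, `label`) and the sample values are invented for illustration and do not reflect the real IUCAT or POD schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; the element names and values are made up for this example.
CATALOG_XML = "<record><title>Field recordings, 1968</title></record>"
POD_XML = "<object><label>Tape 12 side A</label></object>"


def first_text(xml_text, tag):
    """Return the stripped text of the first matching descendant element, or None."""
    if not xml_text:
        return None
    element = ET.fromstring(xml_text).find(f".//{tag}")
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()


def describe(catalog_xml, pod_xml):
    """Prefer the catalog title; fall back to the POD label when there is none."""
    title = first_text(catalog_xml, "title")
    if title is not None:
        return {"title": title, "source": "catalog"}
    return {"title": first_text(pod_xml, "label"), "source": "pod"}


with_catalog = describe(CATALOG_XML, POD_XML)
without_catalog = describe(None, POD_XML)
```

Recording which source supplied the description, as the `source` key does here, makes it easy to see later which items still rely on the backup information.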
This post is the first in a series discussing metadata in MDPI. Metadata truly makes everything possible – storage, access, preservation, discovery. We hope to provide useful and more specific information in future posts on the following topics as they relate to MDPI metadata:
- Tracking physical items sent out for digitization
- Storing digitized items long term
- Discovery and access of digitized items
- Technical metadata and digital provenance
Metadata is both essential and complicated, but we hope these posts will help explain the metadata work involved in MDPI.