This document reports on the state of the art in metadata standards and approaches in Europe. Metadata are widely recognized as a critical component of digital preservation, and individual cultural heritage organizations typically employ numerous different metadata schemes, each of which aims to capture particular aspects of digital objects. KEEP (Keeping Emulation Environments Portable) is particularly focused on emulation as a digital preservation strategy and directly addresses dynamic digital objects. Emulation places unique demands on metadata. In addition to fairly general information about preserved digital objects (format, etc.), which can often be determined by closely examining the stored objects themselves, an emulation approach requires a great deal more detailed information about environments (e.g., the creating application, the operating system, etc.). Information of this sort can to some extent be derived: for example, if a digital object is determined to be an AppleScript application, we may reasonably infer that it was produced on an Apple platform rather than on an IBM PC clone. However, such inference is often incomplete and frequently ambiguous, leaving us unable to say exactly which application created the digital object or to know for sure the target platform(s) for which it was originally intended. Yet it is precisely this sort of information that is needed to decide which emulator is appropriate, or best, for running a given digital object. In this context, we need to investigate whether new preservation approaches must be developed to record metadata pertinent to emulation. This document discusses the various digital preservation strategies currently employed and assesses the extent to which they address the demands imposed by dynamic objects.
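The limits of deriving environment information from format alone can be illustrated with a small sketch. The lookup table below is a hypothetical, hand-written assumption (not taken from any registry or from KEEP); it maps an identified format to the platforms it implies and treats the inference as conclusive only when exactly one candidate remains.

```python
from dataclasses import dataclass, field

# Hypothetical lookup table (illustrative assumption): platforms implied by a format.
FORMAT_TO_PLATFORMS: dict[str, frozenset[str]] = {
    "AppleScript application": frozenset({"Apple Macintosh"}),
    # Word existed on both platforms, so the format alone is ambiguous.
    "Microsoft Word document": frozenset({"Microsoft Windows", "Apple Macintosh"}),
    "PDF document": frozenset(),  # platform-neutral: tells us nothing
}


@dataclass
class EnvironmentInference:
    format_name: str
    candidate_platforms: frozenset[str] = field(default_factory=frozenset)

    @property
    def is_conclusive(self) -> bool:
        # Only a single candidate lets us pick an emulator without further metadata.
        return len(self.candidate_platforms) == 1


def infer_environment(format_name: str) -> EnvironmentInference:
    """Derive candidate platforms from an identified format (possibly none)."""
    return EnvironmentInference(
        format_name, FORMAT_TO_PLATFORMS.get(format_name, frozenset())
    )
```

Even a conclusive result here says nothing about the operating system version or the creating application, which is why such derived information has to be supplemented by recorded environment metadata.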
We examined the role played by ‘environment’ or ‘technical’ metadata in current metadata standards and practice in three national libraries and a computer games museum: Bibliothèque nationale de France (France), Deutsche Nationalbibliothek (Germany), Koninklijke Bibliotheek (Netherlands) and Computerspielemuseum Berlin (Germany). Finally, we raise a number of issues that KEEP (and its successors) will need to address in the future. The literature on preservation metadata standards shows that very little effort has been directed at developing and implementing preservation metadata to support digital preservation strategies based on emulation. Unsurprisingly, given that the main goal of libraries is to provide access to their digital collections through efficient search systems, their primary interest has been in developing descriptive metadata such as author, title, subject, publication and date. The most frequently cited preservation metadata standard is PREMIS, the result of years of work by international experts in the OCLC/RLG working group. The PREMIS Data Dictionary is a high-level definition of a metadata schema for preservation purposes. It defines core, implementable metadata; in principle this means it is not tied to any specific preservation strategy, but we found that in practice it supports migration more readily than emulation. It is essential that emulation-based digital preservation strategies develop scalable, interoperable metadata schemas that capture enough detail to record core information about objects and their hardware and software environments. In developing a metadata schema for emulation-based digital preservation strategies, the OAIS conceptual model should be able to serve as a reference model that supports scalability and interoperability. Grid computing is currently a favoured approach for web archiving.
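To make "core information about objects and their hardware and software environments" concrete, the following is a minimal sketch of such a record. The class and field names are hypothetical illustrations loosely in the spirit of an environment extension; they are not the PREMIS schema or a KEEP format. Serialising to JSON stands in for the interoperability requirement, and `missing_for_emulation` is an assumed rule showing how gaps could be flagged before an emulator is chosen.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class SoftwareComponent:
    name: str
    version: Optional[str] = None
    role: str = "application"  # e.g. "application", "operating system"


@dataclass
class HardwareComponent:
    name: str
    kind: str  # e.g. "processor", "graphics", "sound"


@dataclass
class EnvironmentRecord:
    object_id: str
    format_name: str
    software: list[SoftwareComponent] = field(default_factory=list)
    hardware: list[HardwareComponent] = field(default_factory=list)

    def missing_for_emulation(self) -> list[str]:
        """List gaps that would block emulator selection (hypothetical rule)."""
        gaps = []
        if not any(s.role == "operating system" for s in self.software):
            gaps.append("operating system")
        if not any(h.kind == "processor" for h in self.hardware):
            gaps.append("processor")
        return gaps

    def to_json(self) -> str:
        # asdict() converts the nested component lists recursively.
        return json.dumps(asdict(self), sort_keys=True)
```

For example, a record for an MS-DOS executable that names `SoftwareComponent("MS-DOS", "6.22", "operating system")` but no hardware would report `["processor"]` as its only gap.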
The ramifications of this for emulation should be borne in mind. Emulation is in its infancy in terms of use by major library and archival institutions; however, these bodies clearly state an urgent need for this preservation strategy to deal with burgeoning collections of ever more complex and dynamic digital objects. OAIS, METS and PREMIS are standards around which the three national libraries can coalesce, even though each is likely to have its own instantiation. File format registries and identification tools such as PRONOM (with its companion identification tool, DROID) could play a vital part in any future emulation system by automatically providing technical metadata for a good proportion of complex digital objects, and this could encourage the uptake of emulation by libraries that might otherwise find it insufficiently automated. The games preservation community has produced some interesting work. Huth’s model is the only dedicated and systematic model for game preservation metadata currently available, and further study is needed to analyse its compatibility with the PREMIS extension being considered as KEEP’s core metadata structure. It is also the only model that aims specifically to include emulation and detailed run-functionality technical data. Our concerns are the model’s complexity and its impact on a non-automated ingest procedure. Huth’s model also raises very starkly the sizeable issue of cross-dependencies and object extensions and alterations (patches, commercial extension packs, cracks and mods), which are so common in this medium and of real importance to the preservationist. Commercial game sites, typified by GameSpot, appear to offer accurate but limited metadata about recent releases and may provide information about new objects being ingested. It may be worth harvesting this data now, while it is readily available, for use when the objects are archived later.
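The idea behind automatic technical metadata from format identification can be sketched as signature matching. This sketch does not use PRONOM or DROID: the signature table is a hand-written assumption containing only a few well-known leading-byte signatures, whereas real PRONOM signatures also use offsets, wildcards and end-of-file patterns. Objects that match nothing are flagged for manual review, which is where a non-automated ingest burden would remain.

```python
from typing import Optional

# Hand-written, illustrative signature table: (format name, leading bytes).
SIGNATURES: list[tuple[str, bytes]] = [
    ("PNG image", b"\x89PNG\r\n\x1a\n"),
    ("GIF image (87a)", b"GIF87a"),
    ("GIF image (89a)", b"GIF89a"),
    ("PDF document", b"%PDF-"),
    ("ZIP archive", b"PK\x03\x04"),
    ("MS-DOS/Windows executable (MZ)", b"MZ"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the first format whose signature matches the start of data."""
    for name, magic in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def technical_metadata(object_id: str, data: bytes) -> dict:
    """Build a minimal technical-metadata record for one object."""
    fmt = identify(data)
    return {
        "object_id": object_id,
        "size_bytes": len(data),
        "format": fmt,
        # Unidentified objects fall back to manual cataloguing.
        "needs_manual_review": fmt is None,
    }
```

A leading-bytes check like this only identifies the outermost container: many formats are built on ZIP, for instance, so a real tool must inspect further before the result is useful for choosing an emulator.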
In other words, although most of the objects these sites describe are currently too complex for robust emulation (the emulation community has produced emulators for many current platforms, mainly consoles, though with all the legal issues one might expect), there is no reason why they could not be ingested now for later emulation, at which point the descriptive data supplied by these sites would become useful. MobyGames is of far more practical use at this stage: although unsystematic and community-driven, it covers older games that are more likely to be suitable for emulation via the KEEP framework. Again, its technical metadata is extremely limited, but the wealth of descriptive metadata available suggests that this resource should not be overlooked. In particular, supplementary descriptive metadata on the site, such as developer credits, would extend Huth’s model. Abandonware sites, although they often hold and distribute material of somewhat fuzzy legal status and have limited, highly unsystematic metadata structures, may nevertheless offer access to both objects and simple emulation metadata that could be of use to KEEP. It is worth noting that Abandonia, for example, explicitly suggests suitable emulators for the objects in its archive. Further, whatever the problems of limited metadata and legality, community-driven sites like these have probably done more, internationally, to preserve computer games than any other preservationists, national libraries and archives included. Even if KEEP requires more robust metadata and preservation strategies, we should actively seek dialogue with these initiatives and aim to supplement and enhance the large, if shallow, body of information they make available.
Over the last decade, considerable effort has been expended on defining preservation metadata elements, the overwhelming majority of them intended to support migration strategies. While a few attempts have been made to define environment metadata, they lack the specificity and detail needed for the emulation framework at the heart of the KEEP project.
31 Jul 2009


