As I mentioned last time, I am learning about digital curation for one of my classes in library school. We’ve been learning about the theories and models behind digital curation and getting some hands-on practice with the computer programs and tools used in the field. So the question arises: how do the theory and practice mesh?
One of the theoretical frameworks used in digital curation is the OAIS (Open Archival Information System) model. One of the most important things this framework does is give names and definitions to the different objects, roles, and functions in digital curation. Specifically, it defines and describes data as submission information packages (SIPs), archival information packages (AIPs), and dissemination information packages (DIPs). The model also includes a “flow chart” of functions, depicted below, which shows how the SIPs that producers generate are transformed into AIPs, which are curated by management, and eventually into DIPs, which are used by consumers. Each of these transformations requires metadata to describe the data. The model does not prescribe any particular software or methods for ingest, preservation planning, data management, archival storage, access, or administration.
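For readers who think in code, here is a rough sketch of that flow in Python. It is only an illustration of the idea, not anything OAIS itself specifies: the class fields, the ingest and access functions, and the sample metadata keys are all made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: what a producer hands over."""
    producer: str
    content: bytes
    metadata: dict = field(default_factory=dict)


@dataclass
class AIP:
    """Archival Information Package: what the archive stores and curates."""
    content: bytes
    metadata: dict


@dataclass
class DIP:
    """Dissemination Information Package: what a consumer receives."""
    consumer: str
    content: bytes
    metadata: dict


def ingest(sip: SIP) -> AIP:
    # Hypothetical ingest step: keep the producer's metadata and add
    # a couple of archival fields (the field names are assumptions).
    metadata = dict(sip.metadata)
    metadata["producer"] = sip.producer
    metadata["size_bytes"] = len(sip.content)
    return AIP(content=sip.content, metadata=metadata)


def access(aip: AIP, consumer: str) -> DIP:
    # Hypothetical access step: copy the archived content and metadata
    # into a package addressed to one consumer.
    return DIP(consumer=consumer, content=aip.content, metadata=dict(aip.metadata))


sip = SIP(producer="alice", content=b"field notes", metadata={"title": "Field notes"})
aip = ingest(sip)
dip = access(aip, consumer="bob")
print(dip.metadata)
```

The point of the sketch is that the content stays the same at every stage while the metadata grows, which is why each transformation in the model needs metadata of its own.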
Because OAIS does not prescribe any particular method for data curation, any number of tools and software can be used. The group I’ve been working with for the past few weeks used a repository software system called Fedora. Fedora is open source software, which means you don’t have to pay for it and you can customize it to your needs. The system we played with was pretty barebones and had both a graphical user interface (GUI) that you can access from the web and a command line interface (CLI) that you can access from the Unix server where the software was installed.
One thing that I enjoyed doing was matching up the GUI commands with the CLI commands: clicking on the “Add” button in the GUI, for example, simply asks for all the inputs to the CLI fedora_ingest command and then executes it. The GUI commands were pretty self-explanatory because the system would simply ask you for all the inputs it needed, although a few more explanations would have been nice. I also found the location of some of the action tabs illogical. The CLI commands were harder to figure out because the documentation is very terse and I wasn’t always sure what the terms meant. More examples in the documentation would have helped too.
From the GUI, it was pretty easy to upload files from my hard drive, to display files, and to modify the information about them. I would have liked the ability to move files or folders around. I didn’t try uploading or downloading files from the CLI because it was a pain to move files between my hard drive and the server. One thing I noticed was that the files generated when data are uploaded are not all stored together. I never quite figured out the rhyme or reason of how the information was stored, so there would be no simple way to move files around by hand.
Now, back to the original intent of this post: how does theory play out in practice? I would say that it is pretty easy to distinguish the SIPs, the AIPs, and the DIPs from each other. The SIPs are the uploaded files and the DIPs are the downloaded files, so the functions of ingest and access are self-explanatory. The AIPs are stored on the server, and Fedora adds metadata to describe the data, so archival storage is also handled by Fedora. I found the commands for data management a bit lacking: you can edit metadata and purge (erase) files, but you can’t move them. Preservation planning and administration in OAIS are bigger-picture functions and don’t involve working with individual files in the repository. So, overall, Fedora fits the data curation scheme put forth by OAIS.
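To make that comparison a little more concrete, here is a small Python sketch that lines up the six OAIS functions with what I saw in our test installation. The operation labels are my own informal descriptions, not Fedora command names, and the helper function is purely hypothetical.

```python
# Informal summary of one small Fedora test installation.
# The labels are descriptions of what was observed, not Fedora commands.
OBSERVED_OPERATIONS = {
    "ingest": ["upload files (SIPs)"],
    "archival storage": ["store AIPs on the server", "add descriptive metadata"],
    "data management": ["edit metadata", "purge files"],
    "access": ["display files", "download files (DIPs)"],
    "preservation planning": [],  # bigger-picture function, nothing file-level
    "administration": [],         # bigger-picture function, nothing file-level
}


def file_level_functions(observed):
    """Return the OAIS functions that had at least one file-level operation."""
    return [name for name, operations in observed.items() if operations]


print(file_level_functions(OBSERVED_OPERATIONS))
```

Four of the six functions show up as things you can actually do to a file, and the two that don’t are the ones OAIS treats as policy-level work anyway.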