Data Curation Intro
As I have mentioned before, I am currently taking classes toward my library degree. As part of my degree, I plan to earn a certificate in digital curation. What exactly is digital curation, and why am I interested in it? I'm actually interested in digital curation as one part of data management/curation.
Most people associate libraries with books, but libraries are probably more accurately associated with information (which is often found in books, but also in journals, maps, pictures, electronic resources, and other formats). Data, or facts, are not very useful or informative until they are put into some sort of context. Part of this context is a description of what the data contain and where they are stored. In other words, data about the data, or what librarians call metadata, helps data become information (or something useful).
Although everybody has data (think about your photographs and other "collections" around your house), I want to help researchers (specifically those at Baylor) transform the data they collect or generate into information that the world can find, learn from, and even use for further research. I see this process of transforming data into information as curation (although I often wonder whether curation is really a word, since my spell checker always underlines it as misspelled).
To me, this process of curation means first working with researchers as they generate or collect the data to make sure the data are described well and stored properly. Next, it means helping them decide which data should be kept (raw data or reduced data, for example) and who should have access to them. For the data that are kept, the next step is making sure they are stored somewhere permanent and accessible to others who might want to view and perhaps reuse them, not just on a random hard drive somewhere. The final step is making sure there is a plan to review and update the data and to upgrade the storage over time.
The hard part I anticipate is getting researchers to include these curation steps as part of their workflow. Although mandates from funding agencies have begun to nudge researchers into thinking about data curation, I don’t think that researchers will really buy into the idea and change their habits until they see some sort of benefit to themselves.
How data curation will be incorporated into a researcher's workflow depends largely on the discipline and the type of data. Even though they may all be biologists, a researcher who runs clinical trials will have very different types of data from one who analyzes genome sequences or one who tracks endangered species in the field. Nevertheless, much of the data collected and generated today in the sciences, the social sciences, and increasingly the humanities is digital (hence my pursuit of a certificate in digital curation).
In my courses, I'll be learning about different software programs that can be used to manage digital data. The programs seem to be strong in different areas: description, organization, discovery, storage, preservation, etc. I'm not sure how often I will use any of these programs as the STEM librarian at Baylor, but it will be important for me to know them as I try to guide researchers and the university in turning their data into information.