Tag Archive for theory and practice

(Digital Collections) English Street Art, Full Text Searching and Raising the Dead

“Crane” by Banksy, available at http://www.banksy.co.uk

“I mean, they say you die twice. One time when you stop breathing and a second time, a bit later on, when somebody says your name for the last time.”

This quote, attributed to anonymous and ubiquitous English street artist Banksy, is a surprisingly profound viewpoint on memory and mortality, and it got me to thinking: if that’s true, then couldn’t full-text searching serve as a means of “resurrecting” the dead?

Stick with me here.

Wherever possible, we run optical character recognition (OCR) on the materials we’re digitizing and adding to our digital collections. The process involves a piece of software scanning the text in an image and looking for patterns that match letters and groups of letters that it “recognizes” as words. Those patterns are then matched against a dictionary in the software’s database, and where a match is found, it is embedded into the image as a layer of machine-readable text. It’s the way we’re able to provide full-text searching for our materials, and with our current software solutions we’re able to achieve almost 100% accuracy in some cases.
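For the curious, the dictionary-matching step can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not how our OCR software (or any particular package) works internally: the word list, the `match_token` and `recognize_line` helpers and the 0.8 similarity cutoff are all made up for the example, and the fuzzy matching comes from Python's standard `difflib` module.

```python
from difflib import get_close_matches

# Hypothetical word list standing in for the OCR software's dictionary.
DICTIONARY = ["baylor", "university", "debate", "club", "round", "up"]


def match_token(raw_token, dictionary=DICTIONARY, cutoff=0.8):
    """Return the closest dictionary word for a raw OCR token, or None."""
    token = raw_token.lower()
    if token in dictionary:
        return token
    # Fall back to fuzzy matching so a misread character can still match.
    candidates = get_close_matches(token, dictionary, n=1, cutoff=cutoff)
    return candidates[0] if candidates else None


def recognize_line(raw_tokens):
    """Pair every raw token with its dictionary match (None when unmatched)."""
    return [(token, match_token(token)) for token in raw_tokens]
```

A token like `Univers1ty` (with a digit misread for a letter) would still resolve to `university`, while a token with no close neighbor in the word list comes back unmatched. Personal names are the obvious weak spot for a dictionary approach like this, since most of them will never appear in a word list.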

Running OCR on items with printed text is de rigueur for most digital collections. It’s almost mandatory if you want your collections to show up in Google, Bing or other search engines, as the more “harvestable” data you have in your collections, the better chance those engines will ingest it and make it available for people searching for a particular word or phrase.

We’re seeing lots of activity on our collections related to what I would call genealogist/family history searches. Often, it’s someone looking for information on a relative who attended Baylor at some point in the past. They will enter a simple search string in Google – James A. Smith Baylor University 1918 – and the results will almost always bring them to an item in our digital collections. Now, no one in our office sat down and typed James A. Smith’s name into the metadata for the 1918 Round Up, for example, so the researcher was only able to find it because the OCR’d text generated a hit via the search engine.
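Here is a minimal sketch of why that works, assuming a simple inverted index built from OCR'd page text. The page identifiers and sample text are hypothetical, and real search engines are far more sophisticated than this, but the principle is the same: every recognized word becomes a way in, whether or not anyone typed it into the metadata.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page ids whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted page ids that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


# Hypothetical OCR output keyed by made-up page identifiers.
pages = {
    "roundup-1918-p042": "James A. Smith, debate club, Baylor University",
    "roundup-1918-p043": "Senior class officers of Baylor University",
}
index = build_index(pages)
```

Calling `search(index, "James A. Smith")` returns only the page whose OCR text contains all three words, even though no cataloger ever entered the name by hand.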

Pattie Orr, Dean of University Libraries and VP for Information Technology, has often remarked that she’s seen potential donors and friends of the library get truly excited when she demos our system to them by typing in a family member’s name and seeing what comes up. They are often overjoyed at the opportunity to read about a favorite grandfather’s exploits in a debate club, or to see photos of their aunt posing with her sorority sisters. Dean Orr notes that this kind of connection often serves to create informed advocates for the work we’re doing, as they are able to show and tell others about the treasures they’ve discovered in our collections.

Which leads me to this final thought: as more and more genealogists, researchers, faculty members and casual historians access our materials, they are going to encounter the names, faces and stories of people who may have died centuries ago. Many of them passed from this earth so long ago that no one living today has ever said their name aloud. So any time a great-great-great granddaughter finds her ancestor’s name in our collection and says it aloud for the first time, it could very well be the first time anyone has done so in decades.

That simple act of saying a name can truly serve as a resurrection – not of a body, but of a story, an identity, a history. And we are more than happy to serve as the avenue for something so important.

Bonus Content

Just for fun, here are some of my favorite names I’ve run across during my work with our collections. Feel like “resurrecting” one of your favorites? Just say it out loud!

Carlyne Trautwein

Odo Surratt

Hettie Clegg

W.M.W. Splawn

L. Blum Wootters

Wilby T. Gooch

Cloantha Copass

(Digital Collections) Gather ‘Round and Download the Tale: A Primer for Digital Storytelling in Archives

Storytelling, the analogue / shiny-shirt-wearing version
From the Frances G. Spencer Collection of American Popular Sheet Music

The storytelling urge is an ingrained part of human behavior spanning back to our earliest conversant days, when “This plant bad, that plant good” wasn’t just helpful advice for staying alive, it could also pass for a rollicking tale around a campfire.

Among the myriad ways we’ve progressed here in the 21st century is our ability to tell stories in new ways using archival subject material. Digitized copies of 19th-century letters, transcriptions of 17th-century diaries, cassette tapes from the 1970s migrated to MP3: these are the tools a digital storyteller can use to bring the stories of the past to a new generation of listeners.

But at the heart of that process still lie some basic steps that a curator, scholar or blogger can use to bring the materials to their greatest impact.

  • Evaluate the materials: It may seem simplistic, but taking a good look at the source materials is often the most important part of any storytelling arc. Where possible, sort materials into an order that makes sense for the narrative; chronological is the most likely candidate, but you could also sort by thematic elements or format types as well.

 

  • Look for unifying themes: If your project encompasses something larger than a single life (where you’re telling the story of one principal character, for example), it can be helpful to look for unifying themes in your materials. Good starting points include a place (geographic), an idea, a repeating theme (freedom from oppression, interpretation of race in popular culture) and the like.

 

  • Gather your contextual materials: Often, we are unable to tell a full story by pulling only from the archival materials at our disposal. So it is important to gather contextual materials – secondary sources, other collections’ contents, new scholarship, etc. – to help augment the records on hand. Researchers also love seeing lists of these reference materials at the end of a blog post or contextual resource page, so include them as a guide for further investigation.

 

  • Compile the text: Now that you’ve gathered your materials, it’s time to sit down and actually write your narrative. My best advice for this part of the process? Nulla dies sine linea, or “No day without a line,” a proverb that traces back to Pliny the Elder’s account of the painter Apelles. That means not getting sidelined by writer’s block; just sit down, start typing, and see what flows. You’ll surprise yourself almost every time.

 

  • Evaluate: Don’t be afraid to take a good look at what you’ve written, and don’t forget to look back at resources you created in the past to see where they can be updated and revised for the better.

 

Toeing the Line

One challenge to overcome when writing the story to be told from archival collections is how to present enough information without leading your readers/researchers/scholars to conclusions. As a collections professional, it is your job to present as much relevant information as possible without editorializing. Wherever possible, state the facts plainly and in a neutral voice. Avoid the temptation to infer, guess, speculate or otherwise draw conclusions from the evidence; leave that kind of thing for your researchers and subject scholars.

That’s not to say you can’t have your own opinion, of course. Blogs are a great way to expound upon your own opinions about the collection without injecting them into the contextual research presented as part of your digital archive. So fire up a WordPress or Blogger account and flex your extemporizing muscles. After all, you’re likely the one person in the world who’s spent the most time with this particular collection of materials; shouldn’t you have a chance to give your two cents?

Where Does It End?

Just as our friend David Licata and the crew working on the upcoming documentary A Life’s Work know all too well, there are some stories that are difficult to tell simply because they have no definite end. How do we tell the story of the Black Gospel Music Restoration Project with any sense of finality? When do we stop recounting the information touched on in our Baylor University Libraries Athletics Archive? These are just two of the collections that could conceivably branch into innumerable storylines with no distinct ends, a potential problem of great concern to researchers and casual users who need the structure provided by a definitive beginning, middle and end to a story.

This is where the epilogue or “to be continued” approach can be very handy. If you’re wrangling with a subject that has no definite endpoint, include a note at the end of your contextual statement that indicates the collection in question is an ongoing project and updates will be added as they are needed. In other cases, it may be helpful to set an arbitrary cut-off date for your contextual research and simply note that more information is available for this collection, but that it is far from settled, or not yet at a point where it can be presented in its proper context, and that an update will take place when the time is right.

A Worthwhile Yarn

The wonderful thing about digital storytelling is that it gives archival collections professionals a chance to open their resources up to a worldwide audience, something that was impossible only a few short decades ago. For truly unique or rare items, this could mean exposing someone a continent away to the treasure stored safely in your neatly ordered stacks, all without the need to undertake expensive travel. As stewards of these materials held in the public trust, it is our duty and our privilege to acquire, preserve, present and promote the materials in our care for use by researchers and interested parties the world over. Anything less would be a disservice to our stated mission of connecting people with ideas.

As you’re sitting down to tell the stories hidden in your archives, take a moment to appreciate the opportunity you have to do something truly impactful and unique in this world. While it certainly takes a great deal of work to get it done, there’s no substitute for the satisfaction of knowing you’ve used your talents to keep a story alive for a new generation of listeners.

And if all else fails, remember this: we can tell all these stories without getting campfire smoke on our materials. Now that’s a story worth telling.

(Digital Collections) “So We Can Throw These Out Now, Right?”: What We Learned From Microfilming Newspapers and How It Shapes Our Digitization Strategy

Pictured: Scanner Fuel, from the stacks at Baylor University.

Recently, I attended a workshop for a topic mostly unrelated to my work in digital collections. At introduction time, I gave a nutshell view of what I do by saying my group digitizes Baylor’s special collections and makes them available online. Despite the whole thing taking about 15 seconds and being intentionally generic, I’ve done this intro enough times by now to know what was going to happen next.

An older gentleman sitting on the front row got what I can only describe as the “ah-ha!” look on his face, and at the first break, he approached me and asked a question I get more often than not when I talk to people about what we do at the Digitization Projects Group.

“I work at a small museum, and we’re being told to digitize our collections. Once we do, we can just throw those old papers out, right? And is a DVD a good storage solution?”

My answer to him was simple, but it probably wasn’t what he expected to hear.

“Do you remember microfilm?” I asked him. “And when was the last time you used it and thought ‘Gosh, I wish I could get my hands on the original just to compare it to what I’m looking at’ only to find it’s been decades since anyone saw a paper copy? That’s why you can’t just throw things out once they’re scanned.”

“Also,” I added, “DVDs are terrible.”

***

Okay, so I wasn’t quite that blunt on the DVD answer, but the effect was the same: a stunned look of disbelief. In some ways, I don’t blame him. There’s a lot of misinformation (and outright falsehoods) out there about digitization, data preservation, and care of digitized materials, and the more channels it has to filter through to reach people at smaller institutions, the more distorted it can get.

If you haven’t done so, I encourage you to check out a book by Nicholson Baker called Double Fold: Libraries and the Assault on Paper. Baker’s central premise is that during the microfilming heyday of the 1980s and 1990s, libraries and other institutions put too much faith in the technology of microfilming and weren’t always diligent about properly preserving and storing the newspapers that had been filmed. It is a polemical, biased, uncomfortable book to read, and it is less than popular among librarians. But that was exactly Baker’s point.

Baker wanted to draw attention to the notion that just because a technology had come along that promised better access and a smaller storage footprint didn’t mean professionals could become lax about enforcing good practices of physical archival storage. While much of Baker’s criticism has been ably (and thoroughly) countered by library professionals in the decade since Double Fold’s publication in 2001, it remains a stirring think piece on the dangers of over-reliance on a “silver bullet” solution at the expense of long-term viability.

At the heart of Baker’s issues with microfilm was the prevailing attitude that, once a run of newspapers had been filmed, it was perfectly acceptable for the originals to be tossed, as the filmed versions were thought to be a reasonable substitute that preserved both the look and content of the papers at a fraction of the space required to store them. But what happens if the film is bad and no one noticed until the originals were long gone? Or what if a page was skipped, or an entire volume? Or what if the film falls prey to “vinegarization” – an inherent agent of deterioration wherein the film’s layers begin to break down and disintegrate, producing a distinctive vinegary, “salad dressing” smell – and now cannot be viewed?

If the originals are gone, the answer is clear: there’s nothing you can do.

***

Which brings me back to my fellow workshop attendee’s question: once things are scanned, they’re safe to pitch, right? The problems outlined in Baker’s book could just as easily apply to the process of digitizing archival materials. We believe the technology behind digitization is reliable, replicable, and sustainable, and we’ve learned a great deal about how to approach digitizing materials thanks to the lessons revealed by the great microfilm boom of the last century. As such, we’ve got processes and technologies in place to monitor our digital files, keeping them secure and accessible for decades to come.

But what about the things we can’t predict? What if the next generation of computers is so different from what we’re used to today that the very idea of digital files changes completely? What if a specialized virus destroys every TIFF file in creation? What if the Mayans were right, and civilization as we know it craters at the end of the year, rendering all our painstaking efforts profoundly moot?

The best answer is to do what people have done since 200 BC: go back to the paper versions.

That’s why we counsel our partners to use the process of digitizing materials to serve as a catalyst for rehousing materials in archival storage if they’re not stored that way already. That’s why we urge conservation of fragile materials before they arrive at our center. That’s why we never tell them it’s safe to throw something away just because it’s been scanned, cataloged and placed in a digital collection.

That’s why I told the man from the workshop that the answer to his question is a very simple, “No.”

And the DVD question? Think about this: when was the last time you popped a CD into your car’s stereo that you hadn’t listened to in a while, only to find that your favorite song was skipping like a hyperactive preschooler thanks to a series of almost-imperceptible scratches? It’s happened to all of us, and the same thing can happen to a supposed “100 year, archival” gold DVD.

For years, though, digitizers at institutions large and small were told that backing up your files to a DVD and putting it on a shelf was a great example of a reliable backup, to the point where many early digitization outfits didn’t keep any other versions of files around once they were burned to disc. We found pretty quickly that those discs weren’t reliable enough to be a sole backup source, so now we keep multiple copies on spinning disks, magnetic tape, and in the cloud, both on- and off-site, to ensure long-term stability of our digital assets.
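Keeping several copies only helps if you can tell when one of them has gone bad. One common way to do that is a checksum (or “fixity”) check: record a digest of each file when it is created, then recompute it later for every copy and compare. The sketch below is an illustration under stated assumptions rather than a description of our actual tooling; the `sha256_of` and `check_copies` helpers and the status labels are hypothetical, and the hashing comes from Python's standard `hashlib` module.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(expected_digest, copy_paths):
    """Report, per copy, whether it is 'ok', 'corrupt' or 'missing'."""
    report = {}
    for path in copy_paths:
        path = Path(path)
        if not path.is_file():
            report[str(path)] = "missing"
        elif sha256_of(path) == expected_digest:
            report[str(path)] = "ok"
        else:
            report[str(path)] = "corrupt"
    return report
```

Run on a schedule against each storage location, a report like this flags a scratched disc or a silently damaged file while a good copy still exists somewhere else to restore from.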

***

All of this makes good sense, but if professionals at big institutions like the Library of Congress, the National Archives and even Baylor’s own DPG have to keep constant watch on evolving technology trends just to stay up to speed, how can we expect staffers at small to mid-size institutions to keep up?

Ultimately, it comes down to education and using a common sense approach to digitization projects. Education means large institutions like the Library of Congress, the Texas Historical Commission, and, at a local/regional level, our own staff teaching people at small institutions the basics of digitization and file management. Workshops, webinars, websites and more offer basic information about how to scan documents, how to manage the data that results, and what to do to keep it safe, and wider access to this kind of information can do great good in counteracting some of the old misconceptions that are still out there.

And common sense? That’s something Baker’s Double Fold should give us reason to trust in spades. If something is important enough to scan and put online, isn’t it common sense to think that it’s important enough to preserve physically? If an archival collection was kept safely stored for decades in the right environment, does it make sense to throw it out now that it’s been scanned? And if we know that paper-based items can last for centuries when properly stored, doesn’t it make sense to hold onto them as long as we can, just in case?

***

Is digitization an important undertaking for libraries, museums, and archives of all sizes? Undeniably.

Should we take steps to ensure our cultural heritage – digital and physical – is properly stored, displayed, and accessed? Without a doubt.

Does either of those facts mean it’s safe to discard a decade’s worth of 19th century American newspapers once they’ve been scanned, as happened with microfilmed newspapers in the 1990s?

If anyone’s reading this post in 3012, do me a favor: look me up and let me know.

(Digital Collections) Go With the (Work)Flow: How Things Get Done in the RDC

One thing we’ve learned about digitizing Baylor’s unique collections is the importance of front-end planning for the overall success of a project. It’s the crucial step that separates a “well, that went smoothly” project from a “nightmare of epic proportions” project.

The challenge with workflow planning is that it’s the least glamorous part of almost every project, so giving it its due isn’t usually our first point of interest. Lots of digitization outfits fall into the trap of rushing to get items onto scanners as quickly as possible, assuming that things like useful filename identifiers and quality control will just work themselves out over the course of the project. Unfortunately for them, this is rarely the case, and failing to plan ahead becomes the first step in a rapid spiral into a project with no direction, frequent backsliding, and endless frustration.

So how do we avoid these pitfalls with projects that can encompass hundreds of thousands of items and up to a dozen different employees taking part in the process?

1.    Practice restrained exuberance. No matter how exciting the source material you’ve been tasked to digitize, letting the awesomeness of the items overwhelm your better judgment is a classic rookie mistake. Taking time to dispassionately evaluate the materials gives you a better handle on things like the items’ physical state, the extent of the collection (number of items), logistical challenges, and content-related concerns.

2.    Go with what works. Years of experience (and trial and error) have provided us with some practical tips that work with projects of almost any size. In the end, it comes down to some little things that make a big difference: get an accurate estimate on the number of items; use filenames that make sense (texas-johnson-diary-001-01.tiff) so you can find things easily; scan using best practices (300 dpi TIFFs for preservation, etc.); and assign people to the kind of work that suits their personalities. There’s no need to reinvent the wheel for most digitization projects, and if you have to be inventive, make it an upgrade, not a complete redesign.

3.    Figure out who’s doing what. DPG staff handle the higher-level planning and ultimate quality control on all projects, but graduate assistants and undergraduate student workers carry out the bulk of the actual digitization and file manipulation for most projects. That means explicitly assigning portions of a project among one to ten people, something that can be a major hassle, unless you …

4.    Create a spreadsheet. Tools like Google Docs offer our group a fast, free, cloud-based solution to keeping large groups of people in step with one another over the course of a project. Google Docs offers spreadsheets, documents and more with customizable levels of access so we can see at a glance where any project stands.

Workflow spreadsheet for the Browning Letters Project

5.    Create a workflow chart. DPG Manager Darryl Stuhr is a big fan of workflow charts, and his creations are virtuoso-level masterpieces of data management. Take a look at this piece (currently taped in his office window) for our Baylor University News Releases Project. These visualizations of how things work help him plan each byte from scanner to preservation server and online access.

Darryl Stuhr's workflow chart for the BU News Releases Project

6.    Stay on top of everyone’s work. Managing data is only half of the task; keeping the team on task is the other. It takes a great deal of effort to ensure students are scanning at a high level of quality, that files are ending up where they’re supposed to be, and that the final product is a collection people will find useful, accurate and interesting.

7.    Celebrate successes. Adding end-of-project pizza parties to our workflow has been a fun way to reward hours of often-repetitive effort on the part of our student workers. (College students like free pizza; who knew?) But often it’s the simple act of saying “Thank you” and celebrating together that makes the difference.
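The filename advice in tip 2 is easy to enforce with a little code. The sketch below assumes the sample name texas-johnson-diary-001-01.tiff breaks down as place, creator, item type, a three-digit item number and a two-digit page number; that reading of the pattern is a guess made for illustration, and the `build_filename` and `parse_filename` helpers are hypothetical rather than part of our actual workflow.

```python
import re

# Assumed layout: <place>-<creator>-<kind>-<item>-<page>.tiff
FILENAME_PATTERN = re.compile(
    r"^(?P<place>[a-z]+)-(?P<creator>[a-z]+)-(?P<kind>[a-z]+)"
    r"-(?P<item>\d{3})-(?P<page>\d{2})\.tiff$"
)


def build_filename(place, creator, kind, item, page):
    """Assemble a lowercase filename with zero-padded item and page numbers."""
    return f"{place}-{creator}-{kind}-{item:03d}-{page:02d}.tiff".lower()


def parse_filename(name):
    """Split a conforming filename into its parts, or return None if it does not fit."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["item"] = int(parts["item"])
    parts["page"] = int(parts["page"])
    return parts
```

Generating names with one function and validating them with another means a stray scan0001.tif gets caught on the first day of scanning instead of at the end of the project.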

So if you’re setting out to start a major digitization project, keep these tips in mind. This may be the blog post that prevents you from regretting tackling one in the first place, and who knows? It may even give you an excuse to celebrate with pizza.

(Digital Collections) The Education of a Digitization Projects Group: A Dispatch from TCDL 2012

When the Digitization Projects Group isn’t busy saving the world (one scan at a time), we’re taking time to recharge our creative batteries and hone our technical skills at various conferences, symposia and workshops. This past week, half of the DPG (Darryl Stuhr, our manager, and I) traveled to Austin for the Texas Conference on Digital Libraries.

This is the kind of group where library and IT types coexist in harmony, focused on the lofty goal of providing access to digital content, management of data, and the preservation of that data now and forevermore. Topics covered at TCDL included collaborative project workflows, data architecture, preservation systems, streaming video and much more. It’s the kind of group where a speaker may use the phrases “crowdsourcing,” “Internet 2” and “replicating server” in the same sentence with confidence that most people in the room will know what they’re talking about.

Darryl presented as part of a panel immediately following the opening session. His portion of the show covered the Browning Letters Project, specifically the challenges and rewards of working in collaboration with multiple parties to achieve a common goal. As outlined in this post, the Browning Letters Project is a major collaboration with Wellesley College focusing on the written correspondence of the poets Robert and Elizabeth Barrett Browning.

After a morning of presentations, nothing keeps things moving like a spicy “bowl of red” at the Texas Chili Parlor. We were joined by colleagues Tim Logan (Assistant Vice President for the Electronic Library) and Billie Peterson-Lugo (Director of Electronic Libraries Resources & Collection Management Services) for a lively round of conversation and traditional Texas chili.

Pictured: conference fuel

Presentations in the afternoon included information on streaming video for faculty use; crowdsourcing transcriptions of manuscript collections; and workflow/planning for collaborative projects. There was even an appearance by Georgia Harper (University of Texas at Austin), a copyright expert who consulted with our group on the Black Gospel Music Restoration Project.

Georgia Harper: Copyright Rock Star

The day ended with a poster presentation session/reception where I presented a poster outlining what we’ve been up to in the realms of curation of digital assets and outreach to our respective publics.

Note the irony of presenting info about digital collections on a printed poster.

These conferences always generate lots of good ideas we can integrate into our work back in Waco. And while it can be easy to fall into the trap of “conference high” – where every idea you had seems like the most important thing in the world and must happen right now – there’s no doubt that taking advantage of opportunities like TCDL allows us to network with like-minded professionals, get exposed to new ideas and benefit from the critical mass that forms when lots of people interested in the same thing gather in one place for an extended time.

And did I mention the chili?

(Digital Collections) Mrs. Neff’s Portrait: Or, The Things We Scan That Aren’t Online

If you’re a regular reader of this blog,* you know we feature items in this space that are drawn from our digital collections that we believe are unique, interesting or otherwise worthy of added exposure. And for that purpose, we have more than 35,000 objects online to write about – more than enough to keep bloggers busy for years to come.

But what about the things we digitize at the Riley Digitization Center that don’t go online? What makes something worthy of occupying a spot in cyberspace and what makes something a candidate for relegation to a dark archive, securely stored and likely never to see the light of the Internet?

In general, there are four major reasons to digitize an item:

  • Rarity. The object in question is one-of-a-kind.
  • Fragility. The object is in a state of physical disrepair and digitization is a step on the way toward better storage, conservation or digital enhancement.
  • Access. The object will make a beneficial addition to an online environment.
  • Preservation. The object needs a safe digital copy even though it isn’t deemed an acceptable candidate for online presentation for any one of a number of reasons.

The reasons for keeping an item offline are numerous, including copyright concerns; lack of provenance information (source, date, authenticity, etc.); sensitive content; or potential for misuse.

Copyright Concerns
Some material we digitize in order to preserve the information on a physical medium (a 45 rpm disc, for example) before it can be lost. However, the copyright holder’s status for such an item may be unclear, as happens often with items from our Black Gospel Music Restoration Project. In these cases, the files are preserved (but access is limited) until copyright claims are established and addressed.

Lack of Provenance
In some cases, there is very little background information on an item, and that uncertainty makes its usefulness as a digital object less clear. The more verifiable information we can gather about an item, the more useful it becomes. Sometimes an item is scanned to preserve information but held offline until further research can reveal crucial information that would make it a useful online object.

Portrait of Mrs. Pat M. Neff, courtesy Baylor University

This portrait of Mrs. Neff is an example of an item scanned for preservation – as part of a 2010 project to digitize all 13 official portraits of Baylor’s past presidents – but is not online due to a dearth of information about it. Until further research is done to establish some basic information about the provenance of the portraits, they are being held offline; in the future, they may be added to an online collection.

Sensitive Content
Sensitive content generally takes the form of information that could be considered patentable or otherwise copyrightable. For instance, original research generated by a doctoral student as part of a dissertation – which is then commercialized in the form of a book or product – may be digitized for preservation but not placed online due to its potential marketability.

Potential for Misuse
Items in this category include things like the blueprints for extant buildings. The Digitization Projects Group worked with Baylor’s architect and his staff to digitize the original blueprints for many of the buildings on campus, including recognized landmarks like Pat Neff Hall, Armstrong Browning Library and Tidwell Bible Building.

Unfortunately, because these items could be used for nefarious ends by people intent on doing harm, we will not be releasing them online. This is a true shame, as the plans include amazing details, all of them hand-drafted in an era before computer-aided design (CAD) became the standard for draftsmen. Below is a small excerpt of the ornamentation of Waco Hall’s main entrance to illustrate the kind of material in this category.

Detail of main entrance to Waco Hall, courtesy Baylor University architect’s office

Doomed to Darkness?
That’s not to say that these items will never be included in our digital collections. Often, they are slated for addition to future collections, or they are queued up for further research that will make them valuable additions to existing collections. Regardless of whether they find themselves displayed in your browser window someday, the staff at the DPG is committed to keeping them safe, secure and functioning for decades to come, a commitment that extends to any object that finds its way through our doors and onto our equipment.

*If you’re a regular reader and haven’t yet signed up to receive email notices when a new post goes live, take a moment to do so in the sidebar. It’s the easiest way to ensure you’ll never miss a word of what’s happening with the DPG!