Feeding an Elephant, One Book at a Time: Supporting the HathiTrust Digital Library

Modern researchers rely on access to information in a manner that was unthinkable less than a generation ago: the Internet, with its light-speed connection to all the resources of the world’s libraries, archives and cultural heritage institutions. But even with the explosive growth in digital collections, an untold number of books and other resources remain difficult to find online. One way to make the job of finding them easier is through aggregator systems that bring together materials from a number of disparate places.

The HathiTrust is one such site. From their website:

HathiTrust is a partnership of academic & research institutions, offering a collection of millions of titles digitized from libraries around the world.

Those digitized titles – numbering more than 14 million as of 2015 – cover an impressive range of topics, time periods and authors. But like so much in life, the items in the HathiTrust are only useful if they’re available. And that means someone has to take the time to ensure that the items the site has flagged as being in the public domain actually are. That is no simple feat when you consider the sheer volume of materials ingested into the system every year.

When Baylor committed to being part of the Copyright Review Management System – World (CRMS-World) team for the HathiTrust, it was with the understanding that we were pledging to review thousands of books and manuscripts to determine their copyright status and whether or not they fell into the category of a public domain work in the United Kingdom, Canada and Australia. Enter our Electronic Library team, specifically Denyse Rodgers, Darlene Youts and Brenda Anderson.

Denyse was the first Baylor library member to sign on for the project, in 2012; Darlene and Brenda joined in 2014. The initial grant that funded the project expired in February 2016, but the trio of Bears volunteered to keep working on it through the end of the year.

Using a custom web interface developed by HathiTrust, Denyse, Darlene and Brenda spent hundreds of hours reviewing digitized copies of books added to the database from HathiTrust partner institutions. Working from a set of criteria provided by the Trust, they reviewed the metadata record for each book and determined whether or not it met the public domain (PD) criteria. They then flagged the record in the system, and it joined the ranks of the PD or non-PD materials in the HathiTrust catalog.

In toto, the project reviewed more than 305,000 volumes, identifying more than 154,570 as public domain works. Locally, the numbers were:

  • Brenda reviewed 21,011 items, of which 9,807 were PD
  • Denyse reviewed 20,860 items, of which 9,857 were PD
  • Darlene reviewed 7,090 items, of which 3,116 were PD

Perhaps the most exciting result from the project is that our Baylor team – along with teams from Penn State and the University of Illinois Urbana-Champaign – reviewed roughly one-third of the entire corpus of works covered during the project’s timeline.

After the project’s completion, Denyse told me, “I believe this was a very worthwhile project because it allowed materials to be made openly accessible that otherwise might not be. In April 2016, ALA recognized the program with the L. Ray Patterson Copyright Award. I was pleased to be able to participate in such a worthy effort.”

Billie Peterson, director of Resources and Collection Management Services for the Electronic Library, also praised the project. “From my perspective,” she said, “all of the institutions that participated in both the original CRMS grant and the CRMS World grant enabled opening access to hundreds of thousands of English-language orphaned works contained in the HathiTrust corpus and developed a tested and robust method to determine whether or not English-language orphaned works are actually in the public domain.”

The Baylor team worked on the HathiTrust project in addition to their regular daily work managing library information systems (Denyse), coordinating the usage statistics process (Darlene) and managing the course reserves process (Brenda). As of the Spring 2017 semester, their hard work paid off: thanks to their efforts, the Baylor Libraries had met their commitments and added thousands of titles to the public domain list in the database.

The old joke, of course, is that you eat an elephant one bite at a time. But as the efforts of our Electronic Library colleagues show, it takes a steady diet of daily work to feed one.

Thanks for your hard work, and Sic ’em, Denyse, Darlene and Brenda!
