Archivists on the Issues is a forum for archivists to discuss the issues we are facing today. Today’s post is by Eira Tansey, Digital Archivist and Records Manager at the University of Cincinnati.
This post is about the Repository Data project, an SAA Foundation grant-funded project to assemble a comprehensive data set of US archival repositories. The research team consists of Ben Goldman (Penn State University), Eira Tansey (University of Cincinnati), and Whitney Ray (UNC-Chapel Hill). By contacting over 145 archival organizations, they have received data on thousands of archival repositories across the United States. They are still processing the data, but it will eventually be made accessible to the public. Read on!
As we previously noted, the only existing open data set for archival repositories – OCLC’s ArchiveGrid – lacks representation of many small archives, historical societies, and other nebulously defined archives. As many of you know, inclusion in ArchiveGrid is primarily driven by having descriptive data (MARC records, EAD finding aids, etc.) online and crawlable by OCLC. This means that repositories with professional archivists on staff and the resources to make archival description available online are over-represented in the ArchiveGrid data set. In reality, many archives don’t fit this description, and they are therefore effectively invisible to much of the profession.
This has been frustrating to us as we pursue our work on archival vulnerability to climate change. The institutions that are most at risk from sea-level rise and climate change–influenced disasters are also the least likely to have professional staff and sufficient resources to sustain archival collections even in “normal” times – let alone during an emergency. And yet, these are precisely the archives that weren’t visible in our first pass at mapping repository vulnerability to climate change.
But now we’d like to show you the dramatic way in which our research project has uncovered how many archives exist – even if they aren’t putting their finding aids online.
This is the “Before” map, reflecting OCLC’s data – according to ArchiveGrid as of 2016, there are only about 44 repositories in the state of Ohio:
Although this data is not yet final, this is our beta data set for Ohio – i.e., our “After” map. You can see the dramatic difference: many more archives have been revealed thanks to our efforts (and especially those of Whitney, our fantastic research assistant, who has done the heavy lifting of reaching out to archival organizations to compile and clean data). According to our preliminary* data, there are well over 500 repositories in the state of Ohio.
I want to highlight that defining archives as only those repositories that participate in networked archival descriptive infrastructure tends to erase the visibility of small archives, especially those outside of major population centers. Let’s use southeastern Ohio – aka Appalachia – as an example.
The light-green counties are those that fall within the federally defined Appalachian Regional Commission’s jurisdiction. (Clearly there are cultural constructions of Appalachia that do not align with these county delineations, but those aren’t as easy to find as open GIS data!)
In the “Before” map, only 3 archives exist in Ohio’s Appalachian counties, and they are all associated with higher education: Marietta College, Youngstown State, and Ohio University.
But in the “After” map, we see that there are roughly 100 (yes, 100!) archives in Ohio’s Appalachian counties. Why the massive difference? Because our efforts to gather as much data as possible from local, regional, and state archival organizations mean we have pulled in dozens of small historical societies, public libraries, and museums.
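At its core, the county comparison above is a simple membership filter: take each repository’s county and check it against the ARC-designated list. Here is a minimal sketch of that idea – the repository records, county names, and variable names below are hypothetical illustrations, not the project’s actual data or code:

```python
# A hypothetical subset of ARC-designated Ohio counties (the real ARC
# jurisdiction covers 32 Ohio counties).
arc_counties = {"Washington", "Athens", "Mahoning", "Meigs"}

# Hypothetical repository records as (name, county) pairs.
repositories = [
    ("Marietta College Special Collections", "Washington"),
    ("Ohio University Libraries", "Athens"),
    ("Youngstown State University Archives", "Mahoning"),
    ("Cincinnati History Library and Archives", "Hamilton"),
]

# Keep only repositories whose county is in the ARC list.
appalachian = [name for name, county in repositories if county in arc_counties]
print(len(appalachian))  # → 3
```

The same filter scales from a handful of records to a full state data set; the “Before”/“After” difference is entirely in how many repositories make it into the input list in the first place.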
We haven’t done before-and-after comparisons with other states yet, but I anticipate they would look very similar to what we’ve seen with Ohio. Building the first comprehensive data set of US repositories is no small task, but we think the preliminary results speak to the importance of our work.
*We say preliminary because we still have some cleaning and minor de-duplication tasks left with our data.