Gray Digital Preservation Repository

LAUNCHED 2024.01.16!!!

The Ohio State University Libraries' Gray Digital Preservation Repository (Gray Repo) service provides a path to preservation for born digital (or received as digital) content that has been accessioned and is intended to be only minimally processed and/or is temporally restricted. As such, and in accordance with Distinctive Collections' accessioning policies and procedures, it is the default digital preservation repository. It also provides a preservation environment for some legacy digitized preservation files. It is a "dim archive" that allows for curatorial deposit and retrieval, but no direct patron access.

Go-link for this site: go.osu.edu/Gray-Repo-Wiki

Context

The Gray Repo is a "dim digital preservation archive" that provides no public access and limited curatorial access to the University Libraries' digital objects stored within. This is in contrast to a "light archive," which provides public access, or a "dark archive," which allows only custodial access. The Gray Repo allows for curatorial deposit and retrieval, but no direct patron access. It is akin to a physical archival storage facility such as our Book Depository, where items are stored on shelves in an environmentally regulated and well-managed manner and are described in conformance with accepted standards, while the public and unvetted personnel are not allowed to wander the stacks.

Background

The Gray Repo emerged from an initial use case presented by the University Archives (Archives) for their annual collection development efforts. The Archives accrues archival materials to existing (and sometimes new) collections annually, due to a mandate for collecting University records. Whether the records are analog or born digital, they are typically so voluminous that the Archives' practice is to accession the records, update the finding aid, store them, and provide mediated access. There is minimal descriptive effort, and discovery depends on the researcher/patron examining the accession inventories and/or the records themselves. The Libraries' existing digital preservation platform, Digital Collections, which inherently requires item level description, was not designed for the ingest and management of digital assets at an accession level; hence the need for a second type of digital preservation repository.

While this initial use case presents itself most definitively within the University Archives, the discovery phase of developing this service and service description revealed that the standard operating procedure for the accessioning and/or accrual of any archival or special collection is one of minimal processing. As such, it will be the University Libraries' default digital preservation repository for born digital content.

Two additional use cases arose from our initial discussion with stakeholders: one with broader implications that is ultimately tied to the underlying and evolving nature of the Gray Repo, and another with targeted impact.

  • The former instance was posited by the Ohio Public Policy Archives, but could have broader implications, and concerns temporally restricted content. Typically, the donation of congressional papers comes with a donor restriction on when the content can become available for public research. Because almost all congressional papers donations are now either mostly or completely digital, we need to provide a true digital preservation platform to secure these records while they await further processing. That "further processing" will happen within a specified time based upon the donor agreement, and may lead to a re-ingest of minimally processed records akin to those of the University Archives; or they may be described in more detail and ingested into Digital Collections.

  • The latter instance regards legacy digitized content. Before the evolution of the Digital Collections platform, digitized University Libraries content was typically ingested into the Knowledge Bank (KB) to provide access. The files ingested are access files, not preservation files. The preservation files were stored on the Libraries' sFTP server known as "The Dark Archive." Unfortunately, it is neither a dark archive, nor a digital preservation platform or environment. These KB-related files would be better preserved in the Gray Repo.

Gaps Addressed

Provides University Libraries a formalized path to preservation for:

  • born digital content

  • born digital content that is temporally restricted

  • digitized preservation files for content that resides in other Libraries repositories, such as the Knowledge Bank

Linkage to Strategic Directions

  • Empower Knowledge Creators

  • Engage for Broader Impact

  • Enrich the User Experience

  • Model Excellence

https://library.osu.edu/strategic-directions

Service

Stakeholders

Service Owner & Providers

  • Application Development & Operations (Provider)

  • Digital Preservation (Owner & Provider)

  • Infrastructure (Provider)

Content Owners/Curators

  • Billy Ireland Cartoon Library & Museum

  • Byrd Polar and Climate Research Center Archival Program

  • Music & Dance Library

  • Ohio Public Policy Archives

  • Publishing & Repository Services

  • Thompson Special Collections

  • University Archives

Content Processing

  • Archival Technical Services

  • Billy Ireland Cartoon Library & Museum

  • Digital Preservation

  • Preservation & Digitization

  • Thompson Special Collections

  • University Archives

Consultants

  • Copyright Services

  • Cybersecurity

Informed Parties

  • Bibliographic Initiatives

  • Collection Development

  • Electronic Resources

  • Executive Committee

  • Management Committee

  • Metadata Initiatives

  • Research Services

  • Subject Liaisons

Description

Components

  • Repo: Fedora on Amazon Web Services (AWS)

  • Ingest:

    • AWS ingest buckets

    • BagIt

    • VPN

  • Staging:

    • Digital Processing (K-drive)

    • OneDrive

    • "Dark Archive" (to be emptied and decommissioned)

    • VPN (Libraries share drives when necessary)

    • sFTP (when necessary)

  • Transfer to University Libraries:

    • OneDrive

    • External drives

    • External Media

    • Donor cloud storage

  • Forensics:

    • DROID (creates manifest with checksums and file characterizations)

    • Tesseract and ABBYY FineReader (to create OCR text for images and "dumb" PDFs)

    • Bulk Extractor (for Personally Identifiable Information (PII) identification)

  • Management:

    • Finding Aid

      • Archivists' Toolkit

      • PastPerfect

      • OPAC

    • Help/Maintenance

      • JIRA

    • Local Administrative Dashboard

      • Teams Channel file storage

      • Microsoft Lists

    • Plain text reader app

  • Mediated Patron Access:

    • Archivists' Toolkit

    • PastPerfect

    • OPAC

    • Secured Virtual Reading Room (sVRR)

    • OneDrive

    • External drives

    • External Media

Content

  • Initial considerations:

    • Is its University records retention permanent?

    • Do we have a Deed of Gift?

    • Have we accessioned it?

    • If it were analog, would we store it in the Book Depository or other closed stacks?

    • Is it a preservation copy of other material in the Knowledge Bank (KB)?

    • Is it something we own and have digitized, but for which we do not hold the rights (e.g. the brittle books project)?

  • Typical content:

    • University Records:

      • Born Digital records

      • Transferred digitized records

      • Preservation digitized files for third party platforms (e.g. Veridian)

    • Special Collections:

      • Born Digital collection objects

      • Preservation digitized files for Knowledge Bank content

    • Publications & Repository Services (P&RS):

      • Preservation digitized files for Knowledge Bank content

  • Future potential content types:

    • Digitized University Libraries' content (ostensibly audio-visual objects) without clear rights

    • Web Archiving WARCs

    • P&RS other publishing platform content

    • Brittle Books at Internet Archive preservation digitized files

    • HathiTrust preservation digitized files

Process Workflow Overview

The following is a brief overview of the workflow process/components.

Gray Digital Preservation Repository High Level Workflow

Due to information security concerns, the complete Gray Digital Preservation Repository Workflow is available only for internal University Libraries use, upon request from the Digital Preservation Department; a redacted version is publicly available. The redacted version was updated to correct an error in its explanation of the Payload-Oxum number.
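For readers unfamiliar with the Payload-Oxum number mentioned above: in the BagIt specification (RFC 8493) it is an optional bag-info.txt field of the form OctetCount.StreamCount, that is, the total number of bytes and the number of files in the bag's payload (data/) directory. The following is a minimal sketch of how such a value could be computed and compared against the declared value, assuming a standard bag layout. The function names are hypothetical, and this is an illustration only, not the Gray Repo's actual ingest code.

```python
import os
from pathlib import Path


def payload_oxum(bag_dir):
    """Return "OctetCount.StreamCount" for the files under <bag_dir>/data."""
    payload_dir = Path(bag_dir) / "data"  # BagIt keeps payload files under data/
    octet_count = 0   # total bytes across all payload files
    stream_count = 0  # number of payload files
    for root, _dirs, files in os.walk(payload_dir):
        for name in files:
            octet_count += (Path(root) / name).stat().st_size
            stream_count += 1
    return f"{octet_count}.{stream_count}"


def declared_oxum(bag_dir):
    """Read the Payload-Oxum value from bag-info.txt, or None if absent."""
    info_path = Path(bag_dir) / "bag-info.txt"
    if not info_path.is_file():
        return None
    for line in info_path.read_text(encoding="utf-8").splitlines():
        label, _, value = line.partition(":")
        if label.strip().lower() == "payload-oxum":
            return value.strip()
    return None


def oxum_matches(bag_dir):
    """Quick pre-check: does the declared value match the payload on disk?"""
    declared = declared_oxum(bag_dir)
    return declared is not None and declared == payload_oxum(bag_dir)
```

A matching Payload-Oxum only shows that the byte and file counts agree; it is a fast sanity check before the slower checksum validation against the bag's manifest, not a substitute for it.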


