Master Objects Migration Project Plan
Project has been superseded by the Dark Archive Decommissioning Project.
Content was last updated as of 2019-07-16
Goals
Successfully migrate master objects and required metadata to a preservation environment in the Master Objects Repository (MOR), from:
Dark Archive (DA)
J-Drive
External Media (EM)
Determine the appropriate disposition for the remaining content.
Objectives and Deliverables
Preparation for Migration
De-duplication within the DA and in relation to J-Drive & EM
Format Analysis
Identification of Master Objects (MO) to be migrated
Establishment of metadata requirements for MOs to be migrated
Migration
Prioritization
Rights analysis
Metadata identification and/or creation
Migration testing
Migration of MOs
Post Migration Clean-up
Disposal of derivatives
Disposal of excess copies found in J-Drive and EM
Determination of disposition for other non-MOs
Scope
Location | In Scope? |
Dark Archive | |
/archive/Archived | No |
/archive/Committees | No |
/archive/Dept/ARV | Yes |
/archive/Dept/ATH | Yes |
/archive/Dept/CGA | Yes |
/archive/Dept/CSS | No |
/archive/Dept/DI | Yes |
/archive/Dept/HIL | Yes |
/archive/Dept/KB | Yes, as it pertains to MO determinations in other collection folders |
/archive/Dept/MUS | Yes |
/archive/Dept/RAR | Yes |
/archive/Dept/SRI | No, this is just a mapping to /archive/Dept/KB |
/archive/Dept/TRI | Yes |
/archive/Dept/WIT | No |
/archive/Fedora | No |
/archive/FedoraMORdata | No |
/archive/lost+found | No |
J-Drive | Yes, as it pertains to MO determinations in DA folders, but most likely as a secondary project after the DA migrations. |
External Media | Yes, as it pertains to MO determinations in DA folders, but most likely as a secondary project after the DA migrations. |
Stakeholders
Project Role | Who | OSUL Role |
Executive Sponsors | Jennifer Vinopal (formerly Lisa Carter) | AD: Special Collections & Area Studies (Interim) |
| Jennifer Vinopal | AD: Information Technology |
Process Owner/Manager | Dan Noonan | Digital Preservation Librarian |
Project Team | Application Development & Support (AD&S): individuals identified on an as-needed basis | MOR interaction Subject Matter Experts (SMEs) |
| Digital Initiatives | Metadata transformation & DC/MOR processing SMEs |
| OSUL-IT Infrastructure (IS) | DA/MOR Analysis Reports SMEs |
| Archives & Special Collections | Collections SMEs |
| Content & Access and Archival Description & Access | Metadata & DC/MOR processing SMEs |
| Preservation & Reformatting | Preservation & Reformatting SMEs; Digital Imaging (DI) folder |
| Publishing and Repository Services (PRS): Maureen Walsh | Metadata & Knowledge Bank (KB) SME |
| Copyright Resource Center (CRC): Sandra Enimil | Rights SME |
Tasks
De-duplication of Dark Archive
Hash sum de-duplication: Includes OSUL-IT Infrastructure Support’s (IS) creation of reports, provided in CSV format for analysis, that identify files with duplicate hash sums; initial analysis by the Digital Resources Archivist (DRA); meetings between the DRA and appropriate members of the curatorial staff; final analysis and deletions carried out by appropriate curatorial staff.
Derivative de-duplication: The MOR is a repository for master objects, not derivatives; therefore we need to be certain to migrate only the masters. This task includes IS’s creation of reports, provided in CSV format for analysis, that identify files with duplicate file names (e.g. 001.TIFF and 001.jpg); initial analysis by the Digital Resources Archivist (DRA); meetings between the DRA and appropriate members of the curatorial staff; final analysis and determinations by appropriate curatorial staff regarding whether the files with duplicate names should be maintained or disposed of.
External de-duplication: External Media and potential Master Objects on the J-Drive will need to be analyzed and compared with digital objects in the Dark Archive to determine which objects should be migrated to the MOR and which are to be discarded as duplicates.
The majority of this effort has been completed. As content is migrated to the K-Drive for processing into the Master Objects Repository, it will be double-checked for any residual duplicates.
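For illustration only, the hash-sum and file-name reports described above could be approximated with a short script. This is a minimal sketch and not the tooling IS actually used: the choice of SHA-256, the function names, and the rule that files in the same folder sharing a base name are master/derivative candidates are all assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large masters need not fit in memory."""
    digest = hashlib.sha256()  # assumption: the plan does not name a hash algorithm
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_hash_groups(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents hash identically."""
    by_hash = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]


def same_stem_groups(root: Path) -> list[list[Path]]:
    """Return groups of files in one folder that share a base name but differ
    in extension (e.g. 001.TIFF and 001.jpg), as derivative candidates."""
    by_stem = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_stem[(path.parent, path.stem.lower())].append(path)
    return [
        paths
        for paths in by_stem.values()
        if len({p.suffix.lower() for p in paths}) > 1
    ]
```

Either function only produces candidate groups; as described above, the decision to keep or delete a file stays with the curatorial staff.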
Formats/Collections Analysis
The DRA, in collaboration with IS, will develop reports on the number of files by type and by collection. This analysis will quantify the scope of the project and be used to assist in identifying migration priorities.
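The counts described above could be produced roughly as follows. This is a hedged sketch rather than the actual IS report: it assumes each immediate subfolder of the scanned folder is one collection and uses the file extension as a stand-in for file type. The function names are hypothetical.

```python
import csv
import io
from collections import Counter
from pathlib import Path


def count_by_collection_and_type(root: Path) -> Counter:
    """Count files per (collection, extension) pair.

    Assumption: each immediate subfolder of root is one collection.
    """
    counts = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        if len(relative.parts) < 2:
            continue  # skip loose files that sit outside any collection folder
        collection = relative.parts[0]
        extension = path.suffix.lower() or "(none)"
        counts[(collection, extension)] += 1
    return counts


def to_csv(counts: Counter) -> str:
    """Render the counts as CSV text, sorted by collection then extension."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["collection", "extension", "file_count"])
    for (collection, extension), total in sorted(counts.items()):
        writer.writerow([collection, extension, total])
    return buffer.getvalue()
```

Extensions are lower-cased so that, for example, `.TIF` and `.tif` are counted together. A real format analysis would likely rely on format identification rather than extensions alone.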
MOM Formats and Collections Scope
The DRA will develop a Collections Analysis Template that will allow curators to examine their collections while identifying key information to facilitate the migration and disposition of content.
Establishment of metadata requirements for MOs to be migrated
The Metadata Objects Work Group (MDOWG) has been working towards establishing minimum metadata guidelines for placing objects in the MOR. The current draft guidelines are for images and will need to be extended to include other format types and complex objects.
Identification of Master Objects (MO) to be migrated
Of the nearly 2,000,000 items in the Dark Archive (and digital objects stored on external media and potentially the J-Drive), not all are Master Objects that should be migrated to the MOR. Curators and Archivists will conduct this endeavor and will rely upon the outcomes of the "De-duplication of Dark Archive" efforts, the application of the definitions for "Master Objects", and the Format/Collection Analysis efforts to determine which objects will be migrated.
Migration
Prioritization
An initial analysis for prioritization of collections to be migrated was developed based upon:
completion of the de-duplication process
the file formats that the MOR can ingest
the format homogeneity of the collection
metadata and rights readiness of the collection
User access demands
Prioritization decisions were to be made by collection curators and archivists in consultation with the Strategic Digital Initiatives Work Group (SDIWG, which includes the Heads of Digital Initiatives, Digital Content Services, Preservation and Reformatting, and Application Development & Support), the Head of Special Collections Access & Description, and the Digital Resources Archivist.
That system did not work. In August 2017, the Digital Preservation Librarian was charged with developing a new prioritization metric that accounted for:
born digital vs. reformatted content
File type
Object complexity:
single
complex <10
complex >10
ordered complex
Metadata readiness
KB master
Security/restriction constraints
Special considerations
The results of that process can be found here:
Lists by Departments:
Rights analysis
To ensure proper citation and public availability, content rights statements MUST accompany all content. Rights statements should include information regarding the rights holder, access permissions, and special processing instructions (e.g., watermarked or not). This is a collaborative effort with the curatorial staff, Special Collections Access & Description and the Copyright Resource Center (CRC).
Metadata identification and/or creation
NO ITEMS WILL BE MIGRATED WITHOUT METADATA. Based upon the minimum metadata guidelines established by the MDOWG, as well as desired elements for particular collections, metadata will be created (or appropriate existing sources will be identified) for all objects to be migrated. This is a collaborative effort with the MDOWG, SCA&D and curatorial staff, in consultation with AD&S.
Migration Testing
Individual and batch migration workflows were tested with migration-ready content prior to taking the process live in 2016 and 2017. While processes were refined, issues arose with complex objects that had more than 10 components. Calls through Active Fedora bogged the system down, eventually prompting a 503 timeout error. Therefore, that content is inaccessible not only to our patrons but also to curators. We did successfully test batch ingest, ingest of simple complex objects including those with various file types, and ingest of complex objects that include public and suppressed files.
Migration of MOs
Non-mediated content: images, documents, audio/visual objects that can be individually and batch loaded to the MOR with little-to-no mediation from the curator. This is a collaborative effort with Digital Preservation (DP) and curatorial staff in consultation with Digital Initiatives (DI), Special Collections Access & Description (SCA&D) and Application Development & Support (AD&S).
Mediated content: images, documents, audio/visual objects that can be individually and batch loaded to the MOR but require significant mediation from the curator. This is a collaborative effort with Digital Preservation (DP) and curatorial staff in consultation with Digital Initiatives (DI), Special Collections Access & Description (SCA&D) and Application Development & Support (AD&S).
Disposition determinations of remaining content. Decisions will be made by the Digital Preservation Librarian, Head of Digital Initiatives, Head of Preservation and Reformatting, Head of Publishing and Repository Services and appropriate curatorial staff.
Migration will be tracked through JIRA tickets. The JIRA Ticket Number can be found in the priority lists:
Lists by Departments:
Below is an example of the MOM Kanban Board that will be used for tracking. In this example we see two existing projects that have been moved from "To Do" to "In Progress". Projects will appear in the "Pending" column when Digital Preservation is awaiting curatorial input or actual ingest completion prior to QC.
This example shows the detail of the "Ticket". Digital Preservation (DP) will track activity here. When a Ticket is moved from "To Do" to "In Progress", DP will loop the curatorial staff in so that they can view the Ticket's information.
Post Dark Archive Migration Clean-up
Time-frame: TBD
Disposal of derivatives
Disposal of excess copies found in J-Drive and External Media
Determination of disposition for other non-Master Objects
Documentation
In addition to this project plan, documentation will be maintained in BuckeyeBox at: https://osu.box.com/MOM.