Quarterly Harvests to DPLA

Four times a year, metadata from our Repox instance is harvested by DPLA. The status of these ingests is tracked in the DPLA wiki.

Check and Harvest Before Ingests

Before each ingest, we should at a minimum test every set and transform that was touched since the last harvest. The easiest way to do this is to run DLTN check and harvest against the MODS metadata prefix in Repox. With check and harvest, you can test an entire provider or go set by set. Check and harvest only tests whether every record has a title, a data provider, a rights statement, a thumbnail, and a link to the object, so other problems can still slip through.
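
The five checks described above can be sketched in a few lines of Python. This is only an illustration, not the DLTN check and harvest implementation: the XPath expressions in REQUIRED_FIELDS are assumptions about how a MODS record might express each field, and missing_fields and SAMPLE are hypothetical names made up for this example.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Assumed XPaths for each required field; adjust them to your MODS profile.
REQUIRED_FIELDS = {
    "title": ".//mods:titleInfo/mods:title",
    "data provider": ".//mods:recordInfo/mods:recordContentSource",
    "rights statement": ".//mods:accessCondition",
    "thumbnail": ".//mods:location/mods:url[@access='preview']",
    "link to the object": ".//mods:location/mods:url[@access='object in context']",
}


def missing_fields(record_xml):
    """Return the names of required fields that have no non-empty element."""
    root = ET.fromstring(record_xml)
    missing = []
    for name, path in REQUIRED_FIELDS.items():
        elements = root.findall(path, NS)
        if not any((el.text or "").strip() for el in elements):
            missing.append(name)
    return missing


# Hypothetical record with no rights statement and no thumbnail.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Example title</mods:title></mods:titleInfo>
  <mods:recordInfo>
    <mods:recordContentSource>Example Library</mods:recordContentSource>
  </mods:recordInfo>
  <mods:location>
    <mods:url access="object in context">http://example.org/object/1</mods:url>
  </mods:location>
</mods:mods>"""

print(missing_fields(SAMPLE))
```

For the sample record, the function reports the rights statement and the thumbnail as missing. A record that passes all five checks returns an empty list, which still says nothing about whether the values themselves are correct.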

Additional Testing

Additional Checking with DLTN Metadata QA and variety

An easy way to look for other problems is to harvest a set into DLTN metadata QA and look at the output in variety. Odd namespaces and other unexpected output will often show up here.

Unit Tests in DLTN XSLT

We try to add unit tests for every provider and transform in DLTN XSLT. These unit tests should check whether the first OAI-PMH response of a given transform results in a valid, well-formed document. They don’t test for bad data in individual nodes.
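
As a rough illustration of what such a test can look like, the sketch below uses only the Python standard library. It is not taken from DLTN XSLT: is_well_formed and TransformOutputTest are hypothetical names, and the transformed string is a placeholder standing in for the result of applying a transform to the first OAI-PMH response of a set. It checks well-formedness only; checking validity against a schema would need a validating parser, which is not shown.

```python
import unittest
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


class TransformOutputTest(unittest.TestCase):
    # Placeholder for real transform output (an assumption for this sketch).
    transformed = (
        "<mods xmlns='http://www.loc.gov/mods/v3'>"
        "<titleInfo><title>Example</title></titleInfo>"
        "</mods>"
    )

    def test_output_is_well_formed(self):
        self.assertTrue(is_well_formed(self.transformed))

    def test_broken_output_is_rejected(self):
        # The titleInfo element is never closed, so parsing must fail.
        self.assertFalse(is_well_formed("<mods><titleInfo></mods>"))
```

Saved to a file, the tests can be run with python -m unittest followed by the file name.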

Generating Reports

To generate reports, a few things need to happen. In the DLTN XSLT tracker, each issue completed during the quarter should have:

  1. an associated milestone

  2. an associated assignee

  3. a closed status
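
The three requirements above can be checked before a report is run. The sketch below is an assumption-laden illustration rather than part of issue_reports.py: report_problems is a hypothetical helper, and it expects each issue as a dict shaped like a GitHub REST API issue object, with milestone, assignee, and state keys. The sample issues are invented.

```python
def report_problems(issue):
    """List which of the three requirements an issue fails to meet."""
    problems = []
    if issue.get("milestone") is None:
        problems.append("no milestone")
    if issue.get("assignee") is None and not issue.get("assignees"):
        problems.append("no assignee")
    if issue.get("state") != "closed":
        problems.append("not closed")
    return problems


# Invented sample data in the shape of GitHub issue objects.
issues = [
    {
        "number": 1,
        "state": "closed",
        "milestone": {"title": "my_github_milestone"},
        "assignee": {"login": "example-user"},
    },
    {"number": 2, "state": "open", "milestone": None, "assignee": None},
]

for issue in issues:
    print(issue["number"], report_problems(issue))
```

The first sample issue meets all three requirements and yields an empty list. The second fails all three.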

Then, quarterly reports can be generated by running issue_reports.py in dltn technical docs.

Before running it, you’ll need a completed config.yml containing a GitHub user, a GitHub password, and a GitHub access token.

You can then generate a report with:

python generate/issue_reports.py -m my_github_milestone