eprint2rdm is a Caltech Library oriented command line application that takes an EPrints hostname and EPrint ID and returns a JSON document suitable for import into Invenio RDM. It relies on access to the EPrints REST API, which it reaches using the EPRINT_USER, EPRINT_PASSWORD and EPRINT_HOST environment variables. Using the “-all-ids” option you can get a list of the keys (EPrint IDs) available from the EPrints REST API.
eprint2rdm can harvest a set of EPrint IDs into a dataset collection using the “-id-list” and “-harvest” options. You may also provide customized resource type and person role mappings for the content you harvest. This brings the harvested records substantially closer to the final form needed to crosswalk EPrints data into Invenio RDM.
Environment variables can be set at the shell level or in a “.env” file.
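A minimal sketch of setting the three variables in a POSIX shell before running eprint2rdm. The username and password values are placeholders. A “.env” file would hold the same NAME=value lines without the `export` keyword; that eprint2rdm reads exactly this KEY=value layout is an assumption based on the common .env convention.

```shell
# Placeholder credentials: replace with your own EPrints REST API account.
export EPRINT_USER="your-username"
export EPRINT_PASSWORD="your-password"
# Repository hostname, e.g. eprints.example.edu as used in the examples below.
export EPRINT_HOST="eprints.example.edu"
```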
Example generating a JSON document from the EPrints repository hosted as “eprints.example.edu” for EPrint ID 118621. Access to the EPrints REST API is configured by setting EPRINT_USER, EPRINT_PASSWORD and EPRINT_HOST (e.g. eprints.example.edu) in the shell environment. The result is saved in “article.json”.
eprint2rdm 118621 >article.json
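Because this command depends on the three environment variables, a small preflight check can make a missing value obvious before the REST API is called. This is a sketch for a POSIX shell; `require_eprint_env` is a hypothetical helper, not part of eprint2rdm.

```shell
# Hypothetical helper (not part of eprint2rdm): return non-zero and report
# which EPrints API variables are unset or empty.
require_eprint_env() {
  missing=""
  [ -n "${EPRINT_USER:-}" ] || missing="$missing EPRINT_USER"
  [ -n "${EPRINT_PASSWORD:-}" ] || missing="$missing EPRINT_PASSWORD"
  [ -n "${EPRINT_HOST:-}" ] || missing="$missing EPRINT_HOST"
  if [ -n "$missing" ]; then
    echo "missing environment variables:$missing" >&2
    return 1
  fi
  return 0
}
```

Usage: `require_eprint_env && eprint2rdm 118621 >article.json` only writes “article.json” when all three variables are set.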
Generate a list of EPrint ids from a repository
eprint2rdm -all-ids >eprintids.txt
Generate a JSON document from the EPrints repository hosted as “eprints.example.edu” for EPrint ID 118621, using a resource map file to map EPrints resource types to Invenio RDM resource types and a contributor map file to map EPrints contributor types to their RDM equivalents.
eprint2rdm -resource-map resource_types.csv \
-contributor-map contributor_types.csv \
eprints.example.edu 118621
Putting it all together to harvest an EPrints repository, saving the results in a dataset collection for analysis or migration.
dataset init eprints.ds
eprint2rdm -all-ids >eprintids.txt
eprint2rdm -id-list eprintids.txt -harvest eprints.ds
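The three harvest steps could be wrapped in a shell function so that an empty ID list stops the run before harvesting begins. This is a sketch under stated assumptions: `harvest_repository` is a hypothetical name, the ID file name is taken from the example, and it assumes an initialized dataset collection exists on disk as a directory named after the collection.

```shell
# Hypothetical wrapper around the harvest steps; not part of eprint2rdm.
# Usage: harvest_repository COLLECTION   (e.g. harvest_repository eprints.ds)
harvest_repository() {
  collection="$1"
  ids_file="eprintids.txt"

  # Assumption: an initialized collection is a directory, so skip
  # "dataset init" when re-running against the same collection.
  if [ ! -d "$collection" ]; then
    dataset init "$collection" || return 1
  fi

  # Fetch the EPrint IDs reported by the REST API (EPRINT_* taken from the environment).
  eprint2rdm -all-ids >"$ids_file" || return 1

  # An empty list may indicate bad credentials or the wrong EPRINT_HOST.
  if [ ! -s "$ids_file" ]; then
    echo "no EPrint IDs retrieved from ${EPRINT_HOST:-unset host}" >&2
    return 1
  fi

  # Harvest the listed records into the collection.
  eprint2rdm -id-list "$ids_file" -harvest "$collection"
}
```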
At this point you would be ready to improve the records in eprints.ds before migrating them into Invenio RDM.