Building software at the edges of heterogeneous repositories

Robert S. Doiel, Thomas Morrell

Caltech Library, Pasadena, California

Caltech Library has a heterogeneous mix of repository systems (e.g., EPrints hosts CaltechAUTHORS and CaltechTHESIS, while CaltechDATA is based on Invenio). Caltech Library has shifted its focus from developing inside each specific repository system to developing at the edges, leveraging web APIs. This has allowed us not only to repurpose content but also to begin collection-level curation by integrating external data sources such as ORCID, CrossRef, FundRef and DataCite. The philosophy we have evolved is to work from copies of the data in JSON form, using dataset, an Open Source tool created by Caltech Library, along with additional Open Source tools in a project called datatools. These command line tools are written in Go but can easily be used from more popular languages like Python. This talk will introduce these tools and demonstrate their use from Python.
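A common way to drive a Go command line tool from Python is to run it as a subprocess and decode the JSON it writes to standard output. The following is a minimal sketch of that pattern using only Python's standard library. The helper name `run_json_cli` is hypothetical, and the example runs a stand-in child Python process rather than dataset or datatools, because their exact subcommands and flags are not described here; consult the dataset and datatools documentation for the real invocations.

```python
import json
import subprocess
import sys
from typing import Any, Sequence


def run_json_cli(args: Sequence[str]) -> Any:
    """Hypothetical helper: run a command line tool and decode its stdout as JSON.

    Raises subprocess.CalledProcessError if the tool exits with a non-zero
    status, and json.JSONDecodeError if its output is not valid JSON.
    """
    result = subprocess.run(
        list(args),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output bytes to str
        check=True,           # raise if the tool reports failure
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Stand-in for a Go tool: a child Python process that prints one JSON record.
    # With the real tools the argument list would start with the tool's name
    # (e.g. "dataset"); its subcommands are intentionally not guessed here.
    record = run_json_cli([
        sys.executable,
        "-c",
        'import json; print(json.dumps({"id": "example-1", "title": "A thesis"}))',
    ])
    print(record["title"])
```

Because every tool exchanges plain JSON over standard input and output, the Python side stays a thin wrapper: it needs no bindings to Go code, and the same helper works for any command that emits JSON.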

Related Software Projects

Presentation

Narration

Bash version of live demo