Robert S. Doiel, Thomas Morrell
Caltech Library, Pasadena, California
Caltech Library has a heterogeneous mix of repository systems (e.g. EPrints hosts CaltechAUTHORS and CaltechTHESIS, while CaltechDATA is based on Invenio). Caltech Library has shifted its focus from developing inside a specific repository system to developing at the edges, leveraging web APIs. This has allowed us not only to repurpose content but also to begin collection-level curation by integrating external data sources such as ORCID, CrossRef, FundRef and DataCite. The philosophy that has evolved is to work from copies of the data in JSON form, using dataset, an open source tool created by Caltech Library, along with additional open source tools in a project called datatools. These command line tools are written in Go but can easily be used from more popular languages such as Python. This talk will introduce these tools and demonstrate their use from Python.
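To make the "work from copies of the data in JSON form" idea concrete, here is a minimal sketch in Python. It uses only the standard library as a stand-in and is not the dataset or datatools interface. The function names (save_record, load_record, merge_external, demo), the on-disk layout, the "external" field, and the sample record are all assumptions made up for illustration.

```python
import json
import tempfile
from pathlib import Path
from urllib.parse import quote


def save_record(collection: Path, key: str, record: dict) -> Path:
    """Write one JSON copy of a repository record into a local folder."""
    collection.mkdir(parents=True, exist_ok=True)
    # Percent-encode the key so identifiers such as DOIs are safe file names.
    path = collection / (quote(key, safe="") + ".json")
    path.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return path


def load_record(collection: Path, key: str) -> dict:
    """Read a previously saved JSON copy back into a dict."""
    path = collection / (quote(key, safe="") + ".json")
    return json.loads(path.read_text(encoding="utf-8"))


def merge_external(record: dict, source: str, metadata: dict) -> dict:
    """Return a copy of the record with metadata from an external source
    attached under a hypothetical "external" field. The input is not modified."""
    enriched = dict(record)
    external = dict(enriched.get("external", {}))
    external[source] = metadata
    enriched["external"] = external
    return enriched


def demo() -> dict:
    """Round-trip a made-up record through a temporary local collection."""
    with tempfile.TemporaryDirectory() as tmp:
        collection = Path(tmp) / "authors_copy"
        key = "10.1000/example"  # hypothetical identifier, not a real record
        save_record(collection, key, {"title": "An example article"})
        # In practice this dict would come from a web API response;
        # here it is hard-coded so the sketch runs offline.
        crossref_like = {"publisher": "Example Press"}
        enriched = merge_external(load_record(collection, key), "crossref", crossref_like)
        save_record(collection, key, enriched)
        return load_record(collection, key)


if __name__ == "__main__":
    print(json.dumps(demo(), indent=2))
```

Because the local copy is kept separate from the repository system, the enrichment step can be rerun or discarded without touching EPrints or Invenio.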