R. S. Doiel, rsdoiel@caltech.edu
Caltech Library, Digital Library Development
April 2026
Rather than modifying repository software or writing plugins, we enhance capabilities by:
COLD is an example of this approach.
Caltech Library maintains authoritative lists of:
These records need to be consistent across multiple systems.
A web application for curating metadata objects and sharing them across library systems.
Used for:
Retrieves data:
COLD ←── RDM: review queue snapshots pulled in for staff search
COLD ←── Caltech dir: directory sync keeps people records current
COLD ←── ROR: funder data updated as ROR releases new dumps
Provides data:
COLD ──→ InvenioRDM: controlled vocabularies (people, groups, journals, thesis options)
COLD ──→ Feeds: people and group metadata for public-facing library pages
COLD ──→ RDM reports: author reconciliation and review queue data
No plugins. No modifications to repository software. Just APIs exchanging JSON.
COLD runs as three independent web services:
Access is controlled by Apache + Shibboleth in front — library staff log in with their campus single sign-on.
| Collection | Purpose |
|---|---|
| people.ds | Caltech researchers and staff (ORCID, ISNI, VIAF, group memberships) |
| groups.ds | Departments, labs, centers, research groups |
| funders.ds | Funding organizations with ROR identifiers |
| journals.ds | Preferred journal titles and ISSNs |
| subjects.ds | Subject classifications |
| thesis_options.ds | Degree program options |
| ror.ds | Local copy of Research Organization Registry |
| rdm_review_queue.ds | Snapshot of InvenioRDM submission review queue |
A person record in COLD holds:
This single authoritative record drives the people vocabulary in RDM, the author pages in Feeds, NSF collaborator reports, and directory sync.
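As a rough illustration of the shape such a record might take, the sketch below models a person as a Python dataclass and serializes it to JSON. The identifier types come from the people.ds description; the field names (`family_name`, `groups`, and so on) and the sample values are assumptions made for illustration, not COLD's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Person:
    """Hypothetical shape of a COLD person record (field names are assumed)."""

    clpid: str  # local person identifier
    family_name: str
    given_name: str
    orcid: str = ""
    isni: str = ""
    viaf: str = ""
    groups: list[str] = field(default_factory=list)  # group memberships

    def to_json(self) -> str:
        """Serialize the record as JSON for exchange with other services."""
        return json.dumps(asdict(self), sort_keys=True)


# Made-up sample values; the ORCID is a placeholder, not a real identifier.
person = Person(
    clpid="Doe-J",
    family_name="Doe",
    given_name="Jane",
    orcid="0000-0000-0000-0000",
    groups=["Example-Research-Group"],
)
record = json.loads(person.to_json())
```

Keeping every identifier on one record is what lets a single edit feed several downstream consumers.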
The Research Organization Registry (ROR) is the emerging standard for funder and organization identifiers.
COLD keeps a local copy of ROR in ror.ds.
InvenioRDM’s submission review queue is now searchable inside COLD.
This is the “develop at the edges” approach applied to an existing repository system.
| Report | What it produces |
|---|---|
| People CSV | Full CaltechPEOPLE export — usable in LibreOffice, Excel, or Jupyter |
| Groups CSV | Full CaltechGROUPS export |
| Division People CSV | People listed by Caltech directory division |
| People Membership CSV | One row per person-per-group membership |
| Division People Crosswalk | People crosswalk restricted to Division-level groups |
| Group People Crosswalk | People and all group affiliations |
| People Identifier CSV | All external identifiers per person — datestamped snapshot |
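The "one row per person-per-group membership" layout of the People Membership CSV can be sketched as a simple flattening step. This is a minimal illustration using Python's standard `csv` module; the column names and sample records are assumptions, not the report's actual output format.

```python
import csv
import io


def membership_rows(people):
    """Yield one row for every group a person belongs to."""
    for person in people:
        for group in person.get("groups", []):
            yield {"clpid": person["clpid"], "group": group}


def membership_csv(people) -> str:
    """Render the flattened memberships as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["clpid", "group"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(membership_rows(people))
    return buffer.getvalue()


# Made-up sample data; a person with no groups contributes no rows.
people = [
    {"clpid": "Doe-J", "groups": ["Group-A", "Group-B"]},
    {"clpid": "Roe-R", "groups": []},
    {"clpid": "Poe-E", "groups": ["Group-A"]},
]
report = membership_csv(people)
```

A long format like this is convenient for pivoting or joining in a spreadsheet or notebook.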
| Report | What it produces |
|---|---|
| Author’s Records CSV | All records from RDM request, drafts, and records metadata |
| Authors’ Review Queue CSV | Current items in the review queue for author reconciliation |
These reports support the ongoing work of matching RDM submission authors against COLD’s authoritative people records.
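One plausible way to approach this kind of matching is sketched below: try an exact ORCID match first, then fall back to a normalized name comparison, and leave anything ambiguous for staff review. The matching rules, field names, and sample data are assumptions for illustration; the source does not describe how COLD's reconciliation reports work internally.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase and strip accents and punctuation so name variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in unaccented)
    return " ".join(kept.lower().split())


def reconcile(author: dict, people: list[dict]):
    """Return (clpid, reason) for a confident match, else (None, "unmatched")."""
    orcid = author.get("orcid")
    if orcid:
        for person in people:
            if person.get("orcid") == orcid:
                return person["clpid"], "orcid"
    target = normalize_name(author.get("name", ""))
    candidates = [p for p in people if normalize_name(p["name"]) == target]
    # Only accept a name match when it is unambiguous.
    if target and len(candidates) == 1:
        return candidates[0]["clpid"], "name"
    return None, "unmatched"


# Made-up records; the ORCID is a placeholder value.
people = [
    {"clpid": "Doe-J", "name": "Doe, Jane", "orcid": "0000-0000-0000-0001"},
    {"clpid": "Garcia-M", "name": "García, María"},
]
by_orcid = reconcile({"name": "J. Doe", "orcid": "0000-0000-0000-0001"}, people)
by_name = reconcile({"name": "Garcia, Maria"}, people)
no_match = reconcile({"name": "Someone Else"}, people)
```

Returning the reason alongside the match lets a reviewer see which rule fired before accepting it.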
| Report | Output | Purpose |
|---|---|---|
| Journals Vocabulary | journal_vocabulary.yaml | Preferred journal names loaded into RDM |
| Groups Vocabulary | group_vocabulary.yaml | Caltech groups list loaded into RDM |
| People Vocabulary | people_vocabulary.yaml | Authors vocabulary (with ORCID and affiliation) |
| Thesis Options Vocabulary | thesis_option_vocabulary.yaml | Degree options for thesis submissions |
Staff run the reports. The resulting YAML files can be installed and consumed by InvenioRDM as controlled vocabularies. COLD becomes the single place to update a name or identifier, and the library doesn’t require a developer to curate an RDM vocabulary.
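To make the vocabulary step concrete, here is a minimal sketch of turning people records into a YAML list. The entry layout (`id`, `family_name`, `given_name`, `identifiers`, `affiliations`) is an assumption modeled loosely on a names-style vocabulary, and the input field names are hypothetical; check the InvenioRDM documentation for the exact format it expects. Values are quoted with `json.dumps`, since a JSON string is also a valid YAML double-quoted scalar, which avoids needing a third-party YAML library.

```python
import json


def quote(value: str) -> str:
    """Quote a scalar; JSON strings are valid YAML double-quoted scalars."""
    return json.dumps(value, ensure_ascii=False)


def people_vocabulary(people: list[dict]) -> str:
    """Build YAML text with one list entry per person, sorted by identifier."""
    lines = []
    for person in sorted(people, key=lambda p: p["clpid"]):
        lines.append(f"- id: {quote(person['clpid'])}")
        lines.append(f"  family_name: {quote(person['family_name'])}")
        lines.append(f"  given_name: {quote(person['given_name'])}")
        if person.get("orcid"):
            lines.append("  identifiers:")
            lines.append("    - scheme: orcid")
            lines.append(f"      identifier: {quote(person['orcid'])}")
        if person.get("affiliation"):
            lines.append("  affiliations:")
            lines.append(f"    - name: {quote(person['affiliation'])}")
    return "".join(line + "\n" for line in lines)


# Made-up sample records; the ORCID is a placeholder value.
people = [
    {"clpid": "Roe-R", "family_name": "Roe", "given_name": "Richard"},
    {
        "clpid": "Doe-J",
        "family_name": "Doe",
        "given_name": "Jane",
        "orcid": "0000-0000-0000-0001",
        "affiliation": "Caltech",
    },
]
vocabulary = people_vocabulary(people)
```

Sorting the entries keeps the generated file stable between runs, so differences reflect real curation changes.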
The NSF Collaborator Report is a mediated, parameterized report:
Parameter: clpid
Output: <clpid>_nsf_collaborator_report.csv
A report that used to require manual spreadsheet assembly is now a self-service staff request.
Staff requests report in browser
→ request queued in reports.ds
→ cold_reports picks it up (FIFO, 10-second poll)
→ runs the configured script
→ output saved to the web server
→ staff notified by email with a download link
Adding a new report requires writing a script and one YAML configuration entry — no changes to the middleware.
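The queue-and-runner flow above can be sketched roughly as follows. This is an illustrative model only: the in-memory `deque` stands in for reports.ds, the `REPORTS` dictionary stands in for the YAML configuration, and the function names and request fields are hypothetical. Saving the output to the web server and emailing the requester are left out.

```python
import subprocess
import sys
import time
from collections import deque

# Hypothetical registry mapping a report name to the command that produces it.
REPORTS = {
    "hello_report": {"cmd": [sys.executable, "-c", "print('name,count')"]},
}


def run_next(queue: deque, reports: dict, runner=subprocess.run):
    """Run the oldest queued request (FIFO). Returns its output, or None if idle."""
    if not queue:
        return None
    request = queue.popleft()
    config = reports.get(request["report"])
    if config is None:
        raise KeyError(f"no report configured as {request['report']!r}")
    completed = runner(config["cmd"], capture_output=True, text=True, check=True)
    return completed.stdout


def serve(queue: deque, reports: dict, poll_seconds: float = 10.0, max_polls=None):
    """Poll on a fixed interval; max_polls exists only to bound demonstrations."""
    polls = 0
    while max_polls is None or polls < max_polls:
        if run_next(queue, reports) is None:
            time.sleep(poll_seconds)  # nothing queued, wait before checking again
        polls += 1


# A staff request as it might appear in the queue (fields are assumed).
queue = deque([{"report": "hello_report", "requested_by": "staff@example.edu"}])
output = run_next(queue, REPORTS)
```

In this model, adding a report means adding one entry to the registry plus the script it points to; the runner itself does not change.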
Staff curates a person record in COLD
→ People vocabulary updated → RDM gets the authoritative author list
→ Feeds updated → public-facing researcher pages stay current
→ Collaborator report reflects the latest publication data
→ People CSV available for any downstream analysis
Changes made in one place propagate outward. Systems stay in sync without manual reconciliation across spreadsheets.