





ep3harvester is a command line program for metadata harvesting of EPrints repositories.

ep3harvester takes a JSON settings file and harvests all the EPrint repositories defined in it into a JSON store implemented in MySQL 8, with one MySQL 8 table per repository.

Each table has the columns id, src (holding the harvested JSON document in a MySQL JSON column) and updated (holding the timestamp of when the metadata was harvested).
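To make the one-table-per-repository layout concrete, here is a minimal Python sketch of such a table and of turning a harvested record into an (id, src, updated) row. The column types (INT, JSON, TIMESTAMP), the use of the record's eprintid as the id, and the helper names create_table_sql and make_row are assumptions for illustration only; the real DDL is whatever ep3harvester's “-sql-schema” option prints.

```python
import json
import re
from datetime import datetime

# Hypothetical sketch: the column types and the id-from-eprintid mapping are
# assumptions, not ep3harvester's actual schema (see its -sql-schema output).
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def create_table_sql(repo_id: str) -> str:
    """Return a CREATE TABLE statement for one repository's table."""
    # The repository id becomes the table name, so only allow plain identifiers.
    if not SAFE_NAME.fullmatch(repo_id):
        raise ValueError(f"unsafe repository id for a table name: {repo_id!r}")
    return (
        f"CREATE TABLE IF NOT EXISTS {repo_id} ("
        "id INT PRIMARY KEY, "
        "src JSON, "
        "updated TIMESTAMP)"
    )


def make_row(eprint: dict, harvested_at: datetime) -> tuple:
    """Map one harvested EPrint record to an (id, src, updated) row."""
    return (
        int(eprint["eprintid"]),                     # id: the record's eprintid
        json.dumps(eprint),                          # src: the full JSON document
        harvested_at.strftime("%Y-%m-%d %H:%M:%S"),  # updated: when it was harvested
    )


print(create_table_sql("caltechauthors"))
```

Keying each row by the record id means a later harvest of the same record can overwrite its row, while updated records when that copy was taken.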


ep3harvester can generate an example settings JSON document, which you can then edit with any plain text editor (e.g. nano). You will then need to set up a MySQL 8 database and tables to store the harvested data in.

ep3harvester uses a MySQL 8 database as a JSON document store, with one table per EPrint repository. The “-sql-schema” option generates an SQL program that creates the MySQL database and tables described in your settings JSON file. This option requires the settings JSON filename as a parameter. For example:

    ep3harvester -init harvester-settings.json
    nano harvester-settings.json
    ep3harvester -sql-schema harvester-settings.json >collections.sql
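Because the settings file is edited by hand, a stray comma or missing brace is easy to introduce. A minimal Python check such as the one below can confirm the file still parses before it is passed to ep3harvester. The script and the check_settings name are hypothetical helpers, not part of ep3harvester, and the requirement that the top level be a JSON object is an assumption about the settings format.

```python
import json
import sys


def check_settings(path: str) -> dict:
    """Load a hand-edited settings file, failing with a readable message if the JSON is broken."""
    with open(path, encoding="utf-8") as f:
        try:
            settings = json.load(f)
        except json.JSONDecodeError as err:
            raise SystemExit(
                f"{path}: invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
            )
    # Assumption: the settings document is a JSON object at the top level.
    if not isinstance(settings, dict):
        raise SystemExit(f"{path}: expected a JSON object at the top level")
    return settings


if __name__ == "__main__" and len(sys.argv) > 1:
    check_settings(sys.argv[1])
    print(f"{sys.argv[1]}: settings parse OK")
```

Saved as, say, check_settings.py, it can be run as python3 check_settings.py harvester-settings.json right after editing.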


display help
display version
display license
harvest groups from the CSV files included in the configuration
-init
generate a settings JSON file
harvest the eprintids listed in the named file, one id per line
harvest people from the CSV files included in the configuration
harvest people and groups from the CSV files included in the configuration
-repo string
harvest a single repository, identified by its repository id in the configuration
crosswalk the harvested EPrint record to the simplified record model before saving the JSON to the SQL database
-sql-schema
display the SQL schema for installing the MySQL JSON store database
use verbose logging
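The option that harvests eprintids from a file expects one id per line. A minimal Python sketch for preparing such a file follows; write_id_file, read_id_file and the ids.txt name are hypothetical, and dropping duplicates and blank lines is a choice of this sketch rather than documented ep3harvester behavior.

```python
from pathlib import Path


def write_id_file(path: str, eprintids) -> int:
    """Write eprintids one per line (order kept, duplicates dropped); return the count."""
    ids = [int(e) for e in eprintids]  # EPrints ids are positive integers
    bad = [e for e in ids if e <= 0]
    if bad:
        raise ValueError(f"invalid eprintids: {bad}")
    unique = list(dict.fromkeys(ids))  # dict keys preserve insertion order
    Path(path).write_text("".join(f"{e}\n" for e in unique), encoding="utf-8")
    return len(unique)


def read_id_file(path: str) -> list[int]:
    """Read an id file back, ignoring blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [int(line) for line in lines if line.strip()]


count = write_id_file("ids.txt", [3, "7", 3, 12])
print(count, read_id_file("ids.txt"))
```

Writing each id with a trailing newline keeps the file friendly to line-oriented tools such as wc -l and sort.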


Harvesting all repositories for the month of May 2022:

    ep3harvester harvester-settings.json \
        "2022-05-01 00:00:00" "2022-05-31 23:59:59"

Harvesting only the caltechauthors repository, using harvester-settings.json, for the month of May 2022:

    ep3harvester -repo caltechauthors harvester-settings.json \
        "2022-05-01 00:00:00" "2022-05-31 23:59:59"
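Working out the last day of a month and a valid end-of-day time by hand is error-prone. The small Python helper below (hypothetical, not part of ep3harvester) produces the two timestamps, assuming, as in the examples above, that the range is given as start and end timestamps in YYYY-MM-DD HH:MM:SS form and that the end is inclusive.

```python
import calendar
from datetime import date


def month_range(year: int, month: int) -> tuple[str, str]:
    """Return (start, end) timestamps covering one calendar month, inclusive."""
    days_in_month = calendar.monthrange(year, month)[1]  # handles leap years
    start = date(year, month, 1).strftime("%Y-%m-%d 00:00:00")
    end = date(year, month, days_in_month).strftime("%Y-%m-%d 23:59:59")
    return start, end


start, end = month_range(2022, 5)
print(f'ep3harvester harvester-settings.json "{start}" "{end}"')
```

For May 2022 this prints ep3harvester harvester-settings.json "2022-05-01 00:00:00" "2022-05-31 23:59:59", and month_range(2024, 2) correctly ends on February 29.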

ep3harvester 1.2.4