Caltech Library logo

NAME

ep3util

SYNOPSIS

ep3util OPTIONS ACTION [ACTION_PARAMETERS …]

DESCRIPTION

ep3util provides a quick wrapper around EPrints 3.3 REST API. By default ep3util looks for five environment variables.

REPO_ID
the EPrints repository id (name of database and archive subdirectory).
EPRINT_HOST
the hostname for EPrint’s.
EPRINT_USER
the username having permissions to access the EPrint REST API.
EPRINT_PASSWORD
the password for the username with access to the EPrint REST API.
C_NAME
If harvesting the dataset collection name to harvest the records to.
EPRINT_DB_HOST
The MySQL hostname holding the EPrints repository database
EPRINT_DB_USER
The MySQL username used to access EPrints repository database
EPRINT_DB_PASSWORD
The MySQL password used to access EPrints repository database

The environment provides the default values for configuration. They maybe overwritten by using a JSON configuration file. The corresponding attributes are “repo_id”, “eprint_host”, “c_name”, “eprint_db_host”, “eprint_db_user”, and “eprint_db_password”.

If the environment variables for MySQL access are set then the results reflect direct access to the database instead of the EPrint REST API.

OPTIONS

help
display help
license
display license
version
display version
config
provide a path to an alternate configuration file (e.g. “irdmtools.json”)

ACTION

ep3util supports the following actions.

setup
Display an example JSON setup configuration file, if it already exists then it will display the current configuration file. No optional or required parameters. When displaying the JSON configuration a placeholder will be used for the token value.
get_all_ids
Returns a list of all repository record ids. The method uses OAI-PMH for id retrieval. It is rate limited and will take come time to return all record ids. A test instance took 11 minutes to retrieve 24000 record ids.
get_modified_ids START [END]
Return a list of records created or modified in the START and END date range. If END is not provided it is assume to be today.
get_record RECORD_ID
Returns a specific simplified record indicated by RECORD_ID, e.g. 23808. The RECORD_ID is a required parameter.
harvest HARVEST_OPTIONS [KEY_LIST_JSON]
harvest takes a JSON file containing a list of keys and harvests each record into a dataset collection. If combined with one of the options, e.g. -all, you can skip providing the KEY_LIST_JSON file.

HARVEST_OPTIONS

-all
Harvest all records
-modified START [END]
Harvest records modified between start and end dates.
-as-citations
This harvests the record into a minimal citation form similar to citeproc

ACTION_PARAMETERS

Action parameters are the specific optional or required parameters need to complete an aciton.

EXAMPLES

Setup for ep3util by writing an example JSON configuration file. “nano” is an example text editor program, you need to edit the sample configuration appropriately.

ep3util setup >eprinttools.json
nano eprinttools.json

Get a list of all EPrint record ids.

ep3util get_all_ids

Get a specific EPrint record. Record is validated against irdmtool EPrints data model.

ep3util get_record 23808

Harvest all records

ep3util harvest -all

Harvest records created or modified in the month of September, 2023.

ep3util harvest -modified 2023-09-01 2023-09-30