There are two components to this service:
The Python 3 script, buildxml.py, connects to a specific ArchivesSpace (AS) repository via the AS API, scans the digital objects in the repository, finds the associated archival objects, and writes the OAI Static Repository XML file. The XML output contains Dublin Core (DC) records for digital resources. The XML static repository is based on the OAI Static Repository specification but does not adhere to it strictly. The static repository is the data source for the Open Archives Initiative (OAI) Data Provider.
The OAI Data Provider adheres to the OAI Protocol for Metadata Harvesting (OAI-PMH) and supports all six verbs (Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, and GetRecord), as well as resumption tokens and sets. Only DC metadata is provided. Sets correspond to the archival collections in the Caltech Archives.
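As an illustration of how a harvester might address the six verbs, the following minimal sketch builds OAI-PMH request URLs with the standard library. The base URL, set name, and identifier are placeholders invented for the example, not values taken from this service.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with the data provider's actual address.
BASE_URL = "https://example.org/oai"


def oai_request_url(verb, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    # Skip arguments that were not supplied (None).
    query.update({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}?{urlencode(query)}"


# "collection1" and the identifier below are made-up example values.
print(oai_request_url("Identify"))
print(oai_request_url("ListRecords", metadataPrefix="oai_dc", set="collection1"))
print(oai_request_url("GetRecord", metadataPrefix="oai_dc",
                      identifier="oai:example.org:123"))
```

Because `from` is a Python keyword, a date-range argument would have to be passed by dictionary unpacking, for example `oai_request_url("ListIdentifiers", metadataPrefix="oai_dc", **{"from": "2024-01-01"})`.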
Main features and assumptions:
The buildxml.py file is designed to be run from the command line, or from within your favorite editing environment. It uses standard Python libraries and has been tested using Python 3.12.
The ArchivesSnake client library is required to access the ArchivesSpace backend API. Install it with:
pip3 install ArchivesSnake
Other required packages are all standard Python.
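For orientation, here is a minimal sketch of how ArchivesSnake might be used to list the digital objects in a repository. It is not taken from buildxml.py: the helper names `make_client` and `digital_object_ids` are hypothetical, and the sketch assumes the standard ArchivesSpace endpoint for digital objects with the `all_ids` parameter.

```python
def make_client(baseurl, username, password):
    """Create and authorize an ArchivesSnake client.

    Requires `pip3 install ArchivesSnake`; the import is done lazily so
    the helper below can be exercised without the library installed.
    """
    from asnake.client import ASnakeClient

    client = ASnakeClient(baseurl=baseurl, username=username, password=password)
    client.authorize()
    return client


def digital_object_ids(client, repo_id):
    """Return the IDs of all digital objects in one repository.

    `client` is any object with a requests-style `get` method, such as
    the client returned by make_client().
    """
    response = client.get(
        f"repositories/{repo_id}/digital_objects",
        params={"all_ids": True},
    )
    response.raise_for_status()
    return response.json()
```

A call such as `digital_object_ids(make_client(url, user, password), 2)` would then return a list of integer IDs, where the URL, credentials, and repository number come from your own configuration.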
To generate the static XML repository, run the script from the directory that contains the defaults.py and secrets.py files:
python buildxml.py
defaults.py contains default values that identify the OAI URL, the AS repository number, the base URI for identifiers, and the public repository URL. secrets.py defines the data provider base URL and the API username and password.
The XML file will be written to ‘staticrepo.xml’ in the ‘xml’ directory:
../xml/staticrepo.xml
If duplicate URLs are found, they are written to ‘duplicates.txt’ and omitted from the static repository:
../xml/duplicates.txt
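The duplicate handling described above could be sketched roughly as follows. This is an assumption about the behavior, not the script's actual code: the function name `split_duplicates` and the (identifier, URL) record shape are hypothetical, and the sketch assumes the first record seen for a URL is kept while later ones are treated as duplicates.

```python
def split_duplicates(records):
    """Split (identifier, url) pairs into kept records and duplicates.

    Assumption: the first record seen for a URL is kept, and any later
    record with the same URL is reported as a duplicate and omitted.
    """
    seen = set()
    kept, duplicates = [], []
    for identifier, url in records:
        if url in seen:
            duplicates.append((identifier, url))
        else:
            seen.add(url)
            kept.append((identifier, url))
    return kept, duplicates


# Made-up example records.
records = [
    ("do-1", "https://example.org/a"),
    ("do-2", "https://example.org/b"),
    ("do-3", "https://example.org/a"),
]
kept, duplicates = split_duplicates(records)
print(kept)
print(duplicates)
```

The `duplicates` list is what would be written, one entry per line, to the duplicates file, while only `kept` would go into the XML output.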
There are options for running the script in dev or test mode. To see options:
python buildxml.py -h
usage: buildxml.py [-h] [-r RUNTYPE] [-n NUM_RECS]

options:
  -h, --help            show this help message and exit
  -r RUNTYPE, --runtype RUNTYPE
  -n NUM_RECS, --num_recs NUM_RECS
The default runtype is ‘production’, which includes all appropriate records in the repository. Any other value causes the script to run in dev/test mode, and the output files are written to the ‘dev’ folder. If no -n value is given, all records are processed. If a negative -n value is given, 1000 records are processed. Any other number sets the number of records to process.
../dev/staticrepo.xml
../dev/duplicates.txt
Running in dev/test mode does not affect the production XML output, which is written to the ‘xml’ folder.
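The option handling described above can be approximated with argparse as in the sketch below. The option names and defaults follow the help text, but the helper functions `record_limit` and `output_dir` are hypothetical and only illustrate the documented rules; the real script may be organized differently.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two documented options, -r/--runtype and -n/--num_recs."""
    parser = argparse.ArgumentParser(prog="buildxml.py")
    parser.add_argument("-r", "--runtype", default="production")
    parser.add_argument("-n", "--num_recs", type=int, default=None)
    return parser.parse_args(argv)


def record_limit(num_recs):
    """Translate the -n value into a record limit (None means all records)."""
    if num_recs is None:
        return None   # no -n given: process everything
    if num_recs < 0:
        return 1000   # negative value: fall back to 1000 records
    return num_recs   # any other number is used as given


def output_dir(runtype):
    """Production output goes to ../xml; any other runtype goes to ../dev."""
    return Path("..") / ("xml" if runtype == "production" else "dev")


args = parse_args(["-r", "test", "-n", "-1"])
print(record_limit(args.num_recs), output_dir(args.runtype).as_posix())
```

Keeping the limit and the output folder in small pure functions makes the dev/test rules easy to check without running a full build.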
The OAI Data Provider is a web application written in Python 3 using the Flask micro web framework. Installing Flask also installs its dependencies, such as Jinja2 and Werkzeug. No additional libraries are required.
The OAI Data Provider functionality is provided by oaidp.py. Additional functions are imported from aspace.py.
An SQLite3 database stores a log of OAI requests, information about collections (rewritten nightly when ‘buildxml.py’ runs), update dates, authorized users, and the earliest date in the repository:
CREATE TABLE logs (date text, verb text, setname text, identifier text, datefrom text, dateuntil text);
CREATE TABLE collections (collno text, colltitle text, docount int, incl int, caltechlibrary int, internetarchive int, youtube int, other int, collid text, description text, typ text, aocount int default 0, last_edit text, type_text int, type_stillimage int, type_movingimage int, type_sound int, type_other int);
CREATE TABLE last_update (dt text, fn text);
CREATE TABLE user(username TEXT UNIQUE NOT NULL, role text);
CREATE TABLE dates (earliest TEXT);
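To show how the logs table might be used, the sketch below records one OAI request in an in-memory database. The `log_request` helper, the ISO 8601 timestamp format, and the set name are assumptions made for the example; only the table definition comes from the schema above.

```python
import sqlite3
from datetime import datetime, timezone

# Table definition copied from the schema above.
LOGS_SCHEMA = (
    "CREATE TABLE logs (date text, verb text, setname text, "
    "identifier text, datefrom text, dateuntil text);"
)


def log_request(conn, verb, setname=None, identifier=None,
                datefrom=None, dateuntil=None):
    """Insert one OAI request into the logs table (hypothetical helper)."""
    conn.execute(
        "INSERT INTO logs (date, verb, setname, identifier, datefrom, dateuntil) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        # Assumed timestamp format: ISO 8601 in UTC.
        (datetime.now(timezone.utc).isoformat(), verb, setname,
         identifier, datefrom, dateuntil),
    )
    conn.commit()


# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute(LOGS_SCHEMA)
log_request(conn, "ListRecords", setname="collection1")
print(conn.execute("SELECT verb, setname FROM logs").fetchall())
```

Parameterized queries (the `?` placeholders) matter here because the logged values come straight from request arguments.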
Application defaults are stored in ‘util/defaults.py’ and ‘util/secrets.py’. See defaults_template.py and secrets_template.py for guidance.
Global variables for buildxml.py are listed in the “Global Configuration Section” at the top of that script.
Software produced by the Caltech Library is Copyright © 2026 California Institute of Technology. This software is freely distributed under a BSD-style license. Please see the LICENSE file for more information.
This work was funded by the California Institute of Technology Library.