cold_api.yaml is the configuration file for
datasetd, the JSON API backend that COLD builds on. It is
not read by any TypeScript code — it is consumed entirely by the
datasetd binary. Everything the middleware and browser can
access about stored objects is ultimately controlled by what this file
declares.

Reference documentation for the datasetd configuration format:

- https://caltechlibrary.github.io/dataset/datasetd.1.html
- https://caltechlibrary.github.io/dataset/datasetd_api.5.html

    host: localhost:8112
    collections:
      - ...

| Field | Purpose |
|---|---|
| host | The TCP address datasetd listens on. The middleware (cold.ts) defaults its --apiUrl to http://localhost:8112. They must match. |
| collections | List of dataset collection configurations. Each entry exposes one .ds directory as a REST resource. |

Each item under collections has this shape:

    - dataset: <name>.ds
      query:
        <query_name>: |
          <SQL>
      keys: true|false
      create: true|false
      read: true|false
      update: true|false
      versions: true|false

The dataset field is the path to the dataset collection directory, relative to the working directory where datasetd is started. The collection name (without .ds) becomes the first segment of every URL path for that collection: /api/<name>/....

The query field is a map of named SQL queries. Each query is a SQLite statement that operates on the collection’s internal src column, which holds the JSON object for each record. Results are returned as a JSON array.

Named queries are exposed at:

    GET /api/<collection_name>/_query/<query_name>

Parameterized queries use ? placeholders. Parameters are passed as a JSON body or as query-string key-value pairs, depending on how the caller constructs the request. The middleware’s browser_api.ts builds query-string calls; internal TypeScript handlers use the Dataset client from deps.ts.

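As a rough illustration of the query-string form, the sketch below assembles the URL for a named query. The helper name buildQueryUrl and the sample clgid value are hypothetical; the base URL is the default host value and the q parameter name follows the example used later in this document.

```typescript
// Hypothetical helper: builds the datasetd URL for a named query.
// The /_query/ path segment follows the pattern documented above.
function buildQueryUrl(
  baseUrl: string,
  collection: string,
  queryName: string,
  params: { [key: string]: string } = {},
): string {
  const path = "/api/" + encodeURIComponent(collection) +
    "/_query/" + encodeURIComponent(queryName);
  const pairs: string[] = [];
  for (const key of Object.keys(params)) {
    pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(params[key]));
  }
  // Strip trailing slashes so the result never contains "//api".
  const base = baseUrl.replace(/\/+$/, "");
  return base + path + (pairs.length > 0 ? "?" + pairs.join("&") : "");
}

// "example-group" is a made-up clgid used only for illustration.
const exampleQueryUrl = buildQueryUrl(
  "http://localhost:8112",
  "people",
  "lookup_clgid",
  { q: "example-group" },
);
console.log(exampleQueryUrl);
```

Encoding each path segment and parameter separately keeps LIKE patterns such as %LIGO% intact when they travel in a query string.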

| Field | HTTP method enabled | datasetd endpoint |
|---|---|---|
| keys: true | GET | /api/<name>/keys |
| create: true | POST | /api/<name>/ |
| read: true | GET | /api/<name>/<key> |
| update: true | PUT | /api/<name>/<key> |

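There is no delete permission declared in COLD’s configuration. Individual records are never deleted through the CRUD endpoints; they are either versioned or left in place. The only bulk removal is the clear_ror_data named query on the ROR collection, which runs a SQL DELETE.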
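
The flag-to-endpoint mapping can be restated as a small function. This is only a sketch for reasoning about a configuration: the names CollectionFlags and enabledEndpoints are hypothetical and are not part of COLD or datasetd.

```typescript
// Permission flags as they appear under one collection entry in cold_api.yaml.
interface CollectionFlags {
  keys: boolean;
  create: boolean;
  read: boolean;
  update: boolean;
}

// Hypothetical helper: lists the CRUD routes datasetd would expose for a
// collection, following the flag-to-endpoint table above.
function enabledEndpoints(dataset: string, flags: CollectionFlags): string[] {
  // "people.ds" -> "people": the name without .ds is the first URL segment.
  const name = dataset.replace(/\.ds$/, "");
  const routes: string[] = [];
  if (flags.keys) routes.push("GET /api/" + name + "/keys");
  if (flags.create) routes.push("POST /api/" + name + "/");
  if (flags.read) routes.push("GET /api/" + name + "/<key>");
  if (flags.update) routes.push("PUT /api/" + name + "/<key>");
  return routes;
}

// A read-only configuration exposes only the two GET routes.
console.log(
  enabledEndpoints("people.ds", { keys: true, create: false, read: true, update: false }),
);
```

No flag produces a DELETE route, which matches the absence of a delete permission in the configuration.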
When versions: true, datasetd keeps
previous copies of each object whenever it is updated. This is COLD’s
audit trail. All collections enable versioning.
Stores person objects. Each object is keyed by clpid
(Caltech Library People ID).

Named queries:

| Query | Parameters | Returns | Used by |
|---|---|---|---|
| people_vocabulary | none | List of objects shaped for the authors vocabulary: clpid, family_name, given_name, identifiers (ORCID if present), affiliations (Caltech ROR if present) | people_vocabulary.ts, run_people_vocabulary.bash |
| people_names | none | Compact list: clpid, family_name, given_name, orcid, thesis_id, directory_id | client_api.ts getPeopleList() |
| missing_bios | none | People with a directory_user_id but an empty bio field | Administrative/debugging |
| directory_people | none | People with a directory_user_id (i.e., current Caltech directory members) | directory_sync.ts |
| people_csv | none | Full people export for the CSV report | run_people_csv.bash (via cold_reports) |
| division_people | none | Per-person division, clpid, orcid, family_name, given_name | run_division_people.bash |
| people_membership | none | Flattened list: one row per person-per-group membership | run_people_membership.bash |
| get_all_clpid | none | Ordered list of all clpid strings (for autocomplete) | Browser UI |
| validate_clpid | ? (clpid value) | Single-element list {clpid} if found, empty if not | Input validation |
| lookup_clgid | ? (clgid value) | People who belong to the group with that clgid | client_api.ts lookupGroupMembership(), browser_api.ts |

Notes on people_vocabulary: The query uses a CASE expression to include an ORCID identifier only when length(src->>'orcid') > 1, which guards against empty strings. Caltech affiliation is included only when the ror field equals the Caltech ROR (https://ror.org/05dxps055). These business rules live in the SQL, not in TypeScript.

Notes on lookup_clgid: This query uses a correlated subquery against json_each(src, '$.groups') to find all people whose groups array contains the requested clgid. The ? placeholder is bound by the caller.
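
To make the people_vocabulary rules easier to reason about, the sketch below restates them in TypeScript. It is an illustration only: in COLD the logic lives in the SQL, and the Person and VocabularyEntry shapes, the scheme label, and the sample person are assumptions rather than the actual query output.

```typescript
const CALTECH_ROR = "https://ror.org/05dxps055";

// Assumed minimal input shape; real person objects carry more fields.
interface Person {
  clpid: string;
  family_name: string;
  given_name: string;
  orcid?: string;
  ror?: string;
}

// Assumed output shape, for illustration only.
interface VocabularyEntry {
  clpid: string;
  family_name: string;
  given_name: string;
  identifiers: { scheme: string; identifier: string }[];
  affiliations: { ror: string }[];
}

function toVocabularyEntry(p: Person): VocabularyEntry {
  const entry: VocabularyEntry = {
    clpid: p.clpid,
    family_name: p.family_name,
    given_name: p.given_name,
    identifiers: [],
    affiliations: [],
  };
  // Mirrors length(src->>'orcid') > 1: skips missing and empty values.
  if (p.orcid !== undefined && p.orcid.length > 1) {
    entry.identifiers.push({ scheme: "orcid", identifier: p.orcid });
  }
  // The affiliation is emitted only for the Caltech ROR.
  if (p.ror === CALTECH_ROR) {
    entry.affiliations.push({ ror: p.ror });
  }
  return entry;
}

// Made-up person used only to exercise the two rules.
const samplePerson: Person = {
  clpid: "Doe-J",
  family_name: "Doe",
  given_name: "Jane",
  orcid: "",
  ror: CALTECH_ROR,
};
console.log(JSON.stringify(toVocabularyEntry(samplePerson)));
```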
Stores group objects. Each object is keyed by clgid
(Caltech Library Group ID).

Named queries:

| Query | Parameters | Returns | Used by |
|---|---|---|---|
| group_vocabulary | none | List of {clgid, group_name} sorted by name | group_vocabulary.ts, run_group_vocabulary.bash |
| group_names | none | Same shape as group_vocabulary (duplicate, retained for compatibility) | client_api.ts getGroupsList() |
| lookup_name | ? (name pattern) | Groups whose name or any alternative name matches the LIKE pattern | client_api.ts lookupGroupName() |
| lookup_name_or_clgid | ?, ?, ? (name, clgid, alternative) | Groups matching any of: name, clgid, alternative names | utils.ts lookupGroupInfo() |
| get_all_clgid | none | Ordered list of all clgid strings (for autocomplete) | Browser UI |

Notes on lookup_name and
lookup_name_or_clgid: Both use
EXISTS(SELECT true FROM json_each(src->'alternative') WHERE ...)
to search inside the JSON array of alternative names. LIKE patterns
(e.g., %LIGO%) are supplied by callers.
lookup_name_or_clgid takes three ? parameters
that are all typically set to the same search string.
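
The matching behaviour of lookup_name_or_clgid can be approximated outside the database as follows. This is a sketch under stated assumptions: the Group shape, the helper names, the sample group, and the choice to treat all three parameters as LIKE patterns are mine, not taken from the query text.

```typescript
// Assumed minimal group shape.
interface Group {
  clgid: string;
  name: string;
  alternative?: string[];
}

// Converts a SQL LIKE pattern to a RegExp: % matches any run of characters,
// _ matches exactly one. The "i" flag approximates SQLite's default
// case-insensitive LIKE for ASCII text.
function likeToRegExp(pattern: string): RegExp {
  let source = "";
  for (const ch of pattern.split("")) {
    if (ch === "%") source += "[\\s\\S]*";
    else if (ch === "_") source += "[\\s\\S]";
    else source += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }
  return new RegExp("^" + source + "$", "i");
}

// Takes the three parameters in the documented order: name, clgid, alternative.
function matchesNameOrClgid(
  g: Group,
  name: string,
  clgid: string,
  alternative: string,
): boolean {
  const alternatives = g.alternative === undefined ? [] : g.alternative;
  return likeToRegExp(name).test(g.name) ||
    likeToRegExp(clgid).test(g.clgid) ||
    alternatives.some((alt) => likeToRegExp(alternative).test(alt));
}

// Made-up group record; callers typically pass the same string three times.
const sampleGroup: Group = {
  clgid: "Example-Lab",
  name: "Example Laboratory",
  alternative: ["LIGO Collaboration"],
};
const term = "%LIGO%";
console.log(matchesNameOrClgid(sampleGroup, term, term, term));
```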
Stores funder objects. Each object is keyed by clfid
(Caltech Library Funder ID).

Named queries:

| Query | Parameters | Returns | Used by |
|---|---|---|---|
| funder_names | none | List of {clfid, name, ror} sorted by name | Funder autocomplete |

Stores subject classification objects. Each object is keyed by
clsid.

Named queries:

| Query | Parameters | Returns | Used by |
|---|---|---|---|
| subject_names | none | List of {clsid, name} sorted by name | Subject autocomplete |

Stores thesis option objects (degree program options). Each object is
keyed by option_id.

Named queries:

| Query | Parameters | Returns | Used by |
|---|---|---|---|
| thesis_option_names | none | List of {option_id, name, division} sorted by name | thesis_option_vocabulary.ts, thesis autocomplete |

Stores journal metadata. Each object is keyed by ISSN.

Named queries:

| Query | Parameters | Returns | Used by |
|---|---|---|---|
| journal_names | none | List of {issn, name} sorted by name | journal_vocabulary.ts, journal autocomplete |

Stores DOI prefix records. Each object is keyed by DOI prefix string.

Named queries:

| Query | Parameters | Returns | Used by |
|---|---|---|---|
| doi_prefix_names | none | List of {doi_prefix, name} sorted by name | DOI prefix autocomplete |

This collection is the report request queue — the integration point
between the middleware and the cold_reports service.
Objects are keyed by a UUID v5 generated from the report request content
plus a timestamp.

Named queries:

| Query | Parameters | Returns | Used by |
|---|---|---|---|
| report_list | none | All report requests ordered by updated descending (stack view) | cold_reports.ts handleReportsList() → renders the /reports page |
| next_request | none | Oldest report request with status = 'requested' (FIFO queue) | cold_reports.ts servicing_requests() |

report_list omits the
inputs field — it is a summary view for display.
next_request includes inputs
so the runner has the parameter values needed to execute the
command.

Status lifecycle:

    "requested" → "processing" → "completed"
                               → "error"
                               → "aborting, unknown report"

The transition from "requested" to "processing" is written by cold_reports before executing the command; the final status is written after the command returns.
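
The queue behaviour described above can be modelled in a few lines. This is a sketch, not the cold_reports implementation: the ReportRequest shape, the id and report_name fields, the timestamp format, ordering by updated for next_request, and the sample queue are all assumptions.

```typescript
type ReportStatus =
  | "requested"
  | "processing"
  | "completed"
  | "error"
  | "aborting, unknown report";

// Assumed minimal shape of a report request.
interface ReportRequest {
  id: string;
  report_name: string;
  status: ReportStatus;
  updated: string; // ISO 8601 timestamp (assumed format)
  inputs: string[];
}

// A request is finished once it has left "requested" and "processing".
function isFinished(status: ReportStatus): boolean {
  return status !== "requested" && status !== "processing";
}

function compareUpdated(a: ReportRequest, b: ReportRequest): number {
  return a.updated < b.updated ? -1 : a.updated > b.updated ? 1 : 0;
}

// Mirrors next_request: the oldest request still in "requested", inputs included.
function nextRequest(queue: ReportRequest[]): ReportRequest | undefined {
  const waiting = queue.filter((r) => r.status === "requested");
  waiting.sort(compareUpdated);
  return waiting[0];
}

// Mirrors report_list: newest first, with the inputs field left out.
function reportList(queue: ReportRequest[]) {
  return queue
    .slice()
    .sort((a, b) => compareUpdated(b, a))
    .map((r) => ({
      id: r.id,
      report_name: r.report_name,
      status: r.status,
      updated: r.updated,
    }));
}

// Made-up queue contents for illustration.
const sampleQueue: ReportRequest[] = [
  { id: "a", report_name: "run_people_csv", status: "completed", updated: "2024-01-01T09:00:00Z", inputs: [] },
  { id: "b", report_name: "run_people_csv", status: "requested", updated: "2024-01-01T11:00:00Z", inputs: [] },
  { id: "c", report_name: "run_division_people", status: "requested", updated: "2024-01-01T10:00:00Z", inputs: ["x"] },
];
const upcoming = nextRequest(sampleQueue);
console.log(upcoming === undefined ? "queue empty" : upcoming.id);
console.log(reportList(sampleQueue).map((r) => r.id).join(","));
```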
Stores Research Organization Registry (ROR) records imported from a
ROR data dump. Each object is keyed by the ROR identifier suffix (the
part after https://ror.org/).

Named queries:

| Query | Parameters | Returns | Used by |
|---|---|---|---|
| get_ror | ? (ror key) | Full ROR record for the key | client_api.ts getROR() |
| lookup_by_name_or_acronym | ?, ? (name pattern, acronym pattern) | List of {ror, name, country, acronyms} | client_api.ts |
| lookup_by_name | ? (name pattern) | List of {ror, name, country, acronyms} | ROR name lookup |
| lookup_by_acronym | ? (acronym pattern) | List of {ror, name, country, acronyms} | ROR acronym lookup |
| clear_ror_data | none | Deletes all ROR records (DELETE FROM ror) | ror_import.ts before re-import |

Note: clear_ror_data is a
DELETE statement. In datasetd this is exposed
as a query endpoint, not a DELETE HTTP method. The
ror_import.ts tool calls it before loading a fresh ROR data
dump.
Stores records from the RDM (Research Data Management) submission review queue. Records are deposited here from the InvenioRDM system for curator review.

Named queries:

| Query | Parameters | Returns |
|---|---|---|
| browse | none | All records ordered by updated descending |
| search | ? (text pattern) | Records where the JSON source matches the pattern |
| by_name | ? (name pattern) | Records where any creator’s person_or_org.name matches |
| review_queue_by_name | ? (name pattern) | Same as by_name but filtered to status = 'submitted' |
| by_clpid | ? (clpid) | Records where any creator has a clpid identifier matching the value |
| review_queue_by_clpid | ? (clpid) | Same as by_clpid but filtered to status = 'submitted' |
| by_orcid | ? (orcid) | Records where any creator has an orcid identifier matching the value |
| review_queue_by_orcid | ? (orcid) | Same as by_orcid but filtered to status = 'submitted' |
| by_clgid | ? (clgid) | Records where custom_fields."caltech:groups" contains the clgid |
| review_queue_by_clgid | ? (clgid) | Same as by_clgid but filtered to status = 'submitted' |
| review_queue_mentions | ? (text pattern) | Submitted records with comments containing the pattern |

These queries make heavy use of json_each to search
inside nested JSON arrays (creators, identifiers, groups). The
JOIN between two json_each calls in the
by_clpid/by_orcid family is the pattern to
follow when adding new identifier-based queries.
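
The double json_each JOIN amounts to two nested loops: one over creators and one over each creator's identifiers. The sketch below shows the same traversal in TypeScript. The record shape (metadata.creators, scheme and identifier fields, a top-level status) is an assumption based on common InvenioRDM record layouts, and the helper names and sample records are hypothetical.

```typescript
interface Identifier {
  scheme: string;
  identifier: string;
}

interface Creator {
  person_or_org: { name: string; identifiers?: Identifier[] };
}

// Assumed minimal record shape.
interface RdmRecord {
  id: string;
  status: string;
  metadata: { creators: Creator[] };
}

// Outer loop over creators, inner loop over identifiers: the same two
// levels the SQL walks with its pair of json_each calls.
function hasIdentifier(rec: RdmRecord, scheme: string, value: string): boolean {
  return rec.metadata.creators.some((c) =>
    (c.person_or_org.identifiers || []).some(
      (i) => i.scheme === scheme && i.identifier === value,
    )
  );
}

// submittedOnly = true corresponds to the review_queue_* variants.
function byIdentifier(
  records: RdmRecord[],
  scheme: string,
  value: string,
  submittedOnly: boolean,
): RdmRecord[] {
  return records.filter((r) =>
    hasIdentifier(r, scheme, value) &&
    (!submittedOnly || r.status === "submitted")
  );
}

// Made-up records for illustration.
const sampleRecords: RdmRecord[] = [
  {
    id: "r1",
    status: "submitted",
    metadata: {
      creators: [
        { person_or_org: { name: "Doe, Jane", identifiers: [{ scheme: "clpid", identifier: "Doe-J" }] } },
      ],
    },
  },
  {
    id: "r2",
    status: "accepted",
    metadata: {
      creators: [
        { person_or_org: { name: "Roe, Richard" } },
        { person_or_org: { name: "Doe, Jane", identifiers: [{ scheme: "clpid", identifier: "Doe-J" }] } },
      ],
    },
  },
];
console.log(byIdentifier(sampleRecords, "clpid", "Doe-J", true).map((r) => r.id));
```

A new identifier-based query would change only the scheme value, which is why the by_clpid and by_orcid families can share one structure.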
handleBrowserAPI (called for GET /api/...) parses the URL path using apiPathParse():

    /api/<c_name>/<query_name>[?key=value&...]

It constructs a DatasetApiClient for the named collection and calls .query(query_name, paramList, body). The result is returned as JSON to the browser. This is how browser-side TypeScript modules (client_api.ts, validator.ts, collaborator_report.ts) access named queries.

Important: browser_api.ts only handles
GET. All browser queries are read-only. Mutation goes
through the dedicated collection handlers (people.ts,
groups.ts, etc.) which call datasetd’s CRUD
endpoints directly, not through named queries.
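
For orientation, the path handling described above might look roughly like this. The functions parseApiPath and parseQueryString are hypothetical stand-ins written for this document; the real apiPathParse() in browser_api.ts may differ in name, signature, and behaviour.

```typescript
interface ParsedApiPath {
  collection: string;
  queryName: string;
}

// Hypothetical stand-in for apiPathParse(): accepts only
// /api/<c_name>/<query_name> and returns undefined for anything else.
function parseApiPath(pathname: string): ParsedApiPath | undefined {
  const parts = pathname.split("/").filter((p) => p.length > 0);
  if (parts.length !== 3 || parts[0] !== "api") return undefined;
  return { collection: parts[1], queryName: parts[2] };
}

// Decodes ?key=value&... into a plain object. Later keys overwrite earlier ones.
function parseQueryString(search: string): { [key: string]: string } {
  const out: { [key: string]: string } = {};
  const trimmed = search.replace(/^\?/, "");
  if (trimmed.length === 0) return out;
  for (const pair of trimmed.split("&")) {
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    out[decodeURIComponent(key)] = decodeURIComponent(value.replace(/\+/g, " "));
  }
  return out;
}

console.log(parseApiPath("/api/people/validate_clpid"));
console.log(parseQueryString("?q=%25LIGO%25"));
```

Rejecting paths that do not have exactly three segments keeps this read-only route from being confused with the CRUD endpoints.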

Each collection has a handler module that uses the Dataset class from deps.ts to call datasetd’s CRUD endpoints:

- ds.keys() → GET /api/<name>/keys
- ds.read(key) → GET /api/<name>/<key>
- ds.create(key, obj) → POST /api/<name>/
- ds.update(key, obj) → PUT /api/<name>/<key>
- ds.query(query_name, params, body) → GET /api/<name>/_query/<query_name>

The CRUD calls are available only because the corresponding keys, read, create, and update flags are set to true in cold_api.yaml.

Query returns unexpected results: Run the SQL directly against the SQLite database inside the .ds directory:

    sqlite3 people.ds/collection.db "SELECT json_object(...) FROM people ..."

The src column holds the raw JSON. The table name is the collection name without .ds.

datasetd rejects a request: Check that the corresponding permission flag (create, read, update, keys) is set to true in cold_api.yaml for that collection.

A new query is not accessible: After editing cold_api.yaml, restart datasetd. It reads the file at startup only.

Parameter mismatch: If a query uses ? placeholders, the caller must supply exactly the right number of parameters in the correct order. A mismatch causes a SQLite error that appears in the datasetd log.

Port conflict: If the middleware’s --apiUrl does not match cold_api.yaml’s host, all API calls will fail with a connection error. The default is localhost:8112 on both sides.
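
Two of the checks above, placeholder count and host agreement, are easy to automate before a restart. The helpers below are a sketch; countPlaceholders and apiUrlMatchesHost are hypothetical names, and the placeholder scan is deliberately simple (it skips single-quoted strings but does not understand SQL comments).

```typescript
// Counts ? placeholders in a SQL statement, skipping single-quoted
// string literals. A doubled quote ('') toggles twice, so the scan
// stays inside the literal as expected.
function countPlaceholders(sql: string): number {
  let count = 0;
  let inString = false;
  for (let i = 0; i < sql.length; i++) {
    const ch = sql.charAt(i);
    if (ch === "'") inString = !inString;
    else if (ch === "?" && !inString) count++;
  }
  return count;
}

// Compares the middleware's --apiUrl with the host value in cold_api.yaml.
function apiUrlMatchesHost(apiUrl: string, host: string): boolean {
  const stripped = apiUrl.replace(/^https?:\/\//, "").replace(/\/+$/, "");
  return stripped === host;
}

// Illustrative statement in the style of the queries in this document.
const sampleSql = "SELECT src FROM people WHERE src->>'clpid' = ?";
const suppliedParams = ["Doe-J"];
if (countPlaceholders(sampleSql) !== suppliedParams.length) {
  console.log("parameter count does not match the query");
}
console.log(apiUrlMatchesHost("http://localhost:8112", "localhost:8112"));
```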

To add a new query, edit cold_api.yaml. Under the relevant collection’s query: block, add:

    my_new_query: |
      SELECT json_object('field', src->'field') AS src
      FROM <collection_name>
      WHERE src->>'some_field' = ?
      ORDER BY src->>'field'

Restart datasetd.

Test directly:

    curl "http://localhost:8112/api/<collection_name>/_query/my_new_query" \
      -H "Content-Type: application/json" \
      -d '{"param": "value"}'

The query can then be called from the browser through browser_api.ts at GET /api/<collection_name>/my_new_query?q=value, or from a server-side handler via ds.query("my_new_query", ["param"], {param: "value"}).

To add a new collection, create it with the dataset command:

    dataset init newcollection.ds 'sqlite://collection.db'

Add it to cold_api.yaml:

    - dataset: newcollection.ds
      query:
        list_all: |
          SELECT src FROM newcollection ORDER BY src->>'name'
      keys: true
      create: true
      read: true
      update: true
      versions: true

Restart datasetd.

Add a corresponding handler module (e.g., newcollection.ts) following the pattern in people.ts or groups.ts, and wire it into cold.ts’s ColdReadWriteHandler.