Caltech Library logo

Metadata Tools

This project was inspired by the idutils Python package. This package is not a port of the Python idutils. This project focuses on the identifiers Caltech Library needs for the COLD project.

For each supported identifier three types of methods are implemented.

normalization
this will return a valid identifier into a regularize format for validation and verification
validation
this checks if the form of the identifier string makes sense, e.g. does it look like an ORCID is expected to look?
verification
verification checks with the authoritative organization to see if data is available for the identifier. In most cases that means the organizations that supports creating and managing the identifier (e.g. orcid.org, isni.org, ror.org).

Supported identifiers

USAGE

There are three ways to use Metadata Tools. The first is via the command line program mdt. This simple command line program accepts three parameters, ACTION, ID_TYPE and IDENTIFIER. See man page for details. An ACTION can be one of three things, “normalize”, “validate” and “verify”. The second parameter is one of the types listed above. The final parameter the identifier value. “normalize” will accept an identifier in a common format variation and return a consisantly formated result. E.g. The ORCID “https://orcid.org/0000-0003-0900-6903” will be normalize to “0000-0003-0900-6903” where as an ROR of “05dxps055” will normalize to “https://ror.org/05dxps055”. The normalization appplied reflects the typical way we handle the idenfier formatting in Caltech Library’s repository systems.

“validate” will check of the identifier matches and expected form. E.g. An order looks like four digits dash four digits dash four digits dash for digits. There is some check summing also applied to validate the sequence (like with ISNI). Validate will make sure the supplied identifier confirms to those types of rules. It does not verify the value with the originating organization.

“verify” will validate the identifier supplied then check via a web service if the identifier is known to the sponsoring organization or authority control. As a result it requires network access and will fail if the sponsoring organization’s website or service is unavailable.

Both “validate” and “verfiy” return booleaning true if everything checks out and false otherwise. The command line program mtd will also return a non-zero exit code if the validation or verification fails.

The second way Metadata Tools can be use is as a ESM TypeScript module. The TypeScript module you can use from Deno. The latest version of the Metadata Tools module can be found at https://github.com/caltechlibrary/metadatatools/latest/release and on the Caltech Library development group’s GitHub website, https://caltechlibrary.github.io/metadatatools/mod.ts. The mod.ts file will pull in the whole set of identifier normalization, validation and verification routines.

The third way is as a JavaScript bundle called mdt.js which you can reference from https://caltechlibrary.github.io/metadatatools/mdt.js or copy into your own site from this repository. It can be built using the Deno task bundle.

Installation

The command line program can be installed using installer scripts hosted at https://caltechlibrary.github.io/metadatatools. See INSTALL.md for details.

While Metadata Tools should be considered experimental binaries are being provided for macOS (x86_64 and M series processes), Windows 11 running on x86_64 processors and for Linux on x86_64 and aarch64 processesor.

License

See LICENSE for license details.