dataset

dataset is a golang package for managing JSON documents and their attachments on disk or in S3 storage. dataset is also a command line tool that exercises the features of the golang dataset package. A project goal of dataset is to “play nice” with shell scripts and other Unix tools (e.g. it respects standard input, output and error with minimal side effects). This makes it easy to script from the Bash shell or from interpreted languages like Python.

dataset organizes JSON documents by unique names in collections. Collections are represented as Unix subdirectories (or paths under S3). Each collection has a series of buckets (subdirectories or sub-paths) that spread the JSON documents and their attachments across the file system, which avoids having too many JSON documents in any one directory.
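
The bucket idea can be illustrated with a minimal sketch. This is not the package's actual implementation: `bucketNames` and `pickBucket` are hypothetical helpers, and the use of an FNV hash to choose a bucket is an assumption made only for illustration. The sketch builds every bucket name of a given length over an alphabet, then maps a document name to one of those buckets.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucketNames is a hypothetical helper: it returns every string of the
// given length over the alphabet, e.g. ("ab", 2) yields aa, ab, ba, bb.
func bucketNames(alphabet string, length int) []string {
	names := []string{""}
	for i := 0; i < length; i++ {
		next := make([]string, 0, len(names)*len(alphabet))
		for _, prefix := range names {
			for _, r := range alphabet {
				next = append(next, prefix+string(r))
			}
		}
		names = next
	}
	return names
}

// pickBucket is a hypothetical helper: it hashes the document name and
// uses the hash to select a bucket. The buckets slice must be non-empty.
// The same name always lands in the same bucket.
func pickBucket(docName string, buckets []string) string {
	h := fnv.New32a()
	h.Write([]byte(docName))
	return buckets[int(h.Sum32()%uint32(len(buckets)))]
}

func main() {
	buckets := bucketNames("ab", 2)
	fmt.Println(buckets) // prints [aa ab ba bb]
	fmt.Println(pickBucket("freda.json", buckets))
}
```

Because the choice depends only on the document name, a lookup can go straight to the right subdirectory without scanning the whole collection.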

Operations

The basic operations supported by dataset are listed below, organized by collection level and JSON document level.

Collection Level

JSON Document Level

Additionally

Examples

Common operations using the dataset command line tool

    # Create a collection "mystuff" inside the directory called demo
    dataset init demo/mystuff
    # If successful, an expression to export the collection name is shown
    export DATASET=demo/mystuff

    # Create a JSON document 
    dataset create freda.json '{"name":"freda","email":"freda@inverness.example.org"}'
    # On success you should see OK; otherwise an error message is shown

    # Read a JSON document
    dataset read freda.json

    # Path to JSON document
    dataset path freda.json

    # Update a JSON document
    dataset update freda.json '{"name":"freda","email":"freda@zbs.example.org"}'
    # On success you should see OK; otherwise an error message is shown

    # List the keys in the collection
    dataset keys

    # Delete a JSON document
    dataset delete freda.json

    # To remove the collection just use the Unix shell command
    # /bin/rm -fR demo/mystuff

Common operations shown in Golang

    package main

    import (
        "log"

        "github.com/caltechlibrary/dataset"
    )

    func main() {
        // Create a collection "mystuff" inside the directory called demo
        collection, err := dataset.Create("demo/mystuff", dataset.GenerateBucketNames("ab", 2))
        if err != nil {
            log.Fatalf("%s", err)
        }
        defer collection.Close()
        // Create a JSON document
        docName := "freda.json"
        document := map[string]string{"name": "freda", "email": "freda@inverness.example.org"}
        if err := collection.Create(docName, document); err != nil {
            log.Fatalf("%s", err)
        }
        // Read a JSON document
        if err := collection.Read(docName, document); err != nil {
            log.Fatalf("%s", err)
        }
        // Update a JSON document
        document["email"] = "freda@zbs.example.org"
        if err := collection.Update(docName, document); err != nil {
            log.Fatalf("%s", err)
        }
        // Delete a JSON document
        if err := collection.Delete(docName); err != nil {
            log.Fatalf("%s", err)
        }
    }
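
The map passed to Create and Update in the Go example carries the same fields as the JSON object given on the command line. The following minimal sketch shows that correspondence using only the standard library's `encoding/json`. It is independent of the dataset package and does not show how dataset itself serializes documents; `encodeDocument` and `decodeDocument` are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// encodeDocument is a hypothetical helper that turns a Go map into JSON text.
// encoding/json writes map keys in sorted order.
func encodeDocument(doc map[string]string) (string, error) {
	src, err := json.Marshal(doc)
	return string(src), err
}

// decodeDocument is a hypothetical helper that parses JSON text into a Go map.
// json.Unmarshal needs a pointer to the destination value.
func decodeDocument(src string) (map[string]string, error) {
	doc := map[string]string{}
	err := json.Unmarshal([]byte(src), &doc)
	return doc, err
}

func main() {
	document := map[string]string{"name": "freda", "email": "freda@zbs.example.org"}
	src, err := encodeDocument(document)
	if err != nil {
		log.Fatalf("%s", err)
	}
	fmt.Println(src) // prints {"email":"freda@zbs.example.org","name":"freda"}

	decoded, err := decodeDocument(src)
	if err != nil {
		log.Fatalf("%s", err)
	}
	fmt.Println(decoded["name"]) // prints freda
}
```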

Releases

Compiled versions are provided for Linux (amd64), Mac OS X (amd64), Windows 10 (amd64) and Raspbian (ARM7). See https://github.com/caltechlibrary/dataset/releases.