About And/Or

Another digital / Object repository

This is a concept document for a very lightweight digital object repository implemented as "a multi-user version of dataset with a web-based GUI". It targets the ability to curate metadata objects outside the scope of our existing repository systems.

And/Or is based on dataset. It is a web JSON API plus static HTML, CSS and JavaScript providing a web GUI for curating objects. It is intended to function as a microservice running behind a standard web server like Apache or NginX and accessible via a reverse proxy configuration. For the purposes of a proof of concept the minimum system is the And/Or web server [1] providing static web hosting, Basic AUTH authentication and support for multi-user interaction with dataset collections.
Part of the design concept is that additional functionality would be provided by other microservices or systems [2].

And/Or is an extremely narrowly scoped web service. The focus is ONLY on curating JSON objects.

Limiting And/Or's scope leads to a simpler system. Code is limited to the And/Or web service plus the HTML, CSS and JavaScript needed for an acceptable UI [3].

This architecture aligns with both small machine hosting and cloud hosting, and both keep recurring costs to a minimum. And/Or could be run on a tiny to small EC2 instance or on hardware as small as a Raspberry Pi.

Goals

Project Assumptions

Limiting features and complexity

Some of the most complicated parts of digital object repositories are managing customization, managing users, managing roles, managing permissions, enforcing storage schemes and presenting public and private views of repository content. And/Or's simplification involves avoiding functionality provided by other systems and relocating requirements to an appropriate external system while focusing only on the narrow problem of curating objects.

Examples--

A web browser can create the illusion of a unified software system or single process. A single application is not required to support all desired functionality (e.g. curation and public web consumption) because And/Or uses a composable model available to all web applications. Customization is limited to static HTML, CSS and JavaScript or deferred to other microservices and external systems (e.g. looking up a record on datacite.org or orcid.org).

Some features are unavoidable in a curation tool. Repositories run on the assumption of users and roles. Interestingly, this doesn't require that users and roles be managed through the web. Setting up users and roles can be managed through simple-to-implement command line tools and configuration files. This is reasonable in large part because And/Or offloads identity management and can be restarted quickly (i.e. configuration files are easily parsed and reloaded).

By focusing on a minimal feature set and leveraging technical opportunities that already exist we can radically reduce the lines of code written and maintained.

Under the hood

And/Or's JSON document storage engine is dataset.

End points map directly to existing dataset operations

dataset operations supported in And/Or are "keys", "create", "read", "update", "delete". These map to URL paths each supporting a single HTTP Method (either GET or POST).
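
As a rough illustration of that mapping, the sketch below pairs each dataset operation with one URL path and one HTTP method. The `/{collection}/{operation}/{key}` path layout, the choice of GET for "keys" and "read" and POST for the others, and the `resolve()` helper are all assumptions made for this example; the concept does not fix them.

```python
# Hypothetical route table. The paths and the GET/POST split are
# assumptions for illustration, not And/Or's actual URL scheme.
ROUTES = {
    "keys":   ("GET",  "/{collection}/keys"),
    "create": ("POST", "/{collection}/create/{key}"),
    "read":   ("GET",  "/{collection}/read/{key}"),
    "update": ("POST", "/{collection}/update/{key}"),
    "delete": ("POST", "/{collection}/delete/{key}"),
}


def resolve(method, path):
    """Return (operation, collection, key) for a request, or None when
    the path is unknown or the HTTP method is not the single one
    allowed for that operation."""
    parts = path.strip("/").split("/")
    if len(parts) not in (2, 3):
        return None
    collection, operation = parts[0], parts[1]
    key = parts[2] if len(parts) == 3 else None
    route = ROUTES.get(operation)
    if route is None or route[0] != method:
        return None
    # "keys" takes no key; every other operation requires one.
    if (operation == "keys") != (key is None):
        return None
    return operation, collection, key
```

Because each operation accepts exactly one method, a request such as `resolve("GET", "/repo/update/k1")` is rejected before any dataset function is called.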

And/Or is a thin layer on top of existing dataset functionality. For example, dataset supplies attachment versioning but And/Or does not; that functionality could easily be added. The idea is that as dataset matures and gains abilities useful in a multi-user context, And/Or would be enhanced to support the additional dataset features by mapping them to appropriate URL end points. For example, if versioning of JSON documents (e.g. stored diffs of JSON documents [4]) were added to dataset, that functionality could then be included in And/Or.

Web UI

Four static web pages need to be designed per collection and implemented in HTML, CSS and JavaScript for our proof of concept.

  1. A login and landing page
  2. A page listing records (filterable by object state)
  3. An edit page that supports CRUD operations
  4. A page displaying the logged-in user's roles

Limited functionality is intentional

And/Or is NOT a public-facing content system (e.g. things Google, Bing, et al. should find and index). Mechanisms for public-facing content should be deployed separately by processes similar to how feeds.library.caltech.edu works. This keeps And/Or simple with fewer requirements.

Examples of composability

When listing a large collection of objects, prudence suggests the need for paging. After retrieving all keys we can implement paging by using the "read" method with a list of the keys we want to view. This allows us to segment a large collection into manageable chunks.
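
A minimal sketch of that paging approach follows. It assumes the full key list has already been fetched, and it treats `read_many` as a hypothetical stand-in for a call to the "read" end point with a list of keys; the function names and the default page size are illustrative only.

```python
def page_of_keys(keys, page, page_size=25):
    """Return the slice of keys for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return keys[start:start + page_size]


def read_page(read_many, keys, page, page_size=25):
    """Fetch the objects for one page.

    read_many is an assumption: any callable that takes a list of
    keys and returns the matching objects, e.g. a wrapper around an
    HTTP request to the "read" end point.
    """
    # Sort so that page boundaries are stable between requests.
    return read_many(page_of_keys(sorted(keys), page, page_size))
```

With 60 keys and a page size of 25, pages 1 and 2 each hold 25 objects and page 3 holds the remaining 10; a page past the end comes back empty rather than raising an error.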

A search interface could be created as a microservice in the manner of Stevens' Lunr demo for Caltech People. If And/Or and the search microservice are behind the same web server you could present both services using a common URL namespace (Apache or NginX are good candidates for a front-facing web server integrating And/Or and your search system).

User/role/object state is a simple model

An authenticated user exposes their user id to And/Or's web service. The web service can then retrieve the available roles that scope the permissions the user has to operate on objects in a given set of states. The role can also be used to define which objects we show the user. This can be implemented with a small number of functions such as getUsername(), getUserRoles(), isAllowed() and canAssign().
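
The user/role/object state model might be sketched as below. The function names are Python spellings of the four functions named above; the role definitions, the state names ("draft", "review", "published", "deleted") and the shape of the configuration are assumptions for illustration. The sketch also assumes the password has already been verified before `get_username()` is called.

```python
import base64

# Hypothetical configuration. In the concept this would come from
# configuration files maintained with command line tools.
USER_ROLES = {"jane": ["editor"], "sam": ["depositor"]}
ROLES = {
    "depositor": {
        # object state -> actions permitted on objects in that state
        "permissions": {"draft": {"create", "read", "update"}},
        # states this role may move an object into
        "assign_to": {"review"},
    },
    "editor": {
        "permissions": {
            "review": {"read", "update"},
            "published": {"read"},
        },
        "assign_to": {"draft", "published", "deleted"},
    },
}


def get_username(authorization_header):
    """Extract the user id from a Basic AUTH header value.
    Returns None for any other authentication scheme."""
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return None
    decoded = base64.b64decode(encoded).decode("utf-8")
    return decoded.partition(":")[0]


def get_user_roles(username):
    """Return the role definitions assigned to a user (possibly none)."""
    names = USER_ROLES.get(username, [])
    return [ROLES[name] for name in names if name in ROLES]


def is_allowed(username, action, state):
    """True if any of the user's roles permits action on objects in state."""
    return any(
        action in role["permissions"].get(state, set())
        for role in get_user_roles(username)
    )


def can_assign(username, new_state):
    """True if any of the user's roles may move an object into new_state."""
    return any(
        new_state in role["assign_to"]
        for role in get_user_roles(username)
    )
```

Under this configuration "sam" can update drafts and send them to review but cannot publish, while "jane" can update objects under review and assign them to "published" or "deleted". The same permission table can be used to filter which objects are listed for a user.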

Once authorization is calculated, approved actions can be handled by simple HTTP handlers, each performing a single task that maps to an existing dataset function (e.g. keys, create, read, update, delete).

A special case of deleting objects

While the And/Or service can delete objects, it's more prudent to take the EPrints approach and define "deleted" as a specific object state. This way you could treat deleted objects as being in a trashcan and leave actual deletion for a garbage collection routine. Deletion would then work like a Mac's trashcan, and fully deleting objects would be accomplished by a separate process that empties the trash [5].
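
The trashcan idea can be sketched in a few lines. Here a plain Python dict stands in for a dataset collection (an assumption made to keep the example self-contained), and `_State` is used as the field holding the object state, following the footnote on emptying the trash; the function names are illustrative.

```python
def mark_deleted(collection, key):
    """Soft delete: change the object's state instead of removing it."""
    obj = collection[key]
    obj["_State"] = "deleted"
    collection[key] = obj


def empty_trash(collection):
    """Garbage collection: remove every object whose state is "deleted".
    Returns the keys that were removed."""
    doomed = [
        key for key, obj in collection.items()
        if obj.get("_State") == "deleted"
    ]
    for key in doomed:
        del collection[key]
    return doomed
```

An object marked with `mark_deleted()` stays readable, and recoverable, until a separate run of `empty_trash()` removes it.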

Footnotes


  1. NginX and Apache could provide authentication mechanisms such as Basic AUTH, Shibboleth and OAuth 2 and pass them back to a real And/Or implementation.
  2. Public websites can be generated in the manner of feeds.library.caltech.edu; a search interface can be implemented with Lunr.
  3. UI, user interface, the normal way a user interacts with a website.
  4. This could be done in the manner of EPrints, which can show a diff of the EPrint XML document.
  5. Emptying the trash boils down to traversing the collection, collecting the keys of objects where ._State == "deleted", and then removing their content from disk.