Another digital / Object repository
This is a concept document for a very lightweight digital object repository implemented as "a multi-user version of dataset with a web based GUI". It targets the ability to curate metadata objects outside the scope of our existing repository systems.
And/Or is based on dataset.
It is a web JSON API plus static HTML, CSS and JavaScript
providing a web GUI for curating objects. It
is intended to function as a microservice running behind
a standard web server like Apache or NGINX and
accessible via a reverse proxy configuration.
For the purposes of a proof of concept the minimum
system is the And/Or web server[1] providing static
web hosting, BasicAUTH authentication and supporting
multi-user interaction with dataset collections.
Part of the design concept is that additional functionality
would be provided by other microservices or systems[2].
And/Or is an extremely narrowly scoped web service. The focus is ONLY on curating JSON objects.
Limiting And/Or's scope leads to a simpler system. Code is limited to the And/Or web service plus the HTML, CSS and JavaScript needed for an acceptable UI[3].
This architecture aligns with both small machine hosting and cloud hosting, keeping recurring costs to a minimum. And/Or could be run on a tiny to small EC2 instance or on hardware as small as a Raspberry Pi.
Some of the most complicated parts of digital object repositories are managing customization, managing users, managing roles, managing permissions, enforcing storage schemes and presenting public and private views of repository content. And/Or's simplification involves avoiding functionality provided by other systems and relocating requirements to an appropriate external system while focusing only on the narrow problem of curating objects.
Examples:
A web browser can create the illusion of a unified software system or single process. A single application is not required to support all desired functionality (e.g. curation and public web consumption) because And/Or uses a composable model available to all web applications. Customization is limited to static HTML, CSS and JavaScript or deferred to other microservices and external systems (e.g. looking up a record on datacite.org or orcid.org).
Some features are unavoidable in a curation tool. Repositories run on the assumption of users and roles. Interestingly, this doesn't require that users and roles be managed through the web. Setting up users and roles can be handled through simple to implement command line tools and configuration files. This is reasonable in large part because And/Or offloads identity management and can be restarted quickly (i.e. configuration files are easily parsed and reloaded).
By focusing on a minimal feature set and leveraging technical opportunities that already exist we can radically reduce the lines of code written and maintained.
And/Or's JSON document storage engine is dataset.
End points map directly to existing dataset operations
The dataset operations supported in And/Or are "keys", "create", "read", "update" and "delete". These map to URL paths, each supporting a single HTTP method (either GET or POST).
/COLLECTION_NAME/keys/
(GET) returns all object keys
/COLLECTION_NAME/create/OBJECT_ID
(GET) creates an object, an OBJECT_ID must be unique to succeed
/COLLECTION_NAME/read/OBJECT_IDS
(GET) returns one or more objects, if more than one is requested then an array of objects is returned
/COLLECTION_NAME/update/OBJECT_ID
(POST) updates an object
/COLLECTION_NAME/delete/OBJECT_ID
(POST) deletes an object
And/Or is a thin layer on top of existing dataset functionality. E.g. dataset supplies attachment versioning; And/Or does not. That functionality could easily be added. The idea is that as dataset matures and gains abilities useful in a multi-user context, And/Or would be enhanced to support the additional dataset features by mapping them to appropriate URL end points. For example, if versioning of JSON documents (e.g. stored diffs of JSON documents[4]) were added to dataset, that functionality could then be included in And/Or.
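The mapping from URL paths to dataset operations described above could be sketched roughly as follows. This is only an illustration, not And/Or's actual router: the names ROUTES and parse_request are hypothetical, and the use of commas to separate multiple OBJECT_IDS is an assumption, since the delimiter is not specified here.

```python
# Hypothetical routing table: operation name -> the single HTTP method
# it accepts, following the end point list in this document.
ROUTES = {
    "keys": "GET",
    "create": "GET",
    "read": "GET",
    "update": "POST",
    "delete": "POST",
}


def parse_request(method, path):
    """Split a request path into (collection, operation, object_ids).

    Raises ValueError when the path or the HTTP method does not match
    the routing table.
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected /COLLECTION_NAME/OPERATION/...")
    collection, operation = parts[0], parts[1]
    if operation not in ROUTES:
        raise ValueError(f"unsupported operation: {operation}")
    if method != ROUTES[operation]:
        raise ValueError(f"{operation} requires {ROUTES[operation]}")
    # Assumption: several ids for "read" are comma separated.
    object_ids = parts[2].split(",") if len(parts) > 2 else []
    if operation != "keys" and not object_ids:
        raise ValueError(f"{operation} requires an object id")
    return collection, operation, object_ids
```

Keeping the table as plain data means a new dataset feature could be exposed by adding one entry rather than new parsing logic.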
Four static web pages need to be designed per collection and implemented in HTML, CSS and JavaScript for our proof of concept.
And/Or is NOT a public facing content system (e.g. things Google, Bing, et al. should find and index). Mechanisms for public facing content should be deployed separately by processes similar to how feeds.library.caltech.edu works. This keeps And/Or simple with fewer requirements.
When listing a large collection of objects, prudence suggests the need for paging. After retrieving all keys we can implement paging by using the "read" method with a list of the keys we want to view. This allows us to segment a large collection into manageable chunks.
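A minimal sketch of that paging approach, assuming the full key list has already been fetched from the "keys" end point. The function names are hypothetical, and joining keys with commas in the "read" path is an assumption about how multiple OBJECT_IDS would be passed.

```python
from urllib.parse import quote


def page_of_keys(keys, page, page_size):
    """Return the slice of keys for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return keys[start:start + page_size]


def read_path(collection, keys):
    """Build the "read" path for one page of keys.

    Assumption: multiple keys are joined with commas; each key is
    percent-encoded so it is safe inside a URL path segment.
    """
    encoded = ",".join(quote(k, safe="") for k in keys)
    return "/" + quote(collection, safe="") + "/read/" + encoded
```

A page past the end of the key list simply yields an empty list, which a UI could treat as "no more results".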
A search interface could be created as a microservice in the manner of Stevens' Lunr demo for Caltech People. If And/Or and the search microservice are behind the same web server you could present both services using a common URL namespace (Apache or NGINX are good candidates for a front facing web server integrating And/Or and your search system).
An authenticated user exposes their user id to
And/Or's web service. The web service can then
retrieve the available roles that scope the permissions
the user has to operate on objects in a given set of states.
The role can also be used to define which objects we show
the user. This can be implemented with a small number
of functions such as getUsername(), getUserRoles(),
isAllowed() and canAssign().
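As an illustration only, the four functions named above might look something like the sketch below (written with Python naming, so getUsername() becomes get_username() and so on). The USERS and ROLES structures, the role names, and the reading of canAssign() as "may this user move an object into a given state" are all assumptions made for the example. Password verification is deliberately left out.

```python
import base64

# Hypothetical configuration, as might be loaded from a file at startup.
USERS = {"jane": ["editor"], "sam": ["depositor"]}
ROLES = {
    "depositor": {
        # object state -> operations permitted on objects in that state
        "permissions": {"draft": {"create", "read", "update"}},
        # states this role may move an object into
        "assign_to": {"review"},
    },
    "editor": {
        "permissions": {"review": {"read", "update"}, "published": {"read"}},
        "assign_to": {"draft", "published", "deleted"},
    },
}


def get_username(headers):
    """Extract the user id from an HTTP Basic Authorization header.

    Returns None if the header is missing or malformed.
    """
    scheme, _, encoded = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers bad base64 and bad UTF-8
        return None
    username, sep, _password = decoded.partition(":")
    return username if sep and username else None


def get_user_roles(username):
    """Return the list of role names configured for a user."""
    return USERS.get(username, [])


def is_allowed(username, operation, state):
    """True if any of the user's roles permits operation in this state."""
    return any(
        operation in ROLES[role]["permissions"].get(state, set())
        for role in get_user_roles(username)
        if role in ROLES
    )


def can_assign(username, new_state):
    """True if any of the user's roles may move an object to new_state."""
    return any(
        new_state in ROLES[role]["assign_to"]
        for role in get_user_roles(username)
        if role in ROLES
    )
```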
Once authorization is calculated, approved actions can be handled with simple HTTP handlers, each performing a single task that maps to an existing dataset function (e.g. keys, create, read, update, delete).
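A rough sketch of such a handler, using a plain dictionary as a stand-in for a dataset collection. The function name, the allowed flag (standing in for the result of the authorization step) and the choice of HTTP status codes are all assumptions for illustration, not part of the And/Or design.

```python
OPERATIONS = {"keys", "create", "read", "update", "delete"}


def handle(collection, operation, key=None, payload=None, allowed=False):
    """Return (status_code, body) for one already-parsed request.

    `collection` is a dict standing in for a dataset collection and
    `allowed` is the outcome of the authorization check.
    """
    if operation not in OPERATIONS:
        return 400, {"error": "unsupported operation"}
    if not allowed:
        return 403, {"error": "forbidden"}
    if operation == "keys":
        return 200, sorted(collection)
    if operation == "create":
        # An object id must be unique for create to succeed.
        if key in collection:
            return 409, {"error": "object already exists"}
        collection[key] = dict(payload or {})
        return 201, collection[key]
    if key not in collection:
        return 404, {"error": "not found"}
    if operation == "read":
        return 200, collection[key]
    if operation == "update":
        collection[key] = dict(payload or {})
        return 200, collection[key]
    # Only "delete" remains.
    del collection[key]
    return 200, {"deleted": key}
```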
While the And/Or service can delete objects, it's more prudent to take the EPrints approach and define "delete" as a specific object state. This way you could treat deleted objects as being in a trashcan and leave actual deletion for a garbage collection routine. This approach would make deletion work like a Mac's trashcan; fully deleting objects would be accomplished by a separate process that empties the trash[5] by finding objects where ._State == "deleted" and then removing the content from disk.
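The trashcan idea can be sketched in a few lines, again with a dictionary standing in for a dataset collection. The function names are hypothetical; only the `_State` field and the "deleted" value come from the description above.

```python
def mark_deleted(collection, key):
    """Soft delete: move an object to the trash by changing its state."""
    collection[key]["_State"] = "deleted"


def empty_trash(collection):
    """Garbage collection: remove every object whose _State is "deleted".

    Returns the keys that were removed, in sorted order.
    """
    trashed = sorted(
        key for key, obj in collection.items()
        if obj.get("_State") == "deleted"
    )
    for key in trashed:
        del collection[key]
    return trashed
```

Because the web service would only ever call something like mark_deleted(), restoring an object before the trash is emptied is just another state change.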