Release 2.3.4
- Deprecated MySQL support in the documentation; MySQL support itself
will be deprecated in a future release
- Improved loading large JSON objects from jsonl files
- Fixed issue #164 where the queries in COLD would work in v2.2.0 but
fail in v2.3.x.
- Removed support for SQL parameters in dsquery due to encoding issues
and lack of practical use cases
- Removed duplicated code from dsquery.go and api_routes.go in favor
of collection.go’s implementation of query functionality.
- Added trailing semicolon removal for SQL queries due to changes in
the behavior of the SQLite3 driver
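The semicolon change can be pictured with a small sketch. The helper name `trimTrailingSemicolons` is hypothetical, not dataset's actual function. It only illustrates stripping surrounding whitespace and any trailing semicolons from a query before it is handed to the SQLite3 driver.

```go
package main

import (
	"fmt"
	"strings"
)

// trimTrailingSemicolons is a hypothetical helper: it drops surrounding
// whitespace and any semicolons at the end of a SQL statement. Semicolons
// elsewhere (e.g. inside string literals) are left untouched.
func trimTrailingSemicolons(query string) string {
	q := strings.TrimSpace(query)
	// Remove one or more trailing semicolons, plus whitespace they exposed.
	for strings.HasSuffix(q, ";") {
		q = strings.TrimSpace(strings.TrimSuffix(q, ";"))
	}
	return q
}

func main() {
	fmt.Println(trimTrailingSemicolons("SELECT src FROM people;  "))
}
```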
Release 2.3.3
- Fixed issue #161, handling a GET with a query where data is passed via
URL parameters.
- Removed support for frame, clone, sample, sync and join.
- Removed the dsimporter cli (use jsonl dump and load instead).
- Improved loading large JSON objects from jsonl files
Release 2.3.2
Frame, clone, sample, sync and join support removed. The dsimporter
cli removed (use jsonl dump and load instead). Fixed a bug in using the
JSON API with GET where command line parameters were not getting
processed. Also added support for standard encoding of web forms in
queries.
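Standard web form encoding (application/x-www-form-urlencoded) is the same encoding used for GET query strings, so both can be decoded with Go's `net/url` package. This sketch assumes nothing about dataset's actual parameter names; `formToParams` and the example field names are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// formToParams is a hypothetical helper that decodes a form-encoded string
// (a POST body or a GET query string) into a name -> first value map.
func formToParams(encoded string) (map[string]string, error) {
	values, err := url.ParseQuery(encoded)
	if err != nil {
		return nil, err
	}
	params := make(map[string]string, len(values))
	for name := range values {
		params[name] = values.Get(name) // Get keeps the first value only
	}
	return params, nil
}

func main() {
	p, err := formToParams("family_name=Doe&given_name=Jane+Q.")
	fmt.Println(p["given_name"], err)
}
```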
Release 2.3.1:
Documentation and tests have been removed for frames, clone, and
sample. Frame support was removed from datasetd.
Release 2.3.0:
This release fixed problems related to how a collection might
incorrectly return an unknown type. The v2.3.x cycle will deprecate
features in preparation for v2.4.
Release 2.0.0-b06:
This release is a rewrite of version 1 focusing on removing features,
clarifying useful concepts and abstracting the storage engines cleanly.
The latter was done to allow the web implementation of dataset to
achieve appropriate performance and to scale to a larger number of
collections and to larger collections.
The dataset collection’s structure has changed.
- a dataset is a directory containing a collection.json and
codemeta.json file
- the collection.json no longer contains general metadata or maps to
the keys and pairtree, it focuses on operational settings (e.g. storage
type and access information)
- when running dataset as a web service or on a shared user machine
you can set up the database connection through the environment, i.e. set
the DATASET_DSN_URI value (a DSN URI is formed from a protocol named for
the SQL driver, a “://” and the DSN for that driver,
e.g. “mysql://DB_USER:DB_PASSWD@/DB_NAME”)
- a codemeta.json file is now used for holding general collection-level
metadata. CodeMeta has been adopted by the data science community for
describing data and software
- additional JSON configuration files may be used to manage the
collection, depending on the storage engine
- Experimental features in dataset v2
    - Using SQL JSON columns to store our JSON documents in a relational
      database
        - SQLite 3
        - MySQL 8
        - Postgres 14.5 (very experimental, still learning Postgres)
    - Object level versioning
    - Attachment versioning
    - dataset package to support asynchronous, multi-process access via
      JSON stored in SQL databases
Golang package changes:
- Minimum Go version is now 1.19
- The dataset v2 package has been substantially reorganized and
simplified, most things have changed
- Collection.Init() now takes two parameters, the collection name and an
optional DSN URI; if a DSN URI is provided it defines the storage
engine (e.g. dataset.SQLSTORE)
- Collection.DocPath() was removed; it no longer makes sense since JSON
may be stored in an SQL table
- Collection.Read() only takes two parameters, not three
- Collection.Keys() returns a list of keys and an error value
- Collection.KeyExists() was renamed Collection.HasKey() to be more
idiomatic in Go
- Collection.FrameExists() was renamed Collection.HasFrame() to be
more idiomatic in Go
- Collection.Length() returns an int64 rather than an int
- Collection.MetadataJSON() was renamed Collection.Metadata(); it returns
the codemeta JSON for the collection
- Collection.UpdateMetadata() has been added; it takes the name of a
codemeta.json file and replaces the existing codemeta content for the
collection with it.
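To make the new method shapes concrete, here is a toy in-memory type whose HasKey, Keys and Length methods follow the return types listed above (bool, ([]string, error) and int64). It is a hypothetical stand-in for illustration only, not the dataset package's Collection, which is backed by pairtree or SQL storage.

```go
package main

import (
	"fmt"
	"sort"
)

// memCollection is a toy, in-memory stand-in used only to illustrate the
// v2 method shapes; it is not the dataset package's Collection.
type memCollection struct {
	docs map[string]map[string]any
}

// HasKey reports whether a key exists (the v2 name for KeyExists).
func (c *memCollection) HasKey(key string) bool {
	_, ok := c.docs[key]
	return ok
}

// Keys returns the keys (sorted here for stable output) and an error value.
func (c *memCollection) Keys() ([]string, error) {
	keys := make([]string, 0, len(c.docs))
	for k := range c.docs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, nil
}

// Length returns the number of documents as an int64.
func (c *memCollection) Length() int64 {
	return int64(len(c.docs))
}

func main() {
	c := &memCollection{docs: map[string]map[string]any{
		"jane": {"name": "Jane"},
		"bob":  {"name": "Bob"},
	}}
	keys, err := c.Keys()
	fmt.Println(keys, err, c.HasKey("jane"), c.Length())
}
```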
libdataset:
- The C shared library implementation has been dropped for now due to
the challenges of easily cross-compiling releases
CLI changes:
- options have been restructured so that most come after the verb
- help has been restructured to better focus the help text on the task
at hand
- the command line version is single user, single process and by
default assumes pairtree storage.
- Experimental features
    - JSON stored in SQL database via JSON column types
        - SQLite3
        - MySQL 8
        - Postgres 14.5
Web Service changes:
- the web service is generally RESTful, so the endpoints no longer map
directly to the client syntax
- it is recommended to use SQL storage (e.g. SQLite3 or MySQL) for
dataset collections exposed through the web service, because a pairtree
collection doesn’t provide document locking
- relies on SQL JSON columns for storage
- access to the SQL storage engine is configured either through the
environment or through a URI expressing the storage type as the protocol
and a data source name for making the connection
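The DSN URI convention (a protocol named for the SQL driver, then “://”, then that driver's DSN) can be sketched in Go. This is a minimal illustration assuming only the format described in these notes; `splitDSNURI` is a hypothetical name, not dataset's own parser. A real program could then hand the two parts to `database/sql`'s `sql.Open(driverName, dataSourceName)` with a matching driver registered.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// splitDSNURI is a hypothetical helper: it splits a DSN URI into the SQL
// driver name (the "protocol") and the driver-specific DSN after "://".
func splitDSNURI(dsnURI string) (string, string, error) {
	driver, dsn, found := strings.Cut(dsnURI, "://")
	if !found || driver == "" {
		return "", "", fmt.Errorf("invalid DSN URI %q", dsnURI)
	}
	return driver, dsn, nil
}

func main() {
	// Prefer the environment; fall back to the example URI from these
	// notes so the sketch runs on its own.
	uri := os.Getenv("DATASET_DSN_URI")
	if uri == "" {
		uri = "mysql://DB_USER:DB_PASSWD@/DB_NAME"
	}
	driver, dsn, err := splitDSNURI(uri)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("driver=%s dsn=%s\n", driver, dsn)
}
```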
Release 1.1.0:
Added attachment support for datasetd.
Updated the metadata fields to include richer PersonOrOrg data
structures for author, contributor, funder as well as added an
annotation map field for custom metadata.
Added “MetadataJSON()” function for Collection to quickly copy out
the metadata values from a collection.
c, err := dataset.Open("MyData.ds")
...
defer c.Close()
fmt.Printf("%s", c.MetadataJSON())
Added “MetadataUpdate()” function to update a collection’s
metadata.
c, err := dataset.Open("MyData.ds")
...
defer c.Close()
meta := new(dataset.Collection)
meta.Description = "A test dataset"
meta.Version = "1.0.0"
meta.Author = []*dataset.PersonOrOrg{
    &dataset.PersonOrOrg{
        Type:       "Person",
        GivenName:  "Jane",
        FamilyName: "Doe",
    },
}
err = c.MetadataUpdate(meta)
...
Deprecated the dependency on the namaste package and Namaste support in
the command line tools. Removed “collections.go” and
“collections_test.go” from the repository (redundant code). Updated
libdataset/libdataset.go to hold the functions from collections.go that
were needed for the C-shared library. The Namaste fields in the
collection’s metadata are now deprecated.
The dataset.Init() now places a lock file in the collection directory
and leaves the collection in an “Open” state, it should be explicitly
closed after Init is called.
E.g.
c, err := dataset.Init("MyData.ds")
...
defer c.Close()
Removed “set_*” for collection metadata fields from libdataset.go.
These should be set using the dataset command line tool only.
The dataset.Analyzer() and dataset.Repair() commands expect the
dataset collections to be closed before being called. E.g.
c, err := dataset.Open("MyData.ds")
...
c.Close()
err = dataset.Analyzer("MyData.ds", true)
if err == nil {
    c, err = dataset.Open("MyData.ds")
    ...
}
Release 1.0.2:
Added support for datasetd, a localhost web service
for dataset collections. The web service supports a subset of the
command line tool.
Both the datasetd and dataset command line programs now create a
“lock.pid” file in the collection root. This prevents multiple
processes from clashing when maintaining the “collections.json” file in
the collection root.
Migrated cli package into dataset repository sub-package
“github.com/caltechlibrary/dataset/cli”. Eventually this package will be
replaced by “datasetCli.go” in the root folder.
In the dataset command line program the verb “detach” has been
renamed “retrieve” to better describe the action. “detach” is deprecated
and will be removed in upcoming releases.
Release 1.0.1:
- Keys are stored lowercase
- Removed filtering and sorting options from dataset and
libdataset
- Use pairtree 1.0.2 configurable separator
- Added check and repair for migrating to case-insensitive keys and
paths
- Updated required packages to latest releases
- Added notes about Windows cmd prompt issues when providing JSON
objects on command line
- Added M1 support for libdataset
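The lowercase key rule and the pairtree layout can be illustrated together. This Go sketch is simplified and hypothetical: `normalizeKey`, `pairtreePath` and `caseCollisions` are illustrative names, the `sep` argument is our assumption about what a configurable separator controls, and pairtree's character-escaping rules are omitted (keys are assumed to be ASCII).

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeKey lowercases a key, matching case-insensitive key storage.
func normalizeKey(key string) string {
	return strings.ToLower(key)
}

// pairtreePath splits a normalized key into two-character segments joined
// by sep (simplified: no pairtree character escaping, ASCII keys only).
func pairtreePath(key string, sep string) string {
	k := normalizeKey(key)
	var parts []string
	for len(k) > 2 {
		parts = append(parts, k[:2])
		k = k[2:]
	}
	parts = append(parts, k)
	return strings.Join(parts, sep) + sep
}

// caseCollisions groups keys that would clash once lowercased, the kind of
// problem a migration check to case-insensitive keys needs to flag.
func caseCollisions(keys []string) map[string][]string {
	groups := map[string][]string{}
	for _, k := range keys {
		n := normalizeKey(k)
		groups[n] = append(groups[n], k)
	}
	for n, ks := range groups {
		if len(ks) < 2 {
			delete(groups, n) // deleting during range is safe in Go
		}
	}
	return groups
}

func main() {
	fmt.Println(pairtreePath("ABCDE", "/"))                       // ab/cd/e/
	fmt.Println(caseCollisions([]string{"Jane", "jane", "Bob"})) // map[jane:[Jane jane]]
}
```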
Release 1.0.0:
Release 0.1.11:
- Requires go1.16 compilation
- Add macOS M1 compiled binaries
Release 0.1.10:
- Improved memory handling for large attachments
Release 0.1.8:
This release focuses on minor bug fixes in libdataset.
- Removed duplicate functions: delete_frame() has been superseded by
frame_delete()
- Renamed functions: make_objects() has been renamed create_objects()
to be more consistent with the naming scheme.
- Build Notes:
- Golang v1.14
- Caltech library go packages
- storage v0.1.0
- namaste v0.0.5
- pairtree v0.0.4
- OS used to compile and test
- macOS Catalina
- Windows 10
- Ubuntu 18.04 LTS
- Python 3.8 (from Miniconda 3)
- zip has replaced tar in the releases of libdataset
- Some tests fail on Windows 10 for libdataset. These will be
addressed in future releases.
Release 0.1.6:
This release focuses on minor bug fixes in libdataset. All functions
that previously returned only an error string now return True on
success and False otherwise. The error message can be retrieved with
dataset.error_message().
- Build Notes:
- Golang v1.14
- Caltech library go packages
- storage v0.1.0
- namaste v0.0.5
- pairtree v0.0.4
- OS used to compile and test
- macOS Catalina
- Windows 10
- Ubuntu 18.04 LTS
- Python 3.8 (from Miniconda 3)
- zip has replaced tar in the releases of libdataset
- Some tests fail on Windows 10 for libdataset. These will be
addressed in future releases.
Release 0.1.5:
This release focuses on refining function names, simplification and
ease of testing for Windows 10 deployments.
- Build Notes:
- Golang v1.14
- Caltech library go packages
- storage v0.1.0
- namaste v0.0.5
- pairtree v0.0.4
- OS used to compile and test
- macOS Catalina
- Windows 10
- Ubuntu 18.04 LTS
- Python 3.8 (from Miniconda 3)
- zip has replaced tar in the releases of libdataset
- Renamed functions:
- collection_status() is now collection_exists()
- Deprecated functions and features:
- S3, Google Cloud Storage support dropped.
- grid(), if you need this create a frame first and use
frame_grid().
- Some tests fail on Windows 10 for libdataset. These will be
addressed in future releases.
Release 0.1.4:
This release has breaking changes with release v0.1.3 and earlier. Many
functions in libdataset have been renamed to prevent collisions in
environments using the libdataset C-shared library. Most function names
now have two parts separated by an underscore (e.g. status has become
collection_status, repair has become collection_repair).
Google Sheet integration has been dropped. It was more trouble
than it was worth to maintain.
Tests from py_dataset have now been ported to the test library for
libdataset.
Redundant functions have been removed (we had accumulated multiple
definitions for the same thing in libdataset). Where possible code has
been simplified.
Most libdataset functions will cause an “open” on a dataset
collection automatically. Some additional functions around collections
have been added primarily to make testing easier
(e.g. open_collection(), is_open(), close_collection(),
close_all()).
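Automatic opening can be modeled as a registry of open collections that is consulted on every call. This is a hypothetical sketch: the `registry` type and its methods loosely mirror open_collection(), is_open(), close_collection() and close_all() but are not libdataset's implementation, and a real close would also flush state and release locks.

```go
package main

import "fmt"

// handle stands in for an open collection (hypothetical).
type handle struct {
	name string
}

// registry tracks open collections so callers need not open them explicitly.
type registry struct {
	open map[string]*handle
}

func newRegistry() *registry {
	return &registry{open: make(map[string]*handle)}
}

// get returns the open handle for name, opening it on first use.
func (r *registry) get(name string) *handle {
	if h, ok := r.open[name]; ok {
		return h
	}
	h := &handle{name: name} // a real open would read collection settings here
	r.open[name] = h
	return h
}

// isOpen reports whether name currently has an open handle.
func (r *registry) isOpen(name string) bool {
	_, ok := r.open[name]
	return ok
}

// close forgets one handle.
func (r *registry) close(name string) {
	delete(r.open, name)
}

// closeAll forgets every handle.
func (r *registry) closeAll() {
	for name := range r.open {
		r.close(name)
	}
}

func main() {
	r := newRegistry()
	r.get("MyData.ds")                 // e.g. a keys() call opens it implicitly
	fmt.Println(r.isOpen("MyData.ds")) // true
	r.closeAll()
	fmt.Println(r.isOpen("MyData.ds")) // false
}
```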
Functions that were overloaded via optional parameters have been
simplified. E.g. keys() now returns all keys in collection, use
key_filter() and key_sort() accordingly.
- Dropped support for GSheet integration
- Only support pairtree layout of collection
- cleaned up libdataset API focusing on removing overloaded
functions
Release 0.1.3:
Release 0.1.2:
- Persisting _Attachments metadata when updating with clean objects
using the same technique as _Key
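One way to persist reserved metadata across updates is to copy it from the stored object into the incoming one. In this sketch the field names `_Key` and `_Attachments` come from the note above, while `preserveReserved` and the rule that stored values win over incoming ones are our assumptions.

```go
package main

import "fmt"

// reservedFields lists metadata the collection maintains itself.
var reservedFields = []string{"_Key", "_Attachments"}

// preserveReserved returns a copy of update with the reserved fields taken
// from stored, so a "clean" update object cannot drop them. Stored values
// win over any reserved values present in update (an assumption).
func preserveReserved(stored, update map[string]any) map[string]any {
	merged := make(map[string]any, len(update)+len(reservedFields))
	for k, v := range update {
		merged[k] = v
	}
	for _, f := range reservedFields {
		if v, ok := stored[f]; ok {
			merged[f] = v
		}
	}
	return merged
}

func main() {
	stored := map[string]any{"_Key": "jane", "_Attachments": []any{"cv.pdf"}, "name": "Jane"}
	update := map[string]any{"name": "Jane Doe"}
	fmt.Println(preserveReserved(stored, update))
}
```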
Release 0.1.1:
- Fixed a problem where keys_exist was called before an open command.
Release 0.1.0:
- Updated libdataset API, simplified func names and normalized many of
the calls (breaking change)
- libdataset now manages opening dataset collections, inspired by
Oberon System file riders (breaking change)
- Added Python test code for libdataset to make sure libdataset
works
- Added support for check and repair when working on S3 deployed
collections
- Refactored and simplified frame behavior (breaking change)