py_dataset is the Python version of dataset. dataset is a way of storing and organizing JSON documents on disc. To use py_dataset we usually import it using Python's "from" syntax.
from py_dataset import dataset
This provides a dataset object to work with dataset collections. This is an example of creating a collection called friends.ds, saving a record under the key "littlefreda" (stored on disc as "littlefreda.json"), and reading it back.
import sys
import json
from py_dataset import dataset

# Creating our friends.ds dataset collection for the first time.
c_name = 'friends.ds'
if not dataset.init(c_name):
    print(dataset.error_message())
    sys.exit(1)

# Now let's add something to our collection.
key = 'littlefreda'
record = {"name": "Freda", "email": "little.freda@inverness.example.org"}
if not dataset.create(c_name, key, record):
    print(dataset.error_message())
    sys.exit(1)

# We should have at least one record in our collection.
# This is the idiom for iterating and working with our collection
# objects.
keys = dataset.keys(c_name)
for key in keys:
    p = dataset.path(c_name, key)
    print(p)
    # NOTE: the "read" method returns a tuple!
    record, err = dataset.read(c_name, key)
    if err != '':
        print(f"read error, {err}")
        sys.exit(1)
    print(f"Doc: {record}")
The calls dataset.init(c_name), dataset.keys(c_name), dataset.read(c_name, key), and dataset.create(c_name, key, record) are the main actors here. Most dataset methods take the collection name as their first parameter. Likewise many return some sort of value. If it is a boolean value, then True means success and False means failure. If the method returns data, then it is often returned as a tuple, as with read().
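The (value, error) tuple convention can be sketched in pure Python. The read_like() function below is a hypothetical stand-in, not part of py_dataset; it only illustrates the unpack-then-check idiom used with read().

```python
# Hypothetical stand-in for a py_dataset-style tuple-returning read.
def read_like(store, key):
    # On failure return a placeholder value plus an error string.
    if key not in store:
        return None, f"key not found: {key}"
    # On success the error string is empty.
    return store[key], ''

store = {"littlefreda": {"name": "Freda"}}
record, err = read_like(store, "littlefreda")
if err != '':
    print(f"read error, {err}")
print(f"Doc: {record}")
```

Checking `err != ''` before using the record mirrors how the earlier example guards the result of dataset.read().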
If an error has occurred (e.g. permissions on disc raising a problem)
you can retrieve the dataset error message by using the error_message()
function. If you’re done with the error you can use error_clear()
to reset the error message queue.
Now check to see if the key, littlefreda, is in the collection:
dataset.has_key(c_name, 'littlefreda')
You can also read your JSON formatted data from a file, but you need to convert it to a Python dict first. In these examples we are creating records for Mojo Sam and Capt. Jack, then reading back all the keys and displaying their paths and the JSON documents created.
with open("mojosam.json") as f:
    src = f.read()
    dataset.create(c_name, "mojosam", json.loads(src))

with open("capt-jack.json") as f:
    src = f.read()
    dataset.create(c_name, "capt-jack", json.loads(src))

for key in dataset.keys(c_name):
    print(f"Path: {dataset.path(c_name, key)}")
    print(f"Doc: {dataset.read(c_name, key)}")
    print("")
It is also possible to filter and sort keys from Python using the key_filter() and key_sort() methods. First we'll display a list of keys filtered by email ending in "example.org", then sorted by email.
print("Filtered and sorting in action")
# Get all keys
keys = dataset.keys(c_name)
# Filter our keys
keys = dataset.key_filter(c_name, keys, '(has_suffix .email "example.org")')
for key in keys:
    print(f"Path: {dataset.path(c_name, key)}")
    print(f"Doc: {dataset.read(c_name, key)}")
    print("")
print(f"Filtered {keys}")
# Sort our filtered list of keys
keys = dataset.key_sort(c_name, keys, '.email')
print(f"Sorted {keys}")
for key in keys:
    print(f"Path: {dataset.path(c_name, key)}")
    print(f"Doc: {dataset.read(c_name, key)}")
    print("")
Filtering and sorting a large collection can take time due to the number of disc reads. It can also use a lot of memory. It is more efficient to first filter your keys, then sort the filtered keys.
print("Filtered, sort by stages")
all_keys = dataset.keys(c_name)
keys = dataset.key_filter(c_name, all_keys, '(has_suffix .email "example.org")')
keys = dataset.key_sort(c_name, keys, ".email")
for key in keys:
    print(f"Path: {dataset.path(c_name, key)}")
    print(f"Doc: {dataset.read(c_name, key)}")
    print("")
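Why filter-then-sort wins can be seen in plain Python. The records dict below is illustrative (not py_dataset calls): the cheap suffix test shrinks the key list first, so the sort touches fewer items.

```python
# Illustrative records keyed the way a collection keys its documents.
records = {
    "littlefreda": {"email": "little.freda@inverness.example.org"},
    "mojosam": {"email": "mojo.sam@zbs.example.org"},
    "capt-jack": {"email": "capt.jack@flotsam.example.net"},
}

# Stage 1: filter keys (analogous to has_suffix .email "example.org").
keys = [k for k in records if records[k]["email"].endswith("example.org")]

# Stage 2: sort only the filtered keys by email (analogous to key_sort).
keys.sort(key=lambda k: records[k]["email"])
print(keys)  # -> ['littlefreda', 'mojosam']
```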