Caltech Library logo

Samples

dataset keys support the -sample option. The sample option expects a sample size as an argument. If the sample size is greater than zero then a sample output will be taken.

If the sample size is greater then the results returned then the who results set is return without random sampling. If sample size is less than result set then a random sampling of the results is taken.

If you are doing Machine Leaning type of sampling (e.g. calculating a test and training set) then normally you create a test key list like this

    dataset -sample="$N" keys

where

    $N

holds the test sample size. After keylist is generated you can then create a training set by excluding the keys associated with the sample.