Given a CSV file called data.csv, you can select a random sample of rows with the csvrows command and a few options. In this example we assume data.csv has a header row that we want to preserve and that the resulting sample will be written to sample.csv. The options we use are
-i
  selects data.csv as the input source
-o
  sends the resulting CSV to the file named sample.csv
-header=true
  indicates the header should be preserved and not be counted as part of the sample
-random
  sets the number of rows to return in the sample, in this case twenty

Putting it all together:
csvrows -i data.csv -o sample.csv -header=true -random=20
NOTE: If data.csv has fewer than 20 rows then sample.csv will include all the rows of data.csv in a shuffled order.
csvrows reads the entire CSV file into memory, shuffles the rows using Go's rand package to choose which rows to swap, and then writes out the requested number of rows in the shuffled order. The randomness of the sample is limited by the shuffle and by writing out only the first N shuffled rows.