
How to select random rows from a CSV file

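Given a CSV file called data.csv, you can select a random sample of rows with the csvrows command and a few options. In this example we assume data.csv has a header row that we want to preserve and that the resulting sample will be written to sample.csv. The options we use are -i (the input file), -o (the output file), -header=true (keep the header row) and -random=20 (the number of rows to pick at random).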

Putting it all together:

    csvrows -i data.csv -o sample.csv -header=true -random=20

NOTE: If data.csv has fewer than 20 rows then sample.csv will include all the rows of data.csv in shuffled order.

How -random works

csvrows reads the entire CSV file into memory, shuffles the rows using Go’s rand package to choose which rows to swap, and then writes out the requested number of rows in shuffled order. The randomness of the sample is limited to that of the shuffle, since the output is simply the first N shuffled rows.