Chapter 2 Documentation

Documentation has sometimes been called “a love letter to your future self” as it helps you remember important details about your research data. The great thing about research documentation is that it’s not limited to a laboratory or research notebook, though notebooks are still very important! This chapter introduces two types of useful documentation – a project-level README.txt and a data dictionary – and offers worksheets for writing both. The chapter also includes a worksheet to evaluate an older entry in your laboratory notebook to ensure your documentation is of sufficient quality.

2.1 Evaluate a Laboratory Notebook

Description: The laboratory or research notebook is a fundamental documentation method for many researchers. But for how ubiquitous the lab notebook is, documentation can sometimes be lacking. The ideal laboratory notebook allows someone with similar training as you to be able to follow everything you did in your research. This exercise prompts you to review an old entry within your laboratory notebook to evaluate if your documentation is sufficient for reproducing your work.

Instructions: You will need a laboratory notebook entry from 6-12 months ago to do this exercise. Once you have the entry, read through it to try to understand what you did on that day. Answer the exercise questions to evaluate the entry and identify any note keeping improvements to make.


Date of lab notebook entry being evaluated: ____________________

Read the entry and summarize the work you did on that date:

 

 

 

How easy was it to understand the work you did from your notes?

 

 

 

Could you reproduce your work based on the information in your notes? If not, what extra information do you need?

 

 

 

What worked well with your note keeping?

 

 

 

What should you improve about your note keeping?

 

 

 

List one change you will implement to take better research notes:

 

 

 

2.2 Write a Project-Level README.txt

Description: Data files living on a computer often need extra documentation for someone to understand what research they correspond to. In particular, it is useful to record the most basic project information and store it in the top-level folder of each research project. This can be done with a README.txt. The name, “README,” indicates that the file conveys important information and the file type, TXT, can be opened by many different software programs, making the content maximally accessible. This exercise walks you through the key information needed in a project-level README.txt file. The same information can also be recorded at the front of a physical laboratory notebook.

Instructions: Pick a research project and answer the following questions. Copy all of the text into a TXT file and save it with the name “README.txt.” Store this file in the top-level of the project folder on your computer, alongside the project files.

Source: This exercise was adapted from the “Project Close Out Checklist” (K. A. Briney, 2020b).


Project documented by this README.txt: ____________________

Write a brief project description:

Example: The Data Doubles project was a 4-year, IMLS-funded research project examining student perceptions of privacy in library learning analytics.

 

 

 

What is the time period the project was done over?

Example: The project was conducted between summer 2018 and summer 2022.

 

 

 

Who worked on the project?

Example: Eight researchers from eight different instutions worked on the project, including: KMLJ, AA, KB, AG, MP, MR, DS, and MS.

 

 

 

Where are the data, code, and other files are stored?

Example: Research files are stored on Google Drive, with the exception that participant data is stored in IU-hosted Box. Survey data is also in Qualtrics. Code is on GitHub. The shared literature library is in Zotero.

 

 

 

How and where is the project documented?

Example: Documentation for the project is in Google Drive. Notes on team decisions from meetings are in the DataDoubles/Meetings folder. Notes on data are in the DataDoubles/Research folder.

 

 

 

How are files organized? Are any naming conventions used and, if so, what are they (see Chapter 3)?

Example: All data is in the DataDoubles/Data folder, with subfolders labelled by interview theme code. Each site has its own folder within the project folder for individual site files. Interview data files are named with: interview theme, site, interview ID, interview date, and data type/analysis stage (e.g. “PRO_BL03_20180222_Audio.mp3” and “AWA_MK01_20180222_Notes.pdf”). Please see the living data management plan for complete set of codes and more details.

 

 

 

What else does someone need to know to understand these files?

Example: Additional documentation on the project and public research files are available on OSF.

 

 

 

2.3 Create a Data Dictionary

Description: Ideally, a spreadsheet is formatted with a row of variable names at the top, followed by rows of data going down. This makes easy for data to be used in any data analysis software (interoperability is a good thing) but makes it impossible to document a spreadsheet within the file itself. For this reason, it’s useful to create a data dictionary to describe the spreadsheet so that others can interpret the data. This exercise walks you through the major information you should record for each variable in the spreadsheet, adding up to a complete dictionary to accompany the spreadsheet file.

Instructions: Pick one spreadsheet variable and record its information in the corresponding rows of the table. Repeat this process for the remaining variables in the spreadsheet. Copy all information into a text document and save it next to the spreadsheet. It is useful to save the data dictionary with the same root name as its data file by appending “_dictionary” on the end of the file name; for example, the data dictionary for the file “myData.xlsx” would be “myData_dictionary.txt.”

Source: This exercise was adapted from “Leveling Up Data Management” (K. A. Briney, 2023).


Question Example
Variable name site
Variable description Two-letter abbreviation describing the name of the overall site where the sample was collected.
Variable units N/A
Relationship to other variables Partner to variable “sampleNum,” which together define the sample ID (site name + sample number at that site). Related to variables “latitude” and “longitude,” which record exact coordinate location and are more specific than the larger site code.
Variable coding values and meanings Coding values and meanings: BL = Badlands NP; DV = Death Valley NP; GT = Grand Teton NP; JT = Joshua Tree NP; ZN = Zion NP
Known issues with the data Some Badlands samples were collected outsideof the park boundaries; see latitude and longitude variables for specific locations.
Anything else to know about the data? Older data (pre-2013) used one-letter abbreviations for site code but this was updated for clarity and ease of identification.

 

Question Variable
Variable name                                                                                                          
Variable description                                                                                                  
Variable units                                                                                                          
Relationship to other variables                                              
Variable coding values and meanings                                              
Known issues with the data                                                          
Anything else to know about the data?                                              

References

Briney, K. A. (2020b). Project Close-Out Checklist for Research Data. https://doi.org/10.7907/yjph-sa32
Briney, K. A. (2023). Leveling Up Data Management. https://doi.org/10.7907/syk7-3z92