Chapter 5 Data Management

While this entire workbook covers data management activities, it’s often useful to take a step back and document the data management decisions that have been made. This chapter provides two documentation exercises: a worksheet for writing a living data management plan (which builds on exercises from previous chapters) and a worksheet for discussing data management roles and responsibilities with your research collaborators.

5.1 Write a Living Data Management Plan (DMP)

Description: You may be familiar with the two-page data management plan (DMP) written for a grant application, but you may not know the more useful type of DMP: a living DMP. This document describes how data will be actively managed during a project and may be updated whenever necessary to reflect current data practices. A living DMP is a useful touchstone for understanding where data lives, how it’s labeled, how it moves through the research process, and who will oversee the data management. This exercise guides you through the process of creating a living DMP for your research.

Instructions: Pick a project and answer the following questions to build your living DMP. This DMP may be changed at any time to improve practices. If you are doing collaborative research, work through this exercise with your collaborators to agree on shared conventions.

Write a short summary of the project this DMP is for:

Example: This project uses mass spectrometry to identify isotopic composition of soil samples.

Where will data be stored? How will data be backed up? (See Exercise 4.1: Pick Storage and Backup Systems.)

Example: The data is generated on the mass spectrometer and then copied to a shared lab server. The server is backed up by departmental IT.

How will you document your research? Where will your research notes be stored?

Example: Data collection and analysis are primarily documented in a laboratory notebook, organized by date. README.txt files add documentation to the digital files as needed.

How will your data be organized? (See Exercise 3.1: Set Up a File Organization System.)

Example: Each researcher has their own folder on the shared server. Data within my folder is organized in folders by sample site with subfolders labeled by sample ID. Sample ID consists of: two-letter sample site code, three-digit sample number, and date of sample collection formatted as YYYYMMDD (e.g. “MA006-20230901” and “CB012-20100512”).
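A convention like this can be checked automatically before files are saved. The following is a minimal Python sketch, assuming the sample ID format from the example above (two uppercase letters, three digits, a hyphen, and a YYYYMMDD date). The function name `parse_sample_id` and the returned field names are hypothetical, chosen only for illustration.

```python
import re
from datetime import datetime

# Pattern assumed from the example convention:
# two-letter site code, three-digit sample number, hyphen, YYYYMMDD date.
SAMPLE_ID_PATTERN = re.compile(r"([A-Z]{2})(\d{3})-(\d{8})")


def parse_sample_id(sample_id):
    """Split a sample ID into site code, sample number, and collection date.

    Raises ValueError if the ID does not follow the convention.
    """
    match = SAMPLE_ID_PATTERN.fullmatch(sample_id)
    if match is None:
        raise ValueError(f"Sample ID does not match convention: {sample_id!r}")
    site, number, date_text = match.groups()
    # strptime raises ValueError for impossible dates such as 20231340.
    collected = datetime.strptime(date_text, "%Y%m%d").date()
    return {"site": site, "number": int(number), "collected": collected}


if __name__ == "__main__":
    print(parse_sample_id("MA006-20230901"))
```

If your own sample IDs use a different structure, adjust the pattern to match the convention recorded in your DMP.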

What naming convention(s) will you use for your data? (See Exercise 3.2: Create a File Naming Convention.)

Example: Files will be named with the sample ID, type of measurement, and stage in the analysis process; these pieces of information will be separated by underscores. Examples: “MA006-20230901_TIMS_raw” and “CB012-20100512_SIMS_analyzed”.
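A naming convention is easier to follow consistently when file names are generated and read by a small helper rather than typed by hand. The following Python sketch assumes the three-part, underscore-separated convention from the example above. The function names `build_file_name` and `parse_file_name` are hypothetical, and the optional file extension is an assumption, since the example names do not show one.

```python
from pathlib import Path


def build_file_name(sample_id, measurement, stage, extension=""):
    """Join sample ID, measurement type, and analysis stage with underscores.

    The extension (e.g. ".csv") is an assumption; the example convention
    does not specify one.
    """
    parts = [sample_id, measurement, stage]
    for part in parts:
        # An underscore inside a part would make the name ambiguous to split.
        if not part or "_" in part:
            raise ValueError(f"Invalid name part: {part!r}")
    return "_".join(parts) + extension


def parse_file_name(file_name):
    """Recover the three parts of a file name that follows the convention."""
    stem = Path(file_name).stem  # drops a trailing extension, if any
    parts = stem.split("_")
    if len(parts) != 3:
        raise ValueError(f"File name does not match convention: {file_name!r}")
    sample_id, measurement, stage = parts
    return {"sample_id": sample_id, "measurement": measurement, "stage": stage}


if __name__ == "__main__":
    print(build_file_name("MA006-20230901", "TIMS", "raw"))
    print(parse_file_name("CB012-20100512_SIMS_analyzed"))
```

Rejecting underscores inside each part is a design choice that keeps the underscore reserved as the separator, so every file name can be split back into its pieces without guessing.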

Do you need to do any version control on your files? How will that be done?

Example: Version control will be handled through file naming: analysis information will be appended to the end of each file name to show which version of the file it is.

How will data move through the collection and analysis pipelines?

Example: Once data is collected on the mass spectrometer, I will copy it to the correct folder on the shared server for analysis. Data will stay in its sample ID-labeled folder as it gets analyzed, with different file names to annotate analysis stage. Data that will be published will be copied into separate folders, organized by article.
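The first step of a pipeline like this, copying a new file into its sample ID-labeled folder, can be scripted so that files always land in the right place. The following Python sketch is one possible approach, not a prescribed method. It assumes that file names begin with the sample ID followed by an underscore, and that site folders are named with the two-letter site code; the function name `copy_to_sample_folder` and the `researcher_root` folder are hypothetical.

```python
import shutil
from pathlib import Path


def copy_to_sample_folder(source_file, researcher_root):
    """Copy a data file into <researcher_root>/<site code>/<sample ID>/.

    Assumes the file name starts with the sample ID followed by an
    underscore, e.g. "MA006-20230901_TIMS_raw".
    """
    source_file = Path(source_file)
    sample_id = source_file.name.split("_", 1)[0]
    site_code = sample_id[:2]  # assumption: site folders use the two-letter code
    destination_dir = Path(researcher_root) / site_code / sample_id
    destination_dir.mkdir(parents=True, exist_ok=True)
    destination = destination_dir / source_file.name
    # Refuse to overwrite, so an existing file on the server is never lost.
    if destination.exists():
        raise FileExistsError(f"Already on server: {destination}")
    shutil.copy2(source_file, destination)  # copy2 also keeps file timestamps
    return destination
```

Copying rather than moving leaves the original file on the instrument computer until you have confirmed that the server copy is intact.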

Record any project roles and responsibilities around data management:

Example: It is each researcher’s responsibility to ensure that data moves through the analysis pipeline and is labeled correctly. The lab manager will ensure that the shared server stays organized and will periodically check that backups are working.

Record any other details on how data will be managed:

Example: Copies of this DMP will live in my top-level folder on the lab server so that others can find and use my data as needed.

5.2 Determine Data Stewardship

Description: It is often helpful to be up front about requirements and permissions around research data. This exercise encourages you to discuss these issues with supervisors and peers to make sure that there are no misunderstandings about who has what rights to use, retain, and share data.

Instructions: Determine which research data should be discussed. Bring together the Principal Investigator, the researcher collecting the data, and anyone else who works with that data. As a group, answer the questions in the exercise, making sure that everyone agrees on the final decisions. Record the results of the discussion and save them with the project files.

Source: This exercise was adapted from the “Project Close-Out Checklist for Research Data” (Briney, 2020b).

Who is participating in this discussion?

Example: This discussion includes the graduate student who collected the data, the project Principal Investigator (PI), and the laboratory manager.

What data is being discussed?

Example: This discussion covers all of the data collected by the graduate student during their time at the university.

Are there security or privacy restrictions on the data and, if so, what are they?

Example: Some of the research data includes human subjects data. This data must be held securely with limited data sharing, as outlined in the IRB protocols.

Are there intellectual property limitations on the data and, if so, what are they?

Example: There are no intellectual property concerns for the data.

Are there any requirements to publicly share the data and, if so, what are they?

Example: This research was funded by the NIH, which requires data sharing. The laboratory plans to share all data needed to reproduce published results, with the exception of the human subjects data.

Who will store the copy of record of the data and for how long?

Example: The project PI will retain the copy of record of the data for at least 3 years after the end of the grant award, with an ideal retention period of 10 years.

Who is allowed to keep a copy of the data after the project ends? Which data?

Example: The graduate student may keep a copy of all data except the human subjects data after they leave the university.

Who is allowed to reuse the data after the project ends? Which data? Are there any requirements for reuse, such as co-authorship?

Example: The graduate student may reuse and publish with the data collected during their time at the university but must offer co-authorship of any papers using the data to the project PI and any relevant lab members.

Who keeps any physical research notebooks after the project ends?

Example: The PI will keep all physical laboratory notebooks but the graduate student may make copies to retain for their personal records.

References

Briney, K. A. (2020b). Project Close-Out Checklist for Research Data. https://doi.org/10.7907/yjph-sa32