Chapter 6 Data Sharing

Sharing data that underlies research has become a common expectation within scholarly research. However, the landscape of data repositories is still uneven and many researchers are still learning best practices for data sharing. To help in this area, this chapter offers of two exercises: a decision tree-inspired worksheet for picking the best data repository for your data; and checklist for working through the process of sharing data in a data repository.

6.1 Pick a Data Repository

Description: It can be difficult to know where to share research data as so many sharing platforms are available. Current guidance is to deposit data in data repository that will give you a DOI or similar permanent identifier. This exercise guides you through the process of picking a data repository, starting with repositories for very specific types of data and defaulting to generalist data repositories. Note that some repositories charge fees for deposit, most often for large data (500 GB or larger).

Instructions: Identify the data that needs to be shared and work through repository selection from discipline-specific data repositories to more general data repositories. Once you have identified a repository for all of your data, deposit the data and record the corresponding permanent identifiers. Note that, depending on data types, you may need to deposit your data into multiple repositories (for example, a discipline-specific repository for one type of data and an institutional data repository for the rest of the data).

Workflow diagram upon which the exercise is based. Starting at the top, decide is there is a known disciplinary data repository (if so, deposit data), a logical disciplinary data repository (if so, deposit data), an institutional data repository (if so, deposit data), and if none of those work pick a generalist data repository.


1. Identify all of the data that needs to be shared.

Example: My data to be shared includes: 1) genetic data for Drosophila; and 2) microscope images of flies.

 

 

 

2. Is there a known disciplinary data repository for some or all of the data? For example, is there a data repository used by everyone in your research area or required for your data type by your funding agency?

If so, deposit some or all of your data there. Go to step 7 if the repository will accept all of your data or go to the next question if there is still some data left to deposit.

Example: The database FlyBase is used for Drosophila genes and genomes. My genetic data will be shared there.

 

 

 

3. Review the list of recommended data repositories from PLOS (PLOS ONE, 2023). Is there a logical disciplinary data repository for some or all of your data?

If so, deposit some or all of your data there. Go to step 7 if you have shared all of your data or go to the next question if there is still some data left to deposit.

Example: There isn’t a logical disciplinary data repository for microscope images of flies.

 

 

 

4. Does your institution have a data repository?

If so, deposit the remainder of your data there and go to step 7.

Example: The California Institute of Technology hosts the data repository CaltechDATA. I will deposit my microscope images in CaltechDATA.

 

 

 

5. Do you have a preferred generalist data repository (NIH, 2023)?

If so, deposit the remainder of your data there and go to step 7.

Example: [All data has been shared already.]

 

 

 

6. Pick a generalist data repository (NIH, 2023) and deposit the remainder of your data.

Deposit your data and go to step 7.

Example: [All data has been shared already.]

 

 

 

7. Record the permanent identifier, ideally a DOI, from each data deposit.

DOIs make data FAIR (Wilkinson et al., 2016) and aid with data sharing compliance. If you did not receive a permanent identifier (such as a DOI, permanent URL, etc.) during deposit, return to step 2 and pick a different data repository for your data.

Example: CaltechDATA provides DOIs for all deposits; my permanent identifier is doi.org/10.22002/XXXXXXXXXXX. FlyBase provides stable links to data reports using FlyBase ID numbers; my permanent identifier is flybase.org/reports/FBXXXXXXXXX.

 

 

 

6.2 Share Data

Description: Data sharing is becoming common and expected by funding agencies and journals. While the process of depositing data into a data repository will vary between repositories, there are some common actions that should be taken to prepare data for sharing. This exercise walks you through these standard requirements for sharing data.

Instructions: This checklist enumerates the necessary steps and decisions to deposit data supporting a research article into a data repository. Identify the data to be shared and work through the list. Note that, if data will be shared as multiple deposits or in multiple repositories, the checklist should be worked separately for each group of data.


Data Selection

__ Select the data files that reproduce published results.

__ Perform quality control on the data files.

__ Convert data in proprietary file types to open file types, as appropriate (see Exercise 7.2: Convert Data File Types).

__ Determine if data should be shared under one citation or as several citations. (Group data as makes most sense for citation and reuse. Options can include: sharing as one large group, grouping files by data type, giving large data files their own citations, etc. Each citation represents a unique deposit into a data repository.)

__ If there will be multiple deposits in one repository or data will be divided across more than one data repository, work through the remainder of the checklist separately for each citation/group of files.

Data Documentation

__ Document any spreadsheet data with a data dictionary (see Exercise 2.3: Create a Data Dictionary). The data dictionary should be shared with the other files.

__ Write a brief description of each data file, including any data dictionaries, and what it contains. Save this information as a README.txt file and share it with the other files.

Sharing Information (Metadata)

__ Give the dataset a title. Default title is “Data from: [name of the article].”

__ Write a brief description of the dataset to be used as the abstract/description.

__ Write down keywords/subject terms for the data.

__ Determine who will be listed as authors of the data and in what order; this may be different than the authors of the article.

__ Identify author ORCID numbers for submission (note: this is best practice but not all data repositories have integrated ORCID yet).

__ Record all funding information that applies to the dataset.

__ Chose a license for reuse rights. Default license is CC0 (for more information on CC0, see (Creative Commons Wiki, 2014)).

Deposit Data

__ Pick a data repository/data repositories for the shared data (see Exercise 6.1: Pick a Data Repository).

__ Deposit the data and documentation files into the data repository, and fill in metadata as determined above.

__ If you are depositing a large number of datasets, contact repository administrators about potentially using an Application Programming Interface (API) to skip manual entry of duplicate metadata.

Share Data

__ Share data with its DOI or, as applicable, other permanent identifier.

__ Link the publication to its data, either in a Data Availability Statement or as a citation.

References

Creative Commons Wiki. (2014). CC0 use for data. https://wiki.creativecommons.org/wiki/CC0_use_for_data
NIH. (2023). Generalist Repositories. https://sharing.nih.gov/data-management-and-sharing-policy/sharing-scientific-data/generalist-repositories
PLOS ONE. (2023). Recommended Repositories. https://journals.plos.org/plosone/s/recommended-repositories
Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., Silva Santos, L. B. da, Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018. https://doi.org/10.1038/sdata.2016.18