Chapter 4 Data Storage

All research data needs to be stored and backed up, but it can be frustrating to pick these systems and ensure that they are working correctly. This chapter consists of two exercises: a worksheet to document available storage and backup options and decide between them; and a procedure for testing that a backup system is working.

4.1 Pick Storage and Backup Systems

Description: Research data needs to be stored and backed up reliably so that important data is not lost. But storage is commonly a challenge, as institutions don’t always offer uniform options for storage and backup. This exercise prompts you to examine the storage and backup systems available to you before determining which is the best set of options for your data.

Instructions: Answer the questions and then fill out the table of information about each possible storage and backup system. Examine all of the options, evaluating them based on the criteria listed below. Then select primary storage and backup systems and, optionally, an alternate backup.


What is the estimated total data storage you will need over the next five years?

Example: I estimate that I will generate 100 GB of data over the next five years of my project.
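If you are unsure how to reach an estimate like this, one rough approach is to multiply the number of files you expect per year by a typical file size and add a safety margin. The sketch below is only an illustration: the function name, the input values, and the 20% margin are assumptions made for the example, not recommendations.

```python
def estimate_storage_gb(files_per_year, avg_file_mb, years=5, margin=0.2):
    """Rough storage estimate in GB (hypothetical helper; assumes 1 GB = 1000 MB)."""
    raw_gb = files_per_year * avg_file_mb * years / 1000
    # Add headroom for copies, intermediate files, and unplanned growth.
    return raw_gb * (1 + margin)


# Made-up inputs: 400 files per year at about 40 MB each, over five years.
estimate = estimate_storage_gb(files_per_year=400, avg_file_mb=40)
print(f"Estimated need: {estimate:.0f} GB")
```

Replace the inputs with numbers from your own project, and round up rather than down when comparing the result against a system's storage limit.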

 

 

 

Does your data require meeting any specific security standards? If so, what level of security?

Example: My data will include some human subjects data, so my storage systems must have restrictions on access, but it’s not medical data, so they don’t have to be HIPAA compliant.

 

 

 

What storage and backup systems are available to you, such as through your institution, workplace, or elsewhere?

Example: I have the following systems available to me: my computer, a Time Machine backup, a departmental server, an institution-licensed Box account, and Google Drive.

 

 

 

Fill out the information in the table for each storage and backup system you are considering:

Question                                          Example
System name                                       Departmental server
Is it storage or backup?                          Storage
What is the cost?                                 No cost for 10 GB and under. Cost is $5 per 10 GB per year after that.
What is the hardware type?                        Server, exact hardware type unknown.
Is the system backed up?                          No backup.
For backup systems, is backup automatic?          N/A
What level of security does the system provide?   Storage is password protected.
Is the system local or remote?                    System is local.
Is there a limit to storage capacity?             Storage limit is 500 GB per research group.
Who manages the system?                           Departmental IT manages the server.
Is it easy or difficult to use?                   Very easy to use once set up.

 

Question System
System name                                                                                                          
Is it storage or backup?                                                                                          
What is the cost?                                                                                                  
What is the hardware type?                                                          
Is the system backed up?                                                          
For backup systems, is backup automatic?                                                          
What level of security does the system provide?                                                          
Is the system local or remote?                                                          
Is there a limit to storage capacity?                                              
Who manages the system?                                                          
Is it easy or difficult to use?                                                          

Choose your storage and backup systems based on the following considerations:

  1. You need a primary storage system that:
    • will hold all of your data files,
    • meets your needed level of security.
  2. You need one backup that:
    • will hold all of your data files,
    • meets your needed level of security,
    • is reliable/managed by someone you trust,
    • is easy to use,
    • backs up automatically.
  3. At least one backup should be in a different location than your main storage system for disaster resiliency. If your main backup is near your primary storage and/or if your primary storage system is not reliable, you need a second backup that:
    • will hold all of your files,
    • meets your needed level of security,
    • is reliable/managed by someone you trust.
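
The considerations above can also be written as a small checklist in code, which may help if you are comparing many systems. This is a minimal sketch, not a required part of the exercise: the System class, its field names, and the example values are all hypothetical, and each field reduces a row of the worksheet table to a simple yes/no answer.

```python
from dataclasses import dataclass


@dataclass
class System:
    """One candidate system, reduced to yes/no answers (hypothetical fields)."""
    name: str
    capacity_gb: float
    meets_security: bool
    trusted_manager: bool
    easy_to_use: bool
    automatic: bool
    remote: bool


def ok_as_primary(s: System, needed_gb: float) -> bool:
    # Consideration 1: holds all data files and meets the security level.
    return s.capacity_gb >= needed_gb and s.meets_security


def ok_as_main_backup(s: System, needed_gb: float) -> bool:
    # Consideration 2: also reliable, easy to use, and automatic.
    return (ok_as_primary(s, needed_gb) and s.trusted_manager
            and s.easy_to_use and s.automatic)


def ok_as_second_backup(s: System, needed_gb: float) -> bool:
    # Consideration 3: capacity, security, and a trusted manager.
    return ok_as_primary(s, needed_gb) and s.trusted_manager


def needs_second_backup(main_backup: System, primary_is_reliable: bool) -> bool:
    # Needed when the main backup sits near the primary storage
    # or the primary storage is not reliable.
    return (not main_backup.remote) or (not primary_is_reliable)


# Made-up values for illustration only.
needed_gb = 100
time_machine = System("Time Machine", 1000, True, True, True, True, remote=False)
box = System("Institutional Box", 500, True, True, True, False, remote=True)

print(ok_as_main_backup(time_machine, needed_gb))
print(needs_second_backup(time_machine, primary_is_reliable=True))
print(ok_as_second_backup(box, needed_gb) and box.remote)
```

With these made-up values, the local Time Machine backup passes as a main backup but triggers the need for a second one, and the remote Box account qualifies for that role.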

Pick your storage and backup systems:

Example: My primary storage will be my computer with added security restrictions. I will use Time Machine as my first automatic backup and institutional Box, which is controlled access, as my second backup because it is remote.

 

 

 

4.2 Test Your Backup

Description: Backups are super important for your data, so it’s always good to test that your backups are still working. Nothing is worse than losing your data from your primary storage and then realizing that your backup isn’t working either. Beyond checking that your backup is working, it’s also good to know how to recover your files so that you don’t have to learn this for the first time while panicking about lost data. This short exercise walks you through getting a file off your backup to test that it is working and to learn how the data-recovery process works.

Instructions: Pick a backup system and a file to recover and work through the steps. The hard part of this exercise is finding instructions for file recovery and recovering the file, which vary by backup system.


  1. Identify where your data is backed up.
  2. Find instructions for recovering data from your backup system.
  3. Pick a data file from your computer.
  4. Follow the instructions from step 2 to get a copy of the data file from step 3 out of your backup system.
  5. If this process didn’t work, fix your backup system. If this process did work, congrats, your backup is working and you know how to recover your files!
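
Once you have recovered a file in step 4, you may want to confirm that the recovered copy is identical to the original rather than just checking that it opens. One common way to do this is to compare checksums of the two files. The sketch below assumes Python is available; the function names are hypothetical, and the demo uses a temporary folder with shutil.copy standing in for the restore step, since the real recovery procedure depends on your backup system.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def recovered_copy_matches(original, recovered):
    """True if the original and recovered files have identical contents."""
    return sha256_of(original) == sha256_of(recovered)


# Demo with stand-in files; in practice, point these at your own data file
# and at the copy you pulled out of your backup system.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "results.csv"
    recovered = Path(tmp) / "results_recovered.csv"
    original.write_text("sample,value\nA,1\nB,2\n")
    shutil.copy(original, recovered)  # stand-in for the real restore step
    matches = recovered_copy_matches(original, recovered)

print("Recovered copy matches original:", matches)
```

Note that a mismatch does not always mean the backup is broken: if you edited the file after the last backup ran, the checksums will differ even though the backup is working as intended.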