Advanced Shell Scripting

Overview

Teaching: 60 min
Exercises: 0 min
Questions
  • How do I use if statements?

  • Can I add the date to filenames in a bash script?

  • Can I use file attributes like image size or properties to filter files?

Objectives
  • Learn how to use conditionals

  • Learn about getting system information into scripts

  • Learn about getting file information into scripts

Instructor note: there are intentional typos in these examples to show the importance of spaces

Recapping scripting

We’re going to use a demonstration script from the shell-novice lesson.

Nelle’s Pipeline: Processing Files

Nelle is now ready to process her data files using goostats — a shell script written by her supervisor. This calculates some statistics from a protein sample file, and takes two arguments:

  1. an input file (containing the raw data)
  2. an output file (to store the calculated statistics)

Since she’s still learning how to use the shell, she decides to build up the required commands in stages. Her first step is to make sure that she can select the right input files — remember, these are ones whose names end in ‘A’ or ‘B’, rather than ‘Z’. Starting from her home directory, Nelle types:

$ cd north-pacific-gyre

She then writes a loop in nano that generates an input and an output file name for each data file:

$ for datafile in NENE*[AB].txt
> do
>     echo $datafile stats-$datafile
> done
NENE01729A.txt stats-NENE01729A.txt
NENE01729B.txt stats-NENE01729B.txt
NENE01736A.txt stats-NENE01736A.txt
...
NENE02043A.txt stats-NENE02043A.txt
NENE02043B.txt stats-NENE02043B.txt

She hasn’t actually run goostats yet, but now she’s sure she can select the right files and generate the right output filenames.

$ for datafile in NENE*[AB].txt
 do
     bash goostats.sh $datafile stats-$datafile
 done

When she presses Enter, the shell runs the modified command. However, nothing appears to happen — there is no output. After a moment, Nelle realizes that since her script doesn’t print anything to the screen any longer, she has no idea whether it is running, much less how quickly. She kills the running command by typing Ctrl-C, uses up-arrow to repeat the command, and edits it to read:

$ for datafile in NENE*[AB].txt
 do
     echo $datafile
     bash goostats.sh $datafile stats-$datafile
 done

System information and variables

You can get the current date using the date command. There are lots of formatting options, but we’re going to use %F, which prints the recommended year-month-day (YYYY-MM-DD) form.

date "+%F"

Let’s make a script that prints out the date. We might try to save the date in a variable like this:

date = $(date "+%F")

You probably got an error like

date: illegal time format

This is because we put spaces around the equals sign. The message is a bit confusing because it comes from the date command rather than from bash. With the spaces in place, bash does not see an assignment to a variable named ‘date’; it sees a call to the date command with ‘=’ and today’s date as its arguments. The exact wording of the error depends on your system. If you use

date=$(date "+%F")
echo $date

You should get the date printed as expected.
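
One of the questions for this lesson is how to add the date to a filename. A minimal sketch is to splice the saved date into the name. The helper dated_name and the example file name are made up for illustration and are not part of Nelle's pipeline.

```shell
#!/usr/bin/env bash
# dated_name is a hypothetical helper, not part of the lesson's scripts.
# It prints the given file name prefixed with today's date (YYYY-MM-DD).
dated_name() {
    local today
    today=$(date "+%F")          # no spaces around the equals sign
    echo "${today}-$1"
}

# Example: build a dated name for an illustrative stats file.
dated_name "stats-NENE01729A.txt"
```

Because year-month-day puts the largest unit first, names built this way sort in chronological order when listed with ls.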

Conditionals

You can use conditional statements to test whether something is true or false and run different commands depending on the result. Let’s go into the molecules directory and make a script that will show us molecules with more than a certain number of lines. Make a new script called is_big.sh.

We know that wc -l gives us the number of lines in a file. Let’s save that to a variable.

num=$(wc -l $1)

We build an if statement much like a loop: a condition, followed by a body that sits between then and fi

if ["$num" -gt "5"]
then
    echo $1 "is big enough"
fi

Does that work? You’ll probably get an error

[      30: command not found

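This is again a spacing issue, but the opposite of the one we saw earlier. [ is itself a command, so it needs a space after it. Without one, bash joins [ to the value of $num and looks for a command with that combined name. The closing ] also needs a space before it. Once we fix the spacing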

if [ "$num" -gt "5" ]
then
    echo $1 "is big enough"
fi

We get a different error

is_big.sh: line 2: :       30 octane.pdb: integer expression expected

We forgot to check our input. wc -l prints the line count followed by the file name, and that combination isn’t a number. If we redirect the file into wc, it never sees a file name and prints only the count

num=$(wc -l < $1)
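
To see the difference, the sketch below writes three lines to a temporary file and counts them both ways. The temporary file is a made-up stand-in for a molecule file. Only the redirected form produces a bare number that -gt can compare.

```shell
#!/usr/bin/env bash
# Illustrative only: the temporary file stands in for a file such as octane.pdb.
tmpfile=$(mktemp)
printf 'a\nb\nc\n' > "$tmpfile"

with_name=$(wc -l "$tmpfile")     # count followed by the file name
count_only=$(wc -l < "$tmpfile")  # count alone; wc never sees a file name

echo "with name:  $with_name"
echo "count only: $count_only"

rm -f "$tmpfile"
```

On some systems wc pads the count with leading spaces. That padding is harmless in a numeric comparison, but the trailing file name is not.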

We can add else to have the script always print something

if [ "$num" -gt "5" ]
then
    echo $1 "is big enough"
else
    echo $1 "is not big enough"
fi
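
Putting the pieces together, the whole check can be sketched as below. It is wrapped in a function named is_big so it can be reused; that name is chosen here for illustration. The quotes around $1 are an addition that keeps file names containing spaces in one piece.

```shell
#!/usr/bin/env bash
# Sketch of the is_big.sh logic wrapped in a function (the name is illustrative).
# Usage: is_big FILE
is_big() {
    local num
    num=$(wc -l < "$1")            # line count only, no file name
    if [ "$num" -gt "5" ]          # spaces after [ and before ] are required
    then
        echo "$1 is big enough"
    else
        echo "$1 is not big enough"
    fi
}
```

Calling is_big octane.pdb from the molecules directory prints one of the two messages, depending on whether the file has more than five lines.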

Activity: Make the size cutoff generalizable

System Variables

You can set variables outside of your script that you can use in the script. This is useful for saving passwords or things that you don’t want to put in the script and don’t want to have to type at the command line every time. Let’s go back to molecules and have the size cutoff be an environment variable. First we’ll set the variable.

$ export CUTOFF=5

Then add the variable to your script

if [ "$num" -gt "$CUTOFF" ]
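
In context, the comparison might look like the sketch below. The ${CUTOFF:-5} form falls back to 5 when CUTOFF is unset or empty. It is an addition made here so the check still works if the variable was never exported. The function name is_big_env is also illustrative.

```shell
#!/usr/bin/env bash
# Sketch: read the cutoff from the environment, defaulting to 5 (an assumption).
is_big_env() {
    local num cutoff
    cutoff="${CUTOFF:-5}"          # use $CUTOFF if set and non-empty, else 5
    num=$(wc -l < "$1")
    if [ "$num" -gt "$cutoff" ]
    then
        echo "$1 is big enough"
    else
        echo "$1 is not big enough"
    fi
}
```

Without a fallback, an unset CUTOFF leaves the test with an empty value to compare against, and [ reports an error instead of giving an answer.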

If you want variables to be set every time you log in, you can add them to your .bash_profile file (or your .zshrc file if you use zsh, the default shell on macOS since Catalina).
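
The export keyword matters here: a plain assignment stays in the current shell, while an exported variable is passed on to the scripts you run from it. A short sketch follows, using a made-up variable name LOCAL_ONLY alongside CUTOFF and assuming LOCAL_ONLY is not already set in your environment.

```shell
#!/usr/bin/env bash
# LOCAL_ONLY is a hypothetical name used only for this demonstration.
LOCAL_ONLY=1        # plain shell variable: not passed to child processes
export CUTOFF=5     # environment variable: inherited by child processes

# bash -c starts a child shell, just as running a script does.
in_child=$(bash -c 'echo "CUTOFF=${CUTOFF:-unset} LOCAL_ONLY=${LOCAL_ONLY:-unset}"')
echo "$in_child"
```

The child shell sees the value of CUTOFF but reports LOCAL_ONLY as unset, which is why the line you add to .bash_profile should start with export.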

Key Points

  • Shell scripts can be used for more complicated programming tasks