Biocontainers? What’s that?
Installing software on a Linux machine is almost a rite of passage. There are many forms of software installation, making it a painful task for many people. The ONLY reason I got over the doubt and anxiety of software installation is that we at NCGAS manage hundreds of packages, and that means a lot of installing and updating. Rather than spending 5 years practicing the art of software installation… wouldn’t it be nice if we had a single way to get what we need, preferably without what is affectionately known as “dependency hell”?
If that sounds exciting to you, the following is the most promising community development on this front – Biocontainers.
First, let’s talk about what a container is. Containers are similar to software modules on a cluster in that they allow us to dynamically load and unload software as needed. I often describe modules as program legos – you can mix and combine as needed to build your software stack. Super convenient… for the user. Modules still require someone to go through the installation procedure (whatever that might be) for all dependencies and then the program of interest, then make a module file. All the annoying stuff is handled for you on the admin end, but it still exists.
I feel that containers bring the convenience of modules to the installation process itself. Now, I know some of you may have experience with containers and are questioning my sanity. However, we seem to have converged on a reasonably painless way via biocontainers and some brilliance from the PSC side of NCGAS.
Containers are often described more as tupperware than legos. They are self-contained bins that hold a stripped down version of an operating system, the software, and the dependencies/libraries needed to run the software. You can “bind” your directories to the container, so that the container can read in from your normal working directory, run the software in the container, and then output the results to the working directory, without having to interact with the system-at-large. No more esoteric errors about libgcc. It’s all handled in the container.
To be fair, someone still has to build the container – but, much like tupperware, the containers are easy to share across systems. Modules streamline loading; containers streamline installation. So much so, I have faith that you can install them yourself ^_^.
The Biocontainers project is a community effort to make a repository of software containers that are all based on a uniform set of specifications. This gives us one place to look for pre-made software containers that generally look and act the same way – meaning we have to learn ONE way to install software. Better yet, containers SHIP WITH THEIR DEPENDENCIES – meaning we only have to install one container, rather than a bunch of dependencies and the target program.
Groups like Galaxy have contributed heavily to this project and, as a result, most software we’ve needed in the last year is in the registry. Another member of the project is BioConda, meaning all containers have not only container support, but also conda installation instructions. More on that later.
Installing the Raven Biocontainer
Enough background – let’s run through installing a package and making a couple quick files to make our lives easier.
First Time Setup
First, we want to set up a place to install our software. Let’s set up a software directory. I generally install software in ~/local/bin. There are reasons for this that I will skip for now, but it’s a good start if you don’t have other strong opinions.
#make the dir if you don’t have it already
mkdir -p ~/local/bin
Now that we have a place to install stuff, let’s tell the computer where this is using PATH. PATH is like your computer’s treasure map to find software installations. It will let us do things like:
raven -h
versus
~/local/bin/raven -h
All we have to do is add one line to one file once, and then just stuff all of our installs in that folder. The file you want to edit is called ~/.bashrc. Open this file, and at the end of this file, add the following line:
export PATH=$PATH:~/local/bin
Save and exit, and that will tell the computer where to look if it cannot find the software in any of the default locations. You only have to do this once, and now everything you put in ~/local/bin will be found.
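If you would rather not open an editor, the same one-time setup can be sketched as a small shell function. This is just an illustration, not part of the NCGAS templates: the function name add_path_line is made up here, and it only appends the line when it is not already in the file, so running it twice does no harm.

```shell
#!/bin/bash
# Hypothetical helper (not part of the NCGAS templates):
# append the PATH line to an rc file only if it is not already there.
add_path_line() {
    local rc_file="$1"
    # Single quotes keep $PATH and ~ literal, exactly as typed by hand.
    local line='export PATH=$PATH:~/local/bin'
    touch "$rc_file"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Usage (uncomment to apply to your own shell setup):
# add_path_line ~/.bashrc
# source ~/.bashrc
```

After sourcing ~/.bashrc (or logging in again), echo $PATH should end with your local/bin directory.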
Install Using Templates
Next, we need to load in singularity. Singularity is a container handler for HPC systems. Docker is a common container handler for local OS (e.g. Ubuntu on my laptop). Docker requires some extra permissions that most HPC admins are rightfully wary of, so you are unlikely to see it on a cluster. Singularity is getting pretty common though. See the FAQs for options if you don’t have singularity.
Load singularity however you have to on your system. Most likely, it’s something similar to:
module load singularity
Now for the software. For each software package, there are a minimum of two files you will need. The first is a file to pull the container and name it in a sane fashion. The second is at least one tool wrapper. We provide templates for both below, using the raven assembler as an example. Download both of these and transfer them to your space on a cluster, or simply copy and paste the contents into a file on the system – they aren’t large files.
Let’s first look at the pull_raven.sh file:
less pull_raven.sh
This file simply loads singularity (in case you forget) and then pulls the container from the internet. Once it’s on the system, it renames it in a standard way – <container type>-<software>-<version>.sif.
Let’s run this and pull the container.
bash pull_raven.sh
ls
You should see a singularity-raven-*.sif file! That’s the software. You could stop here and use the container as normal. However, the syntax is a bit weird, so let’s clean that up with a wrapper script. A wrapper script is simply a script that just calls other scripts or programs in a more concise way. Let’s look at the raven wrapper script:
less raven
Here, again, we load singularity if it isn’t loaded already (in case we forget on a cluster), and then save some metadata about the container in variables. Then there is a line at the bottom that executes the command using the container. While this looks a bit odd, basically this allows us to use:
raven -h
#or ~/local/bin/raven -h if we hadn’t added it to PATH
instead of:
singularity exec -B /N/u/ /N/u/ss93/Carbonate/singularity-raven-1.5.1.sif raven -h
This is the part that makes most people cringe when someone mentions containers. That syntax does not just roll off the fingers. However, that wrapper makes that problem disappear. You use the software the same way you normally would and won’t even notice it’s running in a container. Let’s give this a try:
raven -h
cd ~
raven -h
Because we added our ~/local/bin to our PATH earlier, we can now use the software from anywhere on the system. If we added more containers and wrappers to this folder, then we’d automatically get access to them as well. This is why we set up a designated location for all software!
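For reference, here is a minimal sketch of what a wrapper like this can look like, based on the description above. The variable names, the values, and the run_tool function are assumptions for illustration and may not match the downloadable template line for line.

```shell
#!/bin/bash
# Sketch of a wrapper like "raven"; names and values are assumptions.
VERSION=1.5.1                 # container version
PACKAGE=raven                 # package name used in the .sif file name
TOOL=raven                    # command to run inside the container
DIRECTORY="$HOME/local/bin"   # where the .sif file lives
STORAGE=/N/u/                 # comma-separated paths to bind into the container

IMAGE="$DIRECTORY/singularity-$PACKAGE-$VERSION.sif"

# Run TOOL inside the container, passing along every argument unchanged.
run_tool() {
    if ! command -v singularity >/dev/null 2>&1; then
        module load singularity
    fi
    singularity exec -B "$STORAGE" "$IMAGE" "$TOOL" "$@"
}

# Uncomment the next line when saving this as ~/local/bin/raven:
# run_tool "$@"
```

The "$@" at the end is what makes the wrapper feel like the real program: whatever you type after raven is handed straight to the raven inside the container. The wrapper file also needs to be executable (chmod +x) to be called by name.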
Modify and Repeat for minipolish
Great… what if you want a different container though? Let’s modify this template and grab another package that’s a wee bit more complex – minipolish:
#copy and rename
cp pull_raven.sh pull_minipolish.sh
cp raven minipolish
First, let’s get that pull file worked out. We’ll need the command to pull the container… so let’s go to biocontainers.pro and find it. You can get to the repo directly with biocontainers.pro/registry. Then search for minipolish. Find a container you like, and click on the name. That will bring you to a page that has commands for various options, one of which is singularity.
Grab the singularity command and paste it where the one for raven is. Then, replace singularity run with singularity pull. This will save the container as a file on the system, which we can then easily reference with our wrapper. You will also want to update the line that moves the container to a new, standard name. Here’s what the final file should look like:
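#!/bin/bash
#load singularity only if it isn't already available
if ! command -v singularity >/dev/null 2>&1; then
    module load singularity
fi
#pull the container, then rename it to the standard name
singularity pull https://depot.galaxyproject.org/singularity/minipolish:0.1.2--py_0
mv -v minipolish*0 singularity-minipolish-0.1.2.sif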
You can see that Galaxy made this container – thanks Galaxy! Okay, let’s save and give it a whirl:
bash pull_minipolish.sh
ls
You should see a singularity-minipolish-0.1.2.sif file now. Remember the version number, we’ll need it for the wrapper.
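If you end up pulling a lot of containers, the pull-and-rename pattern can be folded into one reusable function. The sketch below is only one way to do it: the function names are made up, and it assumes your singularity version accepts an output file name before the URI in singularity pull, which would make the separate mv step unnecessary.

```shell
#!/bin/bash
# Sketch of a reusable pull helper; function names are illustrative only.

# Build the standard file name: <container type>-<software>-<version>.sif
sif_name() {
    local package="$1" version="$2"
    printf 'singularity-%s-%s.sif\n' "$package" "$version"
}

# Pull a container straight to its standard name.
# Assumes your singularity version accepts "pull <output file> <URI>".
pull_container() {
    local url="$1" package="$2" version="$3"
    if ! command -v singularity >/dev/null 2>&1; then
        module load singularity
    fi
    singularity pull "$(sif_name "$package" "$version")" "$url"
}

# Usage (uncomment to run on a system with singularity):
# pull_container https://depot.galaxyproject.org/singularity/minipolish:0.1.2--py_0 minipolish 0.1.2
```

Keeping the naming rule in one place (sif_name) means the pull step and the wrappers cannot drift apart on what the file is called.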
First, update the variables in the wrapper script (minipolish file) – version (0.1.2), package (minipolish), tool (minipolish), directory (~/local/bin/). The storage variable is the only tricky one – this needs to include your working space. For instance, on IU’s Carbonate we have home, scratch, and project space. We can bind all three – “/home/u/ss93,/N/slate/ss93/,/N/project/”. The permissions all stay the same, so you can bind the full /N/slate if you want to, but in general, it’s best to be specific if you can.
You don’t need to change the last two lines; they handle all the singularity syntax using the easy-to-identify variables we just updated. Now time to try it:
cd ~
minipolish -h
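If the wrapper complains about a bind path, a typo in the storage variable is a likely suspect. As a rough sketch (the helper name check_bind_paths is hypothetical), you can check that every entry in the comma-separated list actually exists before blaming the container:

```shell
#!/bin/bash
# Hypothetical helper: check that every path in a comma-separated
# bind list (the storage variable) exists as a directory.
check_bind_paths() {
    local storage="$1" path status=0
    local IFS=','            # split the list on commas
    for path in $storage; do
        if [ ! -d "$path" ]; then
            echo "missing bind path: $path" >&2
            status=1
        fi
    done
    # Returns 0 if all paths exist, 1 if any are missing.
    return "$status"
}

# Usage (paths are the examples from the text):
# check_bind_paths "/home/u/ss93,/N/slate/ss93/,/N/project/"
```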
Now the “complex” part – sometimes we need more commands from a package (e.g. minipolish) than just the main command (e.g. minipolish). For instance, we may need minimap2, which is part of the minipolish package, to complete this workflow. Easy! We just make another wrapper!
Wrappers are named for the tool, not the package. Up until now, those have been the same thing – raven command with the raven package, minipolish with the minipolish package. However, this time we want to make a wrapper for minimap2 with the minipolish package. So:
cd ~/local/bin
cp minipolish minimap2
Now all we have to do is change the TOOL variable to minimap2! The version, package, directory, and storage all stay the same. Save and test:
minimap2 -h
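When a package ships several commands, the copy-and-edit step can be scripted. The following is a sketch under one assumption: that the wrapper sets the tool on a line starting with "TOOL=". The clone_wrapper name is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: copy an existing wrapper and point it at another tool.
# Assumes the wrapper sets the tool on a line that starts with "TOOL=".
clone_wrapper() {
    local source_wrapper="$1" tool="$2"
    local target
    # The new wrapper goes next to the old one and is named after the tool.
    target="$(dirname "$source_wrapper")/$tool"
    sed "s|^TOOL=.*|TOOL=$tool|" "$source_wrapper" > "$target"
    chmod +x "$target"
}

# Usage (uncomment once ~/local/bin/minipolish exists):
# clone_wrapper ~/local/bin/minipolish minimap2
```

Everything except the TOOL line is copied untouched, so the version, package, directory, and storage stay in sync with the original wrapper.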
FAQ
Is it really this simple? Just a pull file and a wrapper script from a template??
Mostly, yeah! There are some gotchas…
Gotcha 1: What if I get an error that the command is not found when I run a wrapper script?
First, you need one wrapper for each command in a software package. Sometimes it’s a bit more complicated than just changing the tool. Inside the container, there is also a PATH variable. Very occasionally, part of the software is not included in the PATH inside the container. We’ve seen this in Trinity if you want to use some of the Trinity wrapper scripts for things like differential expression. What do you do here?
Let’s do a quick example. Pull trinity’s container:
wget -nc https://data.broadinstitute.org/Trinity/TRINITY_SINGULARITY/trinityrnaseq.v2.11.0.simg
mv -v trinityrnaseq.v2.11.0.simg singularity-Trinity-2.11.0.sif
(Notice this isn’t a singularity pull command? No worries – this is the command recommended from biocontainers!)
You can access the inside of the container by using “singularity run” and then the name of the sif file:
singularity run singularity-Trinity-2.11.0.sif
Your prompt will change to “Singularity>”, and you can poke around in the container. The easiest place to start is to figure out where commands that work are installed. Start with something really basic to the package, as it is likely to be in the container’s PATH. So, let’s try Trinity:
which Trinity
/usr/local/bin/trinityrnaseq/Trinity
What about the program we wanted to use, say run_DE_analysis.pl?
which run_DE_analysis.pl
Womp womp. Nothing found. Okay, let’s look in the Trinity directory:
cd /usr/local/bin/trinityrnaseq
Trinity’s documentation points us to “$TRINITYHOME/Analysis/DifferentialExpression/” for extra scripts, so let’s give that a shot:
cd Analysis/DifferentialExpression/
ls
Found it! So, let’s grab the path:
pwd
and then exit the container
exit
and then we’d just need to add the full path to the TOOL variable in a new wrapper script if we wanted to run this script:
TOOL=/usr/local/bin/trinityrnaseq/Analysis/DifferentialExpression/run_DE_analysis.pl
Again, this is very uncommon to have to do, but I’m just being thorough! We’ve done this for Trinity, and you can view and download our scripts here. We’re also happy to help with this process and are likely to have videos for this soon.
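If you already know roughly where to look, the same search can be sketched without opening a shell in the container. This assumes the container includes the standard find utility; the function name find_in_container is made up for this example.

```shell
#!/bin/bash
# Sketch: look for a script inside a container without opening a shell.
# Assumes the container ships the standard "find" utility.
find_in_container() {
    local image="$1" start_dir="$2" name="$3"
    singularity exec "$image" find "$start_dir" -name "$name"
}

# Usage (uncomment on a system with singularity and the Trinity container):
# find_in_container singularity-Trinity-2.11.0.sif /usr/local/bin/trinityrnaseq run_DE_analysis.pl
```

Whatever full path it prints is what goes into the TOOL variable of the new wrapper.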
Gotcha 2: Can I link containers together?
It’s a bit tricky to get software in separate containers talking to each other. Keeping everything a tool needs in its own container can be less space efficient, since things like samtools will be in a bunch of containers. We’re working on that, but for now, a slight space issue is preferable to having to spend 5 years perfecting the art of software installation ^_^.
I don’t have singularity on my machine… can I install it?
You can put it on any virtual machine or Linux computer with the following tutorial:
https://singularity-tutorial.github.io/01-installation/
We also have an automated script for installing on Ubuntu on our github.
You can also use docker instead of singularity – you just have to grab the docker command from the repository, rather than the singularity command. Unfortunately, if you are on a cluster, you cannot install it yourself. That requires admin access :\
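For the docker route, a wrapper can follow the same idea as the singularity one. The sketch below is an assumption-heavy illustration, not one of our templates: IMAGE is a placeholder you would replace with the image name from the docker command on the registry page, and mounting the current directory stands in for the storage bind.

```shell
#!/bin/bash
# Sketch of a docker-based equivalent of the singularity wrapper.
# IMAGE is a placeholder; copy the real image name from the docker
# command shown on the container's registry page.
IMAGE="<image name from the registry>"
TOOL=minipolish

# Run TOOL in the container with the current directory mounted at the
# same path, so relative input and output paths keep working.
run_tool_docker() {
    docker run --rm -v "$PWD":"$PWD" -w "$PWD" "$IMAGE" "$TOOL" "$@"
}

# Uncomment when saving this as a wrapper file:
# run_tool_docker "$@"
```

Unlike the singularity wrapper, this only sees files under the directory you run it from, so inputs stored elsewhere would need additional -v mounts.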
My admin told me they weren’t okay installing singularity on the system… what now?
Well, you can also install software with conda – it’s found on the same page with the singularity and docker commands. You will have to change the wrapper script, and we’ll have tips on that soon.
I keep getting this warning: “WARNING: Skipping mount */resolv.conf [files]: /etc/resolv.conf doesn’t exist in container”
Ignore it; it’s just looking for a file that isn’t included with the container. There are ways to fix it, but it doesn’t make any difference.