Installing software on HPC
Installing programs with pre-compiled binary code
Installing programs from source code
Troubleshooting errors
Installing Software
NCGAS helps install and maintain software system-wide. To request that a program be installed, fill out this form.
In some cases, if the program will be used by fewer than five users or by a single lab, we reply with instructions on how to install it locally in a project directory. We don’t install every requested program system-wide because all system-wide programs are stored in “/N/soft/rhel/”. This directory contains not just bioinformatics programs but also programs used by mathematicians, physicists, chemists, and others.
This directory has limited space, so adding every user-requested program would cause problems:
- we would run out of disk space in /N/soft/rhel,
- finding installed programs quickly would become harder.
Here is how we install programs on our HPC systems without root or superpowers! If there is a program you would like to install locally in your own directory, you can follow these notes as well.
Pep Talk
For folks in my position, installing software for others in a shared HPC environment is part of the job. I’m not a system administrator, nor do I have any special privileges on the system; my function is to act as a facilitator for other users on these systems, advocate for them, relay (aggressively) the needs of the users to the policy people, relay (gently) the needs of the system admins to the users, and generally help them get their research done. Researchers with a need for powerful computing hardware also need a software environment that fits their domain of study. In an HPC environment, there is often a way to deploy software other than installing it across the whole system, since that could lead to conflicts between software packages, not to mention a lot of redundant space needed to house that software on every computer in the cluster. If your case is similar to mine, you’ll have to support your HPC users by creating flexible software environments you can customize to their needs.
Our job is to support biologists, especially research that falls under the category of genomics. The landscape of genomics software runs from crappy scripts that someone (maybe even me) wrote to hack something together quickly, to sophisticated multi-node MPI-enabled monstrosities with a hundred dependencies. You’ll get C/C++, Java, Perl, Python, R, the occasional Fortran program, and nowadays you see more and more web-focused work with HTML outputs and beautiful visualizations using JavaScript. Yep, you need to support all that. I’m sure other domains are seeing these challenges and more.
It helps to have teammates and other groups around you to get you out of a pinch, but hopefully this guide will get you started with the basics. I’ll focus on the major hangups you’ll run into and how to solve them. I’ll start with a primer on the environment you install the software in.
Download and install programs with pre-compiled binaries
First example – Let’s walk through installing the SPAdes assembler, since pre-compiled binaries are available for this program.
- Step 1 – Download the program to your local directory. On the SPAdes webpage, under Downloads, there are the following options:
- Download SPAdes binaries for Linux (64-bit only)
- Download SPAdes binaries for MacOS
- Download SPAdes source code
Since most HPC systems are Linux-based, download the Linux file to your local directory.
Now, what does “binaries” mean? It is pre-compiled code that is ready to run on your machine. If there is a binary version of the program, I prefer to download it, since binaries are usually the easiest to install :).
Run this command to download the program to the cluster:
wget http://cab.spbu.ru/files/release3.13.0/SPAdes-3.13.0-Linux.tar.gz
- Step 2 – Install the program. In most cases the manual, like the SPAdes manual, provides the installation steps (look under “Installation”). If you downloaded a binary release, the only step you need to run is to decompress the downloaded file.
tar -xzf SPAdes-3.13.0-Linux.tar.gz
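After unpacking, it helps to confirm where the executables ended up. The following is a minimal sketch, assuming the archive unpacks into a top-level directory that contains a bin/ folder (check your own download); the helper name unpack_and_list and the destination path are hypothetical, not part of SPAdes.

```shell
# Hypothetical helper: unpack a .tar.gz archive into a destination directory
# and print every owner-executable file found under a bin/ directory.
unpack_and_list() {
    archive="$1"
    dest="${2:-.}"                      # default: current directory
    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest" || return 1
    # -perm -u+x keeps only files the owner is allowed to execute
    find "$dest" -path '*/bin/*' -type f -perm -u+x | sort
}

# Example call (paths are assumptions):
#   unpack_and_list SPAdes-3.13.0-Linux.tar.gz "$HOME/software"
```

If the listing shows spades.py, running it with the --test option described in the SPAdes manual should be a quick way to check that the unpacked binaries work on your system.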
Download and install program from source code
Next example – when no binary files are available, you have to compile the program from source code. Let’s walk through SAMtools, a common program we all use.
- Step 1 – Download the program SAMtools to your local directory
wget https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2
Next, decompress the file
tar -xvf samtools-1.9.tar.bz2
- Step 2 – Compile the source code. Once again, the manual lists these steps; for example, look at the README on the SAMtools GitHub page.
./configure --prefix=`pwd`
make
make install
DONE! SAMtools is now installed.
Now a little more information on what each of those steps did!
- ./configure checks your system requirements and, well, configures things. Basically, it inspects your setup and the programs available in $PATH to determine how to build and install things on YOUR SPECIFIC setup, then generates a Makefile that matches your file locations, system settings, etc. The --prefix option sets where the program will be installed.
- make follows the instructions in the Makefile and converts the source code into binaries the computer can run.
- make install installs the program by copying the binaries into the correct places as defined by ./configure and the Makefile.
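The command above uses --prefix=`pwd`, which installs into the source directory itself. As a minimal sketch of the same three steps with a separate install location, the helper below wraps them in one function; the name build_from_source and the example prefix are hypothetical, and it assumes the package ships a standard configure script.

```shell
# Hypothetical helper: run configure / make / make install inside a source
# directory, installing under the prefix given as the second argument.
build_from_source() {
    src_dir="$1"
    prefix="$2"
    (
        cd "$src_dir" || exit 1
        ./configure --prefix="$prefix" &&   # generate a Makefile for this system
        make &&                             # compile the source code
        make install                        # copy the results under $prefix
    )
}

# Example call (paths are assumptions):
#   build_from_source samtools-1.9 "$HOME/software/samtools-1.9"
```

The subshell keeps the cd from changing your current directory, and chaining with && stops the sequence at the first step that fails, so the first error message is the one to read.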
Troubleshooting
- Once the programs are installed, how do you add them to your PATH variable or environment? Doing so means you don’t have to type the full file path every time you run the program. Take a look at this blog post for more information on this topic.
- Where can I find more information on Makefiles, or on what make and make install are doing? Here is another blog post we wrote on this topic to help.
- If you are running into library errors when compiling your code, click here to learn more about these errors and how to fix them.
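For the PATH question above, here is a minimal sketch of one way to do it. The helper name add_to_path and the example directory are hypothetical; it only changes the current shell session, and it assumes you would repeat the call in a startup file such as ~/.bashrc to make it permanent.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$1:$PATH" ;;       # otherwise put it at the front
    esac
    export PATH
}

# Example call (the directory is an assumption):
#   add_to_path "$HOME/software/SPAdes-3.13.0-Linux/bin"
```

Checking for the directory first keeps PATH from growing a duplicate entry each time the startup file is sourced.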
Running into errors while running these commands? Stay tuned!
We are writing up the common errors we run into and their workarounds. Meanwhile, email us at help@ncgas.org with the errors and any other questions, and we will be happy to help.