The Broad Institute’s RNA assembler Trinity is one of the most popular tools for assembling short RNA-seq reads (~100–150 base pairs) into the transcripts (~200–15,000 bp) produced by an organism. According to recent download statistics, over 16,000 people have downloaded the program over 4.4 million times since 2015, and that number is increasing every year: approximately 1,000 downloads to unique IP addresses per month in 2017 thus far, up from ~770/month in 2016 and ~500/month in 2015. This trend is notable, especially in light of the heavy use of shared installations of Trinity on HPC resources, each of which counts as only a single download despite being used by hundreds of individuals.
The Trinity software requires roughly 1 GB of RAM for every million input reads, which means running it on a single lane of Illumina data (the low end for experimental designs) requires ~180 GB of RAM. With sequencing costs continuing to decrease, the scale of RNA sequencing projects has increased, resulting in many lanes of Illumina data and TB-scale memory requirements. Many biologists must therefore run Trinity on HPC resources, such as IU’s Mason, or through web interfaces such as IU’s Trinity Galaxy. IU’s Trinity Galaxy alone is used by 665 users across 486 institutions in 51 countries (see map), and again, this high-volume use accounts for only a handful of the 4.4 million downloads of the software.
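The rule of thumb above can be sketched as a quick back-of-the-envelope calculation. This is only an illustration of the ~1 GB per million reads estimate quoted in this article, not part of Trinity itself; the function name and the assumed lane size of ~180 million reads are ours.

```python
# Illustrative sketch of the memory rule of thumb quoted above:
# ~1 GB of RAM per million input reads. The constant and function
# name are our own, not part of the Trinity software.

GB_PER_MILLION_READS = 1  # rule of thumb from the article

def estimate_trinity_ram_gb(num_reads: int) -> int:
    """Approximate RAM requirement in GB for a given number of reads."""
    return (num_reads // 1_000_000) * GB_PER_MILLION_READS

# Assuming a single Illumina lane yields ~180 million reads,
# the estimate matches the ~180 GB figure cited above:
print(estimate_trinity_ram_gb(180_000_000))
```

A multi-lane project scales linearly under this estimate, which is how a handful of lanes quickly pushes a job into TB-scale memory territory.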
When hundreds of people use a single instance of a software package, issues and complications become evident much faster than when single users toil with their individual installations. NCGAS aggregates this user feedback and relays it directly to the Trinity developer partners. This partnership between developers and user-facing centers like NCGAS contributes significantly to the continued success of the software as it becomes more efficient and better handles biological complexities.
Earlier in NCGAS’s partnership with Trinity, NCGAS made improvements that boosted speeds by a factor of four. Now, NCGAS is working on means of passing jobs to different clusters to handle TB-scale memory jobs in a timely manner.