Life Science and Bioinformatics

Description

Life science benefits greatly from computational resources and tools, ranging from large-scale genomic analysis and complex molecular simulations to sophisticated statistical analysis and image post-processing. We provide access to many bioinformatics and life science tools, the compute power to run them, and the technical expertise to help researchers use our resources and services.

Access

You can request access to these services via SCC, NHR or KISSKI as explained here.

Applications

The software stack available on our compute clusters is comprehensive, and consequently rather complex. You can find many more details on the software stacks page and in the many subpages of the list of modules. On this page we briefly describe how you can install your own programs if need be, and highlight the most relevant domain-specific programs (some of which have their own, more in-depth pages).

Installing your own programs

Users do not have root permissions, which can make installing software somewhat more complicated than on your own computer. Still, there are several ways to build your own software stack:

Compiling from source: This is usually the hardest approach, but should work in most cases. You will have to take care of many dependencies on your own. Most Linux programs follow a configure, make, make install sequence, so once you have done this for one program you can expect a similar procedure for others. The one thing to note is that at the configure step you have to set an installation directory somewhere in your user space (for autotools-based programs, typically via the --prefix option).
pip: For Python packages, pip is usually the way to go. Make sure the pip you are using belongs to the Python interpreter you are actually using (which python and which pip should point to the same location, for example the same bin directory). To be certain, you can run python -m pip install instead of calling pip directly. Sometimes the --user option is necessary to tell pip to install packages into a folder in your user space. Finally, packages installed with pip can be incompatible with each other, and Python's module system can be very complex, so we recommend using conda (see next point) or one of its variants.
conda: conda is a package and environment manager, particularly popular in the bioinformatics domain, that makes handling complex environments and dependency stacks easy and transferable. We have a dedicated page with more information about Python package management. Be aware that we discourage running conda init when setting up the package manager for the first time. Please check our documentation for details on how to use this tool properly.
spack: Spack is another package manager, similar to conda. It can be more complex to use, but is also much more powerful. We explain how users can install their own software with Spack on its dedicated page.
Containers: Containers are a good option for particularly problematic programs with complex requirements, are very transferable, and are increasingly provided as an option directly from developers. We offer access to Apptainer on our cluster, and some of our tools are provided as containers already.
Jupyter-Hub: Our Jupyter-Hub offers the possibility to run your own custom containers in interactive mode, which can also be reused as containers for batch jobs. This includes Jupyter and Python containers, RStudio (you can also install extra modules on top of the base Python and RStudio containers without having to modify the container itself), and even containers for graphical applications on the HPC-Desktops.
Cluster-wide installation: In some cases, we can install software for the whole cluster in our module system, depending on how useful and widely used we believe it would be. Contact us at our usual support addresses if you require this.
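
As an illustration of the pip advice above, the following minimal Python sketch checks whether the pip and python found on your PATH live in the same directory, and builds a python -m pip install call tied to the interpreter that runs the script. The function names same_bin_dir and pip_install_command are hypothetical helpers written for this example, and numpy is only a placeholder package name. The install command is printed, not executed.

```python
import shutil
import sys
from pathlib import Path


def same_bin_dir(pip_path, python_path):
    """Return True if both executables sit in the same directory.

    Symlinks are deliberately not resolved: inside a virtual environment,
    python is often a symlink to a system interpreter, while pip is a
    regular script inside the environment.
    """
    if pip_path is None or python_path is None:
        return False
    return Path(pip_path).absolute().parent == Path(python_path).absolute().parent


def pip_install_command(package, user=False):
    """Build a pip call bound to the interpreter running this script."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if user:
        cmd.append("--user")  # install into the per-user site-packages
    cmd.append(package)
    return cmd


if __name__ == "__main__":
    pip_path = shutil.which("pip")
    python_path = shutil.which("python")
    print("python on PATH:", python_path)
    print("pip on PATH:   ", pip_path)
    if not same_bin_dir(pip_path, python_path):
        print("Warning: pip and python come from different locations.")
    # Printed, not executed; 'numpy' is only a placeholder package name.
    print("Safer call:", " ".join(pip_install_command("numpy")))
```

If the warning appears, calling pip through the interpreter (python -m pip install) avoids installing into an environment you did not intend to modify.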

Genomics

ABySS: De-novo, parallel, paired-end sequence assembler that is designed for short reads.
bedtools: Tools for a wide range of genomics analysis tasks.
BLAST-plus: Basic Local Alignment Search Tool.
Bowtie: Ultrafast, memory-efficient short read aligner for short DNA sequences (reads) from next-gen sequencers.
bwa: Burrows-Wheeler Aligner for pairwise alignment between DNA sequences.
DIAMOND: Sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data.
GATK: Genome Analysis Toolkit, for variant discovery in high-throughput sequencing data.
HISAT2: Fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome).
IGV: The Integrative Genomics Viewer is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets.
IQ-TREE: Efficient software for phylogenomic inference.
JELLYFISH: A tool for fast, memory-efficient counting of k-mers in DNA.
Kraken2: System for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
MaSuRCA: Whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches.
MetaHipMer (MHM): De-novo metagenome short-read assembler.
MUSCLE: Widely-used software for making multiple alignments of biological sequences.
OpenBLAS: An optimized BLAS library.
RepeatMasker: Screen DNA sequences for interspersed repeats and low complexity DNA sequences.
RepeatModeler: De-novo repeat family identification and modeling package.
revbayes: Bayesian phylogenetic inference using probabilistic graphical models and an interpreted language.
Salmon: Tool for quantifying the expression of transcripts using RNA-seq data.
samtools: Provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.

Additionally, we offer some tools as containers (which as such might not appear explicitly in the module system), among them: agat, cutadapt, deeptools, homer, ipyrad, macs3, meme, minimap2, multiqc, Trinity, umi_tools. See the tools in containers page for more information.

Molecular Simulations

Classic molecular simulations: LAMMPS, GROMACS, NAMD.
Various ab-initio codes: Gaussian, Turbomole, Quantum Espresso, CP2K, CPMD, Psi4, VASP (own license required).
AlphaFold and other protein folding programs: See, for example, the Protein-AI service.

Imaging

FreeSurfer: An open-source software suite for processing and analyzing brain MRI images.
RELION: (REgularised LIkelihood OptimisatioN, pronounced rely-on) Empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM).

Workflow Tools

Snakemake.
Nextflow.

Other

R, Matlab, Python, Octave, and many other programming languages.
Environment and package handling tools:
- Conda, Miniforge, uv: see Python.
- For other software: Spack.
Neuromorphic computing tools.

Support

For ways to contact us, see support.