
Software on CALI

On the cluster

This section lists the software installed on CALI and the information needed to use it under Slurm.

Note that the user environment is managed through modules: each module gives access to a given piece of software in specific versions. Using a piece of software therefore generally requires loading the associated module. See the modules overview page.
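As a minimal sketch of the module workflow described above, the following shell snippet writes a small Slurm batch script that loads a module before running anything. The module name `ChromoPainter` is the one cited in the ChromoPainter entry of this page; the file name `job_modules.sh`, the job name and the resource requests are illustrative assumptions, and on CALI a module may belong to a module set that has to be loaded first.

```shell
#!/bin/bash
# Sketch: generate a minimal Slurm batch script that loads a module.
# File name, job name and resource requests are illustrative assumptions.
cat > job_modules.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=module-demo
#SBATCH --ntasks=1
#SBATCH --time=00:05:00

module purge                 # start from a clean environment
module load ChromoPainter    # load the software's module (name as cited on this page)
module list                  # record what is loaded in the job output
# ... run the software's own commands here ...
EOF

echo "Wrote job_modules.sh; on the cluster, submit it with: sbatch job_modules.sh"
```

Interactively, `module avail` lists the modules that can be loaded, which is a quick way to check the exact module names and versions before writing a job script.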
Page Description
start BIO software on CALI On the cluster This section lists the BIO software installed on CALI and the information needed to use it under Slurm. Note that the user environment is managed through
bcl2fastq Software for converting the format of genetic sequencing data. * Installed versions: * 2.17.1.14 Usage Selecting the version To select the desired version, use modules. bcl2fastq has been placed in the set
Beagle Beagle version 4.0 performs genotype calling, genotype phasing, imputation of ungenotyped markers, and identity-by-descent segment detection. * Website: * Installed versions: * 4.0 The human genetic maps and human reference panel data
Bedtools Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF.
Bio++ Bio++ is a set of C++ libraries for Bioinformatics, including sequence analysis, phylogenetics, molecular evolution and population genetics. Bio++ is fully Object Oriented and is designed to be both easy to use and computer efficient. * Website:
blat BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more. It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 20 bases. BLAT on proteins finds sequences of 80% and greater similarity of length 20 amino acids or more. In practice DNA BLAT works well on primates, and protein blat on land vertebrates.
Bowtie-2 Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. * Website: * Installed versions: * 2.2.3 Usage Selecting the version To select the desired version, use the
Bowtie Bowtie is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes. * Website: * Installed versions: * 1.1.1 Citation If you use Bowtie for your published research, please cite the
BWA BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome. * Website: * Installed versions: * 0.7.10 Citation The authors ask that you cite their article if you use BWA results in your own publications:
Cufflinks Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.
cutadapt cutadapt removes adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs. * Website:
deeptools “tools for exploring deep sequencing data”: deepTools is a suite of python tools particularly developed for the efficient analysis of high-throughput sequencing data, such as ChIP-seq, RNA-seq or MNase-seq. * Website: * Installed versions:
fastq-tools Small utilities for working with fastq sequence files. * Website: * Installed versions: * 0.7 Usage Selecting the version To select the desired version, use modules. fastq-tools has been placed in the set
fastqc FastQC is an application which takes a FastQ file and runs a series of tests on it to generate a comprehensive QC report. * Website: * Installed versions: * 0.11.2 Usage Selecting the version To select the desired version, use the
FASTX-Toolkit The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing. * Website: * Installed versions: * 0.0.13 Usage Selecting the version To select the desired version, use the
gatk The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyze high-throughput sequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Gblocks Gblocks is a computer program written in ANSI C language that eliminates poorly aligned positions and divergent regions of an alignment of DNA or protein sequences. * Website: * Installed versions: * 0.91b Usage Selecting the version
GS3 GS3 is a program that estimates fixed and random effects, breeding values and SNP effects for genomic selection. It includes normal, mixture, or double exponential distributions for SNP effects, i.e. GBLUP, the so-called BayesCPi, and the Bayesian Lasso. It allows estimation of the variances and effects of SNPs, polygenic and environmental effects, and also the inclusion of heterogeneous variances as for the analysis of DYD's.
HTSeq HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays. * Website: * Installed versions: * 0.6.1 Usage Selecting the version To select the desired version, use the
IGV The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
MAFFT * Website: * Installed versions: * 7.453 with extensions * License: see The extensions are compiled, as is the MPI module. Usage Selecting the version To select the desired version, use the
MapSplice MapSplice is software for mapping RNA-seq data to a reference genome for splice junction discovery; it depends only on the reference genome, not on any further annotations. * Website: * Installed versions: * 2.1.8 Usage
MIRA MIRA is a whole genome shotgun and EST sequence assembler for Sanger, 454, Solexa (Illumina), IonTorrent data and PacBio (the latter at the moment only CCS and error-corrected CLR reads). It can be seen as a Swiss army knife of sequence assembly developed and used in the past 16 years to get assembly jobs done efficiently - and especially accurately. That is, without actually putting too much manual work into finishing the assembly.
Picard A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats. Picard is implemented using the HTSJDK Java library, which supports access to common file formats, such as SAM and VCF, used for high-throughput sequencing data.
Samtools Samtools is a suite of programs for interacting with high-throughput sequencing data. It consists of three separate software suites: Samtools, BCFtools and the HTSlib library. * Samtools: Reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
TopHat TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
vcftools VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
start Math Kernel Library This section gives guidance on using the Intel Math Kernel Library (MKL).
MKL and parallelism We saw in the introduction to MKL that this library is multi-threaded by default. This article outlines: * the gains you can expect * the interactions with your code if you already use threading
MKL library The Intel Math Kernel Library (MKL) is a development library for optimized mathematical computations. It is intended for people who write their own computation programs in C, C++ or Fortran.
amber Amber Assisted Model Building with Energy Refinement : “Amber” refers to two things: a set of molecular mechanical force fields for the simulation of biomolecules (AmberTools, software in the public domain) ; and a package of molecular simulation programs.
bamutil bamUtil bamUtil is a repository that contains several programs that perform operations on SAM/BAM files. All of these programs are built into a single executable, bam. * Website: * Installed versions: * 1.0.12 Usage Selecting the version
chromopainter ChromoPainter * Installed versions: * ChromoPainterv2 Usage Selecting the version To select the desired version, use modules. For example: module load ChromoPainter
cmake cmake CMake is a tool that helps build software. If you are a developer and want to distribute your software, it simplifies installation on different platforms. * Website: * Installed versions:
code_saturne Code Saturne Code Saturne is a system designed to solve the Navier-Stokes equations in the cases of 2D, 2D axisymmetric or 3D flows. * Website: * Installed versions: * 4.2.4 - Compiled with the intel/composer/xe_2017 suite, via the tool
comsol COMSOL Finite-element software combining generality (fluid mechanics, electromagnetism, structural mechanics, heat transfer) with computing power, allowing the study of an unlimited number of interactions between different physics (a heat-transfer fluid, for example).
confab Confab Confab is a command-line application to systematically generate diverse low-energy conformers for molecules. Confab is an open source conformation generator whose goal is the systematic coverage of conformational space. The algorithm starts with an input 3D structure which, after some initialisation steps, is used to generate multiple conformers which are filtered on-the-fly to identify diverse low energy conformers.
crystal Crystal CRYSTAL is a general-purpose program for the study of crystalline solids, and the first which has been distributed publicly. The CRYSTAL program computes the electronic structure of periodic systems within Hartree Fock, density functional or various hybrid approximations (global, range-separated and double-hybrids). The Bloch functions of the periodic systems are expanded as linear combinations of atom centred Gaussian functions. Powerful screening techniques are used to exploit real …
devtoolset Red Hat Developer Toolset The Developer Toolset suite provides a development environment (GCC compiler suite, gdb debugger, Eclipse, etc.) with more recent versions than those of the “standard” Linux distribution. In particular, this suite gives access to the latest versions of GCC.
diffpack DiffPack Diffpack is an object oriented development framework for the solution of partial differential equations. Customers select Diffpack because it supports their need for flexibility, insight and control. Diffpack allows easy modification and combination of all numerical building blocks, resulting in few restrictions on the types of PDEs you can solve.
dl_poly_classic DL_POLY_CLASSIC DL POLY Classic is a molecular simulation package designed to facilitate molecular dynamics simulations of macromolecules, polymers, ionic systems, solutions and other molecular systems on a distributed memory parallel computer. DL_POLY_CLASSIC is based on the source code from DL_POLY2
fluent Fluent * Website: * Installed versions: * 16.2 (v162) * 17.1 (v171) * 2021 R2 (v212) Usage Selecting the version To select the desired version, use modules. The module For example: module load ansys Working with Slurm
freebayes FreeBayes FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
gamess GAMESS The General Atomic and Molecular Electronic Structure System (GAMESS) is a general ab initio quantum chemistry package. * Website: * Installed versions: * 2013/05/01 Citation You must therefore comply with these conditions if you use results obtained with GAMESS
gaussian GAUSSIAN Molecular modeling and theoretical chemistry software. Gaussian 09 is the latest version of the Gaussian® series of electronic structure programs, used by chemists, chemical engineers, biochemists, physicists and other scientists worldwide. Starting from the fundamental laws of quantum mechanics, Gaussian 09 predicts the energies, molecular structures, vibrational frequencies and molecular properties of molecules and reactions in a wide variety of chemical environments. Gaussian 09…
gcc GCC (GNU Compiler Collection) * Website: * Installed versions: * 4.4.7 (system version) * 4.8.2 (devtoolset version 2) * 4.9.2 (devtoolset version 3) * 4.9.4 * 5.4.0 * 6.4.0 * 8.3.0 * 9.3.0 The GNU compiler suite is the usual development environment on Linux. On the cluster,
gnuplot GNUPLOT Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms. The source code is copyrighted but freely distributed (i.e., you don't have to pay for it). It was originally created to allow scientists and students to visualize mathematical functions and data interactively, but has grown to support many non-interactive uses such as web scripting. It is also used as a plotting engine by third-party applications like Octave. Gn…
gromacs Gromacs GROMACS is an engine to perform molecular dynamics simulations and energy minimization. * Website * Installed versions: * 5.0.2 with OpenMP and thread-MPI support * 5.0.2 with OpenMP and MPI support * 5.0.2 with OpenMP and NVidia GPU support (
gulp Gulp GULP is a program for performing a variety of types of simulation on materials using boundary conditions of 0-D (molecules and clusters), 1-D (polymers), 2-D (surfaces, slabs and grain boundaries), or 3-D (periodic solids). * Website: *
igor IGOR IGoR is a C++ software designed to infer V(D)J recombination related processes from sequencing data such as: * Recombination model probability distribution * Hypermutation model * Best candidates recombination scenarios * Generation probabilities of sequences (even hypermutated)
intel-composer Intel Composer XE This suite contains the Intel compilers, the MKL math library, an optimized MPI library and a few other tools. It includes, among others: * the C, C++ and Fortran compilers * the MKL (math) library
java Java * Website: * Installed versions: * 1.7.0 (system version) * 1.8.0 Usage Selecting the version By default, the system ships with version 1.7.0. To use another version, you must select it with the
lammps lammps LAMMPS is a classical molecular dynamics simulation code designed to run efficiently on parallel computers. * Website * Installed version: * 2014-12-09: Intel compilers, MPI, OpenMP LAMMPS is therefore compiled to take advantage of MPI, with OpenMP.
matlab Matlab MATLAB® is commercial software developed by MathWorks. MATLAB is a high-level language and an interactive environment for numerical computation, visualization and programming. * Website: * Installed versions:
mmseqs2 MMseqs2 * Website: * Installed versions: * 2f66ae897fc813450fa5ef0c78123bd3c41c4717 (as of 1 February 2019): the installed version is the precompiled build for the sse4.1 architecture Usage Selecting the version
nwchem NWChem NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters.
octave Octave GNU Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and nonlinear problems, and for performing other numerical experiments. It also provides extensive graphics capabilities for data visualization and manipulation. Octave is normally used through its interactive command line interface, but it can also be used to write non-interactive programs. The Octave language is quite similar to M…
openmole OpenMole OpenMole is not computation software. It is a tool for building a workflow of computations and then submitting the different phases of the computation to one or more clusters. OpenMole generally runs directly on your PC, not on the cluster. It is installed on CALI only to separate the client side of OpenMole from the server side, so that you can switch off your PC as soon as OpenMole has launched the workflow.
orca ORCA * Website: * Installed versions: * 3.0.2 The ORCA license permits use for research purposes only. Citation Excerpt from the license: 4. If results obtained with ORCA package are published in scientific literature, you will reference the program as F. Neese: The ORCA program system (WIREs Comput Mol Sci 2012, 2: 73-78). Using specific methods included in ORCA may require citing additional articles…
pymol PyMOL PyMOL is a user-sponsored molecular visualization system on an open-source foundation. Visualization software intended for use on the front-end node. * Website: * Installed versions: * 1.7.4.0 Usage Selecting the version
python Python Python is a programming language * Website: * The version installed with the system (CentOS 5) is 2.6.6. Other versions are also installed: * 3.6-anaconda3: version 3.6 installed with Anaconda 3 * 3.4 * 2.7 N.B.: to see the exact version installed, you can load the module and then run
qd QD library The QD library is an extended-precision (double-double and quad-double) computation library, available in C++ and Fortran 90. * Website: * Installed versions: * qd 2.3.17 Usage Selecting the version To select the desired version, use the
r R R is a free software environment for statistical computing and graphics. * Website: * Versions: * 3.0.3 * 3.1.1 (default version) * 3.1.2 * 3.2.2 * 3.2.3-gnu * 3.4.2 * 3.5.3-gnu+mkl * 4.0.0-gnu+mkl * 4.1.0-gnu+mkl
saga SAGA-GIS SAGA's first objective is to give (geo-)scientists an effective but easily learnable platform for the implementation of geoscientific methods. This is achieved by SAGA's unique Application Programming Interface (API). The second is to make these methods accessible in a user-friendly way, which is done first of all through its Graphical User Interface (
siesta Siesta Siesta (Spanish Initiative for Electronic Simulations with Thousands of Atoms) is both a method and its computer program implementation, to perform electronic structure calculations and ab initio molecular dynamics simulations of molecules and solids.
sparta SPARTA SPARTA is an acronym for Stochastic PArallel Rarefied-gas Time-accurate Analyzer. SPARTA is a parallel DSMC code for performing simulations of low-density gases in 2d or 3d. Particles advect through a hierarchical Cartesian grid that overlays the simulation box. The grid is used to group particles by grid cell for purposes of performing collisions and chemistry. Physical objects with triangulated surfaces can be embedded in the grid, creating cut and split grid cells. The grid is also u…
structure Structure The program structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most of the commonly-used genetic markers, including SNPs, microsa…
vasp VASP The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling, e.g. electronic structure calculations and quantum-mechanical molecular dynamics, from first principles. VASP is not free software: VASP is copyright-protected software, the copyright owner and sole distributor worldwide is the University of Vienna, Austria, represented by Prof. Dr. Georg KRESSE at the Faculty of Physics. It is necessary to have an appropriate license to use V…
vmd VMD VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting. This visualization software is not a computation tool. However, it can use the NVidia Tesla graphics cards of the GPU nodes. In that case it must be run under Slurm, requesting access to the GPU nodes, in interactive mode.
xcrysden XCrysDen XCrySDen is a crystalline and molecular structure visualisation program aiming at display of isosurfaces and contours, which can be superimposed on crystalline structures and interactively rotated and manipulated. * Website: * Installed versions:
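The MKL entries in the table above note that the library is multi-threaded by default and can interact with threading in your own code. One common way to keep the thread count in line with a job's allocation is sketched here, assuming the job requests cores with Slurm's `--cpus-per-task` option (which sets `SLURM_CPUS_PER_TASK`). The fallback to a single thread is an illustrative choice, not a CALI recommendation.

```shell
#!/bin/bash
# Sketch: align MKL/OpenMP thread counts with the cores granted by Slurm.
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is requested;
# outside a job (or without that option) fall back to a single thread.
threads="${SLURM_CPUS_PER_TASK:-1}"

export OMP_NUM_THREADS="$threads"   # threads for OpenMP regions in your own code
export MKL_NUM_THREADS="$threads"   # threads used inside MKL routines

echo "Using $MKL_NUM_THREADS thread(s) for MKL"
```

If your code already spawns its own threads around MKL calls, setting `MKL_NUM_THREADS=1` instead avoids oversubscribing the allocated cores.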

Tools on the front-end node

Some tools dedicated to pre- or post-processing and to visualization are installed only on the CALI front-end node and are not intended to run under the Slurm job manager. Here is a partial list:

Libraries

Apart from the software itself and the main development tools (compilers, MKL and OpenMPI), here is a non-exhaustive list of libraries available on the cluster.

Page Description
boost BOOST library Boost is a set of C++ libraries. Ten of them have already been integrated into the C++ standard library (C++11), and others have been proposed for standardization in C++17. * Website:
gmp GMPlib GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating-point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface.
mpfr MPFR The MPFR library is a C library for multiple-precision floating-point computations with correct rounding. * Website: * Installed versions: * 3.1.2: compiled with the GNU and Intel suites. Usage Selecting the version To select the desired version, use the