Differences

The differences between two revisions of the page are shown below.

--- logiciels:gromacs [2014/10/15 11:42]
fabre21 [Example Slurm batch file on CPU]
+++ logiciels:gromacs [2019/01/28 11:27] (Current version)
fabrep03 [Performance]
@@ Line 9: / Line 9: @@
     * 5.0.2 with OpenMP and MPI support
     * 5.0.2 with OpenMP and NVIDIA GPU support (:!: compiled with Intel Composer 2013)
+    * 5.0.4, in single and double precision, with or without MPI
+    * 2016 with OpenMP and MPI support
+    * 2018.4 with OpenMP and MPI support
+    * 2019 with OpenMP, MPI, newer SIMD instruction sets, and GPU support with CUDA 10.0
 ===== Usage =====
-See the user manuals at http://www.gromacs.org/Documentation/Manual
+See the user manuals at http://manual.gromacs.org/documentation/current/index.html
-For a discussion of parallelization: http://www.gromacs.org/Documentation/Acceleration_and_parallelization
 ==== Version selection ====
 To select the desired version, use the [[..:modules]]
 For example:
-  module load gromacs/5.0.2-mpi
+  module load gromacs/2019
-:!: If you use the GPU version, you must run
-  module unload intel/composer
-  module load intel/composer/xe_2013_sp1.2.144
-  module load gromacs/5.0.2-gpu
+Only one module file exists. The executable name depends on the desired mode of operation:
+  * ''gmx'' for the single-precision version without MPI
+  * ''mdrun_mpi'' for the MPI version on the ''normal'' partition and similar ones
+  * ''mdrun_avx2'' for the MPI version on the ''cluster-e5v4'' partition and similar ones
+  * ''mdrun_gpu'' for the GPU version for the K20 cards (untested)
+  * ''mdrun_gpu_avx2'' for the GPU version for the K40 (untested) and GTX1080TI cards
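The mapping in the list above could be captured in a small shell helper, so that a batch file does not hard-code the executable name. This is only a sketch: the function name `select_mdrun` and the target keywords `serial`, `k20`, `k40` and `gtx1080ti` are invented for this illustration, and only the two partition names given in the list are handled (not the "similar" partitions).

```shell
#!/bin/bash
# Hypothetical helper: print the GROMACS executable name for a given target.
# "normal" and "cluster-e5v4" are the partition names from the list above;
# "serial", "k20", "k40" and "gtx1080ti" are labels made up for this sketch.
select_mdrun() {
  case "$1" in
    serial)        echo "gmx" ;;
    normal)        echo "mdrun_mpi" ;;
    cluster-e5v4)  echo "mdrun_avx2" ;;
    k20)           echo "mdrun_gpu" ;;
    k40|gtx1080ti) echo "mdrun_gpu_avx2" ;;
    *)             echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# Example: choose the executable for the "normal" partition.
mdrun_exe=$(select_mdrun normal)
echo "executable for partition normal: $mdrun_exe"
```

Inside a Slurm job, the partition name is available in `$SLURM_JOB_PARTITION`, so `select_mdrun "$SLURM_JOB_PARTITION"` would work for the two partitions handled here.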
-===== Benchmarks =====
+===== Performance =====
-The following benchmarks were run on a simulation of DHFR (a small protein), which is commonly used for benchmarking. The system contains about 23000 atoms.
+The latest version (2019 or later) should be preferred, because its performance is much better than that of older versions.
-There are two ways to parallelize:
-  * with MPI, which performs well for spreading a job across several nodes
-  * with OpenMP, which can perform well for sharing cores within one MPI process on a single node.
-Here is a table of the performance measured with Gromacs 5.0.2 and the Intel compilers:
-  nb_cores   ntasks(MPI)   cpus-per-task(OpenMP)   GPU   Performance(ns/day)
-                       2                       1     0                10.955
-                       4                       1     0                14.817
-                       8                       1     0                21.747
-                      16                       1     0                64.457
-                      32                       1     0               110.067
-                      64                       1     0               156.836
-                     128                       1     0               268.282
-                     256                       1     0               255.346  # here we go below the limit of 100 atoms per core. It should work better with a larger system
-                     128                       2     0               291.507  # for a large number of cores, it becomes worthwhile to reduce the number of MPI processes in favor of OpenMP
-                       1                       8     1                82.490
-                       2                       4     2                82.217  # performance with 2 GPUs is limited by the number of cores
-                       2                       8     2               118.781
-                       1                      16     1               112.530  # no benefit in using both CPUs for 1 GPU, but the limit set by the number of CPUs is visible here as well
+I ran many benchmarks, and they yield some interesting lessons. The best approach is to read the presentation of the benchmark results. Write to me at [[gabin.fabre@unilim.fr]] to discuss them.
+{{:logiciels:gromacs_2019_benchmarks.pdf|}}
 ===== Example Slurm batch files =====
-==== Example Slurm batch file on CPU ====
+==== On CPU ====
   #!/bin/bash
-  #SBATCH --partition=cluster
+  #SBATCH --partition=normal
-  #SBATCH --qos=cluster
+  #SBATCH --ntasks=16
-  #SBATCH --ntasks=32
   #SBATCH --cpus-per-task=1
   #SBATCH --threads-per-core=1
   #SBATCH --mem-per-cpu=1000
-  #SBATCH --time=10-00:00:00
+  #SBATCH --time=2-00:00:00
+  #SBATCH --nodes=1-4
-  module load gromacs
+  # the --nodes option sets the minimum and maximum number of nodes. It is good to set the maximum to ntasks/16 so that the job is not spread over too many nodes.
+  module load gromacs
+  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
+  mdrun="srun --kill-on-bad-exit=1 mdrun_mpi" #without the option, the job hangs and is not terminated when it fails.
+  grompp="gmx grompp" # for gromacs version 5 and later
+  if [ -r "state.cpt" ] ; then
+  # if state.cpt exists in the current directory, that means we are in scratch and the calculation has run already
+  restart="TRUE"
+  else
+  restart="FALSE"
+  fi
+  # restart="FALSE" # you can force a restart or not with this line
+  if [ "$restart" == "TRUE" ] ; then
+  MIN=false # we assume min step is done the first day
+  #WORKDIR has to be where the data is, i.e. in scratch / inside the results.* folder. So submit this batch file from there.
+  WORKDIR="$PWD"
+  sleep 30 # to make sure all processes from the previous job are killed
+  restartoptions="-cpi state.cpt -append"
+  else # options for the beginning of the run
+  MIN=true
+  simname=`basename $PWD`
+  WORKDIR="$HOME/scratch/gromacs/run.$SLURM_JOB_ID.gromacs.$simname"
+  #
+  # Directory used to store the results
+  #
+  mkdir -p $WORKDIR || {
+  echo "ERROR Creating the working directory"
+  exit 1
+  }
+  cp * $WORKDIR
+  ln -sfn $WORKDIR "results.$SLURM_JOB_ID" # create symbolic link to scratch
+  cd $WORKDIR
+  restartoptions=""
+  fi
+  cd $WORKDIR
+  if [ $MIN = "true" ]; then
+  $grompp -f em.mdp -c md.gro -n md.ndx -p md.top -o em.tpr
+  $mdrun -s em.tpr -o em.trr -c em.gro -e em.edr -g em.log
+  $grompp -f pr.mdp -c em.gro -n md.ndx -p md.top -r em.gro -o pr.tpr
+  $mdrun -s pr.tpr -o pr.trr -c pr.gro -e pr.edr -g pr.log
+  mv pr.gro 0.gro
+  fi
+  #use the following line if you want to prolong a simulation that crashed or terminated normally.
+  #if you just want to finish it after a crash, comment it.
+  #adjust the -until option to the total amount of ps you want to have.
+  #gmx convert-tpr -s md.tpr -o md.tpr -until 1000000
+  # tricky part: resubmit this same script; with the afternotok dependency it only runs if the current job ends in a failed state (crash, timeout...).
+  sbatch -d afternotok:$SLURM_JOB_ID $0
+  if [ "$restart" == "FALSE" ] ; then
+  $grompp -f md.mdp -c 0.gro -n md.ndx -p md.top -o md.tpr
+  fi
+  #the -cpi option will use your checkpoint to restart the calculation and continue writing to your files.
+  $mdrun -s md.tpr -o md.trr -x md.xtc -c md_out.gro -e md.edr -g md.log $restartoptions
+  rm -f md.trr # remove the big file (>1GB) after the calculation
+  rm -f \#* # remove the backup files (the backslash keeps the shell from treating "#" as the start of a comment)
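The restart test in the script above can be isolated as a small function, which makes the decision easy to try out without running GROMACS. This is a minimal sketch: `restart_options` is a hypothetical name, and the function only reproduces the `state.cpt` check and the `-cpi state.cpt -append` options used in the script.

```shell
#!/bin/bash
# Hypothetical helper: print the mdrun options needed to continue a run
# when a readable checkpoint file exists in the given directory.
# Prints nothing when there is no checkpoint (fresh run).
restart_options() {
  local dir="${1:-.}"
  if [ -r "$dir/state.cpt" ]; then
    echo "-cpi state.cpt -append"
  fi
}

# Demonstration in a throwaway directory.
demo_dir=$(mktemp -d)
echo "fresh run options:   '$(restart_options "$demo_dir")'"
touch "$demo_dir/state.cpt"          # pretend a previous job left a checkpoint
echo "restart run options: '$(restart_options "$demo_dir")'"
rm -rf "$demo_dir"
```

In a batch file, the result would be appended to the mdrun command line in the same way as `$restartoptions` in the script above.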
+===== Example Slurm batch file on GPU =====
+==== On 1 GPU ====
+  #!/bin/bash
+  #SBATCH --partition=gpu
+  #SBATCH --qos=gpu
+  #SBATCH --gres=gpu:1
+  #SBATCH --ntasks=1
+  #SBATCH --cpus-per-task=8
+  #SBATCH --threads-per-core=1
+  #SBATCH --mem-per-cpu=1000
+  #SBATCH --time=7-00:00:00
+  module load gromacs/5.0.2-gpu
   export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
@@ Line 72: / Line 139: @@
   MIN=true
   grompp=grompp
-  mdrun="srun mdrun_mpi"
+  mdrun="mdrun"
+  # Directory used to store the results
+  mkdir $WORKDIR || {
+          echo "ERROR Creating the working directory"
+          exit 1
+  }
+  cp * $WORKDIR
+  cd $WORKDIR
+  if [ $MIN = "true" ]; then
+    $grompp -f em.mdp -c md.gro -n md.ndx -p md.top -o em.tpr
+    $mdrun -s em.tpr -o em.trr -c em.gro -e em.edr -g em.log
+    $grompp -f pr.mdp -c em.gro -n md.ndx -p md.top -r em.gro -o pr.tpr
+    $mdrun -s pr.tpr -o pr.trr -c pr.gro -e pr.edr -g pr.log
+    mv pr.gro 0.gro
+  fi
+  $grompp -f md.mdp -c 0.gro -n md.ndx -p md.top -o md.tpr
+  $mdrun -s md.tpr -o md.trr -x md.xtc -c md_out.gro -e md.edr -g md.log
+  rm -f md.trr # remove the uncompressed trajectory after the calculation
+  rm -f \#* # remove backup files
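The backslash in `rm -f \#*` matters: GROMACS names its backup files with a leading `#` (for example `#md.log.1#`), and an unescaped `#` at the start of a word begins a shell comment. The sketch below illustrates the difference in a temporary directory; the file names are made up for the demonstration.

```shell
#!/bin/bash
# Work in a throwaway directory so that nothing real is deleted.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch md.log '#md.log.1#' '#md.edr.1#'

rm -f #* everything after the hash is a comment, so rm gets no file names
count_unescaped=$(ls | wc -l)

rm -f \#*   # the backslash makes "#" literal, so the glob matches the backups
count_escaped=$(ls | wc -l)

echo "files left after unescaped rm: $count_unescaped"
echo "files left after escaped rm:   $count_escaped"

cd / && rm -rf "$demo_dir"
```

Quoting the character (`rm -f '#'*`) has the same effect as the backslash.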
+==== On 2 GPUs ====
+  #!/bin/bash
+  #SBATCH --partition=gpu
+  #SBATCH --qos=gpu
+  #SBATCH --gres=gpu:2
+  #SBATCH --ntasks=2
+  #SBATCH --cpus-per-task=8
+  #SBATCH --threads-per-core=1
+  #SBATCH --mem-per-cpu=1000
+  #SBATCH --time=7-00:00:00
+  module load gromacs/5.0.2-gpu
+  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
+  WORKDIR="$HOME/scratch/$SLURM_JOB_ID"
+  MIN=true
+  grompp=grompp
+  mdrun="mdrun"
   # Directory used to store the results
@@ Line 99: / Line 212: @@
   rm -f md.trr # remove the uncompressed trajectory after the calculation
   rm -f \#* # remove backup files
-===== Example Slurm batch file on GPU =====
 ~~DISCUSSION~~

Calcul en Limousin
