GPU jobs

Single-GPU

All GPU nodes on Deucalion (gnx[501-533]) are non-exclusive, meaning that you can allocate (and consequently be billed for) any number of GPUs you request. If your code can only use one GPU at a time, you can start from this batch-script template:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --time=4:00:00
#SBATCH --partition=normal-a100-40
#SBATCH --account=<slurm_account> ## should end in G


ml  OpenMPI/5.0.3-GCC-13.3.0 CUDA/11.8.0 NCCL/2.20.5-GCCcore-13.3.0-CUDA-12.4.0

srun -n1 code input0 
For every GPU you request you must also request 32 CPUs (as a rule of thumb, set --ntasks and --gpus to the same value and --cpus-per-task to 32). If you request more than 32 CPUs per GPU, you will be billed for more than one GPU: for example, if you request 96 CPUs (3 x 32 CPUs), you will be billed as if you were using 3 GPUs.
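The billing rule above can be sketched as a small helper. Note that `billed_gpus` is a hypothetical function for illustration only, not a tool that exists on the cluster:

```shell
#!/bin/bash
# Illustrative only: billed_gpus is a hypothetical helper, not a site tool.
# Billing rule from the text: 32 CPUs count as one GPU, so you are billed
# for whichever is larger -- the GPUs you request or ceil(cpus / 32).
billed_gpus() {
  local gpus=$1 cpus=$2
  local by_cpus=$(( (cpus + 31) / 32 ))   # ceil(cpus / 32)
  if (( by_cpus > gpus )); then echo "$by_cpus"; else echo "$gpus"; fi
}

billed_gpus 1 32   # 1 GPU, 32 CPUs  -> billed for 1 GPU
billed_gpus 1 96   # 1 GPU, 96 CPUs  -> billed for 3 GPUs
```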

Multi-GPU in single node

Even though some Python codes automatically grab every available GPU (without requiring the explicit use of srun), it is useful to look at the output of nvidia-smi to verify that you are running at least one process per GPU. The following jobscript runs the HPL benchmark on 4 GPUs using srun (single node). If you do not set -N 1, Slurm can allocate your GPUs on different nodes.

#!/bin/bash

#SBATCH -A <slurm_account>
#SBATCH -t 00:30:00
#SBATCH -p normal-a100-40
#SBATCH -N 1
#SBATCH --gpus=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=32
#SBATCH --output=results/%j.out

ml  OpenMPI/5.0.3-GCC-13.3.0 CUDA/11.8.0 NCCL/2.20.5-GCCcore-13.3.0-CUDA-12.4.0

srun  hpl.sh --dat sample-dat/HPL-4GPUs-40.dat
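To inspect the nvidia-smi output of a job that is already running, you can attach an extra step to its allocation with srun. This is a sketch assuming your Slurm version supports the --overlap flag; `<jobid>` is a placeholder for the ID reported by squeue:

```shell
# Attach a step to a running job's allocation and print its GPU usage;
# --overlap lets this step share resources with the running steps.
srun --jobid=<jobid> --overlap nvidia-smi
```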

Multi-node

For multi-node jobs, you should check that every GPU on every node is being used (otherwise you are wasting the allocated resources). The following jobscript runs the HPL benchmark on 16 GPUs across 4 nodes:

#!/bin/bash

#SBATCH -A <slurm_account>
#SBATCH -t 00:30:00
#SBATCH -p normal-a100-40
#SBATCH -N 4
#SBATCH --gpus=16
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=32
#SBATCH --output=results/%j.out

ml  OpenMPI/5.0.3-GCC-13.3.0 CUDA/11.8.0 NCCL/2.20.5-GCCcore-13.3.0-CUDA-12.4.0


srun  hpl.sh --dat sample-dat/HPL-16GPUs-40.dat
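A quick way to verify the task-to-GPU mapping of a multi-node job like the one above is to print each task's view of the allocation before the real srun line. This is a sketch; the exact CUDA_VISIBLE_DEVICES values depend on your Slurm configuration:

```shell
# One line per task: which node it landed on and which GPUs it can see.
srun bash -c 'echo "node=$(hostname) task=${SLURM_PROCID} gpus=${CUDA_VISIBLE_DEVICES:-none}"'
```

With -N 4 and --ntasks-per-node=4 you should see four lines per node, each naming a GPU; a node whose tasks report no GPUs indicates a misconfigured job.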