GPU jobs

Single-GPU

All GPU nodes on Deucalion (gnx[501-533]) are non-exclusive, meaning that you can allocate (and consequently be billed for) any number of GPUs you request. If your code can only use one GPU at a time, you can start from this batch-script template:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --time=4:00:00
#SBATCH --partition=normal-a100-40
#SBATCH --account=<slurm_account> ## should end in G


ml  OpenMPI/5.0.3-GCC-13.3.0 CUDA/11.8.0 NCCL/2.20.5-GCCcore-13.3.0-CUDA-12.4.0

srun -n1 code input0 
For every GPU you request you must also request 32 CPUs (as a rule of thumb, set --ntasks and --gpus to the same value and --cpus-per-task to 32). If you request more than 32 CPUs per GPU, you will be billed for more than one GPU: for example, if you request 96 CPUs (3 x 32 CPUs), you will be billed as if you were using 3 GPUs.
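The billing rule above can be sketched as a small helper. Note that `billed_gpus` is a hypothetical function for illustration only, not a tool that exists on the cluster:

```shell
#!/bin/bash
# Illustrative only: billed_gpus is a hypothetical helper, not a site tool.
# Billing rule from the text: 32 CPUs count as one GPU, so you are billed
# for whichever is larger -- the GPUs you request or ceil(cpus / 32).
billed_gpus() {
  local gpus=$1 cpus=$2
  local by_cpus=$(( (cpus + 31) / 32 ))   # ceil(cpus / 32)
  if (( by_cpus > gpus )); then echo "$by_cpus"; else echo "$gpus"; fi
}

billed_gpus 1 32   # 1 GPU, 32 CPUs  -> billed for 1 GPU
billed_gpus 1 96   # 1 GPU, 96 CPUs  -> billed for 3 GPUs
```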

Multi-GPU in single node

Even though some Python codes automatically grab every available GPU (without requiring the explicit use of srun), it is useful to look at the output of nvidia-smi to verify that you are running at least one process per GPU. The following jobscript runs the HPL benchmark on 4 GPUs using srun (single node). If you do not set -N 1, Slurm can allocate your GPUs on different nodes.

#!/bin/bash

#SBATCH -A <slurm_account>
#SBATCH -t 00:30:00
#SBATCH -p normal-a100-40
#SBATCH -N 1
#SBATCH --gpus=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=32
#SBATCH --output=results/%j.out

ml  OpenMPI/5.0.3-GCC-13.3.0 CUDA/11.8.0 NCCL/2.20.5-GCCcore-13.3.0-CUDA-12.4.0

srun  hpl.sh --dat sample-dat/HPL-4GPUs-40.dat
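To inspect the nvidia-smi output of a job that is already running, you can attach an extra step to its allocation with srun. This is a sketch assuming your Slurm version supports the --overlap flag; `<jobid>` is a placeholder for the ID reported by squeue:

```shell
# Attach a step to a running job's allocation and print its GPU usage;
# --overlap lets this step share resources with the running steps.
srun --jobid=<jobid> --overlap nvidia-smi
```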

Multi-node

For multi-node jobs, you should check that every GPU on every node is being used (otherwise you are wasting the allocated resources). The following jobscript runs the HPL benchmark on 16 GPUs across 4 nodes:

#!/bin/bash

#SBATCH -A <slurm_account>
#SBATCH -t 00:30:00
#SBATCH -p normal-a100-40
#SBATCH -N 4
#SBATCH --gpus=16
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=32
#SBATCH --output=results/%j.out

ml  OpenMPI/5.0.3-GCC-13.3.0 CUDA/11.8.0 NCCL/2.20.5-GCCcore-13.3.0-CUDA-12.4.0


srun  hpl.sh --dat sample-dat/HPL-16GPUs-40.dat
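A quick way to verify the task-to-GPU mapping of a multi-node job like the one above is to print each task's view of the allocation before the real srun line. This is a sketch; the exact CUDA_VISIBLE_DEVICES values depend on your Slurm configuration:

```shell
# One line per task: which node it landed on and which GPUs it can see.
srun bash -c 'echo "node=$(hostname) task=${SLURM_PROCID} gpus=${CUDA_VISIBLE_DEVICES:-none}"'
```

With -N 4 and --ntasks-per-node=4 you should see four lines per node, each naming a GPU; a node whose tasks report no GPUs indicates a misconfigured job.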