GPU jobs
Single-GPU
All GPU nodes in Deucalion (gnx[501-533]) are non-exclusive, meaning that you can allocate (and consequently be billed for) any number of GPUs you ask for. If your code can only use one GPU at a time, you can start from this template for a batch script:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gpus=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --time=4:00:00
#SBATCH --partition normal-a100-40
#SBATCH --account=<slurm_account>   # should end in G
ml OpenMPI/5.0.3-GCC-13.3.0 CUDA/11.8.0 NCCL/2.20.5-GCCcore-13.3.0-CUDA-12.4.0
srun -n1 code input0   # replace 'code input0' with your executable and its arguments
As a rule of thumb, keep --ntasks and --gpus at the same value and set --cpus-per-task to 32. If you ask for more than 32 CPUs you will be billed for more than one GPU at a time. For example, if you ask for 96 CPUs (32 CPUs x 3) you will be billed as if you were using 3 GPUs.
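As a quick sanity check, the lines below can be added to the batch script just before the srun line. This is a minimal sketch that only relies on standard Slurm and CUDA environment variables and on nvidia-smi, and prints what the job was actually given:

# Print the resources Slurm actually granted (sketch)
echo "Node:          ${SLURMD_NODENAME}"
echo "GPUs visible:  ${CUDA_VISIBLE_DEVICES:-none}"
echo "Tasks:         ${SLURM_NTASKS}"
echo "CPUs per task: ${SLURM_CPUS_PER_TASK}"
nvidia-smi --query-gpu=index,name --format=csv,noheader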
Multi-GPU in a single node
Even though some Python codes automatically grab every GPU available (without requiring the explicit use of srun), it is useful to look at the output of nvidia-smi to make sure that you are running at least one process per GPU (a sketch for sampling nvidia-smi during the run follows the jobscript below). The following jobscript runs the HPL benchmark on 4 GPUs using srun (single node). If you do not set -N 1, Slurm may allocate your GPUs on different nodes.
#!/bin/bash
#SBATCH -A <slurm_account>
#SBATCH -t 00:30:00
#SBATCH -p normal-a100-40
#SBATCH -N 1
#SBATCH --gpus=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=32
#SBATCH --output=results/%j.out
ml OpenMPI/5.0.3-GCC-13.3.0 CUDA/11.8.0 NCCL/2.20.5-GCCcore-13.3.0-CUDA-12.4.0
srun hpl.sh --dat sample-dat/HPL-4GPUs-40.dat
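To check that every GPU actually has work, one option is to sample nvidia-smi in the background around the srun call above. The lines below are a minimal sketch; the 60-second interval and the gpu-usage-${SLURM_JOB_ID}.log file name are arbitrary choices:

# Start a background sampler before the existing srun line (sketch)
( while true; do
      nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader \
          >> gpu-usage-${SLURM_JOB_ID}.log
      sleep 60
  done ) &
MONITOR_PID=$!

# ... existing srun line goes here ...

# Stop the sampler once srun returns
kill ${MONITOR_PID}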
Multi-node
For multi-node jobs you should make sure that every GPU in every node is being used (otherwise you are misusing the resources; see the sketch after the jobscript). The following jobscript runs the HPL benchmark on 16 GPUs across 4 nodes:
#!/bin/bash
#SBATCH -A <slurm_account>
#SBATCH -t 00:30:00
#SBATCH -p normal-a100-40
#SBATCH -N 4
#SBATCH --gpus=16
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=32
#SBATCH --output=results/%j.out
ml OpenMPI/5.0.3-GCC-13.3.0 CUDA/11.8.0 NCCL/2.20.5-GCCcore-13.3.0-CUDA-12.4.0
srun hpl.sh --dat sample-dat/HPL-16GPUs-40.dat
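A quick way to confirm that the allocation really spans 4 nodes with 4 GPUs each is to list the devices on every node before launching HPL. The line below is a minimal sketch (one task per node, using standard Slurm options and nvidia-smi -L) that can be placed just before the srun hpl.sh line; utilisation during the run itself can be sampled with the background nvidia-smi loop from the single-node section, launched through a second srun step (which may need --overlap to share resources with the HPL step):

# One task per node: print the hostname and the GPUs that node exposes (sketch)
srun --ntasks=${SLURM_NNODES} --ntasks-per-node=1 \
     bash -c 'echo "== $(hostname) =="; nvidia-smi -L'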