Using NVIDIA GPUs with Julia
Basic Tutorial
Step 1: Get an interactive session on a GPU node
srun -A <YOUR_PROJECT> -N 1 -n 128 --exclusive --gres=gpu:a100:4 -p gpu -t 1:00:00 --pty bash
Step 2: Load the JuliaHPC module
ml lang
ml JuliaHPC
Step 3: Install CUDA.jl in a local Julia project environment
mkdir jlcuda
cd jlcuda
julia --project=.
Once the Julia REPL pops up:
As of April 2022, you should see an output like this:
Note that CUDA.jl is automatically using the local CUDA installation (because the JuliaHPC exports the necessary environment variables) and all four NVIDIA A100 GPUs are detected.
Step 4: Run a matrix multiplication on one of the NVIDIA A100 GPUs
Notice that the multiplication is much faster on the GPU.
CUDA-aware OpenMPI
Allows you to send GPU arrays (i.e. CuArray
from CUDA.jl) via point-to-point and collective MPI operations.
Example
Proceed as in the basic CUDA tutorial above but also ] add MPI
, i.e. install MPI.jl next to CUDA.jl. After using MPI
you can then use MPI.has_cuda()
to check whether the used MPI has been compiled with CUDA support. (The easiest way to get a CUDA-aware MPI that works with Julia is to use the JuliaHPC module and export OMPI_MCA_opal_cuda_support=1
.)
You should now be able to run the following code (if stored in a file cuda_mpi_test.jl
) from the shell via mpirun -n 5 julia --project cuda_mpi_test.jl
.
Code
Output
Useful References
An Introduction to CUDA-Aware MPI (NVIDIA blog post)
Source of CUDA-aware MPI code example