...

If you need instructions or scripts for VASP with plugins, please let us know at pc2-support@uni-paderborn.de.

Performance Recommendations

...

  • Since most VASP calculations are strongly limited by memory bandwidth, we recommend allocating nodes exclusively.

  • Some VASP calculations (e.g. GW and BSE) can be very memory hungry. If the memory of normal compute nodes (192 GB on Noctua 1) is not sufficient, then use the largemem-nodes on Noctua 2 (1 TB).
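As an illustration of the exclusive-node recommendation, a Slurm job script might look like the following sketch; the module name and time limit are placeholders and should be adapted to the actual installation:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --exclusive          # whole node: VASP is limited by memory bandwidth
#SBATCH --time=04:00:00      # placeholder time limit, adapt to your calculation

# The module name below is a placeholder; check `module avail vasp`.
module load vasp
srun vasp_std
```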

...

| Builder | Compiler | BLAS | MPI | Threading (OMP) | GPU support | Runtime for CuC_vdW benchmark on a single node* |
| --- | --- | --- | --- | --- | --- | --- |
| 6.3.2-builder-for-foss-2022a | gcc-11.3.0 | OpenBLAS-0.3.20 | OpenMPI 4.1.4 | yes | no | 256.8 s (8 threads per rank, NCORE=4) |
| 6.3.2-builder-for-foss-2022a_mkl | gcc-11.3.0 | MKL-2022.2.1 | OpenMPI 4.1.4 | yes | no | 249.9 s (8 threads per rank, NCORE=4) |
| 6.3.2-builder-for-foss-2022a_aocl | gcc-11.3.0 | AOCL-4.0.0 | OpenMPI 4.1.4 | yes | no | 244.3 s (8 threads per rank, NCORE=4) |
| 6.3.2-builder-for-intel-2022.00 | intel-2022.1.0 | MKL-2022.1.0 | IntelMPI 2021.6 | yes | no | 253.0 s (8 threads per rank, NCORE=4) |
| 6.3.2-builder-for-nvidia (coming soon) | NVHPC-22.11 | OpenBLAS-0.3.20 | OpenMPI 3.1.5 (CUDA aware) | yes | yes | |

* Reference runtime on AMD 7763: 296 s (source: https://www.hpc.co.jp/library/wp-content/uploads/sites/8/2022/08/NVIDIA-VASP-updates-July-2022.pdf, page 13)

...

  • The product of the number of MPI ranks per node (ntasks-per-node) and the number of OpenMP threads per rank (cpus-per-task) is the number of allocated CPU cores per node. For a full node of Noctua 2 this product should equal 128, because there are 128 physical CPU cores per node.
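The rank/thread arithmetic above can be checked directly in the shell; the 16 × 8 layout below is one example layout (32 × 4 or 64 × 2 fill a node equally well):

```shell
# One full Noctua 2 node has 128 physical CPU cores, so
# ntasks-per-node * cpus-per-task should equal 128.
ntasks_per_node=16   # MPI ranks per node
cpus_per_task=8      # OpenMP threads per rank
echo $(( ntasks_per_node * cpus_per_task ))   # prints 128
```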

General Performance Recommendations

...

  • Since most VASP calculations are strongly limited by memory bandwidth, we recommend allocating nodes exclusively.

  • Some VASP calculations (e.g. GW and BSE) can be very memory hungry. If the memory of normal compute nodes (256 GB on Noctua 2) is not sufficient, then use the largemem-nodes (1 TB) or the hugemem-node (2 TB).

  • Adapt parameters like NPAR, NCORE, and KPAR to your calculation, see also https://www.vasp.at/wiki/index.php/Performance_issues,_try_NCORE,_KPAR,_ALGO,_LREAL

  • For VASP >=6: The newer VASP versions support OpenMP threading in addition to MPI parallelization. You can speed up your calculation by trying different numbers of threads per MPI rank, e.g. 32 MPI ranks per node and 4 threads per MPI rank.
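Putting the last two points together, a hybrid MPI+OpenMP job on one Noctua 2 node (32 ranks × 4 threads = 128 cores) might be sketched as follows; the module name is a placeholder, and NCORE/KPAR in the INCAR should be adapted to the chosen layout:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32   # MPI ranks per node
#SBATCH --cpus-per-task=4      # OpenMP threads per rank; 32 * 4 = 128 cores
#SBATCH --exclusive

# Let VASP's OpenMP threading use the cores reserved per rank.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# Module name is a placeholder; check `module avail vasp`.
module load vasp
srun vasp_std
```

Trying a few combinations (e.g. 64 × 2, 32 × 4, 16 × 8) on a short test run is usually the quickest way to find the best layout for a given system.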

...