...

If you need instructions or scripts for VASP with plugins, please let us know at pc2-support@uni-paderborn.de.

Performance Recommendations

...

  • Since most VASP calculations are strongly limited by memory bandwidth, we recommend allocating nodes exclusively.

  • Some VASP calculations (e.g. GW and BSE) can be very memory hungry. If the memory of normal compute nodes (192 GB on Noctua 1) is not sufficient, then use the largemem-nodes on Noctua 2 (1 TB).
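As an illustration of the exclusive-node recommendation, a Slurm job script might look like the following sketch; the module name and time limit are placeholders and should be adapted to the actual installation:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --exclusive          # whole node: VASP is limited by memory bandwidth
#SBATCH --time=04:00:00      # placeholder time limit, adapt to your calculation

# The module name below is a placeholder; check `module avail vasp`.
module load vasp
srun vasp_std
```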

...

| Builder | Compiler | BLAS | MPI | Threading (OMP) | GPU support | Runtime for CuC_vdW benchmark on a single node* |
| --- | --- | --- | --- | --- | --- | --- |
| 6.3.2-builder-for-foss-2022a | gcc-11.3.0 | OpenBLAS-0.3.20 | OpenMPI 4.1.4 | yes | no | 256.8 s (8 threads per rank, NCORE=4) |
| 6.3.2-builder-for-foss-2022a_mkl | gcc-11.3.0 | MKL-2022.2.1 | OpenMPI 4.1.4 | yes | no | 249.9 s (8 threads per rank, NCORE=4) |
| 6.3.2-builder-for-foss-2022a_aocl | gcc-11.3.0 | AOCL-4.0.0 | OpenMPI 4.1.4 | yes | no | 244.3 s (8 threads per rank, NCORE=4) |
| 6.3.2-builder-for-intel-2022.00 | intel-2022.1.0 | MKL-2022.1.0 | IntelMPI 2021.6 | yes | no | 253.0 s (8 threads per rank, NCORE=4) |
| 6.3.2-builder-for-nvidia (coming soon) | NVHPC-22.11 | OpenBLAS-0.3.20 | OpenMPI 3.1.5 (CUDA aware) | yes | yes | |

* Reference runtime on AMD 7763: 296 s (source: https://www.hpc.co.jp/library/wp-content/uploads/sites/8/2022/08/NVIDIA-VASP-updates-July-2022.pdf, page 13)

...

  • The product of the number of MPI ranks per node (ntasks-per-node) and the number of OpenMP threads per rank (cpus-per-task) is the number of allocated CPU cores per node. For a full node of Noctua 2 this product should equal 128, because there are 128 physical CPU cores per node.
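The rank/thread arithmetic above can be checked directly in the shell; the 16 × 8 layout below is one example layout (32 × 4 or 64 × 2 fill a node equally well):

```shell
# One full Noctua 2 node has 128 physical CPU cores, so
# ntasks-per-node * cpus-per-task should equal 128.
ntasks_per_node=16   # MPI ranks per node
cpus_per_task=8      # OpenMP threads per rank
echo $(( ntasks_per_node * cpus_per_task ))   # prints 128
```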

General Performance Recommendations

...

  • Since most VASP calculations are strongly limited by memory bandwidth, we recommend allocating nodes exclusively.

  • Some VASP calculations (e.g. GW and BSE) can be very memory hungry. If the memory of normal compute nodes (256 GB on Noctua 2) is not sufficient, then use the largemem-nodes (1 TB) or the hugemem-node (2 TB).

  • Adapt parameters like NPAR, NCORE, and KPAR to your calculation, see also https://www.vasp.at/wiki/index.php/Performance_issues,_try_NCORE,_KPAR,_ALGO,_LREAL

  • For VASP >=6: The newer VASP versions support OpenMP threading in addition to MPI parallelization. You can speed up your calculation by trying different numbers of threads per MPI rank, e.g. 32 MPI ranks per node and 4 threads per MPI rank.
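Putting the last two points together, a hybrid MPI+OpenMP job on one Noctua 2 node (32 ranks × 4 threads = 128 cores) might be sketched as follows; the module name is a placeholder, and NCORE/KPAR in the INCAR should be adapted to the chosen layout:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32   # MPI ranks per node
#SBATCH --cpus-per-task=4      # OpenMP threads per rank; 32 * 4 = 128 cores
#SBATCH --exclusive

# Let VASP's OpenMP threading use the cores reserved per rank.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# Module name is a placeholder; check `module avail vasp`.
module load vasp
srun vasp_std
```

Trying a few combinations (e.g. 64 × 2, 32 × 4, 16 × 8) on a short test run is usually the quickest way to find the best layout for a given system.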

...