...
If you need instructions or scripts for VASP with plugins, please let us know at pc2-support@uni-paderborn.de.
Performance Recommendations
...
Since most VASP calculations are very demanding on the memory bandwidth, we recommend using nodes exclusively.
Some VASP calculations (e.g. GW and BSE) can be very memory hungry. If the memory of normal compute nodes (192 GB on Noctua 1) is not sufficient, then use the largemem-nodes on Noctua 2 (1 TB).
...
Builder | Compiler | BLAS | MPI | Threading (OMP) | GPU support | Runtime for CuC_vdW benchmark on a single node |
---|---|---|---|---|---|---|
6.3.2-builder-for-foss-2022a | gcc-11.3.0 | OpenBLAS-0.3.20 | OpenMPI 4.1.4 | yes | no | 256.8 s (8 threads per rank, NCORE=4) |
6.3.2-builder-for-foss-2022a_mkl | gcc-11.3.0 | MKL-2022.2.1 | OpenMPI 4.1.4 | yes | no | 249.9 s (8 threads per rank, NCORE=4) |
6.3.2-builder-for-foss-2022a_aocl | gcc-11.3.0 | AOCL-4.0.0 | OpenMPI 4.1.4 | yes | no | 244.3 s (8 threads per rank, NCORE=4) |
6.3.2-builder-for-intel-2022.00 | intel-2022.1.0 | MKL-2022.1.0 | IntelMPI 2021.6 | yes | no | 253.9 s (8 threads per rank, NCORE=4) |
6.3.2-builder-for-nvidia (coming soon) | NVHPC-22.11 | OpenBLAS-0.3.20 | OpenMPI 3.1.5 (CUDA aware) | yes | yes | |
...
The product of the number of MPI ranks per node (ntasks-per-node) and the number of OpenMP threads per rank (cpus-per-task) is the number of allocated CPU cores per node. For a full node of Noctua 2, this product should equal 128 because each node has 128 physical CPU cores.
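The rule above can be sketched as a small helper that lists every combination of ranks and threads filling a full node. This is only an illustration: the function name `rank_thread_splits` is hypothetical, and the printed options assume the usual Slurm spelling `--ntasks-per-node` and `--cpus-per-task`.

```python
def rank_thread_splits(cores_per_node=128):
    """Return all (MPI ranks per node, OpenMP threads per rank) pairs
    whose product equals the number of physical cores per node."""
    return [
        (ranks, cores_per_node // ranks)
        for ranks in range(1, cores_per_node + 1)
        if cores_per_node % ranks == 0
    ]


if __name__ == "__main__":
    # 128 is the core count of a Noctua 2 node as stated in the text above.
    for ranks, threads in rank_thread_splits(128):
        print(f"--ntasks-per-node={ranks} --cpus-per-task={threads}")
```

Which of these splits is fastest depends on the calculation, so the list is a set of candidates to benchmark, not a recommendation.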
General Performance Recommendations
...
Since most VASP calculations are very demanding on the memory bandwidth, we recommend using nodes exclusively.
Some VASP calculations (e.g. GW and BSE) can be very memory hungry. If the memory of normal compute nodes (256 GB on Noctua 2) is not sufficient, then use the largemem-nodes (1 TB) or the hugemem-node (2 TB).
Adapt parameters like NPAR, NCORE, and KPAR to your calculation. See also https://www.vasp.at/wiki/index.php/Performance_issues,_try_NCORE,_KPAR,_ALGO,_LREAL
For VASP >=6: The newer VASP versions support OpenMP threading in addition to MPI parallelization. You can speed up your calculation by trying different numbers of threads per MPI rank, e.g. 32 MPI ranks per node and 4 threads per MPI rank.
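When trying different combinations of MPI ranks, KPAR, and NCORE, it can help to check beforehand that the numbers divide evenly. The sketch below assumes the relation NPAR = ranks / (KPAR × NCORE) as we understand it from the VASP wiki; please verify it against the page linked above. The function name `vasp_layout` is hypothetical and not part of VASP or of any cluster tool.

```python
def vasp_layout(total_ranks, kpar=1, ncore=1):
    """Check that KPAR and NCORE fit the MPI rank count and return NPAR.

    Assumption (to be verified in the VASP wiki):
    NPAR = total_ranks / (KPAR * NCORE).
    """
    if total_ranks < 1 or kpar < 1 or ncore < 1:
        raise ValueError("total_ranks, kpar and ncore must be positive")
    if total_ranks % kpar != 0:
        raise ValueError(f"{total_ranks} ranks are not divisible by KPAR={kpar}")
    ranks_per_kgroup = total_ranks // kpar
    if ranks_per_kgroup % ncore != 0:
        raise ValueError(
            f"{ranks_per_kgroup} ranks per k-point group are not divisible by NCORE={ncore}"
        )
    return ranks_per_kgroup // ncore


if __name__ == "__main__":
    # One full Noctua 2 node with 8 threads per rank gives 128 / 8 = 16 ranks.
    ranks = 128 // 8
    print("NPAR =", vasp_layout(ranks, kpar=1, ncore=4))
```

The example mirrors the benchmark setting of 8 threads per rank and NCORE=4 on a single node; other settings may suit other systems better.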
...