NVIDIA A100 40GB with NVLINK
Specification
Property | Value |
---|---|
GPUs | 4x NVIDIA A100 Tensor Core GPUs |
Chip | GA100 |
Transistors | 54 billion |
Manufacturing Process | 7 nm |
Die Size | 826 mm² |
FP32 ALUs | 6,912 |
INT32 ALUs | 6,912 |
SMs | 108 |
Tensor Cores | 432 |
INT4 Performance | 1,248 TOPS / 2,496 TOPS* |
INT8 Performance | 624 TOPS / 1,248 TOPS* |
FP16 Performance | 312 TFLOPS / 624 TFLOPS* |
BFloat16 Performance | 312 TFLOPS / 624 TFLOPS* |
FP32 Performance | 19.5 TFLOPS |
TF32 Tensor Core Performance | 156 TFLOPS / 312 TFLOPS* |
FP64 Performance | 9.7 TFLOPS |
FP64 Tensor Core Performance | 19.5 TFLOPS |
GPU Memory | 40 GB |
GPU Memory Type | HBM2 |
Memory Interface | 5,120-bit |
Memory Bandwidth | 1,555 GB/s |
Max Thermal Design Power | 400 W |
*With sparsity
Topology
GPU topology matters for efficient GPU usage in some applications (e.g. Gaussian): binding a process to the CPU cores and NUMA node local to its GPU avoids slower cross-socket memory traffic. The table below lists the GPUs in one GPU compute node together with their CPU affinity and NUMA affinity.
GPUs | CPU affinity | NUMA affinity |
---|---|---|
NVIDIA A100 GPU 0 | CPU cores 48-63 | NUMA node 3 |
NVIDIA A100 GPU 1 | CPU cores 16-31 | NUMA node 1 |
NVIDIA A100 GPU 2 | CPU cores 112-127 | NUMA node 7 |
NVIDIA A100 GPU 3 | CPU cores 80-95 | NUMA node 5 |
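The affinities above can be applied programmatically. The following is a minimal sketch (not part of the cluster's official tooling): it encodes the table in a dictionary and uses Linux's `os.sched_setaffinity` to pin the calling process to the CPU cores local to a given GPU. The function name `pin_to_gpu` is hypothetical.

```python
import os

# CPU and NUMA affinity per GPU on one compute node (values from the table above).
GPU_AFFINITY = {
    0: {"cpus": range(48, 64),   "numa": 3},
    1: {"cpus": range(16, 32),   "numa": 1},
    2: {"cpus": range(112, 128), "numa": 7},
    3: {"cpus": range(80, 96),   "numa": 5},
}

def pin_to_gpu(gpu: int) -> set:
    """Restrict the current process to the CPU cores local to `gpu`.

    Returns the set of cores that were requested. Linux-only:
    os.sched_setaffinity is not available on all platforms.
    """
    cpus = set(GPU_AFFINITY[gpu]["cpus"])
    os.sched_setaffinity(0, cpus)  # 0 = the calling process
    return cpus
```

On this node the same effect can be achieved from the shell with `numactl`, e.g. binding a job for GPU 1 to NUMA node 1 with `numactl --cpunodebind=1 --membind=1`; `nvidia-smi topo -m` prints the topology the table was derived from.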
The figure below shows the GPUs and their NUMA affinity.