Using HACC Nodes
In addition to the main FPGA installation in Noctua 2, the cluster contains three nodes provided by AMD as part of the HACC initiative. Each node consists of:
- 2x Xilinx Alveo U55C cards
- 2x Xilinx VCK5000 Versal development cards
- 4x AMD Instinct MI210 GPU cards
In contrast to the main fpga partition, the FPGA boards are configured with fixed shells (not selectable by the user), and XRT version 2.16 is installed.
Allocate HACC Node
The three HACC nodes are contained in a separate hacc partition. To be able to submit jobs to this partition, your compute project needs to be enabled to use the HACC resources. Please send a brief email to pc2-support@uni-paderborn.de stating your compute project and which resources you would like to use.
The HACC nodes are handed out exclusively, i.e., sharing a node between multiple jobs is not possible. Because the shell is not user-selectable, nodes can be allocated without any constraint:
[tester@n2login1 ~]$ srun -p hacc -A $YOUR_PROJECT_ID -t 00:10:00 --pty bash
[...]
# Show available FPGAs
[tester@n2hacc03 ~]$ /opt/xilinx/xrt/bin/xbutil examine
System Configuration
OS Name : Linux
Release : 4.18.0-477.51.1.el8_8.x86_64
Version : #1 SMP Fri Mar 1 11:21:44 EST 2024
Machine : x86_64
CPU Cores : 128
Memory : 515287 MB
Distribution : Red Hat Enterprise Linux 8.8 (Ootpa)
GLIBC : 2.28
Model : AS -4124GS-TNR
XRT
Version : 2.16.204
Branch : 2023.2
Hash : fa4c0045003fed0acea4593788dce5ef6d0b66ee
Hash Date : 2023-10-12 06:45:18
XOCL : 2.16.204, fa4c0045003fed0acea4593788dce5ef6d0b66ee
XCLMGMT : 2.16.204, fa4c0045003fed0acea4593788dce5ef6d0b66ee
Devices present
BDF : Shell Logic UUID Device ID Device Ready*
---------------------------------------------------------------------------------------------------------------------------
[0000:81:00.1] : xilinx_u55c_gen3x16_xdma_base_3 97088961-FEAE-DA91-52A2-1D9DFD63CCEF user(inst=134) Yes
[0000:a1:00.1] : xilinx_vck5000_gen4x8_qdma_base_2 05DCA096-76CB-730B-8D19-EC1192FBAE3F user(inst=135) Yes
[0000:c1:00.1] : xilinx_u55c_gen3x16_xdma_base_3 97088961-FEAE-DA91-52A2-1D9DFD63CCEF user(inst=133) Yes
[0000:e1:00.1] : xilinx_vck5000_gen4x8_qdma_base_2 05DCA096-76CB-730B-8D19-EC1192FBAE3F user(inst=132) Yes
* Devices that are not ready will have reduced functionality when using XRT tools
# Show available GPUs.
[tester@n2hacc03 ~]$ /usr/bin/rocm-smi
========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
(DID, GUID) (Edge) (Avg) (Mem, Compute, ID)
====================================================================================================================
0 10 0x740f, 12261 44.0°C 38.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
1 11 0x740f, 42047 41.0°C 42.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
2 9 0x740f, 57300 41.0°C 42.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
3 8 0x740f, 1997 38.0°C 42.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
====================================================================================================================
=============================================== End of ROCm SMI Log ================================================
In the example output you can see:
- xbutil examine: the four FPGA boards, along with their user BDFs. Make sure to select the correct device type in your host code.
- rocm-smi: the four MI210 GPUs.
Software Modules
FPGAs
Depending on which kind of board you are targeting, you need to load the matching FPGA shell module. Note that by default the module for Noctua 2's Alveo U280 boards is loaded. On the HACC nodes, you need to swap it for the one matching your target board:
module load fpga
module load xilinx/xrt/2.16
# Use one of the following commands to swap the shell module against the one you need
module swap xilinx/u280 xilinx/u55c
# OR
module swap xilinx/u280 xilinx/vck5000
Additional Notes on the Use of AI Engines
The VCK5000 boards contain AI Engines that can be programmed using Vitis. To use the AIE compiler and the AIE simulator, additional modules and licenses are required:
Vitis 23.2, which is automatically loaded with XRT 2.16, is currently not able to perform hardware synthesis for the VCK5000 board. If you target hardware synthesis, you may load the module xilinx/xrt/2.15 instead, or swap Vitis for an older version:
ml swap xilinx/vitis/23.2 xilinx/vitis/23.1
In addition to the modules listed above, you may need to load the Graphviz module:
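For example (the exact Graphviz module name and version on the system are assumptions; check first with module avail):
module avail Graphviz   # list the Graphviz module(s) provided on the cluster
module load Graphviz    # module name/version assumed; adjust to what "module avail" shows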
The AIE compiler and simulator require separate software licenses that are not included in Vitis. If you need help in acquiring these licenses from AMD, please get in touch with us.
MI210 GPUs
ROCm (Radeon Open Compute)
Load modules
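A minimal sketch; the ROCm module name and version on Noctua 2 are assumptions, so verify with module avail first:
# ROCm provides hipcc and rocm-smi; module name/version assumed
module load rocm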
Get ROCm examples
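A sketch, assuming the examples are taken from AMD's public rocm-examples repository on GitHub:
# Clone the ROCm examples repository and enter it
git clone https://github.com/ROCm/rocm-examples.git
cd rocm-examples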
Build the rocm-examples/HIP-Basic/bandwidth example
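A sketch assuming the per-example Makefile (the repository also provides a CMake build at the top level):
# Enter the example directory and build it with the HIP compiler from the ROCm module
cd HIP-Basic/bandwidth
make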
Run example on all four GPU cards
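A sketch; the binary name hip_bandwidth is an assumption based on the build above:
# Run the benchmark once per GPU by exposing a single device at a time
for gpu in 0 1 2 3; do
    HIP_VISIBLE_DEVICES=$gpu ./hip_bandwidth
done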
AdaptiveCpp
Load modules
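A minimal sketch, again with assumed module names; check module avail for the exact names (a ROCm module may also be needed for the MI210 backend):
# AdaptiveCpp compiler and runtime; module name assumed
module load adaptivecpp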
Get examples (here oneAPI-examples)
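A sketch; the repository URL and the path to the vector-add sample inside it are assumptions:
# Clone the oneAPI samples and switch to the vector-add example
git clone https://github.com/oneapi-src/oneAPI-samples.git
cd oneAPI-samples/DirectProgramming/C++SYCL/DenseLinearAlgebra/vector-add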
Build the vector-add example with the acpp compiler
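A sketch assuming the buffers variant of the sample; without --acpp-targets, the default generic JIT flow described in the note below is used:
# Compile the SYCL source with AdaptiveCpp's compiler driver (source file name assumed)
acpp -O2 -o vector-add src/vector-add-buffers.cpp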
Please note: if we do not explicitly specify --acpp-targets, we are using the generic LLVM JIT compiler. This is AdaptiveCpp's default, most portable, and usually most performant compilation flow; the generated binary is usable across various backend devices.
Run the same binary on various backend devices:
On an AMD MI210 (by allocating a node in the hacc partition), see the sketch below. Note that the Running on device output in line 16 is AMD Instinct MI210.
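A sketch of how the run could look (project ID and time limit are placeholders; the example output is not reproduced here):
srun -p hacc -A $YOUR_PROJECT_ID -t 00:10:00 ./vector-add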
On an NVIDIA A100 GPU (by allocating a single A100 GPU in the gpu partition), see the sketch below. Note that the Running on device output in line 4 is NVIDIA A100-SXM4-40GB.
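A sketch; the exact GPU request syntax for the gpu partition is an assumption, so check the cluster documentation:
srun -p gpu -A $YOUR_PROJECT_ID --gres=gpu:a100:1 -t 00:10:00 ./vector-add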
On the host CPU (an AMD EPYC 7763 processor in the normal partition), see the sketch below. Note that the Running on device output in line 6 is AdaptiveCpp OpenMP host device, indicating the host CPU system.
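A sketch of a CPU-only run; AdaptiveCpp falls back to the OpenMP host device when no accelerator is present:
srun -p normal -A $YOUR_PROJECT_ID -t 00:10:00 ./vector-add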