Otus FPGA Pilot Phase
We are currently in the pilot phase for the FPGA partition. In this stage, we have acquired a small number of FPGA boards and are evaluating their suitability for our requirements. More FPGAs from different vendors will be added in the near future.
- 1 Hardware Setup
- 2 Software Stacks
- 2.1 SLASH VRT (V80 Runtime)
- 2.1.1 Example Applications
- 2.1.1.1 Load required VRT module
- 2.1.1.2 Get example repository
- 2.1.1.3 Build examples for emulation
- 2.1.2 Hardware Execution
- 2.1.2.1 Allocate node with V80 card
- 2.1.2.2 Load the required modules
- 2.1.2.3 Check FPGA status
- 2.1.2.4 Execute example in hardware
- 2.1.2.5 (Optional) Repeat steps for simulation and hardware build
- 3 Early Access and Troubleshooting
Hardware Setup
The Otus FPGA pilot phase currently provides 3x Alveo V80 FPGAs (see official product page).
An overview of the current hardware setup is given in the following table:
| Hostname | Accelerator in PCIe slot 61:00.0 | Accelerator in PCIe slot 71:00.0 | Accelerator in PCIe slot 91:00.0 | Note |
|---|---|---|---|---|
| fpga1611 | Alveo U280 FPGA | Alveo U55C | Alveo V80 FPGA | Currently configured with SLASH VRT version 0.1.0, based on AVED 25.1 |
| fpga1612 | Alveo V80 FPGA | Alveo U55C | Nvidia A40 GPU | Currently configured with SLASH VRT version 0.1.0, based on AVED 25.1 |
| fpga1613 | Nvidia A40 GPU | Nvidia A40 GPU | Alveo V80 FPGA | Currently configured with VRT based on AVED 24.1 |
Please note:
- You only need to allocate one of these nodes for hardware execution. You can compile, emulate, simulate and synthesize on any other Otus node.
- The Nvidia A40 GPUs were used for thermal evaluations of the nodes and will be removed in the future.
- The other FPGA cards (Alveo U55C and Alveo U280) are for test purposes and will be removed in the future.
Software Stacks
Currently, we test the following software stacks in our pilot phase.
SLASH VRT (V80 Runtime)
The V80 cards can be used via VRT, which provides an abstraction similar to Xilinx XRT.
- See the official repository for technical details: https://github.com/Xilinx/SLASH
- The latest code base from the dev branch is used.
The VRT tool flow is available on any Otus node after loading the following modules:
module reset
module load fpga
module load fpga/xilinx/vrt/0.2
Example Applications
VRT comes with example applications (see https://github.com/Xilinx/SLASH/tree/dev/examples) to test the basic functionality.
In order to run them you can do the following:
Load required VRT module
module reset
module load fpga
module load fpga/xilinx/vrt/0.2
- loads the main VRT tool flow and all required dependencies
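After loading the modules, it can help to confirm that the commands used in the following steps are actually on the PATH. The snippet below is a minimal sketch: `missing_tools` is a hypothetical helper name, and the tool list (git, cmake, ninja) is only inferred from the commands used on this page, not an official list of what the module provides.

```shell
#!/bin/sh
# missing_tools: print every command name from the argument list that cannot
# be found in the current environment (hypothetical helper, not part of VRT).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumption: these are the tools needed for the steps on this page.
missing=$(missing_tools git cmake ninja)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found after module load:" $missing
else
    echo "All listed tools are available."
fi
```

If a tool is reported as missing, repeating `module reset` and the two `module load` commands in a fresh shell is a reasonable first step.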
Get example repository
git clone https://github.com/Xilinx/SLASH
cd SLASH
export SLASHBASE=`pwd`
git switch dev
git submodule update --init --recursive --remote
Build examples for emulation
cd $SLASHBASE/examples/00_axilite
# Prepare build directory. Remove -G Ninja for othe
cmake -B build -S . -G Ninja -DSLASH_USE_REPO=OFF
# Build host application
cmake --build build
# Build FPGA artefacts
cmake --build build --target hls # compile HLS kernels
cmake --build build --target axilite_emu # link into an emulation vrtbin
# Run
./build/00_axilite 61:00 build/axilite_emu.vbin
# Expected output
VRT Version: 0.1.0
EMU_EXEC: [startup] bound REP socket to tcp://*:5555 (verbose=off)
EMU_EXEC: [manifest] manifest loaded schema=ok kernels=2 regs=14 callable=2 autostart=0 fetch.scalar=6
Generating data...
Time taken for waits: 254 us
Expected: 1542.261475
Got: 1542.261475
Absolute error: 0 (effective tolerance 0.001542261452, abs 0.001000000047, rel 9.999999975e-07)
Test passed!
EMU_EXEC: [exit] received exit; fast_exit=false
INFO [HLS SIM]: The maximum depth reached by any hls::stream() instance in the design is 1024
- 00_axilite: example to test linking and AXI-Lite control
- emu: everything runs on the CPU. See below for simulation and hardware build.
- 61:00: Usually the PCIe ID (BDF) parameter of the V80 card to use. Can be any reasonable value in emulation.
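When the example is run from a script or batch job, the pass/fail line of the output can be checked automatically. The following is a minimal sketch, assuming the program prints the exact line `Test passed!` as in the expected output above. The helper name `run_passed` and the log file name `emu_run.log` are hypothetical.

```shell
#!/bin/sh
# run_passed: read program output on stdin and succeed only if it contains
# a line that is exactly "Test passed!" (hypothetical helper).
run_passed() {
    grep -qx 'Test passed!'
}

# Demo on a shortened copy of the expected output, so the snippet runs
# without building the example first.
cat > emu_run.log <<'EOF'
Generating data...
Expected: 1542.261475
Got: 1542.261475
Test passed!
EOF

if run_passed < emu_run.log; then
    echo "emulation run OK"
else
    echo "emulation run FAILED"
fi
```

For a real run, the log would be produced by redirecting the output of `./build/00_axilite 61:00 build/axilite_emu.vbin` into the log file instead of using the here-document.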
Hardware Execution
If you want to execute a design in hardware, you need to allocate an FPGA node that has at least one V80 card attached.
Allocate node with V80 card
You can use this command to get node fpga1612 for one hour:
srun --partition=fpga -t 01:00:00 -w fpga1612 --pty bash
Load the required modules
module reset
module load fpga
module load fpga/xilinx/vrt/0.2
Check FPGA status
Get the current status of the FPGAs via ami_tool. For example, on fpga1612:
$ ami_tool overview
AMI
-------------------------------------------------------------
Version | 2.4.0 (0)
Branch
Hash | 839b4ad6a75433ab6a43f9f95790a61c2b85bb16
Hash Date | 20250821
Driver Version | 2.4.0 (0)
BDF | Device | UUID | AMC | State
----------------------------------------------------------------------------------------
61:00.0 | ALVEO V80 PQ | 3bc1e0b4c0e8c0c1179791b59272d560 | 2.4.0 (0) | READY
- the most important output is the State column. Only FPGAs in READY state should be used.
- you can use the BDF 61:00 to identify one card
- use ami_tool -h for all options
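The State check can also be scripted. The following is a minimal sketch that lists the BDFs of all cards reported as READY. It assumes the five-column table layout shown above (BDF | Device | UUID | AMC | State), and `ready_bdfs` is a hypothetical helper name, not part of ami_tool.

```shell
#!/bin/sh
# ready_bdfs: read `ami_tool overview` output on stdin and print the BDF of
# every card whose State column is READY (hypothetical helper).
ready_bdfs() {
    awk -F'|' '
        NF == 5 {                      # only rows of the device table
            bdf = $1; state = $5
            gsub(/[ \t]/, "", bdf)     # strip padding around the fields
            gsub(/[ \t]/, "", state)
            if (state == "READY") print bdf
        }'
}

# Demo on a captured sample so the snippet runs without an FPGA node.
sample='BDF     | Device       | UUID                             | AMC       | State
----------------------------------------------------------------------------------------
61:00.0 | ALVEO V80 PQ | 3bc1e0b4c0e8c0c1179791b59272d560 | 2.4.0 (0) | READY'

printf '%s\n' "$sample" | ready_bdfs
```

On an allocated FPGA node, the same function can be fed directly with `ami_tool overview | ready_bdfs`.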
Execute example in hardware
To speed up the process and avoid spending resources on unnecessary synthesis, we have pre-synthesized the design for example 00_axilite. Copy the vrtbin file into your build directory:
cp /opt/software/FPGA/Xilinx/VRT/vrt_0.2/examples/00_axilite/axilite_hw.vbin .
You can run the design in hardware with
./00_axilite 61:00 ./axilite_hw.vbin
- uses device 61:00 and the pre-synthesized design 00_axilite_hw.vrtbin
- if you do not have the host code, see the emulation build and execution steps above
If you want to synthesize a design yourself, see the description below.
(Optional) Repeat steps for simulation and hardware build
Hardware simulation can be performed with
# Simulation
cmake --build build --target axilite_sim
# execute example
cd build
./00_axilite 61:00 00_axilite_sim.vrtbin
To synthesize a design instead of using the pre-synthesized version, you can use a job script like the following:
#!/bin/sh
# synthesis_script.sh
#SBATCH -t 24:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH -A <your_project_acronym>
#SBATCH -p normal
module reset
module load fpga
module load fpga/xilinx/vrt/0.2
# Hardware
cmake --build build --target axilite_hw
Then submit synthesis_script.sh to the Slurm workload manager:
sbatch ./synthesis_script.sh
Afterwards, you can use the generated 00_axilite_hw.vrtbin as described above for hardware execution.
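Because synthesis can run for many hours, it may be convenient to queue the hardware run right away so that it starts only after synthesis has finished successfully. The following is a minimal sketch using the sbatch options `--parsable` and `--dependency=afterok`. The job script `run_hw.sh` and the function name `submit_chain` are hypothetical, and the partition, node and time limit are simply taken from the srun example above.

```shell
#!/bin/sh
# Sketch: chain the hardware run behind the synthesis job.
# Assumption: run_hw.sh is your own job script that loads the modules and
# executes the example in hardware; it is not provided on the system.
SBATCH=${SBATCH:-sbatch}   # overridable so the logic can be tried without Slurm

submit_chain() {
    # --parsable makes sbatch print only the job ID (optionally "id;cluster").
    synth_id=$("$SBATCH" --parsable ./synthesis_script.sh) || return 1
    synth_id=${synth_id%%;*}
    # afterok: start the second job only if synthesis exited successfully.
    "$SBATCH" --parsable --dependency="afterok:$synth_id" \
        --partition=fpga -w fpga1612 -t 01:00:00 ./run_hw.sh
}

if command -v "$SBATCH" >/dev/null 2>&1; then
    submit_chain
else
    echo "sbatch not found; run this on a cluster login node"
fi
```

If synthesis fails, the dependent job never starts and can be removed from the queue with scancel.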
Early Access and Troubleshooting
If you are interested in early access to the FPGA partition during the pilot phase, or have issues or questions about the current setup, please contact us via email.