Otus FPGA Pilot Phase

We are currently in the pilot phase for the FPGA partition. In this stage, we have acquired a small number of FPGA boards and are evaluating their suitability for our requirements. More FPGAs from different vendors will be added in the near future.

Hardware Setup

Currently we have three FPGA nodes (fpga1611, fpga1612 and fpga1613) in our FPGA pilot phase of Otus.

An overview of the current hardware setup is given in the following table. The three middle columns list the accelerator installed in each PCIe slot:

| Hostname | 61:00.0         | 71:00.0       | 91:00.0        | Note                                                                  |
|----------|-----------------|---------------|----------------|-----------------------------------------------------------------------|
| fpga1611 | Alveo U280 FPGA | Alveo U55C    | Alveo V80 FPGA | Currently configured with SLASH VRT version 0.1.0, based on AVED 25.1 |
| fpga1612 | Alveo V80 FPGA  | Alveo U55C    | Nvidia A40 GPU | Currently configured with SLASH VRT version 0.1.0, based on AVED 25.1 |
| fpga1613 | Nvidia A40 GPU  | Nvidia A40 GPU | Alveo V80 FPGA | Currently configured with VRT based on AVED 24.1                      |

Please note:

  • You only need to allocate one of these nodes for hardware execution. You can compile, emulate, simulate and synthesize on any other Otus node.

  • The Nvidia A40 GPUs were used for thermal evaluations of the nodes and will be removed in the future.

  • The other FPGA cards (Alveo U55C and Alveo U280) are for test purposes and will be removed in the future.
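Because the V80 card sits in a different PCIe slot depending on the node, the BDF to use differs per host. A minimal sketch of a lookup helper, transcribed from the table above; the function name v80_bdf is hypothetical, and the mapping is only valid for the pilot setup described on this page:

```shell
# Hypothetical helper: map a pilot-phase hostname to the BDF of its V80 card.
# The mapping is copied from the hardware table and must be updated by hand
# whenever the hardware setup changes.
v80_bdf() {
    case "$1" in
        fpga1611) echo "91:00.0" ;;
        fpga1612) echo "61:00.0" ;;
        fpga1613) echo "91:00.0" ;;
        *) echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# Example lookup for fpga1612.
v80_bdf fpga1612   # prints 61:00.0
```

On a node other than fpga1612, the BDF shown in the examples further down (61:00) would have to be replaced accordingly.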

Software Stacks

Currently, we test the following software stacks in our pilot phase.

SLASH VRT (V80 Runtime)

The V80 cards can be used via VRT, which provides an abstraction similar to Xilinx XRT.

The VRT tool flow is available via modules on any Otus node:

module reset
module load fpga
module load fpga/xilinx/vrt/0.2

Example Applications

VRT comes with example applications (see https://github.com/Xilinx/SLASH/tree/dev/examples) to test the basic functionality.

In order to run them you can do the following:

Load required VRT module

module reset
module load fpga
module load fpga/xilinx/vrt/0.2
  • loads the main VRT tool flow and all required dependencies

Get example repository

git clone https://github.com/Xilinx/SLASH
cd SLASH
export SLASHBASE=`pwd`
git switch dev
git submodule update --init --recursive --remote

Build examples for emulation

cd $SLASHBASE/examples/00_axilite

# Prepare build directory. Remove -G Ninja to use another CMake generator
cmake -B build -S . -G Ninja -DSLASH_USE_REPO=OFF

# Build host application
cmake --build build

# Build FPGA artefacts
cmake --build build --target hls          # compile HLS kernels
cmake --build build --target axilite_emu  # link into an emulation vrtbin

# Run
./build/00_axilite 61:00 build/axilite_emu.vbin

# Expected output
VRT Version: 0.1.0
EMU_EXEC: [startup] bound REP socket to tcp://*:5555 (verbose=off)
EMU_EXEC: [manifest] manifest loaded schema=ok kernels=2 regs=14 callable=2 autostart=0 fetch.scalar=6
Generating data...
Time taken for waits: 254 us
Expected: 1542.261475
Got: 1542.261475
Absolute error: 0 (effective tolerance 0.001542261452, abs 0.001000000047, rel 9.999999975e-07)
Test passed!
EMU_EXEC: [exit] received exit; fast_exit=false
INFO [HLS SIM]: The maximum depth reached by any hls::stream() instance in the design is 1024
  • 00_axilite: example to test linking and AXI-Lite control

  • emu: everything runs on the CPU. See below for simulation and hardware build.

  • 61:00: Usually the PCIe ID (BDF) parameter of the V80 card to use. Can be any reasonable value in emulation.
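For scripted runs, success can be detected from the "Test passed!" line in the expected output above. A minimal sketch; the function name test_passed is hypothetical, and it assumes the application prints that exact text on success, as in the log above. Whether the example also signals failure through its exit code is not covered here.

```shell
# Hypothetical helper: succeed only if the text on stdin contains the
# "Test passed!" marker shown in the expected output.
test_passed() {
    grep -qF 'Test passed!'
}

# Possible use on an Otus node, after building the emulation target:
#   ./build/00_axilite 61:00 build/axilite_emu.vbin | test_passed && echo "emulation OK"
```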

Hardware Execution

If you want to execute a design in hardware, you need to allocate an FPGA node that has at least one V80 card attached.

Allocate node with V80 card

You can use this command to get node fpga1612 for one hour:

srun --partition=fpga -t 01:00:00 -w fpga1612 --pty bash

Load the required modules

module reset
module load fpga
module load fpga/xilinx/vrt/0.2

Check FPGA status

Get the current status of the FPGAs via ami_tool. For example, on fpga1612:

$ ami_tool overview
AMI
-------------------------------------------------------------
Version | 2.4.0 (0)
Branch
Hash | 839b4ad6a75433ab6a43f9f95790a61c2b85bb16
Hash Date | 20250821
Driver Version | 2.4.0 (0)

BDF | Device | UUID | AMC | State
----------------------------------------------------------------------------------------
61:00.0 | ALVEO V80 PQ | 3bc1e0b4c0e8c0c1179791b59272d560 | 2.4.0 (0) | READY
  • the most important output is the State column. Only FPGAs in READY state should be used.

  • you can use the BDF 61:00 to identify one card

  • Use ami_tool -h for all options
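The READY check can also be scripted. The sketch below filters the device rows of the overview for cards in READY state and shortens a BDF such as 61:00.0 to the 61:00 form used on this page. Both function names are hypothetical, and the parsing assumes the pipe-separated row layout shown above (BDF | Device | UUID | AMC | State), which may differ in other AMI versions.

```shell
# Hypothetical helper: print the BDF of every card reported as READY.
# Reads `ami_tool overview` output on stdin and only looks at rows with
# exactly five pipe-separated fields (the device table).
ready_bdfs() {
    awk -F'|' '
        NF == 5 {
            bdf = $1; state = $5
            gsub(/[ \t]/, "", bdf)
            gsub(/[ \t]/, "", state)
            if (state == "READY") print bdf
        }'
}

# Hypothetical helper: drop the PCIe function suffix, e.g. 61:00.0 -> 61:00.
bdf_short() {
    echo "${1%.*}"
}

# Possible use on an allocated FPGA node:
#   ami_tool overview | ready_bdfs
```

The header row is skipped automatically because its last field is "State" rather than "READY".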

Execute example in hardware

To speed up the process and avoid spending resources on unnecessary synthesis, we have pre-synthesized the design for example 00_axilite. Copy the vrtbin file into your build directory:

cp /opt/software/FPGA/Xilinx/VRT/vrt_0.2/examples/00_axilite/axilite_hw.vbin .

You can run the design in hardware with

./00_axilite 61:00 ./axilite_hw.vbin
VRT Version: 0.1.0
[2026-05-04 10:21:06.020] [DEBUG] void vrt::Vrtbin::extract() : Extracting vrtbin: axilite_hw.vbin
[2026-05-04 10:21:06.234] [DEBUG] void vrt::Vrtbin::copy(const std::string&, const std::string&) : Copying file /pc2/users/d/deffel/.cache/SLASH/vrt/vrtbin_61_00/system_map.xml to /pc2/users/d/deffel/.cache/SLASH/vrt/metadata_61_00/system_map.xml
[2026-05-04 10:21:06.239] [DEBUG] void vrt::Vrtbin::copy(const std::string&, const std::string&) : Copying file /pc2/users/d/deffel/.cache/SLASH/vrt/vrtbin_61_00/report_utilization_axilite_hw.xml to /pc2/users/d/deffel/.cache/SLASH/vrt/metadata_61_00/report_utilization.xml
[2026-05-04 10:21:06.249] [INFO ] void vrt::impl::Device::programDevice() : Programming PDI via vrtd design writer /pc2/users/d/deffel/.cache/SLASH/vrt/vrtbin_61_00/images/top_i_slash_slash_axilite_hw_inst_0_partial.pdi
[2026-05-04 10:21:07.470] [DEBUG] void vrt::Buffer<T>::initAllocate() [with T = float] : Allocated buffer final_space_bytes=4096 phys_addr=0x4020000000
Generating data...
[2026-05-04 10:21:07.471] [DEBUG] void vrt::Kernel::writeBatch() : Kernel increment_0, reg at offset 0x10, value: 0x400
[2026-05-04 10:21:07.471] [DEBUG] void vrt::Kernel::writeBatch() : Kernel increment_0, reg at offset 0x18, value: 0x20000000
[2026-05-04 10:21:07.471] [DEBUG] void vrt::Kernel::writeBatch() : Kernel increment_0, reg at offset 0x1c, value: 0x40
[2026-05-04 10:21:07.471] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device 61:00 kernel: increment_0 at offset: 0 value: 0x1
[2026-05-04 10:21:07.471] [DEBUG] void vrt::Kernel::writeBatch() : Kernel accumulate_0, reg at offset 0x10, value: 0x400
[2026-05-04 10:21:07.471] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device 61:00 kernel: accumulate_0 at offset: 0 value: 0x1
Time taken for waits: 12 us
[2026-05-04 10:21:07.471] [DEBUG] uint32_t vrt::Kernel::read(uint32_t) : Reading from device 61:00 kernel: accumulate_0 at offset: 0x1c
[2026-05-04 10:21:07.471] [DEBUG] uint32_t vrt::Kernel::read(uint32_t) : Reading from device 61:00 kernel: accumulate_0 at offset: 0x18
Expected: 1540.868286
Got: 1540.867432
Absolute error: 0.0008544921875 (effective tolerance 0.00154086831, abs 0.001000000047, rel 9.999999975e-07)
Test passed!
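Unlike the emulation run, the hardware result differs slightly from the expected value and still passes. The printed numbers are consistent with an effective tolerance of max(abs, rel × |expected|): here 1e-6 × 1540.868286 ≈ 0.00154, which is larger than the absolute tolerance of 0.001. The sketch below reproduces that check with awk. The formula is inferred from the log output, not taken from the SLASH sources, and the function name within_tolerance is hypothetical.

```shell
# Sketch of the pass criterion suggested by the log lines above:
#   pass if |got - expected| <= max(abs_tol, rel_tol * |expected|)
# Assumption: this formula is inferred from the printed numbers only.
# usage: within_tolerance EXPECTED GOT [ABS_TOL] [REL_TOL]
within_tolerance() {
    awk -v e="$1" -v g="$2" -v a="${3:-0.001}" -v r="${4:-0.000001}" 'BEGIN {
        err = g - e; if (err < 0) err = -err   # absolute error
        mag = e;     if (mag < 0) mag = -mag   # |expected|
        tol = r * mag; if (a > tol) tol = a    # effective tolerance
        exit !(err <= tol)                     # exit status 0 means pass
    }'
}

# Values from the hardware run shown above.
if within_tolerance 1540.868286 1540.867432; then
    echo "within tolerance"
else
    echo "outside tolerance"
fi
```

With the values from the log, the absolute error of about 0.00085 is below the effective tolerance of about 0.00154, so the check succeeds.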

If you want to synthesize a design yourself, see the description below.

(Optional) Repeat steps for simulation and hardware build

Hardware simulation can be performed with

# Simulation
cmake --build build --target axilite_sim

# execute example
cd build
./00_axilite 61:00 00_axilite_sim.vrtbin

In order to synthesize a design instead of using the pre-synthesized version, you can use these steps:

#!/bin/sh
# synthesis_script.sh
#SBATCH -t 24:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH -A <your_project_acronym>
#SBATCH -p normal

module reset
module load fpga
module load fpga/xilinx/vrt/0.2

# Hardware
cmake --build build --target axilite_hw

Then submit synthesis_script.sh to the Slurm workload manager:

sbatch ./synthesis_script.sh

Afterwards you can use the generated 00_axilite_hw.vrtbin as described above for hardware execution.

Early Access and Troubleshooting

If you are interested in early access to the FPGA partition during the pilot phase, or have issues or questions regarding the current setup, please contact us via email.