Otus FPGA Pilot Phase

We are currently in the pilot phase for the FPGA partition. In this stage, we have acquired a small number of FPGA boards and are evaluating their suitability for our requirements. More FPGAs from different vendors will be added in the near future.

Hardware Setup

Currently we have two nodes, fpga1611 and fpga1612, with three Alveo V80 FPGA cards in total in our FPGA pilot phase of Otus.

An overview of the current hardware setup is given in the following table:

 

         | Accelerator and PCIe slot                        |
Hostname | 61:00.0        | 71:00.0        | 91:00.0        | Note
---------|----------------|----------------|----------------|--------------------------------------------------
fpga1611 | Nvidia A40 GPU | Alveo V80 FPGA | Nvidia A40 GPU | Currently configured with AVED 25.1 without VRT
fpga1612 | Alveo V80 FPGA | Alveo V80 FPGA | Nvidia A40 GPU | Currently configured with VRT based on AVED 24.1

Please note:

  • You only need to allocate one of these nodes for hardware execution. You can compile, emulate, simulate and synthesize on any other Otus node.

  • The Nvidia A40 GPUs were used for thermal evaluations of the nodes and will be removed in the future.

Software Stacks

We are currently testing the following software stacks in our pilot phase.

SLASH VRT (V80 Runtime)

The V80 cards can be used via VRT, which provides an abstraction similar to Xilinx XRT.

The VRT tool flow is available via modules on any Otus node:

    module reset
    module load fpga
    module load fpga/xilinx/vrt/0.1

Please note:

  • the Lmod warning regarding ncurses is not critical.

Example Applications

VRT comes with example applications (see https://github.com/Xilinx/SLASH/tree/dev/examples) to test the basic functionality.

In order to run them you can do the following:

Load required VRT module

    module reset
    module load fpga
    module load fpga/xilinx/vrt/0.1
  • loads the main VRT tool flow and all required dependencies

  • sets up the required environment variables (mainly AMI_HOME)
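As a quick sanity check after loading the module, a sketch like the following could confirm that the environment is in place. It assumes that the module exports AMI_HOME (as stated above) and that it also puts ami_tool on the PATH, which is an assumption here. The function name check_vrt_env is made up for illustration.

```shell
# Hypothetical helper: verify that the VRT environment looks usable.
# Assumptions: the VRT module sets AMI_HOME and provides ami_tool on PATH.
check_vrt_env() {
    rc=0
    if [ -z "${AMI_HOME:-}" ]; then
        echo "AMI_HOME is not set; was the VRT module loaded?" >&2
        rc=1
    fi
    if ! command -v ami_tool >/dev/null 2>&1; then
        echo "ami_tool was not found on PATH" >&2
        rc=1
    fi
    return "$rc"
}

# Usage, after the module commands above:
#   check_vrt_env && echo "VRT environment looks fine"
```

The function only reports problems and returns a non-zero status; it does not try to load any modules itself.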

Get example repository

    git clone https://github.com/pc2/SLASH
    cd SLASH
    export SLASHBASE=`pwd`
    git switch dev
    git submodule update --init --recursive --remote
  • uses tested VRT/SLASH version maintained by PC2

Build examples for emulation

    cd $SLASHBASE/examples/00_axilite
    # build for emulation
    make emu_all
    # execute example
    cd build
    ./00_axilite 61:00.0 00_axilite_emu.vrtbin
    # expected output
    VRT Version: v1.0.0
    Generating data...
    Time taken for waits: 0 us
    Expected: 1541.83
    Got: 1541.83
    Test passed!
  • 00_axilite: example to test linking and AXI-Lite control

  • emulation: everything runs on the CPU. See below for simulation and hardware build.

  • 61:00.0: Usually the PCIe ID (BDF) parameter of the V80 card to use. Can be any reasonable value for emulation.
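Since the first argument is expected to be a BDF (bus:device.function), a wrapper script might want to reject obviously malformed values before starting a run. The following is a minimal sketch of such a check. The helper name is_bdf is hypothetical, and the pattern only tests the general shape of the string, not whether a card exists at that address.

```shell
# Hypothetical helper: succeed if the argument has the shape of a PCIe BDF,
# i.e. two hex digits, a colon, two hex digits, a dot and a function number 0-7.
is_bdf() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'
}

# Small demonstration with two BDFs from this page and one hostname.
for candidate in 61:00.0 91:00.0 fpga1612; do
    if is_bdf "$candidate"; then
        echo "$candidate: well-formed BDF"
    else
        echo "$candidate: not a BDF"
    fi
done
```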

Hardware Execution

If you want to execute a design in hardware, you need to allocate an FPGA node that has at least one V80 card attached.

Allocate node with V80 card

You can use this command to get node fpga1612 for one hour:

srun --partition=fpga -t 01:00:00 -w fpga1612 --pty bash

Load the required modules

    module reset
    module load fpga
    module load fpga/xilinx/vrt/0.1

Check FPGA status

Get the current status of the FPGAs via ami_tool. For example on fpga1612 with two V80 cards:

    $ ami_tool overview
    AMI
    -------------------------------------------------------------
    Version        | 2.3.0 (0)
    Branch Hash    | 0bab29e568f64a25f17425c0ffd1c0e89609b6d1
    Hash Date      | 20240307
    Driver Version | 2.3.0 (0)

    BDF     | Device       | UUID                             | AMC       | State
    ----------------------------------------------------------------------------------------
    61:00.0 | ALVEO V80 PQ | c451a8335000954c2f45abc32d98c87e | 2.3.0 (0) | READY
    91:00.0 | ALVEO V80 PQ | 4a424e194c90ae9bc94fdc95d3c191fe | 2.3.0 (0) | READY
  • the most important output is the State column. Only FPGAs in READY state should be used.

  • you can use the BDF 61:00.0 or 91:00.0 to identify one of the cards

  • Use ami_tool -h for all options
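To pick a usable card in a script, the State column can be filtered automatically. The sketch below reads the overview output from standard input and prints the BDF of every card reported as READY. It assumes the column layout of the sample output above (BDF in the first column, State in the last, columns separated by |). The function name ready_bdfs is made up for illustration.

```shell
# Hypothetical helper: print the BDF of each card whose State is READY.
# Input: text in the layout of `ami_tool overview` as shown above (assumed stable).
ready_bdfs() {
    awk -F'|' '
        NF >= 5 {
            bdf = $1
            state = $NF
            # strip surrounding whitespace from both fields
            gsub(/^[ \t]+|[ \t]+$/, "", bdf)
            gsub(/^[ \t]+|[ \t]+$/, "", state)
            # skip the header row and any card that is not READY
            if (bdf != "BDF" && state == "READY") print bdf
        }'
}

# Usage on an allocated FPGA node:
#   ami_tool overview | ready_bdfs
```

Rows with fewer than five columns, such as the version block at the top, are ignored, so the complete output can be piped in unchanged.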

Execute example in hardware

To speed up the process and avoid spending resources on unnecessary synthesis, we have pre-synthesized the design for example 00_axilite. The vrtbin file with the hardware design is located at:

/opt/software/FPGA/Xilinx/VRT/vrt_0.1/examples/00_axilite_hw.vrtbin

You can run the design in hardware with

./00_axilite 91:00.0 /opt/software/FPGA/Xilinx/VRT/vrt_0.1/examples/00_axilite_hw.vrtbin
    $ ./00_axilite 91:00.0 /opt/software/FPGA/Xilinx/VRT/vrt_0.1/examples/00_axilite_hw.vrtbin
    VRT Version: v1.0.0
    [2025-11-28 08:29:33.186] [DEBUG] vrt::Vrtbin::Vrtbin(std::string, const string&) : AMI_HOME: /pc2/users/d/deffel/.ami
    [2025-11-28 08:29:33.186] [DEBUG] vrt::Vrtbin::Vrtbin(std::string, const string&) : Running command: mkdir -p /pc2/users/d/deffel/.ami/91:00.0
    [2025-11-28 08:29:33.195] [DEBUG] void vrt::Vrtbin::extract() : Extracting vrtbin: /opt/software/FPGA/Xilinx/VRT/vrt_0.1/examples/00_axilite_hw.vrtbin
    [2025-11-28 08:29:33.249] [DEBUG] void vrt::Vrtbin::copy(const string&, const string&) : Copying file /pc2/users/d/deffel/.cache/SLASH/vrt/system_map.xml to /pc2/users/d/deffel/.ami/91:00.0/system_map.xml
    [2025-11-28 08:29:33.252] [DEBUG] void vrt::Vrtbin::copy(const string&, const string&) : Copying file /pc2/users/d/deffel/.cache/SLASH/vrt/version.json to /pc2/users/d/deffel/.ami/91:00.0/version.json
    [2025-11-28 08:29:33.256] [DEBUG] void vrt::Vrtbin::copy(const string&, const string&) : Copying file /pc2/users/d/deffel/.cache/SLASH/vrt/report_utilization.xml to /pc2/users/d/deffel/.ami/91:00.0/report_utilization.xml
    [2025-11-28 08:29:33.260] [DEBUG] void vrt::Vrtbin::extractUUID() : Extracting UUID from version.json
    [2025-11-28 08:29:33.261] [DEBUG] void vrt::Vrtbin::extractUUID() : UUID is: 4a424e194c90ae9bc94fdc95d3c191fe
    [2025-11-28 08:29:33.264] [INFO ] void vrt::Device::programDevice() : Programming device 91:00.0 in SEGMENTED mode...This might take a while
    [2025-11-28 08:29:33.264] [INFO ] void vrt::Device::programDevice() : Current UUID: 4a424e194c90ae9bc94fdc95d3c191fe
    [2025-11-28 08:29:33.264] [INFO ] void vrt::Device::programDevice() : New UUID: 4a424e194c90ae9bc94fdc95d3c191fe
    [2025-11-28 08:29:33.264] [INFO ] void vrt::Device::programDevice() : Device already programmed with the same image
    [2025-11-28 08:29:33.264] [INFO ] void vrt::Device::programDevice() : Refreshing qdma handle
    [2025-11-28 08:29:33.264] [DEBUG] void vrt::PcieDriverHandler::execute(vrt::PcieDriverHandler::Command) : Executing command: hotplug for PCIe device 91:00.0
    [2025-11-28 08:29:35.675] [DEBUG] void vrt::ClkWiz::setRateHz(uint64_t, bool) : Starting dynamic reconfiguration
    [2025-11-28 08:29:35.675] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x3f0 value: 0
    [2025-11-28 08:29:35.675] [DEBUG] void vrt::ClkWiz::calculateDivisorsHz(uint64_t) : M: 115, D: 3, O: 16
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x338 value: 0x1a00
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x33c value: 0x404
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::ClkWiz::updateO() : O value is: 16
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x380 value: 0x400
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x384 value: 0x101
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::ClkWiz::updateD() : D value is: 3
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x3f0 value: 0
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x334 value: 0x3939
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x330 value: 0x1700
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::ClkWiz::updateM() : M value is: 115
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x378 value: 0x2e
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x398 value: 0xe80
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x39c value: 0x4271
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x3a0 value: 0x43e9
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x3a8 value: 0x1c
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x3fc value: 0x1
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device kernel: clk_wiz at offset: 0x14 value: 0x3
    [2025-11-28 08:29:35.676] [DEBUG] uint32_t vrt::ClkWiz::waitForLock() : Waiting for clock lock
    [2025-11-28 08:29:35.676] [DEBUG] uint32_t vrt::Kernel::read(uint32_t) : Reading from device kernel: clk_wiz at offset: 0x33c
    [2025-11-28 08:29:35.676] [DEBUG] uint32_t vrt::ClkWiz::waitForLock() : Clock locked
    [2025-11-28 08:29:35.676] [DEBUG] uint32_t vrt::Kernel::read(uint32_t) : Reading from device kernel: clk_wiz at offset: 0x330
    [2025-11-28 08:29:35.676] [DEBUG] uint32_t vrt::Kernel::read(uint32_t) : Reading from device kernel: clk_wiz at offset: 0x334
    [2025-11-28 08:29:35.676] [DEBUG] uint32_t vrt::Kernel::read(uint32_t) : Reading from device kernel: clk_wiz at offset: 0x384
    [2025-11-28 08:29:35.676] [DEBUG] uint32_t vrt::Kernel::read(uint32_t) : Reading from device kernel: clk_wiz at offset: 0x380
    [2025-11-28 08:29:35.676] [DEBUG] uint64_t vrt::ClkWiz::getVco() : VCO value is: 3833333333
    done
    Generating data...
    size of float: 4 bytes
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::QdmaIntf::write_buff(char*, uint64_t, uint64_t) : Writing buffer with size: 0x1000 to /dev/qdma91001-MM-0 at address 0x4000000000
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::writeBatch() : Kernel increment_0, reg at offset 0x10, value: 0x400
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::writeBatch() : Kernel increment_0, reg at offset 0x14, value: 0
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::writeBatch() : Kernel increment_0, reg at offset 0x18, value: 0
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::writeBatch() : Kernel increment_0, reg at offset 0x1c, value: 0x40
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device 91:00.0 kernel: increment_0 at offset: 0 value: 0x1
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::writeBatch() : Kernel accumulate_0, reg at offset 0x10, value: 0x400
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::writeBatch() : Kernel accumulate_0, reg at offset 0x14, value: 0
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::writeBatch() : Kernel accumulate_0, reg at offset 0x18, value: 0
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::writeBatch() : Kernel accumulate_0, reg at offset 0x1c, value: 0
    [2025-11-28 08:29:35.676] [DEBUG] void vrt::Kernel::write(uint32_t, uint32_t) : Writing to device 91:00.0 kernel: accumulate_0 at offset: 0 value: 0x1
    Time taken for waits: 35 us
    [2025-11-28 08:29:35.676] [DEBUG] uint32_t vrt::Kernel::read(uint32_t) : Reading from device 91:00.0 kernel: accumulate_0 at offset: 0x18
    [2025-11-28 08:29:35.676] [DEBUG] uint32_t vrt::Kernel::read(uint32_t) : Reading from device 91:00.0 kernel: accumulate_0 at offset: 0x10
    Output size from accumulate: 1024
    Expected: 1517.92
    Got: 1517.92
    Test passed!

If you want to synthesize a design yourself, see the description below.

(Optional) Repeat steps for simulation and hardware build

Hardware simulation can be performed with

    # Simulation
    make sim_all
    # execute example
    cd build
    ./00_axilite 61:00.0 00_axilite_sim.vrtbin

In order to synthesize a design instead of using the pre-synthesized version, you can use these steps:

    #!/bin/sh
    # synthesis_script.sh
    #SBATCH -t 24:00:00
    #SBATCH --cpus-per-task=8
    #SBATCH --mem=64G
    #SBATCH -A <your_project_acronym>
    #SBATCH -p normal

    module reset
    module load fpga
    module load fpga/xilinx/vrt/0.1

    # Hardware
    make hw_all

Then submit synthesis_script.sh to the Slurm workload manager:

sbatch ./synthesis_script.sh

Afterwards you can use the generated 00_axilite_hw.vrtbin as described above for hardware execution.
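When switching between a self-synthesized design and the pre-synthesized one, a small helper can choose the file to pass to the example binary. The sketch below prefers a local vrtbin if it exists and otherwise falls back to a second path. The function name pick_vrtbin is hypothetical, and the assumption that make hw_all places 00_axilite_hw.vrtbin in the build directory (as make emu_all does for the emulation vrtbin) is not confirmed by this page.

```shell
# Hypothetical helper: print the first argument if it is an existing file,
# otherwise print the second argument (the fallback path).
pick_vrtbin() {
    local_bin="$1"
    fallback="$2"
    if [ -f "$local_bin" ]; then
        printf '%s\n' "$local_bin"
    else
        printf '%s\n' "$fallback"
    fi
}

# Usage from the example's build directory on an allocated FPGA node
# (location of the locally generated file is an assumption):
#   vrtbin=$(pick_vrtbin 00_axilite_hw.vrtbin \
#       /opt/software/FPGA/Xilinx/VRT/vrt_0.1/examples/00_axilite_hw.vrtbin)
#   ./00_axilite 91:00.0 "$vrtbin"
```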

Early Access and Troubleshooting

If you are interested in early access to the FPGA partition during the pilot phase, or have issues or questions regarding the current setup, please contact us via email.