Otus FPGA Pilot Phase

The FPGA partition is currently in its pilot phase. At this stage, we have acquired a small number of FPGA boards and are evaluating their suitability for our requirements. More FPGAs from different vendors will be added in the near future.

Hardware Setup

Currently we have three Alveo V80 FPGA cards, spread over two nodes (fpga1611 and fpga1612), in our FPGA pilot phase of Otus.

An overview of the current setup is given in the following table:

 

Accelerator and PCIe slot

Hostname | 61:00.0        | 71:00.0        | 91:00.0        | Note
------------------------------------------------------------------
fpga1611 | Nvidia A40 GPU | Alveo V80 FPGA | Nvidia A40 GPU |
fpga1612 | Alveo V80 FPGA | Alveo V80 FPGA | Nvidia A40 GPU |

Please note: the Nvidia A40 GPUs were used for thermal evaluations of the nodes and will be removed in the future.

Software Stacks

We are currently testing the following software stacks in the pilot phase.

SLASH VRT (V80 Runtime)

The V80 cards can be used via VRT, which provides an abstraction similar to Xilinx XRT.

System Status

VRT is deployed on both nodes (fpga1611 and fpga1612).

Query the current status of the FPGAs with ami_tool, for example on fpga1612 with its two V80 cards:

$ ami_tool overview

AMI
-------------------------------------------------------------
Version | 2.3.0 (0)
Branch
Hash | 0bab29e568f64a25f17425c0ffd1c0e89609b6d1
Hash Date | 20240307
Driver Version | 2.3.0 (0)

BDF | Device | UUID | AMC | State
----------------------------------------------------------------------------------------
71:00.0 | ALVEO V80 PQ | 451a004ee295528fa752794a6f9fbbff | 2.3.0 (0) | READY
61:00.0 | ALVEO V80 PQ | bf8583506e178f1ca1e495b83575a0c1 | 2.3.0 (0) | READY

Use ami_tool -h for all options.
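When scripting against the cards, it can be handy to extract the BDFs of all cards that are in the READY state. The following is a minimal sketch, not part of VRT or AMI: the function name ready_bdfs is hypothetical, and it assumes that the device rows of ami_tool overview keep the five '|'-separated fields (BDF, Device, UUID, AMC, State) shown in the sample output above.

```shell
# Print the BDF of every card reported as READY.
# Reads the text of `ami_tool overview` from stdin.
# Assumption: device rows have exactly five '|'-separated fields
# (BDF | Device | UUID | AMC | State), as in the sample output.
ready_bdfs() {
    awk -F'|' '
        NF == 5 {
            bdf = $1; state = $5
            gsub(/[ \t]/, "", bdf)      # strip padding around the BDF
            gsub(/[ \t]/, "", state)    # strip padding around the state
            # keep rows whose first field looks like [domain:]bus:device.function
            if (bdf ~ /^([0-9a-fA-F]+:)?[0-9a-fA-F]+:[0-9a-fA-F]+\.[0-9a-fA-F]$/ && state == "READY")
                print bdf
        }'
}
```

Piping the tool's output through the function, as in ami_tool overview | ready_bdfs, prints one BDF per line. Header and separator lines are skipped because they either have a different field count or no BDF-shaped first field.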

Example Applications

VRT comes with example applications (see https://github.com/Xilinx/SLASH/tree/dev/examples) to test the basic functionality. To run them, follow these steps:

Allocate an FPGA node that has V80 cards attached, here fpga1611 for one hour:

srun --partition=fpga -t 01:00:00 -w fpga1611 --pty bash

Load required modules

module reset
ml fpga
ml xilinx/vitis/24.2
ml devel/CMake/3.29.3-GCCcore-13.3.0

Set up the environment

export AMI_HOME=$HOME/.ami
export LD_LIBRARY_PATH=/opt/software/FPGA/Xilinx/Vivado/2024.2/lib/lnx64.o/:$LD_LIBRARY_PATH
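If an example later fails to start because a shared library cannot be found, a quick check of the environment may help. The sketch below is only an illustration: the helper in_ld_library_path is a hypothetical name, and it merely tests whether a directory is listed in LD_LIBRARY_PATH, not whether the libraries in it are usable.

```shell
# Return success if directory $1 is listed in LD_LIBRARY_PATH.
# A trailing slash on either side is ignored.
# (Hypothetical helper; uses the global variable _dir for POSIX sh compatibility.)
in_ld_library_path() {
    _dir="${1%/}"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$_dir:"*|*":$_dir/:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For the setup above, one could call in_ld_library_path /opt/software/FPGA/Xilinx/Vivado/2024.2/lib/lnx64.o and print a warning if it returns a non-zero status.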

Get example repository

git clone https://github.com/pc2/SLASH
cd SLASH
export SLASHBASE=$(pwd)
git switch dev
git submodule update --init --recursive --remote

Build the examples. The commands below use:

  • 00_axilite: example to test linking and AXI-Lite control

  • emulation: everything runs on the CPU. See below for simulation and hardware build.

  • 61:00.0: PCIe ID (BDF) of the V80 card to use. Run ami_tool overview to get the right BDF for your node; according to the hardware table, the V80 in fpga1611 sits at 71:00.0.

cd "$SLASHBASE/examples/00_axilite"
# build for emulation
make emu_all
# execute example
cd build
./00_axilite 61:00.0 00_axilite_emu.vrtbin
# expected output
VRT Version: v1.0.0
Generating data...
Time taken for waits: 0 us
Expected: 1541.83
Got: 1541.83
Test passed!

Repeat for simulation and hardware build

# Simulation
make sim_all
# Hardware
make hw_all
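The three build modes differ only in the Makefile target (emu_all, sim_all, hw_all), so the build step can be wrapped in a small helper. This is a sketch under assumptions, not something shipped with SLASH: the function names slash_make_target and build_example are hypothetical, and it assumes that every example directory under $SLASHBASE/examples provides the same three targets as 00_axilite.

```shell
# Map a build mode (emu, sim or hw) to the Makefile target used above.
slash_make_target() {
    case "$1" in
        emu|sim|hw) printf '%s_all\n' "$1" ;;
        *) echo "unknown mode '$1' (expected emu, sim or hw)" >&2; return 1 ;;
    esac
}

# Build example $1 for mode $2.
# SLASHBASE must point at the cloned repository (see the clone step above).
build_example() {
    target=$(slash_make_target "$2") || return 1
    make -C "${SLASHBASE:?SLASHBASE is not set}/examples/$1" "$target"
}
```

With these definitions, build_example 00_axilite sim runs the simulation build of the first example. Rejecting unknown modes up front avoids starting a long make run with a mistyped target.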

Early Access and Troubleshooting

If you are interested in early access to the FPGA partition during the pilot phase, or have issues or questions about the current setup, please contact us by email.