Intel oneAPI Quick Start Guide

This guide will walk you through the six steps required to use the Intel oneAPI FPGA toolkit on Noctua 2.

1. Get the examples.

The latest Intel oneAPI-samples are available on GitHub. The master branch of the repository is always under development for the next release and might be incompatible with the oneAPI version installed on our systems. You can therefore check out the version-specific branch that matches the oneAPI version you are going to use.

To speed up the process, you can copy the files from our file system:

    cp -r /opt/software/FPGA/IntelFPGA/oneapi/24.1.0/oneAPI-samples .
    cd oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile

or clone the repository and check out the matching version:

    git clone https://github.com/oneapi-src/oneAPI-samples.git
    cd oneAPI-samples/DirectProgramming/C++SYCL_FPGA/Tutorials/GettingStarted/fast_recompile
    git checkout release/2024.1
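The branch name follows from the oneAPI version you load. The helper below is a hypothetical sketch that assumes Noctua 2 module versions use the YY.MINOR.PATCH scheme (as in intel/oneapi/24.1.0) and that the samples repository names its branches release/YYYY.MINOR (as in release/2024.1).

```shell
# Hypothetical helper: map a oneAPI module version such as 24.1.0 to the
# matching oneAPI-samples branch name such as release/2024.1.
# Assumes the module version has the form YY.MINOR.PATCH.
oneapi_samples_branch() {
    local module_version="$1"
    local major_minor="${module_version%.*}"   # drop the patch level: 24.1.0 -> 24.1
    echo "release/20${major_minor}"
}
```

Inside the cloned repository, `git checkout "$(oneapi_samples_branch 24.1.0)"` checks out release/2024.1, and `git branch -r` lists the release branches that actually exist.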

Intel grouped the FPGA examples (directory oneAPI-samples/DirectProgramming/C++SYCL_FPGA) into:

  • ReferenceDesigns, which demonstrate the implementation of highly optimized algorithms and applications on an FPGA:

    • anr: Adaptive Noise Reduction

    • crr: CRR Binomial Tree Model for Option Pricing

    • db: Database Query Acceleration

    • gzip: GZIP Compression

    • merge_sort: Merge Sort

    • mvdr_beamforming: MVDR Beamforming

    • qrd: QR Decomposition of Matrices

    • qri: QR-based inversion of Matrices

  • Tutorials, which themselves consist of:

    • DesignPatterns (double buffering, I/O streaming, loop optimizations, …)

    • Features (loop unrolling, pipes, usage of pragmas, …)

    • GettingStarted (our guide is based on this tutorial)

    • Tools (collect data for profiling)

In this quick start, we pick the fast_recompile example from the GettingStarted tutorials, but feel free to explore the other options.

2. Set up the local software environment on Noctua 2.

    module reset
    module load fpga devel compiler
    module load intel/oneapi/24.1.0
    module load bittware/520n/20.4.0_hpc
    module load CMake
    module load GCC

With module reset, previously loaded modules are cleaned up. The first three modules loaded (fpga, devel, compiler) are gateway modules that make the actual modules loaded in lines 3-6 available. If no version number is provided, the latest version is loaded. To use a specific version, append it to the module name, e.g. intel/oneapi/24.1.0. The commands above load the following modules:

  • intel/oneapi: software stack for oneAPI development

  • bittware/520n: drivers and board support package (BSP) for the Intel Stratix 10 card

  • CMake: build system generator used to configure and build the examples

  • GCC: the GNU Compiler Collection

Together, these modules set up paths and environment variables, some of which are used in the example's Makefile to specify the Stratix 10 as the target card. Observe, for example:

Supported oneAPI versions and bittware BSP versions:

The build directory for CMake is created and configured with the correct target board, using the environment variables $AOCL_BOARD_PACKAGE_ROOT and $FPGA_BOARD_NAME set by the modules loaded in the previous step.
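A minimal sketch of this configuration step as a shell function. It assumes that the sample selects its target through the CMake option -DFPGA_DEVICE=<BSP path>:<board name>, which is how recent oneAPI FPGA samples accept a custom board; check the sample's README if your release uses a different option. The name configure_build is hypothetical.

```shell
# Hypothetical helper: create and configure the CMake build directory for the
# board selected by the bittware module (run from the example's directory).
# Assumption: the sample accepts -DFPGA_DEVICE=<BSP path>:<board name>.
configure_build() {
    local build_dir="${1:-build}"
    # Both variables are set by the modules loaded in step 2.
    if [ -z "${AOCL_BOARD_PACKAGE_ROOT:-}" ] || [ -z "${FPGA_BOARD_NAME:-}" ]; then
        echo "AOCL_BOARD_PACKAGE_ROOT/FPGA_BOARD_NAME not set; load the modules from step 2" >&2
        return 1
    fi
    mkdir -p "$build_dir" || return 1
    # Configure in a subshell so the caller's working directory is unchanged.
    (
        cd "$build_dir" &&
        cmake .. -DFPGA_DEVICE="${AOCL_BOARD_PACKAGE_ROOT}:${FPGA_BOARD_NAME}"
    )
}
```

Running `configure_build` from the fast_recompile directory leaves a configured build/ directory; the make commands of the following steps are issued inside it.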


3. Build and test the example in emulation.

Builds the emulation binary called fast_recompile.fpga_emu.

Under the hood, make performs two main steps here (simplified from the CMake-generated calls):

  • Creating the object files from the source files:

    • the flag -fintelfpga instructs the compiler to compile for FPGA.

    • the flag -DFPGA_EMULATOR defines the FPGA_EMULATOR preprocessor macro, which the host code checks to select the emulator instead of a real FPGA device for execution.

  • Linking the FPGA emulation binary:

    • produces the emulation binary called fast_recompile.fpga_emu.
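The two steps can be reproduced by hand. The function below is a sketch assuming icpx from the intel/oneapi module is on the PATH; the source paths in the usage line (src/host.cpp, src/kernel.cpp) are assumptions about the sample layout, and the CMake-generated calls add further flags omitted here. The name build_emulator is hypothetical.

```shell
# Hypothetical helper mirroring the simplified emulation build:
#   build_emulator <output binary> <source files...>
build_emulator() {
    local output="$1"; shift
    local src obj objects=""
    for src in "$@"; do
        obj="$(basename "${src%.*}").o"
        # Step 1: compile for FPGA; -DFPGA_EMULATOR defines the macro that
        # makes the host code select the emulator device.
        icpx -fsycl -fintelfpga -DFPGA_EMULATOR -c "$src" -o "$obj" || return 1
        objects="$objects $obj"
    done
    # Step 2: link the objects into the emulation binary. $objects is left
    # unquoted on purpose so that it splits into one argument per file
    # (this assumes file names without spaces).
    icpx -fsycl -fintelfpga $objects -o "$output"
}
```

Run inside build/, `build_emulator fast_recompile.fpga_emu ../src/host.cpp ../src/kernel.cpp` yields the same kind of binary as the make target.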

Note that the FPGA Emulation Device is selected.

Executes the emulation binary.

Note: execution in emulation gives no indication at all of the performance to be expected from hardware execution on a real FPGA.
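Building and running the emulation binary can be wrapped as follows. This sketch assumes the build directory has been configured as described in step 2, that it provides the fpga_emu make target of the oneAPI FPGA samples, and that the binary keeps the name fast_recompile.fpga_emu; run_emulation is a hypothetical name.

```shell
# Hypothetical helper: build the emulation binary and execute it.
run_emulation() {
    local build_dir="${1:-build}"
    (
        cd "$build_dir" || exit 1
        make fpga_emu || exit 1          # compile and link for the emulator
        ./fast_recompile.fpga_emu        # runs on the FPGA emulation device
    )
}
```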

4. Create and inspect reports as an indicator of the expected hardware performance.
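The report is generated without the lengthy hardware synthesis. A sketch, assuming the report make target of the oneAPI FPGA samples; because the name of the generated .prj directory is not given here, the function searches for report.html instead of hard-coding a path. make_report is a hypothetical name.

```shell
# Hypothetical helper: generate the optimization report and print the path of
# the HTML report (open it in a browser, e.g. after copying it to your machine).
make_report() {
    local build_dir="${1:-build}"
    (
        cd "$build_dir" || exit 1
        make report || exit 1            # early compile stage only, no bitstream
        find . -path '*/reports/report.html' -print
    )
}
```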

5. Build the hardware design (bitstream).

This hardware build step (so-called hardware synthesis) can take a long time (hours!) and considerable compute resources, so we create a batch script to submit the job to the Slurm workload manager. Make sure to replace the placeholder with your actual project acronym.
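A minimal version of such a batch script, written with a here-document. The Slurm options are assumptions for illustration (partition normal, a 24-hour limit and the memory request); check the Noctua 2 documentation for the values that apply to FPGA synthesis, and replace the placeholder YOUR_PROJECT_ACRONYM. The script assumes it is submitted from the fast_recompile directory that contains the configured build directory.

```shell
# Write the batch script synthesis_script.sh into the current directory.
cat > synthesis_script.sh <<'EOF'
#!/bin/bash
#SBATCH -A YOUR_PROJECT_ACRONYM
#SBATCH -J fast_recompile_synthesis
#SBATCH -p normal
#SBATCH -t 24:00:00
#SBATCH --mem=90G

# Same software environment as in step 2.
module reset
module load fpga devel compiler
module load intel/oneapi/24.1.0
module load bittware/520n/20.4.0_hpc
module load CMake
module load GCC

# Full hardware compilation (assumed output binary: fast_recompile.fpga).
cd build || exit 1
make fpga
EOF
```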

Then, we submit synthesis_script.sh to the Slurm workload manager.
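A sketch of the submission that also captures the job ID for monitoring. It relies on sbatch --parsable, which prints only the job ID (optionally followed by ;cluster); submit_synthesis is a hypothetical name.

```shell
# Hypothetical helper: submit the synthesis job and print its job ID.
submit_synthesis() {
    local script="${1:-synthesis_script.sh}" out
    out="$(sbatch --parsable "$script")" || return 1
    echo "${out%%;*}"    # strip an optional ";cluster" suffix
}
```

With `job_id=$(submit_synthesis)`, `squeue -j "$job_id"` shows the job state; by default, Slurm writes the job's output to slurm-<job ID>.out in the submission directory.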

To speed up the process and save resources on unnecessary synthesis, we have pre-synthesized the design, which you can copy for hardware execution.

6. Execute the hardware design on an FPGA.

After the hardware synthesis, we can allocate a suitably configured and equipped FPGA node for execution.
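An interactive allocation could look like the sketch below. The partition name fpga, the constraint that selects the BSP version matching the bittware/520n/20.4.0_hpc module, and the default time limit are assumptions; check the Noctua 2 documentation for the exact values. YOUR_PROJECT_ACRONYM and allocate_fpga_node are hypothetical.

```shell
# Hypothetical helper: open an interactive shell on an FPGA node.
#   allocate_fpga_node <project acronym> [time limit]
# Assumed: partition "fpga" and a constraint naming the BSP version.
allocate_fpga_node() {
    local account="$1" time_limit="${2:-01:00:00}"
    srun --partition=fpga -A "$account" \
         --constraint=bittware_520n_20.4.0_hpc \
         -t "$time_limit" --pty bash
}
```

From a login node, `allocate_fpga_node YOUR_PROJECT_ACRONYM` would request a one-hour interactive session.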

To run the design, we load the proper modules and use the corresponding make command on the allocated FPGA node.
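On the allocated node, the run could be sketched as follows. The module versions repeat those from step 2; instead of a make command, this sketch executes the binary directly, assuming the fpga target produced it under the name fast_recompile.fpga. run_hardware is a hypothetical name.

```shell
# Hypothetical helper: load the runtime environment and execute the
# synthesized design on the FPGA of the allocated node.
run_hardware() {
    local build_dir="${1:-build}"
    module reset
    module load fpga devel compiler
    module load intel/oneapi/24.1.0
    module load bittware/520n/20.4.0_hpc
    (
        cd "$build_dir" || exit 1
        ./fast_recompile.fpga            # assumed name of the hardware binary
    )
}
```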

How to proceed

For more information on using the tools, refer to