Intel OpenCL Quick Start Guide

This guide walks you through the six steps required to use the Intel OpenCL FPGA toolkit on Noctua 2.

1. Get the latest examples.

We will use the vector_add example code that is shipped with the Intel FPGA SDK for OpenCL.

We recommend working in /scratch/ because FPGA designs consume a considerable amount of disk space. Navigate to the directory assigned to your project under /scratch/ and create a working directory for this example:

cd /scratch/[DIRECTORY_ASSIGNED_TO_YOUR_PROJECT]
mkdir getting_started_with_fpgas
cd getting_started_with_fpgas

Afterwards, copy the vector_add example into your getting_started_with_fpgas workspace:

cp -r /opt/software/FPGA/IntelFPGA/opencl_sdk/21.4.0/hld/examples_aoc/common .
cp -r /opt/software/FPGA/IntelFPGA/opencl_sdk/21.4.0/hld/examples_aoc/vector_add .
  • common: Includes helper and utility functions to interface with the FPGA

  • vector_add: Includes the actual application code

Other available examples to try are:

asian_option, channelizer, compression, compute_score, double_buffering, fd3d, fft1d, fft1d_offchip, fft2d, hello_world, jpeg_decoder, library_example1, library_example2, library_hls_sot, library_matrix_mult, local_memory_cache, loopback_hostpipe, mandelbrot, matrix_mult, multithread_vector_operation, n_way_buffering, optical_flow, sobel_filter, tdfir, vector_add (used in this guide), video_downscaling, web

2. Set up the local software environment on Noctua 2.
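Based on the modules described below, the environment setup typically looks like the following sketch (the module names are taken from this guide; the exact invocation may differ on your system):

```shell
module reset                  # line 1: clean up previously loaded modules
module load fpga              # line 2: gateway module for the FPGA software stack
module load intel/opencl_sdk  # line 3: Intel FPGA SDK for OpenCL
module load bittware/520n     # line 4: BSP for the Bittware 520N (Stratix 10) card
```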

With module reset, previously loaded modules are cleaned up. The first module loaded, fpga, is a gateway module to the actual modules loaded in lines 3-4. If no version number is provided, the latest versions are loaded. To use a specific version, append it, e.g. intel/opencl_sdk/21.4.0. All available versions can be queried with module avail intel/opencl_sdk. With the given commands, the following modules are loaded:

  • intel/opencl_sdk: Loads the compilation infrastructure for Intel OpenCL FPGA code

  • bittware/520n: Loads the drivers and board support package (BSP) for the Intel Stratix 10 card

Together, these modules set up paths and environment variables, some of which are used in the example's Makefile to specify the Stratix 10 as the target card. Observe for example:
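Assuming the standard environment variables of the Intel FPGA SDK for OpenCL are set by the modules (the variable names below are the SDK's documented defaults, not taken from this guide), you can inspect them like this:

```shell
echo $INTELFPGAOCLSDKROOT      # root of the Intel FPGA SDK for OpenCL installation
echo $AOCL_BOARD_PACKAGE_ROOT  # root of the currently loaded board support package
```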

If you have a project that was only validated with an older BSP, you can explicitly load the module for an older BSP version, e.g. bittware_520n/19.4.0_hpc.

The table below shows the full mapping of valid SDK to BSP versions for the Intel OpenCL design flow. For execution on real hardware, make sure the loaded BSP version matches the constraint of the allocated node.

3. Build and test the example in emulation.

The compilation is divided into two parts:

  • host code: executed on the CPU. Performs initialization, data handling and FPGA device setup. Host code is compiled with a regular GCC compiler.

  • kernel code: executed on the FPGA or often in emulation on the CPU.

In this step we will first compile the host code and then compile the kernel code for emulation on the CPU.
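The host code is built via the example's Makefile; assuming GNU make is available on the node, it is invoked from the example directory:

```shell
cd vector_add
make   # builds the host binary into bin/host
```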

This builds the host binary, called host, in the subdirectory bin.

Behind the scenes the Makefile triggers the following command, putting together the correct OpenCL headers and libraries, to produce an executable bin/host:

g++ -O2 -fstack-protector -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -fPIE -fPIC \
  -I../common/inc -I/opt/software/FPGA/IntelFPGA/opencl_sdk/21.4.0/hld/host/include \
  host/src/main.cpp ../common/src/AOCLUtils/opencl.cpp ../common/src/AOCLUtils/options.cpp \
  -L/opt/software/FPGA/IntelFPGA/opencl_sdk/21.4.0/hld/host/linux64/lib \
  -z noexecstack -Wl,-z,relro,-z,now -Wl,-Bsymbolic -pie -lOpenCL -lrt -lpthread -o bin/host

Further behind the scenes, the Makefile determines some of these compile parameters by invoking the command line tool aocl according to the actual environment as set up with the modules. You can look at these parameters by invoking these commands yourself and use them in your own build process:
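With the aocl utility from the loaded SDK, the compile and link flags can be queried directly:

```shell
aocl compile-config  # compiler flags (include paths) for host compilation
aocl link-config     # linker flags and libraries for host linking
```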

Now that the host code is generated, we can compile the kernel code:
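Using the aoc compiler from the SDK, the emulation build of the kernel could look like this sketch (the kernel file name follows the vector_add example layout; the output name is an assumption):

```shell
# compile the OpenCL kernel for CPU emulation instead of hardware synthesis
aoc -march=emulator device/vector_add.cl -o bin/vector_add.aocx
```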

In contrast to the compilation of CPU or GPU code, the compilation (often called synthesis) of FPGA kernel code can take several hours or even days. Before synthesis and hardware execution, it is highly recommended to check the functionality of your design in emulation. Emulation compiles the FPGA kernel code for an emulator on the CPU (as the name suggests) and completes within seconds to minutes.

  • -march=emulator: Tells the compiler to compile for CPU emulation.

Having the host and kernel compiled, we can execute the program:
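Assuming the emulator device variable documented for the Intel FPGA SDK for OpenCL, the emulation run could look like:

```shell
# request one emulated FPGA device, then run the host program against it
CL_CONTEXT_EMULATOR_DEVICE_INTELFPGA=1 ./bin/host
```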

Note that the FPGA Emulation Device is selected.

Note: execution in emulation gives no indication at all of the performance to be expected from hardware execution on a real FPGA.

4. Create and inspect reports as indicator of expected HW performance.

To check if the kernel can be translated into an efficient FPGA design, intermediate files and an .html report can be generated with the following command:
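With the aoc compiler, report generation without full synthesis is typically done with the -rtl flag; the paths below follow the vector_add example layout and are otherwise assumptions:

```shell
# generate intermediate files and the HTML report, but skip full hardware synthesis
aoc -rtl device/vector_add.cl -o bin/vector_add
# the report can then be opened at bin/vector_add/reports/report.html
```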

5. Build the hardware design (bitstream)

In this step we build the kernel code for execution on the FPGA. This hardware build step (so-called hardware synthesis) can take a lot of time (hours!) and compute resources, so we create a batch script to submit the job to the Slurm workload manager.
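A synthesis batch script could be sketched as follows (the file name synthesis_script.sh comes from this guide; job name, time limit, and resource requests are assumptions to be adapted to your project):

```shell
#!/bin/bash
#SBATCH -J vector_add_synthesis
#SBATCH -t 24:00:00        # hardware synthesis can run for many hours
#SBATCH --mem=64G          # synthesis is memory-hungry
#SBATCH -n 8

module reset
module load fpga intel/opencl_sdk bittware/520n

# full hardware synthesis of the kernel (note: no -march=emulator)
aoc device/vector_add.cl -o bin/vector_add.aocx
```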

Then, we submit the synthesis_script.sh to the Slurm workload manager:
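The submission itself is a single sbatch call:

```shell
sbatch synthesis_script.sh
```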

 

To speed up the process and avoid unnecessary synthesis runs, we have pre-synthesized the design; you can copy the pre-synthesized design for hardware execution instead.

6. Execute the hardware design on an FPGA.

After the hardware synthesis (and host code compilation), we can allocate a suitably configured and equipped FPGA node for execution.
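An interactive allocation could look like the following sketch (the partition and constraint names are assumptions; check the site documentation for the exact values matching your BSP version):

```shell
# request an interactive shell on an FPGA node for 30 minutes
srun -p fpga --constraint=bittware_520n -t 00:30:00 --pty bash
```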

To run the design, we load the proper modules and use the corresponding command on the allocated FPGA node:
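On the allocated node, the hardware run mirrors the emulation run, minus the emulator variable (module names as used in this guide):

```shell
module reset
module load fpga intel/opencl_sdk bittware/520n
./bin/host   # now selects the real FPGA device instead of the emulator
```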

Congratulations. You have executed a real program on an FPGA.

How to proceed

Now that you have successfully compiled and run the example code on our FPGAs, you can proceed in various directions:

  • look into the source code of the vector_add example.

  • try one of the other examples mentioned above. Start with an example that is as close as possible to the actual problem that you are trying to accelerate using FPGAs.

  • visit our main FPGA documentation page to learn more about the parameters used, other options, and troubleshooting of common problems.

  • do not hesitate to drop us an email if you face any problems, need support, or have any questions. Look for staff with Scientific Advisor FPGA Acceleration as their domain to contact the right person.