This guide will walk you through the six steps required to use the Intel OpenCL FPGA toolkit on Noctua 2.
1. Get the latest examples.
We will use the vector_add
example code that is shipped with the Intel FPGA SDK for OpenCL.
We recommend working in /scratch/
because FPGA designs consume a considerable amount of disk space. Navigate to the directory assigned to your project under /scratch/
and create a working directory for this example:
Code Block | ||
---|---|---|
| ||
cd /scratch/[DIRECTORY_ASSIGNED_TO_YOUR_PROJECT]
mkdir getting_started_with_fpgas
cd getting_started_with_fpgas |
Afterwards, copy the vector_add
example into your getting_started_with_fpgas
workspace:
Code Block |
---|
cp -r /cm/shared/opt/intelFPGA_pro/21.4.0/hld/examples_aoc/common .
cp -r /cm/shared/opt/intelFPGA_pro/21.4.0/hld/examples_aoc/vector_add . |
Expand | ||
---|---|---|
|
...
Intel grouped the FPGA examples (directory oneAPI-samples/DirectProgramming/DPC++FPGA) into
ReferenceDesigns, which demonstrate the implementation of highly optimized algorithms and applications on an FPGA:
anr: Adaptive Noise Reduction
crr: CRR Binomial Tree Model for Option Pricing
db: Database Query Acceleration
gzip: GZIP Compression
merge_sort: Merge Sort
mvdr_beamforming: MVDR Beamforming
qrd: QR Decomposition of Matrices
qri: QR-based Inversion of Matrices
Tutorials, which themselves consist of:
DesignPatterns (double buffering, I/O streaming, loop optimizations, …)
Features (loop unrolling, pipes, usage of pragmas, …)
GettingStarted (our guide is based on this tutorial)
Tools (collect data for profiling)
In this quick start we pick the GettingStarted
example from the tutorials, but feel free to explore the other options.
Other available examples to try are:
|
2. Set up the local software environment on Noctua 2.
Code Block |
---|
module load intelFPGA_pro
module load bittware_520n
module load toolchain/gompi |
Expand | ||
---|---|---|
| ||
If no version number is provided, the latest versions will be loaded. To use a specific version, you can append the version, e.g.
Together, these modules set up paths and environment variables, some of which are used in the example's Makefile to specify the Stratix 10 as the target card. Observe for example:
If you have a project that was only validated with an older BSP, you can explicitly load the module for an older version of xrt, e.g.
Supported OpenCL SDK versions and bittware BSP versions: TBD |
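To see what the loaded modules changed in your session, you can inspect the environment afterwards. The following is a minimal sketch, assuming the `module` command (Lmod or Environment Modules) is available in your shell. The helper name `show_fpga_env` is hypothetical, and the filter pattern is only a guess at which variable names are relevant.

```shell
#!/bin/sh
# Hypothetical helper: list the loaded modules and any FPGA-related
# environment variables. The grep pattern is an assumption, adjust as needed.
show_fpga_env() {
    if command -v module >/dev/null 2>&1; then
        module list 2>&1
    else
        echo "module command not available in this shell"
    fi
    env | grep -i -E 'aocl|intelfpga|board' || echo "no FPGA-related variables set"
}

show_fpga_env
```

Running the helper once before and once after the `module load` commands makes the differences easy to spot.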
3. Build and test the example in emulation.
The compilation is divided into two parts:
host code: executed on the CPU. Performs initialization, data handling and FPGA device setup. Host code is compiled with a regular GCC compiler.
kernel code: executed on the FPGA or often in emulation on the CPU.
In this step we will first compile the host code
...
and then compile the kernel code for emulation on the CPU.
Code Block |
---|
cd vector_add
make all |
This builds the host binary, called host, in the subdirectory bin.
Expand | ||
---|---|---|
| ||
The build directory (for Expected output Code Block | Behind the scenes the Makefile triggers the following command, putting together the correct OpenCL headers and libraries, to produce an executable
|
Code Block |
---|
make fpga_emu |
Builds the emulation binary called fpga_compile.fpga_emu
.
Expand | ||
---|---|---|
| ||
Under the hood, the
|
Code Block | ||
---|---|---|
./fpga_compile.fpga_emu
Further behind the scenes, the Makefile determines some of these compile parameters by invoking the command line tool
|
Now that the host code is generated, we can compile the kernel code:
Code Block |
---|
aoc -march=emulator -no-interleaving=default device/vector_add.cl -o bin/vector_add.aocx |
Expand | ||
---|---|---|
| ||
In contrast to the compilation of CPU or GPU code, the compilation (often called synthesis) of FPGA kernel code can take several hours or even days. Before synthesis and hardware execution, it is highly recommended to check the functionality of your design in emulation. Emulation compiles the FPGA kernel code for an emulator running on the CPU (as the name suggests) and completes within seconds to minutes.
|
Having the host and kernel compiled, we can execute the program:
Code Block |
---|
./bin/host -emulator |
Expand | ||
---|---|---|
| ||
Note that the FPGA Emulation Device is selected. |
Executes the emulation binary.
Note: execution in emulation gives no indication at all of the performance to be expected from hardware execution on a real FPGA.
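The emulation steps above can be chained so that a failure in one step stops the run. This is a minimal sketch, assuming it is started from inside the vector_add directory with the modules from step 2 loaded. The function name `emulate_vector_add` is hypothetical; the commands themselves are the ones shown above.

```shell
#!/bin/sh
# Sketch: build the host code, compile the kernel for emulation, and run it.
# Each step only runs if the previous one succeeded.
emulate_vector_add() {
    if ! command -v aoc >/dev/null 2>&1; then
        echo "aoc not found: load the FPGA modules first"
        return 1
    fi
    make all &&
    aoc -march=emulator -no-interleaving=default device/vector_add.cl -o bin/vector_add.aocx &&
    ./bin/host -emulator
}

emulate_vector_add || echo "emulation run did not complete"
```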
4. Create and inspect reports as an indicator of expected HW performance.
...
To check if the kernel can be translated into an efficient FPGA design, intermediate files and an .html
report can be generated with the following command:
Code Block |
---|
aoc -rtl -v -board=p520_max_sg280l -board-package=/cm/shared/opt/intelFPGA_pro/20.4.0/hld/board/bittware_pcie/s10 device/vector_add.cl -o vector_add_report |
Expand | ||||
---|---|---|---|---|
| ||||
In addition to the compilation flags
The generated report is an HTML file (called <file_name>.prj/reports/report.html).
Background:
In order to inspect the report, you may want to copy it to your local file system or mount your working directory; for more information refer to [Noctua2-FileSystems]. For example, you can compress the report on Noctua 2:
Then copy and decompress it from your local command line (e.g. Linux, macOS, or Windows Subsystem for Linux):
Open and inspect
The Area Analysis for our example Open and inspect |
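Compressing and transferring the report, as described above, could look like the following sketch. The helper name `pack_report` is hypothetical, and the report directory name, user name, host name, and remote path in the comments are placeholders that depend on your setup.

```shell
#!/bin/sh
# Hypothetical helper: compress a report directory into <dir>.tar.gz
# and print the name of the archive.
pack_report() {
    report_dir="$1"
    if [ ! -d "$report_dir" ]; then
        echo "no such directory: $report_dir" >&2
        return 1
    fi
    tar -czf "${report_dir%/}.tar.gz" "$report_dir" && echo "${report_dir%/}.tar.gz"
}

# On Noctua 2 (the directory name is an assumption based on the -o argument):
#   pack_report vector_add_report
# On your local machine (user, host, and path are placeholders):
#   scp <user>@<login-host>:<path>/vector_add_report.tar.gz .
#   tar -xzf vector_add_report.tar.gz
```

After unpacking, the report can be opened in any local web browser.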
5. Build the hardware design (bitstream)
In this step we build the kernel code for execution on the FPGA. This hardware build step (so-called hardware synthesis) can take a lot of time (hours!) and compute resources, so we create a batch script to submit the job to the Slurm workload manager.
Code Block |
---|
#!/bin/sh
# synthesis_script.sh script
module load intelFPGA_pro
module load bittware_520n
module load toolchain/gompi
aoc -board=p520_max_sg280l -board-package=/cm/shared/opt/intelFPGA_pro/20.4.0/hld/board/bittware_pcie/s10 device/vector_add.cl -o bin/vector_add.aocx |
Then, we submit the synthesis_script.sh
to the slurm workload manager:
...
Expand | ||||
---|---|---|---|---|
| ||||
Under the hood, the make performs a step that we already know from the emulation and report generation
the only difference is the fpga_compile.fpga is generated. It will handle the initialization and execution of the code on an FPGA (see next step
Expected output
Note that the build of the hardware design will create another report similar to the report that we discussed in the previous step. In contrast to the previous report, the new report contains the actual resource utilization of the design. More details on the analysis of the actual image can be found in Intel’s documentation. |
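Submitting synthesis_script.sh as a batch job could look like the sketch below. The function name `submit_synthesis`, the partition name, and the time limit are assumptions for illustration, not site recommendations; check the cluster documentation for the values that apply to your project.

```shell
#!/bin/sh
# Sketch: submit the synthesis script to Slurm and print the job ID.
# Partition and time limit are assumed values, adjust them to your project.
submit_synthesis() {
    account="$1"
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found: run this on a cluster login node"
        return 1
    fi
    sbatch --parsable --partition=normal -A "$account" -t 24:00:00 ./synthesis_script.sh
}

# Usage (replace the placeholder with your project acronym):
#   job_id=$(submit_synthesis <your_project_acronym>) && squeue -j "$job_id"
```

The `--parsable` option makes sbatch print only the job ID, which can then be passed to squeue to monitor the job.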
6. Execute the hardware design on an FPGA.
After the hardware synthesis (and host code compilation), we can allocate a suitably configured and equipped FPGA node for execution.
Code Block |
---|
srun --partition=fpga -A <your_project_acronym> --constraint=20.4.0_hpc -t 2:00:00 --pty bash |
Expand | ||
---|---|---|
| ||
Background information:
|
To run the design, we load the proper modules and run the host program on the allocated FPGA node:
Code Block |
---|
module load intelFPGA_pro
module load bittware_520n
module load toolchain/gompi
./bin/host |
Expand | ||
---|---|---|
| ||
|
How to proceed
For more information on using the tools, refer to
...
DPC++ FPGA Code Samples Guide,
...
Intel’s oneAPI Programming Guide and especially the FPGA flow,
...
...
|
Congratulations. You have executed a real program on an FPGA.
How to proceed
Now that you have successfully compiled and run the example code on our FPGAs, you can proceed in various directions:
look into the source code of the
vector_add
example.
try one of the other examples mentioned above. Start with an example that is as close as possible to the actual problem that you are trying to accelerate using FPGAs.
visit our main FPGA documentation page to learn more about the used parameters, other options and troubleshooting common problems.
do not hesitate to send us an email if you face any problems, need support, or have any questions. Look for staff with Scientific Advisor FPGA Acceleration as their domain to contact the right person.