This guide walks you through the six steps required to use the Intel OpenCL FPGA toolkit on Noctua 2.
...
If no version number is provided, the latest versions will be loaded. To use a specific version, append it, e.g. intelFPGA_pro/20.4.0_hpc. All available versions can be queried with module avail intelFPGA_pro. With the given commands, the following modules are loaded:

intelFPGA_pro: loads the compilation infrastructure for FPGA code
bittware_520n: loads the drivers and board support package (BSP) for the Intel Stratix 10 card
toolchain/gompi: loads the compilation infrastructure for the host code (most current C++ compilers will work)
Together, these modules set up paths and environment variables, some of which are used in the example's Makefile to specify the Stratix 10 as the target card. Observe, for example:

```
echo $FPGA_BOARD_NAME
p520_hpc_sg280l
echo $AOCL_BOARD_PACKAGE_ROOT
/cm/shared/opt/intelFPGA_pro/20.4.0/hld/board/bittware_pcie/s10_hpc_default
```
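As a sketch of how such environment variables typically feed into the compiler invocation, the snippet below assembles aoc flags from them. The variable values are hard-coded stand-ins here; on Noctua 2 the modules set them for you, and the exact flag assembly in the example's Makefile may differ.

```shell
# Stand-in values; on Noctua 2 these variables are set by the modules.
FPGA_BOARD_NAME="p520_hpc_sg280l"
AOCL_BOARD_PACKAGE_ROOT="/cm/shared/opt/intelFPGA_pro/20.4.0/hld/board/bittware_pcie/s10_hpc_default"

# Derive the board-related aoc flags from the environment variables.
AOC_FLAGS="-board=${FPGA_BOARD_NAME} -board-package=${AOCL_BOARD_PACKAGE_ROOT}"
echo "$AOC_FLAGS"
```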
If you have a project that was only validated with an older BSP, you can explicitly load the module for an older BSP version, e.g. bittware_520n/19.4.0_hpc. The table below shows the full mapping of valid SDK to BSP versions for the Intel OpenCL design flow. Make sure the version matches the constraint allocated for real hardware execution.
Table: Intel FPGA SDK for OpenCL and Bittware BSP version combinations (see the Intel OpenCL SDK to BSP Mapping page).
3. Build and test the example in emulation.
...
Background:
-rtl: tells the compiler to stop after report generation.
-v: shows more details during the generation.
-board=p520_max_sg280l: specifies the target FPGA board (Bittware 520N with Intel Stratix 10 GX 2800).
-board-package=/cm/shared/opt/intelFPGA_pro/20.4.0/hld/board/bittware_pcie/s10: specifies the BSP in the correct version.
device/vector_add.cl: kernel file for vector_add written in OpenCL.
-o vector_add_report: output directory for the generated report.
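The flags above can be collected into a small helper script, shown below as a sketch. The snippet only writes the script; run it on Noctua 2, where aoc is available (the file name generate_report.sh is our own choice, not part of the example).

```shell
# Write the report-only compile command into a helper script
# (run it on Noctua 2, where aoc is on the PATH).
cat > generate_report.sh <<'EOF'
#!/bin/sh
aoc -rtl -v \
    -board=p520_max_sg280l \
    -board-package=/cm/shared/opt/intelFPGA_pro/20.4.0/hld/board/bittware_pcie/s10 \
    device/vector_add.cl \
    -o vector_add_report
EOF
chmod +x generate_report.sh
```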
To inspect the report, you may want to copy it to your local file system or mount your working directory; for more information refer to [Noctua2-FileSystems]. For example, you can compress the report on Noctua 2:

```
tar -caf vector_add_report.tar.gz vector_add_report/reports
```
Then copy and decompress it from your local command line (e.g. Linux, macOS, or Windows Subsystem for Linux):

```
rsync -azv -e 'ssh -J <your-username>@fe.noctua.pc2.uni-paderborn.de' <your-username>@ln-0001:/scratch/<DIRECTORY_ASSIGNED_TO_YOUR_PROJECT>/getting_started_with_fpgas/vector_add/vector_add_report.tar.gz .
tar -xzf vector_add_report.tar.gz
```
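If you want to rehearse the compress/extract round trip without cluster access, the following local dry run uses a stand-in report directory in place of the real one:

```shell
# Create a stand-in report directory in place of the real compiler output.
mkdir -p vector_add_report/reports
echo '<html></html>' > vector_add_report/reports/report.html

# Compress on the "cluster" side; -a picks gzip from the .tar.gz suffix.
tar -caf vector_add_report.tar.gz vector_add_report/reports

# Simulate switching to the local machine, then extract.
rm -r vector_add_report
tar -xzf vector_add_report.tar.gz
ls vector_add_report/reports
```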
Open and inspect vector_add_report/reports/report.html in your browser. The main blocks of the report are
Throughput Analysis -> Loop Analysis: displays information about all loops and their optimization status (is the loop pipelined? what is its initiation interval (II)? …).
Area Analysis (of System): details about the area utilization, with architectural details of the generated hardware.
Views -> System Viewer: gives an overall overview of your kernels, their connections to each other, and their connections to external resources like memory.
Views -> Kernel Memory Viewer: displays the data movement and synchronization in your code.
Views -> Schedule Viewer: shows the scheduling of the generated instructions with the corresponding latencies.
Bottleneck Viewer: identifies bottlenecks that reduce the performance of the design (e.g. a lowered maximum clock frequency (Fmax) or an increased initiation interval (II)).
The throughput analysis contains little information, since the example is very simple and ND-range kernels such as the one used in this example yield fewer details in the report than single work-item kernels. The area analysis shows that the kernel system uses at most 1% of the available resources, so much more complex or more parallel kernels could fit on the FPGA. The system viewer shows two 32-bit burst-coalesced load operations and one 32-bit burst-coalesced store operation. Refer to Intel's documentation (in particular the Programming Guide and the Best Practices Guide for the Intel FPGA SDK for OpenCL) to learn more about the properties and optimization goals in the report.
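For reference, a minimal ND-range vector_add kernel of the kind discussed above can look as follows. This is a sketch, not necessarily the exact kernel shipped with the example: one work-item handles one element, giving two 32-bit global loads and one 32-bit global store per work-item, matching the operations shown in the system viewer.

```shell
# Write a minimal ND-range vector-add kernel (sketch; the shipped
# example kernel may differ in details such as argument qualifiers).
mkdir -p device
cat > device/vector_add.cl <<'EOF'
__kernel void vector_add(__global const float *restrict a,
                         __global const float *restrict b,
                         __global float *restrict c) {
    int i = get_global_id(0);  /* one work-item per vector element */
    c[i] = a[i] + b[i];        /* two 32-bit loads, one 32-bit store */
}
EOF
```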
...
To speed up the process and avoid wasting resources on unnecessary synthesis, we have pre-synthesized the design. Expand the box below to copy the pre-synthesized design for hardware execution.
Use pre-synthesized design
In order to still use the Slurm workload manager, we use a modified batch script copy_pre-synthesed_design_script.sh and submit it:

```
#!/bin/sh
# copy_pre-synthesed_design_script.sh
# Instead of starting the actual synthesis, we use the pre-synthesized results.
cp bin/vector_add_fpga.aocx bin/vector_add.aocx
```
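Because the script copies rather than renames, the source bitstream stays in place and the job can be resubmitted safely. This can be checked locally with stand-in files (the bitstream content below is a placeholder, not a real .aocx):

```shell
# Local check of the copy step with stand-in files.
mkdir -p bin
printf 'stand-in bitstream\n' > bin/vector_add_fpga.aocx

# After the copy, the source file is still present for resubmission.
cp bin/vector_add_fpga.aocx bin/vector_add.aocx
ls bin
```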
Then, we submit copy_pre-synthesed_design_script.sh to the Slurm workload manager:

```
sbatch --partition=all -A <your_project_acronym> -t 00:10:00 ./copy_pre-synthesed_design_script.sh
```
We submit to --partition=all. With -t 00:10:00, we allocate a small amount of time for this file copy job. You can check the progress of your job via squeue, and after the job completes, check the complete job output in slurm-<jobid>.out.
Please note that the compiled kernel has the same name for emulation and FPGA execution (that is, vector_add.aocx). If you accidentally overwrite the pre-synthesized design, you can submit the script again.
...
look into the source code of the vector_add example.
try one of the other examples mentioned above. Start with an example that is as close as possible to the actual problem that you want to accelerate using FPGAs.
visit our main FPGA documentation page to learn more about the used parameters, other options, and troubleshooting common problems.
do not hesitate to drop us an email if you face any problems, need support, or have any questions. Look for staff with Scientific Advisor FPGA Acceleration as their domain to contact the right person.