This guide walks you through the six steps required to use the Intel FPGA SDK for OpenCL on Noctua 2.

...


Background:

  • -rtl: Tells the compiler to stop after report generation.

  • -v: Prints more detailed progress information during compilation.

  • -board=p520_max_sg280l: Specifies the target FPGA board (Bittware 520N with Intel Stratix 10 GX 2800).

  • -board-package=/opt/software/FPGA/IntelFPGA/opencl_sdk/20.4.0/hld/board/bittware_pcie/s10_hpc_default: Specifies the board support package (BSP) in the correct version. Normally this argument is not required, because the compiler uses the environment variable AOCL_BOARD_PACKAGE_ROOT. It is only needed if you intentionally want to generate a report on an FPGA node that was allocated with a different constraint.

  • device/vector_add.cl: Kernel file for vector_add written in OpenCL.

  • -o vector_add_report: Output directory.
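The bullet on -board-package boils down to one comparison: does AOCL_BOARD_PACKAGE_ROOT already point at the BSP you want? The following is a minimal sketch of that check. It assumes the FPGA environment on the node exports AOCL_BOARD_PACKAGE_ROOT as described above. The helper name bsp_flag is hypothetical and is not part of the Intel toolchain.

```shell
#!/usr/bin/env bash
# Sketch: decide whether aoc needs an explicit -board-package argument.
# Assumes the environment exports AOCL_BOARD_PACKAGE_ROOT (see the bullet above).
# bsp_flag is a hypothetical helper, not an Intel tool.
bsp_flag() {
  local wanted="$1"
  if [ "${AOCL_BOARD_PACKAGE_ROOT:-}" = "$wanted" ]; then
    # The environment already points at the wanted BSP: no extra argument needed.
    return 0
  fi
  # Different (or no) BSP in the environment: name the wanted one explicitly.
  printf '%s\n' "-board-package=$wanted"
}

wanted_bsp=/opt/software/FPGA/IntelFPGA/opencl_sdk/20.4.0/hld/board/bittware_pcie/s10_hpc_default
echo "extra aoc argument: $(bsp_flag "$wanted_bsp")"
```

If the printed argument is non-empty, append it to your aoc command line. Otherwise the default from the environment is already the right BSP.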

To inspect the report, you may want to copy it to your local file system or mount your working directory; for more information, refer to [Noctua2-FileSystems]. For example, you can compress the report on Noctua 2:

Code Block
tar -caf vector_add_report.tar.gz vector_add_report/reports

Then copy and decompress it from your local command line (e.g. Linux, macOS, or Windows Subsystem for Linux):

Code Block
rsync -azv -e 'ssh -J <your-username>@fe.noctua.pc2.uni-paderborn.de' <your-username>@ln-0001:/scratch/<DIRECTORY_ASSIGNED_TO_YOUR_PROJECT>/getting_started_with_fpgas/vector_add/vector_add_report.tar.gz .

tar -xzf vector_add_report.tar.gz
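If you are unsure where the report entry point ended up after extracting the archive, you can search for it. The following is a small sketch that lists every report.html inside a reports/ folder below a given directory (default: the current one). The function name find_reports is hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: locate report.html files after extracting the report archive.
# find_reports is a hypothetical helper name, used only for illustration.
find_reports() {
  # In find, '*' in -path also matches '/', so this catches any depth,
  # but only files named report.html that sit directly inside a reports/ folder.
  find "${1:-.}" -type f -path '*/reports/report.html'
}

find_reports .
```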

The main sections of the report are:

  • Throughput Analysis -> Loop Analysis: displays information about all loops and their optimization status (for example, whether a loop is pipelined and what its initiation interval (II) is).

  • Area Analysis (of System): shows the area utilization, with architectural insight into the generated hardware.

  • Views -> System Viewer: gives an overview of your kernels and their connections to each other and to external resources such as memory.

  • Views -> Kernel Memory Viewer: displays the data movement and synchronization in your code.

  • Views -> Schedule Viewer: shows the scheduling of the generated instructions with corresponding latencies.

  • Bottleneck Viewer: identifies bottlenecks that reduce the performance of the design (for example, by lowering the maximum clock frequency (Fmax) or increasing the initiation interval (II)).

Open and inspect vector_add_report/reports/report.html in your browser. The throughput analysis contains little information, because the example is very simple and NDRange kernels, such as the one used in this example, yield fewer details in the report than single work-item kernels. The area analysis shows that the kernel system uses at most 1% of the available resources, so much more complex or parallel kernels would fit on the FPGA. The system viewer shows two 32-bit burst-coalesced load operations and one 32-bit burst-coalesced store operation. Refer to Intel's documentation for the Intel FPGA SDK for OpenCL (in particular the Programming Guide and the Best Practices Guide) to learn more about the properties and optimization goals in the report.

...

Code Block
srun --partition=fpga -A <your_project_acronym> --constraint=bittware_520n_20.4.0_hpc -t 2:00:00 --pty bash

Background information:

  • -A <your_project_acronym>: Specifies the project account that is charged for the compute time.

  • --constraint=bittware_520n_20.4.0_hpc: Requests a node with the matching version of the FPGA drivers and board support package (BSP).

  • --partition=fpga: Allocates a Noctua node with FPGAs (one node by default; add -N 1 to request it explicitly). Two FPGAs are attached to one Noctua node.

  • -t 2:00:00: Allocate the node for 2 hours.

  • --pty bash: Starts an interactive bash shell on the allocated node.
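For longer or unattended runs, you can also request the same resources non-interactively with a Slurm batch script. In such a script, each srun option above becomes an #SBATCH directive. The following is a sketch that only writes the script and does not submit it. The file name fpga_job.sh and the aocl diagnose command in its body are illustrative assumptions. As in the srun command, replace <your_project_acronym> with your project before submitting.

```shell
#!/usr/bin/env bash
# Sketch: write a batch-script equivalent of the interactive srun request above.
# fpga_job.sh is a hypothetical file name; replace <your_project_acronym> before submitting.
# The quoted 'EOF' keeps the heredoc literal (no variable expansion).
cat > fpga_job.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=fpga
#SBATCH --account=<your_project_acronym>
#SBATCH --constraint=bittware_520n_20.4.0_hpc
#SBATCH --time=2:00:00

# Commands placed here run on the allocated FPGA node, for example the
# board diagnostics of the Intel FPGA SDK for OpenCL (illustrative):
aocl diagnose
EOF

# Submit it afterwards with: sbatch fpga_job.sh
```

By default, Slurm writes the output of a batch job to a slurm-<jobid>.out file in the directory you submitted it from.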

...