Sanity Checks and Troubleshooting

Xilinx Alveo U280

Relevant with current tools and shells

Software emulation of compute unit(s) exited unexpectedly

When emulating designs with medium to large local memory buffers, the stack size may be insufficient, which leads to the above error message. Different limits apply on login nodes and on all other nodes (in particular compute and fpga nodes).

  • Login nodes: a maximum of 8 MB stack size is possible. All of these 8 MB are usable for emulation by default. If this is not sufficient for your emulation, you need to allocate a compute or fpga node.

  • All other nodes: there is no hard limit except for the memory size you allocated for your job. By default, only 2 MB are usable for emulation (https://man7.org/linux/man-pages/man3/pthread_create.3.html). When this is insufficient, change the stack size inside your job to a fitting limit:

# higher limit if needed, here 16 MB
ulimit -s 16384

# same 8 MB limit as on login nodes
ulimit -s 8192

  • Note that after setting a specific limit within a job, you cannot increase it again, only decrease it further.

  • Note that for most designs that eventually should run in hardware, 8 MB is either approaching or already exceeding the limits of available on-chip memory resources. However, there can be use cases where higher limits are useful, for example when emulating multiple kernels together that should eventually run on separate FPGAs.
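The stack size rules above can be checked before starting an emulation run. The following is a minimal sketch, not part of the official tooling: the helper names and the 16 MB target are assumptions for illustration. It follows the pthread_create(3) rule linked above, where an unlimited soft limit results in a 2 MB default thread stack on x86-64.

```shell
# Hypothetical helper: effective default thread stack size in KiB for a
# given soft limit. "unlimited" falls back to 2 MB (x86-64 default from
# pthread_create(3)); any other value is used as is.
effective_thread_stack_kib() {
    if [ "$1" = "unlimited" ]; then
        echo 2048
    else
        echo "$1"
    fi
}

# Hypothetical helper: succeeds if the effective stack size for soft
# limit $1 covers the wanted size $2 (both in KiB).
stack_is_sufficient() {
    [ "$(effective_thread_stack_kib "$1")" -ge "$2" ]
}

wanted_kib=16384          # assumption: the emulated design needs 16 MB
current="$(ulimit -s)"    # current soft limit, a number in KiB or "unlimited"

if stack_is_sufficient "$current" "$wanted_kib"; then
    echo "stack limit sufficient (soft limit: $current)"
else
    echo "stack limit too small (soft limit: $current), consider: ulimit -s $wanted_kib"
fi
```

The sketch only reports the situation and does not change the limit itself, because a limit set with ulimit -s inside a job cannot be raised again afterwards.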

Intel Stratix 10

Relevant with current tools and shells

Impossible to allocate different boards from different MPI ranks

When trying to use the two Bittware 520N boards from different MPI ranks, an error like the following is likely to occur:

Native API failed. Native API returns: -2 (PI_ERROR_DEVICE_NOT_AVAILABLE) -2 (PI_ERROR_DEVICE_NOT_AVAILABLE)

This is a limitation of the runtime. Details and an experimental workaround can be found at https://upb-pc2.atlassian.net/wiki/spaces/PC2DOK/pages/89882645.

ERROR: UNRECOGNIZED ERROR CODE (-1001)

This error might occur if the default system gcc (version 4.8) is used; see the required gcc versions.

Data corrupted during transfer from FPGA global memory to the host

A bug can corrupt data that is transferred from FPGA global memory to the host. The issue occurs rarely, about once in 100-300 TiB of transferred data. Automatic detection and a workaround are available. See https://upb-pc2.atlassian.net/wiki/spaces/PC2DOK/pages/23232513 for more details.

Deadlock when emulating kernels using serial channels

Kernels that use the cl_intel_channels OpenCL extension and communicate via write_channel_intel and read_channel_intel might deadlock in emulation, depending on the order in which kernels attempt to read from and write to a channel.

Workaround: load the module intel/channel_emulation_patch before running the emulation:

module load intel/channel_emulation_patch

This module uses the LD_PRELOAD mechanism to hook into libc library calls and implement a workaround. Therefore, all programs started while this module is loaded are potentially affected. While we try to minimize the impact, we suggest loading this module only when executing the emulation. If you notice any issues while the module is loaded, please contact us.
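One way to keep the module loaded only for the emulation itself is to load it inside a subshell. The following is a minimal sketch: run_with_module is a hypothetical helper, not a command provided on the system, and ./host_emulation stands in for your own emulation binary.

```shell
# Hypothetical helper: run a command with an environment module loaded
# only for that command. The subshell ( ... ) keeps the module's
# environment changes (such as LD_PRELOAD) out of the rest of the session.
run_with_module() {
    module_name="$1"
    shift
    (
        module load "$module_name" || exit 1
        "$@"
    )
}

# Usage (./host_emulation is a placeholder for your emulation binary):
#   run_with_module intel/channel_emulation_patch ./host_emulation
```

After the command returns, the calling shell's environment is unchanged, so programs started later are not affected by the preload.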

LOCALE settings forwarded from your computer

When an error message like this shows up in your synthesis output (e.g. in your slurm.out files)…

Error: Can't run the Timing Analyzer (quartus_sta) -- Fitter (quartus_fit) failed or was not run. Run the Fitter (quartus_fit) successfully before running the Timing Analyzer (create_timing_netlist).
Error: Quartus Prime Timing Analyzer was unsuccessful. 1 error, 0 warnings
Error: Quartus Fitter has failed! Breaking execution...
Error (23035): Tcl error:
Error (23031): Evaluation of Tcl script compile_script.tcl unsuccessful
Error: Quartus Prime Compiler Database Interface was unsuccessful. 3 errors, 0 warnings
For more details, full Quartus compile output can be found in files quartuserr.tmp and quartus_sh_compile.log.
Error: Compiler Error, not able to generate hardware
llvm-foreach:
icpx: error: fpga compiler command failed with exit code 1 (use -v to see invocation)
make: *** [Makefile:160: tidal_fpga] Error 1

you should check for details in the quartus_sh_compile.log. Either of the following two error messages hints at a problem with LOCALE settings.

Possible Error Details A
Possible Error Details B
Resolution

The root cause of both errors is the same, but it is only evident in the first example excerpt from quartus_sh_compile.log: parsing a number as floating point failed. This is caused by locale settings that are forwarded from the computer you connect from to Noctua 2. After connecting to Noctua 2, check your locale settings with locale, and change them if necessary with export LC_NUMERIC="en_US.UTF-8".
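The locale check can also be scripted, for example at the top of a job script. The following is a minimal sketch: effective_lc_numeric is a hypothetical helper that applies the standard precedence of the locale variables (LC_ALL overrides LC_NUMERIC, which overrides LANG). The list of locales treated as safe is an assumption and deliberately conservative.

```shell
# Hypothetical helper: resolve which locale governs number formatting.
# $1 = LC_ALL, $2 = LC_NUMERIC, $3 = LANG (any of them may be empty).
effective_lc_numeric() {
    if [ -n "$1" ]; then
        echo "$1"
    elif [ -n "$2" ]; then
        echo "$2"
    elif [ -n "$3" ]; then
        echo "$3"
    else
        echo "POSIX"
    fi
}

current="$(effective_lc_numeric "${LC_ALL:-}" "${LC_NUMERIC:-}" "${LANG:-}")"

# Assumption: these locales use "." as the decimal separator; other
# locales may be fine as well but are flagged to be on the safe side.
case "$current" in
    C|C.*|POSIX|en_US*)
        echo "numeric locale looks fine: $current" ;;
    *)
        echo "numeric locale is $current, consider: export LC_NUMERIC=\"en_US.UTF-8\"" ;;
esac
```

Note that a forwarded LC_ALL takes precedence over LC_NUMERIC, so if LC_ALL is set to a locale with a comma as decimal separator, exporting LC_NUMERIC alone has no effect.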

Relevant only with legacy tools or shells

CL_INVALID_PROGRAM_EXECUTABLE with fast emulation

When using the fast emulator with host code that was previously tested with the legacy emulator and/or hardware execution, you may encounter a problem during execution that corresponds to the OpenCL error code CL_INVALID_PROGRAM_EXECUTABLE. To fix this issue, your host code needs to invoke clBuildProgram (C API) or program.build() (C++ API). This invocation is required for any normal OpenCL code, but with legacy emulation and hardware execution it was not required and could be skipped.

FPGA programmed with bitstreams built with different SDK versions in the same session

Error message during bitstream programming from host code or with aocl program

This or similar error messages come up when invoking host code or aocl commands after a bitstream that was built with an earlier SDK version was configured. Workaround:

  • Load the latest intelFPGA_pro module (e.g. 19.3.0)

  • Configure the target bitstream (e.g. built with 19.2.0 SDK) using aocl program or your OpenCL host code

  • Optionally [reload the target intelFPGA_pro module that was used when building the bitstream]
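The workaround steps above could be wrapped in a small shell function. This is a sketch under assumptions: the function name, the module names in the form intelFPGA_pro/19.3.0, the device name acl0, and the file design.aocx are placeholders for illustration and need to be adapted to your setup.

```shell
# Hypothetical helper: configure a bitstream with the latest SDK module
# loaded, then switch back to the module used when building the bitstream.
# $1 = latest module, $2 = build module, $3 = device, $4 = bitstream file
reprogram_with_latest_sdk() {
    latest_module="$1"
    build_module="$2"
    device="$3"
    bitstream="$4"

    # Step 1: load the latest SDK module
    module load "$latest_module" || return 1
    # Step 2: configure the target bitstream
    aocl program "$device" "$bitstream" || return 1
    # Step 3 (optional): return to the module used for building
    module unload "$latest_module"
    module load "$build_module"
}

# Usage (all names are placeholders):
#   reprogram_with_latest_sdk intelFPGA_pro/19.3.0 intelFPGA_pro/19.2.0 acl0 design.aocx
```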