Xilinx Alveo U280
Relevant with current tools and shells
Software emulation of compute unit(s) exited unexpectedly
When emulating designs with medium to large local memory buffers, the stack size may be insufficient, which leads to the error message above. Different stack size limits apply on login nodes and on all other nodes (in particular compute and fpga nodes).
Login nodes: the stack size is limited to a maximum of 8 MB, and by default all 8 MB are usable for emulation. If this is not sufficient for your emulation, you need to allocate a compute or fpga node.
All other nodes: there is no hard limit apart from the memory you allocated for your job. However, by default only 2 MB are usable for emulation, because new threads fall back to a default stack size of 2 MB when the stack size limit is unlimited (https://man7.org/linux/man-pages/man3/pthread_create.3.html). If this is insufficient, set a suitable stack size limit inside your job:
# higher limit if needed, here 16 MB
ulimit -s 16384
# same 8 MB limit as on login nodes
ulimit -s 8192
Note that after setting a specific limit within a job, you cannot increase it again, only decrease it further, because in bash, ulimit -s without the -S option sets both the soft and the hard limit.
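To keep the option of raising the stack size again later in the same job, it should be enough to change only the soft limit with ulimit -S -s and leave the hard limit untouched. According to the pthread_create man page, a finite soft limit at program start determines the default stack size of new threads, so this also reaches the emulation threads. The following is a minimal bash sketch; the helper name set_stack_soft_kib is hypothetical and sizes are in KiB, as for ulimit -s.

```bash
#!/usr/bin/env bash
# set_stack_soft_kib is a hypothetical helper, not provided by any module.
# It changes only the soft stack limit (ulimit -S), so the hard limit stays
# as it was and the soft limit can be raised again later in the same job.
# The argument is in KiB, the same unit that ulimit -s uses.
set_stack_soft_kib() {
  local want_kib=$1
  local hard_kib
  hard_kib=$(ulimit -H -s)
  # A soft limit can never exceed the hard limit.
  if [ "$hard_kib" != "unlimited" ] && [ "$want_kib" -gt "$hard_kib" ]; then
    echo "requested ${want_kib} KiB exceeds the hard limit of ${hard_kib} KiB" >&2
    return 1
  fi
  ulimit -S -s "$want_kib"
}
```

For example, calling set_stack_soft_kib 16384 before launching the emulation corresponds to the 16 MB example above. The new limit applies to the current shell and to all programs started from it afterwards.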
For most designs that should eventually run in hardware, 8 MB already approaches or exceeds the available on-chip memory resources. However, there can be use cases where higher limits are useful, for example when emulating multiple kernels together that will eventually run on separate FPGAs.
Intel Stratix 10
Relevant with current tools and shells
Impossible to allocate different boards from different MPI ranks
When trying to use the two Bittware 520N boards from different MPI ranks, an error like the following is likely to occur:
Native API failed. Native API returns: -2 (PI_ERROR_DEVICE_NOT_AVAILABLE) -2 (PI_ERROR_DEVICE_NOT_AVAILABLE)
This is a limitation of the runtime. Details and an experimental workaround can be found at Using Bittware 520N Boards from Different Ranks.
ERROR: UNRECOGNIZED ERROR CODE (-1001)
This error might occur if the default system gcc (version 4.8) is used; see required gcc versions.
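To see which compiler is actually picked up, a small bash check can help. This is a sketch under the assumption that the compiler is invoked as gcc from PATH; the helper name warn_if_old_gcc is hypothetical, its message assumes newer compilers are available as modules, and it only flags a 4.x compiler such as the system default rather than the exact minimum listed under required gcc versions.

```bash
#!/usr/bin/env bash
# warn_if_old_gcc is a hypothetical helper. It only detects an old 4.x
# compiler such as the system default 4.8; check the required gcc
# versions for the actual minimum.
warn_if_old_gcc() {
  local ver major
  ver=$(gcc -dumpversion 2>/dev/null) || {
    echo "no gcc found in PATH" >&2
    return 1
  }
  major=${ver%%.*}   # "4.8.5" -> "4", "11" -> "11"
  if [ "$major" -lt 5 ]; then
    echo "gcc ${ver} is first in PATH; load a newer gcc module before building" >&2
    return 1
  fi
}
```

Running warn_if_old_gcc before compiling host code makes it obvious when the system compiler is still first in PATH.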
Data corrupted during transfer from FPGA global memory to the host
A bug can corrupt data that is transferred from FPGA global memory to the host. The issue occurs only rarely, about once per 100-300 TiB of transferred data. Automatic detection and a workaround are available; see Bittware 520N Data Transfer Issue and Workaround for more details.
Deadlock when emulating kernels using serial channels
Kernels that use the cl_intel_channels OpenCL extension and communicate via write_channel_intel and read_channel_intel might deadlock during emulation, depending on the order in which the kernels attempt to read from and write to a channel.
Workaround: load the module intel/channel_emulation_patch before running the emulation:
module load intel/channel_emulation_patch
This module uses the LD_PRELOAD mechanism to hook into libc calls and implement a workaround. Therefore, all programs started while this module is loaded are potentially affected. While we try to minimize the impact, we suggest loading this module only when executing the emulation. If you notice any issues while the module is loaded, please contact us.
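One way to follow this suggestion, sketched here for bash with the usual module command, is to load the module inside a subshell: the LD_PRELOAD setting then applies only to the command run there and disappears when the subshell exits. The wrapper name run_with_channel_patch is hypothetical.

```bash
#!/usr/bin/env bash
# run_with_channel_patch is a hypothetical wrapper, not provided by the module.
# The module is loaded inside a subshell ( ... ), so its LD_PRELOAD hook only
# affects the given command; the calling shell's environment stays unchanged.
run_with_channel_patch() {
  (
    module load intel/channel_emulation_patch || exit 1
    "$@"
  )
}
```

For example, run_with_channel_patch ./my_host_emu, where ./my_host_emu is a placeholder for your emulation host binary and its arguments.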
Locale settings forwarded from your computer
Error message in quartus_sh_compile.log
Internal Error: Sub-system: CFG_INI, File: /quartus/ccl/cfg_ini/cfg_ini_reader.cpp, Line: 1530
Couldn't parse ini setting qspc_nldm_max_step_size=10.0 as a floating point value
Stack Trace:
    0xb4fe: err_report_internal_error(char const*, char const*, char const*, int) + 0x1a (ccl_err)
    0x17b45: cfg_get_double_value(std::string const&, double) + 0xe4 (ccl_cfg_ini)
    0x8f788: CFG_INI_DOUBLE::refresh() + 0x48 (tsm_qspc)
...
Error (23035): Tcl error: couldn't open "top.fit.rpt": no such file or directory
    while executing
"open $report"
    (procedure "fetch_pseudo_panel" line 3)
    invoked from within
"fetch_pseudo_panel $report "Found \[0-9\]* clocks" {1 0} 2"
    (procedure "fetch_clock_periods" line 6)
    invoked from within
"fetch_clock_periods $report"
    (procedure "fetch_clock" line 2)
    invoked from within
"fetch_clock "$revision_name.fit.rpt" $clkname"
    (procedure "get_fmax_from_report" line 8)
    invoked from within
"get_fmax_from_report $k_clk_name 1 $recovery_multicycle $iteration"
    (procedure "get_kernel_clks_and_fmax" line 5)
    invoked from within
"get_kernel_clks_and_fmax $k_clk_name $k_clk2x_name $recovery_multicycle $iteration"
    (file "/cm/shared/opt/intelFPGA_pro/19.4.0/hld/ip/board/bsp/adjust_plls.tcl" line 815)
    invoked from within
"source "$sdk_root/ip/board/bsp/adjust_plls.tcl""
    (file "scripts/post_flow_pr.tcl" line 59)
Error (23031): Evaluation of Tcl script scripts/post_flow_pr.tcl unsuccessful
...
Error: Quartus Fitter has failed! Breaking execution...
Error (23035): Tcl error:
    while executing
"qexec "quartus_cdb -t scripts/post_flow_pr.tcl \"$top_path\"""
    invoked from within
"if {$revision_name eq "top"} {
    post_message "Compiling top revision..."
    # Load OpenCL BSP utility functions
    source "$sdk_root/ip/board/bsp/ope..."
    (file "compile_script.tcl" line 40)
Error (23031): Evaluation of Tcl script compile_script.tcl unsuccessful
Error: Quartus Prime Compiler Database Interface was unsuccessful. 3 errors, 0 warnings
Error: Peak virtual memory: 1021 megabytes
Error: Processing ended: Mon Mar 30 14:47:15 2020
Error: Elapsed time: 03:06:43
Error: System process ID: 21428
The root cause is given in the first message of the quartus_sh_compile.log excerpt above: a number could not be parsed as a floating point value. This can be caused by locale settings that are forwarded from the computer you connect from to Noctua 2; for example, a German locale uses a comma as the decimal separator, so a value such as 10.0 is not recognized as a valid number. After connecting to Noctua 2, check your locale settings with locale and, if necessary, change them with export LC_NUMERIC="en_US.UTF-8".
[tester@fe-1 matrix_mult]$ locale
...
LC_NUMERIC="de_DE.UTF-8"    // can cause above error
...
[tester@fe-1 matrix_mult]$ export LC_NUMERIC="en_US.UTF-8"
[tester@fe-1 matrix_mult]$ locale
...
LC_NUMERIC="en_US.UTF-8"    // known to work
...
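Instead of reading through the full locale output, a build script can check the decimal separator directly. This is a minimal bash sketch assuming the glibc locale utility, where locale decimal_point prints the decimal separator of the current LC_NUMERIC setting; the function name check_numeric_locale is hypothetical.

```bash
#!/usr/bin/env bash
# check_numeric_locale is a hypothetical helper, not part of the Intel tools.
# It fails if the current LC_NUMERIC setting does not use "." as the decimal
# separator, which can break parsing of values such as 10.0 in Quartus.
check_numeric_locale() {
  local sep
  sep=$(locale decimal_point 2>/dev/null)
  if [ "$sep" != "." ]; then
    echo "decimal separator is '${sep}', try: export LC_NUMERIC=\"en_US.UTF-8\"" >&2
    return 1
  fi
}
```

Placing check_numeric_locale || export LC_NUMERIC="en_US.UTF-8" at the top of a build script avoids losing hours of compilation to this error.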
Relevant only with legacy tools or shells
CL_INVALID_PROGRAM_EXECUTABLE with fast emulation
When using the fast emulator with host code that was previously tested with the legacy emulator and/or hardware execution, you may encounter an error during execution that corresponds to the OpenCL error code CL_INVALID_PROGRAM_EXECUTABLE. To fix this issue, your host code needs to invoke clBuildProgram (C API) or program.build() (C++ API). This call is required for any standard OpenCL code, but with the legacy emulator and with hardware execution it was not required and could be skipped.
FPGA programmed with bitstreams built with different SDK versions in the same session
Error message during bitstream programming from host code or with aocl program
FAILED to read auto-discovery string at byte 2: Expected version is 19, found 20
Error: The currently programmed/flashed design is no longer supported in this release. Please recompile the design with the present version of the SDK and re-program/flash the board.
acl_hal_mmd.c:1460:assert failure: Failed to initialize kernel interface
main: acl_hal_mmd.c:1460: l_try_device: Assertion `0' failed.
This or similar error messages appear when invoking host code or aocl commands after a bitstream that was built with an earlier SDK version was configured. Workaround:
Load the latest intelFPGA_pro module (e.g. 19.3.0)
Configure the target bitstream (e.g. built with 19.2.0 SDK) using aocl program or your OpenCL host code
Optionally, reload the intelFPGA_pro module that was used to build the bitstream (e.g. 19.2.0)
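The three workaround steps can be wrapped in a small bash function, sketched below. The function name program_with_latest_sdk, the module names in the form intelFPGA_pro/19.3.0, the device name acl0 and the .aocx file name are placeholders to adapt. aocl program takes a device name and an .aocx file; depending on the module system, switching between two versions of the same module may require module swap instead of module load.

```bash
#!/usr/bin/env bash
# program_with_latest_sdk is a hypothetical helper for the workaround above.
#   $1  latest SDK module       (e.g. intelFPGA_pro/19.3.0, assumed name)
#   $2  FPGA device name        (e.g. acl0)
#   $3  bitstream to configure  (e.g. a .aocx built with the 19.2.0 SDK)
#   $4  optional: SDK module the bitstream was built with, loaded afterwards
program_with_latest_sdk() {
  local latest_mod=$1 device=$2 aocx=$3 build_mod=${4:-}
  # Step 1: load the latest SDK module
  module load "$latest_mod" || return 1
  # Step 2: configure the target bitstream with the latest SDK's aocl
  aocl program "$device" "$aocx" || return 1
  # Step 3 (optional): go back to the SDK used to build the bitstream
  if [ -n "$build_mod" ]; then
    module load "$build_mod" || return 1
  fi
}
```

For example: program_with_latest_sdk intelFPGA_pro/19.3.0 acl0 kernel.aocx intelFPGA_pro/19.2.0. The function runs in the current shell rather than a subshell, so the module loaded in the last step stays active for the subsequent host run.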