Xilinx Alveo U280

Relevant with current tools and shells

Software emulation of compute unit(s) exited unexpectedly

When emulating designs with medium to large local memory buffers, the stack size may not be sufficient, which leads to the above error message. Different limits apply on login nodes and on all other nodes (in particular compute and fpga nodes).

Login nodes: a maximum of 8 MB stack size is possible. All of these 8 MB are usable for emulation by default. If this is not sufficient for your emulation, you need to allocate a compute or fpga node.
All other nodes: there is no hard limit except for the memory size you allocated for your job. By default, only 2 MB are usable for emulation (https://man7.org/linux/man-pages/man3/pthread_create.3.html). When this is insufficient, change the stack size inside your job to a fitting limit:

Code Block
# higher limit if needed, here 16 MB
ulimit -s 16384
# same 8 MB limit as on login nodes
ulimit -s 8192

Note that after setting a specific limit within a job, you cannot increase it again, only decrease it further.
Note that for most designs that eventually should run in hardware, 8 MB is either approaching or already exceeding the limits of available on-chip memory resources. However, there can be use cases where higher limits are useful, for example when emulating multiple kernels together that should eventually run on separate FPGAs.
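
Because a plain ulimit -s affects the rest of the job, it can help to confine the change to a single command. The following is a minimal sketch, not part of the original instructions: run_with_stack is a hypothetical helper name, and ./my_emulation_host in the usage comment is a placeholder for your own emulation binary. It relies on two standard shell behaviors: a subshell does not change the limits of its parent, and ulimit -S changes only the soft limit, whereas ulimit without -S or -H sets both the soft and the hard limit.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a temporary soft stack limit.
# $1 is the limit in KiB (the unit used by "ulimit -s"), the rest is the command.
run_with_stack() {
    stack_kib="$1"
    shift
    (
        # The parentheses start a subshell, so the calling shell keeps its limit.
        # -S changes only the soft limit; the hard limit is left untouched.
        ulimit -S -s "$stack_kib" || exit 1
        "$@"
    )
}

# Show the limits of the current shell (values in KiB or "unlimited").
echo "soft stack limit: $(ulimit -S -s)"
echo "hard stack limit: $(ulimit -H -s)"

# Usage with a placeholder binary name and the 16 MB limit from above:
#   run_with_stack 16384 ./my_emulation_host
```

With this approach the limit of the job shell itself is never lowered, so a later emulation run can still request a different value, up to the hard limit.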

Intel Stratix 10

Relevant with current tools and shells

Impossible to allocate different boards from different MPI ranks

When trying to use the two Bittware 520N boards from different MPI ranks, an error like the following is likely to occur:

Native API failed. Native API returns: -2 (PI_ERROR_DEVICE_NOT_AVAILABLE) -2 (PI_ERROR_DEVICE_NOT_AVAILABLE)

This is a limitation of the runtime. Details and an experimental workaround can be found at Using Bittware 520N Boards from Different Ranks.

ERROR: UNRECOGNIZED ERROR CODE (-1001)

This error might occur if the default system gcc (version 4.8) is used, see required gcc versions.

Data corrupted during transfer from FPGA global memory to the host

A bug can cause corruption of data that is transferred from FPGA global memory to the host. The issue occurs rarely, about once per 100-300 TiB of transferred data. An automatic detection and workaround is available. See Bittware 520N Data Transfer Issue and Workaround for more details.

Deadlock when emulating kernels using serial channels

Kernels that use the cl_intel_channels OpenCL extension and communicate via write_channel_intel and read_channel_intel might deadlock on emulation, depending on the order in which kernels attempt to read and write from a channel.

Workaround: load the module intel/channel_emulation_patch before running the emulation:

Code Block
module load intel/channel_emulation_patch

This module uses the LD_PRELOAD mechanism to hook into libc library calls and implement a workaround. Therefore, all programs started while this module is loaded are potentially affected. While we try to minimize the impact, we suggest loading this module only when executing the emulation. If you notice any issues while the module is loaded, please contact us.
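
One way to keep the module loaded only for the duration of the emulation is to wrap the run in a small shell function. This is a sketch under assumptions: with_module is a hypothetical helper name, ./my_emulation_host in the usage comment is a placeholder for your emulation binary, and the module command is assumed to be available as in an interactive session or job script on the cluster.

```shell
#!/bin/sh
# Hypothetical helper: load an environment module only while one command runs.
# $1 is the module name, the rest is the command to execute.
with_module() {
    mod_name="$1"
    shift
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found" >&2
        return 1
    fi
    module load "$mod_name" || return 1
    "$@"
    status=$?
    # Unload again so that later programs are not affected by LD_PRELOAD.
    module unload "$mod_name"
    # Hand the exit status of the wrapped command back to the caller.
    return "$status"
}

# Usage with a placeholder binary name:
#   with_module intel/channel_emulation_patch ./my_emulation_host
```

The function returns the exit status of the wrapped command, so it can be used in job scripts that check for failures.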

LOCALE settings forwarded from your computer

When an error message like this shows up in your synthesis output (e.g. in your slurm.out files)…

Code Block

Error: Can't run the Timing Analyzer (quartus_sta) -- Fitter (quartus_fit) failed or was not run. Run the Fitter (quartus_fit) successfully before running the Timing Analyzer (create_timing_netlist).
Error: Quartus Prime Timing Analyzer was unsuccessful. 1 error, 0 warnings
Error: Quartus Fitter has failed! Breaking execution...
Error (23035): Tcl error:
Error (23031): Evaluation of Tcl script compile_script.tcl unsuccessful
Error: Quartus Prime Compiler Database Interface was unsuccessful. 3 errors, 0 warnings
For more details, full Quartus compile output can be found in files quartuserr.tmp and quartus_sh_compile.log.
Error: Compiler Error, not able to generate hardware
  
llvm-foreach:
icpx: error: fpga compiler command failed with exit code 1 (use -v to see invocation)
make: *** [Makefile:160: tidal_fpga] Error 1

you should check for details in the quartus_sh_compile.log. Either of the following two error messages hints at a problem with locale settings.

Possible Error Details A

Code Block

Internal Error: Sub-system: CFG_INI, File: /quartus/ccl/cfg_ini/cfg_ini_reader.cpp, Line: 1530
Couldn't parse ini setting qspc_nldm_max_step_size=10.0 as a floating point value
Stack Trace:
     0xb4fe: err_report_internal_error(char const*, char const*, char const*, int) + 0x1a (ccl_err)
    0x17b45: cfg_get_double_value(std::string const&, double) + 0xe4 (ccl_cfg_ini)
    0x8f788: CFG_INI_DOUBLE::refresh() + 0x48 (tsm_qspc)
...
Error (23035): Tcl error: couldn't open "top.fit.rpt": no such file or directory
    while executing
"open $report"
    (procedure "fetch_pseudo_panel" line 3)
    invoked from within
"fetch_pseudo_panel $report "Found \[0-9\]* clocks" {1 0} 2"
    (procedure "fetch_clock_periods" line 6)
    invoked from within
"fetch_clock_periods $report"
    (procedure "fetch_clock" line 2)
    invoked from within
"fetch_clock "$revision_name.fit.rpt" $clkname"
    (procedure "get_fmax_from_report" line 8)
    invoked from within
"get_fmax_from_report $k_clk_name 1 $recovery_multicycle $iteration"
    (procedure "get_kernel_clks_and_fmax" line 5)
    invoked from within
"get_kernel_clks_and_fmax $k_clk_name $k_clk2x_name $recovery_multicycle $iteration"
    (file "/cm/shared/opt/intelFPGA_pro/19.4.0/hld/ip/board/bsp/adjust_plls.tcl" line 815)
    invoked from within
"source "$sdk_root/ip/board/bsp/adjust_plls.tcl""
    (file "scripts/post_flow_pr.tcl" line 59)
Error (23031): Evaluation of Tcl script scripts/post_flow_pr.tcl unsuccessful
...
Error: Quartus Fitter has failed! Breaking execution...
Error (23035): Tcl error: 
    while executing
"qexec "quartus_cdb -t scripts/post_flow_pr.tcl \"$top_path\"""
    invoked from within
"if {$revision_name eq "top"} {

  post_message "Compiling top revision..."

  # Load OpenCL BSP utility functions
  source "$sdk_root/ip/board/bsp/ope..."
    (file "compile_script.tcl" line 40)
Error (23031): Evaluation of Tcl script compile_script.tcl unsuccessful
Error: Quartus Prime Compiler Database Interface was unsuccessful. 3 errors, 0 warnings
    Error: Peak virtual memory: 1021 megabytes
    Error: Processing ended: Mon Mar 30 14:47:15 2020
    Error: Elapsed time: 03:06:43
    Error: System process ID: 21428

Possible Error Details B

Code Block

Internal Error: Sub-system: SIN, File: /quartus/tsm/sin/sin_simulation_interface.cpp, Line: 1768
near_target_voltage.is_track_half_vccio() || near_target_voltage.is_track_half_signal_swing() || near_target_voltage.is_double()
Processors in use: 16
Stack Trace:
  Quartus          0xd4cd4: SIN_SIMULATION_INTERFACE::build_netlist_key(SIN_SIMULATION_SETUP const*, IOO_PIN const*) + 0x334 (tsm_sin)
  Quartus          0xad9d8: SIN_JSPICE_SIMULATION_CACHE::get_simulation_resul
  Quartus         0x12d1ae: SIN_NADDER_MANAGER_BODY::get_simulation_resu
  Quartus          0xc664f: sin_parallel_run_simulation(SIN_PARALLEL_INSTRUCTION*) + 0x3f (tsm_sin)
  Quartus           0x4da5: PUT_SPMD_JOB_VOID_PTR_IMPL::thread_loop(bool) + 0x9f (ccl_put)
  Quartus           0x4f
  Quartus          0x42856: msg_thread_wrapper(void* (*)(void*), v
  Quartus          0x1543e: mem_thread_wrapper(void* (*)(void*), v
  Quartus           0xcd72: err_thread_wrapper(void* (*)(void*), void*) + 0x1e (ccl_err
  Quartus           0x701e: thr_thread_begin + 0x2e (ccl_thr)
  System            0x81ca: start_thread + 0xea (pthread)
 
Error: Can't run the Timing Analyzer (quartus_sta) -- Fitter (quartus_fit) failed or was not run. Run the Fitter (quartus_fit) successfully before running the Timing Analyzer (create_timing_netlist).
Error: Quartus Prime Timing Analyzer was unsuccessful. 1 error, 0 warnings

Resolution

The root cause of both errors is the same, but it is only evident in the first of the above excerpts from quartus_sh_compile.log: parsing a number as a floating-point value failed. This can be caused by locale settings that are transferred from the computer you connect from to Noctua 2. After connecting to Noctua 2, check your locale settings with locale and, if necessary, change them with export LC_NUMERIC="en_US.UTF-8".

Code Block

[tester@fe-1 matrix_mult]$ locale
...
LC_NUMERIC="de_DE.UTF-8" // can cause above error
[tester@fe-1 matrix_mult]$ export LC_NUMERIC="en_US.UTF-8"
[tester@fe-1 matrix_mult]$ locale
...
LC_NUMERIC="en_US.UTF-8" // known to work
...
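
To avoid running into this only after hours of synthesis, the setting can also be placed in the job script together with a rough sanity check. The following is a sketch, not an official recipe: decimal_separator is a hypothetical helper, and the check only inspects how the shell's printf formats numbers, which is assumed to reflect the locale the tools will see. Keep in mind that a set LC_ALL takes precedence over LC_NUMERIC, so the export has no effect if LC_ALL was forwarded with a different locale.

```shell
#!/bin/sh
# Hypothetical helper: print the decimal separator that printf uses
# under the current locale settings ("." is what the tools expect).
decimal_separator() {
    # "1" formatted with one decimal gives "1.0" or "1,0"; drop the digits.
    printf '%.1f' 1 | tr -d '0-9'
}

# Setting from the resolution above.
export LC_NUMERIC="en_US.UTF-8"

# Warn before starting a long synthesis if numbers are still formatted with ",".
if [ "$(decimal_separator)" != "." ]; then
    echo "Warning: decimal separator is not '.', check LC_NUMERIC and LC_ALL" >&2
fi
```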

Relevant only with legacy tools or shells

CL_INVALID_PROGRAM_EXECUTABLE with fast emulation

When using the fast emulator along with host code that was previously tested with the legacy emulator and/or hardware execution, you may encounter a problem during execution that corresponds to the OpenCL error code CL_INVALID_PROGRAM_EXECUTABLE. To fix this issue, your host code needs to invoke clBuildProgram (C API) or program.build() (C++ API). This invocation is required for any normal OpenCL code, but legacy emulation and hardware execution did not enforce it, so it could be skipped there.

FPGA programmed with bitstreams built with different SDK versions in the same session

Error message during bitstream programming from host code or with aocl program:

Code Block

FAILED to read auto-discovery string at byte 2: Expected version is 19, found 20
Error: The currently programmed/flashed design is no longer supported in this release. Please recompile the design with the present version of the SDK and re-program/flash the board.
acl_hal_mmd.c:1460:assert failure: Failed to initialize kernel interface
main: acl_hal_mmd.c:1460: l_try_device: Assertion `0' failed.

This or similar error messages come up when invoking host code or aocl commands after a bitstream that was built with an earlier SDK version was configured. Workaround:

Load the latest intelFPGA_pro module (e.g. 19.3.0)
Configure the target bitstream (e.g. built with 19.2.0 SDK) using aocl program or your OpenCL host code
Optionally, reload the target intelFPGA_pro module that was used when building the bitstream
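
The workaround steps above can be sketched as a small shell function. All concrete names are assumptions for illustration: program_old_bitstream is a hypothetical helper, the module names in the form intelFPGA_pro/19.3.0, the device name acl0 and the file name kernel.aocx are placeholders that need to be adapted to your setup.

```shell
#!/bin/sh
# Hypothetical helper following the three workaround steps.
# $1: newest SDK module, $2: SDK module used to build the bitstream,
# $3: device name, $4: bitstream file.
program_old_bitstream() {
    latest_mod="$1"
    build_mod="$2"
    device="$3"
    aocx="$4"
    # 1. Load the latest SDK module.
    module load "$latest_mod" || return 1
    # 2. Configure the bitstream that was built with the older SDK.
    aocl program "$device" "$aocx" || return 1
    # 3. Optionally switch back to the SDK version used for the build.
    module unload "$latest_mod"
    module load "$build_mod"
}

# Usage; module names, device name and file name are assumptions:
#   program_old_bitstream intelFPGA_pro/19.3.0 intelFPGA_pro/19.2.0 acl0 kernel.aocx
```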
