Bittware 520N Data Transfer Issue and Workaround
Data transfers from global memory to the host sometimes leave out one or several consecutive memory pages (aligned blocks of 4 KiB), leading to corruption of the read data. The issue seems to occur randomly and on average only once per 100-300 TiB of transferred data. However, if you plan long-running or parallel FPGA jobs on the Bittware 520N boards, you should be aware of this issue and use methods to detect and fix these corruptions. We provide a module that automatically implements a workaround for this issue.
Workaround Strategy
A possible strategy to detect errors caused by this bug is:
Before performing a read from global memory to the host, write a sufficiently long, known byte string to each page in the host buffer.
Perform the transfer.
After the transfer, verify that no page still contains the byte string. Any page that does was left out during the transfer.
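The three steps above can be sketched in C as follows. This is only an illustration of the strategy, not the code of our module: the marker contents, the names mark_pages, count_skipped_pages and reliable_read, and the read_fn callback type are all assumptions made for this example. In an OpenCL host program, the callback would typically wrap a blocking clEnqueueReadBuffer call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096u   /* page size named in the text above */
#define MARKER_LEN 32u     /* assumed marker length for this sketch */

/* Hypothetical marker: 31 characters plus the terminating NUL = 32 bytes. */
static const unsigned char MARKER[MARKER_LEN] = "SKIPPED-PAGE-MARKER-0123456789!";

/* Offset of the first page boundary at or after buf. */
static size_t first_page_offset(const void *buf) {
    uintptr_t a = (uintptr_t)buf;
    return (size_t)((PAGE_BYTES - a % PAGE_BYTES) % PAGE_BYTES);
}

/* Step 1: write the marker to the start of every page that has room for it. */
void mark_pages(void *buf, size_t len) {
    unsigned char *b = buf;
    for (size_t off = first_page_offset(buf);
         off < len && len - off >= MARKER_LEN; off += PAGE_BYTES)
        memcpy(b + off, MARKER, MARKER_LEN);
}

/* Step 3: count the pages that still start with the marker. */
size_t count_skipped_pages(const void *buf, size_t len) {
    const unsigned char *b = buf;
    size_t skipped = 0;
    for (size_t off = first_page_offset(buf);
         off < len && len - off >= MARKER_LEN; off += PAGE_BYTES)
        if (memcmp(b + off, MARKER, MARKER_LEN) == 0)
            skipped++;
    return skipped;
}

/* Hypothetical callback that performs the actual blocking read (step 2).
 * It must return 0 on success. */
typedef int (*read_fn)(void *dst, size_t len, void *ctx);

/* Mark, transfer, validate; repeat up to max_tries times.
 * Returns 0 once a transfer leaves no marker behind, -1 otherwise. */
int reliable_read(read_fn do_read, void *ctx, void *dst, size_t len, int max_tries) {
    assert(do_read != NULL && dst != NULL);
    for (int attempt = 0; attempt < max_tries; attempt++) {
        mark_pages(dst, len);
        if (do_read(dst, len, ctx) != 0)
            return -1;
        if (count_skipped_pages(dst, len) == 0)
            return 0;
    }
    return -1;
}
```

Note that a page whose real payload happens to start with the marker would be reported as skipped, which is why the byte string should be sufficiently long and unlikely to occur in your data.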
Automatic Application of the Workaround
We developed a software module that implements the described workaround and automatically re-issues the transfer if an error is detected. To use it, load the fpga/bittware/520n_reliable_transfers module. Note that this module only becomes available after one of the fpga/bittware/520 BSP modules has been loaded.
When using an FPGA with the module loaded, you will see an output line like the following printed to stderr by your application:
PC2 Bittware 520n reliable data transfer patch active. mlock all pages not activated.
If a data transfer issue is detected, the following line will be printed to stderr and the transfer will be re-issued:
!!! Incomplete data transfer detected. Re-issuing transfer. !!!
Performance Impact
The performance impact of the workaround depends heavily on the CPU utilization and on how the host application handles its memory buffers. As a rule of thumb, expect an overhead of around 5-15%. However, writing to each memory page of freshly allocated memory can cause much higher overhead. You should therefore reuse host memory buffers wherever possible instead of reallocating them before each transfer.
If it is not possible to reuse host memory buffers, you can set the environment variable BITTFIX_MLOCK, which causes our automatic workaround to pre-allocate all memory pages in one go before writing the byte string to each page. However, you still have to expect a severe performance degradation for FPGA-to-host data transfers in this case.
Limitations
The automatic workaround only validates pages where the first 32 bytes of the page are part of the target buffer. To make sure that the entire transfer is validated, you should therefore allocate your target buffer aligned to 4 KiB and make sure that the last page of the target buffer contains at least 32 bytes.
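The following C sketch illustrates one way to meet both conditions on the host side. The helper names (alloc_page_aligned, padded_transfer_size, transfer_is_fully_checked) are hypothetical and not part of our module. The sketch assumes that the transfer size may be enlarged by a few bytes, which in turn requires the device buffer to be at least that large.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BYTES 4096u      /* alignment recommended in the text above */
#define MIN_TAIL_BYTES 32u    /* bytes the last page must contain */

/* Allocate a 4 KiB-aligned host buffer of at least len bytes.
 * The allocation is rounded up to whole pages because C11 aligned_alloc
 * expects the size to be a multiple of the alignment. Release with free(). */
void *alloc_page_aligned(size_t len) {
    if (len == 0)
        return NULL;
    size_t rounded = (len + PAGE_BYTES - 1) / PAGE_BYTES * PAGE_BYTES;
    return aligned_alloc(PAGE_BYTES, rounded);
}

/* Smallest transfer size >= len whose last page holds at least 32 bytes. */
size_t padded_transfer_size(size_t len) {
    size_t tail = len % PAGE_BYTES;
    if (tail != 0 && tail < MIN_TAIL_BYTES)
        return len + (MIN_TAIL_BYTES - tail);
    return len;
}

/* Returns 1 if every page touched by the transfer has its first 32 bytes
 * inside [dst, dst + len), i.e. the whole transfer would be validated. */
int transfer_is_fully_checked(const void *dst, size_t len) {
    if (len == 0 || (uintptr_t)dst % PAGE_BYTES != 0)
        return 0;
    size_t tail = len % PAGE_BYTES;
    return tail == 0 || tail >= MIN_TAIL_BYTES;
}
```

A host program could call transfer_is_fully_checked on the destination pointer and size before each read and fall back to padded_transfer_size when the check fails.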
The workaround has been validated with a couple of user codes. However, it should be regarded as experimental. Verify that it does not adversely affect your application, and please contact us if you notice any issues.