Using Bittware 520N Boards from Different Ranks

When trying to use the two Bittware 520N boards from different MPI ranks, an error like the following is likely to occur:

Native API failed. Native API returns: -2 (PI_ERROR_DEVICE_NOT_AVAILABLE) -2 (PI_ERROR_DEVICE_NOT_AVAILABLE)

The reason for this is that the runtime tries to access all boards in the system when creating a context or a queue. If any of the devices has already been locked by a different process, this will fail.
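The failure condition can be illustrated with a small toy model. This is only a sketch of the reasoning above, not the runtime's actual implementation: the function name `context_creation_succeeds` and the representation of boards as integer indices are assumptions made for illustration.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Toy model (hypothetical, not the real runtime): creating a context
// touches every board that is visible to the process and fails as soon
// as one of them is already locked by another process.
bool context_creation_succeeds(const std::vector<int> &visible_boards,
                               const std::set<int> &locked_by_others) {
    return std::none_of(visible_boards.begin(), visible_boards.end(),
                        [&](int board) {
                            return locked_by_others.count(board) > 0;
                        });
}
```

In this model, a process that sees boards 0 and 1 fails when another process holds board 0, even if it only wants to use board 1. A process that sees only board 1 is unaffected by the lock on board 0.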

Workaround

Caution: This workaround should be considered experimental and is still undergoing testing.

We provide a software module and a corresponding C++ header file to circumvent this issue. Currently, this solution supports only the simple scenario of two MPI ranks per node, where each rank uses one of the boards. You can load the module as follows:

    module load fpga
    module load intel/oneapi_queue_extensions

You can then include the header file pc2/queue_extensions.hpp, which provides routines for initializing either a single queue or multiple queues for the rank-local board. The following example code demonstrates both routines:

    #include <CL/sycl.hpp>
    #include <iostream>
    #include <mpi.h>
    #include <pc2/queue_extensions.hpp>

    int main() {
      MPI_Init(NULL, NULL);

      int myrank;
      MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

      {
        auto q = sycl::ext::pc2::mpi_queue(sycl::ext::intel::fpga_selector_v);

        // Work with the queue
        std::cout << myrank << " running on device: "
                  << q.get_device().get_info<sycl::info::device::name>()
                  << "\n";
      }

      MPI_Barrier(MPI_COMM_WORLD);

      {
        auto qs = sycl::ext::pc2::mpi_queues(sycl::ext::intel::fpga_selector_v, 2);

        // Work with the queue
        for (auto &q : qs) {
          std::cout << myrank << " running on device: "
                    << q.get_device().get_info<sycl::info::device::name>()
                    << "\n";
        }
      }

      MPI_Finalize();
    }

In this case, the first rank on each node will be assigned board aclbitt_s10_pcie0, and the second rank on each node will be assigned aclbitt_s10_pcie1.
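The assignment described above can be sketched as a mapping from the node-local rank to a board name. The two helpers below are hypothetical and are not part of pc2/queue_extensions.hpp. The second one additionally assumes that the launcher places ranks on nodes in contiguous blocks, which may not hold for every job configuration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: maps a node-local rank to the board name
// described in the text (local rank 0 -> pcie0, local rank 1 -> pcie1).
std::string board_for_local_rank(int local_rank, int boards_per_node = 2) {
    if (local_rank < 0 || local_rank >= boards_per_node) {
        throw std::out_of_range("local rank has no matching board");
    }
    return "aclbitt_s10_pcie" + std::to_string(local_rank);
}

// Hypothetical helper: node-local rank under the ASSUMPTION of block
// placement (ranks 0..n-1 on the first node, n..2n-1 on the second, ...).
int local_rank_block_placement(int world_rank, int ranks_per_node) {
    return world_rank % ranks_per_node;
}
```

In a real MPI program, a placement-independent way to obtain the node-local rank is to split MPI_COMM_WORLD with MPI_Comm_split_type and MPI_COMM_TYPE_SHARED and query the rank in the resulting communicator. Whether the provided module does this internally is not stated here.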

Note that the fpga/intel/oneapi_queue_extensions module must also be loaded at runtime, because it hooks into the call to ls /sys/class/aclpci_bitt_s10_pcie that the runtime uses to determine which boards are available.