FPGA-to-FPGA Networking

All FPGA boards in Noctua 2 can be configured with a customized FPGA-to-FPGA networking setup within each user job.

There are two interfaces to configure the requested network topology. Both support the --fpgalink syntax, in which a topology is given as a series of individual connection descriptions like --fpgalink="n00:acl0:ch0-n01:acl0:ch0". Such configurations can be generated and visualized via the FPGA-Link GUI.

  • The recommended configuration mode is to use the script.

    • It is available for both card types.

    • It supports point-to-point links and Ethernet connections.

    • After its execution, the topology setup is guaranteed to be complete.

  • The legacy configuration mode passes additional arguments along with the job allocation via Slurm.

    • It is only available for the Bittware 520N cards and only supports point-to-point links.

    • It supports a number of predefined topologies that can be selected with a shorthand notation like --fpgalink="pair".

    • In a few corner cases, it might hand over an incomplete topology due to race conditions.

Editor with GUI

The required notation to configure custom point-to-point connections can be generated with the FPGA-Link GUI.

Custom Topologies

The notation nXX:aclY:chZ describes a unique serial channel endpoint within a job allocation according to the following pattern:

  • nXX, e.g. n02, specifies the node ID within your allocation. Numbering starts with n00 for the first node, so n02 specifies the third node of your allocation. You cannot use node IDs equal to or higher than the number of nodes requested by the allocation. At allocation time, the node ID is translated to a concrete node name, e.g. fpga-0008.

  • aclY, i.e. acl0 or acl1, describes the first or second FPGA board within each node.

  • chZ, i.e. ch0, ch1, ch2 or ch3, describes one of the 4 external channel connections of a board.

By specifying one unique pair of serial channel endpoints per --fpgalink argument, an arbitrary topology can be created within a job allocation. When the task starts, the topology will be summarized and for each fpgalink, an environment variable will be exported.

The following example uses one node n00 and connects all four channels from the first FPGA board acl0 to the four channels of the second FPGA board acl1 (see figure). The custom topology example can be directly used in the FPGA-Link GUI using this link.

srun -A pc2-mitarbeiter --constraint=19.2.0_max -N 1 \
  --fpgalink="n00:acl0:ch0-n00:acl1:ch0" \
  --fpgalink="n00:acl0:ch1-n00:acl1:ch1" \
  --fpgalink="n00:acl0:ch2-n00:acl1:ch2" \
  --fpgalink="n00:acl0:ch3-n00:acl1:ch3" \
  -p fpga --pty bash
...
Summarizing most recent topology information and exporting FPGALINK variables:
Host list
  fpga-0004
Generated connections
  FPGALINK0=fpga-0004:acl0:ch0-fpga-0004:acl1:ch0
  FPGALINK1=fpga-0004:acl0:ch1-fpga-0004:acl1:ch1
  FPGALINK2=fpga-0004:acl0:ch2-fpga-0004:acl1:ch2
  FPGALINK3=fpga-0004:acl0:ch3-fpga-0004:acl1:ch3
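Writing many --fpgalink arguments by hand is error-prone, so a small helper can produce and check them. The following is a minimal sketch, not part of the cluster tooling: the function names parse_endpoint and fpgalink_args are hypothetical, and the checks assume two-digit node IDs, boards acl0 and acl1, channels ch0 to ch3, and that every endpoint may appear in at most one link.

```python
import re

# nXX:aclY:chZ, assuming two-digit node IDs, boards 0-1 and channels 0-3.
ENDPOINT = re.compile(r"n(\d{2}):acl([01]):ch([0-3])")


def parse_endpoint(text):
    """Turn 'n00:acl1:ch2' into the tuple (node, board, channel)."""
    match = ENDPOINT.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid endpoint: {text!r}")
    return tuple(int(group) for group in match.groups())


def fpgalink_args(links, num_nodes):
    """Format (endpoint, endpoint) string pairs as --fpgalink arguments."""
    used = set()
    args = []
    for left, right in links:
        for endpoint in (left, right):
            node, _board, _channel = parse_endpoint(endpoint)
            if node >= num_nodes:
                raise ValueError(f"{endpoint}: allocation has only {num_nodes} node(s)")
            if endpoint in used:
                raise ValueError(f"{endpoint}: endpoint used more than once")
            used.add(endpoint)
        args.append(f'--fpgalink="{left}-{right}"')
    return args


# The single-node example: all four channels of acl0 to the same channels of acl1.
links = [(f"n00:acl0:ch{ch}", f"n00:acl1:ch{ch}") for ch in range(4)]
example_args = fpgalink_args(links, num_nodes=1)
print(" ".join(example_args))
```

The printed arguments can be pasted into an srun or sbatch command line. The quotes are meant for the shell; when passing the arguments through an argument list instead, they would have to be dropped.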

We recommend using srun or sbatch, because this information is not shown automatically when using salloc (the configuration itself still works). When using salloc, you can still recover the information and set up your environment variables by invoking

source /opt/cray/slurm/default/etc/scripts/SAllocTopologyInfo.sh
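Once the FPGALINK variables are exported, a host program can read them to find out which boards are connected. The following sketch assumes the variables are numbered consecutively from FPGALINK0 and have the hostname:aclY:chZ-hostname:aclY:chZ format shown in the example output; read_fpgalinks is a hypothetical helper name.

```python
import os
import re

# Host names such as fpga-0004 contain a hyphen themselves, so the value
# cannot simply be split on "-"; the colons delimit the fields instead.
LINK = re.compile(r"([^:]+):acl(\d):ch(\d)-([^:]+):acl(\d):ch(\d)")


def read_fpgalinks(environ=None):
    """Collect FPGALINK0, FPGALINK1, ... into a list of endpoint pairs."""
    environ = os.environ if environ is None else environ
    links = []
    index = 0
    while f"FPGALINK{index}" in environ:
        value = environ[f"FPGALINK{index}"]
        match = LINK.fullmatch(value)
        if match is None:
            raise ValueError(f"unexpected FPGALINK{index} value: {value!r}")
        host_a, acl_a, ch_a, host_b, acl_b, ch_b = match.groups()
        links.append(((host_a, int(acl_a), int(ch_a)),
                      (host_b, int(acl_b), int(ch_b))))
        index += 1
    return links


# Stand-in for os.environ, using two values from the example output.
sample = {
    "FPGALINK0": "fpga-0004:acl0:ch0-fpga-0004:acl1:ch0",
    "FPGALINK1": "fpga-0004:acl0:ch1-fpga-0004:acl1:ch1",
}
sample_links = read_fpgalinks(sample)
for left, right in sample_links:
    print(left, "<->", right)
```

Inside a job, calling read_fpgalinks() without an argument reads the real environment.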

Predefined Topologies

As it can be tedious and error-prone to define each connection manually, we also provide a set of predefined topologies to be requested. The following table summarizes the available options.

| Topology type | Invocation | Min-Max number of nodes | Brief description |
|---|---|---|---|
| pair | --fpgalink="pair" | 1-N | Pairwise connect the 2 FPGAs within each node |
| clique | --fpgalink="clique" | 2 | All-to-all connection for 2 nodes, 4 FPGAs |
| ring | --fpgalink="ringO" | 1-N | Ring with two links per direction, acl0 down, acl1 up |
| ring | --fpgalink="ringN" | 1-N | Ring with two links per direction, acl0 down, acl1 down |
| ring | --fpgalink="ringZ" | 1-N | Ring with two links per direction, acl0 and acl1 neighbors |
| torus | --fpgalink="torus2" | 1-N | Torus with 2 FPGAs per row |
| torus | --fpgalink="torus3" | 2-N | Torus with 3 FPGAs per row |
| torus | --fpgalink="torus4" | 2-N | Torus with 4 FPGAs per row |
| torus | --fpgalink="torus5" | 3-N | Torus with 5 FPGAs per row |
| torus | --fpgalink="torus6" | 3-N | Torus with 6 FPGAs per row |

Pair topology

Within each node, all channels of one FPGA board are connected to the respective channel of the other FPGA board. No connections between nodes are made.

The following example uses three nodes n00-n02 and connects within each node all four channels from the first FPGA board acl0 to the four channels of the second FPGA board acl1 (see figure). The pair topology example can be directly used in the FPGA-Link GUI using this link.
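To make the rule concrete, the pair topology can be written out as individual connection descriptions. This is an illustrative sketch of the description above, not the code the scheduler runs; pair_links is a hypothetical name.

```python
def pair_links(num_nodes):
    """Return the connection strings of a pair topology on num_nodes nodes."""
    return [
        # Channel ch of acl0 goes to the same channel of acl1 in the same node.
        f"n{node:02d}:acl0:ch{ch}-n{node:02d}:acl1:ch{ch}"
        for node in range(num_nodes)
        for ch in range(4)
    ]


# Three nodes n00-n02, as in the example: four links per node.
pair_example = pair_links(3)
for link in pair_example:
    print(link)
```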

Clique topology

Within a pair of 2 nodes, each of the 4 FPGAs is connected to all 3 other FPGAs.

  • channel 0: to the same FPGA in the other node

  • channel 1: to the other FPGA in the same node

  • channel 2: to the other FPGA in the other node.

The following example uses two nodes n00-n01 and connects each of the four FPGA boards to the three other boards (see figure). The clique topology example can be directly used in the FPGA-Link GUI using this link.
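The three channel rules can be expanded into the six links of a clique. The sketch below assumes that both ends of a link use the same channel number, which is what results when the rules are applied from either end; clique_links is a hypothetical name.

```python
def clique_links():
    """Return the six connection strings of a clique on nodes n00 and n01."""
    links = []
    for acl in (0, 1):
        # Channel 0: the same FPGA in the other node.
        links.append(f"n00:acl{acl}:ch0-n01:acl{acl}:ch0")
        # Channel 2: the other FPGA in the other node.
        links.append(f"n00:acl{acl}:ch2-n01:acl{1 - acl}:ch2")
    for node in (0, 1):
        # Channel 1: the other FPGA in the same node.
        links.append(f"n{node:02d}:acl0:ch1-n{node:02d}:acl1:ch1")
    return links


clique_example = clique_links()
for link in clique_example:
    print(link)
```

Four FPGAs with three links each give twelve endpoints, or six links in total.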

Ring topology

This setup puts all FPGAs in a ring topology that defines for each FPGA the neighbor FPGAs "north" and "south". It connects each FPGA's channels 0 and 2 to the "north" direction and channels 1 and 3 to the "south" direction. Thus, the local perspective for each node within the topology is

Three different variants define how the FPGAs are arranged into the ring

Full example for a ringO with 4 nodes. See this example in the FPGA-Link GUI using this link.
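The ring rules can be sketched in code as well. The orderings below are one plausible reading of the short descriptions in the table of predefined topologies ("acl0 down, acl1 up" and so on), and the pairing of a south channel with the next board's north channel (ch1 to ch0, ch3 to ch2) is an assumption; check both against the FPGA-Link GUI before relying on them. ring_order and ring_links are hypothetical names.

```python
def ring_order(num_nodes, variant):
    """Order the FPGAs, given as (node, acl) tuples, along the ring."""
    acl0 = [(node, 0) for node in range(num_nodes)]
    acl1 = [(node, 1) for node in range(num_nodes)]
    if variant == "ringO":  # assumed: acl0 boards down, then acl1 boards back up
        return acl0 + acl1[::-1]
    if variant == "ringN":  # assumed: acl0 boards down, then acl1 boards down
        return acl0 + acl1
    if variant == "ringZ":  # assumed: acl0 and acl1 of a node are neighbors
        return [fpga for pair in zip(acl0, acl1) for fpga in pair]
    raise ValueError(f"unknown ring variant: {variant!r}")


def ring_links(order):
    """Connect each FPGA's south channels to the next FPGA's north channels."""
    links = []
    for position, (node, acl) in enumerate(order):
        next_node, next_acl = order[(position + 1) % len(order)]
        # Assumed pairing: south ch1 -> north ch0 and south ch3 -> north ch2.
        for south, north in ((1, 0), (3, 2)):
            links.append(
                f"n{node:02d}:acl{acl}:ch{south}-"
                f"n{next_node:02d}:acl{next_acl}:ch{north}"
            )
    return links


ring_example = ring_links(ring_order(4, "ringO"))
for link in ring_example:
    print(link)
```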

Torus topology

This setup puts all FPGAs in a torus topology that defines for each FPGA the neighbor FPGAs "north", "south", "west", "east". It connects each FPGA's

  • channel 0 to the "north" direction,

  • channel 1 to the "south" direction,

  • channel 2 to the "west" direction and

  • channel 3 to the "east" direction.

Thus, the local perspective for each node within the topology is

The torus topology can be instantiated with a configurable width, that is, the number of FPGAs that are connected in "west-east" direction. With an odd width, FPGAs in the same node can belong to consecutive rows of the torus. The number of FPGAs used is rounded down to the largest full torus for the given width, so some FPGAs of the allocation may remain unconnected. The following block illustrates 3 different torus topologies on nodes fpga-[0001-0005].
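The rounding rule and the neighbor directions can be checked with a few lines of arithmetic. The sketch assumes that FPGAs are numbered n00:acl0, n00:acl1, n01:acl0, ... and placed into the torus row by row, and that "north" means the previous row; torus_shape and torus_neighbors are hypothetical names.

```python
def torus_shape(num_nodes, width):
    """Return (rows, used_fpgas) of the largest full torus for this width."""
    fpgas = 2 * num_nodes          # two FPGA boards per node
    rows = fpgas // width          # incomplete rows are dropped
    if rows == 0:
        raise ValueError(f"width {width} needs at least {width} FPGAs")
    return rows, rows * width


def torus_neighbors(index, rows, width):
    """Return the FPGA indices north, south, west and east of FPGA index."""
    row, col = divmod(index, width)

    def at(r, c):
        # Wrap around in both dimensions.
        return (r % rows) * width + (c % width)

    return {
        "north": at(row - 1, col),  # reached via channel 0
        "south": at(row + 1, col),  # reached via channel 1
        "west": at(row, col - 1),   # reached via channel 2
        "east": at(row, col + 1),   # reached via channel 3
    }


rows, used = torus_shape(num_nodes=5, width=3)
print(f"torus3 on 5 nodes: {rows} rows, {used} of 10 FPGAs used")
print(torus_neighbors(0, *torus_shape(8, 4)[:1], 4))
```

Under these assumptions, torus3 on five nodes uses nine of the ten FPGAs, and FPGA index i corresponds to node i // 2 and board i % 2.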

Full example for a torus4 with 8 nodes. See this example in the FPGA-Link GUI using this link.

Using Serial Channels in Design Flows

Xilinx Alveo U280

Refer to our Aurora_HLS project for an example design implementing serial communication channels on the Alveo U280. Alternatively, you can use Xilinx ACCL for an Ethernet-based communication scheme.

Intel Stratix 10

All Intel Stratix 10 boards on Noctua 2 offer 4 point-to-point connections to other FPGA boards when the node is configured with a p520_max_sg280l BSP. How they are used depends on the development flow.

OneAPI

Refer to the documentation on I/O Pipes for details on how to use the external serial channels. The channel IDs are mapped as follows:

  • Port 0: Channels 0 (read) and 1 (write)

  • Port 1: Channels 2 (read) and 3 (write)

  • Port 2: Channels 4 (read) and 5 (write)

  • Port 3: Channels 6 (read) and 7 (write)
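The mapping follows a simple pattern: port p reads from channel 2p and writes to channel 2p + 1. The helper below only illustrates this arithmetic and is not part of oneAPI; io_pipe_ids is a hypothetical name.

```python
def io_pipe_ids(port):
    """Return the read and write channel IDs of external port 0 to 3."""
    if port not in range(4):
        raise ValueError(f"port must be 0..3, got {port}")
    return {"read": 2 * port, "write": 2 * port + 1}


for port in range(4):
    ids = io_pipe_ids(port)
    print(f"Port {port}: read {ids['read']}, write {ids['write']}")
```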

The pipes need to be configured for a data type of width 256 bits. This could, for example, be a std::array<int, 8>. You may use a small C++ header like the following to bundle read and write pipes into a single channel class that has both read and write operations:

OpenCL

From the OpenCL environment, these links are used as external serial channels. A status register value of 0xfff11ff1 in the diagnose output indicates an active connection.