
Status

For the current status of the FPGA partition, see the main Noctua 2 status page.

News and Updates

Below you can find the most recent software and hardware updates regarding the FPGA partition. More news and updates can be found in the FPGA news and updates archive.

2023

  • 12 December

    • Intel oneAPI 24.0.0 available

  • 08 November

    • Xilinx Vitis and Vivado 23.2 and XRT 2.16 available

  • 01 September

    • Intel oneAPI 23.2.0 available

    • Xilinx Vitis and Vivado 23.1.2 available

  • 30 May

    • Xilinx Vitis and Vivado 23.1 and XRT 2.15 available

  • 24 May

  • 24 April

    • Intel oneAPI 23.1.0 available

  • 7 Feb

    • Intel oneAPI 23.0.0 available

2022

  • 28 Sep

    • Intel oneAPI 22.3 available

  • 8 Jun

    • All Bittware 520N cards migrated from Noctua 1 to Noctua 2.

    • Small fixes to Quick Start Guides and lmod modules.

  • 2-3 Jun

    • Identified and fixed issue with constraints and node allocation.

  • 1 Jun

    • Fixed xilinx/xrt modules to allow executing xbutil validate from user space.

    • Added PLATFORM_PATH to xilinx/u280 modules to support System Compiler examples.

  • Apr

    • Setup of Noctua 2 with new Xilinx Alveo U280 cards and migration of the first batch of existing Bittware 520N cards from Noctua 1.

    • Module structure for Xilinx development flow added

    • Module structure for Intel/Bittware development flows slightly adapted (hierarchy and naming); make sure to check your existing scripts.

    • Quick Start Guides added.

  • 25 Jan

    • Intel FPGA SDK for OpenCL 21.3.0 and 21.4.0 available and can be used with new BSP (see Software Overview).

2021

  • 23 Nov

    • Migration to RHEL8 completed, all BSPs available again, interactive optical link changes available again.

  • 10 Nov

    • Temporary feature limitations after migration to RHEL8: only 20.4.0 and 19.4.0 BSPs tested and active, interactive optical link changes disabled.

  • 21 Sep

    • Added new feature to change the optical link topology during an allocated job.

    • Restructuring and extension of the documentation, including sub-pages (see structure section).

  • 29 Jul

    • Installed new Bittware BSP 20.4.0.

    • Intel FPGA SDK for OpenCL 21.1.0 and 21.2.0 available and can be used with new BSP (see compatibility matrix).

  • 7 Jan

    • Installed new Intel FPGA SDK for OpenCL 20.4.0

    • Added reference to Intel design examples


FPGA Infrastructure

Hardware Overview

The Noctua 2 FPGA infrastructure consists of 36 nodes in the fpga partition and 3 heterogeneous accelerator nodes in the hacc partition. Note that not only the dedicated hacc nodes, but all nodes with AMD/Xilinx FPGAs are accessible as part of the Heterogeneous Accelerated Compute Clusters (HACC) program; FPGA researchers worldwide can apply via a small project proposal.

A technical description of Noctua 2 and the FPGA partition can be found in the Noctua 2 paper.

 

|  | Xilinx Alveo U280 Nodes | Intel Stratix 10 Nodes | Custom Configuration Nodes | HACC Nodes |
| --- | --- | --- | --- | --- |
| Number of Nodes | 16 | 16 | 4 | 3 |
| Accelerator Cards | 3x Xilinx Alveo U280 cards | 2x Bittware 520N cards |  | 2x Xilinx Alveo U55C cards, 2x Xilinx VCK5000 Versal development cards, 4x AMD Instinct MI210 GPU cards |
| FPGA Types | Xilinx UltraScale+ FPGA (XCU280, 3 SLRs) | Intel Stratix 10 GX 2800 FPGA |  | Xilinx UltraScale+ FPGA (3 SLRs), Xilinx Versal FPGA |
| Main Memory per Card | 32 GiB DDR | 32 GiB DDR |  | - (U55C), 16 GiB DDR (VCK5000) |
| High-Bandwidth Memory per Card | 8 GiB HBM2 |  |  | 8 GiB HBM2 (U55C), - (VCK5000) |
| Network Interfaces per Card | 2x QSFP28 (100G) links | 4x QSFP+ (40G) serial point-to-point links |  | 2x QSFP28 (100G) links (U55C), 2x QSFP28 (100G) links (VCK5000) |
| Topology of System | Topology Alveo Node | Topology Stratix Node |  |  |
| CPUs | 2x AMD Milan 7713, 2.0 GHz, each with 64 cores | 2x AMD Milan 7713, 2.0 GHz, each with 64 cores | 2x AMD Milan 7713, 2.0 GHz, each with 64 cores | 2x AMD Milan 7V13, 2.45 GHz, each with 64 cores |
| Main Memory | 512 GiB | 512 GiB | 512 GiB | 512 GiB |
| Storage | 480 GB local SSD in /tmp/, full access to the Noctua 2 shared file systems | 480 GB local SSD in /tmp/, full access to the Noctua 2 shared file systems | 480 GB local SSD in /tmp/, full access to the Noctua 2 shared file systems | full access to the Noctua 2 shared file systems |

Application-specific interconnect: the FPGA cards are connected via a CALIENT S320 Optical Circuit Switch (OCS), providing configurable point-to-point connections to any other FPGA or to a 100G Ethernet switch; for more details see FPGA-to-FPGA Networking.

The software environment is set up using modules, see section Software Overview. For hardware execution, FPGA nodes with the correct configuration and driver version (so-called board support package, BSP) need to be allocated, see section System Access to FPGA Partition.

Software Overview

The software environment is set up using modules. Depending on the user requirements, different development flows for the FPGA cards are supported:

  • Xilinx Alveo U280

    • Vitis Design Flow (recommended)

    • Vivado Design Flow

  • Intel Stratix 10

    • oneAPI (recommended)

    • OpenCL (recommended for projects with an existing OpenCL code base or for usage of the serial channels)

    • DSP Builder

The development (including emulation to check functional correctness, report generation to get indications of the expected performance, and synthesis with bitstream generation) can be done on any Noctua 2 node. Just load the corresponding modules for the target FPGA platform and development flow.
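
For example, a minimal environment setup could look like the following sketch. The module names are illustrative assumptions based on the modules mentioned on this page (xilinx/xrt, xilinx/u280); consult the Table of FPGA Software and Firmware Stacks for the exact, current names and versions.

# Xilinx Vitis flow for the Alveo U280 (module names illustrative, check `module avail`)
module load fpga
module load xilinx/xrt
module load xilinx/u280

# Intel oneAPI flow for the Bittware 520N (module names illustrative)
module load fpga
module load intel/oneapi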

To execute designs on actual FPGAs, the same modules are required, and additionally an FPGA node needs to be allocated with a fitting constraint to get FPGAs with the expected configuration and drivers.

The Table of FPGA Software and Firmware Stacks provides an overview of the interplay of tools, modules and constraints for the three recommended development flows. Additionally, we have created Quick Start Guides to walk you through the six steps with examples using the latest tools.

System Access to FPGA Partition

To use Noctua 2 nodes with FPGAs, you need to select the FPGA partition in your Slurm command and provide a constraint to specify the configuration of the FPGAs (shell, driver, board support package (BSP)) that your designs have been built for.

For Xilinx Alveo U280 cards you can use

srun --partition=fpga --constraint=xilinx_u280_xrt2.12 -A [YOUR_PROJECT_ACCOUNT] -t 2:00:00 --pty bash

For Bittware 520N cards with Intel Stratix 10 FPGAs you can use

srun --partition=fpga --constraint=bittware_520n_20.4.0_max -A [YOUR_PROJECT_ACCOUNT] -t 2:00:00 --pty bash

Constraints can be used together with srun, sbatch and salloc; however, under some conditions salloc will fail, see the expansion box below for details. We recommend always using srun or sbatch.
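
Constraints work the same way in batch scripts. A minimal sbatch sketch could look as follows; the project account, runtime, module names, and the host binary are placeholders to adapt:

#!/bin/bash
#SBATCH --partition=fpga
#SBATCH --constraint=xilinx_u280_xrt2.12
#SBATCH -A YOUR_PROJECT_ACCOUNT
#SBATCH -t 2:00:00

module load fpga xilinx/xrt    # illustrative module names, see Software Overview
./host vadd.xclbin             # hypothetical host binary and bitstream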

 Issues with salloc

A problem occurs when one of the nodes to be allocated is configured for a different constraint and is currently in use. In this case, salloc fails with the following error message:

salloc: error: Job submit/allocate failed: Requested node configuration is not available

Workaround: use an allocation without requesting specific node names.

A problem also occurs when one of the nodes to be allocated is configured for a different constraint and is currently free. The allocation succeeds while the nodes are still being reconfigured; programs or scripts started during this time will fail, and the actual errors encountered differ.

A list of available matching versions of the BSPs and SDKs can be found in the Software Overview Details.

Serial Channels between FPGAs

When configured with a suitable BSP, all FPGA boards offer 4 serial point-to-point connections to other FPGA boards. The FPGA Serial Channels have their own documentation page with various examples and a graphical input tool (see figure below).

Clique topology
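
For multi-FPGA experiments over the serial channels, an interactive allocation of several Stratix 10 nodes can be a starting point. The node count and runtime below are examples, and the link topology itself is configured as described on the FPGA Serial Channels page:

srun --partition=fpga --constraint=bittware_520n_20.4.0_max --nodes=2 -A [YOUR_PROJECT_ACCOUNT] -t 2:00:00 --pty bash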

Sanity Checks and Troubleshooting

Contact and Support

For problems with the FPGA infrastructure (software and/or hardware), use our main support e-mail address pc2-support@uni-paderborn.de.

In order to help you as quickly as possible, please follow these guidelines:

  • Use [Noctua2-FPGA] as a prefix in your email subject line.

  • Describe where the problem occurred and what you expected to happen.

  • If possible, describe how we can reproduce the error in a systematic manner.

  • If you attempted to fix or troubleshoot the problem yourself, provide the steps you already took and the intermediate results.

For general questions, acceleration support, or project ideas regarding FPGAs, please reach out to the FPGA domain experts.

Getting Started

As a beginner with FPGAs, it can take a lot of time to learn the different concepts needed to set up the environment, compile code, and actually run it on an FPGA. The good news is that most of this knowledge can be boiled down to 6 steps that are very similar across the different development flows. We have created Quick Start Guides to walk you through the six steps with examples using the latest tools.

 Prerequisites for the guides

The general structure of the quick start guides is the following:

1. Get Example Code

All development flows are shipped with design examples. In this step you learn how to get the code and how the examples are structured.
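
For instance, in the Vitis flow the official AMD/Xilinx example designs can be cloned directly from GitHub; the hello_world subdirectory is one of the shipped examples:

git clone https://github.com/Xilinx/Vitis_Accel_Examples.git
cd Vitis_Accel_Examples/hello_world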

2. Setup the Environment

All development flows are set up via modules. In this step you learn which modules are required to establish a working environment.

3. Build and Execute in Emulation

Most of the FPGA code development is done in emulation (running the code on the CPU), because the actual FPGA code generation is very time-consuming. In this step you learn how to build and run the code in emulation.
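
As an illustrative sketch of a software-emulation build in the Vitis flow (the platform, kernel, and host file names are assumptions, not the ones used in the guides):

# compile and link the kernel for software emulation
v++ -c -t sw_emu --platform xilinx_u280_gen3x16_xdma_1_202211_1 -k vadd -o vadd.xo vadd.cpp
v++ -l -t sw_emu --platform xilinx_u280_gen3x16_xdma_1_202211_1 -o vadd.xclbin vadd.xo
# generate the emulation configuration and run the host program against the emulated device
emconfigutil --platform xilinx_u280_gen3x16_xdma_1_202211_1
XCL_EMULATION_MODE=sw_emu ./host vadd.xclbin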

4. Create and Inspect Reports

Reports can be generated quickly during code compilation. Analyzing reports regularly prior to actual hardware builds is crucial for any efficient FPGA development process. In this step you learn how to generate the reports and which indicators are good starting points to estimate the efficiency on an actual FPGA.
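
For example, in the Intel OpenCL flow an HTML report can be generated without a full hardware build (the kernel file name is a placeholder); the report is then found under kernel/reports/report.html:

aoc -rtl kernel.cl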

5. Build the Hardware Design

The hardware build step (so-called hardware synthesis) can take a lot of time and compute resources. In this step you learn how to create batch scripts to submit the synthesis job to the Slurm workload manager.
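
A synthesis batch script could be sketched as follows; partition, resources, module names, and the board and kernel names are assumptions to adapt to your project:

#!/bin/bash
#SBATCH -p normal        # synthesis can run on any Noctua 2 node
#SBATCH -t 24:00:00      # hardware builds often run for many hours
#SBATCH --cpus-per-task=32
#SBATCH --mem=64G
#SBATCH -A YOUR_PROJECT_ACCOUNT

module load fpga         # plus the flow-specific modules from step 2 (illustrative)
aoc -board=p520_max_sg280l kernel.cl -o kernel.aocx    # Intel OpenCL flow; board and kernel names are illustrative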

6. Execute Design on FPGA Hardware

After the hardware synthesis, we can allocate an FPGA node for execution. In this step you learn how to allocate a correctly configured FPGA node that matches the software and hardware requirements.
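
Combined with a constraint from the System Access section, the hardware run itself can then be a one-liner; the host binary and bitstream names below are hypothetical:

srun --partition=fpga --constraint=bittware_520n_20.4.0_max -A [YOUR_PROJECT_ACCOUNT] -t 0:30:00 ./host kernel.aocx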

Available FPGA Libraries and Applications

We and others have developed several ready-to-use libraries and applications that use FPGAs to accelerate computation. Please use the links below to find examples and documentation. You can contact us if you need guidance on accelerating your target code with our FPGA libraries.

| Application | Toolchain | Type of Support |
| --- | --- | --- |
| CP2K for DFT with FPGA Acceleration of the Submatrix Method | OpenCL | Ready-to-use module files and bitstreams deployed on Noctua 2 |
| CP2K for DFT with FPGA Acceleration of 3D FFTs | OpenCL | FPGA support in the CP2K main repository + extra repository with FPGA designs fitting the Bittware 520N cards in Noctua 2 |
| HPCC FPGA: HPC Challenge Benchmark Suite for FPGAs | OpenCL | Repository with a benchmark suite targeting FPGAs from Intel (including the Bittware 520N cards with Stratix 10 FPGAs in Noctua 2) and Xilinx |
| Cannon Matrix Multiplication on FPGAs | OpenCL | Repository with an implementation of Cannon matrix multiplication as a building block for GEMM on FPGAs, fitting the Bittware 520N cards in Noctua 2 |
| StencilStream Stencil Simulation Library for FPGAs | oneAPI | Repository with a library and examples built with oneAPI for stencil simulations on FPGAs, fitting the Bittware 520N cards in Noctua 2 |
