Power and Energy Measurements of FPGAs and AMD GPUs

AMD/Xilinx Devices (Xilinx Alveo U280, Xilinx Alveo U55C, Xilinx Versal VCK5000, AMD Instinct MI210)

Power values for AMD/Xilinx devices can be read in four ways: the sysfs interface, the xilinx_power utility, the XRT API, and the built-in utilities xbutil and rocm-smi.

sysfs Interface

The Xilinx XRT driver and the AMD ROCm driver expose power measurements via a hwmon interface. Reading the corresponding file returns the current power consumption in either µW (Alveo) or W (otherwise):

    # Alveo example:
    # Location: /sys/bus/pci/devices/$BDF/hwmon/hwmon*/power1_input,
    # where BDF is one of 0000:a1:00.1, 0000:81:00.1 or 0000:01:00.1
    [tester@n2fpga01 ~]$ cat /sys/bus/pci/devices/0000\:01\:00.1/hwmon/hwmon*/power1_input
    58360411

xilinx_power Utility

For a more convenient interface, we provide a dedicated tool named xilinx_power which is available on all Xilinx FPGA and Xilinx HACC nodes:

    #
    # Xilinx FPGA Nodes
    #
    [tester@n2fpga01 ~]$ xilinx_power
    0000:a1:00.1: 41.35W
    0000:81:00.1: 39.22W
    0000:01:00.1: 58.28W
    [tester@n2fpga01 ~]$ xilinx_power -c2
    0000:01:00.1: 58.22W
    [tester@n2fpga01 ~]$ xilinx_power -c 0000:01:00.1
    0000:01:00.1: 57.95W

    #
    # Xilinx HACC Nodes
    #
    [tester@n2hacc03 ~]$ xilinx_power
    0000:e1:00.1: 21.00W
    0000:c1:00.1: 21.76W
    0000:a1:00.1: 36.00W
    0000:81:00.1: 21.03W
    0000:03:00.0: 40.00W
    0000:26:00.0: 42.00W
    0000:43:00.0: 40.00W
    0000:63:00.0: 40.00W

You can run xilinx_power with -v (verbose) to include the mapping of device type and BDF address:

    #
    # Xilinx HACC Nodes
    #
    [tester@n2hacc03 ~]$ xilinx_power -v
    0000:e1:00.1: 20.00W (Card.Versal)
    0000:c1:00.1: 21.76W (Card.Alveo)
    0000:a1:00.1: 36.00W (Card.Versal)
    0000:81:00.1: 21.03W (Card.Alveo)
    0000:03:00.0: 40.00W (Card.Instinct)
    0000:26:00.0: 42.00W (Card.Instinct)
    0000:43:00.0: 40.00W (Card.Instinct)
    0000:63:00.0: 40.00W (Card.Instinct)

Run with --help to get a list of command line arguments. The -c option selects a specific card by either index or BDF.
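If the tool's output is collected by a script or benchmark harness, each line has to be split into device and power value. The sketch below assumes the line format seen in the sample output above (`<BDF>: <value>W`, optionally followed by a device type in verbose mode); the `PowerReading` struct and `parse_power_line` function are hypothetical names introduced for illustration, not part of xilinx_power.

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// One parsed reading: device identifier and power in watts.
struct PowerReading {
    std::string device;
    double watts;
};

// Parses a line such as "0000:01:00.1: 58.28W" or
// "0000:e1:00.1: 20.00W (Card.Versal)".
// Returns std::nullopt for lines that do not match this shape.
std::optional<PowerReading> parse_power_line(const std::string& line) {
    // The BDF itself contains colons, so split at the first colon + space.
    const auto sep = line.find(": ");
    if (sep == std::string::npos || sep == 0) {
        return std::nullopt;
    }
    try {
        std::size_t used = 0;
        const std::string rest = line.substr(sep + 2);
        const double watts = std::stod(rest, &used);
        // Require the unit suffix so that unrelated lines are rejected.
        if (used >= rest.size() || rest[used] != 'W') {
            return std::nullopt;
        }
        return PowerReading{line.substr(0, sep), watts};
    } catch (const std::exception&) {
        // std::stod throws if no number follows the separator.
        return std::nullopt;
    }
}
```

Comment lines and shell prompts in captured output contain no `: <number>W` part and are therefore skipped.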

XRT API

You can also use the XRT API to query electrical information, including the current power consumption, as JSON. The following example uses Boost to parse that JSON data:

    #include <iostream>
    #include <sstream>

    #include <boost/property_tree/json_parser.hpp>
    #include <boost/property_tree/ptree.hpp>
    #include <xrt/xrt_device.h>

    [...]

    // Assuming `device` is an instance of or a reference to a valid xrt::device
    auto json = std::stringstream{};
    json << device.get_info<xrt::info::device::electrical>();

    // Parse the JSON into a property tree
    auto props = boost::property_tree::ptree{};
    boost::property_tree::read_json(json, props);

    // Fall back to 0.0 if the key is missing
    auto watts = props.get<float>("power_consumption_watts", 0.0f);
    std::cout << watts << "W\n";

Built-in Utilities (xbutil, rocm-smi)

Lastly, you can use xbutil for FPGAs and rocm-smi for GPUs to query this electrical information. The following usage example and sample output were produced with XRT 2.12, querying the first card. Power consumption is reported in the Power field of the Electrical report (line 17 of the example output):

xbutil example output:

    [tester@n2fpga02 ~]$ ml fpga
    [tester@n2fpga02 ~]$ ml xilinx/xrt/2.12
    [tester@n2fpga02 ~]$ xbutil examine
    ...
    Devices present
      [0000:a1:00.1] : xilinx_u280_xdma_201920_3 user(inst=129)
      [0000:81:00.1] : xilinx_u280_xdma_201920_3 user(inst=130)
      [0000:01:00.1] : xilinx_u280_xdma_201920_3 user(inst=128)

    [tester@n2fpga02 ~]$ xbutil examine -d 0000:a1:00.1 --report electrical

    -----------------------------------------------
    1/1 [0000:a1:00.1] : xilinx_u280_xdma_201920_3
    -----------------------------------------------
    Electrical
      Max Power              : 225 Watts
      Power                  : 33.793573 Watts
      Power Warning          : false

      Power Rails            : Voltage   Current
      12 Volts Auxillary     : 12.199 V,  1.363 A
      12 Volts PCI Express   : 12.192 V,  1.408 A
      3.3 Volts PCI Express  :  3.286 V
      3.3 Volts Auxillary    :  3.292 V
      Internal FPGA Vcc      :  0.851 V,  5.076 A
      DDR Vpp Bottom         :  2.500 V
      DDR Vpp Top            :  2.500 V
      5.5 Volts System       :  5.488 V
      Vcc 1.2 Volts Top      :  1.212 V
      Vcc 1.2 Volts Bottom   :  1.204 V
      1.8 Volts Top          :  1.808 V
      0.9 Volts Vcc          :  0.901 V
      12 Volts SW            : 12.235 V
      Mgt Vtt                :  1.203 V

rocm-smi example output:

    rocm-smi

    ========================================= ROCm System Management Interface =========================================
    =================================================== Concise Info ===================================================
    Device  Node  IDs              Temp    Power  Partitions          SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%
                  (DID,     GUID)  (Edge)  (Avg)  (Mem, Compute, ID)
    ====================================================================================================================
    0       10    0x740f,   12261  41.0°C  40.0W  N/A, N/A, 0         800Mhz  1600Mhz  0%   auto  300.0W  0%     0%
    1       11    0x740f,   42047  38.0°C  42.0W  N/A, N/A, 0         800Mhz  1600Mhz  0%   auto  300.0W  0%     0%
    2       9     0x740f,   57300  38.0°C  40.0W  N/A, N/A, 0         800Mhz  1600Mhz  0%   auto  300.0W  0%     0%
    3       8     0x740f,   1997   35.0°C  40.0W  N/A, N/A, 0         800Mhz  1600Mhz  0%   auto  300.0W  0%     0%
    ====================================================================================================================
    =============================================== End of ROCm SMI Log ================================================

General notes:

  • We recommend running kernels for at least ~1 s and automatically invoking the command line tool repeatedly while the kernel is running in order to get reasonably accurate results. Sustained load also heats up the cards, which raises power consumption further; this effect becomes visible after multiple minutes of load but is comparatively minor.

  • Power consumption values are only updated once per second. Querying with a higher frequency therefore does not provide any additional data.
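Since the tools report instantaneous power, the energy consumed by a kernel run has to be derived from repeated samples. One common way to do this, shown here as a sketch rather than a prescribed method, is trapezoidal integration over (time, power) samples taken roughly once per second. The `Sample` struct and the `energy_joules` function are hypothetical names; how the samples are collected (for example by repeatedly calling one of the tools above) is left open.

```cpp
#include <cstddef>
#include <vector>

// One power sample: time in seconds (any monotonic origin) and power in watts.
struct Sample {
    double t_seconds;
    double watts;
};

// Estimates energy in joules by trapezoidal integration over the samples.
// Assumes the samples are sorted by time; returns 0 for fewer than two samples.
double energy_joules(const std::vector<Sample>& samples) {
    double energy = 0.0;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const double dt = samples[i].t_seconds - samples[i - 1].t_seconds;
        const double mean_power = 0.5 * (samples[i].watts + samples[i - 1].watts);
        energy += mean_power * dt;
    }
    return energy;
}
```

For example, three samples of 40 W, 50 W and 40 W taken at 0 s, 1 s and 2 s integrate to 90 J. Because values refresh only once per second, short kernels yield very few distinct samples, which is one reason to keep runs well above one second.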

Bittware 520N

The Bittware driver allows measuring power with a standalone command line tool, bittware_power, that queries the board power via the I2C bus. It is available on all FPGA nodes with Bittware 520N cards (irrespective of the constraint).

  • We recommend running kernels for at least ~1 s and automatically invoking the command line tool repeatedly while the kernel is running in order to get reasonably accurate results. Sustained load also heats up the cards, which raises power consumption further; this effect becomes visible after multiple minutes of load but is comparatively minor.

Usage example and sample output:

    [tester@n2fpga18 ~]$ bittware_power
    acl0: 64.86W
    acl1: 65.99W
    [tester@n2fpga18 ~]$ bittware_power -c0
    acl0: 64.91W
    [tester@n2fpga18 ~]$ bittware_power -c1
    acl1: 65.99W

Run with --help to get a list of command line arguments. For example, with -c the 520N card of interest can be selected.