Power and Energy Measurements of FPGAs and AMD GPUs
AMD/Xilinx Devices (Xilinx Alveo U280, Xilinx Alveo U55C, Xilinx Versal VCK5000, AMD Instinct MI210)
There are several ways to read power values for AMD/Xilinx devices.
sysfs Interface
The Xilinx xrt driver and the AMD rocm driver provide power measurements via a hwmon interface. It can be queried by reading the corresponding file, which returns the current power consumption in either µW (Alveo) or W (otherwise):
# Alveo example:
# Location: /sys/bus/pci/devices/$BDF/hwmon/hwmon*/power1_input,
# where BDF is one of 0000:a1:00.1, 0000:81:00.1 or 0000:01:00.1
[tester@n2fpga01 ~]$ cat /sys/bus/pci/devices/0000\:01\:00.1/hwmon/hwmon*/power1_input
58360411
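Because the Alveo value is reported in µW, it usually has to be converted before it can be compared with the output of the other tools. A minimal sketch, assuming the file holds a single integer as in the example above; the helper name uw_to_w is hypothetical and not part of any provided tool:

```shell
# Hypothetical helper: convert a raw hwmon reading in microwatts to watts.
# Assumes each given file contains a single integer (as on the Alveo cards).
# Usage: uw_to_w FILE...
uw_to_w() {
    awk '{ printf "%.2f\n", $1 / 1000000 }' "$@"
}
```

Called on the power1_input file of card 0000:01:00.1, this would print 58.36 for the reading shown above.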
xilinx_power Utility
For a more convenient interface, we provide a dedicated tool named xilinx_power, which is available on all Xilinx FPGA and Xilinx HACC nodes:
#
# Xilinx FPGA Nodes
#
[tester@n2fpga01 ~]$ xilinx_power
0000:a1:00.1: 41.35W
0000:81:00.1: 39.22W
0000:01:00.1: 58.28W
[tester@n2fpga01 ~]$ xilinx_power -c2
0000:01:00.1: 58.22W
[tester@n2fpga01 ~]$ xilinx_power -c 0000:01:00.1
0000:01:00.1: 57.95W
#
# Xilinx HACC Nodes
#
[tester@n2hacc03 ~]$ xilinx_power
0000:e1:00.1: 21.00W
0000:c1:00.1: 21.76W
0000:a1:00.1: 36.00W
0000:81:00.1: 21.03W
0000:03:00.0: 40.00W
0000:26:00.0: 42.00W
0000:43:00.0: 40.00W
0000:63:00.0: 40.00W
You can run xilinx_power with -v (verbose) to include the mapping between device type and BDF address:
#
# Xilinx HACC Nodes
#
[tester@n2hacc03 ~]$ xilinx_power -v
0000:e1:00.1: 20.00W (Card.Versal)
0000:c1:00.1: 21.76W (Card.Alveo)
0000:a1:00.1: 36.00W (Card.Versal)
0000:81:00.1: 21.03W (Card.Alveo)
0000:03:00.0: 40.00W (Card.Instinct)
0000:26:00.0: 42.00W (Card.Instinct)
0000:43:00.0: 40.00W (Card.Instinct)
0000:63:00.0: 40.00W (Card.Instinct)
Run with --help to get a list of command-line arguments. The -c option selects a specific card by either index or BDF.
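The per-card lines are easy to post-process. The following is a rough sketch that sums the reported power, optionally restricted to one device type from the -v output. It assumes the output format shown in the samples above ("BDF: valueW", optionally followed by the card type); the function name sum_watts is hypothetical:

```shell
# Hypothetical helper: sum the power column of xilinx_power output on stdin.
# An optional pattern restricts the sum to matching lines, for example a
# card type printed by "xilinx_power -v".
# Usage: xilinx_power -v | sum_watts [PATTERN]
sum_watts() {
    awk -v pat="${1:-.}" '
        $0 ~ pat {
            v = $2              # second field, e.g. "41.35W"
            sub(/W$/, "", v)    # strip the unit
            sum += v
        }
        END { printf "%.2f\n", sum }'
}
```

For example, `xilinx_power -v | sum_watts Card.Instinct` would report the combined power of the GPUs on a HACC node.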
XRT API
You can also use the XRT API to query electrical information, including the current power consumption, as JSON. The following example uses Boost to parse that JSON data:
#include <iostream>
#include <sstream>
#include <boost/property_tree/json_parser.hpp>
#include <xrt/xrt_device.h>
[...]
// Assuming `device` is an instance of or a reference to a valid xrt::device
auto json = std::stringstream{};
json << device.get_info<xrt::info::device::electrical>();
// parse JSON into a property tree
auto props = boost::property_tree::ptree{};
boost::property_tree::read_json(json, props);
auto watts = props.get<float>("power_consumption_watts", 0.0f);
std::cout << watts << "W\n";
Built-in Utilities (xbutil, rocm-smi)
Lastly, you can also use xbutil for FPGAs and rocm-smi for GPUs to query this electrical information. The usage example and sample output below are for xrt 2.12, querying the first card. The power consumption is shown in the "Power" line of the Electrical report:
xbutil example output:
[tester@n2fpga02 ~]$ ml fpga
[tester@n2fpga02 ~]$ ml xilinx/xrt/2.12
[tester@n2fpga02 ~]$ xbutil examine
...
Devices present
[0000:a1:00.1] : xilinx_u280_xdma_201920_3 user(inst=129)
[0000:81:00.1] : xilinx_u280_xdma_201920_3 user(inst=130)
[0000:01:00.1] : xilinx_u280_xdma_201920_3 user(inst=128)
[tester@n2fpga02 ~]$ xbutil examine -d 0000:a1:00.1 --report electrical
-----------------------------------------------
1/1 [0000:a1:00.1] : xilinx_u280_xdma_201920_3
-----------------------------------------------
Electrical
Max Power : 225 Watts
Power : 33.793573 Watts
Power Warning : false
Power Rails : Voltage Current
12 Volts Auxillary : 12.199 V, 1.363 A
12 Volts PCI Express : 12.192 V, 1.408 A
3.3 Volts PCI Express : 3.286 V
3.3 Volts Auxillary : 3.292 V
Internal FPGA Vcc : 0.851 V, 5.076 A
DDR Vpp Bottom : 2.500 V
DDR Vpp Top : 2.500 V
5.5 Volts System : 5.488 V
Vcc 1.2 Volts Top : 1.212 V
Vcc 1.2 Volts Bottom : 1.204 V
1.8 Volts Top : 1.808 V
0.9 Volts Vcc : 0.901 V
12 Volts SW : 12.235 V
Mgt Vtt : 1.203 V
rocm-smi example output:
rocm-smi
========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
(DID, GUID) (Edge) (Avg) (Mem, Compute, ID)
====================================================================================================================
0 10 0x740f, 12261 41.0°C 40.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
1 11 0x740f, 42047 38.0°C 42.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
2 9 0x740f, 57300 38.0°C 40.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
3 8 0x740f, 1997 35.0°C 40.0W N/A, N/A, 0 800Mhz 1600Mhz 0% auto 300.0W 0% 0%
====================================================================================================================
=============================================== End of ROCm SMI Log ================================================
General notes:
We recommend running kernels for at least ~1 s and automatically invoking the command-line tool repeatedly, concurrently with the kernel execution, to get reasonably accurate results. Additional power consumption due to thermal effects can be observed after several minutes of load on the cards, but it is comparatively minor.
Power consumption values are only updated once per second. Querying with a higher frequency therefore does not provide any additional data.
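The recommendation above can be sketched as follows: sample the power of one card once per second while the application runs, then approximate the energy as the sum of power samples times the sampling interval (a simple rectangle rule). The function names sample_power and energy_joules and the application name my_host_app are hypothetical; the real sampling period is slightly longer than 1 s because of the tool's own runtime, so treat the result as an estimate:

```shell
# Hypothetical helper: print one power sample per second for card BDF
# until the process with the given PID has exited.
# Usage: sample_power BDF PID
sample_power() {
    while kill -0 "$2" 2>/dev/null; do
        xilinx_power -c "$1"
        sleep 1
    done
}

# Hypothetical helper: estimate energy in joules from samples on stdin.
# Expects lines as printed by xilinx_power ("BDF: 57.95W") taken at a
# fixed interval of INTERVAL_SECONDS; computes sum(P_i * interval).
# Usage: energy_joules INTERVAL_SECONDS < samples.txt
energy_joules() {
    awk -v dt="$1" '
        { v = $2; sub(/W$/, "", v); sum += v * dt }
        END { printf "%.2f\n", sum }'
}

# Example session (my_host_app is a placeholder for your application):
#   ./my_host_app & app_pid=$!
#   sample_power 0000:01:00.1 "$app_pid" > samples.txt
#   energy_joules 1 < samples.txt
```

Since the values are only updated once per second, a shorter sleep would not improve the estimate.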
Bittware 520N
The Bittware driver allows measuring power with a standalone command-line tool, bittware_power, which queries the board power via the I2C bus. It is available on all FPGA nodes with Bittware 520N cards (irrespective of the constraint).
We recommend running kernels for at least ~1 s and automatically invoking the command-line tool repeatedly, concurrently with the kernel execution, to get reasonably accurate results. Additional power consumption due to thermal effects can be observed after several minutes of load on the cards, but it is comparatively minor.
Usage example and sample output:
[tester@n2fpga18 ~]$ bittware_power
acl0: 64.86W
acl1: 65.99W
[tester@n2fpga18 ~]$ bittware_power -c0
acl0: 64.91W
[tester@n2fpga18 ~]$ bittware_power -c1
acl1: 65.99W
Run with --help to get a list of command-line arguments. For example, the -c option selects the 520N card of interest.