PDU Types for AI Server Racks: Basic, Metered, Switched, ATS

The Power Distribution Unit is the most boring thing in the rack and the piece you regret first. A 4-GPU AI server pulls 1.8–2.4 kW sustained; an 8-GPU node is 3.5–4.5 kW. At those numbers the PDU is no longer "the thing the server plugs into" — it is the gauge, the breaker map, the remote reset button, and on a bad day the only thing that tells you which outlet is melting.

This article walks the PDU ladder from the dumb power strip up to the dual-source intelligent unit, with honest pricing, what each tier buys you, and where the cost-vs-pain curve breaks for AI compute. Brand-agnostic.

The tiers, in one table

Tier | What it does | Per-outlet visibility | Remote control | EU price (24-outlet, 0U)
Basic | Power strip in rack form | None | No | €80–150
Metered (input) | Aggregate current/voltage display | None | No | €350–500
Outlet-metered | Per-outlet current/voltage/power | Yes | No | €700–1200
Switched | Aggregate meter + remote outlet on/off | None | Yes | €900–1400
Switched + outlet-metered | The full picture | Yes | Yes | €1500–3000
ATS variant (dual-input) | Failover between two sources | As the base tier | As the base tier | +€400–800 on top

The right answer for any AI rack with two or more compute nodes is switched + outlet-metered. We will defend that below.

Basic PDU — the dumb strip

A rack-mountable power strip. Live, neutral, ground, a row of outlets, sometimes a breaker per bank. No display, no network port, no logic. Single input (C20 inlet or a hard-wired plug), six to twenty-four outlets in a mix of C13 and C19, a power LED if you are lucky.

What you do not get: any visibility into how much current any individual outlet — or the unit itself — is drawing. No remote power-cycle. No monitoring integration. If the server hangs and you are 200 km away, you call someone with a key to the rack.

Basic is fine for a lab bench under your desk, or one development node in a small office where you can see the lights and reach the plug. It is the wrong answer for any production AI server, any rack you cannot reach in 30 minutes, and any rack with more than two compute nodes. The gap from basic (€100) to metered (€400) is small relative to one GPU; buy metered at minimum.

Metered PDU — aggregate visibility, no per-outlet detail

A metered PDU adds a local display showing input current, voltage, and usually power (kW) and energy (kWh). On 3-phase units it shows per-phase values. Better units expose the same data over SNMP, Modbus TCP, or HTTP.

What you still do not get: per-outlet current. You see the rack pulls 18 A. You do not see whether outlet 4 is 9 A and outlet 5 is 0.5 A, or whether both are 4.75 A. For a heterogeneous AI rack that matters. You also get no outlet control — outlets are always on while the unit has power.

This tier is honest value for a single-tenant rack with one or two GPU servers and a switch, where load is predictable and you only need "are we close to tripping the breaker." Skip it and go outlet-metered if you have more than two GPU nodes, or any chargeback obligation, or a load profile that varies sharply between training and inference on the same outlet.

Outlet-metered PDU — per-outlet measurement, the AI baseline

Per-outlet current, voltage, power, and energy. A small CT or shunt on every outlet, polled many times per second. Same SNMP / Modbus / HTTP exposure as metered, with per-outlet OIDs.

For AI compute this is the first tier that gives you the data you actually need. You can see which GPU server is drawing what. When the team adds a training run and rack input creeps from 24 A to 30 A, you know immediately it was the 8× 5090 node. Per-outlet power factor surfaces an aging or undersized PSU before it fails. Per-outlet kWh accumulators give honest chargeback. Per-outlet alarms (e.g. 8 A on a C13, 80% of its 10 A rating) catch creep before it trips the upstream breaker.
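
Turning per-outlet kWh accumulators into chargeback takes the difference between two counter reads per billing period. The Python below is a minimal sketch. It assumes the counters are cumulative and only go backwards when reset. The `OutletReading` type, the outlet-to-tenant map and the €0.25/kWh tariff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OutletReading:
    outlet: int
    kwh: float  # cumulative energy counter as reported by the PDU


def energy_used(start_kwh: float, end_kwh: float) -> float:
    """kWh consumed between two reads of one cumulative counter.

    Assumption: a counter that went backwards was reset during the period,
    so only the energy accumulated since the reset is counted (anything
    consumed before the reset is lost).
    """
    return end_kwh - start_kwh if end_kwh >= start_kwh else end_kwh


def chargeback(start, end, owners, price_per_kwh):
    """Aggregate per-outlet energy into {tenant: (kwh, cost)}."""
    start_by_outlet = {r.outlet: r.kwh for r in start}
    totals = {}
    for r in end:
        if r.outlet not in start_by_outlet:
            continue  # outlet not read at period start; skip rather than guess
        used = energy_used(start_by_outlet[r.outlet], r.kwh)
        tenant = owners.get(r.outlet, "unassigned")
        totals[tenant] = totals.get(tenant, 0.0) + used
    return {t: (round(k, 3), round(k * price_per_kwh, 2)) for t, k in totals.items()}


# Hypothetical month: outlets 1 and 2 feed one dual-PSU node, outlet 3's counter was reset.
start = [OutletReading(1, 1200.0), OutletReading(2, 1180.0), OutletReading(3, 500.0)]
end = [OutletReading(1, 2640.0), OutletReading(2, 2620.0), OutletReading(3, 40.0)]
bill = chargeback(start, end, {1: "team-a", 2: "team-a", 3: "team-b"}, price_per_kwh=0.25)
print(bill)  # {'team-a': (2880.0, 720.0), 'team-b': (40.0, 10.0)}
```

Handling the reset case explicitly matters. A naive `end - start` would credit team-b with −460 kWh.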

What you still cannot do: power-cycle a hung server remotely. The outlet is metered but always energised.

Outlet-metered units sit at €700–1200. That is between metered (€400) and switched + outlet-metered (€1500+). If you have decided you will never need remote outlet control, this is a defensible stop. In practice, most AI racks want both.

Switched PDU — remote outlet control

A switched PDU adds a relay (mechanical or solid-state) behind each outlet. You can turn an outlet on, off, or cycle it (off for N seconds, then on) over HTTP, SNMP set, or vendor app. Aggregate metering is the same as a metered PDU; there is no per-outlet measurement.

The killer feature is reboot-without-truck-roll. The number of times an AI training job has wedged a server hard enough that IPMI is also unresponsive is not zero. The switched PDU is the last way to power-cycle the box without sending a human. Secondary uses: programmable startup sequencing after a UPS event (so all servers do not boot at the same instant), and scheduled on/off for lab gear.
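
Startup sequencing reduces to a list of per-outlet power-on delays. Below is a minimal sketch. It assumes network gear comes up first and that both outlets of each dual-PSU node are energised together. The stage layout and the 5 s / 30 s spacing are illustrative values, not vendor defaults.

```python
def startup_schedule(stages, stagger_s=5.0, stage_gap_s=30.0):
    """Return [(delay_s, outlets), ...] for a staggered power-on after a UPS event.

    stages: ordered list of stages; each stage is a list of outlet tuples
    that are switched on together (e.g. both PSU cords of one node).
    Items within a stage are spaced stagger_s apart; the next stage starts
    stage_gap_s after the last item of the previous one.
    """
    schedule = []
    t = 0.0
    for stage in stages:
        if not stage:
            continue
        for i, outlets in enumerate(stage):
            schedule.append((t + i * stagger_s, outlets))
        t += (len(stage) - 1) * stagger_s + stage_gap_s
    return schedule


# Outlet numbers follow the 3-phase wiring example later in this article.
network = [(9,), (10,), (11,)]               # ToR switch, mgmt switch, KVM
compute = [(1, 2), (3, 4), (5, 6), (7, 8)]   # nodes A-D, both PSUs together
for delay, outlets in startup_schedule([network, compute]):
    print(f"t+{delay:>4.0f}s  outlets {outlets}")
```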

What you do not get without the outlet-metered upgrade: any way to verify the outlet actually has load on it after you turned it on. The PDU tells you the relay closed; it does not tell you the server is drawing current. For a confident remote reboot you really want both states.

Switched + outlet-metered — the AI-ops sweet spot

Per-outlet measurement and per-outlet remote control in one unit. This is the right answer for any production AI rack:

  1. Per-outlet draw in your monitoring stack alongside GPU power, room temperature, PSU rails. Anomalies surface in one dashboard.
  2. Closed-loop remote reboot. Set relay off, watch current go to zero, wait five seconds, set relay on, watch current return to expected idle.
  3. Early thermal warning. If outlet 4 starts pulling 2.6 kW when it normally draws 1.7 kW under the same workload, the alarm fires before the room AC gives up.
  4. Real chargeback with NTP-timestamped kWh per outlet.
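
The closed-loop reboot in point 2 is a short state machine, sketched in Python below. It runs against a hypothetical `SwitchedMeteredPDU` interface. In practice the two methods would wrap SNMP sets/gets or the vendor's REST API. The 0.1 A "off" ceiling and 1.0 A idle floor are placeholders to tune per node.

```python
import time
from typing import Callable, Protocol


class SwitchedMeteredPDU(Protocol):
    """Hypothetical interface; real units expose this via SNMP or REST."""

    def set_outlet(self, outlet: int, on: bool) -> None: ...
    def outlet_current(self, outlet: int) -> float: ...  # amps


def wait_for(predicate: Callable[[], bool], timeout_s: float, poll_s: float,
             sleep: Callable[[float], None]) -> bool:
    """Poll predicate until it is true or roughly timeout_s has elapsed."""
    for _ in range(max(1, int(timeout_s / poll_s))):
        if predicate():
            return True
        sleep(poll_s)
    return predicate()


def power_cycle(pdu: SwitchedMeteredPDU, outlet: int, off_max_a: float = 0.1,
                idle_min_a: float = 1.0, off_hold_s: float = 5.0,
                timeout_s: float = 30.0, poll_s: float = 1.0,
                sleep: Callable[[float], None] = time.sleep) -> str:
    pdu.set_outlet(outlet, False)
    # A relay reporting "open" is not enough: confirm the current actually stopped.
    if not wait_for(lambda: pdu.outlet_current(outlet) <= off_max_a, timeout_s, poll_s, sleep):
        return "relay-stuck-closed"
    sleep(off_hold_s)  # hold off so the node fully de-energises (~5 s, per the list above)
    pdu.set_outlet(outlet, True)
    # Confirm the server is drawing at least its expected idle current again.
    if not wait_for(lambda: pdu.outlet_current(outlet) >= idle_min_a, timeout_s, poll_s, sleep):
        return "no-load-after-on"
    return "ok"


class FakePDU:
    """In-memory stand-in for testing; stuck=True simulates a welded relay."""

    def __init__(self, idle_a: float, stuck: bool = False):
        self.energised, self.idle_a, self.stuck = True, idle_a, stuck

    def set_outlet(self, outlet: int, on: bool) -> None:
        if not self.stuck:
            self.energised = on

    def outlet_current(self, outlet: int) -> float:
        return self.idle_a if self.energised else 0.0


no_sleep = lambda s: None
print(power_cycle(FakePDU(idle_a=3.0), outlet=4, sleep=no_sleep))              # ok
print(power_cycle(FakePDU(idle_a=3.0, stuck=True), outlet=4, sleep=no_sleep))  # relay-stuck-closed
```

Injecting `sleep` keeps the logic testable without real delays. The two failure strings map directly onto the welded-relay and dead-node cases discussed elsewhere in this article.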

Cost: €1500–3000 for a 24-outlet 0U vertical, roughly the price of a single RTX 5090. On a rack hosting €30k+ of GPUs, this is not the line item to economise on.

If you cannot physically reach the rack — colocation, remote office, restricted building — you NEED switched. One avoided after-hours truck roll covers the difference.

ATS PDU — failover between two power sources

An ATS (Automatic Transfer Switch) PDU has two input cords from two different sources and feeds a single bank of outlets from whichever input is healthy. When the primary sags or drops, the ATS transfers to the secondary within 8–16 ms. Downstream gear sees only a brief disturbance.

ATS PDUs are for single-corded equipment that needs source redundancy. Modern servers with dual PSUs do not need an ATS — you plug each PSU into a separate PDU on a separate source and the server handles failover natively. ATS is what you reach for when the rack has single-corded gear — most network switches, KVMs, smaller storage appliances — and you cannot tolerate the source going down.

ATS does not buy you battery backup; if both sources fail, the ATS fails with them. You still want a UPS upstream of at least one source. It also does not protect against downstream shorts — a bad outlet trips the breaker regardless.

Worth it for a mixed rack with some dual-PSU servers and some single-PSU appliances. Overkill for a rack where everything is already dual-PSU (just use two independent PDUs) or for a lab where a brief outage is annoying but not catastrophic.

3-phase PDU distribution

Once a rack crosses about 5 kW sustained, single-phase 16 A (3.7 kW) is gone and 32 A (7.4 kW) feels tight. The standard answer is a 3-phase PDU on an IEC 60309 32 A 3P+N+E plug (red, 400 V line-to-line / 230 V line-to-neutral).

P_max = √3 × V_LL × I × PF
      = 1.732 × 400 V × 32 A × 1.0
      ≈ 22.2 kW  (apparent power ≈ 22.2 kVA; the two are equal at unity PF)

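Planning practice derates continuous load to 80% of the breaker rating. The rule comes from the US NEC; EU wiring rules do not impose a flat 80%, but it is common data-centre planning practice here too. Usable sustained power is therefore closer to 17–18 kW. That is enough for four 4-GPU nodes plus networking, comfortably.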
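
The same arithmetic in code, as a quick planning check. This is a minimal sketch. The 0.8 factor is the planning margin described above, and the 230 V / 400 V figures assume a standard EU supply.

```python
import math


def single_phase_kw(v_ln: float, amps: float, pf: float = 1.0) -> float:
    return v_ln * amps * pf / 1000


def three_phase_kw(v_ll: float, amps: float, pf: float = 1.0) -> float:
    # Balanced load: P = sqrt(3) * V_LL * I * PF, identical to 3 * V_LN * I * PF
    return math.sqrt(3) * v_ll * amps * pf / 1000


def sustained_kw(peak_kw: float, continuous_fraction: float = 0.8) -> float:
    """Plan sustained load at a fraction of the breaker-limited capacity."""
    return peak_kw * continuous_fraction


for label, kw in [
    ("1-phase 16 A", single_phase_kw(230, 16)),
    ("1-phase 32 A", single_phase_kw(230, 32)),
    ("3-phase 32 A", three_phase_kw(400, 32)),
]:
    print(f"{label}: {kw:5.2f} kW peak, {sustained_kw(kw):5.2f} kW sustained")
```

The 3-phase line prints about 22.17 kW peak and 17.74 kW sustained. A single-phase 32 A circuit plans out at 5.89 kW, which is why it feels tight once a rack passes 5 kW.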

Internally the three phases (L1, L2, L3) are split across outlets in one of two ways. Phase-per-group (outlets 1–8 on L1, 9–16 on L2, 17–24 on L3) is simple but risks imbalance if all your high-load gear lands on the same group. Phase-striped (outlets 1/4/7 on L1, 2/5/8 on L2, 3/6/9 on L3) puts adjacent outlets on different phases, so contiguously-mounted servers naturally distribute across phases. Most modern intelligent PDUs stripe by default.
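
Each of the two outlet-to-phase layouts can be written as a one-line mapping. Written that way, the difference is easy to see for a run of contiguously mounted, single-corded servers. The sketch assumes a 24-outlet, 3-phase unit numbered from 1. Real PDUs document their own mapping, so check the vendor's outlet map before relying on either formula.

```python
from collections import Counter


def phase_striped(outlet: int, phases: int = 3) -> str:
    """Outlets 1/4/7/... on L1, 2/5/8/... on L2, 3/6/9/... on L3."""
    return f"L{(outlet - 1) % phases + 1}"


def phase_per_group(outlet: int, outlets_total: int = 24, phases: int = 3) -> str:
    """Outlets 1-8 on L1, 9-16 on L2, 17-24 on L3 (for 24 outlets, 3 phases)."""
    group_size = outlets_total // phases
    return f"L{(outlet - 1) // group_size + 1}"


# Four single-corded servers plugged into the first four outlets.
servers = [1, 2, 3, 4]
print("striped:  ", Counter(phase_striped(o) for o in servers))
print("per-group:", Counter(phase_per_group(o) for o in servers))
```

Striping spreads the four servers over all three phases (two on L1, one each on L2 and L3). Per-group puts all four on L1.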

A worked wiring example for a 3-phase 32 A PDU feeding a 4-node AI rack:

Input: IEC 60309 32A 3P+N+E (red, 400/230V)
       L1 — 32A   L2 — 32A   L3 — 32A   N   PE

Internal: per-phase 32A hydraulic-magnetic breaker
          per-outlet relay + CT (switched + outlet-metered)
          phase-striped outlets

Outlets (24 total, 12× C19 + 12× C13, 0U vertical):

  1  (L1, C19)  Node A PSU-1   (8× 5090,  ~4.0 kW)
  2  (L2, C19)  Node A PSU-2   (same node, split delivery)
  3  (L3, C19)  Node B PSU-1   (4× Pro 6000, ~2.2 kW)
  4  (L1, C19)  Node B PSU-2
  5  (L2, C19)  Node C PSU-1   (4× 5090, ~2.0 kW)
  6  (L3, C19)  Node C PSU-2
  7  (L1, C19)  Node D PSU-1   (4× 5090, ~2.0 kW)
  8  (L2, C19)  Node D PSU-2
  9  (L3, C13)  ToR switch (~80 W)
  10 (L1, C13)  Mgmt switch (~30 W)
  11 (L2, C13)  KVM-over-IP (~20 W)
  12 (L3, C13)  Head/jump host (~150 W)
  13–24         Reserve / aux

Per-phase load (all nodes at ~80% sustained, each node's draw split evenly across its two PSUs, 230 V line-to-neutral):
  L1 ≈ 4.1 kW / 18.0 A
  L2 ≈ 4.0 kW / 17.5 A
  L3 ≈ 2.3 kW / 10.1 A

Two things matter here. First, phase striping does most of the balancing automatically. Second, the dual-PSU split-delivery nodes intentionally land their two cords on different phases. For any AI server with dual PSUs in split-delivery configuration, plug each PSU into a different phase — halves per-phase current, improves headroom. ("Split delivery" is our product copy term: two PSUs each carrying half the load, not redundant 1+1. See W04 for the distinction.)

The L3 imbalance in the example (about 7–8 A below L1) is on the edge of what utilities accept gracefully. P03 covers phase balancing across racks at the building level.
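
The per-phase figures in the worked example can be reproduced by summing outlet draws by phase. This is a minimal sketch under three assumptions: each dual-PSU node splits its listed draw evenly across its two cords, the power factor is unity, and the supply is 230 V line-to-neutral. The dictionary simply transcribes the outlet table above.

```python
V_LN = 230.0  # line-to-neutral volts, EU 400/230 V supply

# outlet: (phase, watts), transcribed from the wiring example
OUTLETS = {
    1: ("L1", 2000), 2: ("L2", 2000),   # Node A, ~4.0 kW split over two PSUs
    3: ("L3", 1100), 4: ("L1", 1100),   # Node B, ~2.2 kW
    5: ("L2", 1000), 6: ("L3", 1000),   # Node C, ~2.0 kW
    7: ("L1", 1000), 8: ("L2", 1000),   # Node D, ~2.0 kW
    9: ("L3", 80), 10: ("L1", 30),      # ToR switch, mgmt switch
    11: ("L2", 20), 12: ("L3", 150),    # KVM-over-IP, head/jump host
}


def per_phase(outlets: dict, v_ln: float = V_LN) -> dict:
    """Return {phase: (watts, amps)}, assuming unity power factor."""
    watts = {}
    for phase, w in outlets.values():
        watts[phase] = watts.get(phase, 0) + w
    return {p: (w, w / v_ln) for p, w in sorted(watts.items())}


def imbalance_a(loads: dict) -> float:
    """Spread between the most and least loaded phase, in amps."""
    amps = [a for _, a in loads.values()]
    return max(amps) - min(amps)


loads = per_phase(OUTLETS)
for phase, (w, a) in loads.items():
    print(f"{phase}: {w / 1000:.2f} kW / {a:.1f} A")
print(f"imbalance: {imbalance_a(loads):.1f} A")
```

Under these assumptions it prints 4.13 / 4.02 / 2.33 kW (18.0 / 17.5 / 10.1 A) with a spread of about 7.8 A. All three phases stay under the 25 A per-phase target.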

C13, C14, C19, C20 — outlet types

Connector | Rating | Where you see it
C13 | 10 A / 250 V | Servers ≤1.5 kW, switches, KVMs
C14 | 10 A / 250 V | Inlet on the device; mates with a C13
C19 | 16 A / 250 V | High-current AI nodes, big PSUs
C20 | 16 A / 250 V | Inlet on the device; mates with a C19

Odd-numbered parts are female: the PDU outlet, or the cord end that goes into the device. Even-numbered parts are male: the device inlet, or the cord end that goes into the PDU.

The relevant point: anything above about 1.5 kW continuous wants C19/C20, not C13/C14. A 4-GPU AI node with a 2000–2400 W PSU can draw 8.7–10.4 A at 230 V. That is at or past the C13's 10 A rating and well beyond a sensible 80% continuous margin. Feeding it via C13 invites a melted connector four months into deployment, once heat cycling has pushed the contact resistance up.
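
The connector rule of thumb fits in a small helper. The sketch assumes 230 V, the IEC ratings from the table above, and the same 80% continuous margin used elsewhere in this article. The 1.5 kW cut-over is this section's rule of thumb. Pass the per-cord draw: a split-delivery node puts roughly half its load on each cord.

```python
C13_RATED_A = 10.0  # IEC 60320 C13/C14
C19_RATED_A = 16.0  # IEC 60320 C19/C20


def pick_connector(cord_watts: float, v: float = 230.0,
                   c13_max_w: float = 1500.0, margin: float = 0.8) -> str:
    """Choose the outlet type for one cord's continuous draw."""
    amps = cord_watts / v
    if cord_watts <= c13_max_w and amps <= C13_RATED_A * margin:
        return "C13/C14"
    if amps <= C19_RATED_A * margin:
        return "C19/C20"
    raise ValueError(f"{cord_watts:.0f} W ({amps:.1f} A) exceeds a C19 at {margin:.0%}; "
                     "split the load across cords or use a higher-rated feed")


for w in (150, 1400, 2200, 2400):
    print(w, "W ->", pick_connector(w))
```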

A correctly-specified AI rack PDU has plenty of C19 outlets, not just two or three. A good layout for a 24-outlet 0U vertical is 12× C19 + 12× C13. Some vendors ship "combo" outlets that accept C14 or C20 in the same physical receptacle — fine, just costs more. Order matching C19-to-C20 cords (0.9–1.2 m for 0U PDUs mounted at the back of the rack). Buy three or four spares — they go missing.

PDU current rating vs circuit rating

Utility / panel breaker:     32 A 3-phase (C-curve)
Continuous load (80% rule):  25.6 A per phase usable
PDU rating:                  32 A per phase
Per-phase outlet sum target: ≤ 25 A

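The PDU will happily carry 32 A per phase if you let it, and a 32 A breaker is not designed to trip at a steady 30 A. But running that close leaves no headroom. Thermal-magnetic breakers integrate overload over minutes, so a training job that ramps every node at once, or a hot panel that derates the breaker, can push it over the edge. Plan to 80% and you keep margin for inrush and brief spikes. The PDU's own per-phase breakers are the second layer.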
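
Before plugging in new gear, run a what-if check. Compare the phase's measured current plus the new load against the 80% budget. The sketch assumes 230 V line-to-neutral, unity power factor, and the 32 A breaker from the table above. The example currents are illustrative values in the range of the 3-phase worked example.

```python
def headroom_a(phase_amps: dict, breaker_a: float = 32.0,
               continuous_fraction: float = 0.8) -> dict:
    """Amps left per phase before hitting the continuous-load budget."""
    budget = breaker_a * continuous_fraction
    return {p: round(budget - a, 2) for p, a in phase_amps.items()}


def can_add(phase_amps: dict, phase: str, new_watts: float, v_ln: float = 230.0,
            breaker_a: float = 32.0, continuous_fraction: float = 0.8) -> bool:
    """True if new_watts on `phase` keeps it within the continuous budget."""
    return phase_amps[phase] + new_watts / v_ln <= breaker_a * continuous_fraction


measured = {"L1": 18.0, "L2": 17.5, "L3": 10.1}
print(headroom_a(measured))           # {'L1': 7.6, 'L2': 8.1, 'L3': 15.5}
print(can_add(measured, "L1", 2400))  # False: 18.0 + 10.4 A exceeds 25.6 A
print(can_add(measured, "L3", 2400))  # True
```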

SNMP, Modbus, HTTP — getting the data out

A switched + outlet-metered PDU is worth what your monitoring can see of it.

  • SNMP v2c / v3 — the universal answer. Every serious vendor publishes a MIB; Prometheus' snmp_exporter scrapes it. Use v3 where supported, v2c on a management VLAN where you have to. Poll every 30 s normally, 10 s for fast anomaly detection.
  • Modbus TCP — common on industrial-leaning brands. Less standardised than SNMP (register maps vary), but reliable. Prometheus has Modbus exporters.
  • HTTP / JSON REST — modern intelligent PDUs ship a REST API. Easier for custom integrations; rarer than SNMP.
  • Vendor cloud apps — useful for at-a-glance, do not rely on for production monitoring.

Data path:
  PDU (per-outlet data over SNMP v3) → snmp_exporter (MIB translation) → Prometheus
  Prometheus → Grafana dashboards
  Prometheus → Alertmanager → Slack / PagerDuty

Alert thresholds

Metric | Threshold
Per-phase input current | Warn at 70% of breaker, page at 85%
Per-outlet active power | Warn on +30% deviation from the 1 h rolling mean
Aggregate kWh | No alarm; graph for chargeback
Internal PDU temperature | Warn at 50 °C, page at 60 °C
Outlet relay state vs commanded | Page on mismatch
SNMP reachability | Page after 3 min unreachable

Relay-state-vs-commanded is the underrated check. A switched relay can fail welded-closed (more common than welded-open). You commanded outlet 5 off, current is still flowing, you have a stuck relay and a server you cannot actually reset. You want to know.
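
The threshold table maps onto a small evaluation function. In production these checks would live in Prometheus alerting rules; the Python below is only a sketch of the same logic. It assumes a hypothetical `PduSample` snapshot, a precomputed 1 h mean per outlet, and a 5 W floor for "this outlet is live". SNMP reachability is left to the scraping layer. Only the welded-closed case of relay mismatch is detected, and it is inferred from current rather than from the PDU's reported relay state.

```python
from dataclasses import dataclass


@dataclass
class PduSample:
    """Hypothetical snapshot of one PDU poll."""
    phase_amps: dict    # {"L1": 18.0, ...}
    outlet_watts: dict  # {outlet: watts}
    commanded_on: dict  # {outlet: bool}, what we last told the relay to do
    temp_c: float       # internal PDU temperature


def evaluate(s: PduSample, mean_1h_w: dict, breaker_a: float = 32.0,
             live_w: float = 5.0) -> list:
    alerts = []
    for phase, amps in s.phase_amps.items():
        frac = amps / breaker_a
        if frac >= 0.85:
            alerts.append(("page", f"{phase} at {frac:.0%} of breaker"))
        elif frac >= 0.70:
            alerts.append(("warn", f"{phase} at {frac:.0%} of breaker"))
    for outlet, watts in s.outlet_watts.items():
        mean = mean_1h_w.get(outlet)
        if mean and watts > 1.3 * mean:
            alerts.append(("warn", f"outlet {outlet} at {watts:.0f} W vs 1h mean {mean:.0f} W"))
        # Welded-closed relay: commanded off, but current is still flowing.
        if not s.commanded_on.get(outlet, True) and watts > live_w:
            alerts.append(("page", f"outlet {outlet} commanded off but drawing {watts:.0f} W"))
    if s.temp_c >= 60:
        alerts.append(("page", f"PDU internal temperature {s.temp_c:.0f} °C"))
    elif s.temp_c >= 50:
        alerts.append(("warn", f"PDU internal temperature {s.temp_c:.0f} °C"))
    return alerts


sample = PduSample(
    phase_amps={"L1": 28.0, "L2": 23.0, "L3": 10.0},
    outlet_watts={1: 2600, 2: 1000, 5: 300},
    commanded_on={1: True, 2: True, 5: False},
    temp_c=52,
)
for level, msg in evaluate(sample, mean_1h_w={1: 1700, 2: 1000}):
    print(level, msg)
```

For this sample it pages on L1 (88% of breaker) and on outlet 5 (stuck relay). It warns on L2, on outlet 1's jump from a 1.7 kW mean, and on the 52 °C controller temperature.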

The honest take

For any AI server rack with two or more compute nodes, buy switched + outlet-metered. 3-phase if sustained load is over 5 kW and the building has 3-phase service. ATS variant only if you have non-redundant gear in a rack with two source feeds.

Reasons to step down are narrow: a single node on a lab desk where you can reach the plug is fine with basic; one or two nodes in your own building with a tight budget can stop at metered. Production AI compute that you cannot reach within 15 minutes: switched + outlet-metered, no exceptions.

The mistake is buying basic or metered to save €1000 on a rack containing €30k+ of GPUs. The first time you eat a 4-hour outage because you cannot reset a hung node remotely, the PDU upgrade pays for itself.

What breaks

PDU failure modes we have seen, in rough order of frequency:

  • Welded relay on switched units. Outlet commanded off, current still flowing. Alarm on relay-state-vs-commanded mismatch and replace the unit — relays do not heal.
  • Outlet-metering drift. Cheaper CTs drift 5–10% after two or three years. If chargeback depends on this, re-calibrate annually or buy units with stated lifetime accuracy (good units claim ±1%).
  • Controller hang. Metering and control lock up; outlets keep flowing but SNMP goes dark. Most units have a watchdog that reboots the controller without interrupting outlet power — verify yours does.
  • Single-outlet thermal failure. C13 in particular suffers contact-resistance creep under sustained 8–10 A. Mitigation: use C19 above 1.5 kW, IR-scan a heavy rack quarterly.
  • Phase imbalance creeping over time. Rack is balanced at install; six months later someone has added gear on one phase. Per-phase alarms catch it, quarterly reviews catch it too.
  • Firmware update bricks the controller. Treat firmware updates like any prod change: maintenance window, secondary unit first, rollback plan ready.

What to do next

If you are speccing a rack for AI compute:

  1. Compute sustained load. Sum each server's typical-load draw, not nameplate. For a K-AI 4-GPU node, plan 2.0–2.4 kW; for an 8-GPU node, 3.5–4.5 kW. Add 10% for networking and BMC.
  2. Single-phase vs 3-phase. Under 5 kW sustained, single-phase 32 A is fine. Above that, 3-phase 32 A. Past 17–18 kW you are into 63 A 3-phase and a serious conversation with the electrician.
  3. Pick the tier. Default switched + outlet-metered. Step down only with a clear reason.
  4. Pick connectors. Count the C19/C20 inlets on the actual servers; get a PDU with at least that many C19 outlets plus C13 for network gear, plus spares.
  5. Plan the monitoring. SNMP v3 if supported; snmp_exporter into Prometheus; build the dashboard before you need it.
  6. For dual-source feeds: decide which gear is dual-PSU (two separate PDUs) vs single-PSU (ATS PDU fed from both). Do not mix strategies on the same equipment.
  7. Order spare C19/C20 cords. Three or four. Trust us.
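
Steps 1 and 2 reduce to a few lines of arithmetic. The sketch below uses the figures from this list. The 17.5 kW cut-over to 63 A is an assumed midpoint of the 17–18 kW range above, and the node lists are hypothetical examples.

```python
def plan_feed(node_typical_kw: list, overhead: float = 0.10) -> tuple:
    """Return (planned sustained kW, suggested feed) for a rack.

    node_typical_kw: typical sustained draw per server, not nameplate.
    overhead: allowance for networking and BMC (10% per step 1).
    """
    load_kw = round(sum(node_typical_kw) * (1 + overhead), 2)
    if load_kw < 5.0:
        feed = "single-phase 32 A"
    elif load_kw <= 17.5:  # assumed midpoint of the 17-18 kW limit for 3-phase 32 A
        feed = "3-phase 32 A"
    else:
        feed = "3-phase 63 A (talk to the electrician)"
    return load_kw, feed


print(plan_feed([2.2, 2.2]))            # two 4-GPU nodes
print(plan_feed([4.0, 2.2, 2.0, 2.0]))  # the four-node wiring example
print(plan_feed([4.5, 4.5, 4.5, 4.5]))  # four 8-GPU nodes at the top of the range
```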

P03 covers phase balancing across multiple racks. P04 covers breaker sizing and inrush. P05 sizes the UPS upstream of the PDU.


This is part of the Kentino Wiki, a reference series on AI compute, robotics, and the systems that connect them. Comments and corrections welcome at info@kentino.com.