Build Wiki

A reference series on building, networking, powering, and operating AI compute — for buyers and integrators sizing their next 4-GPU box, 8-GPU server, or robotics lab.

Every article is written from real Kentino builds. No filler. Opinionated where the engineering demands it. Honest about limits.

52 articles · 9 topic tracks · 2 new per week · Tue + Thu

Foundational AI Server · W series · 7 articles

If you are spec-ing a multi-GPU box, read these first. Memory, PCIe, power, thermals, storage, and the GPU shortlist.

W01 · RAM and VRAM: How They Relate in an AI Server · A 4-GPU box with 192 GB VRAM and 32 GB RAM is broken. The right ratio depends on what you actually run.
W02 · PCIe Lanes & Topology in a Multi-GPU AI Server · "PCIe x8 vs x16 doesn't matter for inference" is mostly correct, and the people repeating it usually don't know why.
W03 · GPU Risers: When You Need Them and What Breaks · Where signal integrity quietly dies, links silently retrain to Gen3, and benches that pass start dropping a GPU per day.
W04 · PSU Sizing and Dual-PSU Configurations · The math, the form-factor reality, and the honest framing for 4-GPU and 8-GPU power delivery.
W05 · Thermals and Airflow in Multi-GPU Builds · A 4× RTX 5090 chassis dumps 2.4 kW continuously. Most "the GPU server got slower" stories are thermal, not software.
W06 · Storage Tiers: Model, Dataset, Scratch, Checkpoint · Four real storage tiers in a single AI server, when Gen5 NVMe is worth the premium, and how to size each without overbuying.
W07 · GPU Selection: 5090, 4090, RTX Pro 6000, L40, L4 · Honest head-to-head with real performance numbers, trade-offs, and a decision flow we actually use on customer calls.

Linux & OS Stack · L series · 5 articles

Once the hardware is racked: driver pinning that survives unattended upgrades, CUDA/cuDNN/toolkit, kernel tuning that actually moves the needle, filesystem choice, and a monitoring stack that catches GPU faults before customers do.

L01 · Ubuntu Pinning & NVIDIA Driver Management · The single most common Kentino support call after a server's been up two months. Here's how to make it not happen.
L02 · CUDA, cuDNN, and the NVIDIA Container Toolkit · From bare driver to a vLLM container serving tokens. Opinionated about which path to take, honest about which ones bite.
L03 · Linux Kernel Tuning: What Actually Moves the Needle · Most "Linux tuning for AI" advice is recycled database guidance from 2015. The smaller, honest set that actually helps.
L04 · Filesystem Choice: XFS, ZFS, ext4 (and Why Not Btrfs) · Which filesystem to put where on a Kentino-class AI server, and the trade-offs that actually matter.
L05 · Monitoring: Prometheus, Grafana, DCGM, Loki · An AI server nobody is watching is silently throttling, leaking VRAM, or being OOM-killed at 03:00. The stack we ship.

Networking · N series · 8 articles

Ethernet vs InfiniBand vs RoCE, NVLink reality, topologies (leaf-spine, fat-tree, dragonfly, tesseract, switchless), latency dissection, routing, and RDMA setup in practice.

N01 · Ethernet for AI: 25/100/400 GbE Fundamentals · The default fabric for any AI cluster outside a hyperscaler. Speed tiers, NICs, optics, switches.
N02 · InfiniBand vs RoCE vs Plain Ethernet · Three transport choices, three engineering philosophies, three different cost curves. The honest decision matrix.
N03 · NVLink and NVSwitch: When It Matters · DGX marketing brags about terabytes per second of NVLink bandwidth. For most Kentino workloads, you don't need any of it.
N04 · Switched Topologies: Fat-Tree, Leaf-Spine, Dragonfly, Tesseract · Every cluster diagram starts the same way. The real choice is which topology, how much oversubscription, what per-port speed.
N05 · Switchless Topologies: Mesh, Ring, Direct-Connect · A 32-port 400 GbE switch lands €40k–€80k in mid-2026. For 2-to-4 nodes, you don't need one.
N06 · Latency Dissection: Where Every Microsecond Goes · People size networks with bandwidth charts. Then their allreduce benchmark prints a number nowhere near line rate.
N07 · Routing: ECMP, Adaptive Routing, DCQCN · What happens above cables, NICs, switches: how packets find a path, and what stops the fabric from collapsing under all-reduce.
N08 · RDMA Setup in Practice + Cluster Uplink Design · Hands-on: install drivers, prove the path, turn on GPUDirect, validate NCCL, then step up and design the whole-cluster uplink.

Clustering · K series · 7 articles

When one node isn't enough. Single-vs-multi-node decision, distributed training, inference clusters, shared storage, scheduling, failure handling, and PCIe-as-interconnect limits.

K01 · Single-Node Multi-GPU vs Multi-Node: When to Scale Out · The most expensive mistake is splitting a GPU budget across two nodes when one bigger node would have done the job.
K02 · Distributed Training in 2026: DDP, FSDP2, DeepSpeed, Megatron · Four open-source stacks, five axes of parallelism, and which one to actually pick for which job.
K03 · Inference Clusters: vLLM Tensor Parallel, Pipeline Parallel · A 70B model doesn't fit on one GPU at useful KV cache. A 405B doesn't fit on one node. How you cut the model decides what it costs.
K04 · Cluster Storage: NFS, BeeGFS, Lustre, Object Stores · Shared storage is the part of a distributed cluster nobody thinks about until GPUs sit at 40% utilization for it.
K05 · Job Scheduling: SLURM, Kubernetes, Ray · A scheduler decides who runs where and when. Without one, shared clusters degenerate into Slack pings. With the wrong one, YAML.
K06 · Failure Handling: What Actually Breaks · Meta's published Llama 3 405B post-mortem: 419 unexpected interruptions over 54 days on a 16,384-GPU cluster. Plan for it.
K07 · The Base Node: PCIe as Interconnect · What one box looks like, what the bandwidth between two GPUs inside that box is, and why adding more boxes stops helping.

Power Delivery · P series · 6 articles

What it takes to keep AI compute powered, in a building that wasn't designed for it. Single vs 3-phase, PDU types, phase balancing, breaker sizing, UPS, and generators.

P01 · Single-Phase vs Three-Phase Power · The conversation that derails an AI build before the GPUs arrive. Three weeks lost to an electrician retrofit you didn't plan for.
P02 · PDU Types: Basic, Metered, Switched, ATS · The boring part of the rack that you regret first. 1.8–4.5 kW per node means the PDU is no longer just "the thing the server plugs into".
P03 · Phase Balancing Across AI Racks · 22 kW per phase looks like plenty on paper. So why do half of self-built AI labs trip a breaker in the first month?
P04 · Breaker Sizing and Inrush Current · Half the "breaker keeps tripping" calls aren't overloads; they're the wrong trip curve or the wrong RCD type. Here's how to size both right.
P05 · UPS Sizing: Topology, Battery, Runtime, the kVA Trap · A "6 kVA / 6000 W" UPS is often not a 6 kW UPS. The gap between right and cheap is more about knowing what you're buying.
P06 · Generator and Transfer Switch · Most AI labs do not need a generator. For the minority where downtime has a measurable cost on a 24/7 SLA, this is the article.

Integration · I series · 6 articles

Putting it all together. The robot-plus-server architecture, inference server setup, a robotics-lab network topology, the power/cooling budget, and a reference build that ties the whole series down to real hardware.

I01 · Edge AI Architecture: Robot ↔ On-Prem Inference Server · The gold-standard article. A humanoid you bought is only half the system; this is the other half and how the two halves wire together.
I02 · Setting Up an Inference Server: vLLM, llama.cpp, SGLang · Hardware works, nvidia-smi sees every card. Now what serves tokens, and which wrong choice costs you 2× throughput.
I03 · Network Topology for a Robotics + AI Lab · Not an office, not a data centre, not a home lab: a small, unusually demanding network. The right shape for it in 2026.
I04 · Power and Cooling Budget for a Robotics + AI Lab · The full load list, the cooling math, the electrical install you actually need, and the checklist to hand the electrician.
I05 · Reference Build: A Robotics + AI Lab in One Rack · All earlier articles described the architecture. This one puts a price tag and a parts list on it, from real Kentino-shipped hardware.
I06 · Fleet Deployment: Multiple Robots, Shared Compute · One robot is a project. Five is a system. Twenty is an operation. What changes at each tier, and how to scale K-AI behind them.

Token Economics · T series · 3 articles

The math your CFO actually needs. Tok/s per €, on-prem vs cloud cost per million tokens, and how traffic shape (sustained vs burst) determines which one wins.

T01 · Tokens per Second per Euro: GPU Economics for Inference · Most buyers ask which GPU is fastest. The right question is which produces the most tokens per euro for the model you serve.
T02 · Cost per Million Tokens: On-Prem vs Cloud · Cloud API, rented GPU, or your own rack: three honest ways to buy LLM inference in 2026, and what each costs per Mtok.
T03 · Sustained vs Burst Inference Economics · The article your CFO actually needs. Same fixed cost whether you serve 20 M tokens or 20 k, and what shape your traffic needs to be.

Robotics · R series · 9 articles · blog

A modern humanoid is six or seven engineering disciplines bolted together. This series walks through the anatomy of humanoids and quadrupeds, the sensor stack, on-device vs off-device compute, SDKs and ROS 2, robot networking, the buying process, why robots need dedicated edge compute, and the bleeding-edge VLM-driven world-model stack.

R01 · Anatomy of a Modern Humanoid · Five platforms you can actually buy today, $12k to $129k, three size classes. The differences matter.
R02 · Anatomy of a Quadruped Robot · Quadrupeds have settled. Eight platforms a Czech buyer can obtain in 2026, $2,800 to $200,000. A 70× spread for the same animal.
R03 · The Robot Sensor Stack · Fifteen to forty distinct sensor channels per modern legged robot. The photos show two. The other thirty-eight matter.
R04 · On-Device vs Off-Device Compute · What runs on the robot vs the rack, why the split exists, and how to design the boundary so latency doesn't kill the application.
R05 · Robot SDKs, ROS 2, and Simulation · What you'll actually write code against. ROS 2 reality, vendor SDK reality, simulation environments that earn their keep.
R06 · Robot Networking: Wi-Fi, Tethers, 5G, Latency · Wireless reality for a moving robot: what fails, what works, and the latency budget that decides which one you can tolerate.
R07 · Buying a Robot: Lead Times, Customs, Support · Buying robotics hardware into the EU isn't like buying a workstation. What lead times, customs, and post-sale support actually look like.
R08 · Why Robots Need Dedicated Edge Compute · The latency argument. Why putting your model behind a cloud API breaks the use case the customer actually wants.
R09 · Autolabeling with VLM-Driven World Models · The bleeding-edge perception stack (Qwen2.5-VL, Grounded-SAM 2, Florence-2, NVIDIA Cosmos) applied to robotics ground truth.

Case Studies · C series · 1+ articles · blog

Real Kentino builds with real measured numbers. Photographs, BOMs, benchmarks, and honest post-mortems.

C01 · Case Study: 4× RTX 4090 AI Workstation · EPYC 7542, 512 GB DDR4 ECC, 4× RTX 4090. 651.6 TFLOPS measured. 179.3 tok/s sustained on vLLM. 73°C peak. Real numbers from a shipped build.

New articles every Tuesday and Thursday

This wiki ships in batches. Two articles a week, scheduled out for the next six months. Coming next: a clear publish cadence so the order stays predictable. If you want a specific topic prioritized, write to info@kentino.com.