Build Wiki

A reference series on building, networking, powering, and operating AI compute — for buyers and integrators sizing their next 4-GPU box, 8-GPU server, or robotics lab.

Every article is written from real Kentino builds. No filler. Opinionated where the engineering demands it. Honest about limits.

49 articles live · 9 topic tracks · 2 new per week · Tue + Thu

Foundational AI Server W series

If you are spec-ing a multi-GPU box, read these first. Memory, PCIe, power, thermals, storage, and the GPU shortlist.

W01 RAM and VRAM: How They Relate in an AI Server
A 4-GPU box with 192 GB VRAM and 32 GB RAM is broken. The right ratio depends on what you actually run.

W02 PCIe Lanes & Topology in a Multi-GPU AI Server
"PCIe x8 vs x16 doesn't matter for inference" is mostly correct, and the people repeating it usually don't know why.

W03 GPU Risers: When You Need Them and What Breaks
Where signal integrity quietly dies, links silently retrain to Gen3, and benches that pass start dropping a GPU per day.

W04 PSU Sizing and Dual-PSU Configurations
The math, the form-factor reality, and the honest framing for 4-GPU and 8-GPU power delivery.

W05 Thermals and Airflow in Multi-GPU AI Server Builds
Four GPUs at 450 W each is 1.8 kW of heat in a box. Front-to-back airflow, static pressure, and why "it has fans" isn't a cooling plan.

W06 Storage Tiers in an AI Server
Model, dataset, scratch, checkpoint: four workloads with four access patterns. One NVMe tier serves none of them well.

W07 GPU Selection: 5090, 4090, RTX Pro 6000, L40, L4
Honest head-to-head with real performance numbers, trade-offs, and a decision flow we actually use on customer calls.

Token Economics T series

The money math. Tokens per euro, on-prem vs cloud cost per million tokens, and when each model actually wins.

T01 Tokens per Second per Euro
The only GPU-value metric that matters for inference. Real tokens/sec per euro across the 5090, 4090, RTX Pro 6000, L40, L4.

T02 Cost per Million Tokens: On-Prem vs Cloud
On-prem vs cloud, worked out in euros. Where the crossover is, and why the cloud bill wins until it suddenly doesn't.

T03 Sustained vs Burst Inference Economics
Steady 24/7 inference and spiky batch jobs have opposite economics. When on-prem wins, and when the cloud bails you out.

Linux / OS / Software L series

The software stack under the GPUs. Driver pinning, CUDA setup, kernel tuning, filesystems, and monitoring.

L01 Ubuntu Pinning and NVIDIA Driver Management
Pin the kernel, pin the driver, or watch an apt upgrade take your GPUs offline. Driver management that survives reboots.

L02 CUDA, cuDNN, and the NVIDIA Container Toolkit
The sane setup path. CUDA vs driver vs toolkit versioning, and the container route that stops dependency hell.

L03 Linux Kernel Tuning for AI Servers
What actually moves the needle for AI workloads (hugepages, NUMA, IRQ affinity) and what's cargo-cult.

L04 Filesystem Choice for AI Servers
XFS, ZFS, ext4, and why Btrfs isn't on the list for AI servers. Match the filesystem to the access pattern.

L05 Monitoring Stack: Prometheus, Grafana, DCGM, Loki
See GPU temps, VRAM, ECC errors, and job failures before they cost you a training run.

Networking N series

NVLink reality, cluster topologies (leaf-spine, fat-tree, dragonfly, switchless), latency dissection, routing, and RDMA setup in practice.

N03 NVLink and NVSwitch: When It Matters
DGX marketing brags about terabytes per second of NVLink bandwidth. For most Kentino workloads, you don't need any of it.

N04 Switched Topologies: Fat-Tree, Leaf-Spine, Dragonfly, Tesseract
Every cluster diagram starts the same way. The real choice is which topology, how much oversubscription, what per-port speed.

N05 Switchless Topologies: Mesh, Ring, Direct-Connect
A 32-port 400 GbE switch lands €40k–€80k in mid-2026. For 2-to-4 nodes, you don't need one.

N06 Latency Dissection: Where Every Microsecond Goes
People size networks with bandwidth charts. Then their allreduce benchmark prints a number nowhere near line rate.

N07 Routing: ECMP, Adaptive Routing, DCQCN
What happens above cables, NICs, switches: how packets find a path, and what stops the fabric from collapsing under all-reduce.

N08 RDMA Setup in Practice + Cluster Uplink Design
Hands-on: install drivers, prove the path, turn on GPUDirect, validate NCCL, then step up and design the whole-cluster uplink.

Clustering K series

When one node isn't enough. Single-vs-multi-node decision, distributed training, inference clusters, shared storage, scheduling, and failure handling.

K01 Single-Node Multi-GPU vs Multi-Node: When to Scale Out
The most expensive mistake is splitting a GPU budget across two nodes when one bigger node would have done the job.

K02 Distributed Training in 2026: DDP, FSDP2, DeepSpeed, Megatron
Four open-source stacks, five axes of parallelism, and which one to actually pick for which job.

K03 Inference Clusters: vLLM Tensor Parallel, Pipeline Parallel
A 70B model doesn't fit on one GPU at useful KV cache. A 405B doesn't fit on one node. How you cut the model decides what it costs.

K04 Cluster Storage: NFS, BeeGFS, Lustre, Object Stores
Shared storage is the part of a distributed cluster nobody thinks about until GPUs sit at 40% utilization for it.

K05 Job Scheduling: SLURM, Kubernetes, Ray
SLURM, Kubernetes, Ray, and knowing when you need none of them. Match the scheduler to the team size.

K06 Failure Handling in AI Clusters
What actually breaks in a GPU cluster and how to recover: checkpointing, health checks, and draining bad nodes.

Integration I series

Putting it all together — inference-server setup, lab network and power budgets, a reference build, and fleet deployment.

I01 Edge AI Architecture: Robot ↔ On-Prem Inference Server
The gold-standard article. A humanoid you bought is only half the system; this is the other half and how the two halves wire together.

I02 Setting Up an Inference Server: vLLM, llama.cpp, SGLang
Install, serve, and benchmark. Which engine for which model and latency target.

I03 Network Topology for a Robotics + AI Compute Lab
Wiring a robotics + AI lab: robot links, inference-server uplinks, and where latency hides.

I04 Power and Cooling Budget for a Robotics + AI Lab
Sizing power and cooling for a mixed robotics + AI lab before you sign the lease.

I05 Reference Build: A Robotics + AI Lab in One Rack
A complete robotics + AI lab in one rack: the BOM, the layout, and the trade-offs we'd actually ship.

I06 Fleet Deployment: Multiple Robots, Shared Compute
Multiple robots, shared compute. How to size and schedule inference for a fleet, not a demo.

Power Delivery P series

Getting clean power to the rack. Phases, PDUs, balancing, breakers, UPS, and generators for AI compute.

P01 Single-Phase vs Three-Phase Power
When a 4-GPU box outgrows a single 16 A circuit, and what three-phase actually buys you in an AI rack.

P02 PDU Types: Basic, Metered, Switched, ATS
What each does and which you actually need for a GPU rack.

P03 Phase Balancing Across AI Racks
Load three phases evenly or trip a breaker at the worst time. How balancing works across AI racks.

P04 Breaker Sizing and Inrush Current
Continuous load, inrush current, and the 80% rule. Size breakers for GPU servers that don't nuisance-trip.

P05 UPS Sizing for AI Compute
Topology, battery chemistry, runtime, and the kVA-vs-kW trap that undersizes half of AI-lab UPS purchases.

P06 Generator and Transfer Switch
When battery runtime isn't enough. Generator sizing and transfer-switch basics for AI labs.

Robotics R series · blog

A modern humanoid is six or seven engineering disciplines bolted together. Anatomy, sensors, compute placement, SDKs, networking, buying, and the bleeding-edge VLM world-model stack.

R01 Anatomy of a Modern Humanoid
Six or seven engineering disciplines bolted together. What's actually inside a modern humanoid.

R02 Anatomy of a Quadruped Robot
Quadrupeds trade reach for stability and battery life. The mechanical and compute anatomy of the workhorse form factor.

R03 The Robot Sensor Stack
Cameras, depth, LiDAR, IMU, force-torque: what each sensor buys and what it costs in compute.

R04 On-Device vs Off-Device Compute for Robots
What runs on the robot vs on the server. The latency-vs-capability trade every deployment has to make.

R05 Robot SDKs, ROS 2, and Simulation
ROS 2, the vendor SDKs, and simulation environments: the software you develop against before hardware arrives.

R06 Robot Networking: Wi-Fi, Tethers, 5G
Wi-Fi, tethers, 5G, and why latency, not bandwidth, decides what you can offload off-robot.

R07 Buying a Robot: Lead Times, Customs, Support
Buying robotics hardware into the EU isn't like buying a workstation. What lead times, customs, and post-sale support actually look like.

R08 Why Robots Need Dedicated Edge Compute
The latency argument. Why putting your model behind a cloud API breaks the use case the customer actually wants.

R09 Autolabeling with VLM-Driven World Models
The bleeding-edge perception stack (Qwen2.5-VL, Grounded-SAM 2, Florence-2, NVIDIA Cosmos) applied to robotics ground truth.

Case Studies C series · blog

Real Kentino builds with real measured numbers. Photographs, BOMs, benchmarks, and honest post-mortems.

C01 Case Study: 4× RTX 4090 AI Workstation
EPYC 7542, 512 GB DDR4 ECC, 4× RTX 4090. 651.6 TFLOPS measured. 179.3 tok/s sustained on vLLM. 73°C peak. Real numbers from a shipped build.

New articles every Tuesday and Thursday

This wiki is a growing library — new build, networking, clustering, power, and robotics articles publish through 2026, each drawn from a real Kentino build. If you want a specific topic prioritized, write to info@kentino.com.
