Build Wiki

A reference series on building, networking, powering, and operating AI compute — for buyers and integrators sizing their next 4-GPU box, 8-GPU server, or robotics lab.

Every article is written from real Kentino builds. No filler. Opinionated where the engineering demands it. Honest about limits.

20 articles live · 9 topic tracks · 2 new per week · Tue + Thu

Foundational AI Server W series

If you are speccing a multi-GPU box, read these first. They cover memory, PCIe, power, and the GPU shortlist.

W01 · RAM and VRAM: How They Relate in an AI Server
A 4-GPU box with 192 GB VRAM and 32 GB RAM is broken. The right ratio depends on what you actually run.

W02 · PCIe Lanes & Topology in a Multi-GPU AI Server
"PCIe x8 vs x16 doesn't matter for inference" is mostly correct, and the people repeating it usually don't know why.

W03 · GPU Risers: When You Need Them and What Breaks
Where signal integrity quietly dies, links silently retrain to Gen3, and benches that pass start dropping a GPU per day.

W04 · PSU Sizing and Dual-PSU Configurations
The math, the form-factor reality, and the honest framing for 4-GPU and 8-GPU power delivery.

W07 · GPU Selection: 5090, 4090, RTX Pro 6000, L40, L4
Honest head-to-head with real performance numbers, trade-offs, and a decision flow we actually use on customer calls.

Networking N series

NVLink reality, cluster topologies (leaf-spine, fat-tree, dragonfly, switchless), latency dissection, routing, and RDMA setup in practice.

N03 · NVLink and NVSwitch: When It Matters
DGX marketing brags about terabytes per second of NVLink bandwidth. For most Kentino workloads, you don't need any of it.

N04 · Switched Topologies: Fat-Tree, Leaf-Spine, Dragonfly, Tesseract
Every cluster diagram starts the same way. The real choice is which topology, how much oversubscription, and what per-port speed.

N05 · Switchless Topologies: Mesh, Ring, Direct-Connect
A 32-port 400 GbE switch lands at €40k–€80k in mid-2026. For 2 to 4 nodes, you don't need one.

N06 · Latency Dissection: Where Every Microsecond Goes
People size networks with bandwidth charts. Then their all-reduce benchmark prints a number nowhere near line rate.

N07 · Routing: ECMP, Adaptive Routing, DCQCN
What happens above the cables, NICs, and switches: how packets find a path, and what stops the fabric from collapsing under all-reduce.

N08 · RDMA Setup in Practice + Cluster Uplink Design
Hands-on: install drivers, prove the path, turn on GPUDirect, validate NCCL, then step up and design the whole-cluster uplink.

Clustering K series

When one node isn't enough. Single-vs-multi-node decision, distributed training, inference clusters, and shared storage.

K01 · Single-Node Multi-GPU vs Multi-Node: When to Scale Out
The most expensive mistake is splitting a GPU budget across two nodes when one bigger node would have done the job.

K02 · Distributed Training in 2026: DDP, FSDP2, DeepSpeed, Megatron
Four open-source stacks, five axes of parallelism, and which one to actually pick for which job.

K03 · Inference Clusters: vLLM Tensor Parallel, Pipeline Parallel
A 70B model doesn't fit on one GPU with a useful KV cache. A 405B model doesn't fit on one node. How you cut the model decides what it costs.

K04 · Cluster Storage: NFS, BeeGFS, Lustre, Object Stores
Shared storage is the part of a distributed cluster nobody thinks about until GPUs sit at 40% utilization because of it.

Integration I series

Putting it all together — the robot-plus-server architecture that ties the whole series down to real hardware.

I01 · Edge AI Architecture: Robot ↔ On-Prem Inference Server
The gold-standard article. A humanoid you bought is only half the system; this is the other half and how the two halves wire together.

Robotics R series · blog

A modern humanoid is six or seven engineering disciplines bolted together. The buying process, why robots need dedicated edge compute, and the bleeding-edge VLM-driven world-model stack.

R07 · Buying a Robot: Lead Times, Customs, Support
Buying robotics hardware into the EU isn't like buying a workstation. What lead times, customs, and post-sale support actually look like.

R08 · Why Robots Need Dedicated Edge Compute
The latency argument. Why putting your model behind a cloud API breaks the use case the customer actually wants.

R09 · Autolabeling with VLM-Driven World Models
The bleeding-edge perception stack (Qwen2.5-VL, Grounded-SAM 2, Florence-2, NVIDIA Cosmos) applied to robotics ground truth.

Case Studies C series · blog

Real Kentino builds with real measured numbers. Photographs, BOMs, benchmarks, and honest post-mortems.

C01 · Case Study: 4× RTX 4090 AI Workstation
EPYC 7542, 512 GB DDR4 ECC, 4× RTX 4090. 651.6 TFLOPS measured. 179.3 tok/s sustained on vLLM. 73°C peak. Real numbers from a shipped build.

New articles every Tuesday and Thursday

This wiki is a growing library — new build, networking, clustering, power, and robotics articles publish through 2026, each drawn from a real Kentino build. If you want a specific topic prioritized, write to info@kentino.com.