Kentino s.r.o.

K-AI 768 TurinDual RTXPro6000MQ 16000TOPS — 8× RTX Pro 6000 Blackwell Max-Q AI Frontier Server (Dual Turin)

768 GB ECC VRAM Frontier Flagship
8x RTX Pro 6000 Max-Q | Dual EPYC Turin | 16 000 TOPS INT8

16 000 TOPS INT8
768 GB ECC VRAM pool
Gen5 PCIe end-to-end
Flagship frontier multi-tenant

CPU pricing finalized at order — Turin 9005-series market moves weekly in Q2 2026.

Published external references. Not measured on Kentino hardware.

The top of the Kentino AI server lineup: a 7U rack-mount, frontier-tier inference platform with eight NVIDIA RTX Pro 6000 Blackwell Max-Q turbofan cards pooled to 768 GB of ECC VRAM, two AMD EPYC Turin 9005-series CPUs (Zen5c, SP5), 1.5 TB DDR5-4800 ECC (all 24 channels populated), a 4 TB NVMe boot drive, and five 1200 W server PSUs. The GPU path is PCIe Gen5 end-to-end. It runs DeepSeek V3 at native fp8 (~670 GB) on-card or Kimi-K2 at Q4-Q5, and targets 4 frontier-class models resident simultaneously.

Hardware

Component	Detail
GPUs	8x NVIDIA RTX Pro 6000 Blackwell Max-Q 96 GB ECC (turbofan, 600 W TDP spec, PCIe 5.0 x16, 2000 INT8 TOPS/card, fp8 native)
VRAM pool	768 GB total across 8 cards (no NVLink — P2P over PCIe Gen5 at ~55-60 GB/s within socket, cross-socket via CPU interconnect)
CPU	2x AMD EPYC Turin 9005-series (Zen5c, SP5, PCIe 5.0) — quote-pending, exact SKU confirmed at order
Motherboard	ASRock Rack TURIN2D24XGM/500W (dual SP5 Turin, PCIe 5.0, 24x DDR5, 2x 10 GbE, IPMI)
System RAM	1.5 TB DDR5-4800 ECC RDIMM (24x 64 GB — all 24 channels populated, ~920 GB/s aggregate)
Boot / storage	4 TB NVMe M.2 (PCIe 4.0 x4) — sized for frontier checkpoints
Power supply	5x 1200 W server PSU set (6 kW total)
Chassis	7U 8-GPU rack-mount, 10 PCIe slot capacity, active Gen5 risers
Cooling	2x SP5 Turin tower coolers + 8x 120 mm Martech chassis fans. Per-GPU turbofan blowers self-contained.
Network	Onboard dual 10 GbE (Intel X550)
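
The aggregate figures in the table follow from the per-component numbers. A minimal sketch of that arithmetic, assuming one DIMM per memory channel and the usual 64-bit (8-byte) DDR5 data path per channel; the variable names are illustrative only:

```python
# Per-component figures taken from the hardware table.
GPU_COUNT = 8
VRAM_PER_GPU_GB = 96
INT8_TOPS_PER_GPU = 2000

DIMM_COUNT = 24               # one DIMM per channel (assumption)
DIMM_SIZE_GB = 64
DDR5_MT_PER_S = 4800          # DDR5-4800: mega-transfers per second
BYTES_PER_TRANSFER = 8        # assumption: 64-bit data path per channel

vram_pool_gb = GPU_COUNT * VRAM_PER_GPU_GB
total_int8_tops = GPU_COUNT * INT8_TOPS_PER_GPU
system_ram_gb = DIMM_COUNT * DIMM_SIZE_GB
# MT/s * bytes per transfer = MB/s per channel; divide by 1000 for GB/s.
ram_bandwidth_gb_s = DIMM_COUNT * DDR5_MT_PER_S * BYTES_PER_TRANSFER / 1000

print(f"VRAM pool:        {vram_pool_gb} GB")
print(f"INT8 throughput:  {total_int8_tops} TOPS")
print(f"System RAM:       {system_ram_gb} GB")
print(f"RAM bandwidth:    {ram_bandwidth_gb_s:.1f} GB/s (theoretical peak)")
```

The bandwidth figure is a theoretical peak; sustained bandwidth on a real system will be lower.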

Power envelope

GPU draw (spec): 8 x 600 W = 4 800 W
CPU draw: 2 x 360 W = 720 W (Turin mid-tier estimate)
System total at spec full load: ~5 720 W (GPUs + CPUs = 5 520 W, plus ~200 W allowance for the rest of the system)
PSU total: 6 000 W — ~4.7% raw headroom at spec
Real-world: Max-Q sustains 520-550 W in inference, lifting sustained headroom to ~11-15% (same non-GPU load as the spec total)
Firmware power-cap at 520 W available for a guaranteed ~15% headroom
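
The headroom figures can be rechecked for any per-GPU power cap. The sketch below is a rough budget only: it takes the stated ~5 720 W spec total at face value and treats the ~200 W not accounted for by GPUs and CPUs as a fixed "other" load, which is an inference from the numbers above rather than a published figure.

```python
PSU_TOTAL_W = 5 * 1200            # five 1200 W supplies
GPU_COUNT = 8
GPU_SPEC_W = 600
CPU_W = 2 * 360                   # Turin mid-tier estimate
SPEC_TOTAL_W = 5720               # stated system total at spec full load

# Inferred: whatever the stated total leaves after GPUs and CPUs.
OTHER_W = SPEC_TOTAL_W - GPU_COUNT * GPU_SPEC_W - CPU_W


def headroom(gpu_watts):
    """Return (system load in W, PSU headroom as a fraction) for a per-GPU draw."""
    load = GPU_COUNT * gpu_watts + CPU_W + OTHER_W
    return load, (PSU_TOTAL_W - load) / PSU_TOTAL_W


for watts in (600, 550, 520):
    load, frac = headroom(watts)
    print(f"{watts} W per GPU -> {load} W system load, {frac:.1%} headroom")
```

Changing `CPU_W` once the exact Turin SKU is confirmed shows how much the CPU choice moves the margin.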

Lane topology

Dual Turin provides 2x 128 PCIe Gen5 lanes. TURIN2D24XGM/500W routes 8 GPU slots direct-attached to the CPUs at Gen5 x16 via active risers — 4 slots per CPU root. No PCIe switch in the GPU path — clean dual-root topology. NUMA tuning required for optimal cross-socket peer-to-peer. No NVLink; P2P at ~55-60 GB/s per direction within socket.
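
To make the dual-root layout concrete, the sketch below counts which GPU pairs share a socket and which must cross the CPU interconnect. The GPU numbering (0-3 on socket 0, 4-7 on socket 1) is hypothetical; the real enumeration order depends on the BIOS and driver and should be checked on the machine, for example with `nvidia-smi topo -m`.

```python
from itertools import combinations

SOCKETS = 2
GPUS_PER_SOCKET = 4       # 4 slots per CPU root, as described above
LANES_PER_GPU = 16        # Gen5 x16 per slot

# Hypothetical mapping: GPU index -> socket index.
socket_of = {gpu: gpu // GPUS_PER_SOCKET for gpu in range(SOCKETS * GPUS_PER_SOCKET)}

same_socket = []
cross_socket = []
for a, b in combinations(sorted(socket_of), 2):
    if socket_of[a] == socket_of[b]:
        same_socket.append((a, b))
    else:
        cross_socket.append((a, b))

gpu_lanes_per_socket = GPUS_PER_SOCKET * LANES_PER_GPU

print(f"GPU lanes per socket: {gpu_lanes_per_socket}")
print(f"Same-socket pairs:    {len(same_socket)}")
print(f"Cross-socket pairs:   {len(cross_socket)}")
```

More than half of all GPU pairs cross sockets, which is why keeping a 4-way tensor-parallel group on one socket, where the workload allows it, avoids the slower path.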

What you can run

With 768 GB of pooled ECC VRAM, the top of the Kentino envelope, this server runs DeepSeek V3 at native fp8 (~670 GB) entirely on-card and Kimi-K2 at Q4-Q5 (~630 GB) with room to spare. Its defining use case is keeping 4 frontier-class models resident simultaneously for multi-tenant production serving.

LLMs — text / reasoning / coding

Chinese frontier at production quants

Kimi-K2 (Base / Instruct / Thinking) at Q4_K_M / Q5_K_M (~630 GB) comfortable (~15-25 tok/s single, published reference) — flagship Chinese frontier on a single box at production quants
DeepSeek V3 / R1 / V3.1 / V3.2 at fp8 native (~670 GB) on-card (~30-50 tok/s single, published reference) — Blackwell fp8 tensor cores run this natively at speed
DeepSeek V3 at Q4_K_M (~404 GB) with multiple concurrent large-batch serving instances
GLM-5 / GLM-5.1 (~745B/44B) at Q3-Q4 (~420-560 GB) comfortable on-card
Intern-S1-Pro (1T/22B active, SAGE) at Q3-Q4 (~440-580 GB) comfortable
Qwen3-Coder-480B-A35B at Q5-Q6 (~340-400 GB) with 1M ctx
Qwen3-235B-A22B at bf16 (~470 GB) with generous KV for long context
ERNIE-4.5-424B-A47B at Q6 (~360 GB); Hunyuan-Large at fp8 (~390 GB)
MiniMax-Text-01 / M1 at Q5-Q6 (~325-390 GB)

Western frontier at production quants

Mistral Large 3 (675B/41B MoE, Apache 2.0) at Q3-Q4 (~317-404 GB) comfortable (~20-30 tok/s single, published reference)
Llama 4 Maverick (400B/17B, 128 experts) at Q5-Q6 (~290-350 GB)
Llama-3.1-Nemotron Ultra 253B at bf16 (~506 GB) on-card
Snowflake Arctic at Q5-Q6 (~350-420 GB); Grok-1 at Q5-Q6 (~225-270 GB)
DBRX Instruct 132B/36B at bf16 (~264 GB) multi-instance
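
The bf16 sizes quoted in these lists follow from parameter count times bytes per weight. A minimal sketch of that estimate, using parameter counts from the model names above; it covers weights only and ignores KV cache, activations, and runtime overhead. For mixed-precision quants such as Q4_K_M, the bits per weight would be an approximate average, so treat results for those as rough.

```python
VRAM_POOL_GB = 768


def weight_footprint_gb(params_billion, bits_per_weight):
    """Weights-only size in GB: parameters (billions) * bits per weight / 8."""
    return params_billion * bits_per_weight / 8


# (parameters in billions, bits per weight); bf16 is 16 bits.
examples = {
    "Qwen3-235B-A22B bf16": (235, 16),
    "Llama-3.1-Nemotron Ultra 253B bf16": (253, 16),
    "DBRX Instruct 132B bf16": (132, 16),
}

for name, (params_b, bits) in examples.items():
    size = weight_footprint_gb(params_b, bits)
    left = VRAM_POOL_GB - size
    print(f"{name}: ~{size:.0f} GB weights, ~{left:.0f} GB left for KV cache and overhead")
```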

Vision-Language Models

Qwen3-VL-235B-A22B flagship VLM with long context; InternVL3.5-241B-A28B at bf16 (~482 GB); GLM-4.5V / 4.6V 106B bf16 multi-instance; Llama 3.2 90B Vision bf16 multi-instance; Pixtral Large 124B bf16; Molmo 72B bf16 multi-instance.

Image generation

HunyuanImage-3.0 Instruct concurrent instances; FLUX.1 multi-instance (~15-20 s per 1024x1024 image, published reference); SD 3.5 Large; SDXL; AuraFlow; OmniGen; HunyuanImage-2.1; Kolors 2.0 — full Chinese + Western image stack resident concurrent.

Video generation

Wan 2.2 T2V-A14B / I2V-A14B — many concurrent streams; HunyuanVideo 13B bf16 multiple concurrent streams; Open-Sora 2.0 (11B) multi-instance; Mochi-1 (10B) multi-instance; NVIDIA Cosmos Predict 2 up to 14B.

Audio / Speech / TTS

Full stack resident at batch: Whisper v3 large, Parakeet-TDT, Canary 1B, Moshi 7B realtime, Qwen3-Omni, Step-Audio R1, CosyVoice 3.0, Kokoro, Stable Audio Open.

Multi-model / multi-tenant serving (the defining use case)

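Multi-tenant frontier production: 4 frontier-class models resident simultaneously, e.g. DeepSeek V3 fp8 + Kimi-K2 Q4 + Mistral Large 3 Q3 + Qwen3-Coder-480B Q5, with partitioned VRAM and per-tenant SLOs (at the sizes listed above this set totals ~1 957 GB, so anything beyond the 768 GB pool has to be held outside VRAM or run at smaller quants)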
Concurrent fp8-native Blackwell inference (DeepSeek V3 / R1 family, Hunyuan fp8) + quantized serving on separate PCIe domains
Research A/B across 4-5 frontier open-weight models at research-grade quants
Agentic platform with a 400B+ primary + multiple 30-70B specialists resident
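
When planning a tenant mix, a quick sum of weight footprints shows how much of it fits in the VRAM pool. The sketch below uses the lower-end sizes quoted earlier on this page; the `placement` helper is hypothetical, and it counts weights only, with no KV cache or runtime overhead.

```python
VRAM_POOL_GB = 768
SYSTEM_RAM_GB = 1536      # 24 x 64 GB

# Weight footprints in GB, as quoted in the model lists (lower end of each range).
footprints_gb = {
    "DeepSeek V3 fp8": 670,
    "Kimi-K2 Q4": 630,
    "Mistral Large 3 Q3": 317,
    "Qwen3-Coder-480B Q5": 340,
}


def placement(models, vram_gb=VRAM_POOL_GB):
    """Return (total GB, GB that fit in VRAM, GB that overflow VRAM)."""
    total = sum(footprints_gb[m] for m in models)
    in_vram = min(total, vram_gb)
    return total, in_vram, total - in_vram


for mix in (["DeepSeek V3 fp8"], list(footprints_gb)):
    total, in_vram, overflow = placement(mix)
    fits_host = overflow <= SYSTEM_RAM_GB
    print(f"{len(mix)} model(s): {total} GB total, {in_vram} GB in VRAM, "
          f"{overflow} GB overflow (fits in system RAM: {fits_host})")
```

Any overflow has to be served from host memory over PCIe or avoided with smaller quants, so the per-tenant throughput figures quoted for fully on-card models should not be assumed for an oversubscribed mix.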

Target workloads

Multi-tenant frontier open-weight production — multiple frontier models resident concurrently with per-tenant isolation
Sovereign frontier AI deployment — on-prem DeepSeek V3 fp8 / Kimi-K2 / Mistral Large 3 access, EU data residency
Frontier research lab with A/B evaluation across 4+ frontier open-weight models at research-grade quants
Enterprise agentic platform where 400B+ MoE drives tools + multiple specialist models
Air-gapped regulated-industry inference at frontier scale with ECC + PCIe Gen5

Published performance references

External references | Not measured on Kentino hardware

Benchmark	Result
RTX Pro 6000 per-card INT8 TOPS	2 000 TOPS
vLLM — DeepSeek V3 fp8 on 8x RTX Pro 6000 (single)	~30-50 tok/s
vLLM — DeepSeek V3 fp8 on 8x RTX Pro 6000 (batch-32)	300-500 tok/s aggregate
Kimi-K2 Q4 serving on 8x RTX Pro 6000 (single)	~15-25 tok/s
FLUX.1 [dev] fp8 on single RTX Pro 6000	~15-20 s per 1024x1024 image

Exact figures confirmed at PoC stage. Kentino will publish first-party numbers after initial customer build.
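
For capacity planning, the single-stream and batch-32 reference rows can be turned into a per-user rate. The sketch below pairs the low ends and the high ends of the two published ranges, which is an assumption; the references do not say the ranges correspond that way.

```python
BATCH = 32
single_tok_s = (30, 50)        # DeepSeek V3 fp8, single stream (low, high)
batch_agg_tok_s = (300, 500)   # DeepSeek V3 fp8, batch-32 aggregate (low, high)

# Throughput each of the 32 concurrent streams would see.
per_stream_tok_s = tuple(agg / BATCH for agg in batch_agg_tok_s)
# Aggregate gain from batching relative to one stream.
batching_gain = tuple(agg / one for agg, one in zip(batch_agg_tok_s, single_tok_s))

print(f"Per-stream rate at batch {BATCH}: "
      f"{per_stream_tok_s[0]:.1f}-{per_stream_tok_s[1]:.1f} tok/s")
print(f"Aggregate gain vs single stream: "
      f"{batching_gain[0]:.0f}x-{batching_gain[1]:.0f}x")
```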

Not ideal for

Budget-conscious deployments — flagship SKU at flagship price
Training from scratch on frontier-class models — no NVLink, PCIe P2P only (for training at this scale H100/H200 SXM or GB200 NVLink fabric is the right tool)
Plug-and-play deployment — frontier multi-tenant MoE serving requires a skilled MLOps team

Warranty and lead time

2 years parts warranty
1 year labor warranty
10-28 days lead time

Build includes assembly, BIOS config, driver install, burn-in, memtest, functional verification, NUMA tuning, and LLM environment setup (vLLM / SGLang / llama.cpp / CUDA 13 stack with fp8 Blackwell kernels). Lead time depends on component availability, confirmed at order.

Recommended add-ons

NVIDIA ConnectX-5 MCX555A-ECAT or ConnectX-7 Gen5 100 GbE NIC for multi-node scale-out
Mellanox ConnectX-6 25 GbE SFP28 for datacenter fabric
Second 4 TB NVMe for dataset / model library (frontier checkpoints are large: a 1T-parameter model such as Kimi-K2 is ~1 TB at 8-bit and ~2 TB at bf16)
Full 24U rack cabinet with front perforated door and managed PDU
Online UPS 10 kVA (graceful shutdown on power event)