Kentino s.r.o.
K-AI 768 TurinDual RTXPro6000MQ 16000TOPS — 8× RTX Pro 6000 Blackwell Max-Q AI Frontier Server (Dual Turin)
K-AI 768 TurinDual RTXPro6000MQ 16000TOPS
768 GB ECC VRAM Frontier Flagship
8x RTX Pro 6000 Max-Q | Dual EPYC Turin | 16 000 TOPS INT8
CPU pricing finalized at order — Turin 9005-series market moves weekly in Q2 2026.
Published external references. Not measured on Kentino hardware.
Top of the Kentino AI server lineup: a 7U rack-mount, frontier-tier inference platform with eight NVIDIA RTX Pro 6000 Blackwell Max-Q turbofan cards pooled to 768 GB ECC VRAM, two AMD EPYC Turin 9005-series CPUs (Zen5c, SP5), 1.5 TB DDR5-4800 ECC (all 24 channels populated), 4 TB NVMe boot, and 5x 1200 W server PSUs. The GPU path is PCIe Gen5 end-to-end. On-card capacity covers DeepSeek V3 at native fp8 (~670 GB), Kimi-K2 at Q4-Q5, and multi-tenant serving with 4 frontier-class models resident simultaneously.
Hardware
| Component | Detail |
|---|---|
| GPUs | 8x NVIDIA RTX Pro 6000 Blackwell Max-Q 96 GB ECC (turbofan, 600 W TDP spec, PCIe 5.0 x16, 2000 INT8 TOPS/card, fp8 native) |
| VRAM pool | 768 GB total across 8 cards (no NVLink — P2P over PCIe Gen5 at ~55-60 GB/s within socket, cross-socket via CPU interconnect) |
| CPU | 2x AMD EPYC Turin 9005-series (Zen5c, SP5, PCIe 5.0) — quote-pending, exact SKU confirmed at order |
| Motherboard | ASRock Rack TURIN2D24XGM/500W (dual SP5 Turin, PCIe 5.0, 24x DDR5, 2x 10 GbE, IPMI) |
| System RAM | 1.5 TB DDR5-4800 ECC RDIMM (24x 64 GB — all 24 channels populated, ~920 GB/s aggregate) |
| Boot / storage | 4 TB NVMe M.2 (PCIe 4.0 x4) — sized for frontier checkpoints |
| Power supply | 5x 1200 W server PSU set (6 kW total) |
| Chassis | 7U 8-GPU rack-mount, 10 PCIe slot capacity, active Gen5 risers |
| Cooling | 2x SP5 Turin tower coolers + 8x 120 mm Martech chassis fans. Per-GPU turbofan blowers self-contained. |
| Network | Onboard dual 10 GbE (Intel X550) |
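The headline totals follow directly from the per-component figures in the table. The sketch below recomputes them; the constant and function names are illustrative, and the memory bandwidth is a theoretical peak that assumes one DIMM per channel and a 64-bit data path per channel.

```python
# Figures copied from the hardware table above; names are illustrative only.
GPU_COUNT = 8
VRAM_PER_GPU_GB = 96
INT8_TOPS_PER_GPU = 2000
DIMM_COUNT = 24            # one DIMM per channel, 12 channels per socket
DIMM_SIZE_GB = 64
DDR5_MT_PER_S = 4800       # DDR5-4800: mega-transfers per second
BYTES_PER_TRANSFER = 8     # assumption: 64-bit data path per channel, ECC bits excluded


def platform_totals():
    """Return the aggregate figures implied by the per-component specs."""
    return {
        "vram_gb": GPU_COUNT * VRAM_PER_GPU_GB,
        "int8_tops": GPU_COUNT * INT8_TOPS_PER_GPU,
        "ram_gb": DIMM_COUNT * DIMM_SIZE_GB,
        # MB/s summed over all channels, converted to GB/s (theoretical peak)
        "ram_bw_gbs": DIMM_COUNT * DDR5_MT_PER_S * BYTES_PER_TRANSFER / 1000,
    }


if __name__ == "__main__":
    for name, value in platform_totals().items():
        print(f"{name}: {value}")
```

The 1 536 GB result is the 1.5 TB quoted for system RAM, and the bandwidth figure rounds to the ~920 GB/s aggregate in the table.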
Power envelope
- GPU draw (spec): 8 x 600 W = 4 800 W
- CPU draw: 2 x 360 W = 720 W (Turin mid-tier estimate)
- System total at spec full load: ~5 720 W (GPU + CPU comes to 5 520 W; the remaining ~200 W is the allowance for the rest of the system)
- PSU total: 6 000 W — ~4.7% raw headroom at spec
- Real-world: Max-Q sustains 520-550 W in inference, which brings the system total to about 5 080-5 320 W and lifts sustained headroom to roughly 11-15%
- Firmware power-cap at 520 W available for guaranteed headroom
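The power envelope above can be recomputed for any per-GPU draw. This is a minimal sketch using only the figures listed; `REST_OF_SYSTEM_W` is an assumption derived as the gap between the quoted system total and the GPU plus CPU draw, and the function names are illustrative.

```python
# Figures from the power envelope list; names are illustrative only.
PSU_TOTAL_W = 5 * 1200          # 6 000 W
GPU_COUNT = 8
GPU_SPEC_W = 600
CPU_TOTAL_W = 2 * 360           # 720 W, Turin mid-tier estimate
SPEC_SYSTEM_TOTAL_W = 5720      # quoted system total at spec full load

# Assumption: whatever the quoted total adds beyond GPU + CPU is the rest of the system.
REST_OF_SYSTEM_W = SPEC_SYSTEM_TOTAL_W - GPU_COUNT * GPU_SPEC_W - CPU_TOTAL_W


def system_draw_w(per_gpu_w):
    """Total system draw in watts for a given per-GPU draw."""
    return GPU_COUNT * per_gpu_w + CPU_TOTAL_W + REST_OF_SYSTEM_W


def headroom_pct(per_gpu_w):
    """PSU headroom as a percentage of total PSU capacity."""
    return 100 * (PSU_TOTAL_W - system_draw_w(per_gpu_w)) / PSU_TOTAL_W


if __name__ == "__main__":
    for watts in (600, 550, 520):
        print(f"{watts} W/GPU: {system_draw_w(watts)} W total, "
              f"{headroom_pct(watts):.1f}% headroom")
```

Calling `headroom_pct(600)` reproduces the ~4.7% raw headroom at spec; passing 520 models the firmware power cap.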
Lane topology
Dual Turin provides 2x 128 PCIe Gen5 lanes. The TURIN2D24XGM/500W routes 8 GPU slots direct-attached to the CPUs at Gen5 x16 via active risers, 4 slots per CPU root. There is no PCIe switch in the GPU path, which gives a clean dual-root topology. There is also no NVLink: P2P runs at ~55-60 GB/s per direction within a socket and crosses the CPU interconnect between sockets, so NUMA tuning is required for optimal cross-socket peer-to-peer.
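To see why NUMA placement matters, it helps to count which GPU pairs share a CPU root. The sketch below assumes GPUs 0-3 sit under CPU 0 and GPUs 4-7 under CPU 1; that index-to-socket mapping is hypothetical, and on a real system `nvidia-smi topo -m` reports the actual layout.

```python
from itertools import combinations

GPU_COUNT = 8
GPUS_PER_SOCKET = 4   # 4 slots per CPU root, per the lane topology above


def socket_of(gpu_index):
    """Assumed mapping: GPUs 0-3 on CPU 0, GPUs 4-7 on CPU 1 (hypothetical)."""
    return gpu_index // GPUS_PER_SOCKET


def classify_pairs():
    """Split all GPU pairs into same-socket and cross-socket groups."""
    same_socket, cross_socket = [], []
    for a, b in combinations(range(GPU_COUNT), 2):
        if socket_of(a) == socket_of(b):
            same_socket.append((a, b))
        else:
            cross_socket.append((a, b))
    return same_socket, cross_socket


if __name__ == "__main__":
    same, cross = classify_pairs()
    print(f"same-socket pairs: {len(same)}, cross-socket pairs: {len(cross)}")
```

Under this mapping more than half of all GPU pairs cross the CPU interconnect, so a tensor-parallel group of 4 placed on one socket keeps its traffic on the faster path.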
What you can run
With 768 GB of pooled ECC VRAM, the top of the Kentino envelope, this server runs DeepSeek V3 at native fp8 (~670 GB) on-card and Kimi-K2 at Q4-Q5 (~630 GB) comfortably. The defining use case is 4 frontier-class models resident simultaneously for multi-tenant production serving.
LLMs — text / reasoning / coding
Chinese frontier at production quants
- Kimi-K2 (Base / Instruct / Thinking) at Q4_K_M / Q5_K_M (~630 GB) comfortable (~15-25 tok/s single, published reference) — flagship Chinese frontier on a single box at production quants
- DeepSeek V3 / R1 / V3.1 / V3.2 at fp8 native (~670 GB) on-card (~30-50 tok/s single, published reference) — Blackwell fp8 tensor cores run this natively at speed
- DeepSeek V3 at Q4_K_M (~404 GB) with multiple concurrent large-batch serving instances
- GLM-5 / GLM-5.1 (~745B/44B) at Q3-Q4 (~420-560 GB) comfortable on-card
- Intern-S1-Pro (1T/22B active, SAGE) at Q3-Q4 (~440-580 GB) comfortable
- Qwen3-Coder-480B-A35B at Q5-Q6 (~340-400 GB) with 1M ctx
- Qwen3-235B-A22B at bf16 (~470 GB) with generous KV for long context
- ERNIE-4.5-424B-A47B at Q6 (~360 GB); Hunyuan-Large at fp8 (~390 GB)
- MiniMax-Text-01 / M1 at Q5-Q6 (~325-390 GB)
Western frontier at production quants
- Mistral Large 3 (675B/41B MoE, Apache 2.0) at Q3-Q4 (~317-404 GB) comfortable (~20-30 tok/s single, published reference)
- Llama 4 Maverick (400B/17B, 128 experts) at Q5-Q6 (~290-350 GB)
- Llama-3.1-Nemotron Ultra 253B at bf16 (~506 GB) on-card
- Snowflake Arctic at Q5-Q6 (~350-420 GB); Grok-1 at Q5-Q6 (~225-270 GB)
- DBRX Instruct 132B/36B at bf16 (~264 GB) multi-instance
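The bf16 and fp8 footprints in the lists above follow from parameter count times bytes per weight. The sketch below is a weights-only estimate: it ignores KV cache, activations, and runtime overhead, and it does not model GGUF quants such as Q4_K_M, whose effective bits per weight vary by tensor. The 671B parameter count for DeepSeek V3 is general knowledge rather than a figure from this page.

```python
def weights_gb(params_billion, bits_per_weight):
    """Weights-only footprint in decimal GB: 1e9 params * bits / 8 bytes."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    examples = [
        ("Qwen3-235B-A22B bf16", 235, 16),
        ("Llama-3.1-Nemotron Ultra 253B bf16", 253, 16),
        ("DBRX Instruct 132B bf16", 132, 16),
        ("DeepSeek V3 fp8 (671B assumed)", 671, 8),
    ]
    for name, params_b, bits in examples:
        print(f"{name}: ~{weights_gb(params_b, bits):.0f} GB")
```

The remaining VRAM after weights is what is available for KV cache, so long-context serving needs more margin than the weights figure alone suggests.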
Vision-Language Models
Qwen3-VL-235B-A22B flagship VLM with long context; InternVL3.5-241B-A28B at bf16 (~482 GB); GLM-4.5V / 4.6V 106B bf16 multi-instance; Llama 3.2 90B Vision bf16 multi-instance; Pixtral Large 124B bf16; Molmo 72B bf16 multi-instance.
Image generation
HunyuanImage-3.0 Instruct concurrent instances; FLUX.1 multi-instance (~15-20 s per 1024x1024 image, published reference); SD 3.5 Large; SDXL; AuraFlow; OmniGen; HunyuanImage-2.1; Kolors 2.0. The full Chinese + Western image stack stays resident concurrently.
Video generation
Wan 2.2 T2V-A14B / I2V-A14B — many concurrent streams; HunyuanVideo 13B bf16 multiple concurrent streams; Open-Sora 2.0 (11B) multi-instance; Mochi-1 (10B) multi-instance; NVIDIA Cosmos Predict 2 up to 14B.
Audio / Speech / TTS
Full stack resident at batch: Whisper v3 large, Parakeet-TDT, Canary 1B, Moshi 7B realtime, Qwen3-Omni, Step-Audio R1, CosyVoice 3.0, Kokoro, Stable Audio Open.
Multi-model / multi-tenant serving (the defining use case)
- Multi-tenant frontier production: 4 frontier-class models resident simultaneously — e.g. DeepSeek V3 fp8 + Kimi-K2 Q4 + Mistral Large 3 Q3 + Qwen3-Coder-480B Q5 — with partitioned VRAM and per-tenant SLOs
- Concurrent fp8-native Blackwell inference (DeepSeek V3 / R1 family, Hunyuan fp8) + quantized serving on separate PCIe domains
- Research A/B across 4-5 frontier open-weight models at research-grade quants
- Agentic platform with a 400B+ primary + multiple 30-70B specialists resident
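A tenant mix can be sanity-checked against the 768 GB pool before deployment. The sketch below uses the approximate footprints quoted on this page (lower bound where a range is given) and counts weights only; the `reserve_gb` parameter for KV cache and the helper name `fits` are illustrative assumptions.

```python
POOL_GB = 768

# Approximate weights-only footprints quoted on this page (lower bound of ranges).
FOOTPRINT_GB = {
    "DeepSeek V3 fp8": 670,
    "Kimi-K2 Q4": 630,
    "Mistral Large 3 Q3": 317,
    "Qwen3-Coder-480B Q5": 340,
    "DBRX Instruct bf16": 264,
}


def fits(models, reserve_gb=0):
    """Return (total_gb, ok) for a list of model names plus an optional KV reserve."""
    total = sum(FOOTPRINT_GB[name] for name in models)
    return total, total + reserve_gb <= POOL_GB


if __name__ == "__main__":
    mixes = [
        ["DeepSeek V3 fp8"],
        ["Mistral Large 3 Q3", "Qwen3-Coder-480B Q5"],
        ["DeepSeek V3 fp8", "Kimi-K2 Q4", "Mistral Large 3 Q3", "Qwen3-Coder-480B Q5"],
    ]
    for mix in mixes:
        total, ok = fits(mix)
        print(f"{' + '.join(mix)}: {total} GB -> {'fits' if ok else 'exceeds pool'}")
```

On these figures the four-model example sums to about 1 957 GB and reports as exceeding the pool, so that mix would need lower quants or fewer tenants to be resident on-card at once; the two-model mix at 657 GB fits with room for KV cache.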
Target workloads
- Multi-tenant frontier open-weight production — multiple frontier models resident concurrently with per-tenant isolation
- Sovereign frontier AI deployment — on-prem DeepSeek V3 fp8 / Kimi-K2 / Mistral Large 3 access, EU data residency
- Frontier research lab with A/B evaluation across 4+ frontier open-weight models at research-grade quants
- Enterprise agentic platform where 400B+ MoE drives tools + multiple specialist models
- Air-gapped regulated-industry inference at frontier scale with ECC + PCIe Gen5
Published performance references
External references | Not measured on Kentino hardware
| Benchmark | Result |
|---|---|
| RTX Pro 6000 per-card INT8 TOPS | 2 000 TOPS |
| vLLM — DeepSeek V3 fp8 on 8x RTX Pro 6000 (single) | ~30-50 tok/s |
| vLLM — DeepSeek V3 fp8 on 8x RTX Pro 6000 (batch-32) | 300-500 tok/s aggregate |
| Kimi-K2 Q4 serving on 8x RTX Pro 6000 (single) | ~15-25 tok/s |
| FLUX.1 [dev] fp8 on single RTX Pro 6000 | ~15-20 s per 1024x1024 image |
Exact figures confirmed at PoC stage. Kentino will publish first-party numbers after initial customer build.
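The batch-32 row is an aggregate across all concurrent requests, so the rate each user sees is lower than the single-stream figure. A minimal sketch of that conversion, using only the published reference ranges in the table (the function name is illustrative):

```python
def per_stream_tok_s(aggregate_tok_s, batch_size):
    """Average tokens per second seen by each request in a batch."""
    return aggregate_tok_s / batch_size


if __name__ == "__main__":
    batch = 32
    for aggregate in (300, 500):
        print(f"{aggregate} tok/s aggregate at batch {batch}: "
              f"~{per_stream_tok_s(aggregate, batch):.1f} tok/s per stream")
```

Compared with the ~30-50 tok/s single-stream reference, batching at 32 trades per-request speed for roughly ten times the total throughput.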
Not ideal for
- Budget-conscious deployments — flagship SKU at flagship price
- Training from scratch on frontier-class models — no NVLink, PCIe P2P only (for training at this scale H100/H200 SXM or GB200 NVLink fabric is the right tool)
- Plug-and-play deployment — frontier multi-tenant MoE serving requires a skilled MLOps team
Warranty and lead time
Build includes assembly, BIOS config, driver install, burn-in, memtest, functional verification, NUMA tuning, and LLM environment setup (vLLM / SGLang / llama.cpp / CUDA 13 stack with fp8 Blackwell kernels). Lead time depends on component availability, confirmed at order.
Recommended add-ons
- NVIDIA ConnectX-5 MCX555A-ECAT or ConnectX-7 Gen5 100 GbE NIC for multi-node scale-out
- Mellanox ConnectX-6 25 GbE SFP28 for datacenter fabric
- Second 4 TB NVMe for dataset / model library (frontier checkpoints are large: Kimi-K2 alone is ~1 TB at fp8 and about twice that at bf16)
- Full 24U rack cabinet with front perforated door and managed PDU
- Online UPS 10 kVA (graceful shutdown on power event)
