Kentino s.r.o.

Kentino AI 192 Turin2U RTXPro6000 4000TOPS — 2× RTX Pro 6000 Blackwell Server Edition — 2U Turin SP5

Name: Kentino AI 192 Turin2U RTXPro6000 4000TOPS — 2× RTX Pro 6000 Blackwell Server Edition — 2U Turin SP5
Brand: Kentino s.r.o.
Price: 56600.00 EUR
Availability: InStock

€56.600,00 EUR

Έκπτωση Εξαντλήθηκε

Τα έξοδα αποστολής υπολογίζονται κατά την ολοκλήρωση της αγοράς.

Ποσότητα

Kentino AI 192 Turin2U RTXPro6000 4000TOPS

192 GB ECC Blackwell Flagship Pair
2x RTX Pro 6000 Server Edition | EPYC Turin SP5 | 4 000 TOPS INT8

4 000

INT8 TOPS

192 GB

ECC VRAM

Blackwell

fp8 native

2-card

minimal TP

Two passive RTX Pro 6000 Blackwell Server Edition cards -- 96 GB ECC each. Less tensor-parallel overhead than 4- or 8-card builds. Datacenter flagship pair on a Gen5/DDR5 2U platform with genuine 1+1 redundant power.

A 2U rack-mount inference server with two passive RTX Pro 6000 Blackwell Server Edition cards (96 GB ECC GDDR7 per card), one AMD EPYC 9335 Turin CPU (32C/64T, 3.0/4.4 GHz), 512 GB DDR5-4800 ECC, 5.76 TB datacenter Gen5 NVMe, and a 1+1 redundant 2.7 kW 80+ Platinum CRPS power supply. Starting from €56 600 ex VAT. For 70B dense bf16 and mid-size MoE, fewer big cards beat more small cards -- two-card tensor parallelism has minimal communication overhead, and each 96 GB card carries a complete copy of most models.

The same 192 GB Blackwell pair as our 4U Rome build, in a 2U rack-dense ASRock chassis with full Gen5 host-side, DDR5-4800 memory, and a genuine 1+1 redundant 2.7 kW Platinum CRPS power supply. Pick this build when rack density matters, when your grant or procurement spec mandates a modern PCIe 5.0 / DDR5 platform, or when redundant power is a requirement rather than an upsell.

Hardware

Component	Detail
GPUs	2x NVIDIA RTX Pro 6000 Blackwell Server Edition 96 GB ECC GDDR7 (passive, 600 W, PCIe 5.0 x16, dual-slot)
VRAM pool	192 GB ECC (96 GB x 2) -- each card holds a 70B bf16 model standalone
CPU	AMD EPYC 9335 Turin (32C/64T, 3.0/4.4 GHz, 210 W, SP5, 128x PCIe 5.0 lanes, Zen5c, 256 MB L3)
Motherboard	ASRock Rack 2U4G-GENOA/M3 (SP5, 4x PCIe 5.0 x16 dual-slot GPU, 8x DDR5 1DPC, OCP 3.0, IPMI AST2600)
System RAM	512 GB DDR5-4800 ECC RDIMM (8x 64 GB, 1DPC fully populated -- max bandwidth configuration)
Boot / storage	Kioxia CD8-P 3.84 TB Gen5 U.3 (hot-tier, 1 DWPD, ~12 GB/s read) + Kioxia CD8-P 1.92 TB Gen5 U.3 (boot OS tier) -- 5.76 TB total datacenter Gen5 NVMe
Power supply	1+1 redundant 2.7 kW 80+ Platinum CRPS (2x 1350 W at 230 V) -- genuine N+1 redundancy; one PSU sustains full inference load
Chassis	2U rack-mount with front-to-back directed airflow (80 mm high-static-pressure fans). 24/7-capable.
Cooling	SP5 active CPU heatsink + 3x 80x38 mm front intake + 1x 80x80 mm rear exhaust (designed for 4x passive GPU thermal load; 2-card layout provides ample thermal headroom)
Network	Intel X710-T2L PCIe dual 10GBASE-T + OCP 3.0 slot available for 25/100 GbE upgrade

Power envelope

GPU draw: 2x 600 W = 1 200 W
System total at full load: ~1 510 W
PSU config: 1+1 redundant CRPS, 2x 1350 W at 230 V (2 700 W total)
Headroom: 44.1 % under typical inference load
Genuine N+1 redundancy -- one PSU sustains full inference load; no single-PSU failure risk

Lane topology

PCIe Gen5 x16 end-to-end -- both host and card native Gen5. Direct root-complex connection, no PCIe switch. One PCIe 5.0 x16 single-slot + one PCIe 5.0 x8 slot remain available (NIC occupies the x8 slot). No NVLink -- inter-GPU peer-to-peer via PCIe. Gen5 bandwidth eliminates the Gen4 host-cap present in the 4U Rome sibling.

What you can run

With 192 GB ECC VRAM on just two Blackwell cards with native fp8/fp4, this is the cleanest path to dense 70B at bf16 and mid-size MoE. Two independent 70B streams -- one per card -- or 200B MoE across both with minimal 2-way TP overhead.

LLMs -- text / reasoning / coding

Chinese frontier

Qwen3 / Qwen3.5 (Alibaba): Qwen3-235B-A22B Q4 (~132 GB) comfortable with long ctx (~15-25 tok/s single-stream across 2 cards); Qwen3-Coder-480B-A35B Q2 (~160 GB); Qwen3.5-122B-A10B fp8 (~75 GB); Qwen3-32B dense bf16 with huge KV; QwQ-32B bf16
DeepSeek: DeepSeek-V3/R1 Q2 (~215 GB with small RAM spill) -- Blackwell runs fp8 natively; DeepSeek-R2 32B bf16 two concurrent streams (one per card)
GLM / Z.ai: GLM-4.5 / 4.6 / 4.7 Q4 (~177 GB) -- hero config at this tier; GLM-4.5-Air fp8 or bf16 with huge KV
Tencent Hunyuan: Hunyuan-Large Q3 (~160 GB) -- 389B MoE with 256k ctx; Hunyuan-A13B fp8 native (~80 GB) with huge KV
Others: Baidu ERNIE-4.5-424B Q3 (~180 GB); InternVL3.5-241B-A28B Q4 (~135 GB); MiniMax-M1 Q3 (~180 GB)

Western frontier

Meta Llama: Llama 3.3 70B bf16 on one card -- two independent concurrent 70B streams (~20-30 tok/s per stream); Llama 4 Scout bf16 (~218 GB, tight); Llama 4 Maverick Q3 (~188 GB)
Mistral: Mistral Large 2 / Pixtral Large / Devstral 2 123B Q6 (~88 GB) single-card or bf16 across both; Mistral Small 3 multi-stream
OpenAI (open weights): gpt-oss-120b MXFP4 native (80 GB) -- fits on ONE card, two independent concurrent streams
NVIDIA Nemotron: Llama-3.1-Nemotron Ultra 253B Q4 (~147 GB); Super 49B bf16 on single card
Others: Cohere Command R+ 104B Q6 (~85 GB) on one card; Google Gemma 3 27B bf16 multiple concurrent streams

Vision-Language Models

InternVL3.5-241B-A28B Q4 (~135 GB); Qwen3-VL-235B-A22B Q4; Qwen3-VL-32B bf16 single-card; Pixtral Large 124B bf16 or Q6; Llama 3.2 90B Vision bf16 (~180 GB); Molmo 72B bf16 (~144 GB); GLM-4.6V 106B fp8; Gemma 3 27B multimodal x 2-3 concurrent streams.

Image generation

FLUX.1 [dev] bf16 multiple concurrent streams; FLUX.1 Kontext [dev]; FLUX Tools; SD 3.5 Large bf16 concurrent; HunyuanImage-2.1 bf16 (~34 GB) x 2-4 concurrent; HunyuanImage-3.0 base (80B MoE, 13B active) bf16 -- fits on one card; HunyuanDiT; Kolors / Kolors 2.0; AuraFlow; OmniGen v1; PixArt-Sigma.

Video generation

Wan 2.2 MoE dual-expert bf16 full context -- fits on one card, two concurrent generation streams; Wan 2.2 TI2V-5B; HunyuanVideo 13B bf16 both experts; HunyuanVideo 1.5; CogVideoX-5B bf16; Open-Sora 2.0 11B bf16; Mochi-1 bf16 (~42 GB); LTX-Video; Pyramid Flow; SVD / SV3D / SV4D; NVIDIA Cosmos Predict 2.

Audio / Speech / TTS

ASR: Whisper v3 large / turbo (~50x realtime); Parakeet-TDT; Canary 1B; Qwen3-ASR; SenseVoice
TTS: CosyVoice 2/3; Kokoro 82M; XTTS v2; Stable Audio Open; Step-Audio-EditX
Realtime / S2S: Kyutai Moshi 7B; Step-Audio 2 mini/R1; Qwen2.5-Omni-7B
Music / SFX: MusicGen / AudioGen / Bark; SeamlessM4T v2

Multi-model / multi-tenant serving

Two independent 70B streams -- one per card, simplest form of tenant isolation
Dense 70B bf16 + supporting stack -- LLM on card 1, image/video/audio on card 2
200B MoE across both cards -- minimal tensor-parallel overhead (2-way split)
fp8-native frontier -- DeepSeek V3 family, Hunyuan-Large fp8 with Blackwell native paths

Target workloads

Dense 70B bf16 inference -- two cards tensor-parallel with minimal overhead, or one model per card for streaming
100-150B MoE at Q4-Q6 (GLM-4.5-Air, Qwen3.5-122B-A10B, Hunyuan-A13B, Llama 4 Scout)
FP8-native frontier inference (DeepSeek V3 family, Hunyuan, Llama 4) -- Blackwell runs fp8 natively
Scientific computation requiring datacenter-grade Gen5 NVMe throughput and ECC memory
Image + video generation studio at bf16 (Wan 2.2 T2V-A14B, HunyuanVideo 13B, FLUX.1 [dev])
Rack-density-constrained deployments -- 2U form factor vs the 4U Rome equivalent at same VRAM
Procurement specs mandating PCIe 5.0 / DDR5 platform or redundant PSU

Measured performance

Published references | NVIDIA RTX Pro 6000 Blackwell Server Edition datasheet + community benchmarks

Benchmark	Result
Per-card INT8 TOPS (NVIDIA datasheet)	2 000 TOPS
Aggregate INT8 TOPS (2 cards)	4 000 TOPS
Memory bandwidth per card	~1 800 GB/s, 96 GB ECC GDDR7
Llama 3.3 70B bf16 per-card (community)	15-25 tok/s single-stream, 60-90 tok/s batch -- expected improvement from Gen5 host-side memory path in streaming batch workloads vs Gen4 host
Gen5 host-side advantage (single-card same silicon)	PCIe 5.0 x16 end-to-end reduces host-device transfer latency for streaming batch workloads; on-card compute-bound tasks see identical throughput to Gen4-hosted builds
Dual-card tensor-parallel 70B (community)	~30-45 tok/s single-stream expected
Blackwell fp8 native	DeepSeek-V3 fp8, Hunyuan-A13B fp8 run without bf16 upcast

Published external references, not measured on Kentino hardware. Kentino will publish first-party numbers after the first customer build.

Not ideal for

Very high concurrency multi-tenant serving -- 4x L40 or 6x L4 distributes better across more cards
Heavy KV cache at very long context -- step up to Kentino AI 576 Genoa RTXPro6000 12000TOPS
Training -- Kentino does not sell H-class NVLink fabrics
Budget inference at this VRAM pool -- the 4U Rome Kentino AI 192 RTXPro6000 4000TOPS build is lower-cost if Gen4 host-side is acceptable and PSU redundancy is not required

Warranty and lead time

2 years

parts warranty

1 year

labor warranty

14-21 days

lead time

NVIDIA OEM 3-year warranty on RTX Pro 6000 Server Edition + 36-month chassis warranty + Kentino integration warranty. Build includes assembly, BIOS/firmware configuration, IPMI setup, driver install, burn-in testing, and functional verification. Lead time of 14-21 business days reflects reseller order for Turin-class components; confirmed at order placement.

Recommended add-ons

Expand to 4-card configuration -- chassis has 4 GPU bays natively (current build uses 2 of 4), upgrade path to Kentino AI 384 Turin2U RTXPro6000 8000TOPS
Add 25 GbE or 100 GbE via OCP 3.0 slot (Mellanox ConnectX-5/6 OCP variant)
Additional Kioxia CD8-P NVMe in the 2 remaining U.2 bays for RAID or scratch storage
Upgrade storage tier to Samsung PM1743 or Kioxia CM7-V for higher endurance (3 DWPD)
24U rack cabinet + online UPS 5 kVA