Kentino s.r.o.

K-AI 192 Rome RTXPro6000 4000TOPS — 2× RTX Pro 6000 Blackwell Server Edition — EPYC Milan

Name: K-AI 192 Rome RTXPro6000 4000TOPS — 2× RTX Pro 6000 Blackwell Server Edition — EPYC Milan
Brand: Kentino s.r.o.
Price: 25162.00 EUR
Availability: InStock

Taxes included. Shipping costs are calculated at checkout.

K-AI 192 Rome RTXPro6000 4000TOPS

192 GB ECC Blackwell Flagship Pair
2x RTX Pro 6000 Server Edition | EPYC Milan | 4 000 TOPS INT8

4 000 INT8 TOPS | 192 GB ECC VRAM | Blackwell, fp8 native | 2-card, minimal TP

Two passive RTX Pro 6000 Blackwell Server Edition cards — 96 GB ECC each. Less tensor-parallel overhead than 4- or 8-card builds. Datacenter flagship pair.

A 4U rack-mount inference server with two passive RTX Pro 6000 Blackwell Server Edition cards (96 GB ECC GDDR7 per card), one AMD EPYC 7643 Milan CPU (48C/96T), 256 GB DDR4 ECC, 2 TB NVMe boot, and a single 2 kW ATX PSU. For 70B dense bf16 and mid-size MoE, fewer big cards beat more small cards: two-card tensor parallelism keeps communication overhead low, and each 96 GB card can hold a complete copy of any model that fits in 96 GB (for example, a 70B model at fp8 or a 32B model at bf16).

Hardware

Component	Detail
GPUs	2x NVIDIA RTX Pro 6000 Blackwell Server Edition 96 GB ECC GDDR7 (passive, 600 W, PCIe 5.0 x16, dual-slot)
VRAM pool	192 GB ECC (96 GB x 2); each card holds a 70B model standalone at fp8 (~70 GB), while 70B bf16 (~140 GB) spans both cards
CPU	AMD EPYC 7643 Milan (48C/96T, 225 W, 128x PCIe 4.0 lanes)
Motherboard	ASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)
System RAM	256 GB DDR4-2666 ECC RDIMM (4x 64 GB)
Boot / storage	2 TB NVMe M.2 (PCIe 4.0 x4)
Power supply	1x 2 kW ATX PSU
Chassis	4U rack-mount with front-to-back directed airflow
Cooling	Arctic Freezer 4U-M SP3 tower + 3x 120 mm front intake + 1x 120 mm rear exhaust
Network	Onboard dual 10 GbE (Intel X550)

Power envelope

GPU draw: 2 x 600 W = 1 200 W
System total at full load: ~1 525 W
PSU total: 2 000 W (single 2 kW) — 23.7 % headroom
Single PSU sufficient; optional dual-PSU upgrade for N+1 redundancy
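
The headroom figure above follows from simple arithmetic. The sketch below redoes it in Python using only the wattages quoted on this page; the helper name `headroom_fraction` is an illustrative choice, not part of any Kentino tooling.

```python
# Power budget sketch using the figures quoted in the power envelope list.
GPU_COUNT = 2
GPU_WATTS = 600            # per-card board power from the hardware table
SYSTEM_LOAD_WATTS = 1525   # quoted full-load system total (approximate)
PSU_WATTS = 2000           # single 2 kW ATX PSU


def headroom_fraction(load_watts: float, psu_watts: float) -> float:
    """Return the unused share of PSU capacity at the given load."""
    if psu_watts <= 0:
        raise ValueError("psu_watts must be positive")
    return (psu_watts - load_watts) / psu_watts


gpu_draw = GPU_COUNT * GPU_WATTS
non_gpu_draw = SYSTEM_LOAD_WATTS - gpu_draw   # CPU, RAM, fans, storage, losses
headroom = headroom_fraction(SYSTEM_LOAD_WATTS, PSU_WATTS)

print(f"GPU draw:     {gpu_draw} W")
print(f"Non-GPU draw: {non_gpu_draw} W")
print(f"PSU headroom: {PSU_WATTS - SYSTEM_LOAD_WATTS} W ({headroom:.2%})")
```

With the quoted numbers this leaves 475 W unused, or 23.75 % of the PSU rating, which the page states as 23.7 %. Changing `GPU_COUNT` to 4 shows why a four-card expansion would need more than a single 2 kW supply.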

Lane topology

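PCIe Gen4 x16 per GPU (the cards are Gen5 native; the SP3 platform and ROMED8-2T board cap the link at Gen4). Direct root-complex connection with no PCIe switch. No NVLink; inter-GPU traffic uses PCIe peer-to-peer. Five x16 slots remain open for expansion. The Gen4 versus Gen5 difference is negligible for inference at this VRAM density, since weights stay resident in VRAM after the initial load.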
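
To put the Gen4 versus Gen5 point in numbers, the sketch below computes theoretical one-direction link throughput from the PCIe per-lane signalling rates (16 GT/s for Gen4, 32 GT/s for Gen5, both with 128b/130b encoding). These are peak figures that ignore protocol overhead, so real transfers are somewhat slower; the function name is illustrative.

```python
def pcie_gbytes_per_s(gt_per_s: float, lanes: int) -> float:
    """Theoretical one-direction PCIe throughput in GB/s (128b/130b encoding)."""
    return gt_per_s * lanes * (128 / 130) / 8


GEN4_X16 = pcie_gbytes_per_s(16, 16)      # GPU slot as configured in this build
GEN5_X16 = pcie_gbytes_per_s(32, 16)      # what the card could negotiate on a Gen5 board
NVME_GEN4_X4 = pcie_gbytes_per_s(16, 4)   # boot NVMe link from the hardware table

CARD_VRAM_GB = 96  # worst case: filling one card completely

for name, bandwidth in [
    ("Gen4 x16", GEN4_X16),
    ("Gen5 x16", GEN5_X16),
    ("NVMe Gen4 x4", NVME_GEN4_X4),
]:
    seconds = CARD_VRAM_GB / bandwidth
    print(f"{name:13s} {bandwidth:5.1f} GB/s  {seconds:5.1f} s to move {CARD_VRAM_GB} GB")
```

On these peak numbers the NVMe link, not the GPU slot, bounds cold model loading, and the Gen4 slot costs only a couple of seconds per full-card load compared with Gen5.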

What you can run

With 192 GB ECC VRAM on just two Blackwell cards with native fp8/fp4, this is the cleanest path to dense 70B at bf16 (~140 GB, split across both cards) and mid-size MoE. Alternatively, run two independent 70B streams at fp8, one per card, or a 200B-class MoE across both cards with minimal 2-way TP overhead.
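
A quick way to check whether a model fits on one card or needs both is to multiply parameter count by bytes per parameter. The sketch below does that for the 96 GB and 192 GB limits of this build. The `kv_reserve_gb` allowance for KV cache and activations is a hypothetical placeholder, and real quantized files (Q2 to Q6) deviate from these idealized sizes because they mix precisions.

```python
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "fp4": 0.5}
CARD_GB = 96    # one RTX Pro 6000 Blackwell Server Edition
POOL_GB = 192   # both cards combined


def weights_gb(params_billion: float, dtype: str) -> float:
    """Idealized weight size in GB (1e9 bytes) for a dense model."""
    return params_billion * BYTES_PER_PARAM[dtype]


def placement(params_billion: float, dtype: str, kv_reserve_gb: float = 8.0) -> str:
    """Classify where a model fits; kv_reserve_gb is an assumed allowance."""
    need = weights_gb(params_billion, dtype) + kv_reserve_gb
    if need <= CARD_GB:
        return "one card"
    if need <= POOL_GB:
        return "both cards (2-way TP)"
    return "exceeds VRAM pool"


for params, dtype in [(70, "bf16"), (70, "fp8"), (32, "bf16"), (235, "fp4")]:
    size = weights_gb(params, dtype)
    print(f"{params}B {dtype}: ~{size:.0f} GB weights -> {placement(params, dtype)}")
```

Under these assumptions a 70B model at bf16 (about 140 GB) needs both cards, while the same model at fp8 (about 70 GB) fits on one card and leaves the second free for another stream.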

LLMs — text / reasoning / coding

Chinese frontier

Qwen3 / Qwen3.5 (Alibaba): Qwen3-235B-A22B Q4 (~132 GB) comfortable with long ctx (~15-25 tok/s single-stream across 2 cards); Qwen3-Coder-480B-A35B Q2 (~160 GB); Qwen3.5-122B-A10B fp8 (~75 GB); Qwen3-32B dense bf16 with huge KV; QwQ-32B bf16
DeepSeek: DeepSeek-V3/R1 Q2 (~215 GB with small RAM spill) — Blackwell runs fp8 natively; DeepSeek-R2 32B bf16 two concurrent streams (one per card)
GLM / Z.ai: GLM-4.5 / 4.6 / 4.7 Q4 (~177 GB) — hero config at this tier; GLM-4.5-Air fp8 or bf16 with huge KV
Tencent Hunyuan: Hunyuan-Large Q3 (~160 GB) — 389B MoE with 256k ctx; Hunyuan-A13B fp8 native (~80 GB) with huge KV
Others: Baidu ERNIE-4.5-424B Q3 (~180 GB); InternVL3.5-241B-A28B Q4 (~135 GB); MiniMax-M1 Q3 (~180 GB)

Western frontier

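Meta Llama: Llama 3.3 70B bf16 (~140 GB) tensor-parallel across both cards, or fp8 (~70 GB) on one card for two independent concurrent 70B streams (~20-30 tok/s per stream); Llama 4 Scout bf16 (~218 GB, exceeds the 192 GB pool and needs RAM spill or fp8); Llama 4 Maverick Q3 (~188 GB)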
Mistral: Mistral Large 2 / Pixtral Large / Devstral 2 123B Q6 (~88 GB) single-card or bf16 across both; Mistral Small 3 multi-stream
OpenAI (open weights): gpt-oss-120b MXFP4 native (80 GB) — fits on ONE card, two independent concurrent streams
NVIDIA Nemotron: Llama-3.1-Nemotron Ultra 253B Q4 (~147 GB); Super 49B bf16 on single card
Others: Cohere Command R+ 104B Q6 (~85 GB) on one card; Google Gemma 3 27B bf16 multiple concurrent streams

Vision-Language Models

InternVL3.5-241B-A28B Q4 (~135 GB); Qwen3-VL-235B-A22B Q4; Qwen3-VL-32B bf16 single-card; Pixtral Large 124B bf16 or Q6; Llama 3.2 90B Vision bf16 (~180 GB); Molmo 72B bf16 (~144 GB); GLM-4.6V 106B fp8; Gemma 3 27B multimodal x 2-3 concurrent streams.

Image generation

FLUX.1 [dev] bf16 multiple concurrent streams; FLUX.1 Kontext [dev]; FLUX Tools; SD 3.5 Large bf16 concurrent; HunyuanImage-2.1 bf16 (~34 GB) x 2-4 concurrent; HunyuanImage-3.0 base (80B MoE, 13B active) bf16 — fits on one card; HunyuanDiT; Kolors / Kolors 2.0; AuraFlow; OmniGen v1; PixArt-Sigma.

Video generation

Wan 2.2 MoE dual-expert bf16 full context — fits on one card, two concurrent generation streams; Wan 2.2 TI2V-5B; HunyuanVideo 13B bf16 both experts; HunyuanVideo 1.5; CogVideoX-5B bf16; Open-Sora 2.0 11B bf16; Mochi-1 bf16 (~42 GB); LTX-Video; Pyramid Flow; SVD / SV3D / SV4D; NVIDIA Cosmos Predict 2.

Audio / Speech / TTS

ASR: Whisper v3 large / turbo (~50x realtime); Parakeet-TDT; Canary 1B; Qwen3-ASR; SenseVoice
TTS: CosyVoice 2/3; Kokoro 82M; XTTS v2; Stable Audio Open; Step-Audio-EditX
Realtime / S2S: Kyutai Moshi 7B; Step-Audio 2 mini/R1; Qwen2.5-Omni-7B
Music / SFX: MusicGen / AudioGen / Bark; SeamlessM4T v2

Multi-model / multi-tenant serving

Two independent 70B streams — one per card, simplest form of tenant isolation
Dense 70B fp8 + supporting stack: LLM on card 1, image/video/audio on card 2
200B MoE across both cards — minimal tensor-parallel overhead (2-way split)
fp8-native frontier — DeepSeek V3 family, Hunyuan-Large fp8 with Blackwell native paths
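
One common way to get the one-model-per-card layout is to restrict each serving process to a single GPU through the `CUDA_VISIBLE_DEVICES` environment variable. The sketch below only builds the environments; the helper names are hypothetical, and the serving command itself is left to whichever inference server is used. Note that this limits device visibility per process and is not a security boundary on its own.

```python
import os
from typing import Mapping, Optional, Sequence


def worker_env(gpu_index: int, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Environment for a process that should see exactly one GPU."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def tensor_parallel_env(
    gpu_indices: Sequence[int], base_env: Optional[Mapping[str, str]] = None
) -> dict:
    """Environment for a single process that spans several GPUs."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_indices)
    return env


tenant_a = worker_env(0)               # first 70B stream, card 0 only
tenant_b = worker_env(1)               # second 70B stream, card 1 only
shared = tensor_parallel_env([0, 1])   # one large MoE split across both cards

print(tenant_a["CUDA_VISIBLE_DEVICES"], tenant_b["CUDA_VISIBLE_DEVICES"],
      shared["CUDA_VISIBLE_DEVICES"])
```

Each dictionary can be passed as the `env` argument of `subprocess.Popen` when launching the corresponding server process, so the two tenants never allocate memory on each other's card.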

Target workloads

Dense 70B inference: bf16 tensor-parallel across both cards with minimal overhead, or one fp8 model per card for independent streams
100-150B MoE at Q4-Q6 (GLM-4.5-Air, Qwen3.5-122B-A10B, Hunyuan-A13B, Llama 4 Scout)
FP8-native frontier inference (DeepSeek V3 family, Hunyuan, Llama 4) — Blackwell runs fp8 natively
Image + video generation studio at bf16 (Wan 2.2 T2V-A14B, HunyuanVideo 13B, FLUX.1 [dev])
Long-context document analysis (MiniMax-M1, Kimi-K2 1.58-bit UD with spill)

Reference performance

Published references: NVIDIA RTX Pro 6000 Blackwell Server Edition datasheet and community benchmarks

Benchmark	Result
Per-card INT8 TOPS (NVIDIA datasheet)	2 000 TOPS
Aggregate INT8 TOPS (2 cards)	4 000 TOPS
Memory bandwidth per card	~1 800 GB/s, 96 GB ECC GDDR7
Llama 3.3 70B bf16 per-card (community)	15-25 tok/s single-stream, 60-90 tok/s batch
Dual-card tensor-parallel 70B (community)	~30-45 tok/s single-stream expected
Blackwell fp8 native	DeepSeek-V3 fp8, Hunyuan-A13B fp8 run without bf16 upcast

Published external references, not measured on Kentino hardware. Kentino will publish first-party numbers after the first customer build.
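
As a sanity check on the single-stream figures in the table, a rough upper bound for dense-model decoding is memory bandwidth divided by the bytes of weights each card must read per token. The sketch below applies that rule of thumb with the ~1 800 GB/s bandwidth quoted above. It is an assumption-laden ceiling: it ignores KV-cache reads, inter-GPU communication, and kernel efficiency, so real throughput lands below it.

```python
BANDWIDTH_GB_S = 1800  # per-card memory bandwidth quoted in the table


def decode_ceiling_tok_s(weights_gb_per_card: float,
                         bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Bandwidth-bound ceiling for single-stream decode of a dense model."""
    if weights_gb_per_card <= 0:
        raise ValueError("weights_gb_per_card must be positive")
    return bandwidth_gb_s / weights_gb_per_card


fp8_one_card = decode_ceiling_tok_s(70)        # 70B fp8, whole model on one card
bf16_two_cards = decode_ceiling_tok_s(140 / 2)  # 70B bf16, half the weights per card
fp8_two_cards = decode_ceiling_tok_s(70 / 2)    # 70B fp8 split across both cards

print(f"70B fp8, one card:   <= {fp8_one_card:.1f} tok/s")
print(f"70B bf16, two cards: <= {bf16_two_cards:.1f} tok/s")
print(f"70B fp8, two cards:  <= {fp8_two_cards:.1f} tok/s")
```

Batch throughput can exceed these single-stream ceilings because one pass over the weights serves many requests at once.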

Not ideal for

Very high concurrency multi-tenant serving — 4x L40 or 6x L4 distributes better across more cards
Heavy KV cache at very long context — step up to K-AI 384 RTXPro6000 8000TOPS
Training — Kentino does not sell H-class NVLink fabrics
Budget inference at 192 GB pool — 8x RTX 4090 is cheaper (trading ECC and passive cooling for cost)

Warranty and lead time

2 years parts warranty | 1 year labor warranty | 10-28 days lead time

NVIDIA OEM 3-year warranty on the RTX Pro 6000 Server Edition cards, plus the Kentino integration warranty. The build includes assembly, BIOS configuration, driver installation, burn-in testing, and functional verification. Lead time depends on component availability and is confirmed at order.

Recommended add-ons

Upgrade to dual 2 kW synced PSU for N+1 redundancy
Upgrade RAM to 512 GB (4 DIMM slots open)
4 TB NVMe for large weight libraries and model staging
Expand to 4-card configuration (K-AI 384 RTXPro6000 8000TOPS) — chassis has slot capacity
24U rack cabinet + online UPS 5 kVA