Kentino s.r.o.

K-AI 384 Rome RTXPro6000 — 4× RTX Pro 6000 Blackwell Server Edition (384 GB ECC VRAM)

Name: K-AI 384 Rome RTXPro6000 — 4× RTX Pro 6000 Blackwell Server Edition (384 GB ECC VRAM)
Brand: Kentino s.r.o.
Price: 60000.00 EUR
Availability: InStock

€60.000,00 EUR

Sleva Vyprodáno

Poštovné se vypočítá na pokladně.

Množství

K-AI 384 Rome RTXPro6000 8000TOPS

384 GB ECC VRAM Datacenter Server
4x RTX Pro 6000 Server Edition | EPYC Milan | 8 000 TOPS INT8

8 000

TOPS INT8

384 GB

ECC VRAM pool

fp8

Blackwell native

Passive

datacenter cooling

Published external references. Not measured on Kentino hardware.

A 4U rack-mount inference server with four NVIDIA RTX Pro 6000 Blackwell Server Edition passive datacenter cards (96 GB ECC each) pooled to 384 GB ECC VRAM, one AMD EPYC 7643 Milan CPU (48C/96T), 384 GB DDR4-2666 ECC, 2 TB NVMe boot, and dual synchronized 2.5 kW ATX PSU. Blackwell silicon with fp8 native acceleration. Passive airflow-directed cooling for datacenter chassis. Runs DeepSeek V3 Q3, Mistral Large 3, Qwen3-Coder-480B, and every major frontier open-weight model.

Hardware

Component	Detail
GPUs	4x NVIDIA RTX Pro 6000 Blackwell Server Edition 96 GB ECC (passive datacenter cooler, 600 W TGP, PCIe 5.0 x16, 2000 INT8 TOPS/card, fp8 native)
VRAM pool	384 GB aggregate ECC across 4 cards
CPU	AMD EPYC 7643 Milan (48C/96T, 225 W, 128x PCIe 4.0 lanes)
Motherboard	ASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)
System RAM	384 GB DDR4-2666 ECC RDIMM (6x 64 GB — 2 DIMM slots open for upgrade to 512 GB)
Boot / storage	2 TB NVMe M.2 (PCIe 4.0 x4)
Power supply	2x 2.5 kW ATX with dual-PSU sync cable (5 kW aggregate)
Chassis	4U rack-mount
Cooling	SP3 tower cooler (Arctic Freezer 4U-M class) + front-to-back directed airflow (3x 120 mm front intake + 1x 120 mm rear exhaust). Passive GPU cards — requires datacenter chassis airflow.
Network	Onboard dual 10 GbE (Intel X550)

Power envelope

GPU draw: 4 x 600 W = 2 400 W
System total under full load: ~2 775 W
PSU total: 5 000 W (dual 2.5 kW synced) — 44.5% headroom
Dual PSU for split power delivery — single PSU failure = loss of 2 GPUs or 2 GPUs + motherboard

Lane topology

ROMED8-2T exposes 7x PCIe 4.0 x16 direct from EPYC Milan. Four slots populated — three free for NIC / storage / telemetry. RTX Pro 6000 is Gen5-capable silicon; runs Gen4 at full x16 on this platform — no bandwidth bottleneck for inference. No PCIe switch. No NVLink.

What you can run

With 384 GB of pooled ECC VRAM on Blackwell fp8 native silicon, this server runs DeepSeek V3 / R1 at Q3 comfortably on-card, Mistral Large 3 Q3, GLM-5 Q3, Qwen3-Coder-480B Q3, and Llama 3.3 70B bf16 resident on a single card (96 GB/card).

LLMs — text / reasoning / coding

Chinese frontier

DeepSeek V3 / V3-0324 / V3.1 / V3.2 / R1 / R1-0528 Q3 (~290 GB) comfortably on-card (~30-40 tok/s single, published reference); fp8 native (~670 GB) with RAM spill
Qwen3-Coder-480B-A35B Q3 (~350 GB tight with RAM spill) — SOTA open coding agent (~18-25 tok/s single, published reference)
Qwen3-235B-A22B Q6/Q8 (~200-280 GB) with very long ctx and multi-user batching
GLM-5 / GLM-5.1 Q3 (~317 GB) — Chinese frontier, close to Claude Opus 4.6 on coding
Kimi-K2 1.58-bit UD (~240 GB) — trillion-param agent at real throughput
Hunyuan-Large 389B/52B Q4 (~220 GB), fp8 native (~390 GB spill)
ERNIE-4.5-424B-A47B Q4 (~240 GB); MiniMax-M1 Q4 (~260 GB) 1M-ctx
Llama 3.3 70B bf16 resident on a single card (96 GB/card — no tensor-parallel needed)

Western frontier

Mistral Large 3 (675B/41B MoE, Apache 2.0) Q3 (~317 GB) — frontier Western open weights (~20-30 tok/s single, published reference)
Llama 4 Maverick (400B/17B) Q4 (~232 GB) with generous KV budget (~45-55 tok/s single, published reference)
Llama-3.1-Nemotron Ultra 253B Q4-Q6 (~119-207 GB)
gpt-oss-120b MXFP4 native (80 GB) with massive concurrent fleet headroom
Pixtral Large / Mistral Large 2 bf16 (~248 GB); Devstral 2 123B bf16 — 256k top open coding
Llama 3.3 70B bf16 on a single card; 4x concurrent 70B deployments possible

Vision-Language Models

Qwen3-VL-235B-A22B bf16 (~240 GB); InternVL3.5-241B-A28B Q4 (~135 GB); Llama 3.2 90B Vision bf16; Pixtral Large 124B bf16 (~248 GB); Qwen3-Omni-30B-A3B; Molmo 72B; ERNIE-4.5-VL; GLM-4.6V 106B bf16 on TP. Blackwell fp8 delivers ~2x throughput on vision-tower inference vs Ada.

Image generation

FLUX.1 [dev] / Kontext / Tools at fp8 native (~15-20 s per 1024x1024 image on single RTX Pro 6000, published reference); SD 3.5 Large; HunyuanImage-2.1 (17B native 2K); HunyuanImage-3.0 80B/13B MoE; AuraFlow; OmniGen; 4x concurrent ComfyUI workers.

Video generation

Wan 2.2 T2V-A14B / I2V-A14B dual expert bf16; HunyuanVideo 13B bf16 both experts; Open-Sora 2.0 (11B) bf16; CogVideoX-5B; Mochi-1; LTX-Video; Pyramid Flow; SVD / SV3D / SV4D; NVIDIA Cosmos Predict 2.

Audio / Speech / TTS

ASR: Whisper v3 large / turbo; Parakeet-TDT 1.1B; Canary 1B; Qwen3-ASR; SenseVoice
TTS: CosyVoice 2/3; Kokoro; Stable Audio Open; XTTS v2; Step-Audio-EditX
Realtime / S2S: Kyutai Moshi; Step-Audio 2 mini / R1; Qwen2.5-Omni-7B
Music / SFX: MusicGen / AudioGen / Bark / SeamlessM4T

Multi-model / multi-tenant serving

DeepSeek V3 Q3 + concurrent 70B + FLUX.1 + Whisper all resident
4-way tensor-parallel on 350-400B class at Q4
Per-card tenant isolation — one 96 GB Llama 3.3 70B bf16 per card, 4 independent inference silos
Multi-model RAG: reader + reranker + vision + embedder all on one host

Target workloads

Frontier open-weight inference backend — DeepSeek V3 Q3, Qwen3-Coder-480B Q3, GLM-5 Q3
Production serving of Llama 4 Maverick Q4 multimodal agents with generous context budget
4-tenant per-card isolation — one Llama 3.3 70B bf16 per tenant, zero cross-contamination
fp8-native DeepSeek / R1 / Hunyuan serving on Blackwell silicon
Mistral Large 3 Q3 as Western Apache-2.0 frontier open-weight alternative

Published performance references

External references | Not measured on Kentino hardware

Benchmark	Result
RTX Pro 6000 per-card INT8 TOPS	2 000 TOPS
RTX Pro 6000 memory bandwidth	~1 800 GB/s per card
vLLM — DeepSeek V3 Q3 on 4x Blackwell PCIe (single)	~30-40 tok/s
vLLM — DeepSeek V3 Q3 on 4x Blackwell PCIe (batch-8)	~200 tok/s aggregate
SGLang — Llama 4 Maverick Q4 on 4x Blackwell (single)	~45-55 tok/s
llama.cpp — Qwen3-Coder-480B Q3 on 4x Blackwell (single)	~18-25 tok/s
FLUX.1 [dev] fp8 on single RTX Pro 6000	~1.8 s per 1024x1024 image

Kentino will publish first-party numbers after initial customer build.

Not ideal for

Single-user workloads up to 70B — 4x RTX 5090 is materially cheaper for a 128 GB pool if ECC and passive reliability are not required
Silent lab / office-adjacent deployment — passive cooler requires proper datacenter front-to-back airflow. For acoustic-sensitive sites choose the Max-Q turbofan variant (K-AI 384 Rome RTXPro6000MQ)
Frontier training from scratch (no NVLink)
Full DeepSeek V3 Q4 on-card (~404 GB) — upgrade to 6x RTX Pro 6000 / 576 GB

Warranty and lead time

3 years

NVIDIA OEM GPU warranty

2 years

parts warranty

1 year

labor warranty

10-28 days

lead time

Build includes assembly, BIOS configuration, driver install, burn-in, memtest, and functional verification. Lead time depends on component availability, confirmed at order.

Recommended add-ons

Upgrade RAM to 512 GB DDR4 (add 2x 64 GB — 2 DIMM slots open) for RAM-spill headroom on Q3 frontier quants
4 TB NVMe Gen4 x4 for frontier-model library (DeepSeek V3 Q3 alone is ~290 GB on disk)
Full 24U rack cabinet with managed PDU + online UPS
Alternative silhouette: Max-Q turbofan variant (K-AI 384 Rome RTXPro6000MQ) — same silicon, quieter blower cooler, for lab deployments