Перейти к информации о продукте
1 из 7

Kentino s.r.o.

K-AI 384 Rome RTXPro6000MQ — 4× RTX Pro 6000 Blackwell Max-Q Turbofan (384 GB ECC VRAM)

K-AI 384 Rome RTXPro6000MQ — 4× RTX Pro 6000 Blackwell Max-Q Turbofan (384 GB ECC VRAM)

Обычная цена €46.583,00 EUR
Обычная цена Цена со скидкой €46.583,00 EUR
Распродажа Продано
Налоги включены. Стоимость доставки рассчитывается при оформлении заказа.

K-AI 384 Rome RTXPro6000MQ 8000TOPS

384 GB ECC VRAM Lab Server
4x RTX Pro 6000 Max-Q Turbofan | EPYC Milan | 8 000 TOPS INT8

8 000
TOPS INT8
384 GB
ECC VRAM pool
fp8
Blackwell native
Quiet
turbofan cooling

Published external references. Not measured on Kentino hardware.

A 4U rack-mount inference server with four NVIDIA RTX Pro 6000 Blackwell Max-Q turbofan (blower) cards (96 GB ECC each) pooled to 384 GB ECC VRAM, one AMD EPYC 7643 Milan CPU (48C/96T), 384 GB DDR4-2666 ECC, 2 TB NVMe boot, and dual synchronized 2.5 kW ATX PSU. Same Blackwell silicon as the Server Edition — identical inference envelope, identical throughput — with a quieter blower cooler suited to lab, R&D, and office-adjacent environments.

Hardware

Component Detail
GPUs 4x NVIDIA RTX Pro 6000 Blackwell Max-Q 96 GB ECC (turbofan / blower cooler, 600 W TGP, PCIe 5.0 x16, 2000 INT8 TOPS/card, fp8 native)
VRAM pool 384 GB aggregate ECC across 4 cards
CPU AMD EPYC 7643 Milan (48C/96T, 225 W, 128x PCIe 4.0 lanes)
Motherboard ASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)
System RAM 384 GB DDR4-2666 ECC RDIMM (6x 64 GB — 2 DIMM slots open for upgrade to 512 GB)
Boot / storage 2 TB NVMe M.2 (PCIe 4.0 x4)
Power supply 2x 2.5 kW ATX with dual-PSU sync cable (5 kW aggregate)
Chassis 4U rack-mount
Cooling SP3 tower cooler (Arctic Freezer 4U-M class) + front-to-back directed airflow (3x 120 mm front intake + 1x 120 mm rear exhaust). GPU cards self-cooled via turbofan blower (rear exhaust) — quieter for lab environments.
Network Onboard dual 10 GbE (Intel X550)

Power envelope

  • GPU draw: 4 x 600 W = 2 400 W
  • System total under full load: ~2 775 W
  • PSU total: 5 000 W (dual 2.5 kW synced) — 44.5% headroom
  • Dual PSU for split power delivery — single PSU failure = loss of 2 GPUs or 2 GPUs + motherboard

Thermal profile (Max-Q)

Max-Q uses a turbofan (blower) cooler with directional rear-of-card exhaust. Expected GPU hotspot 72-80 C under continuous load. Materially quieter than passive cards in a high-static-pressure chassis. Better suited to non-datacenter airflow, open-rack, or lab / office-adjacent placement. Silicon, TDP, ECC, and performance are identical to the Server Edition.

What you can run

Identical to the Server Edition (K-AI 384 Rome RTXPro6000) — same Blackwell silicon, same 384 GB ECC pool, same fp8 native, same model compatibility. The difference is acoustic, not computational.

LLMs — text / reasoning / coding

Chinese frontier

  • DeepSeek V3 / V3-0324 / V3.1 / V3.2 / R1 / R1-0528 Q3 (~290 GB) comfortably on-card (~30-40 tok/s single, published reference); fp8 native (~670 GB) with RAM spill
  • Qwen3-Coder-480B-A35B Q3 (~350 GB tight with RAM spill) — SOTA open coding agent (~18-25 tok/s single, published reference)
  • Qwen3-235B-A22B Q6/Q8 (~200-280 GB) with long ctx and multi-user batching
  • GLM-5 / GLM-5.1 Q3 (~317 GB) — Chinese frontier, close to Claude Opus 4.6 on coding
  • Kimi-K2 1.58-bit UD (~240 GB) — trillion-param agent at real throughput
  • Hunyuan-Large 389B/52B Q4 (~220 GB), fp8 native (~390 GB spill)
  • ERNIE-4.5-424B-A47B Q4 (~240 GB); MiniMax-M1 Q4 (~260 GB) 1M-ctx
  • Llama 3.3 70B bf16 resident on a single card (96 GB/card)

Western frontier

  • Mistral Large 3 (675B/41B MoE, Apache 2.0) Q3 (~317 GB) — frontier Western open weights (~20-30 tok/s single, published reference)
  • Llama 4 Maverick (400B/17B) Q4 (~232 GB) with generous KV budget (~45-55 tok/s single, published reference)
  • Llama-3.1-Nemotron Ultra 253B Q4-Q6 (~119-207 GB)
  • gpt-oss-120b MXFP4 native (80 GB) with concurrent fleet headroom
  • Pixtral Large / Mistral Large 2 bf16 (~248 GB); Devstral 2 123B bf16 — 256k top open coding
  • Llama 3.3 70B bf16 on a single card; 4x concurrent 70B deployments possible

Vision-Language Models

Qwen3-VL-235B-A22B bf16 (~240 GB); InternVL3.5-241B-A28B Q4 (~135 GB); Llama 3.2 90B Vision bf16; Pixtral Large 124B bf16; Qwen3-Omni-30B-A3B; Molmo 72B; ERNIE-4.5-VL; GLM-4.6V 106B bf16 on TP. Blackwell fp8 delivers ~2x throughput on vision-tower inference vs Ada.

Image generation

FLUX.1 [dev] / Kontext / Tools at fp8 native (~15-20 s per 1024x1024 image on single RTX Pro 6000, published reference); SD 3.5 Large; HunyuanImage-2.1 (17B native 2K); HunyuanImage-3.0 80B/13B MoE; AuraFlow; OmniGen; 4x concurrent ComfyUI workers.

Video generation

Wan 2.2 T2V-A14B / I2V-A14B dual-expert bf16; HunyuanVideo 13B bf16 both experts; Open-Sora 2.0 (11B) bf16; CogVideoX-5B; Mochi-1; LTX-Video; Pyramid Flow; SVD / SV3D / SV4D; NVIDIA Cosmos Predict 2.

Audio / Speech / TTS

  • ASR: Whisper v3 large / turbo; Parakeet-TDT; Canary; Qwen3-ASR; SenseVoice
  • TTS: CosyVoice 2/3; Kokoro; Stable Audio Open; XTTS v2; Step-Audio-EditX
  • Realtime / S2S: Kyutai Moshi; Step-Audio 2 mini / R1; Qwen2.5-Omni-7B
  • Music / SFX: MusicGen / AudioGen / Bark / SeamlessM4T

Multi-model / multi-tenant serving

  • DeepSeek V3 Q3 + concurrent 70B + FLUX.1 + Whisper all resident
  • 4-way tensor-parallel on 350-400B class at Q4
  • Per-card tenant isolation — one 96 GB Llama 3.3 70B bf16 per card, 4 independent inference silos
  • Multi-model RAG: reader + reranker + vision + embedder all on one host

Target workloads

  • Frontier open-weight inference for a lab / R&D team where acoustic budget matters
  • Small-team server room without dedicated datacenter airflow — Max-Q self-cooling tolerates open-rack placement
  • Office-adjacent AI workstation for a specialist team (ML research, agentic tools)
  • fp8-native serving (DeepSeek / R1 / Hunyuan) in lab settings
  • 4-tenant per-card isolation workload with noise budget

Published performance references

External references | Same silicon as Server Edition | Not measured on Kentino hardware

Benchmark Result
RTX Pro 6000 per-card INT8 TOPS 2 000 TOPS
RTX Pro 6000 memory bandwidth ~1 800 GB/s per card
vLLM — DeepSeek V3 Q3 on 4x Blackwell PCIe (single) ~30-40 tok/s
vLLM — DeepSeek V3 Q3 on 4x Blackwell PCIe (batch-8) ~200 tok/s aggregate
SGLang — Llama 4 Maverick Q4 on 4x Blackwell (single) ~45-55 tok/s
llama.cpp — Qwen3-Coder-480B Q3 on 4x Blackwell (single) ~18-25 tok/s
FLUX.1 [dev] fp8 on single RTX Pro 6000 ~1.8 s per 1024x1024 image

Kentino will publish first-party numbers after initial customer build.

Not ideal for

  • Proper datacenter rack deployments with established hot-aisle airflow — choose the passive Server Edition (K-AI 384 Rome RTXPro6000) instead: same silicon, simpler mechanically
  • Single-user workloads up to 70B (4x RTX 5090 is materially cheaper for 128 GB pool)
  • Frontier training from scratch (no NVLink)
  • Full DeepSeek V3 Q4 on-card (~404 GB) — upgrade to 6x RTX Pro 6000 / 576 GB

Warranty and lead time

3 years
NVIDIA OEM GPU warranty
2 years
parts warranty
1 year
labor warranty
10-28 days
lead time

Build includes assembly, BIOS configuration, driver install, burn-in, memtest, and functional verification. Lead time depends on component availability, confirmed at order.

Recommended add-ons

  • Upgrade RAM to 512 GB DDR4 (add 2x 64 GB — 2 DIMM slots open) for RAM-spill headroom on Q3 frontier quants
  • 4 TB NVMe Gen4 x4 for frontier-model library (DeepSeek V3 Q3 alone is ~290 GB on disk)
  • Full 24U rack cabinet with managed PDU + online UPS
  • Alternative silhouette: passive Server Edition (K-AI 384 Rome RTXPro6000) — same silicon, for datacenter airflow deployments
Просмотреть всю информацию