Kentino s.r.o.

K-AI 576 Genoa RTXPro6000MQ 12000TOPS — 6× RTX Pro 6000 Blackwell Max-Q AI Frontier Server

Price: 106069.00 EUR
Availability: InStock

Shipping is calculated at checkout.

576 GB ECC VRAM Frontier Server
6x RTX Pro 6000 Max-Q Turbofan | EPYC Genoa | 12 000 TOPS INT8

12 000 TOPS INT8
576 GB ECC VRAM pool
Gen5 Broadcom switch
Quiet turbofan cooling

Published external references. Not measured on Kentino hardware.

A 7U rack-mount frontier-tier inference platform with six NVIDIA RTX Pro 6000 Blackwell Max-Q turbofan cards pooled to 576 GB of ECC VRAM, one AMD EPYC 9354 Genoa CPU (32C/64T), 768 GB of DDR5-4800 ECC RAM (all 12 channels populated), a 4 TB NVMe boot drive, and five 1200 W server PSUs. It uses the same silicon and memory pool as the passive Server Edition build; only the cooler differs. The Max-Q turbofan is self-contained per card, runs quieter, and tolerates less strict chassis airflow. The range of models it can run is identical to that of its passive sibling.

Hardware

Component	Detail
GPUs	6x NVIDIA RTX Pro 6000 Blackwell Max-Q 96 GB ECC (turbofan blower, 600 W TDP spec, PCIe 5.0 x16, 2000 INT8 TOPS per card)
VRAM pool	576 GB total across 6 cards (no NVLink — P2P over PCIe Gen5 at ~55-60 GB/s per direction)
CPU	AMD EPYC 9354 Genoa (32C/64T, 280 W, 128x PCIe 5.0 lanes, 12-channel DDR5)
Motherboard	ASRock Rack GENOAD8X-2T/BCM (SP5 Genoa, integrated Broadcom PEX PCIe Gen5 switch, 12x DDR5, 2x 10 GbE, IPMI)
System RAM	768 GB DDR5-4800 ECC RDIMM (12x 64 GB — all channels populated, ~460 GB/s aggregate)
Boot / storage	4 TB NVMe M.2 (PCIe 4.0 x4) — sized for frontier checkpoint staging
Power supply	5x 1200 W server PSU set (HP-compatible, 6 kW total)
Chassis	7U 8-GPU rack-mount, 10 PCIe slot capacity, active Gen5 risers
Cooling	SP5 Genoa tower cooler + 8x 120 mm chassis fans. Per-GPU turbofan blowers are self-contained — datacenter airflow recommended but not strictly required. Quieter for lab environments.
Network	Onboard dual 10 GbE (Intel X550)
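
The pooled figures in the table follow from the per-component specs. A minimal Python sketch of that arithmetic, assuming one DIMM per memory channel and a 64-bit (8-byte) data path per DDR5 channel; the bandwidth it prints is a theoretical peak, not a measured value:

```python
# Spec-sheet figures copied from the hardware table.
GPU_COUNT = 6
VRAM_PER_GPU_GB = 96
INT8_TOPS_PER_GPU = 2000
MEMORY_CHANNELS = 12       # all channels populated, one DIMM each
DIMM_SIZE_GB = 64
DDR5_MT_PER_S = 4800       # DDR5-4800: mega-transfers per second per channel
BYTES_PER_TRANSFER = 8     # assumption: 64-bit data path per channel, ECC bits excluded

vram_pool_gb = GPU_COUNT * VRAM_PER_GPU_GB
int8_tops_total = GPU_COUNT * INT8_TOPS_PER_GPU
system_ram_gb = MEMORY_CHANNELS * DIMM_SIZE_GB
# Theoretical peak: channels x transfers per second x bytes per transfer, in GB/s.
ram_bandwidth_gb_s = MEMORY_CHANNELS * DDR5_MT_PER_S * BYTES_PER_TRANSFER / 1000

print(f"VRAM pool:     {vram_pool_gb} GB")
print(f"INT8 compute:  {int8_tops_total} TOPS")
print(f"System RAM:    {system_ram_gb} GB")
print(f"RAM bandwidth: {ram_bandwidth_gb_s:.1f} GB/s peak")
```

The results (576 GB, 12 000 TOPS, 768 GB, 460.8 GB/s) match the headline numbers quoted for this build.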

Power envelope

GPU draw (spec): 6 x 600 W = 3 600 W
System total at spec full load: ~4 080 W
PSU total: 6 000 W (5x 1200 W) — 32% headroom
Max-Q cards typically run 520-550 W sustained, so real-world headroom sits above the 32% spec figure (roughly 37-40% if non-GPU draw stays at ~480 W)
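
The power envelope can be reproduced with a short Python sketch. It assumes the non-GPU draw (CPU, RAM, fans, storage) is the difference between the quoted system total and the GPU spec draw, and that this share stays constant when the cards run below spec; both are simplifications made here for illustration.

```python
GPU_COUNT = 6
GPU_SPEC_W = 600
SYSTEM_SPEC_W = 4080       # quoted system total at spec full load
PSU_TOTAL_W = 5 * 1200     # five 1200 W supplies

# Non-GPU draw implied by the quoted figures.
non_gpu_w = SYSTEM_SPEC_W - GPU_COUNT * GPU_SPEC_W


def headroom_pct(gpu_w: float) -> float:
    """PSU headroom as a percentage of total PSU capacity."""
    system_w = GPU_COUNT * gpu_w + non_gpu_w
    return 100 * (PSU_TOTAL_W - system_w) / PSU_TOTAL_W


for label, watts in [("spec", 600), ("sustained, high", 550), ("sustained, low", 520)]:
    print(f"{label:16s} {watts} W per card -> {headroom_pct(watts):.0f}% headroom")
```

At the 600 W spec figure this gives the quoted 32%; at 520-550 W per card the same formula gives 37-40%.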

Cooling (Max-Q differentiator)

Each card pulls air front-to-back via its own blower — self-contained per card. Tolerates mixed-rack / open-cabinet deployment. Quieter than an equivalent axial-fan stack. Max-Q firmware profile favours lower sustained power (520-550 W typical in inference). Recommended: cabinet with front perforated door and clear rear exhaust path.

What you can run

Identical to the Server Edition sibling — same silicon, same 576 GB pool. DeepSeek V3 Q4 (~404 GB) with long context, Kimi-K2 Q2, Mistral Large 3 Q2-Q3, GLM-5 Q2, Qwen3-Coder-480B Q4.

LLMs — text / reasoning / coding

Chinese frontier

DeepSeek V3 / R1 / V3.1 / V3.2 at Q4_K_M (~404 GB) comfortable with long context (~5-8 tok/s single vLLM TP-6, published reference); fp8 native (~670 GB) with RAM spill
Kimi-K2 (Base / Instruct / Thinking) at Q2_K (~375 GB) comfortable (~5-8 tok/s single, published reference)
GLM-5 / GLM-5.1 (~745B/44B) at Q2_K (~260 GB); Q3 (~420 GB) with RAM spill
Qwen3-Coder-480B-A35B at Q4_K_M (~270 GB) with long context
Qwen3-235B-A22B at bf16 (~470 GB) or fp8 (~240 GB)
ERNIE-4.5-424B-A47B at Q4 (~240 GB) with 128k ctx
Intern-S1-Pro at Q2_K (~325 GB); Hunyuan-Large at Q4 (~220 GB)
MiniMax-Text-01 / M1 at Q4 (~260 GB)

Western frontier

Mistral Large 3 at Q2-Q3 (~243-317 GB) comfortable (~20-30 tok/s single, published reference)
Llama 4 Maverick at Q4_K_M (~232 GB) with long ctx (~45-55 tok/s single, published reference)
Llama-3.1-Nemotron Ultra 253B at fp8 (~253 GB)
Grok-1 314B at Q4 (~182 GB); Snowflake Arctic at Q4 (~278 GB)
DBRX Instruct 132B/36B at bf16 (~264 GB) or fp8
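
Whether a model is listed as "comfortable" or "with RAM spill" comes down to how its weights compare with the 576 GB pool. A rough Python sketch of that check, using weight sizes quoted in the lists above; the `placement` helper and its `reserve_gb` parameter (an allowance for KV cache and activations) are hypothetical names introduced for illustration, and real memory use depends on the serving stack and context length:

```python
VRAM_POOL_GB = 576
SYSTEM_RAM_GB = 768


def placement(weights_gb: float, reserve_gb: float = 0.0) -> str:
    """Classify where a model's weights land.

    reserve_gb is a hypothetical allowance for KV cache and activations.
    """
    needed_gb = weights_gb + reserve_gb
    if needed_gb <= VRAM_POOL_GB:
        return f"fits in VRAM, {VRAM_POOL_GB - needed_gb:.0f} GB free"
    spill_gb = needed_gb - VRAM_POOL_GB
    if spill_gb <= SYSTEM_RAM_GB:
        return f"needs {spill_gb:.0f} GB of system RAM spill"
    return "does not fit"


# Weight sizes as quoted in the model lists.
for name, size_gb in [("DeepSeek V3 Q4_K_M", 404), ("DeepSeek V3 fp8", 670), ("Kimi-K2 Q2_K", 375)]:
    print(f"{name}: {placement(size_gb)}")
```

With no reserve, DeepSeek V3 at Q4_K_M leaves about 172 GB of the pool for context, while the fp8 weights exceed the pool by about 94 GB and have to spill into system RAM.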

Vision-Language Models

Qwen3-VL-235B-A22B; InternVL3.5-241B-A28B Q4; GLM-4.5V / 4.6V 106B bf16; Llama 3.2 90B Vision bf16; Pixtral Large 124B fp8; Molmo 72B bf16.

Image generation

HunyuanImage-3.0 Instruct; FLUX.1 [dev] / [schnell] / Kontext multi-instance (~15-20 s per 1024x1024 image, published reference); SD 3.5 Large; SDXL; AuraFlow; OmniGen; HunyuanImage-2.1; Kolors 2.0.
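
For multi-instance image generation, the quoted per-image latency translates into an hourly throughput estimate. The Python sketch below assumes one independent FLUX.1 instance per card and perfectly linear scaling across six cards with no contention; that is an idealisation, not a measured result.

```python
GPU_COUNT = 6
SECONDS_PER_IMAGE = (20, 15)   # quoted range for one 1024x1024 image, slow to fast
SECONDS_PER_HOUR = 3600

# Images per hour for one card, then for six independent instances.
per_card = tuple(SECONDS_PER_HOUR / s for s in SECONDS_PER_IMAGE)
per_system = tuple(GPU_COUNT * n for n in per_card)

print(f"per card:   {per_card[0]:.0f}-{per_card[1]:.0f} images/hour")
print(f"six cards:  {per_system[0]:.0f}-{per_system[1]:.0f} images/hour")
```

Under those assumptions the upper bound is roughly 1 080-1 440 images per hour for the whole system.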

Video generation

Wan 2.2 T2V-A14B dual-expert MoE bf16; HunyuanVideo 13B bf16; Open-Sora 2.0 (11B); Mochi-1 (10B); NVIDIA Cosmos Predict 2 up to 14B; CogVideoX-5B; LTX-Video; Pyramid Flow.

Audio / Speech / TTS

Full stack resident: Whisper v3 large, Parakeet-TDT 1.1B, Canary 1B, Moshi 7B realtime, Qwen3-Omni, Step-Audio R1, CosyVoice 3.0, Kokoro, Stable Audio Open.

Multi-model / multi-tenant serving

DeepSeek V3 Q4 + FLUX + HunyuanVideo + Whisper/Moshi realtime all resident
Concurrent 70B tensor-parallel + 235B-MoE on separate PCIe domains
3 frontier models resident for A/B evaluation

Target workloads

Frontier open-weight research lab with mixed / non-ideal airflow infra
Colocation / private-datacenter where per-card turbofan is operationally simpler than full passive airflow
Sovereign AI deployment with Apache 2.0 / MIT model stack
Enterprise multi-model RAG + agentic platform
Lab environments with open racks

Published performance references

External references | Same silicon as Server Edition | Not measured on Kentino hardware

Benchmark	Result
RTX Pro 6000 per-card INT8 TOPS	2 000 TOPS
vLLM — DeepSeek V3 Q4 on 6x RTX Pro 6000 (single)	~25-40 tok/s
vLLM — DeepSeek V3 Q4 on 6x RTX Pro 6000 (batch-32)	200-400 tok/s aggregate
FLUX.1 [dev] fp8 on single RTX Pro 6000	~15-20 s per 1024x1024 image
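
The batch-32 row is an aggregate across all concurrent requests, so the per-request rate is lower than the single-stream row. A small Python sketch of the relationship, assuming throughput is shared evenly across the 32 streams:

```python
BATCH = 32
aggregate_tok_s = (200, 400)   # quoted batch-32 aggregate range
single_tok_s = (25, 40)        # quoted single-stream range

# Tokens per second seen by each individual request at batch 32.
per_stream = tuple(a / BATCH for a in aggregate_tok_s)
# Aggregate gain over one stream, pairing low with low and high with high.
gain = tuple(a / s for a, s in zip(aggregate_tok_s, single_tok_s))

print(f"per stream at batch 32: {per_stream[0]}-{per_stream[1]} tok/s")
print(f"aggregate gain over single stream: {gain[0]:.0f}x-{gain[1]:.0f}x")
```

On the quoted figures each of the 32 requests would see roughly 6-12 tok/s, while total system throughput is 8-10 times the single-stream rate.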

Exact figures are confirmed at the PoC stage. Kentino will publish first-party numbers after the initial customer build.

Not ideal for

Kimi-K2 / DeepSeek V3 at Q4 for production-speed serving: step up to the K-AI 768 TurinDual RTXPro6000MQ
Training from scratch on frontier-class models — no NVLink
Plug-and-play deployment — frontier MoE serving needs a skilled MLOps team

Warranty and lead time

2 years parts warranty
1 year labor warranty
10-28 days lead time

Build includes assembly, BIOS configuration, driver installation, burn-in, memtest, functional verification, and LLM environment setup. Lead time depends on component availability and is confirmed at order.