Kentino s.r.o.

K-AI 576 Genoa RTXPro6000 12000TOPS — 6× RTX Pro 6000 Blackwell Server Edition AI Frontier Server

Name: K-AI 576 Genoa RTXPro6000 12000TOPS — 6× RTX Pro 6000 Blackwell Server Edition AI Frontier Server
Brand: Kentino s.r.o.
Price: 106069.00 EUR
Availability: InStock



Shipping is calculated at checkout.


K-AI 576 Genoa RTXPro6000 12000TOPS

576 GB ECC VRAM Frontier Research Server
6x RTX Pro 6000 Server Edition | EPYC Genoa | 12 000 TOPS INT8

12 000 TOPS INT8
576 GB ECC VRAM pool
BCM PCIe Gen5 switch
Frontier on-prem research

Published external references. Not measured on Kentino hardware.

A 7U rack-mount frontier-tier inference platform with six passively cooled NVIDIA RTX Pro 6000 Blackwell Server Edition cards pooled to 576 GB of ECC VRAM, one AMD EPYC 9354 Genoa CPU (32C/64T), 768 GB of DDR5-4800 ECC RAM (all 12 channels populated), a 4 TB NVMe boot drive, and five 1200 W server PSUs. The on-board Broadcom PCIe Gen5 switch fans out uniformly to all six GPU slots. The pool holds DeepSeek V3 at Q4 (~404 GB) comfortably with long context, as well as Kimi-K2 at Q2 and Mistral Large 3 at Q2-Q3, covering the current open-weight frontier on-prem.

Hardware

Component	Detail
GPUs	6x NVIDIA RTX Pro 6000 Blackwell Server Edition 96 GB ECC (passive, 600 W, PCIe 5.0 x16, 2000 INT8 TOPS per card)
VRAM pool	576 GB total across 6 cards (no NVLink — P2P over PCIe Gen5 at ~55-60 GB/s per direction)
CPU	AMD EPYC 9354 Genoa (32C/64T, 280 W, 128x PCIe 5.0 lanes, 12-channel DDR5)
Motherboard	ASRock Rack GENOAD8X-2T/BCM (SP5 Genoa, integrated Broadcom PEX PCIe Gen5 switch, 12x DDR5, 2x 10 GbE, IPMI)
System RAM	768 GB DDR5-4800 ECC RDIMM (12x 64 GB — all channels populated, ~460 GB/s aggregate)
Boot / storage	4 TB NVMe M.2 (PCIe 4.0 x4) — sized for frontier checkpoint staging
Power supply	5x 1200 W server PSU set (HP-compatible, 6 kW total)
Chassis	7U 8-GPU rack-mount, 10 PCIe slot capacity, active Gen5 risers
Cooling	SP5 Genoa tower cooler, 8x 120 mm chassis fans, front-to-back datacenter airflow required. Passive GPU cards.
Network	Onboard dual 10 GbE (Intel X550)
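
The headline totals follow directly from the per-component figures in the table. A minimal Python sketch of that arithmetic is shown here; the variable names are illustrative, and the 8-byte (64-bit) data path per DDR5 channel is an assumption used to reproduce the ~460 GB/s figure:

```python
# Figures copied from the hardware table; names are illustrative only.
GPU_COUNT = 6
VRAM_PER_GPU_GB = 96
INT8_TOPS_PER_GPU = 2000
DDR5_MT_PER_S = 4800        # mega-transfers per second per channel
DDR5_CHANNELS = 12
BYTES_PER_TRANSFER = 8      # assumption: 64-bit data path per channel, ECC bits excluded

vram_pool_gb = GPU_COUNT * VRAM_PER_GPU_GB
total_int8_tops = GPU_COUNT * INT8_TOPS_PER_GPU
# Theoretical peak; sustained bandwidth on real workloads is lower.
ram_bandwidth_gb_s = DDR5_MT_PER_S * 1e6 * BYTES_PER_TRANSFER * DDR5_CHANNELS / 1e9

print(f"VRAM pool: {vram_pool_gb} GB")
print(f"INT8 compute: {total_int8_tops} TOPS")
print(f"Peak RAM bandwidth: {ram_bandwidth_gb_s:.1f} GB/s")
```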

Power envelope

GPU draw: 6 x 600 W = 3 600 W
System total at full load: ~4 080 W
PSU total: 6 000 W (5x 1200 W), leaving 32% of PSU capacity as headroom at full load
No power-cap required for steady-state inference
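
The power figures above can be reproduced with a few lines of arithmetic. This is a sketch using only the numbers in this listing; the non-GPU draw is simply the remainder implied by the ~4 080 W system total, not a separately published figure:

```python
# Numbers taken from the power envelope above; names are illustrative.
GPU_COUNT = 6
GPU_WATTS = 600
SYSTEM_TOTAL_WATTS = 4080   # the listing's full-load estimate
PSU_COUNT = 5
PSU_WATTS = 1200

gpu_draw = GPU_COUNT * GPU_WATTS
non_gpu_draw = SYSTEM_TOTAL_WATTS - gpu_draw      # CPU, RAM, fans, storage (implied)
psu_total = PSU_COUNT * PSU_WATTS
# Headroom expressed as a share of total PSU capacity.
headroom_fraction = (psu_total - SYSTEM_TOTAL_WATTS) / psu_total

print(f"GPU draw: {gpu_draw} W, non-GPU draw: {non_gpu_draw} W")
print(f"PSU total: {psu_total} W, headroom: {headroom_fraction:.0%}")
```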

Lane topology

The GENOAD8X-2T/BCM integrates a Broadcom PEX PCIe Gen5 switch on-board. The EPYC Genoa root complex, with 128 Gen5 lanes, sits upstream of the switch, which fans out uniformly to all six GPU slots at Gen5 x16 end-to-end via active risers. The result is a clean single-root topology with simpler NUMA tuning than a dual-socket system. There is no NVLink; GPU-to-GPU P2P traffic runs over PCIe at ~55-60 GB/s per direction.
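
For context on the ~55-60 GB/s P2P figure, the theoretical ceiling of a Gen5 x16 link can be worked out from the PCIe 5.0 signalling rate (32 GT/s per lane) and its 128b/130b line encoding. The sketch below is a back-of-envelope calculation, not a measurement; the function name is hypothetical:

```python
# PCIe 5.0 link parameters.
GT_PER_S_PER_LANE = 32      # raw signalling rate per lane
LANES = 16
ENCODING = 128 / 130        # 128b/130b line encoding efficiency


def pcie_gb_per_s(gt_per_s: float, lanes: int, encoding: float = ENCODING) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return gt_per_s * lanes * encoding / 8


theoretical = pcie_gb_per_s(GT_PER_S_PER_LANE, LANES)
print(f"Gen5 x16 theoretical: {theoretical:.1f} GB/s per direction")

# Compare the listing's quoted P2P range against that ceiling.
for quoted in (55, 60):
    print(f"{quoted} GB/s is {quoted / theoretical:.0%} of theoretical")
```

The gap between the ceiling and the quoted range is what packet headers and flow control typically consume on a real link.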

What you can run

With 576 GB of pooled ECC VRAM on Blackwell silicon with native fp8 support, this server runs the Chinese and Western open-weight frontier at research-grade quants: DeepSeek V3 Q4 (~404 GB) with long context, Kimi-K2 Q2, Mistral Large 3 Q2-Q3, GLM-5 Q2, and Qwen3-Coder-480B Q4.

LLMs — text / reasoning / coding

Chinese frontier

DeepSeek V3 / R1 / V3.1 / V3.2 at Q4_K_M (~404 GB) comfortable with long context (~5-8 tok/s single vLLM TP-6, published reference); fp8 native (~670 GB) with RAM spill
Kimi-K2 (Base / Instruct / Thinking) at Q2_K (~375 GB) comfortable (~5-8 tok/s single, published reference)
GLM-5 / GLM-5.1 (~745B/44B) at Q2_K (~260 GB) comfortable; Q3 (~420 GB) with RAM spill
Qwen3-Coder-480B-A35B at Q4_K_M (~270 GB) with long context
Qwen3-235B-A22B at bf16 (~470 GB) or fp8 (~240 GB)
ERNIE-4.5-424B-A47B at Q4 (~240 GB) with full 128k ctx
Intern-S1-Pro (1T/22B active, SAGE) at Q2_K (~325 GB) comfortable
Hunyuan-Large A52B at Q4 (~220 GB); MiniMax-M1 at Q4 (~260 GB)

Western frontier

Mistral Large 3 (675B/41B MoE, Apache 2.0) at Q2-Q3 (~243-317 GB) comfortable (~20-30 tok/s single, published reference)
Llama 4 Maverick (400B/17B) at Q4_K_M (~232 GB) with long ctx (~45-55 tok/s single, published reference)
Llama-3.1-Nemotron Ultra 253B at fp8 (~253 GB) or bf16 with RAM spill
Grok-1 314B at Q4 (~182 GB); Snowflake Arctic at Q4 (~278 GB)
DBRX Instruct 132B/36B at bf16 (~264 GB) or fp8 multi-instance
All 70-120B class models at bf16 with room to spare
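
The size figures in the lists above follow roughly from parameter count times bits per weight. The sketch below is a rough estimator only: the 671B total parameter count for DeepSeek V3 and the ~4.82 bits per weight for Q4_K_M are assumptions not stated in this listing, and real GGUF files vary with the tensor mix. KV cache and activations come on top of the weights.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


examples = [
    # (label, parameters in billions, bits per weight)
    ("Qwen3-235B-A22B bf16", 235, 16),
    ("DBRX Instruct 132B bf16", 132, 16),
    ("DeepSeek V3 Q4_K_M", 671, 4.82),   # assumed parameter count and bits per weight
]

for label, params_b, bpw in examples:
    print(f"{label}: ~{weights_gb(params_b, bpw):.0f} GB of weights")
```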

Vision-Language Models

Qwen3-VL-235B-A22B flagship VLM; InternVL3.5-241B-A28B Q4 (~135 GB); GLM-4.5V / 4.6V 106B bf16 (~210 GB); Llama 3.2 90B Vision bf16; Pixtral Large 124B fp8; Molmo 72B bf16.

Image generation

HunyuanImage-3.0 Instruct tier (3x 80 GB) — fits with headroom; FLUX.1 [dev] / [schnell] / Kontext multi-instance (~15-20 s per 1024x1024 image on single RTX Pro 6000 fp8, published reference); SD 3.5 Large; SDXL; AuraFlow; OmniGen; HunyuanImage-2.1; Kolors 2.0.

Video generation

Wan 2.2 T2V-A14B / I2V-A14B dual-expert MoE bf16 (~54 GB); HunyuanVideo 13B bf16 comfortable; Open-Sora 2.0 (11B) bf16; Mochi-1 (10B) fp16; NVIDIA Cosmos Predict 2 up to 14B; CogVideoX-5B; LTX-Video; Pyramid Flow.

Audio / Speech / TTS

Full stack resident concurrently: Whisper v3 large, Parakeet-TDT 1.1B, Canary 1B, Moshi 7B realtime, Qwen3-Omni, Step-Audio R1, CosyVoice 3.0, Kokoro, Stable Audio Open.

Multi-model / multi-tenant serving

DeepSeek V3 Q4 inference + FLUX image + HunyuanVideo + Whisper/Moshi realtime voice all resident simultaneously
Concurrent 70B tensor-parallel + 235B-MoE on separate PCIe domains via the Broadcom switch
Research A/B evaluation: 3 frontier open-weight models resident concurrently
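
Whether a given mix of models can stay resident together comes down to a VRAM budget. A minimal sketch using footprints quoted elsewhere in this listing follows; it treats the six cards as one pool and ignores per-card sharding granularity, KV cache, and activations, so the remainder it reports is an upper bound on what is left for those. The function name is hypothetical.

```python
VRAM_POOL_GB = 576


def remaining_gb(pool_gb: float, footprints_gb: dict) -> float:
    """VRAM left after loading the given weight footprints (pool-level view)."""
    return pool_gb - sum(footprints_gb.values())


# Weight footprints as quoted in this listing.
three_model_mix = {
    "Grok-1 Q4": 182,
    "InternVL3.5-241B-A28B Q4": 135,
    "Hunyuan-Large Q4": 220,
}
deepseek_only = {"DeepSeek V3 Q4_K_M": 404}

print(f"Three-model mix leaves {remaining_gb(VRAM_POOL_GB, three_model_mix)} GB")
print(f"DeepSeek V3 Q4 alone leaves {remaining_gb(VRAM_POOL_GB, deepseek_only)} GB")
```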

Target workloads

Frontier open-weight research lab — on-prem access to DeepSeek V3 / Kimi-K2 / Mistral Large 3 class without cloud egress
Sovereign AI deployment — EU data residency with an Apache 2.0 / MIT model stack
Enterprise multi-model RAG + agentic platform — several 200-400B MoE models resident
Model evaluation / safety research comparing frontier Chinese vs Western open weights
Inference-at-scale for regulated industries requiring air-gap + ECC + PCIe Gen5

Published performance references

External references. Not measured on Kentino hardware.

Benchmark	Result
RTX Pro 6000 per-card INT8 TOPS	2 000 TOPS
vLLM — DeepSeek V3 Q4 on 6x RTX Pro 6000 (single)	~25-40 tok/s
vLLM — DeepSeek V3 Q4 on 6x RTX Pro 6000 (batch-32)	200-400 tok/s aggregate
FLUX.1 [dev] fp8 on single RTX Pro 6000	~15-20 s per 1024x1024 image

Exact figures are confirmed at the PoC stage. Kentino will publish first-party numbers after the initial customer build.
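
To read the table in per-user terms, the batch-32 aggregate can be divided by the batch size, and the FLUX latency converted to images per minute. The sketch below does only that arithmetic on the published reference ranges; the six-card image rate assumes six independent instances scale linearly, which is an assumption rather than a measured result.

```python
BATCH_SIZE = 32
AGGREGATE_TOK_S = (200, 400)     # batch-32 range from the table
SECONDS_PER_IMAGE = (15, 20)     # FLUX.1 [dev] fp8 range from the table
GPU_COUNT = 6

# Average throughput seen by each of the 32 concurrent streams.
per_stream_tok_s = tuple(rate / BATCH_SIZE for rate in AGGREGATE_TOK_S)

# Slowest latency gives the low end of the rate, fastest gives the high end.
images_per_min_per_card = tuple(60 / s for s in reversed(SECONDS_PER_IMAGE))
# Assumption: one independent FLUX instance per card, linear scaling.
images_per_min_six_cards = tuple(GPU_COUNT * r for r in images_per_min_per_card)

print(f"Per-stream at batch-32: {per_stream_tok_s[0]}-{per_stream_tok_s[1]} tok/s")
print(f"Images per minute, one card: {images_per_min_per_card}")
print(f"Images per minute, six cards: {images_per_min_six_cards}")
```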

Not ideal for

Production-speed serving of Kimi-K2 / DeepSeek V3 at Q4: step up to the 768 GB dual Turin configuration
Training from scratch on frontier-class models — no NVLink, PCIe P2P only
Plug-and-play deployment — frontier MoE serving needs a skilled MLOps team

Warranty and lead time

2 years parts warranty
1 year labor warranty
10-28 days lead time

The build includes assembly, BIOS configuration, driver installation, burn-in, memtest, functional verification, and LLM environment setup (vLLM / SGLang / llama.cpp / CUDA 13 stack with fp8 Blackwell kernels). Lead time depends on component availability and is confirmed at order.
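
As an illustration of what a functional check on delivery might look like, the sketch below counts the GPUs and sums their memory from `nvidia-smi` query output. It is not Kentino's verification procedure; the helper names and the sample text (including the device name string and the MiB value) are hypothetical, and the actual values reported by the driver may differ.

```python
import subprocess

EXPECTED_GPUS = 6


def read_gpu_csv() -> str:
    """Run nvidia-smi and return one 'name, memory_mib' line per GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_gpus(csv_text: str) -> tuple:
    """Return (gpu_count, total_memory_mib) parsed from the CSV text."""
    count, total_mib = 0, 0
    for line in csv_text.strip().splitlines():
        # Split on the last comma so commas inside a device name are safe.
        _name, mem = line.rsplit(",", 1)
        count += 1
        total_mib += int(mem)
    return count, total_mib


# Hypothetical sample output, for illustration only.
sample = "NVIDIA RTX PRO 6000 Blackwell Server Edition, 97887\n" * 6
gpu_count, total_mib = summarize_gpus(sample)
print(f"{gpu_count} GPUs, {total_mib} MiB total")
```

On the server itself, `summarize_gpus(read_gpu_csv())` would replace the sample text, and the count can be compared against `EXPECTED_GPUS`.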