
Kentino s.r.o.

K-AI 576 Genoa RTXPro6000 12000TOPS — 6× RTX Pro 6000 Blackwell Server Edition AI Frontier Server

Regular price: €106.069,00 EUR
Tax included. Shipping is calculated at checkout.

K-AI 576 Genoa RTXPro6000 12000TOPS

576 GB ECC VRAM Frontier Research Server
6x RTX Pro 6000 Server Edition | EPYC Genoa | 12 000 TOPS INT8

12 000 TOPS INT8
576 GB ECC VRAM pool
BCM PCIe Gen5 switch
Frontier on-prem research

Published external references. Not measured on Kentino hardware.

A 7U rack-mount frontier-tier inference platform with six NVIDIA RTX Pro 6000 Blackwell Server Edition passive cards pooled to 576 GB ECC VRAM, one AMD EPYC 9354 Genoa CPU (32C/64T), 768 GB DDR5-4800 ECC (all 12 channels populated), a 4 TB NVMe boot drive, and five 1200 W server PSUs. The on-board Broadcom PCIe Gen5 switch fans out uniformly to all 6 GPU slots. The pool holds DeepSeek V3 at Q4 (~404 GB) with room for long context, Kimi-K2 at Q2, and Mistral Large 3 at Q2-Q3: the full open-weight frontier, on-prem.

Hardware

Component | Detail
GPUs | 6x NVIDIA RTX Pro 6000 Blackwell Server Edition, 96 GB ECC each (passive, 600 W, PCIe 5.0 x16, 2 000 INT8 TOPS per card)
VRAM pool | 576 GB total across 6 cards (no NVLink; P2P over PCIe Gen5 at ~55-60 GB/s per direction)
CPU | AMD EPYC 9354 Genoa (32C/64T, 280 W, 128x PCIe 5.0 lanes, 12-channel DDR5)
Motherboard | ASRock Rack GENOAD8X-2T/BCM (SP5 Genoa, integrated Broadcom PEX PCIe Gen5 switch, 12x DDR5, 2x 10 GbE, IPMI)
System RAM | 768 GB DDR5-4800 ECC RDIMM (12x 64 GB, all channels populated, ~460 GB/s aggregate)
Boot / storage | 4 TB NVMe M.2 (PCIe 4.0 x4), sized for frontier checkpoint staging
Power supply | 5x 1200 W server PSU set (HP-compatible, 6 kW total)
Chassis | 7U 8-GPU rack-mount, 10 PCIe slot capacity, active Gen5 risers
Cooling | SP5 Genoa tower cooler, 8x 120 mm chassis fans, passive GPU cards; front-to-back datacenter airflow required
Network | Onboard dual 10 GbE (Intel X550)
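
The headline figures on this page follow from the per-component rows above. A minimal Python sketch of that arithmetic, assuming an 8-byte (64-bit) data path per DDR5 channel, which the table does not state; the constant and function names are illustrative only:

```python
# Per-component figures copied from the hardware table.
GPU_COUNT = 6
VRAM_PER_GPU_GB = 96
INT8_TOPS_PER_GPU = 2000
DDR5_CHANNELS = 12
DDR5_MEGATRANSFERS_PER_S = 4800   # DDR5-4800
BYTES_PER_TRANSFER = 8            # assumption: 64-bit data path per channel


def aggregate_specs():
    """Return pooled VRAM (GB), total INT8 TOPS, and peak RAM bandwidth (GB/s)."""
    ram_bandwidth_gb_s = (
        DDR5_CHANNELS * DDR5_MEGATRANSFERS_PER_S * BYTES_PER_TRANSFER / 1000
    )
    return {
        "vram_pool_gb": GPU_COUNT * VRAM_PER_GPU_GB,
        "int8_tops": GPU_COUNT * INT8_TOPS_PER_GPU,
        "ram_bandwidth_gb_s": ram_bandwidth_gb_s,
    }


if __name__ == "__main__":
    for name, value in aggregate_specs().items():
        print(f"{name}: {value}")
```

These are theoretical peaks, not measurements. The RAM bandwidth works out to 460.8 GB/s, in line with the ~460 GB/s quoted in the table.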

Power envelope

  • GPU draw: 6 x 600 W = 3 600 W
  • System total at full load: ~4 080 W
  • PSU total: 6 000 W (5x 1200 W) — 32% headroom
  • No power-cap required for steady-state inference
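
The headroom figure can be reproduced from the numbers in this list. The sketch below is illustrative: the `failed_psus` parameter is hypothetical and is there only to show what a single PSU failure would leave, since this listing does not state whether the PSU set is wired for redundancy.

```python
# Figures copied from the power envelope list.
GPU_COUNT = 6
GPU_WATTS = 600
SYSTEM_FULL_LOAD_W = 4080   # listed as "~4 080 W", so approximate
PSU_COUNT = 5
PSU_WATTS = 1200


def power_budget(failed_psus=0):
    """Headroom against the listed full-load draw with some PSUs offline."""
    capacity_w = (PSU_COUNT - failed_psus) * PSU_WATTS
    headroom_w = capacity_w - SYSTEM_FULL_LOAD_W
    return {
        "gpu_draw_w": GPU_COUNT * GPU_WATTS,
        "capacity_w": capacity_w,
        "headroom_w": headroom_w,
        # Headroom expressed as a share of available PSU capacity.
        "headroom_pct": round(100 * headroom_w / capacity_w, 1),
    }


if __name__ == "__main__":
    print(power_budget())               # all five PSUs online
    print(power_budget(failed_psus=1))  # hypothetical single-PSU failure
```

With all five PSUs the headroom is 1 920 W, or 32% of capacity. With one PSU offline the remaining 4 800 W would still cover the listed full-load draw, with 15% to spare.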

Lane topology

The GENOAD8X-2T/BCM integrates an on-board Broadcom PEX PCIe Gen5 switch. The EPYC Genoa root complex (128 Gen5 lanes) sits upstream of the switch, which fans out uniformly to all 6 GPU slots at Gen5 x16 end-to-end via active risers. The result is a clean single-root topology with simpler NUMA tuning than a dual-socket board. There is no NVLink; GPU-to-GPU P2P runs over PCIe at ~55-60 GB/s per direction.
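
For context on the P2P figure, a PCIe Gen5 lane signals at 32 GT/s with 128b/130b encoding, so the theoretical ceiling of an x16 link can be worked out directly. A small sketch of that calculation (function names are illustrative; packet and protocol overhead are ignored):

```python
def pcie_peak_gb_s(gt_per_s=32.0, lanes=16, encoding=128 / 130):
    """Theoretical one-direction PCIe bandwidth in GB/s, before protocol overhead."""
    return gt_per_s * lanes * encoding / 8


def link_efficiency(quoted_gb_s, peak_gb_s):
    """Fraction of the theoretical peak that a quoted figure represents."""
    return quoted_gb_s / peak_gb_s


if __name__ == "__main__":
    peak = pcie_peak_gb_s()
    print(f"Gen5 x16 peak: {peak:.1f} GB/s per direction")
    for quoted in (55, 60):
        print(f"{quoted} GB/s is {link_efficiency(quoted, peak):.0%} of peak")
```

The ceiling is about 63 GB/s per direction, so the quoted ~55-60 GB/s corresponds to roughly 87-95% of the raw link rate.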

What you can run

With 576 GB of pooled ECC VRAM on Blackwell silicon with native fp8, this server runs the full Chinese and Western open-weight frontier at research-grade quants: DeepSeek V3 Q4 (~404 GB) with long context, Kimi-K2 Q2, Mistral Large 3 Q2-Q3, GLM-5 Q2, Qwen3-Coder-480B Q4.

LLMs — text / reasoning / coding

Chinese frontier

  • DeepSeek V3 / R1 / V3.1 / V3.2 at Q4_K_M (~404 GB) comfortable with long context (~5-8 tok/s single vLLM TP-6, published reference); fp8 native (~670 GB) with RAM spill
  • Kimi-K2 (Base / Instruct / Thinking) at Q2_K (~375 GB) comfortable (~5-8 tok/s single, published reference)
  • GLM-5 / GLM-5.1 (~745B/44B) at Q2_K (~260 GB) comfortable; Q3 (~420 GB) with RAM spill
  • Qwen3-Coder-480B-A35B at Q4_K_M (~270 GB) with long context
  • Qwen3-235B-A22B at bf16 (~470 GB) or fp8 (~240 GB)
  • ERNIE-4.5-424B-A47B at Q4 (~240 GB) with full 128k ctx
  • Intern-S1-Pro (1T/22B active, SAGE) at Q2_K (~325 GB) comfortable
  • Hunyuan-Large A52B at Q4 (~220 GB); MiniMax-M1 at Q4 (~260 GB)

Western frontier

  • Mistral Large 3 (675B/41B MoE, Apache 2.0) at Q2-Q3 (~243-317 GB) comfortable (~20-30 tok/s single, published reference)
  • Llama 4 Maverick (400B/17B) at Q4_K_M (~232 GB) with long ctx (~45-55 tok/s single, published reference)
  • Llama-3.1-Nemotron Ultra 253B at fp8 (~253 GB) or bf16 with RAM spill
  • Grok-1 314B at Q4 (~182 GB); Snowflake Arctic at Q4 (~278 GB)
  • DBRX Instruct 132B/36B at bf16 (~264 GB) or fp8 multi-instance
  • All 70-120B class models at bf16 with room to spare
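
A quick way to sanity-check the sizes in these lists is to convert each one to average bits per weight. The sketch below uses only parameter counts and sizes quoted above, takes 1 GB as 1e9 bytes, and counts weights only; KV cache and activations come on top. Function names are illustrative.

```python
# (model, quant, parameters in billions, listed size in GB), from the lists above.
LISTED = [
    ("Qwen3-Coder-480B-A35B", "Q4_K_M", 480, 270),
    ("ERNIE-4.5-424B-A47B", "Q4", 424, 240),
    ("Llama 4 Maverick", "Q4_K_M", 400, 232),
    ("Grok-1", "Q4", 314, 182),
    ("GLM-5", "Q2_K", 745, 260),
    ("DBRX Instruct", "bf16", 132, 264),
]


def implied_bits_per_weight(params_billion, size_gb):
    """Average bits stored per parameter, taking 1 GB = 1e9 bytes."""
    return size_gb * 8 / params_billion


def estimated_size_gb(params_billion, bits_per_weight):
    """Inverse: weight footprint in GB for a given average bit width."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for model, quant, params_b, size_gb in LISTED:
        bpw = implied_bits_per_weight(params_b, size_gb)
        print(f"{model:24s} {quant:7s} {bpw:5.2f} bits/weight")
```

The Q4 entries land at about 4.5-4.6 bits per weight and the bf16 entry at exactly 16. The same ratio can be used in reverse, via `estimated_size_gb`, to approximate the footprint of a model that is not listed here.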

Vision-Language Models

Qwen3-VL-235B-A22B flagship VLM; InternVL3.5-241B-A28B Q4 (~135 GB); GLM-4.5V / 4.6V 106B bf16 (~210 GB); Llama 3.2 90B Vision bf16; Pixtral Large 124B fp8; Molmo 72B bf16.

Image generation

HunyuanImage-3.0 Instruct tier (3x 80 GB) — fits with headroom; FLUX.1 [dev] / [schnell] / Kontext multi-instance (~15-20 s per 1024x1024 image on single RTX Pro 6000 fp8, published reference); SD 3.5 Large; SDXL; AuraFlow; OmniGen; HunyuanImage-2.1; Kolors 2.0.

Video generation

Wan 2.2 T2V-A14B / I2V-A14B dual-expert MoE bf16 (~54 GB); HunyuanVideo 13B bf16 comfortable; Open-Sora 2.0 (11B) bf16; Mochi-1 (10B) fp16; NVIDIA Cosmos Predict 2 up to 14B; CogVideoX-5B; LTX-Video; Pyramid Flow.

Audio / Speech / TTS

Full stack resident concurrently: Whisper v3 large, Parakeet-TDT 1.1B, Canary 1B, Moshi 7B realtime, Qwen3-Omni, Step-Audio R1, CosyVoice 3.0, Kokoro, Stable Audio Open.

Multi-model / multi-tenant serving

  • DeepSeek V3 Q4 inference + FLUX image + HunyuanVideo + Whisper/Moshi realtime voice all resident simultaneously
  • Concurrent 70B tensor-parallel + 235B-MoE on separate PCIe domains via the Broadcom switch
  • Research A/B evaluation: 3 frontier open-weight models resident concurrently
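
Whether a given mix of resident models fits can be estimated by summing their weight footprints against the 576 GB pool. A rough sketch, using sizes quoted elsewhere on this page; the `reserve_fraction` allowance for KV cache and runtime overhead is a hypothetical placeholder, not a Kentino figure:

```python
POOL_GB = 576  # 6 x 96 GB, from the hardware table


def resident_fit(model_sizes_gb, pool_gb=POOL_GB, reserve_fraction=0.10):
    """Check a set of resident models against the VRAM pool.

    reserve_fraction is a hypothetical allowance for KV cache, activations
    and runtime overhead; tune it for the real workload.
    Returns (fits, spare_gb), where spare_gb is negative on overflow.
    """
    budget_gb = pool_gb * (1 - reserve_fraction)
    used_gb = sum(model_sizes_gb.values())
    return used_gb <= budget_gb, round(budget_gb - used_gb, 1)


if __name__ == "__main__":
    llm_plus_video = {"DeepSeek V3 Q4": 404, "Wan 2.2 A14B bf16": 54}
    three_models = {
        "Llama 4 Maverick Q4_K_M": 232,
        "Grok-1 Q4": 182,
        "InternVL3.5-241B Q4": 135,
    }
    print(resident_fit(llm_plus_video))
    print(resident_fit(three_models))
    print(resident_fit(three_models, reserve_fraction=0.0))
```

This treats the pool as one flat space. In practice each model is sharded across whole 96 GB cards, so real placement is coarser than the sum suggests.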

Target workloads

  • Frontier open-weight research lab — on-prem access to DeepSeek V3 / Kimi-K2 / Mistral Large 3 class without cloud egress
  • Sovereign AI deployment — EU data residency with an Apache 2.0 / MIT model stack
  • Enterprise multi-model RAG + agentic platform — several 200-400B MoE models resident
  • Model evaluation / safety research comparing frontier Chinese vs Western open weights
  • Inference-at-scale for regulated industries requiring air-gap + ECC + PCIe Gen5

Published performance references

External references | Not measured on Kentino hardware

Benchmark | Result
RTX Pro 6000 per-card INT8 TOPS | 2 000 TOPS
vLLM, DeepSeek V3 Q4 on 6x RTX Pro 6000 (single) | ~25-40 tok/s
vLLM, DeepSeek V3 Q4 on 6x RTX Pro 6000 (batch-32) | 200-400 tok/s aggregate
FLUX.1 [dev] fp8 on single RTX Pro 6000 | ~15-20 s per 1024x1024 image

Exact figures confirmed at PoC stage. Kentino will publish first-party numbers after initial customer build.
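
To relate the aggregate batch-32 figure to what one user sees, the sketch below divides aggregate throughput evenly across streams and converts a token rate into wall-clock time. Even sharing across the batch is a simplifying assumption, and the function names are illustrative.

```python
def per_stream_tok_s(aggregate_tok_s, batch_size):
    """Per-request decode rate, assuming the batch shares throughput evenly."""
    return aggregate_tok_s / batch_size


def seconds_for(tokens, tok_s):
    """Wall-clock seconds to generate a given number of tokens at a steady rate."""
    return tokens / tok_s


if __name__ == "__main__":
    for aggregate in (200, 400):   # batch-32 range from the table
        rate = per_stream_tok_s(aggregate, 32)
        print(f"{aggregate} tok/s aggregate -> {rate} tok/s per stream")
    for single in (25, 40):        # single-stream range from the table
        print(f"1000 tokens at {single} tok/s: {seconds_for(1000, single)} s")
```

Under that assumption the batch-32 range works out to 6.25-12.5 tok/s per stream, and a 1 000-token reply at the single-stream rate takes 25-40 s.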

Not ideal for

  • Kimi-K2 / DeepSeek V3 at Q4 real-speed production serving — step up to the 768 GB Turin dual
  • Training from scratch on frontier-class models — no NVLink, PCIe P2P only
  • Plug-and-play deployment — frontier MoE serving needs a skilled MLOps team

Warranty and lead time

2 years parts warranty
1 year labor warranty
10-28 days lead time

Build includes assembly, BIOS config, driver install, burn-in, memtest, functional verification, and LLM environment setup (vLLM / SGLang / llama.cpp / CUDA 13 stack with fp8 Blackwell kernels). Lead time depends on component availability and is confirmed at order.

Recommended add-ons

  • NVIDIA ConnectX-5 MCX555A-ECAT 100 GbE NIC for multi-node scale-out
  • Second 4 TB NVMe for dataset / model library
  • Full 24U rack cabinet with front perforated door
  • Online UPS 10 kVA
  • Managed PDU