Μετάβαση στις πληροφορίες προϊόντος
1 από 7

Kentino s.r.o.

K-AI 192 Turin2U RTXPro6000 4000TOPS — 2× RTX Pro 6000 Blackwell Server Edition — 2U Turin SP5

K-AI 192 Turin2U RTXPro6000 4000TOPS — 2× RTX Pro 6000 Blackwell Server Edition — 2U Turin SP5

Κανονική τιμή €56.600,00 EUR
Κανονική τιμή Τιμή έκπτωσης €56.600,00 EUR
Έκπτωση Εξαντλήθηκε
Οι φόροι συμπεριλαμβάνονται. Τα έξοδα αποστολής υπολογίζονται κατά την ολοκλήρωση της αγοράς.

K-AI 192 Turin2U RTXPro6000 4000TOPS

192 GB ECC Blackwell Flagship Pair
2x RTX Pro 6000 Server Edition | EPYC Turin SP5 | 4 000 TOPS INT8

4 000
INT8 TOPS
192 GB
ECC VRAM
Blackwell
fp8 native
2-card
minimal TP

Two passive RTX Pro 6000 Blackwell Server Edition cards -- 96 GB ECC each. Less tensor-parallel overhead than 4- or 8-card builds. Datacenter flagship pair on a Gen5/DDR5 2U platform with genuine 1+1 redundant power.

A 2U rack-mount inference server with two passive RTX Pro 6000 Blackwell Server Edition cards (96 GB ECC GDDR7 per card), one AMD EPYC 9335 Turin CPU (32C/64T, 3.0/4.4 GHz), 512 GB DDR5-4800 ECC, 5.76 TB datacenter Gen5 NVMe, and a 1+1 redundant 2.7 kW 80+ Platinum CRPS power supply. Starting from €56 600 ex VAT. For 70B dense bf16 and mid-size MoE, fewer big cards beat more small cards -- two-card tensor parallelism has minimal communication overhead, and each 96 GB card carries a complete copy of most models.

The same 192 GB Blackwell pair as our 4U Rome build, in a 2U rack-dense ASRock chassis with full Gen5 host-side, DDR5-4800 memory, and a genuine 1+1 redundant 2.7 kW Platinum CRPS power supply. Pick this build when rack density matters, when your grant or procurement spec mandates a modern PCIe 5.0 / DDR5 platform, or when redundant power is a requirement rather than an upsell.

Hardware

Component Detail
GPUs 2x NVIDIA RTX Pro 6000 Blackwell Server Edition 96 GB ECC GDDR7 (passive, 600 W, PCIe 5.0 x16, dual-slot)
VRAM pool 192 GB ECC (96 GB x 2) -- each card holds a 70B bf16 model standalone
CPU AMD EPYC 9335 Turin (32C/64T, 3.0/4.4 GHz, 210 W, SP5, 128x PCIe 5.0 lanes, Zen5c, 256 MB L3)
Motherboard ASRock Rack 2U4G-GENOA/M3 (SP5, 4x PCIe 5.0 x16 dual-slot GPU, 8x DDR5 1DPC, OCP 3.0, IPMI AST2600)
System RAM 512 GB DDR5-4800 ECC RDIMM (8x 64 GB, 1DPC fully populated -- max bandwidth configuration)
Boot / storage Kioxia CD8-P 3.84 TB Gen5 U.3 (hot-tier, 1 DWPD, ~12 GB/s read) + Kioxia CD8-P 1.92 TB Gen5 U.3 (boot OS tier) -- 5.76 TB total datacenter Gen5 NVMe
Power supply 1+1 redundant 2.7 kW 80+ Platinum CRPS (2x 1350 W at 230 V) -- genuine N+1 redundancy; one PSU sustains full inference load
Chassis 2U rack-mount with front-to-back directed airflow (80 mm high-static-pressure fans). 24/7-capable.
Cooling SP5 active CPU heatsink + 3x 80x38 mm front intake + 1x 80x80 mm rear exhaust (designed for 4x passive GPU thermal load; 2-card layout provides ample thermal headroom)
Network Intel X710-T2L PCIe dual 10GBASE-T + OCP 3.0 slot available for 25/100 GbE upgrade

Power envelope

  • GPU draw: 2x 600 W = 1 200 W
  • System total at full load: ~1 510 W
  • PSU config: 1+1 redundant CRPS, 2x 1350 W at 230 V (2 700 W total)
  • Headroom: 44.1 % under typical inference load
  • Genuine N+1 redundancy -- one PSU sustains full inference load; no single-PSU failure risk

Lane topology

PCIe Gen5 x16 end-to-end -- both host and card native Gen5. Direct root-complex connection, no PCIe switch. One PCIe 5.0 x16 single-slot + one PCIe 5.0 x8 slot remain available (NIC occupies the x8 slot). No NVLink -- inter-GPU peer-to-peer via PCIe. Gen5 bandwidth eliminates the Gen4 host-cap present in the 4U Rome sibling.

What you can run

With 192 GB ECC VRAM on just two Blackwell cards with native fp8/fp4, this is the cleanest path to dense 70B at bf16 and mid-size MoE. Two independent 70B streams -- one per card -- or 200B MoE across both with minimal 2-way TP overhead.

LLMs -- text / reasoning / coding

Chinese frontier

  • Qwen3 / Qwen3.5 (Alibaba): Qwen3-235B-A22B Q4 (~132 GB) comfortable with long ctx (~15-25 tok/s single-stream across 2 cards); Qwen3-Coder-480B-A35B Q2 (~160 GB); Qwen3.5-122B-A10B fp8 (~75 GB); Qwen3-32B dense bf16 with huge KV; QwQ-32B bf16
  • DeepSeek: DeepSeek-V3/R1 Q2 (~215 GB with small RAM spill) -- Blackwell runs fp8 natively; DeepSeek-R2 32B bf16 two concurrent streams (one per card)
  • GLM / Z.ai: GLM-4.5 / 4.6 / 4.7 Q4 (~177 GB) -- hero config at this tier; GLM-4.5-Air fp8 or bf16 with huge KV
  • Tencent Hunyuan: Hunyuan-Large Q3 (~160 GB) -- 389B MoE with 256k ctx; Hunyuan-A13B fp8 native (~80 GB) with huge KV
  • Others: Baidu ERNIE-4.5-424B Q3 (~180 GB); InternVL3.5-241B-A28B Q4 (~135 GB); MiniMax-M1 Q3 (~180 GB)

Western frontier

  • Meta Llama: Llama 3.3 70B bf16 on one card -- two independent concurrent 70B streams (~20-30 tok/s per stream); Llama 4 Scout bf16 (~218 GB, tight); Llama 4 Maverick Q3 (~188 GB)
  • Mistral: Mistral Large 2 / Pixtral Large / Devstral 2 123B Q6 (~88 GB) single-card or bf16 across both; Mistral Small 3 multi-stream
  • OpenAI (open weights): gpt-oss-120b MXFP4 native (80 GB) -- fits on ONE card, two independent concurrent streams
  • NVIDIA Nemotron: Llama-3.1-Nemotron Ultra 253B Q4 (~147 GB); Super 49B bf16 on single card
  • Others: Cohere Command R+ 104B Q6 (~85 GB) on one card; Google Gemma 3 27B bf16 multiple concurrent streams

Vision-Language Models

InternVL3.5-241B-A28B Q4 (~135 GB); Qwen3-VL-235B-A22B Q4; Qwen3-VL-32B bf16 single-card; Pixtral Large 124B bf16 or Q6; Llama 3.2 90B Vision bf16 (~180 GB); Molmo 72B bf16 (~144 GB); GLM-4.6V 106B fp8; Gemma 3 27B multimodal x 2-3 concurrent streams.

Image generation

FLUX.1 [dev] bf16 multiple concurrent streams; FLUX.1 Kontext [dev]; FLUX Tools; SD 3.5 Large bf16 concurrent; HunyuanImage-2.1 bf16 (~34 GB) x 2-4 concurrent; HunyuanImage-3.0 base (80B MoE, 13B active) bf16 -- fits on one card; HunyuanDiT; Kolors / Kolors 2.0; AuraFlow; OmniGen v1; PixArt-Sigma.

Video generation

Wan 2.2 MoE dual-expert bf16 full context -- fits on one card, two concurrent generation streams; Wan 2.2 TI2V-5B; HunyuanVideo 13B bf16 both experts; HunyuanVideo 1.5; CogVideoX-5B bf16; Open-Sora 2.0 11B bf16; Mochi-1 bf16 (~42 GB); LTX-Video; Pyramid Flow; SVD / SV3D / SV4D; NVIDIA Cosmos Predict 2.

Audio / Speech / TTS

  • ASR: Whisper v3 large / turbo (~50x realtime); Parakeet-TDT; Canary 1B; Qwen3-ASR; SenseVoice
  • TTS: CosyVoice 2/3; Kokoro 82M; XTTS v2; Stable Audio Open; Step-Audio-EditX
  • Realtime / S2S: Kyutai Moshi 7B; Step-Audio 2 mini/R1; Qwen2.5-Omni-7B
  • Music / SFX: MusicGen / AudioGen / Bark; SeamlessM4T v2

Multi-model / multi-tenant serving

  • Two independent 70B streams -- one per card, simplest form of tenant isolation
  • Dense 70B bf16 + supporting stack -- LLM on card 1, image/video/audio on card 2
  • 200B MoE across both cards -- minimal tensor-parallel overhead (2-way split)
  • fp8-native frontier -- DeepSeek V3 family, Hunyuan-Large fp8 with Blackwell native paths

Target workloads

  • Dense 70B bf16 inference -- two cards tensor-parallel with minimal overhead, or one model per card for streaming
  • 100-150B MoE at Q4-Q6 (GLM-4.5-Air, Qwen3.5-122B-A10B, Hunyuan-A13B, Llama 4 Scout)
  • FP8-native frontier inference (DeepSeek V3 family, Hunyuan, Llama 4) -- Blackwell runs fp8 natively
  • Scientific computation requiring datacenter-grade Gen5 NVMe throughput and ECC memory
  • Image + video generation studio at bf16 (Wan 2.2 T2V-A14B, HunyuanVideo 13B, FLUX.1 [dev])
  • Rack-density-constrained deployments -- 2U form factor vs the 4U Rome equivalent at same VRAM
  • Procurement specs mandating PCIe 5.0 / DDR5 platform or redundant PSU

Measured performance

Published references | NVIDIA RTX Pro 6000 Blackwell Server Edition datasheet + community benchmarks

Benchmark Result
Per-card INT8 TOPS (NVIDIA datasheet) 2 000 TOPS
Aggregate INT8 TOPS (2 cards) 4 000 TOPS
Memory bandwidth per card ~1 800 GB/s, 96 GB ECC GDDR7
Llama 3.3 70B bf16 per-card (community) 15-25 tok/s single-stream, 60-90 tok/s batch -- expected improvement from Gen5 host-side memory path in streaming batch workloads vs Gen4 host
Gen5 host-side advantage (single-card same silicon) PCIe 5.0 x16 end-to-end reduces host-device transfer latency for streaming batch workloads; on-card compute-bound tasks see identical throughput to Gen4-hosted builds
Dual-card tensor-parallel 70B (community) ~30-45 tok/s single-stream expected
Blackwell fp8 native DeepSeek-V3 fp8, Hunyuan-A13B fp8 run without bf16 upcast

Published external references, not measured on Kentino hardware. Kentino will publish first-party numbers after the first customer build.

Not ideal for

  • Very high concurrency multi-tenant serving -- 4x L40 or 6x L4 distributes better across more cards
  • Heavy KV cache at very long context -- step up to K-AI 576 Genoa RTXPro6000 12000TOPS
  • Training -- Kentino does not sell H-class NVLink fabrics
  • Budget inference at this VRAM pool -- the 4U Rome K-AI 192 RTXPro6000 4000TOPS build is lower-cost if Gen4 host-side is acceptable and PSU redundancy is not required

Warranty and lead time

2 years
parts warranty
1 year
labor warranty
14-21 days
lead time

NVIDIA OEM 3-year warranty on RTX Pro 6000 Server Edition + 36-month chassis warranty + Kentino integration warranty. Build includes assembly, BIOS/firmware configuration, IPMI setup, driver install, burn-in testing, and functional verification. Lead time of 14-21 business days reflects reseller order for Turin-class components; confirmed at order placement.

Recommended add-ons

  • Expand to 4-card configuration -- chassis has 4 GPU bays natively (current build uses 2 of 4), upgrade path to K-AI 384 Turin2U RTXPro6000 8000TOPS
  • Add 25 GbE or 100 GbE via OCP 3.0 slot (Mellanox ConnectX-5/6 OCP variant)
  • Additional Kioxia CD8-P NVMe in the 2 remaining U.2 bays for RAID or scratch storage
  • Upgrade storage tier to Samsung PM1743 or Kioxia CM7-V for higher endurance (3 DWPD)
  • 24U rack cabinet + online UPS 5 kVA
Προβολή όλων των λεπτομερειών