{"product_id":"k-ai-192-turin2u-rtxpro6000-4000tops-2-rtx-pro-6000-blackwell-server-edition-2u-turin-sp5","title":"K-AI 192 Turin2U RTXPro6000 4000TOPS — 2× RTX Pro 6000 Blackwell Server Edition — 2U Turin SP5","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 192 Turin2U RTXPro6000 4000TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e192 GB ECC Blackwell Flagship Pair\u003cbr\u003e2x RTX Pro 6000 Server Edition | EPYC Turin SP5 | 4 000 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e4 000\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eINT8 TOPS\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e192 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eECC VRAM\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eBlackwell\u003c\/div\u003e\n\u003cdiv 
style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003efp8 native\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e2-card\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eminimal TP\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eTwo passive RTX Pro 6000 Blackwell Server Edition cards -- 96 GB ECC each. Less tensor-parallel overhead than 4- or 8-card builds. Datacenter flagship pair on a Gen5\/DDR5 2U platform with genuine 1+1 redundant power.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 2U rack-mount inference server with two passive RTX Pro 6000 Blackwell Server Edition cards (96 GB ECC GDDR7 per card), one AMD EPYC 9335 Turin CPU (32C\/64T, 3.0\/4.4 GHz), 512 GB DDR5-4800 ECC, 5.76 TB datacenter Gen5 NVMe, and a 1+1 redundant 2.7 kW 80+ Platinum CRPS power supply. Starting from €56 600 ex VAT. For 70B dense bf16 and mid-size MoE, fewer big cards beat more small cards -- two-card tensor parallelism has minimal communication overhead, and each 96 GB card can also hold a complete 70B model on its own at fp8 (~70 GB).\u003c\/p\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:32px\"\u003eThe same 192 GB Blackwell pair as our 4U Rome build, in a 2U rack-dense ASRock chassis with full Gen5 host-side, DDR5-4800 memory, and a genuine 1+1 redundant 2.7 kW Platinum CRPS power supply. 
Pick this build when rack density matters, when your grant or procurement spec mandates a modern PCIe 5.0 \/ DDR5 platform, or when redundant power is a requirement rather than an upsell.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x NVIDIA RTX Pro 6000 Blackwell Server Edition 96 GB ECC GDDR7 (passive, 600 W, PCIe 5.0 x16, dual-slot)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e192 GB ECC (96 GB x 2) -- each card holds a 70B model standalone at fp8 (~70 GB); 70B at bf16 (~140 GB) spans both cards\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 9335 Turin (32C\/64T, 3.0\/4.4 GHz, 210 W, SP5, 128x PCIe 5.0 lanes, Zen 5, 256 MB L3)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack 2U4G-GENOA\/M3 (SP5, 4x PCIe 5.0 x16 dual-slot GPU, 8x DDR5 1DPC, OCP 3.0, IPMI AST2600)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem 
RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e512 GB DDR5-4800 ECC RDIMM (8x 64 GB, all 8 slots populated at 1DPC -- the board's maximum-bandwidth configuration)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eKioxia CD8-P 3.84 TB Gen5 U.3 (hot-tier, 1 DWPD, ~12 GB\/s read) + Kioxia CD8-P 1.92 TB Gen5 U.3 (boot OS tier) -- 5.76 TB total datacenter Gen5 NVMe\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e1+1 redundant 80+ Platinum CRPS, 2x 1350 W at 230 V (2.7 kW combined) -- one module carries the system on its own while total draw stays below 1 350 W\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2U rack-mount with front-to-back directed airflow (80 mm high-static-pressure fans). 
24\/7-capable.\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP5 active CPU heatsink + 3x 80x38 mm front intake + 1x 80x80 mm rear exhaust (designed for 4x passive GPU thermal load; 2-card layout provides ample thermal headroom)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eIntel X710-T2L PCIe dual 10GBASE-T + OCP 3.0 slot available for 25\/100 GbE upgrade\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 2x 600 W = 1 200 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~1 510 W\u003c\/li\u003e\n\u003cli\u003ePSU config: 1+1 redundant CRPS, 2x 1350 W at 230 V (2 700 W combined)\u003c\/li\u003e\n\u003cli\u003eHeadroom: 44.1 % of the combined 2 700 W at the ~1 510 W full-load estimate\u003c\/li\u003e\n\u003cli\u003eRedundancy: one 1 350 W module carries the system alone only while draw stays below 1 350 W; at the ~1 510 W full-load estimate, single-PSU operation requires lowering the GPU power limit to roughly 520 W per card\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003ePCIe Gen5 x16 end-to-end -- both host and card native Gen5. Direct root-complex connection, no PCIe switch. 
One PCIe 5.0 x16 single-slot expansion slot remains free; the PCIe 5.0 x8 slot is occupied by the NIC. No NVLink -- inter-GPU peer-to-peer traffic runs over PCIe. Gen5 bandwidth removes the Gen4 host-side cap of the 4U Rome sibling.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 192 GB ECC VRAM on just two Blackwell cards with native fp8\/fp4, this is the cleanest path to dense 70B at bf16 and mid-size MoE. Two independent 70B fp8 streams -- one per card -- or a 200B MoE across both with minimal 2-way TP overhead.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs -- text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3 \/ Qwen3.5 (Alibaba):\u003c\/strong\u003e Qwen3-235B-A22B Q4 (~132 GB) comfortable with long ctx (~15-25 tok\/s single-stream across 2 cards); Qwen3-Coder-480B-A35B Q2 (~160 GB); Qwen3.5-122B-A10B fp8 (~122 GB, across both cards); Qwen3-32B dense bf16 with huge KV; QwQ-32B bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek:\u003c\/strong\u003e DeepSeek-V3\/R1 Q2 (~215 GB with small RAM spill) -- the fp8 release (~671 GB) exceeds the 192 GB pool; DeepSeek-R2 32B bf16 two concurrent streams (one per card)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM \/ Z.ai:\u003c\/strong\u003e GLM-4.5 \/ 4.6 \/ 4.7 Q4 (~177 GB) -- hero config at this tier; GLM-4.5-Air fp8 (~106 GB) with huge 
KV\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTencent Hunyuan:\u003c\/strong\u003e Hunyuan-Large Q3 (~160 GB) -- 389B MoE with 256k ctx; Hunyuan-A13B fp8 native (~80 GB) with huge KV\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Baidu ERNIE-4.5-424B Q3 (~180 GB); InternVL3.5-241B-A28B Q4 (~135 GB); MiniMax-M1 Q3 (~180 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMeta Llama:\u003c\/strong\u003e Llama 3.3 70B fp8 (~70 GB) on one card -- two independent concurrent 70B streams (~20-30 tok\/s per stream), or bf16 (~140 GB) across both cards; Llama 4 Scout fp8 (~109 GB; bf16 at ~218 GB exceeds the pool); Llama 4 Maverick Q3 (~188 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral:\u003c\/strong\u003e Mistral Large 2 \/ Pixtral Large \/ Devstral 2 123B Q5 (~88 GB) single-card or fp8 (~123 GB) across both; Mistral Small 3 multi-stream\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOpenAI (open weights):\u003c\/strong\u003e gpt-oss-120b MXFP4 native (sized for a single 80 GB GPU) -- fits on ONE card, two independent concurrent streams\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNVIDIA Nemotron:\u003c\/strong\u003e Llama-3.1-Nemotron Ultra 253B Q4 (~147 GB); Super 49B fp8 on a single card (bf16 at ~98 GB needs both)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Cohere Command R+ 104B Q6 (~85 GB) on one card; Google Gemma 3 27B bf16 multiple concurrent streams\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eInternVL3.5-241B-A28B Q4 (~135 GB); Qwen3-VL-235B-A22B Q4; Qwen3-VL-32B bf16 single-card; Pixtral Large 124B fp8 or Q6; Llama 3.2 90B Vision bf16 (~180 GB); Molmo 72B bf16 (~144 GB); 
GLM-4.6V 106B fp8; Gemma 3 27B multimodal x 2-3 concurrent streams.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] bf16 multiple concurrent streams; FLUX.1 Kontext [dev]; FLUX Tools; SD 3.5 Large bf16 concurrent; HunyuanImage-2.1 bf16 (~34 GB) x 2-4 concurrent; HunyuanImage-3.0 base (80B MoE, 13B active) bf16 (~160 GB) -- spans both cards; HunyuanDiT; Kolors \/ Kolors 2.0; AuraFlow; OmniGen v1; PixArt-Sigma.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 MoE dual-expert bf16 full context -- fits on one card, two concurrent generation streams; Wan 2.2 TI2V-5B; HunyuanVideo 13B bf16; HunyuanVideo 1.5; CogVideoX-5B bf16; Open-Sora 2.0 11B bf16; Mochi-1 bf16 (~42 GB); LTX-Video; Pyramid Flow; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos Predict 2.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo (~50x realtime); Parakeet-TDT; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2\/3; Kokoro 82M; XTTS v2; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi 7B; Step-Audio 2 mini\/R1; Qwen2.5-Omni-7B; SeamlessM4T v2\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX:\u003c\/strong\u003e MusicGen \/ AudioGen \/ Bark; Stable Audio Open\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant 
serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eTwo independent 70B fp8 streams -- one per card, the simplest form of tenant isolation\u003c\/li\u003e\n\u003cli\u003eDense 70B fp8 + supporting stack -- LLM on card 1, image\/video\/audio on card 2\u003c\/li\u003e\n\u003cli\u003e200B MoE across both cards -- minimal tensor-parallel overhead (2-way split)\u003c\/li\u003e\n\u003cli\u003efp8-native models that fit the pool -- Hunyuan-A13B, GLM-4.5-Air, Llama 4 Scout on Blackwell native fp8 paths; DeepSeek V3 family and Hunyuan-Large exceed 192 GB at fp8 and run as low-bit quants\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eDense 70B inference -- bf16 across both cards tensor-parallel with minimal overhead, or one fp8 model per card for streaming\u003c\/li\u003e\n\u003cli\u003e80-150B MoE at Q4 to fp8 (GLM-4.5-Air, Qwen3.5-122B-A10B, Hunyuan-A13B, Llama 4 Scout)\u003c\/li\u003e\n\u003cli\u003eFP8-native inference for models that fit the 192 GB pool (Hunyuan-A13B, GLM-4.5-Air, Llama 4 Scout) -- Blackwell runs fp8 natively; DeepSeek V3 family and Llama 4 Maverick run only as low-bit quants\u003c\/li\u003e\n\u003cli\u003eScientific computation requiring datacenter-grade Gen5 NVMe throughput and ECC memory\u003c\/li\u003e\n\u003cli\u003eImage + video generation studio at bf16 (Wan 2.2 T2V-A14B, HunyuanVideo 13B, FLUX.1 [dev])\u003c\/li\u003e\n\u003cli\u003eRack-density-constrained deployments -- 2U form factor vs the 4U Rome equivalent at the same VRAM\u003c\/li\u003e\n\u003cli\u003eProcurement specs mandating PCIe 5.0 \/ DDR5 platform or redundant PSU\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eReference performance\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 
0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003ePublished references | NVIDIA RTX Pro 6000 Blackwell Server Edition datasheet + community benchmarks\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ePer-card INT8 TOPS (NVIDIA datasheet)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e2 000 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eAggregate INT8 TOPS (2 cards)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e4 000 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eMemory bandwidth per card\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1 800 GB\/s, 96 GB ECC GDDR7\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eLlama 3.3 70B fp8 per-card (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e15-25 tok\/s single-stream, 60-90 tok\/s batch -- decode speed is set by on-card GDDR7 bandwidth; the Gen5 host link mainly speeds model loading and host-device transfers\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eGen5 host-side advantage 
(single-card same silicon)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003ePCIe 5.0 x16 end-to-end doubles host-device bandwidth over Gen4, shortening transfers in streaming batch workloads; on-card compute-bound tasks see identical throughput to Gen4-hosted builds\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eDual-card tensor-parallel 70B fp8 (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~30-45 tok\/s single-stream expected\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eBlackwell fp8 native\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003eHunyuan-A13B fp8 runs without bf16 upcast; DeepSeek-V3 fp8 (~671 GB) exceeds the 192 GB pool\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003ePublished external references, not measured on Kentino hardware. 
Kentino will publish first-party numbers after the first customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eVery high concurrency multi-tenant serving -- 4x L40 or 6x L4 distributes better across more cards\u003c\/li\u003e\n\u003cli\u003eHeavy KV cache at very long context -- step up to K-AI 576 Genoa RTXPro6000 12000TOPS\u003c\/li\u003e\n\u003cli\u003eTraining -- Kentino does not sell H-class NVLink fabrics\u003c\/li\u003e\n\u003cli\u003eBudget inference at this VRAM pool -- the 4U Rome K-AI 192 RTXPro6000 4000TOPS build is lower-cost if Gen4 host-side is acceptable and PSU redundancy is not required\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv 
style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e14-21 business days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eNVIDIA OEM 3-year warranty on RTX Pro 6000 Server Edition + 36-month chassis warranty + Kentino integration warranty. Build includes assembly, BIOS\/firmware configuration, IPMI setup, driver install, burn-in testing, and functional verification. Lead time of 14-21 business days reflects reseller ordering of Turin-class components; confirmed at order placement.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eExpand to 4-card configuration -- chassis has 4 GPU bays natively (current build uses 2 of 4), upgrade path to K-AI 384 Turin2U RTXPro6000 8000TOPS\u003c\/li\u003e\n\u003cli\u003eAdd 25 GbE or 100 GbE via OCP 3.0 slot (Mellanox ConnectX-5\/6 OCP variant)\u003c\/li\u003e\n\u003cli\u003eAdditional Kioxia CD8-P NVMe in the 2 remaining U.2 bays for RAID or scratch storage\u003c\/li\u003e\n\u003cli\u003eUpgrade storage tier to Samsung PM1743 or Kioxia CM7-V for higher endurance (3 DWPD)\u003c\/li\u003e\n\u003cli\u003e24U rack cabinet + online UPS 5 kVA\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e\n","brand":"Kentino s.r.o.","offers":[{"title":"Default 
Title","offer_id":52942435975496,"sku":null,"price":56600.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959","url":"https:\/\/kentino.com\/zh\/products\/k-ai-192-turin2u-rtxpro6000-4000tops-2-rtx-pro-6000-blackwell-server-edition-2u-turin-sp5","provider":"Kentino","version":"1.0","type":"link"}