{"product_id":"k-ai-256-turindual-5090-8-rtx-5090-dual-socket-zen5c-flagship-request-quote-on-cpu","title":"K-AI 256 TurinDual 5090 — 8× RTX 5090 Dual-Socket Zen5c Flagship (Request Quote on CPU)","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 256 TurinDual 5090 13408TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e256 GB VRAM Flagship Inference Server\u003cbr\u003e8x RTX 5090 | Dual EPYC Turin | 13 408 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e13 408\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e256 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eVRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003efp8\u003c\/div\u003e\n\u003cdiv 
style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eBlackwell native\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eGen5\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003ePCIe end-to-end\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:16px;font-size:13px;color:#777\"\u003eCPU pricing is finalized at order; Turin 9005-series prices move weekly in Q2 2026.\u003c\/p\u003e\n\u003cp style=\"margin-top:12px;font-size:15px;color:#aaa\"\u003ePerformance figures are published external references, not measured on Kentino hardware.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 7U rack-mount flagship inference server with eight GeForce RTX 5090 cards (32 GB GDDR7, Blackwell, native fp8) on a dual-socket EPYC Turin (Zen5c, SP5) platform with 768 GB DDR5-4800 ECC (12x 64 GB on 12 of the platform's 24 memory channels), 2 TB NVMe boot, and 5x 1200 W server PSUs. End-to-end PCIe Gen5 to each GPU via active retimer\/redriver risers. 
Runs vLLM, SGLang, llama.cpp, ComfyUI and every major open-weight inference stack out of the box.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e8x NVIDIA GeForce RTX 5090 32 GB GDDR7 (Blackwell, 575 W TGP, PCIe 5.0 x16, fp8 native, 1676 INT8 TOPS\/card with sparsity)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e256 GB aggregate across 8 cards (no NVLink on consumer RTX 5090)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x AMD EPYC Turin 9005-series (Zen5c, SP5, PCIe 5.0); final SKU and price quoted at order\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack TURIN2D24XGM\/500W (dual SP5, PCIe 5.0, 24x DDR5 DIMM slots)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e768 GB DDR5-4800 ECC RDIMM (12x 64 GB populating 12 of the 24 memory channels; 12 slots 
remain free for expansion to 1.5 TB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e5x 1200 W server PSU set (HP-compatible, 6 kW aggregate)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e7U 8-GPU (up to 10 PCIe slots, separate PSU bays)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x SP5 tower coolers + rack-mount front-to-back airflow (industrial fans)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eRisers\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e8x active PCIe Gen5 x16 (retimer\/redriver) — end-to-end Gen5\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard 10 GbE (board-dependent)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 8 x 575 W = 4 600 
W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~5 520 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 6 000 W (5x 1200 W), 8% headroom at spec\u003c\/li\u003e\n\u003cli\u003eKentino ships with the GPU power cap set to 500 W, which drops the total to ~4 920 W (~18% headroom)\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eEach Turin socket has 128 PCIe Gen5 lanes, but in a dual-socket platform part of them form the inter-socket xGMI links, leaving 128-160 usable Gen5 lanes host-side; eight GPUs at Gen5 x16 need 128. Active Gen5 risers carry Gen5 x16 end-to-end to each GPU, so no PCIe switch is required (one CPU per 4-card bank). There is no NVLink: inter-GPU traffic runs over PCIe Gen5 x16 (~63 GB\/s nominal per direction per link), and traffic between the two 4-card banks also crosses the inter-socket link.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 256 GB of pooled VRAM across eight Blackwell cards with native fp8, this server targets 235-480B frontier MoE models at Q4 with long context (the largest with some RAM spill), the DeepSeek V3 family at Q2, and Kimi-K2 at 1.58-bit dynamic quantization with usable throughput.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-235B-A22B\u003c\/strong\u003e (Instruct \/ Thinking \/ \"2507\") Q4 (~132 GB) with long 
context + multi-user batching (~25-40 tok\/s single-stream on 8x RTX 5090, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM-4.5 \/ 4.6 \/ 4.7\u003c\/strong\u003e Q4 (~177 GB) — flagship reasoning\/coding, 200k ctx on 4.6+\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM-5 \/ GLM-5.1\u003c\/strong\u003e Q2 (~260 GB) with minor RAM spill — frontier coding close to Claude Opus 4.6\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek V3 \/ R1 \/ V3.1 \/ V3.2 \/ V3.2-Speciale\u003c\/strong\u003e Q2 (~215 GB) at useful inference speed (~28 tok\/s single-stream on 8x Blackwell, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eKimi-K2\u003c\/strong\u003e 1.58-bit UD-TQ1_0 (~240 GB) — trillion-parameter agent model at usable throughput (~7-10 tok\/s single-stream, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHunyuan-Large\u003c\/strong\u003e 389B\/52B MoE Q4 (~220 GB); \u003cstrong\u003eERNIE-4.5-424B-A47B\u003c\/strong\u003e Q4 (~240 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-Coder-480B-A35B\u003c\/strong\u003e Q4 (~270 GB, slightly over the VRAM pool, so some layers spill to RAM) — SOTA open coding flagship\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMiniMax-M1 \/ Text-01\u003c\/strong\u003e Q4 (~260 GB) 1M context; \u003cstrong\u003eQwen3.5-397B-A17B\u003c\/strong\u003e Q4 (~214 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Large 3\u003c\/strong\u003e (675B\/41B MoE, Apache 2.0) Q3 (~317 GB with spill) — Western frontier open weights\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 4 Maverick\u003c\/strong\u003e (400B\/17B, 128 experts) Q4 (~232 GB) 
multimodal\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama-3.1-Nemotron Ultra 253B\u003c\/strong\u003e Q4 (~119 GB) — matches DeepSeek-R1 at half size\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003egpt-oss-120b\u003c\/strong\u003e in native MXFP4 (~65 GB) fits comfortably, leaving room for several other models\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDevstral 2\u003c\/strong\u003e 123B (Modified MIT) Q6 — top open coding, 256k ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 3.3 70B\u003c\/strong\u003e bf16 (~142 GB) multi-tenant serving (~30-40 tok\/s single-stream per RTX 5090 pair TP2, published reference)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-235B-A22B fp8 (~240 GB on-card); InternVL3.5-241B-A28B (~135 GB Q4); Llama 3.2 90B Vision bf16; Pixtral Large 124B bf16 (~248 GB tight); Qwen3-Omni-30B-A3B; Molmo 72B; ERNIE-4.5-VL; GLM-4.6V full. 
Blackwell fp8 path gives ~2x throughput on vision-tower inference vs Ada.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] \/ Kontext \/ Tools with full bf16 weights resident (~10-18 s\/image per card at fp8, published reference); SD 3.5 Large; HunyuanImage-2.1 (17B, native 2K); HunyuanImage-3.0 80B\/13B MoE; AuraFlow; OmniGen; multi-worker ComfyUI farms.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B \/ I2V-A14B dual-expert bf16 (high-noise and low-noise experts both resident); HunyuanVideo 13B bf16; Open-Sora 2.0 (11B) bf16; CogVideoX-5B; Mochi-1; LTX-Video; Pyramid Flow; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos Predict 2.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper large-v3 \/ large-v3-turbo (~50x realtime); Parakeet-TDT 1.1B; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2 \/ 3; Kokoro; Stable Audio Open; XTTS v2; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi; Step-Audio 2 mini \/ R1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX:\u003c\/strong\u003e MusicGen; AudioGen; Bark; SeamlessM4T v2\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul 
style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eFrontier-inference gateway — 200B+ MoE + concurrent 70B + image + video all resident\u003c\/li\u003e\n\u003cli\u003e8-way tensor-parallel for Kimi-K2 \/ DeepSeek V3 with long context\u003c\/li\u003e\n\u003cli\u003eMulti-tenant LLM API — 50-100 concurrent users on 235B Q4 via vLLM\/SGLang\u003c\/li\u003e\n\u003cli\u003eChinese and Western frontier models resident concurrently for evaluation \/ benchmarking\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eFrontier open-weight inference backend for a 100-500 seat org, mixing Qwen3-235B, GLM-4.5+, and DeepSeek V3 Q2\u003c\/li\u003e\n\u003cli\u003eKimi-K2 1.58-bit agent platform at production throughput (tool-use, 200+ sequential calls)\u003c\/li\u003e\n\u003cli\u003eDeepSeek V3 \/ R1 serving at Q2 on Blackwell silicon\u003c\/li\u003e\n\u003cli\u003eMulti-node training head node with 100 GbE \/ InfiniBand fabric\u003c\/li\u003e\n\u003cli\u003eDual-role inference + diffusion farm (Qwen3-235B + FLUX.1 + HunyuanVideo 13B concurrently)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003eExternal references | Not measured on Kentino hardware\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 
12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eRTX 5090 per-card INT8 TOPS (with sparsity)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e1 676 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eRTX 5090 memory bandwidth\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1 800 GB\/s per card\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — Qwen3-235B Q4_K_M on 4x RTX 5090 (single-stream)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~90 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — Qwen3-235B Q4_K_M on 4x RTX 5090 (batch-32)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~450 tok\/s aggregate\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eSGLang — DeepSeek V3 Q2 on 8x Blackwell (single-stream)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~28 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ellama.cpp — Kimi-K2 UD-TQ1_0 on 8x Blackwell (256 GB VRAM)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~7-10 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003eKentino will publish 
first-party tok\/s after the first customer build with the final Turin SKU.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eBudget-conscious deployments (Turin carries a price premium over Genoa or Rome alternatives)\u003c\/li\u003e\n\u003cli\u003eSingle-tenant 70B dense workloads (overkill — 4x RTX 5090 or 4x RTX Pro 6000 is the right tier)\u003c\/li\u003e\n\u003cli\u003eFrontier 600B+ at Q4 or higher with full context (requires a 576 GB+ pool; see 6x RTX Pro 6000)\u003c\/li\u003e\n\u003cli\u003eSustained training from scratch (no NVLink on consumer RTX 5090)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv 
style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability and is confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eScale RAM to 1.5 TB DDR5 (24x 64 GB, all 24 channels populated), recommended for Kimi-K2 Q4 or DeepSeek V3 Q3, which exceed the 256 GB VRAM pool and offload to system RAM\u003c\/li\u003e\n\u003cli\u003eNVIDIA (Mellanox) ConnectX-5 MCX555A-ECAT, 100 GbE \/ EDR InfiniBand (PCIe 3.0 x16), for cluster fabric\u003c\/li\u003e\n\u003cli\u003eMellanox ConnectX-6 25 GbE SFP28 for datacenter fabric\u003c\/li\u003e\n\u003cli\u003e4 TB NVMe Gen4 x4 for boot + model library\u003c\/li\u003e\n\u003cli\u003eFull 24U rack cabinet with managed PDU\u003c\/li\u003e\n\u003cli\u003eOnline UPS 8-10 kVA (strongly recommended; ~5.5 kW peak draw)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e\n","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940216631624,"sku":null,"price":0.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959","url":"https:\/\/kentino.com\/zh\/products\/k-ai-256-turindual-5090-8-rtx-5090-dual-socket-zen5c-flagship-request-quote-on-cpu","provider":"Kentino","version":"1.0","type":"link"}