{"product_id":"k-ai-192-romedual-4090-5288tops-8-rtx-4090-dual-epyc-milan","title":"K-AI 192 RomeDual 4090 5288TOPS — 8× RTX 4090 — Dual EPYC Milan","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 192 RomeDual 4090 5288TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e192 GB VRAM 8-GPU Inference Server\u003cbr\u003e8x RTX 4090 | Dual EPYC Milan | 5 288 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e5 288\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eINT8 TOPS\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e192 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eVRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e8-GPU\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003etensor 
parallel\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003edual\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eCPU 96C\/192T\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eFlagship 8x gaming-GPU inference box. 192 GB pool at consumer-card economics on a dual-socket EPYC Milan platform.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 7U 8-GPU chassis built around dual EPYC 7643 Milan CPUs (96C\/192T total), ASRock Rack ROME2D32GM-NL dual-SP3 motherboard, 512 GB DDR4 ECC, 2 TB NVMe boot, and a 5x 1200 W server PSU set. Eight GeForce RTX 4090 connect via active PCIe Gen4 retimer risers at full x16. 
The cheapest path to 192 GB frontier MoE inference on Kentino hardware.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e8x NVIDIA GeForce RTX 4090 24 GB GDDR6X (Ada Lovelace, 450 W, PCIe 4.0 x16)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e192 GB total across 8 cards (no NVLink on consumer RTX 4090)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x AMD EPYC 7643 Milan (48C\/96T each — 96C\/192T total, 225 W each, 2x 128 PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROME2D32GM-NL (dual SP3, PCIe 4.0, 32x DDR4 ECC DIMM slots)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e512 GB DDR4-2666 ECC RDIMM (8x 64 GB — 4 per socket for 8-channel 
balance)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e5x 1200 W server PSU set (HP-compatible, hot-swap) + full 12VHPWR adapter set\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e7U 8-GPU chassis (up to 10 PCIe cards including risers)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eRisers\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e8x active PCIe Gen4 x16 retimer risers (required over cable length)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x Arctic Freezer 4U-M SP3 tower coolers + rack-mount front-to-back airflow (industrial fans)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 8 x 450 
W = 3 600 W\u003c\/li\u003e\n\u003cli\u003eCPU draw: 2 x 225 W = 450 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~4 200 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 6 000 W all-active (5x 1200 W) — 30.0 % headroom\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eROME2D32GM-NL exposes 2x 128 PCIe Gen4 lanes — one 128-lane pool per EPYC socket — direct to GPU slots. Active Gen4 retimer risers for signal integrity. No PCIe switch. No NVLink. Measured 19-22 GB\/s inter-GPU peer-to-peer on 4-GPU bench.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 192 GB across 8 cards, this server handles 200B+ frontier MoE at Q4, 8-way tensor-parallel inference, tenant-isolated multi-model serving, and high-batch throughput at consumer-card economics.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3 \/ Qwen3.5 (Alibaba):\u003c\/strong\u003e Qwen3-235B-A22B Q4 (~132 GB) with long ctx — the hero config (~15-25 tok\/s single-stream on 8x RTX 4090); 
Qwen3-Coder-480B-A35B Q2 (~160 GB); Qwen3.5-122B-A10B fp8 (~75 GB) multi-stream; Qwen3-32B dense bf16 x multiple concurrent\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek:\u003c\/strong\u003e DeepSeek-V3\/R1 Q2 (~215 GB with 512 GB host spill); DeepSeek-R2 32B bf16 — up to 8 concurrent streams one per card (~30-40 tok\/s per stream)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM \/ Z.ai:\u003c\/strong\u003e GLM-4.5 \/ 4.6 \/ 4.7 Q4 (~177 GB); GLM-4.5-Air fp8 or bf16; GLM-4.6V 106B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTencent Hunyuan:\u003c\/strong\u003e Hunyuan-Large Q3 (~160 GB); Hunyuan-A13B Q4\/Q6 (RTX 4090 is Ada — fp8 upcasts to bf16, use GGUF quants)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Baidu ERNIE-4.5-424B Q3 (~180 GB); InternVL3.5-241B-A28B Q4 (~135 GB); Qwen3.5-397B Q3 (~170 GB); MiniMax-M1 Q3 (~180 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMeta Llama:\u003c\/strong\u003e Llama 3.3 70B bf16 with massive KV (~20 tok\/s single-stream Q4, ~179 tok\/s batch-32 vLLM — Kentino measured on 4-GPU bench); Llama 4 Scout bf16 (~218 GB spill, exceeds the 192 GB pool); Llama 4 Maverick Q3 (~188 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral:\u003c\/strong\u003e Mistral Large 2 \/ Pixtral Large 123B Q6 comfortable or bf16 (~248 GB spill); Mistral Small 3 multi-stream\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOpenAI (open weights):\u003c\/strong\u003e gpt-oss-120b MXFP4 native (80 GB) with huge KV\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNVIDIA Nemotron:\u003c\/strong\u003e Llama-3.1-Nemotron Ultra 253B Q4 (~147 GB); Super 49B bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Cohere Command 
R+ 104B Q6 (~85 GB); Google Gemma 3 27B bf16 x multiple streams\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eInternVL3.5-241B-A28B Q4 (~135 GB); Qwen3-VL-235B-A22B Q4; Qwen3-VL-32B bf16 multi-stream; Llama 3.2 90B Vision bf16 (~180 GB); Pixtral Large 124B Q6; Molmo 72B bf16; GLM-4.6V 106B fp8\/Q6; Gemma 3 27B multimodal x multiple streams.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] bf16 — up to 8 concurrent generation streams (one per card, ~15-25 s\/image at fp8); FLUX.1 Kontext [dev]; FLUX Tools; SD 3.5 Large bf16 x 8; HunyuanImage-2.1 bf16 (~34 GB) x 2-4 concurrent; HunyuanImage-3.0 base (80B MoE, 13B active) bf16; HunyuanDiT; Kolors \/ Kolors 2.0; AuraFlow; OmniGen v1; PixArt-Sigma.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 MoE dual-expert bf16 with full ctx — multiple concurrent streams; Wan 2.2 TI2V-5B x 8 concurrent; HunyuanVideo 13B bf16 both experts; HunyuanVideo 1.5; CogVideoX-5B bf16; Open-Sora 2.0 11B bf16; Genmo Mochi-1 bf16; LTX-Video x 8 concurrent; Pyramid Flow; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo x 8 concurrent (~50x realtime per stream); Parakeet-TDT; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 
2\/3; Kokoro 82M; XTTS v2; Stable Audio Open\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi 7B x 8 concurrent voice streams; Step-Audio 2 mini\/R1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX:\u003c\/strong\u003e MusicGen \/ AudioGen \/ Bark; SeamlessM4T v2\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e8-way tensor-parallel inference of 200-250B MoE at Q4 (Qwen3-235B, GLM-4.5\/4.6\/4.7)\u003c\/li\u003e\n\u003cli\u003eTenant-isolated 8-stream serving — one 24 GB Q4 model per card (e.g. 8x Qwen3-14B agents)\u003c\/li\u003e\n\u003cli\u003eLarge-batch 70B — tensor-parallel vLLM \/ SGLang batch-64 aggregate\u003c\/li\u003e\n\u003cli\u003eMixed fleet: 235B MoE on 4 cards (TP4) + FLUX + video + realtime voice on remaining 4\u003c\/li\u003e\n\u003cli\u003eFine-tuning lab — 7-34B LoRA \/ QLoRA with large batch\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e8-GPU tensor-parallel inference at the 192 GB pool — Qwen3-235B Q4, GLM-4.5\/4.6\/4.7 Q4, Llama 4 Scout bf16\u003c\/li\u003e\n\u003cli\u003eDense 70B bf16 (Llama 3.3 70B) with massive KV headroom for long ctx and high batch\u003c\/li\u003e\n\u003cli\u003eHigh-throughput batch inference gateway — vLLM \/ SGLang tensor-parallel at large batch\u003c\/li\u003e\n\u003cli\u003eFine-tuning of 7-34B class models with high-batch LoRA \/ QLoRA\u003c\/li\u003e\n\u003cli\u003eWan 2.2 dual-expert \/ HunyuanImage-3.0 \/ FLUX.1 full workflow video-image studio\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 
style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eMeasured performance\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003eKentino bench (4-GPU reference) | 2026-04-10 | 4x RTX 4090 + EPYC 7542 + 512 GB DDR4 + ROMED8-2T\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eSustained compute (fp16, 4-card ref)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e647 TFLOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — Llama 3.3 70B AWQ INT4 (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e8.0 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — Llama 3.3 70B AWQ INT4 (batch-32)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e179 tok\/s aggregate\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ellama.cpp — Llama 3.3 70B Q4_K_M (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e20.3 
tok\/s decode\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003e8-GPU aggregate compute (extrapolation)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1 294 TFLOPS fp16 expected (near-linear)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003e235B Q4 tensor-parallel 8-way (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e15-25 tok\/s single-stream on 8x RTX 4090\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003e4-card data measured on Kentino hardware. The 8-GPU compute figure is a near-linear extrapolation from that data, and the 8-GPU 235B throughput is a published external (community) reference. Kentino will publish first-party 8-GPU numbers after the first customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e5090-generation workloads (Blackwell fp8 native + higher TOPS) — see K-AI 256 TurinDual 5090\u003c\/li\u003e\n\u003cli\u003eTraining from scratch (no NVLink on consumer RTX 4090)\u003c\/li\u003e\n\u003cli\u003eECC-sensitive 24\/7 production — consumer RTX 4090 has no ECC; prefer 4x L40 or 2x RTX Pro 6000 Server Edition\u003c\/li\u003e\n\u003cli\u003eHunyuan \/ DeepSeek fp8 native — RTX 4090 is Ada, fp8 checkpoints upcast to bf16\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv 
style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS config with dual-socket NUMA tuning, driver install, burn-in, memtest, full 8-GPU stress test, and LLM environment setup. 
Lead time depends on component availability and is confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e4 TB additional NVMe for weight staging and MoE offload workloads\u003c\/li\u003e\n\u003cli\u003eNVIDIA ConnectX-5 100 GbE for multi-node serving\u003c\/li\u003e\n\u003cli\u003eRAM upgrade to 1 TB (16x 64 GB) or 2 TB (32x 64 GB) — board supports 32 DIMM slots\u003c\/li\u003e\n\u003cli\u003eFull 24U rack cabinet + online UPS 5 kVA\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940202410312,"sku":null,"price":32280.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959","url":"https:\/\/kentino.com\/zh\/products\/k-ai-192-romedual-4090-5288tops-8-rtx-4090-dual-epyc-milan","provider":"Kentino","version":"1.0","type":"link"}