{"title":"AI Servers","description":"\u003cp\u003eOur state-of-the-art AI servers are engineered to deliver exceptional performance for the most demanding computational tasks in artificial intelligence and high-performance computing. Designed with scalability and versatility in mind, these systems leverage cutting-edge GPU technology to provide massive parallel processing power. Our servers offer a range of configurations to suit various needs, from edge computing to large-scale data centers. With powerful multi-core processors, abundant high-speed memory, and fast storage options, these machines excel in AI model training, inference, complex simulations, and data-intensive applications. Whether you're pushing the boundaries of machine learning research or deploying mission-critical AI solutions, our servers provide the robust infrastructure needed to drive innovation and achieve breakthrough results in the ever-evolving field of artificial intelligence.\u003c\/p\u003e","products":[{"product_id":"instruct-l-12-gpu-server","title":"INSTRUCT L40 12 GPU Server (Legacy)","description":"\u003cdiv style=\"background:#dc2626;color:#fff;padding:20px;border-radius:8px;margin-bottom:24px;border-left:6px solid #991b1b\"\u003e\n\u003cp style=\"font-size:18px;font-weight:700;margin:0 0 8px 0\"\u003eThis product listing is kept for reference only.\u003c\/p\u003e\n\u003cp style=\"margin:0\"\u003eThis server has been replaced by the new \u003cstrong\u003eK-AI product line\u003c\/strong\u003e. 
For the current equivalent or an upgraded configuration, please see \u003ca href=\"\/fi\/collections\/ai-servers\" style=\"color:#fca5a5;text-decoration:underline\"\u003eour AI Servers collection\u003c\/a\u003e.\u003c\/p\u003e\n\u003cp style=\"margin:8px 0 0 0\"\u003e\u003cstrong\u003eRecommended replacement:\u003c\/strong\u003e \u003ca href=\"\/fi\/collections\/ai-servers\" style=\"color:#fca5a5;text-decoration:underline\"\u003eK-AI 288 Rome L40\u003c\/a\u003e (6x L40, passive) or \u003ca href=\"\/fi\/collections\/ai-servers\" style=\"color:#fca5a5;text-decoration:underline\"\u003eK-AI 576 Genoa RTXPro6000\u003c\/a\u003e (6x RTX Pro 6000, 576 GB ECC VRAM)\u003c\/p\u003e\n\u003c\/div\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003e⚠️ LEGACY PRODUCT - DEPRECATED CONFIGURATION\u003c\/h2\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003e\u003c\/h2\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003eThis 12 GPU multi-case server configuration is no longer in active production due to reliability issues with multi-case interconnects. Listed for reference and historical purposes only. 
For current production AI builds, please see our 8 GPU systems with PCIe-based GPUs (RTX 5090, RTX Pro 6000 Blackwell, L40, L4, Intel Arc Pro B70, AMD configurations) or contact us for a custom configuration.\u003c\/h2\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003e\u003c\/h2\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003eSpecifications\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\"\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003eGPU:\u003c\/strong\u003e 12x NVIDIA L40 48GB VRAM (576GB total)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003eMotherboard:\u003c\/strong\u003e ASRock Rack ROMED8-2T\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003eCPU:\u003c\/strong\u003e AMD EPYC 7713\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003eRAM:\u003c\/strong\u003e 1024GB CT128G4ZFJ426S\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003ePower Supply:\u003c\/strong\u003e 4x AX1600i 1600W (6400W total)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003eCase:\u003c\/strong\u003e Non-standard U8 Rack Mount\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003eStorage:\u003c\/strong\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\"\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e4TB NVMe SSD\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e500GB SATA Drive\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003eKey Features\u003c\/h2\u003e\n\u003col class=\"-mt-1 list-decimal space-y-2 pl-8\"\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eHigh-Performance Computing: Equipped with 12 powerful NVIDIA L40 GPUs, providing an impressive 576GB of VRAM for intensive AI 
and machine learning tasks.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eServer-Grade Components: Features the reliable ASRock Rack ROMED8-2T motherboard and a top-tier AMD EPYC 7713 CPU for exceptional processing power.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eMassive Memory: 1024GB of high-speed RAM ensures smooth multitasking and efficient data processing for even the most demanding workloads.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eRobust Power Supply: Quad power supply setup with 4x AX1600i 1600W units, ensuring stable and ample power delivery under heavy loads.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eExpandable Storage: Comes with a fast 4TB NVMe SSD for primary storage and an additional 500GB SATA drive for extra capacity.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eCompact Design: Housed in a non-standard U8 rack mount case, optimizing space efficiency while maintaining performance.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eScalable Configuration: Supports up to 4 nodes per rack, allowing for impressive computational density and scalability.\u003c\/li\u003e\n\u003c\/ol\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003eIdeal Use Cases\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\"\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eDeep Learning and AI Model Training\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eHigh-Performance Computing (HPC) Applications\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eLarge-Scale Data Analysis and Visualization\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eScientific Simulations and Research\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eRendering and 3D Modeling\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch2 
class=\"font-600 text-xl font-bold\"\u003eSpecial Notes\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\"\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eNon-Standard Case: This server utilizes a custom U8 case, providing a more compact form factor compared to traditional server cases. This design allows for improved space efficiency in data centers and server rooms.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eMulti-Node Configuration: The rack design supports up to 4 nodes per rack, enabling a highly dense and powerful computing cluster. This configuration is ideal for organizations requiring massive parallel processing capabilities or looking to maximize computational power per square foot.\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003eThis INSTRUCT12 GPU Server is the ultimate solution for organizations and researchers requiring extreme computational power in a space-efficient design. 
With its massive GPU capacity, server-grade components, and the ability to house up to 4 nodes per rack, it's built to tackle the most complex calculations and data processing tasks while optimizing data center space utilization.\u003c\/p\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003eDelivery 2 - 6 weeks \u003c\/p\u003e","brand":"Kentino","offers":[{"title":"Default Title","offer_id":49061507334472,"sku":"","price":474670.68,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/front_02.png?v=1737993966"},{"product_id":"instruct-12-h100-gpu-server","title":"INSTRUCT H100 12 GPU Server (Legacy)","description":"\u003cdiv style=\"background:#dc2626;color:#fff;padding:20px;border-radius:8px;margin-bottom:24px;border-left:6px solid #991b1b\"\u003e\n\u003cp style=\"font-size:18px;font-weight:700;margin:0 0 8px 0\"\u003eThis product listing is kept for reference only.\u003c\/p\u003e\n\u003cp style=\"margin:0\"\u003eThis server has been replaced by the new \u003cstrong\u003eK-AI product line\u003c\/strong\u003e. For the current equivalent or an upgraded configuration, please see \u003ca href=\"\/fi\/collections\/ai-servers\" style=\"color:#fca5a5;text-decoration:underline\"\u003eour AI Servers collection\u003c\/a\u003e.\u003c\/p\u003e\n\u003cp style=\"margin:8px 0 0 0\"\u003e\u003cstrong\u003eRecommended replacement:\u003c\/strong\u003e Discontinued. H100 builds are no longer offered. 
See our \u003ca href=\"\/fi\/collections\/ai-servers\" style=\"color:#fca5a5;text-decoration:underline\"\u003eRTX Pro 6000 and L40 multi-GPU servers\u003c\/a\u003e for comparable enterprise VRAM density.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003e⚠️ LEGACY PRODUCT - DEPRECATED CONFIGURATION\u003c\/h2\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003e\u003c\/h2\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003eThis 12 GPU multi-case server configuration is no longer in active production due to reliability issues with multi-case interconnects. Listed for reference and historical purposes only. For current production AI builds, please see our 8 GPU systems or contact us for a custom configuration with PCIe-based GPUs (RTX 5090, RTX Pro 6000 Blackwell, L40, L4, Intel Arc Pro B70, AMD configurations).\u003c\/h2\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003e\u003c\/h2\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003eSpecifications\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\"\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003eGPU:\u003c\/strong\u003e 12x NVIDIA H100 80GB (960 GB VRAM total)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003eMotherboard:\u003c\/strong\u003e ASRock Rack ROME2D16-2T\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003eCPU:\u003c\/strong\u003e 2x AMD EPYC 7713\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003eRAM:\u003c\/strong\u003e 2048GB CT128G4ZFJ426S\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003eGPU-Motherboard Connection:\u003c\/strong\u003e RYSER PCIe 4.0 x16 Cable\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003ePower Supply:\u003c\/strong\u003e 4x AX1600i 1600W\u003c\/li\u003e\n\u003cli 
class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003eCase:\u003c\/strong\u003e 24U Rack Mount\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e\n\u003cstrong\u003eStorage:\u003c\/strong\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\"\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e2TB NVMe SSD\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003e500GB SATA Drive\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003eKey Features\u003c\/h2\u003e\n\u003col class=\"-mt-1 list-decimal space-y-2 pl-8\"\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eUnparalleled GPU Performance: Equipped with 12 NVIDIA H100 GPUs, each with 80GB VRAM, providing a massive 960 GB of total VRAM for the most demanding AI, machine learning, and HPC workloads.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eDual-CPU Power: Features two top-tier AMD EPYC 7713 CPUs, delivering exceptional multi-threaded performance for complex computations and data processing tasks.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eServer-Grade Components: Utilizes the high-performance ASRock Rack ROME2D16-2T motherboard, designed for maximum reliability and efficiency in data center environments.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eMassive Memory Capacity: 2048GB (2TB) of high-speed RAM ensures seamless multitasking and efficient data processing for even the most memory-intensive applications.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eHigh-Speed GPU Integration: Employs the RYSER PCIe 4.0 x16 cable for lightning-fast, full-bandwidth connection between the GPUs and the motherboard, ensuring maximum performance and data transfer speeds.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eRobust Power Supply: Four AX1600i 1600W units 
provide ample and stable power delivery to support the high-performance components under extreme loads.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eExpandable Storage: Comes with a fast 2TB NVMe SSD for primary storage and an additional 500GB SATA drive for extra capacity.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eProfessional-Grade Cooling: Housed in a spacious 24U rack mount case, providing optimal airflow and thermal management for sustained high-performance operation.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eScalable Configuration: Designed for data centers and high-performance computing environments, supporting clustered configurations for massive computational power.\u003c\/li\u003e\n\u003c\/ol\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\"\u003eIdeal Use Cases\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\"\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eLarge-Scale AI Model Training and Inference\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eHigh-Performance Computing (HPC) Applications\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eAdvanced Scientific Simulations and Research\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eReal-Time Big Data Analytics\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eComplex Rendering and Visualization Tasks\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eQuantum Computing Simulations\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eFinancial Modeling and Risk Analysis\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eGenomics and Bioinformatics\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eClimate Modeling and Weather Prediction\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch2 class=\"font-600 text-xl 
font-bold\"\u003eSpecial Notes\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\"\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eUnprecedented Computational Power: With 12 NVIDIA H100 GPUs and dual AMD EPYC CPUs, this node configuration represents the pinnacle of computational capabilities, suitable for the most demanding AI and HPC workloads.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eMassive Memory Resources: The combination of 960 GB GPU VRAM and 2048 GB system RAM provides extraordinary capacity for handling the largest AI models and most data-intensive applications.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003ePCIe 4.0 Advantage: The RYSER PCIe 4.0 x16 cable ensures that each GPU can operate at full bandwidth, maximizing data throughput and minimizing latency.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\"\u003eFuture-Proof Investment: This node is designed to handle not just current AI and HPC challenges, but also future advancements in these rapidly evolving fields.\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003eThe INSTRUCT H12 Node Configuration represents the absolute cutting edge in computational power. It's engineered for organizations and researchers pushing the boundaries of what's possible in AI, machine learning, and high-performance computing. 
With its state-of-the-art H100 GPUs, dual EPYC CPUs, and robust design, it's built to tackle the most complex and demanding computational tasks in the world.\u003c\/p\u003e","brand":"Kentino","offers":[{"title":"Default Title","offer_id":49061653381448,"sku":"","price":474670.68,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/front_02.png?v=1737993966"},{"product_id":"inference-70b-l40-ai-server","title":"Inference 70B L40 Ai server","description":"\u003cdiv style=\"background:#dc2626;color:#fff;padding:20px;border-radius:8px;margin-bottom:24px;border-left:6px solid #991b1b\"\u003e\n\u003cp style=\"font-size:18px;font-weight:700;margin:0 0 8px 0\"\u003eThis product listing is kept for reference only.\u003c\/p\u003e\n\u003cp style=\"margin:0\"\u003eThis server has been replaced by the new \u003cstrong\u003eK-AI product line\u003c\/strong\u003e. For the current equivalent or an upgraded configuration, please see \u003ca href=\"\/fi\/collections\/ai-servers\" style=\"color:#fca5a5;text-decoration:underline\"\u003eour AI Servers collection\u003c\/a\u003e.\u003c\/p\u003e\n\u003cp style=\"margin:8px 0 0 0\"\u003e\u003cstrong\u003eRecommended replacement:\u003c\/strong\u003e \u003ca href=\"\/fi\/collections\/ai-servers\" style=\"color:#fca5a5;text-decoration:underline\"\u003eK-AI 192 Rome L40 1448TOPS\u003c\/a\u003e (4x NVIDIA L40, same platform, updated build)\u003c\/p\u003e\n\u003c\/div\u003e\n\u003ch1 class=\"font-600 text-2xl font-bold\" level=\"1\"\u003e70B L40 Computer\u003c\/h1\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eSpecifications\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e\n\u003cstrong\u003eGPU:\u003c\/strong\u003e 6x NVIDIA L40 (288 GB VRAM total)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" 
index=\"1\"\u003e\n\u003cstrong\u003eMotherboard:\u003c\/strong\u003e ASRock Rack ROMED8-2T\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003e\n\u003cstrong\u003eCPU:\u003c\/strong\u003e AMD EPYC 7542\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003e\n\u003cstrong\u003eRAM:\u003c\/strong\u003e 512GB SK Hynix 2666MHz REG ECC DDR4 LRDIMM (8 x 64GB)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"4\"\u003e\n\u003cstrong\u003eGPU-Motherboard Connection:\u003c\/strong\u003e RYSER PCIe 4.0 x16 Cable\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"5\"\u003e\n\u003cstrong\u003ePower Supply:\u003c\/strong\u003e 2x AX1600i 1000W\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"6\"\u003e\n\u003cstrong\u003eCase:\u003c\/strong\u003e 4U Rack Mount\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"7\"\u003e\n\u003cstrong\u003eStorage:\u003c\/strong\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\" depth=\"1\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e2TB NVMe SSD\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003e500GB SATA Drive\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eKey Features\u003c\/h2\u003e\n\u003col class=\"-mt-1 list-decimal space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eHigh-Performance GPU Compute: Equipped with 6 NVIDIA L40 GPUs, providing a total of 288 GB VRAM for demanding AI, machine learning, and visualization workloads.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eServer-Grade Components: Features the reliable ASRock Rack ROMED8-2T motherboard and a powerful AMD EPYC 7542 CPU for exceptional processing capabilities.\u003c\/li\u003e\n\u003cli 
class=\"whitespace-normal break-words\" index=\"2\"\u003eAmple Memory: 512GB of high-speed SK Hynix DDR4 RAM ensures smooth multitasking and efficient data processing for complex computations.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003eHigh-Speed GPU Integration: Utilizes the RYSER PCIe 4.0 x16 cable for a fast, full-bandwidth connection between the GPUs and the motherboard, ensuring optimal performance and data transfer speeds.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"4\"\u003eRobust Power Supply: Dual AX1600i 1000W units provide stable and ample power delivery to support the high-performance components under heavy loads.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"5\"\u003eExpandable Storage: Comes with a fast 2TB NVMe SSD for primary storage and an additional 500GB SATA drive for extra capacity.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"6\"\u003eProfessional-Grade Cooling: Housed in a 4U rack mount case, with airflow and thermal management designed for sustained high-performance operation.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"7\"\u003eVersatile Configuration: Designed for a wide range of high-performance computing tasks, from AI and machine learning to professional visualization and rendering.\u003c\/li\u003e\n\u003c\/ol\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eIdeal Use Cases\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eLarge Language Model Inference (e.g., 70B parameter models)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eAI and Machine Learning Research\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003eData Analytics and Visualization\u003c\/li\u003e\n\u003cli 
class=\"whitespace-normal break-words\" index=\"3\"\u003eProfessional 3D Rendering and Animation\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"4\"\u003eScientific Simulations\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"5\"\u003eHigh-Performance Computing (HPC) Applications\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"6\"\u003eComputer Vision and Image Processing\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"7\"\u003eFinancial Modeling and Risk Analysis\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eSpecial Notes\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eOptimized for 70B Models: With 288 GB of total GPU VRAM, this system is specifically designed to handle large language models with up to 70 billion parameters, making it ideal for cutting-edge AI research and applications.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eNVIDIA L40 Advantage: The L40 GPUs offer a balance of compute performance and memory, suitable for a wide range of AI, HPC, and professional visualization workloads.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003ePCIe 4.0 Performance: The RYSER PCIe 4.0 x16 cable ensures that each GPU can operate at full bandwidth, maximizing data throughput and minimizing latency.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003eScalable Design: While optimized for 70B parameter models, this system can be easily scaled or clustered for even larger workloads or multi-user environments.\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003eThe 70B L40 Computer represents a powerful and versatile solution for organizations and researchers working with large AI 
models, particularly in the realm of natural language processing and generation. Its balanced configuration of NVIDIA L40 GPUs, AMD EPYC CPU, and high-speed memory makes it suitable for a wide range of high-performance computing tasks beyond AI, including scientific simulations, data analytics, and professional visualization.\u003c\/p\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003eDelivery 2 - 6 weeks \u003c\/p\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":49061684216136,"sku":"","price":41646.27,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/front_02.png?v=1737993966"},{"product_id":"inference-35b-rtx4090-ai-server","title":"Inference 35B RTX4090 AI Server","description":"\u003cdiv style=\"background:#dc2626;color:#fff;padding:20px;border-radius:8px;margin-bottom:24px;border-left:6px solid #991b1b\"\u003e\n\u003cp style=\"font-size:18px;font-weight:700;margin:0 0 8px 0\"\u003eThis product listing is kept for reference only.\u003c\/p\u003e\n\u003cp style=\"margin:0\"\u003eThis server has been replaced by the new \u003cstrong\u003eK-AI product line\u003c\/strong\u003e. 
For the current equivalent or an upgraded configuration, please see \u003ca href=\"\/fi\/collections\/ai-servers\" style=\"color:#fca5a5;text-decoration:underline\"\u003eour AI Servers collection\u003c\/a\u003e.\u003c\/p\u003e\n\u003cp style=\"margin:8px 0 0 0\"\u003e\u003cstrong\u003eRecommended replacement:\u003c\/strong\u003e \u003ca href=\"\/fi\/collections\/ai-servers\" style=\"color:#fca5a5;text-decoration:underline\"\u003eK-AI 96 Rome 4090 2644TOPS\u003c\/a\u003e (4x RTX 4090, same platform, updated build)\u003c\/p\u003e\n\u003c\/div\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eSpecifications\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e\n\u003cstrong\u003eGPU:\u003c\/strong\u003e 4x NVIDIA RTX 4090 (96 GB VRAM total)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003e\n\u003cstrong\u003eMotherboard:\u003c\/strong\u003e ASRock Rack ROMED8-2T\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003e\n\u003cstrong\u003eCPU:\u003c\/strong\u003e AMD EPYC 7542\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003e\n\u003cstrong\u003eRAM:\u003c\/strong\u003e 256GB A-Tech DDR4-2666 ECC REG RDIMM (8 x 32GB)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"4\"\u003e\n\u003cstrong\u003eGPU-Motherboard Connection:\u003c\/strong\u003e RYSER PCIe 4.0 x16 Cable\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"5\"\u003e\n\u003cstrong\u003ePower Supply:\u003c\/strong\u003e 2x LL2000FC (4 kW total)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"6\"\u003e\n\u003cstrong\u003eCase:\u003c\/strong\u003e 24U Rack Mount\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"7\"\u003e\n\u003cstrong\u003eStorage:\u003c\/strong\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\" 
depth=\"1\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e2TB NVMe SSD\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003e500GB SATA Drive\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eKey Features\u003c\/h2\u003e\n\u003col class=\"-mt-1 list-decimal space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eOptimized for AI Inference: Equipped with 4 NVIDIA RTX 4090 GPUs, providing a total of 96 GB VRAM, specifically configured for high-performance AI inference tasks, including large language models up to 70B parameters.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eServer-Grade Components: Features the reliable ASRock Rack ROMED8-2T motherboard and a powerful AMD EPYC 7542 CPU for exceptional processing capabilities.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003eHigh-Speed Memory: 256GB of A-Tech DDR4-2666 ECC REG RDIMM ensures reliable and efficient data processing for complex AI workloads.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003eFast GPU Integration: Utilizes the RYSER PCIe 4.0 x16 cable for a rapid, full-bandwidth connection between the GPUs and the motherboard, maximizing inference performance.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"4\"\u003eRobust Power Supply: Two LL2000FC units (4 kW total) provide stable and ample power delivery to support the high-performance components under intensive inference loads.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"5\"\u003eEfficient Storage: Comes with a fast 2TB NVMe SSD for quick data access and an additional 500GB SATA drive for extra capacity.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"6\"\u003eProfessional-Grade Cooling: Housed in a 
spacious 24U rack mount case, ensuring optimal thermal management for sustained high-performance operation.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"7\"\u003eInference-Focused Design: Optimized for running large AI models efficiently, making it ideal for organizations deploying AI services at scale.\u003c\/li\u003e\n\u003c\/ol\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eIdeal Use Cases\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eLarge Language Model Inference (up to 70B parameters)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eReal-time AI-powered Applications\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003eNatural Language Processing Services\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003eComputer Vision and Image Recognition\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"4\"\u003eAI-driven Customer Service and Chatbots\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"5\"\u003eRecommendation Systems\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"6\"\u003eFinancial Modeling and Predictions\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"7\"\u003eScientific Data Analysis\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eSpecial Notes\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eRTX 4090 Advantage: Leveraging the latest NVIDIA RTX 4090 GPUs, this server offers exceptional performance for AI inference tasks, combining high compute power with advanced features like Tensor Cores.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" 
index=\"1\"\u003eOptimized for 70B Models: With 96 GB of total GPU VRAM, this system is specifically designed to handle large language models with up to 70 billion parameters, making it ideal for deploying state-of-the-art AI services.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003eInference Efficiency: The combination of RTX 4090 GPUs and the AMD EPYC CPU allows for highly efficient inference, enabling high throughput and low latency for AI applications.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003eScalable Solution: While optimized for 70B parameter models, this server can be easily integrated into larger clusters for even more demanding workloads or multi-model deployments.\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003eThe Inference 35B RTX4090 AI Server is a cutting-edge solution for organizations looking to deploy large AI models efficiently. It strikes an optimal balance between performance and cost, making it an excellent choice for businesses and research institutions that need to run complex AI models in production environments. 
Whether you're deploying language models, computer vision systems, or other AI applications, this server provides the power and reliability needed for seamless AI inference at scale.\u003c\/p\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003eDelivery 2 - 6 weeks \u003c\/p\u003e","brand":"Kentino","offers":[{"title":"Default Title","offer_id":49061697913160,"sku":"","price":14909.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/front_02.png?v=1737993966"},{"product_id":"inference-8b-2-gpu-ai-server","title":"Inference 8B 2 GPU 4090 AI Server","description":"\u003cdiv style=\"background:#dc2626;color:#fff;padding:20px;border-radius:8px;margin-bottom:24px;border-left:6px solid #991b1b\"\u003e\n\u003cp style=\"font-size:18px;font-weight:700;margin:0 0 8px 0\"\u003eThis product listing is kept for reference only.\u003c\/p\u003e\n\u003cp style=\"margin:0\"\u003eThis server has been replaced by the new \u003cstrong\u003eK-AI product line\u003c\/strong\u003e. 
For the current equivalent or an upgraded configuration, please see \u003ca href=\"\/fi\/collections\/ai-servers\" style=\"color:#fca5a5;text-decoration:underline\"\u003eour AI Servers collection\u003c\/a\u003e.\u003c\/p\u003e\n\u003cp style=\"margin:8px 0 0 0\"\u003e\u003cstrong\u003eRecommended replacement:\u003c\/strong\u003e \u003ca href=\"\/fi\/collections\/ai-servers\" style=\"color:#fca5a5;text-decoration:underline\"\u003eK-AI 48 Rome 4090 1322TOPS\u003c\/a\u003e (2x RTX 4090, same platform, updated build)\u003c\/p\u003e\n\u003c\/div\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eSpecifications\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e\n\u003cstrong\u003eGPU:\u003c\/strong\u003e 2x NVIDIA RTX 4090 (48 GB VRAM total)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003e\n\u003cstrong\u003eMotherboard:\u003c\/strong\u003e ASRock Rack ROMED8-2T\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003e\n\u003cstrong\u003eCPU:\u003c\/strong\u003e AMD EPYC 7542\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003e\n\u003cstrong\u003eRAM:\u003c\/strong\u003e 128GB A-Tech DDR4-2666 ECC REG RDIMM (8 x 16GB)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"4\"\u003e\n\u003cstrong\u003eGPU-Motherboard Connection:\u003c\/strong\u003e PCIe 4.0 x16\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"5\"\u003e\n\u003cstrong\u003ePower Supply:\u003c\/strong\u003e AX1600i 1500W\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"6\"\u003e\n\u003cstrong\u003eCase:\u003c\/strong\u003e 4U Rack Mount\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"7\"\u003e\n\u003cstrong\u003eStorage:\u003c\/strong\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\" 
depth=\"1\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e2TB NVMe SSD\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003e500GB SATA Drive\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eKey Features\u003c\/h2\u003e\n\u003col class=\"-mt-1 list-decimal space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eEfficient AI Inference: Equipped with 2 NVIDIA RTX 4090 GPUs, providing a total of 48 GB VRAM, optimized for running AI models up to 8B parameters with high efficiency.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eServer-Grade Components: Features the reliable ASRock Rack ROMED8-2T motherboard and a powerful AMD EPYC 7542 CPU for robust processing capabilities.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003eBalanced Memory Configuration: 128GB of A-Tech DDR4-2666 ECC REG RDIMM ensures reliable and efficient data processing for AI workloads.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003eHigh-Speed Connectivity: Utilizes PCIe 4.0 x16 for rapid connection between the GPUs and the motherboard, maximizing inference performance.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"4\"\u003eReliable Power Supply: An AX1600i 1500W unit provides stable and ample power delivery to support the high-performance components under intensive inference loads.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"5\"\u003eEfficient Storage: Comes with a fast 2TB NVMe SSD for quick data access and an additional 500GB SATA drive for extra capacity.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"6\"\u003eProfessional-Grade Cooling: Housed in a spacious 4U rack mount case, ensuring optimal thermal management for
sustained high-performance operation.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"7\"\u003eCost-Effective Inference Solution: Optimized for running medium-sized AI models efficiently, making it ideal for organizations deploying AI services with a focus on cost-effectiveness.\u003c\/li\u003e\n\u003c\/ol\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eIdeal Use Cases\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eMedium-sized Language Model Inference (up to 8B parameters)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eReal-time AI-powered Applications\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003eNatural Language Processing Services\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003eComputer Vision and Image Recognition\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"4\"\u003eAI-driven Customer Service and Chatbots\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"5\"\u003eRecommendation Systems\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"6\"\u003eFinancial Modeling and Predictions\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"7\"\u003eEdge AI Deployments\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eSpecial Notes\u003c\/h2\u003e\n\u003cul class=\"-mt-1 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eRTX 4090 Efficiency: Leveraging two NVIDIA RTX 4090 GPUs, this server offers exceptional performance for AI inference tasks, providing a balance between power and cost-effectiveness.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eOptimized for 8B Models: With 
48 GB of total GPU VRAM, this system is specifically designed to handle language models and other AI applications with up to 8 billion parameters, making it ideal for deploying a wide range of modern AI services.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003eInference Performance: The combination of RTX 4090 GPUs and the AMD EPYC CPU allows for highly efficient inference, enabling high throughput and low latency for AI applications while maintaining a more accessible price point.\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003eScalable and Flexible: While optimized for 8B parameter models, this server can be easily integrated into larger clusters or used as a standalone solution for various AI deployment scenarios.\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003eThe Inference 8B 2 GPU AI Server is a well-balanced solution for organizations looking to deploy medium-sized AI models efficiently and cost-effectively. It provides an excellent balance between performance and investment, making it an ideal choice for businesses and research institutions that need to run modern AI models in production environments without the overhead of larger, more expensive systems. 
This server is perfect for deploying a wide range of language models, computer vision systems, and other AI applications that require robust performance but don't necessarily need the capacity for the largest models available.\u003c\/p\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003eDelivery 2 - 6 weeks \u003c\/p\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":49061811388744,"sku":"","price":10909.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/front_02.png?v=1737993966"},{"product_id":"pcpraha-epic-elite","title":"PCPRAHA Epic Elite","description":"\u003cp\u003e \u003c\/p\u003e\n\u003cdiv class=\"product-description\"\u003e\n\u003ch3\u003ePCPRAHA Epic Elite: Powerful AI Server for Your Most Demanding Computational Tasks\u003c\/h3\u003e\n\u003cp\u003e\u003cstrong\u003ePCPRAHA Epic Elite\u003c\/strong\u003e is a high-performance server designed to meet the highest computational power requirements. It's ideal for professionals in artificial intelligence (AI), machine learning (ML), data analysis, graphic rendering, and scientific research. This server will provide your company or project with all the necessary resources to achieve impressive results.\u003c\/p\u003e\n\u003ch4\u003ePCPRAHA Epic Elite Applications\u003c\/h4\u003e\n\u003cp\u003e\u003cstrong\u003e1. Artificial Intelligence and Machine Learning:\u003c\/strong\u003e\u003cbr\u003eThe server is equipped with an AMD EPYC™ 9754 processor and two MSI GeForce RTX 4090 SUPRIM X graphics cards, making it ideal for deep learning, neural networks, and other AI processing. The massive memory (384 GB DDR5-4800 ECC) enables simultaneous processing and analysis of large data volumes, accelerating model training and reducing system response time.\u003c\/p\u003e\n\u003cp\u003e\u003cstrong\u003e2. 
Rendering and Graphics Processing:\u003c\/strong\u003e\u003cbr\u003eTwo RTX 4090 graphics cards provide the highest level of performance when working with 3D graphics, animation, and video. The server can quickly process complex graphic projects, making it an ideal choice for studios working with virtual reality (VR), architectural renders, and other graphic projects.\u003c\/p\u003e\n\u003cp\u003e\u003cstrong\u003e3. Big Data Analysis:\u003c\/strong\u003e\u003cbr\u003eThis server is capable of working with enormous volumes of data, ideal for companies involved in analytics, predictive modeling, and real-time data processing. It significantly accelerates work and enables faster data-driven decision making.\u003c\/p\u003e\n\u003cp\u003e\u003cstrong\u003e4. Cloud Computing and Virtualization:\u003c\/strong\u003e\u003cbr\u003eWith this server, you can develop virtual machines, run cloud applications, and support infrastructure for scalable online services. It's an excellent choice for companies looking to create flexible and reliable cloud solutions.\u003c\/p\u003e\n\u003ch4\u003eProfit Potential\u003c\/h4\u003e\n\u003cp\u003e\u003cstrong\u003e1. Computational Power Rental:\u003c\/strong\u003e\u003cbr\u003eYou can rent out PCPRAHA Epic Elite's computational capacity for projects related to AI model training or graphic rendering. In the market, server rental with similar specifications can cost from €5,000 to €10,000 monthly, depending on tasks and rental duration. This server can cover its costs within 6-12 months of active use.\u003c\/p\u003e\n\u003cp\u003e\u003cstrong\u003e2. Custom AI Solution Development:\u003c\/strong\u003e\u003cbr\u003eBy using the server for creating and testing AI solutions, you can provide custom AI model and algorithm development services, opening opportunities for profit from client contracts. These services typically range from €20,000 to €100,000 per project, depending on complexity.\u003c\/p\u003e\n\u003cp\u003e\u003cstrong\u003e3. 
Custom Graphics Rendering:\u003c\/strong\u003e\u003cbr\u003eIf you're involved in rendering and visualizations, this server will significantly accelerate project processing, allowing you to handle more orders in less time. The average price for a single render ranges from €500 to €5,000, opening substantial business development opportunities.\u003c\/p\u003e\n\u003ch4\u003eKey Technical Specifications:\u003c\/h4\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cstrong\u003eMotherboard:\u003c\/strong\u003e ASRock – GENOAD8X-2T\/BCM\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eProcessor:\u003c\/strong\u003e AMD EPYC™ 9754\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eCooling:\u003c\/strong\u003e Supermicro SNK-P0084AP4\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRAM:\u003c\/strong\u003e 384 GB DDR5-4800 ECC\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGraphics Cards:\u003c\/strong\u003e 2 x MSI GeForce RTX 4090 SUPRIM X 24G\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eStorage:\u003c\/strong\u003e U.2\/U.3 NVMe Mobile Rack Cage\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003ePower Supply:\u003c\/strong\u003e LX2600W\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eCase:\u003c\/strong\u003e 24″ U4 Rack Mount Case\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cdiv class=\"additional-features\"\u003e\n\u003ch4\u003eAdditional Features\u003c\/h4\u003e\n\u003cul class=\"features-list\"\u003e\n\u003cli\u003eProfessional-grade ECC memory for enhanced reliability\u003c\/li\u003e\n\u003cli\u003eDual RTX 4090 configuration for maximum GPU performance\u003c\/li\u003e\n\u003cli\u003eEnterprise-class cooling solution\u003c\/li\u003e\n\u003cli\u003eRack-mountable design for data center deployment\u003c\/li\u003e\n\u003cli\u003eHigh-efficiency power supply for demanding workloads\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv class=\"warranty-support\"\u003e\n\u003ch4\u003eWarranty \u0026amp; 
Support\u003c\/h4\u003e\n\u003cul\u003e\n\u003cli\u003eProfessional technical support\u003c\/li\u003e\n\u003cli\u003eExtended warranty options available\u003c\/li\u003e\n\u003cli\u003eInstallation and setup assistance\u003c\/li\u003e\n\u003cli\u003eRemote configuration support\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e","brand":"Pcpraha","offers":[{"title":"Default Title","offer_id":49282672525640,"sku":"","price":12444.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/bluerack.png?v=1737994278"},{"product_id":"hpe-proliant-dl385","title":"HPE ProLiant DL385","description":"\u003cp\u003e \u003c\/p\u003e\n\u003cdiv class=\"product-description\"\u003e\n\u003ch1\u003eHPE ProLiant DL385 Generation 10 CTO Server with NVIDIA A100 GPUs\u003c\/h1\u003e\n\u003cdiv class=\"product-overview\"\u003e\n\u003cp\u003eThis server is equipped with two 32-core AMD EPYC 7502 processors (64 cores, 128 threads) and 512 GB of HPE DDR4 memory. For storage, it features 2x 480GB M.2 NVMe drives and 2x 1.6TB Enterprise SAS SSDs. The server supports an 8-position 2.5\" SFF backplane and is equipped with an HPE Smart Array S100i RAID controller.\u003c\/p\u003e\n\u003cp\u003eIt includes 2x NVIDIA A100 80GB GPU accelerators, making it ideal for AI and machine learning. The server features four integrated Gigabit Ethernet NICs and two 10 Gigabit SFP+ ports.
Redundancy is ensured through 2x 800W power supplies, and management is simplified with iLO5.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003cdiv class=\"specifications\"\u003e\n\u003ch2\u003eTechnical Specifications\u003c\/h2\u003e\n\u003ctable class=\"specs-table\"\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003cth\u003eProcessors\u003c\/th\u003e\n\u003ctd\u003eTwo 32-core AMD EPYC 7502 (2.5 GHz, Max Turbo 3.4 GHz)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003eCores \/ Threads\u003c\/th\u003e\n\u003ctd\u003e64 cores \/ 128 logical processors\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003eMemory\u003c\/th\u003e\n\u003ctd\u003e512 GB (8x 64GB) HPE DDR4\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003ePrimary Storage\u003c\/th\u003e\n\u003ctd\u003e2x 480GB M.2 NVMe\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003eSecondary Storage\u003c\/th\u003e\n\u003ctd\u003e2x 1.6 TB Enterprise 2.5\" SAS SSD\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003eBackplane\u003c\/th\u003e\n\u003ctd\u003e1x 8Bay 2.5\" SFF (8SFF)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003eRAID\u003c\/th\u003e\n\u003ctd\u003eHPE Smart Array S100i SR Gen10 SW RAID + P208i RAID Controller\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003eGraphics Units (GPU)\u003c\/th\u003e\n\u003ctd\u003e2x NVIDIA A100 80 GB SXM4 PCIe GPU Accelerator\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003ePower Supply\u003c\/th\u003e\n\u003ctd\u003e2x redundant 800W power supply\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003eNetwork Connectivity\u003c\/th\u003e\n\u003ctd\u003eQuad Integrated GB NICs + Dual port 10GB SFP+\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003eRack Rails\u003c\/th\u003e\n\u003ctd\u003eIncluded\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003eManagement\u003c\/th\u003e\n\u003ctd\u003eiLO5 
Standard\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003eDevice Status\u003c\/th\u003e\n\u003ctd\u003eRefurbished\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003eWarranty\u003c\/th\u003e\n\u003ctd\u003e24 months RTB (Return To Base)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003cth\u003eTested with\u003c\/th\u003e\n\u003ctd\u003eWindows Server 2022\/2019, VMware ESXi 7.0, Windows Server 2016\/2019 HPC, Ubuntu, CentOS 7 64-bit, Proxmox VE\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003c\/div\u003e\n\u003cdiv class=\"key-features\"\u003e\n\u003ch2\u003eKey Features and Benefits\u003c\/h2\u003e\n\u003cdiv class=\"feature-section\"\u003e\n\u003ch3\u003e1. Performance Potential with NVIDIA A100 GPU\u003c\/h3\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cstrong\u003eNVIDIA A100 GPU:\u003c\/strong\u003e Designed for the most demanding computational tasks such as machine learning, deep learning, and big data processing. With 80GB memory per accelerator, they provide high performance and capacity for parallel data processing.\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eAI and ML Optimization:\u003c\/strong\u003e The server efficiently handles AI and ML tasks, crucial for organizations working with extensive datasets requiring fast computations.\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv class=\"feature-section\"\u003e\n\u003ch3\u003e2. High Scalability and Flexibility\u003c\/h3\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cstrong\u003eExpansion Support:\u003c\/strong\u003e Supports a wide range of configurations including memory, storage, and other component expansions\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eExtensibility:\u003c\/strong\u003e Ready for future expansion with additional graphics cards and components\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv class=\"feature-section\"\u003e\n\u003ch3\u003e3. 
Security and Reliability\u003c\/h3\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cstrong\u003eIntegrated Security:\u003c\/strong\u003e Features Silicon Root of Trust for protection against hardware and firmware-level attacks\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRedundant Systems:\u003c\/strong\u003e Includes redundant power supplies and enterprise-grade components\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv class=\"use-cases\"\u003e\n\u003ch2\u003eIdeal Use Cases\u003c\/h2\u003e\n\u003cdiv class=\"use-case\"\u003e\n\u003ch3\u003eAI and Machine Learning (MLaaS)\u003c\/h3\u003e\n\u003cul\u003e\n\u003cli\u003eTraining and inference of deep neural networks\u003c\/li\u003e\n\u003cli\u003eNatural language processing\u003c\/li\u003e\n\u003cli\u003eComputer vision applications\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"revenue-potential\"\u003eEstimated monthly revenue: $2,000 - $10,000\u003c\/p\u003e\n\u003c\/div\u003e\n\u003cdiv class=\"use-case\"\u003e\n\u003ch3\u003eBig Data and Analytics\u003c\/h3\u003e\n\u003cul\u003e\n\u003cli\u003eProcessing and analyzing large datasets\u003c\/li\u003e\n\u003cli\u003eReal-time data analytics\u003c\/li\u003e\n\u003cli\u003eBusiness intelligence applications\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"revenue-potential\"\u003eEstimated monthly revenue: $3,000 - $15,000\u003c\/p\u003e\n\u003c\/div\u003e\n\u003cdiv class=\"use-case\"\u003e\n\u003ch3\u003eVirtualization and Cloud Computing\u003c\/h3\u003e\n\u003cul\u003e\n\u003cli\u003eRunning multiple virtual machines\u003c\/li\u003e\n\u003cli\u003eCloud infrastructure services\u003c\/li\u003e\n\u003cli\u003eContainer orchestration\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"revenue-potential\"\u003eEstimated monthly revenue: $2,000 - $10,000\u003c\/p\u003e\n\u003c\/div\u003e\n\u003cdiv class=\"use-case\"\u003e\n\u003ch3\u003eEnterprise Applications\u003c\/h3\u003e\n\u003cul\u003e\n\u003cli\u003eHigh-performance 
databases\u003c\/li\u003e\n\u003cli\u003eEnterprise resource planning (ERP)\u003c\/li\u003e\n\u003cli\u003eCustomer relationship management (CRM)\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"revenue-potential\"\u003eEstimated monthly revenue: $1,000 - $7,000\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv class=\"configuration\"\u003e\n\u003ch2\u003eConfiguration Options (CTO - Configure to Order)\u003c\/h2\u003e\n\u003cp\u003eThe server can be customized to meet specific organizational requirements, including processor, memory, storage, and networking components configurations.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e","brand":"HP","offers":[{"title":"Default Title","offer_id":49282689007944,"sku":"","price":43555.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/hpe-proliant-dl385.jpg?v=1778668411"},{"product_id":"barebone-server-bone-64-g5-4kw","title":"Barebone Server Bone64c - G5 - 4kW","description":"\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eProduct Description\u003c\/h2\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003eThe Bone - 64 - G5 represents the next generation of AI computing infrastructure, featuring AMD's latest EPYC™ 9554P processor and PCIe Gen5 technology. 
This enterprise-grade barebone server delivers exceptional performance for AI development, machine learning, and high-performance computing applications.\u003c\/p\u003e\n\u003ch3 class=\"font-600 text-lg font-bold\" level=\"3\"\u003eKey Features\u003c\/h3\u003e\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e\n\u003cstrong\u003eNext-Gen Architecture\u003c\/strong\u003e: PCIe Gen5 support for future-proof GPU performance\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003e\n\u003cstrong\u003eEnterprise-Grade Platform\u003c\/strong\u003e: Server-class components for 24\/7 operation\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003e\n\u003cstrong\u003eFlexible GPU Support\u003c\/strong\u003e: Ready for both consumer and datacenter GPUs\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003e\n\u003cstrong\u003eAdvanced Remote Management\u003c\/strong\u003e: Full IPMI support for datacenter deployment\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch3 class=\"font-600 text-lg font-bold\" level=\"3\"\u003eTechnical Specifications\u003c\/h3\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003e\u003cstrong\u003eProcessor \u0026amp; Memory\u003c\/strong\u003e\u003c\/p\u003e\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e\n\u003cstrong\u003eCPU\u003c\/strong\u003e: AMD EPYC™ 9554P\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"1\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e64 Cores \/ 128 Threads\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eBase Clock: 3.1 GHz\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" 
index=\"2\"\u003eTDP: 360W\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003e256 MB L3 Cache\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003e\n\u003cstrong\u003eMemory\u003c\/strong\u003e: 512GB DDR5-4800 ECC REG\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"1\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e8x 64GB DIMM Configuration\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eError Correction Support\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003eRegistered ECC Modules\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003e\u003cstrong\u003eMotherboard\u003c\/strong\u003e: ASRock GENOAD8X-2T\u003c\/p\u003e\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eEEB Form Factor (12.63\" x 13\")\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eSingle Socket SP5 (LGA 6096)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003e4 PCIe 5.0 \/ CXL2.0 x16 slots\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003e3 PCIe 5.0 x16 slots\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"4\"\u003e1 PCIe 5.0 x8 slot\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"5\"\u003eSupport for up to 16 SATA 6Gb\/s\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"6\"\u003eDual M.2 slots (PCIe 5.0 x4)\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003e\u003cstrong\u003eStorage \u0026amp; Networking\u003c\/strong\u003e\u003c\/p\u003e\n\u003cul
class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e\n\u003cstrong\u003eStorage\u003c\/strong\u003e:\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"1\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e2x 2TB Fanxiang NVMe M.2 PCIe 5.0 SSDs\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eSequential Read: Up to 12000 MB\/s\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003eSequential Write: Up to 11000 MB\/s\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003e\n\u003cstrong\u003eNetwork\u003c\/strong\u003e:\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"1\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eDual 10GbE RJ45 (Broadcom BCM57416)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eFull TCP\/IP Offload Support\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003e\u003cstrong\u003ePower \u0026amp; Cooling\u003c\/strong\u003e\u003c\/p\u003e\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e\n\u003cstrong\u003ePSU\u003c\/strong\u003e: 2 x LX2000W Platinum\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"1\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e2000W Output\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003e80+ Platinum Efficiency\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" 
index=\"1\"\u003e\n\u003cstrong\u003eCooling\u003c\/strong\u003e:\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"1\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e3x Industrial 2.7A Non-hotswap Fans\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eEnterprise-grade cooling solution\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003e\u003cstrong\u003eGPU Support\u003c\/strong\u003e\u003c\/p\u003e\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eConsumer GPUs:\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"1\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eUp to 4x RTX 4090\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eRTX 5090 Ready\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eDatacenter GPUs:\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"1\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eUp to 6x NVIDIA L40\/L4\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eUp to 6x A100 \/ H100 Compatible\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003e*Requires Additional Airflow Module\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003e\u003cstrong\u003ePhysical Specifications\u003c\/strong\u003e\u003c\/p\u003e\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eForm Factor: 4U 
Rackmount\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eDimensions: 178mm (H) x 437mm (W) x 648mm (D)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003eWeight: 45kg (Base Configuration)\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch3 class=\"font-600 text-lg font-bold\" level=\"3\"\u003eAvailable Add-ons\u003c\/h3\u003e\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eDatacenter GPU Airflow Module\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eRail Kit (Sliding\/Static)\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003eGPU Support Brackets\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003eExtended Warranty Options\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch3 class=\"font-600 text-lg font-bold\" level=\"3\"\u003eWarranty\u003c\/h3\u003e\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003e2-Year Limited Warranty\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eExtended Warranty Option\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eAdvanced RMA Support\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003eTechnical Support\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003eInstallation Consultation\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp class=\"whitespace-pre-wrap break-words\"\u003e\u003cem\u003eNote: Datacenter GPU installations require additional airflow module (sold separately).
Specifications subject to change without notice.\u003c\/em\u003e\u003c\/p\u003e\n\u003ch2 class=\"font-600 text-xl font-bold\" level=\"2\"\u003eRecommended Use Cases\u003c\/h2\u003e\n\u003cul class=\"-mt-1 [li\u0026gt;\u0026amp;]:mt-2 list-disc space-y-2 pl-8\" depth=\"0\"\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"0\"\u003eAI Model Development\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"1\"\u003eLarge Language Models\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"2\"\u003eMachine Learning Training\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"3\"\u003eHigh-Performance Computing\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"4\"\u003eData Analytics\u003c\/li\u003e\n\u003cli class=\"whitespace-normal break-words\" index=\"5\"\u003eScientific Computing\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp\u003e\u003cem\u003eShipping note: This item requires special handling and shipping arrangements. Please contact us for shipping quotes and delivery timeframes.\u003c\/em\u003e\u003c\/p\u003e","brand":"Pcpraha","offers":[{"title":"Default Title","offer_id":49322783867208,"sku":null,"price":9827.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/barebone-server-bone-64-g5-4kw.jpg?v=1778668067"},{"product_id":"monero-mining-compute-cluster-20-supermicro-dual-xeon-blade-nodes","title":"Monero Mining Compute Cluster – 20× Supermicro Dual Xeon Blade Nodes","description":"\u003ch3 data-start=\"352\" data-end=\"368\" class=\"\"\u003e\u003cstrong data-start=\"356\" data-end=\"368\"\u003eOverview\u003c\/strong\u003e\u003c\/h3\u003e\n\u003cp data-start=\"370\" data-end=\"727\" class=\"\"\u003eThis compute cluster is purpose-built for high-throughput Monero (XMR) mining using the RandomX algorithm. 
Engineered with twenty Supermicro SBI-421E-1T3N blade servers, each equipped with dual Intel Xeon Silver 4516Y+ processors, the system delivers consistent, scalable CPU mining performance with enterprise-grade reliability and management capabilities.\u003c\/p\u003e\n\u003cp data-start=\"729\" data-end=\"1006\" class=\"\"\u003eThe cluster is housed in a Supermicro SBE-820J2-630 4U blade enclosure, featuring integrated 25G networking, centralized power delivery, and remote chassis management. Designed for institutional mining, on-prem cryptographic compute infrastructure, or remote-hosted deployment.\u003c\/p\u003e\n\u003chr data-start=\"1008\" data-end=\"1011\" class=\"\"\u003e\n\u003ch3 data-start=\"1013\" data-end=\"1041\" class=\"\"\u003e\u003cstrong data-start=\"1017\" data-end=\"1041\"\u003eCompute Architecture\u003c\/strong\u003e\u003c\/h3\u003e\n\u003cp data-start=\"1043\" data-end=\"1070\" class=\"\"\u003e\u003cstrong data-start=\"1043\" data-end=\"1069\"\u003ePer Node Configuration\u003c\/strong\u003e:\u003c\/p\u003e\n\u003cul data-start=\"1071\" data-end=\"1314\"\u003e\n\u003cli data-start=\"1071\" data-end=\"1138\" class=\"\"\u003e\n\u003cp data-start=\"1073\" data-end=\"1138\" class=\"\"\u003e2× Intel Xeon Silver 4516Y+ (Sapphire Rapids-SP, 24 threads each)\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"1139\" data-end=\"1177\" class=\"\"\u003e\n\u003cp data-start=\"1141\" data-end=\"1177\" class=\"\"\u003eTotal of 48 logical threads per node\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"1178\" data-end=\"1228\" class=\"\"\u003e\n\u003cp data-start=\"1180\" data-end=\"1228\" class=\"\"\u003e4× 16GB DDR5-4800 ECC REG DIMMs (64 GB per node)\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"1229\" data-end=\"1276\" class=\"\"\u003e\n\u003cp data-start=\"1231\" data-end=\"1276\" class=\"\"\u003e1× 480 GB Micron 7450 PRO Enterprise NVMe SSD\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"1277\" data-end=\"1314\" 
class=\"\"\u003e\n\u003cp data-start=\"1279\" data-end=\"1314\" class=\"\"\u003eDedicated IPMI management interface\u003c\/p\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp data-start=\"1316\" data-end=\"1335\" class=\"\"\u003e\u003cstrong data-start=\"1316\" data-end=\"1334\"\u003eCluster Totals\u003c\/strong\u003e:\u003c\/p\u003e\n\u003cul data-start=\"1336\" data-end=\"1528\"\u003e\n\u003cli data-start=\"1336\" data-end=\"1373\" class=\"\"\u003e\n\u003cp data-start=\"1338\" data-end=\"1373\" class=\"\"\u003e960 logical threads across 20 nodes\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"1374\" data-end=\"1402\" class=\"\"\u003e\n\u003cp data-start=\"1376\" data-end=\"1402\" class=\"\"\u003e1.28 TB of DDR5 ECC memory\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"1403\" data-end=\"1448\" class=\"\"\u003e\n\u003cp data-start=\"1405\" data-end=\"1448\" class=\"\"\u003e20× NVMe storage volumes (enterprise-grade)\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"1449\" data-end=\"1485\" class=\"\"\u003e\n\u003cp data-start=\"1451\" data-end=\"1485\" class=\"\"\u003eCentralized 25G Ethernet switching\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"1486\" data-end=\"1528\" class=\"\"\u003e\n\u003cp data-start=\"1488\" data-end=\"1528\" class=\"\"\u003eRedundant high-efficiency power supplies\u003c\/p\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003chr data-start=\"1530\" data-end=\"1533\" class=\"\"\u003e\n\u003ch3 data-start=\"1535\" data-end=\"1561\" class=\"\"\u003e\u003cstrong data-start=\"1539\" data-end=\"1561\"\u003eMining Performance\u003c\/strong\u003e\u003c\/h3\u003e\n\u003cp data-start=\"1563\" data-end=\"1749\" class=\"\"\u003eEach blade is optimized for RandomX mining and tuned for multi-threaded, NUMA-aware operation using XMRig with large page support, static CPU affinity, and performance governor settings.\u003c\/p\u003e\n\u003cul data-start=\"1751\" data-end=\"1936\"\u003e\n\u003cli data-start=\"1751\" 
data-end=\"1801\" class=\"\"\u003e\n\u003cp data-start=\"1753\" data-end=\"1801\" class=\"\"\u003eEstimated performance per node: 9,500–10,000 H\/s\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"1802\" data-end=\"1848\" class=\"\"\u003e\n\u003cp data-start=\"1804\" data-end=\"1848\" class=\"\"\u003eTotal cluster hashrate: ~190,000–200,000 H\/s\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"1849\" data-end=\"1936\" class=\"\"\u003e\n\u003cp data-start=\"1851\" data-end=\"1936\" class=\"\"\u003eRandomX optimization includes AVX2 acceleration, cache tuning, and memory prefetching\u003c\/p\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp data-start=\"1938\" data-end=\"2038\" class=\"\"\u003eCluster performance may vary depending on ambient conditions, tuning parameters, and pool selection.\u003c\/p\u003e\n\u003chr data-start=\"2040\" data-end=\"2043\" class=\"\"\u003e\n\u003ch3 data-start=\"2045\" data-end=\"2079\" class=\"\"\u003e\u003cstrong data-start=\"2049\" data-end=\"2079\"\u003eChassis and Infrastructure\u003c\/strong\u003e\u003c\/h3\u003e\n\u003cp data-start=\"2081\" data-end=\"2125\" class=\"\"\u003e\u003cstrong data-start=\"2081\" data-end=\"2125\"\u003eSupermicro SBE-820J2-630 Blade Enclosure\u003c\/strong\u003e\u003c\/p\u003e\n\u003cul data-start=\"2126\" data-end=\"2411\"\u003e\n\u003cli data-start=\"2126\" data-end=\"2168\" class=\"\"\u003e\n\u003cp data-start=\"2128\" data-end=\"2168\" class=\"\"\u003e20-node blade capacity in 4U form factor\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"2169\" data-end=\"2220\" class=\"\"\u003e\n\u003cp data-start=\"2171\" data-end=\"2220\" class=\"\"\u003e2× 3200W high-efficiency redundant power supplies\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"2221\" data-end=\"2257\" class=\"\"\u003e\n\u003cp data-start=\"2223\" data-end=\"2257\" class=\"\"\u003e1× Chassis Management Module (CMM)\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"2258\" data-end=\"2311\" 
class=\"\"\u003e\n\u003cp data-start=\"2260\" data-end=\"2311\" class=\"\"\u003eIntegrated Marvell 25G non-blocking Ethernet switch\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"2312\" data-end=\"2355\" class=\"\"\u003e\n\u003cp data-start=\"2314\" data-end=\"2355\" class=\"\"\u003eRedundant cooling with shared fan modules\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"2356\" data-end=\"2411\" class=\"\"\u003e\n\u003cp data-start=\"2358\" data-end=\"2411\" class=\"\"\u003eOut-of-band management via IPMI\/Redfish for all nodes\u003c\/p\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003chr data-start=\"2413\" data-end=\"2416\" class=\"\"\u003e\n\u003ch3 data-start=\"2418\" data-end=\"2452\" class=\"\"\u003e\u003cstrong data-start=\"2422\" data-end=\"2452\"\u003eSoftware and Compatibility\u003c\/strong\u003e\u003c\/h3\u003e\n\u003cul data-start=\"2454\" data-end=\"2823\"\u003e\n\u003cli data-start=\"2454\" data-end=\"2501\" class=\"\"\u003e\n\u003cp data-start=\"2456\" data-end=\"2501\" class=\"\"\u003eCompatible with Hive OS (custom image or PXE)\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"2502\" data-end=\"2571\" class=\"\"\u003e\n\u003cp data-start=\"2504\" data-end=\"2571\" class=\"\"\u003eSupported operating systems: Ubuntu 22.04, Debian 12, Rocky Linux 9\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"2572\" data-end=\"2654\" class=\"\"\u003e\n\u003cp data-start=\"2574\" data-end=\"2654\" class=\"\"\u003ePre-tested with XMRig 6.21+ and MoneroOcean, SupportXMR, and custom pool proxies\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"2655\" data-end=\"2737\" class=\"\"\u003e\n\u003cp data-start=\"2657\" data-end=\"2737\" class=\"\"\u003eOptional integration with centralized monitoring (Grafana, Prometheus, Telegraf)\u003c\/p\u003e\n\u003c\/li\u003e\n\u003cli data-start=\"2738\" data-end=\"2823\" class=\"\"\u003e\n\u003cp data-start=\"2740\" data-end=\"2823\" class=\"\"\u003eIdeal for self-hosted mining, remote managed 
deployments, or institutional research\u003c\/p\u003e\n\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003chr data-start=\"2825\" data-end=\"2828\" class=\"\"\u003e\n\u003ch3 data-start=\"2830\" data-end=\"2855\" class=\"\"\u003e\u003cstrong data-start=\"2834\" data-end=\"2855\"\u003eTechnical Summary\u003c\/strong\u003e\u003c\/h3\u003e\n\u003cdiv class=\"_tableContainer_16hzy_1\"\u003e\n\u003cdiv class=\"_tableWrapper_16hzy_14 group flex w-fit flex-col-reverse\" tabindex=\"-1\"\u003e\n\u003ctable data-start=\"2857\" data-end=\"3507\" class=\"w-fit min-w-(--thread-content-width)\"\u003e\n\u003cthead data-start=\"2857\" data-end=\"2920\"\u003e\n\u003ctr data-start=\"2857\" data-end=\"2920\"\u003e\n\u003cth data-start=\"2857\" data-end=\"2882\" data-col-size=\"sm\"\u003eAttribute\u003c\/th\u003e\n\u003cth data-start=\"2882\" data-end=\"2920\" data-col-size=\"sm\"\u003eValue\u003c\/th\u003e\n\u003c\/tr\u003e\n\u003c\/thead\u003e\n\u003ctbody data-start=\"2986\" data-end=\"3507\"\u003e\n\u003ctr data-start=\"2986\" data-end=\"3050\"\u003e\n\u003ctd data-start=\"2986\" data-end=\"3011\" data-col-size=\"sm\"\u003eLogical threads\u003c\/td\u003e\n\u003ctd data-start=\"3011\" data-end=\"3050\" data-col-size=\"sm\"\u003e960\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr data-start=\"3051\" data-end=\"3115\"\u003e\n\u003ctd data-start=\"3051\" data-end=\"3076\" data-col-size=\"sm\"\u003eTotal RAM\u003c\/td\u003e\n\u003ctd data-col-size=\"sm\" data-start=\"3076\" data-end=\"3115\"\u003e1.28 TB DDR5 ECC\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr data-start=\"3116\" data-end=\"3180\"\u003e\n\u003ctd data-start=\"3116\" data-end=\"3141\" data-col-size=\"sm\"\u003eTotal NVMe storage\u003c\/td\u003e\n\u003ctd data-col-size=\"sm\" data-start=\"3141\" data-end=\"3180\"\u003e9.6 TB (20× 480GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr data-start=\"3181\" data-end=\"3247\"\u003e\n\u003ctd data-start=\"3181\" data-end=\"3210\" data-col-size=\"sm\"\u003eEstimated cluster 
hashrate\u003c\/td\u003e\n\u003ctd data-col-size=\"sm\" data-start=\"3210\" data-end=\"3247\"\u003e190,000–200,000 H\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr data-start=\"3248\" data-end=\"3312\"\u003e\n\u003ctd data-start=\"3248\" data-end=\"3273\" data-col-size=\"sm\"\u003eTotal power draw\u003c\/td\u003e\n\u003ctd data-start=\"3273\" data-end=\"3312\" data-col-size=\"sm\"\u003e~6.0–6.5 kW under full load\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr data-start=\"3313\" data-end=\"3377\"\u003e\n\u003ctd data-start=\"3313\" data-end=\"3338\" data-col-size=\"sm\"\u003eNetworking\u003c\/td\u003e\n\u003ctd data-start=\"3338\" data-end=\"3377\" data-col-size=\"sm\"\u003eIntegrated 25G switch with uplinks\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr data-start=\"3378\" data-end=\"3442\"\u003e\n\u003ctd data-start=\"3378\" data-end=\"3403\" data-col-size=\"sm\"\u003eRemote management\u003c\/td\u003e\n\u003ctd data-col-size=\"sm\" data-start=\"3403\" data-end=\"3442\"\u003eIPMI 2.0, Redfish, CLI\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr data-start=\"3443\" data-end=\"3507\"\u003e\n\u003ctd data-start=\"3443\" data-end=\"3468\" data-col-size=\"sm\"\u003eForm factor\u003c\/td\u003e\n\u003ctd data-start=\"3468\" data-end=\"3507\" data-col-size=\"sm\"\u003e4U rackmount blade enclosure\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e","brand":"Kentino","offers":[{"title":"Default Title","offer_id":50398607147336,"sku":"","price":154426.0,"currency_code":"EUR","in_stock":false}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/sbe-714d_b.jpg?v=1776441779"},{"product_id":"nvidia-dgx-spark-4tb-ai-supercomputer","title":"NVIDIA DGX Spark 4TB - AI Supercomputer","description":"\u003cp\u003eThe NVIDIA DGX Spark is a personal AI supercomputer powered by the NVIDIA GB10 Grace Blackwell processor. 
Designed for developers, researchers, and data scientists who need to run large language models locally.\u003c\/p\u003e\n\u003cp\u003e \u003c\/p\u003e\n\u003cp\u003eKey Specifications:\u003c\/p\u003e\n\u003cul\u003e\n\u003cli\u003eNVIDIA GB10 Grace Blackwell processor\u003c\/li\u003e\n\u003cli\u003eUp to 1 PFLOP AI inference performance\u003c\/li\u003e\n\u003cli\u003e128GB unified memory\u003c\/li\u003e\n\u003cli\u003e4TB NVMe SSD storage\u003c\/li\u003e\n\u003cli\u003eRuns models up to 200 billion parameters\u003c\/li\u003e\n\u003cli\u003eConnect two DGX Spark units for 405B parameter models\u003c\/li\u003e\n\u003cli\u003eNVIDIA DGX OS with PyTorch, TensorFlow, NVIDIA NIM support\u003c\/li\u003e\n\u003cli\u003eWi-Fi, HDMI, USB-C connectivity\u003c\/li\u003e\n\u003cli\u003eUltra-compact: 150 x 150 x 50.5mm\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp\u003ePerfect for: Local LLM inference, AI agent development, fine-tuning pre-trained models, research prototyping, and private AI workloads where cloud is not an option.\u003c\/p\u003e","brand":"NVIDIA","offers":[{"title":"Default Title","offer_id":52896496681288,"sku":null,"price":5280.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/SPARK-DOUBLE-Back.jpg?v=1776247089"},{"product_id":"asus-ascent-gx10-1tb-nvidia-gb10-ai-mini-pc","title":"ASUS Ascent GX10 1TB - NVIDIA GB10 AI Mini PC","description":"\u003cp\u003eThe ASUS Ascent GX10 brings NVIDIA DGX-class AI computing in a compact, affordable form factor. 
Powered by the same NVIDIA GB10 Grace Blackwell processor as the DGX Spark, it delivers up to 1 PFLOP of AI inference performance.\u003c\/p\u003e\n\u003cp\u003e \u003c\/p\u003e\n\u003cp\u003eKey Specifications:\u003c\/p\u003e\n\u003cul\u003e\n\u003cli\u003eNVIDIA GB10 Grace Blackwell processor\u003c\/li\u003e\n\u003cli\u003eUp to 1 PFLOP AI inference performance\u003c\/li\u003e\n\u003cli\u003e128GB unified memory\u003c\/li\u003e\n\u003cli\u003e1TB NVMe SSD storage\u003c\/li\u003e\n\u003cli\u003eRuns models up to 200 billion parameters\u003c\/li\u003e\n\u003cli\u003eConnect two units for 405B parameter models\u003c\/li\u003e\n\u003cli\u003eNVIDIA DGX OS with PyTorch, TensorFlow, NVIDIA NIM support\u003c\/li\u003e\n\u003cli\u003eHDMI, USB-C connectivity\u003c\/li\u003e\n\u003cli\u003eCompact Mini ITX form factor\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp\u003eThe most affordable entry point to NVIDIA Blackwell AI computing. Ideal for AI developers, startups, and researchers who need local inference without cloud dependency.\u003c\/p\u003e","brand":"ASUS","offers":[{"title":"Default Title","offer_id":52896513950024,"sku":null,"price":3881.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/asus-ascent-gx10-1tb-nvidia-gb10-ai-mini-pc.jpg?v=1778668532"},{"product_id":"gigabyte-ai-top-atom-4tb-nvidia-blackwell-ai-mini-pc","title":"GIGABYTE AI TOP ATOM 4TB - NVIDIA Blackwell AI Mini PC","description":"\u003cp\u003eGIGABYTE AI TOP ATOM powered by NVIDIA Blackwell architecture with 20-core Arm CPU (10x Cortex-X925 + 10x Cortex-A725). 
A compact AI workstation for local inference and AI development.\u003c\/p\u003e\n\u003cp\u003e \u003c\/p\u003e\n\u003cp\u003eKey Specifications:\u003c\/p\u003e\n\u003cul\u003e\n\u003cli\u003eNVIDIA Blackwell GPU architecture\u003c\/li\u003e\n\u003cli\u003e20-core Arm processor (Cortex-X925 + Cortex-A725)\u003c\/li\u003e\n\u003cli\u003e128GB RAM\u003c\/li\u003e\n\u003cli\u003e4TB NVMe SSD\u003c\/li\u003e\n\u003cli\u003eUp to 1 PFLOP AI inference\u003c\/li\u003e\n\u003cli\u003eRuns models up to 200B parameters\u003c\/li\u003e\n\u003cli\u003eNVIDIA DGX OS\u003c\/li\u003e\n\u003cli\u003eWi-Fi, HDMI, 4x USB 3.2\u003c\/li\u003e\n\u003cli\u003eMini ITX form factor\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp\u003eBuilt for AI developers and researchers who need powerful local AI compute in a small footprint.\u003c\/p\u003e","brand":"GIGABYTE","offers":[{"title":"Default Title","offer_id":52896593117512,"sku":null,"price":4620.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/gigabyte-ai-top-atom-4tb-nvidia-blackwell-ai-mini-pc.jpg?v=1778668314"},{"product_id":"lenovo-thinkstation-pgx-1tb-nvidia-blackwell-ai-workstation","title":"Lenovo ThinkStation PGX 1TB - NVIDIA Blackwell AI Workstation","description":"\u003cp\u003eLenovo ThinkStation PGX brings enterprise-grade AI computing with NVIDIA GB10 Blackwell in a professional Mini Tower workstation form factor. 
Backed by Lenovo's enterprise reliability and support.\u003c\/p\u003e\n\u003cp\u003e \u003c\/p\u003e\n\u003cp\u003eKey Specifications:\u003c\/p\u003e\n\u003cul\u003e\n\u003cli\u003eNVIDIA GB10 Grace Blackwell processor\u003c\/li\u003e\n\u003cli\u003eUp to 1 PFLOP AI inference performance\u003c\/li\u003e\n\u003cli\u003e128GB unified memory\u003c\/li\u003e\n\u003cli\u003e1TB NVMe SSD storage\u003c\/li\u003e\n\u003cli\u003eRuns models up to 200B parameters\u003c\/li\u003e\n\u003cli\u003eConnect two units for 405B parameter models\u003c\/li\u003e\n\u003cli\u003eNVIDIA DGX OS\u003c\/li\u003e\n\u003cli\u003eHDMI connectivity\u003c\/li\u003e\n\u003cli\u003eMini Tower form factor\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp\u003eEnterprise-ready AI workstation from Lenovo. Ideal for corporate AI labs, research institutions, and professional developers requiring reliable, supported hardware.\u003c\/p\u003e","brand":"Lenovo","offers":[{"title":"Default Title","offer_id":52896600555848,"sku":null,"price":4738.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/01-lenovo-thinkstation-pgx-hero.jpg?v=1776259427"},{"product_id":"lenovo-thinkstation-pgx-4tb-nvidia-blackwell-ai-workstation","title":"Lenovo ThinkStation PGX 4TB - NVIDIA Blackwell AI Workstation","description":"\u003cp\u003eLenovo ThinkStation PGX with 4TB storage - the enterprise AI workstation with NVIDIA GB10 Blackwell. 
Maximum storage capacity for large datasets and model libraries.\u003c\/p\u003e\n\u003cp\u003e \u003c\/p\u003e\n\u003cp\u003eKey Specifications:\u003c\/p\u003e\n\u003cul\u003e\n\u003cli\u003eNVIDIA GB10 Grace Blackwell processor\u003c\/li\u003e\n\u003cli\u003eUp to 1 PFLOP AI inference performance\u003c\/li\u003e\n\u003cli\u003e128GB unified memory\u003c\/li\u003e\n\u003cli\u003e4TB NVMe SSD storage\u003c\/li\u003e\n\u003cli\u003eRuns models up to 200B parameters\u003c\/li\u003e\n\u003cli\u003eConnect two units for 405B parameter models\u003c\/li\u003e\n\u003cli\u003eNVIDIA DGX OS\u003c\/li\u003e\n\u003cli\u003eHDMI connectivity\u003c\/li\u003e\n\u003cli\u003eMini Tower form factor\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp\u003eThe 4TB variant is ideal for teams working with large training datasets or maintaining extensive model libraries locally. Enterprise support from Lenovo included.\u003c\/p\u003e","brand":"Lenovo","offers":[{"title":"Default Title","offer_id":52896614842696,"sku":null,"price":5212.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/01-lenovo-thinkstation-pgx-hero.jpg?v=1776259427"},{"product_id":"k-ai-96-rome-4090-2644tops-4-rtx-4090-ai-inference-server","title":"K-AI 96 Rome 4090 2644TOPS — 4× RTX 4090 AI Inference Server","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 96 Rome 4090 2644TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e96 GB VRAM Inference Server\u003cbr\u003e4x RTX 4090 | EPYC Rome | 2 644 TOPS INT8\u003c\/p\u003e\n\u003cdiv 
style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e647\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTFLOPS fp16\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e179\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003etok\/s batch-32\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e96 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eVRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e24\/7\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003erack-ready\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eMeasured on Kentino hardware. 
Llama 3.3 70B AWQ INT4 via vLLM 0.19.0.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 4U rack-mount inference server with four GeForce RTX 4090 pooled to 96 GB VRAM, one AMD EPYC 7542 Rome CPU (32C\/64T), 256 GB DDR4 ECC, 2 TB NVMe boot, and dual synchronized 2 kW ATX PSU. Runs vLLM, SGLang, llama.cpp, ComfyUI and every major open-weight inference stack out of the box.\u003c\/p\u003e\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4x NVIDIA GeForce RTX 4090 24 GB GDDR6X (450 W, PCIe 4.0 x16)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e96 GB total across 4 cards\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7542 Rome (32C\/64T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr 
style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e256 GB DDR4-2666 ECC RDIMM (4x 64 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eStorage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePSU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eDual 2 kW ATX with sync cable\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount, front-to-back directed airflow\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP3 tower cooler, 3x front + 1x rear 120 mm industrial fans\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 4 x 450 W = 1 800 W\u003c\/li\u003e\n\u003cli\u003eSystem total: ~2 125 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 4 000 W (dual 2 kW) — 46.9% headroom\u003c\/li\u003e\n\u003cli\u003eSplit power 
delivery — single PSU failure = loss of 2 GPUs or 2 GPUs + motherboard\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003e128 PCIe Gen4 lanes from EPYC to seven x16 slots; four populated by GPUs at Gen4 x16. No PCIe switch. No NVLink — peer-to-peer at 19-22 GB\/s (Kentino measured).\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 96 GB of pooled VRAM across 4 cards, this server handles open-weight LLMs, vision models, image and video generation, speech AI, and multi-tenant serving.\u003c\/p\u003e\u003c\/div\u003e\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3 \/ Qwen3.5:\u003c\/strong\u003e Qwen3-72B Q4 (~15-20 tok\/s); Qwen3-32B Q6; Qwen3-30B-A3B MoE Q4-Q6; Qwen3-Coder-30B-A3B at 256k; Qwen3.5-122B-A10B Q4; QwQ-32B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek:\u003c\/strong\u003e DeepSeek-R2 32B Q4-Q6 (92.7% AIME 2025); DeepSeek-R1-Distill-Qwen-32B bf16; DeepSeek-V2-Lite 16B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM \/ Z.ai:\u003c\/strong\u003e GLM-4.5-Air 106B\/12B 
Q4-Q5; GLM-4.6V-Flash; GLM-Zero 9B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHunyuan:\u003c\/strong\u003e Hunyuan-A13B Q4-Q6 (~48 GB) 256k ctx dual-mode reasoning\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Seed-OSS-36B Q4 512k ctx; ERNIE-4.5-47B-A3B Q4; Yi-34B Q6; Baichuan-M2-32B; Step-3.5-Flash\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMeta Llama:\u003c\/strong\u003e Llama 3.3 70B Q4_K_M (~20 tok\/s llama.cpp, ~179 tok\/s batch-32 vLLM — Kentino measured); Llama 3.1 8B bf16 (~80-120 tok\/s); Llama 4 Scout Q4\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral:\u003c\/strong\u003e Small 3 24B bf16; Magistral Small 24B reasoning; Devstral Small 2 24B 256k ctx; Mixtral 8x7B Q6\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOpenAI:\u003c\/strong\u003e gpt-oss-20b MXFP4 (16 GB); gpt-oss-120b MXFP4 (80 GB tight)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Gemma 3 27B Q6 128k; Phi-4 14B bf16; Nemotron-Super 49B Q4; Granite 4.0 H-Small; OLMo 2 32B; Reka Flash 3; Command R 35B\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-8B\/32B, Qwen3-VL-30B-A3B, Qwen3-Omni-30B-A3B; InternVL3 up to 78B Q4; InternVL3.5-38B; DeepSeek-VL2; Llama 3.2 11B Vision; Pixtral 12B; Molmo 7B; Gemma 3 12B\/27B; PaliGemma 2; MiniCPM-V 2.6 \/ MiniCPM-o 2.6.\u003c\/p\u003e\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev]\/[schnell] fp8 (~15-25 s per 1024x1024); FLUX.1 Kontext; 
FLUX Tools; SD 3.5 Large; SDXL; HunyuanImage-2.1 bf16 (~34 GB) 2K native; Kolors 2.0; AuraFlow; OmniGen v1.\u003c\/p\u003e\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B\/I2V-A14B MoE (~54 GB bf16); Wan 2.2 TI2V-5B 720p@24fps; HunyuanVideo 13B Q4-Q5; HunyuanVideo 1.5; CogVideoX-5B; Open-Sora 2.0; Mochi-1; LTX-Video; SVD\/SV3D\/SV4D; NVIDIA Cosmos Predict 2.\u003c\/p\u003e\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 turbo (~50x realtime); Parakeet-TDT 1.1B; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 3.0; Kokoro 82M; Stable Audio Open; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime:\u003c\/strong\u003e Kyutai Moshi (200 ms full-duplex); Step-Audio 2 mini; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic:\u003c\/strong\u003e MusicGen; AudioGen; Suno Bark; SeamlessM4T v2\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0\"\u003eMulti-model serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e4-8 concurrent users on 32-72B LLMs via vLLM \/ SGLang tensor-parallel\u003c\/li\u003e\n\u003cli\u003eMixed: Qwen3-32B + FLUX.1 + Whisper-turbo + Moshi with partitioned VRAM\u003c\/li\u003e\n\u003cli\u003eLoRA\/QLoRA fine-tuning 32-72B; full-param 7-14B\u003c\/li\u003e\n\u003cli\u003eRAG with Command R+ or Qwen3 + BGE-M3\/E5\/Jina\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget 
workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eInference gateway for 50-200 seat org (70B Q4-Q6, 4-8 concurrent sessions)\u003c\/li\u003e\n\u003cli\u003eBatch diffusion\/video pipeline (SDXL + FLUX.1 + Wan 2.2 overnight)\u003c\/li\u003e\n\u003cli\u003eLoRA\/QLoRA fine-tuning lab for 7-34B domain adaptations\u003c\/li\u003e\n\u003cli\u003eRAG document assistant (Qwen3-VL + BGE-M3 + Command R, 32k ctx)\u003c\/li\u003e\n\u003cli\u003eMixed single-box: chat + image + ASR + realtime voice on partitioned VRAM\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eMeasured performance\u003c\/h2\u003e\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003eKentino bench | 2026-04-10 | 4x RTX 4090 + EPYC 7542 + ROMED8-2T\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eSustained compute (fp16)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e647.7 TFLOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM Llama 3.3 70B AWQ INT4 (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e8.0 
tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM Llama 3.3 70B AWQ INT4 (batch-32)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e179.3 tok\/s aggregate\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ellama.cpp Llama 3.3 70B Q4_K_M (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e20.3 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ePrompt eval\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e1 568 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eGPU memory bandwidth\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e920 GB\/s per card\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eNVMe read\/write\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e4 589 \/ 4 213 MB\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ePeak thermal (GPU+CPU burn)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e73 °C, 0.6% drop\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003evLLM ran with the awq kernel; a 2-3x speedup is possible with awq_marlin.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul
style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eFrontier 100B+ models at bf16 (DeepSeek V3\/R1, GLM-4.5+, Kimi-K2, Mistral Large 3 — require 256+ GB VRAM)\u003c\/li\u003e\n\u003cli\u003eTraining from scratch (consumer RTX 4090 cards lack NVLink)\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification.
Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eUpgrade RAM to 512 GB (add 4x 64 GB DDR4 — four DIMM slots open)\u003c\/li\u003e\n\u003cli\u003e4 TB NVMe secondary drive for dataset\/model staging\u003c\/li\u003e\n\u003cli\u003e24U open cabinet for multi-server deployments\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52926141628744,"sku":null,"price":18491.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/PXL_20260413_071005153.jpg?v=1776441384"},{"product_id":"k-ai-32-rome-5090-1676tops-1x-rtx-5090-ai-workstation","title":"K-AI 32 Rome 5090 1676TOPS — 1x RTX 5090 AI Workstation","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 32 Rome 5090 1676TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003eSingle-GPU Blackwell Workstation\u003cbr\u003e1x RTX 5090 | EPYC Milan | 1 676 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e1 676\u003c\/div\u003e\n\u003cdiv 
style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e32 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eVRAM GDDR7\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003efp8\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003enative tensor\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003erack\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eready\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eSingle Blackwell GPU, 32 GB GDDR7, fp8 native — the sharpest single-card AI workstation Kentino builds.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA single-GPU, workstation-class AI server on the ROMED8-2T \/ EPYC Milan platform. One RTX 5090 delivers 32 GB of GDDR7 VRAM with native fp8 tensor math — the sweet spot for a developer box, a small-team inference endpoint, or an image\/video generation workstation where one strong GPU beats two weaker ones. 
The 4U chassis is rack-mountable but also suits a quiet under-desk office deployment.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e1x NVIDIA GeForce RTX 5090 32 GB GDDR7 (575 W, PCIe 5.0 x16, Blackwell)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e32 GB\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e128 GB DDR4-2666 ECC RDIMM (2x 64 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px
16px\"\u003e1 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSingle 2 kW ATX PSU\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount, passive Gen4 x16 riser\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP3 tower cooler (Arctic Freezer 4U-M class), 3x 120 mm front intake + 1x 120 mm rear exhaust\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550) + IPMI\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 1 x 575 W = 575 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~900 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 2 000 W (single 2 kW ATX) — 55 % headroom\u003c\/li\u003e\n\u003cli\u003eGenerous transient margin, silent operation at light load\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp 
style=\"margin:0;font-size:14px;color:#444\"\u003ePCIe Gen4 x16 at the GPU: the ROMED8-2T is a Gen4 board, so the Gen5-capable 5090 links at Gen4, which has little practical effect on single-GPU inference. All 16 lanes run directly from the CPU root complex, with no PCIe switch. The GeForce RTX 5090 has no NVLink.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 32 GB of GDDR7 VRAM and native fp8 tensor math, this workstation handles open-weight LLMs up to 32B dense, image generation with FLUX.1, video generation, speech AI, and single-developer multi-model stacks.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-32B\u003c\/strong\u003e dense Q6_K — 32k context, flagship general reasoning (~40-55 tok\/s single-stream on Blackwell fp8, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-30B-A3B\u003c\/strong\u003e MoE at Q4_K_M with long KV headroom (Qwen3-Coder-30B-A3B agentic, 256k ctx)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwQ-32B\u003c\/strong\u003e Q6 — reasoning preview\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek-R2\u003c\/strong\u003e 32B sparse MoE at Q4-Q6 — single-GPU reasoning that scores 92.7 % AIME-2025 (~45-60 tok\/s single-stream on Blackwell fp8, published
reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3.5-27B\u003c\/strong\u003e dense Q6 (Feb 2026 release)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHunyuan-A13B\u003c\/strong\u003e at Q4_K_M (~28-30 GB) — 80B\/13B MoE, 256k ctx, dual-mode reasoning\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eSeed-OSS-36B\u003c\/strong\u003e Q4_K_M — 512k native context for long-doc analysis\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 3.3 70B\u003c\/strong\u003e at Q2_K (~27 GB tight) or Q3_K (~34 GB with RAM spill) — usable for general chat\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Small 3 \/ Magistral Small \/ Devstral Small 2\u003c\/strong\u003e (24B dense) at Q6-Q8 or bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGemma 3 27B\u003c\/strong\u003e multimodal at Q6 with 128k context\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003ePhi-4 14B\u003c\/strong\u003e \/ \u003cstrong\u003ePhi-4-reasoning\u003c\/strong\u003e bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eReka Flash 3 (21B Apache 2.0)\u003c\/strong\u003e at bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003egpt-oss-20b\u003c\/strong\u003e native MXFP4 (~16 GB — fits with generous KV)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-8B \/ -32B at Q4-Q6; Qwen3-VL-30B-A3B MoE; InternVL3.5-8B \/ -38B Q4; MiniCPM-V 2.6 \/ MiniCPM-o 2.6 (8B); Llama 3.2 11B Vision bf16; Pixtral 12B bf16 (24 GB — tight, use Q8); Gemma 3 12B \/ 27B multimodal; PaliGemma 2 (3\/10B); Phi-4-multimodal 5.6B; Aya Vision 8B.\u003c\/p\u003e\n\n\u003ch3 
style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] \/ [schnell] fp8 (~12 GB) native Blackwell speedup (~8-12 seconds per 1024x1024 image at 20 steps on Blackwell, published reference); FLUX.1 Kontext [dev] — in-context editing, character consistency; SD 3.5 Large (18 GB fp16 \/ 11 GB fp8); SDXL 1.0 10-12 GB fp16; HunyuanImage-2.1 NF4 (~14 GB); Kolors 2.0 fp8; AuraFlow v0.3 \/ OmniGen v1 \/ PixArt-Sigma.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 TI2V-5B at ~16 GB — 720p@24fps on a single 5090; Wan 2.1 T2V\/I2V 14B at Q4-Q6 (~16 GB); HunyuanVideo 1.5 (8.3B) — 14 GB minimum; CogVideoX-5B \/ 5B-I2V int8 (~12 GB); LTX-Video 2B realtime-class 30 fps; Mochi-1 Q4 (~17-18 GB).\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo (~50x realtime on single GPU, published reference); NVIDIA Parakeet-TDT 1.1B; Canary 1B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2.0 \/ Fun-CosyVoice 3.0; Kokoro 82M; Stable Audio Open\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi (7B) — only open realtime full-duplex voice; Step-Audio 2 mini \/ R1\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eResident stack for a single developer: Qwen3-32B Q6 (~20 GB) + FLUX.1 fp8 (~12 GB 
fits tight) on swap, or Qwen3-14B Q6 (~9 GB) + FLUX.1 + Whisper-turbo + Kokoro simultaneously (~20-24 GB pinned)\u003c\/li\u003e\n\u003cli\u003e2-4 concurrent users on 14-32B class LLMs via vLLM \/ SGLang\u003c\/li\u003e\n\u003cli\u003eLoRA \/ QLoRA fine-tuning of 7-14B dense models\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eDeveloper workstation for a single AI engineer running mixed inference + image gen\u003c\/li\u003e\n\u003cli\u003eSmall-team coding-agent endpoint (Qwen3-Coder-30B-A3B) with 1-4 concurrent users\u003c\/li\u003e\n\u003cli\u003eContent pipeline: FLUX.1 or SD 3.5 Large batch image gen + Wan 2.2 short-form video\u003c\/li\u003e\n\u003cli\u003eOn-premises ASR + TTS voice stack (Whisper + Kokoro + Moshi) for a branch office\u003c\/li\u003e\n\u003cli\u003eProsumer LLM + VLM research box — test Qwen3, Llama 3.3, Gemma 3, Phi-4 on real hardware\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003ePublished reference | single RTX 5090 comparable hardware\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 
12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eLlama 3.3 70B Q4_K_M llama.cpp decode\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~18-22 tok\/s with CPU KV offload\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eQwen3-32B Q6 vLLM single-stream\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~45-55 tok\/s decode at fp8\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eFLUX.1 [dev] fp8 on Blackwell\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~1.7-2.0 s per 1024x1024 image at 20 steps\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eWan 2.2 TI2V-5B 720p clip\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~3-4 minutes at fp16\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003ePublished reference points from comparable single-5090 hardware. 
Kentino-measured numbers will be posted once gf-logic extends the bench to a single-5090 configuration.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e70B dense models at Q6+ (32 GB is insufficient — use 2x 5090 for a proper 64 GB pool)\u003c\/li\u003e\n\u003cli\u003eMulti-user concurrent serving at scale (one GPU, so no tensor-parallel scaling)\u003c\/li\u003e\n\u003cli\u003eFrontier 100B+ MoE (GLM-4.5, Kimi K2, Mistral Large 3 — out of reach on a single consumer card)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv
style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eNVIDIA ConnectX-5 100 GbE MCX555A-ECAT\u003c\/li\u003e\n\u003cli\u003eUpgrade boot drive to 2 TB NVMe — or 4 TB\u003c\/li\u003e\n\u003cli\u003eUpgrade RAM to 256 GB (4x 64 GB DDR4) for bigger KV cache \/ multi-model concurrent stacks\u003c\/li\u003e\n\u003cli\u003eRack PDU (C13\/C19 metered) and 2 kVA online UPS\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52927463620936,"sku":null,"price":8092.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/PXL_20260413_071103100.jpg?v=1776441356"},{"product_id":"k-ai-48-rome-4090-1322tops-2x-rtx-4090-entry-ai-server","title":"K-AI 48 Rome 4090 1322TOPS — 2x RTX 4090 Entry AI Server","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 48 Rome 4090 1322TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e48 GB VRAM Entry 2-GPU Server\u003cbr\u003e2x RTX 4090 | EPYC Rome | 1 322 TOPS 
INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e1 322\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e48 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eVRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e2 GPU\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003etensor parallel\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003erack\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eready\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003e48 GB VRAM pool across two RTX 4090 — the cost-floor for 32B-class tensor-parallel inference.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA two-GPU Ada 
workstation-class AI server built on ROMED8-2T \/ EPYC Rome. Two RTX 4090 give a 48 GB pooled VRAM envelope that comfortably runs 32B dense Q6-Q8, Hunyuan-A13B at Q6, Wan 2.1 14B video, and Pixtral 12B vision — the best all-round model selection per Euro the Kentino lineup offers, before stepping up to Blackwell.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x NVIDIA GeForce RTX 4090 24 GB GDDR6X (450 W, PCIe 4.0 x16)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e48 GB (no NVLink — tensor-parallel over PCIe)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7542 Rome (32C\/64T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem 
RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e128 GB DDR4-2666 ECC RDIMM (2x 64 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e1 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSingle 2 kW ATX PSU\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount, passive Gen4 x16 risers\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP3 tower cooler, 3x 120 mm front intake + 1x 120 mm rear exhaust\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550) + IPMI\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 2 x 450 W = 900 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~1 225 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 2 000 W (single 2 kW ATX) — 38.75 % headroom\u003c\/li\u003e\n\u003cli\u003eComfortable single-PSU 
margin\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eROMED8-2T fans out 2x16 directly from the CPU root complex, with no PLX switch. The consumer RTX 4090 has no NVLink, so tensor-parallel traffic runs over PCIe. Both GPUs get PCIe Gen4 x16.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 48 GB of pooled VRAM across 2 cards, this server handles 32B-class dense LLMs at Q6-Q8, MoE flagships, image and video generation, speech AI, and multi-tenant serving.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-32B\u003c\/strong\u003e dense Q6-Q8 (~25-35 tok\/s single-stream on 2x 4090, published reference); \u003cstrong\u003eQwQ-32B\u003c\/strong\u003e Q6; \u003cstrong\u003eQwen3.5-27B\u003c\/strong\u003e Q6-Q8\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-30B-A3B\u003c\/strong\u003e \/ \u003cstrong\u003eQwen3-Coder-30B-A3B\u003c\/strong\u003e bf16 (~60 GB, exceeds the 48 GB pool; use Q6)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHunyuan-A13B\u003c\/strong\u003e Q6 
or fp8 (~48 GB) — 80B\/13B MoE, 256k ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eSeed-OSS-36B\u003c\/strong\u003e Q6 — 512k native ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek-R2\u003c\/strong\u003e 32B sparse MoE bf16 (~64 GB, exceeds the 48 GB pool; prefer Q6 at ~45 GB; ~30-40 tok\/s single-stream at Q4, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eERNIE-4.5-47B-A3B\u003c\/strong\u003e Q4 (~28 GB with headroom) \/ Q6 (~42 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 3.3 70B\u003c\/strong\u003e Q4_K_M (~43 GB) tensor-parallel 2-way — the sweet spot of this class (~14-17 tok\/s single-stream on 2x 4090, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 4 Scout\u003c\/strong\u003e 109B\/17B MoE Q3_K (~51 GB, slightly over the 48 GB pool)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Small 3 \/ Magistral Small \/ Devstral Small 2\u003c\/strong\u003e (24B) bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMixtral 8x7B\u003c\/strong\u003e Q6\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGemma 3 27B\u003c\/strong\u003e bf16; \u003cstrong\u003ePhi-4 14B\u003c\/strong\u003e bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNemotron-Super 49B\u003c\/strong\u003e Q4 (~28 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e OLMo 2 32B; Reka Flash 3 21B bf16; Falcon H1R 7B\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-32B \/ Qwen3-VL-30B-A3B MoE \/ Qwen3-Omni-30B-A3B; InternVL3-38B Q4-Q5; InternVL3.5-38B; DeepSeek-VL2; 
ERNIE-4.5-VL-28B-A3B-Thinking; Llama 3.2 11B Vision bf16; Pixtral 12B bf16; Gemma 3 27B multimodal; PaliGemma 2 28B Q4; MiniCPM-V 2.6 \/ MiniCPM-o 2.6.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] \/ [schnell] fp16 (24 GB) or fp8 (~12 GB) with generous batch (~15-25 seconds per 1024x1024 image at fp8 per card, published reference); FLUX.1 Kontext [dev]; SD 3.5 Large (18 GB fp16); SDXL 1.0 + ControlNet + AnimateDiff; HunyuanImage-2.1 bf16 (~34 GB fits in pool); AuraFlow v0.3 \/ OmniGen v1 \/ Kolors 2.0.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.1 14B T2V\/I2V Q6\/fp8; Wan 2.2 TI2V-5B bf16 single-card; Wan 2.2 T2V-A14B \/ I2V-A14B Q4 (~32 GB); HunyuanVideo 13B Q4-Q5 (~30 GB); HunyuanVideo 1.5 (8.3B) bf16; Open-Sora 2.0 (11B) Q8; CogVideoX-5B \/ 1.5 bf16; Mochi-1 Q4-Q8; LTX-Video 2B; Pyramid Flow 2B.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFull 24 GB tier stack fits with room for concurrent use: Whisper v3 large + Parakeet-TDT + Canary 1B + Moshi + Step-Audio 2 mini + CosyVoice 3.0 + Kokoro 82M + Stable Audio Open all residable simultaneously. 
Whisper v3 turbo runs at ~50x realtime on a single card (published reference).\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e2-4 concurrent users on 32B Q6 class LLMs via vLLM tensor-parallel\u003c\/li\u003e\n\u003cli\u003eMixed workload: Qwen3-32B Q6 (~20 GB) + FLUX.1 fp8 (~12 GB) + Whisper-turbo (1.6 GB) + Moshi (8 GB) resident across 2 cards\u003c\/li\u003e\n\u003cli\u003eLoRA \/ QLoRA fine-tuning of 7-14B models comfortably, 24-32B tight\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eTwo-operator AI workstation with mixed LLM + image + audio stacks\u003c\/li\u003e\n\u003cli\u003e32B-class serving endpoint for small-team developer environment (4-8 concurrent users on Qwen3-32B \/ Gemma 3 27B)\u003c\/li\u003e\n\u003cli\u003eImage generation pipeline (FLUX.1 + SD 3.5 + ControlNet) batch production\u003c\/li\u003e\n\u003cli\u003eVideo-gen development box (Wan 2.1 \/ Wan 2.2 TI2V \/ HunyuanVideo 1.5)\u003c\/li\u003e\n\u003cli\u003eLoRA \/ QLoRA fine-tuning research box for 7-34B Chinese + Western weights\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003ePublished reference | 2x RTX 4090 comparable hardware\u003c\/p\u003e\n\u003ctable 
style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eLlama 3.3 70B Q4_K_M llama.cpp decode\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~14-17 tok\/s single-stream\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eQwen3-32B Q6 vLLM single-stream\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~35-45 tok\/s decode\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eFLUX.1 [dev] fp8\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~2.5-3.0 s per 1024x1024 at 20 steps\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM batch-32 aggregate (extrapolated from 4x4090)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~90 tok\/s aggregate\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003ePublished reference points from comparable 2x4090 hardware. 
Not measured on Kentino hardware.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e70B dense at Q6+ (needs 96 GB pool — step up to 4x RTX 4090 or 4x RTX 5090)\u003c\/li\u003e\n\u003cli\u003eFrontier 100B+ MoE at bf16 (GLM-4.5, Kimi K2, Mistral Large 3)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS configuration, driver install, burn-in testing, and 
functional verification. Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eNVIDIA ConnectX-5 100 GbE MCX555A-ECAT\u003c\/li\u003e\n\u003cli\u003eUpgrade boot drive to 2 TB NVMe\u003c\/li\u003e\n\u003cli\u003eUpgrade RAM to 256 GB (4x 64 GB) — more KV cache headroom for long-ctx MoE\u003c\/li\u003e\n\u003cli\u003eRack PDU (C13\/C19 metered) and 2 kVA online UPS\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52927554584904,"sku":null,"price":11434.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/PXL_20260413_071103100.jpg?v=1776441356"},{"product_id":"k-ai-48-rome-l4-484tops-2x-nvidia-l4-passive-edge-ai-server","title":"K-AI 48 Rome L4 484TOPS — 2x NVIDIA L4 Passive Edge AI Server","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 48 Rome L4 484TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003eSilent 2x L4 Passive Edge Server\u003cbr\u003e48 GB ECC VRAM | EPYC Milan | 484 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv 
style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e484\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e48 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eECC VRAM\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e144 W\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eGPU total\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e24\/7\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003edatacenter\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eSilent 2x L4 passive inference box — datacenter-grade warranty path, 72 W per card, 48 GB ECC VRAM for always-on edge deployment.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 2-GPU edge inference server built around passive NVIDIA L4 cards — the datacenter-class silent option in the Kentino lineup. 48 GB total ECC VRAM, 144 W total GPU draw, single-slot card footprint, and airflow driven entirely by the chassis. 
For branch offices, broadcast facilities, always-on transcription, and any deployment where acoustic profile and a datacenter warranty path matter more than raw tensor throughput.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x NVIDIA L4 24 GB GDDR6 passive (72 W, PCIe 4.0 x16, Ada Lovelace, ECC)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e48 GB ECC\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e128 GB DDR4-2666 ECC RDIMM (2x 64 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd 
style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e1 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSingle 2 kW ATX PSU\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount, passive Gen4 x16 risers\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP3 tower cooler, 3x 120 mm front intake + 1x 120 mm rear exhaust (low-RPM PWM)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550) + IPMI\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 2 x 72 W = 144 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~469 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 2 000 W — 76.55 % headroom\u003c\/li\u003e\n\u003cli\u003eFans stay at low RPM (~35 dBA idle, \u0026lt;45 dBA under sustained inference)\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 
style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003ePCIe Gen4 x16 at both GPUs. L4 is native Gen4 x16; ROMED8-2T fans out 2x16 directly from CPU. No switch, no NVLink. 55-65 C GPU temperature sustained — passive cards rely entirely on chassis airflow.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 48 GB of ECC VRAM across 2 passive L4 cards, this server handles always-on LLM inference, 24\/7 ASR + TTS pipelines, VLM document processing, and edge deployments where silence and datacenter warranty matter.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-32B\u003c\/strong\u003e dense Q6 with 32k ctx (~15-20 tok\/s single-stream on L4, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-30B-A3B\u003c\/strong\u003e \/ \u003cstrong\u003eQwen3-Coder-30B-A3B\u003c\/strong\u003e Q4-Q6 (MoE, 256k ctx)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwQ-32B\u003c\/strong\u003e Q6; \u003cstrong\u003eDeepSeek-R2\u003c\/strong\u003e 32B sparse MoE Q4-Q6 (~18-24 tok\/s single-stream at Q4 on L4, published 
reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHunyuan-A13B\u003c\/strong\u003e Q6 or fp8 (~48 GB) — 80B\/13B MoE, 256k ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eSeed-OSS-36B\u003c\/strong\u003e Q4-Q6 — 512k native ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eERNIE-4.5-47B-A3B\u003c\/strong\u003e Q4-Q6 (~28-42 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 3.3 70B\u003c\/strong\u003e Q4_K_M (~43 GB) tensor-parallel 2-way (~8-12 tok\/s single-stream on 2x L4, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Small 3 \/ Magistral \/ Devstral Small 2\u003c\/strong\u003e (24B) bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGemma 3 27B\u003c\/strong\u003e multimodal bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003ePhi-4 14B\u003c\/strong\u003e \/ \u003cstrong\u003ePhi-4-reasoning\u003c\/strong\u003e bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNemotron-Super 49B\u003c\/strong\u003e Q4 (~28 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOLMo 2 32B\u003c\/strong\u003e \/ \u003cstrong\u003eOLMo 3.1-32B-Think\u003c\/strong\u003e — fully open reasoning research\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-8B \/ 32B Q4-Q6; InternVL3.5-38B Q4; Pixtral 12B bf16 (24 GB); Llama 3.2 11B Vision bf16; Gemma 3 12B \/ 27B multimodal; MiniCPM-V 2.6 \/ MiniCPM-o 2.6; Aya Vision 8B \/ 32B for 23-language VLM.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage 
generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eL4 is inference-tuned — usable for steady-state image pipelines, not batch generation: FLUX.1 [dev] fp8 \/ Q4 — single image in 8-12 s; SD 3.5 Large fp8 \/ SDXL 1.0 \/ SD 3.5 Medium; HunyuanImage-2.1 NF4 (~14 GB); Kolors 2.0 fp8.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eNot recommended for new video projects on L4 — prefer a 4090\/5090 build. For light T2V pipelines: Wan 2.2 TI2V-5B at bf16 — 5 s 720p in ~6-10 minutes; HunyuanVideo 1.5 (8.3B) Wan2GP optimization path.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333;font-weight:700;margin-bottom:8px\"\u003eThe L4's real strength — 24\/7 ASR + TTS + realtime voice stacks.\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo (~30x realtime on L4, published reference); NVIDIA Parakeet-TDT 1.1B; Canary 1B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2.0 \/ Fun-CosyVoice 3.0; Kokoro 82M; Stable Audio Open\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi (7B, 200 ms latency full-duplex); Step-Audio 2 mini \/ R1\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTranslation:\u003c\/strong\u003e Meta SeamlessM4T v2 (~100 languages)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eWhisper v3 + Kokoro + Moshi + Qwen3-14B Q6 all resident on card 1 (~18-20 
GB); card 2 reserved for a second tenant or a VLM\u003c\/li\u003e\n\u003cli\u003e8-16 concurrent ASR sessions on a single L4 at Whisper-turbo real-time\u003c\/li\u003e\n\u003cli\u003eRAG endpoint: Qwen3-14B \/ Llama 3.1 8B (~48-72 tok\/s single-stream on L4, published reference) + BGE-M3 embeddings + reranker\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eBranch office or broadcast facility silent inference box\u003c\/li\u003e\n\u003cli\u003eAlways-on ASR + translation pipeline (call centers, lecture transcription, media captioning)\u003c\/li\u003e\n\u003cli\u003eEdge RAG endpoint over corporate documents with datacenter warranty path\u003c\/li\u003e\n\u003cli\u003e24\/7 multimodal assistant (Qwen3-VL-8B + MiniCPM-o 2.6) for a small office\u003c\/li\u003e\n\u003cli\u003eDevelopment staging box for datacenter-class deployments — same L4 silicon as hyperscale edge\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003ePublished reference | 2x NVIDIA L4 comparable hardware\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 
12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eLlama 3.1 8B Q4_K_M llama.cpp decode\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~30-40 tok\/s single-stream\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eQwen3-14B Q6 vLLM decode\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~20-28 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eWhisper v3 large realtime factor\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~15-20x per L4\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eParakeet-TDT 1.1B English ASR\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~40-60x real-time\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eMoshi 7B full-duplex voice\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e200 ms latency, fits on single L4\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003ePublished, not measured on Kentino hardware.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e70B dense at Q6+ (even 48 GB pool is tight — use 4x4090 or 2x5090)\u003c\/li\u003e\n\u003cli\u003eImage \/ 
video generation batch work at scale (L4 tensor throughput is inference-tuned)\u003c\/li\u003e\n\u003cli\u003eLoRA \/ fine-tuning workflows — use 4090\/5090 builds instead\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eL4 carries NVIDIA datacenter warranty path — meaningful advantage over consumer cards for 24\/7 SLA deployment. 
Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eUpgrade to K-AI 96 Rome L4 968TOPS (4x L4, 96 GB pool) for doubled throughput\u003c\/li\u003e\n\u003cli\u003eUpgrade boot drive to 2 TB NVMe\u003c\/li\u003e\n\u003cli\u003eUpgrade RAM to 256 GB (4x 64 GB) for multi-model concurrent serving\u003c\/li\u003e\n\u003cli\u003eRack PDU + 2 kVA online UPS for branch deployment\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52927599608136,"sku":null,"price":11374.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4u-rack-front-fans.jpg?v=1776848931"},{"product_id":"k-ai-64-rome-5080-3600tops-4x-rtx-5080-budget-ai-server","title":"K-AI 64 Rome 5080 3600TOPS — 4x RTX 5080 Budget AI Server","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 64 Rome 5080 3600TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003eBudget 4-GPU Blackwell Server\u003cbr\u003e4x RTX 5080 | EPYC Milan | 3 600 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 
16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e3 600\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e64 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eVRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e4 GPU\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eBlackwell\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003erack\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eready\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eKentino's budget 4-GPU Blackwell server — 64 GB VRAM pool, 3 600 aggregate TOPS INT8, lowest CZK-per-TOPS in the lineup.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 4-GPU Blackwell inference server built around the RTX 5080 — 360 W per card, PCIe 5 silicon, 16 GB GDDR7 each. 
Four cards deliver a 64 GB pooled VRAM envelope and 3 600 INT8 TOPS aggregate at the best CZK-per-TOPS point Kentino offers. The entry into multi-GPU Blackwell inference: ideal for embedding clusters, 7-13B model serving at scale, image \/ video batch generation, and 70B Q4 tensor-parallel.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4x NVIDIA GeForce RTX 5080 16 GB GDDR7 (360 W, PCIe 5.0 x16)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e64 GB\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e256 
GB DDR4-2666 ECC RDIMM (4x 64 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSingle 2 kW ATX PSU\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount, 4x GPU, passive Gen4 x16 risers, front-to-back directed airflow\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP3 tower cooler, 3x 120 mm front intake + 1x 120 mm rear exhaust (industrial fans)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550) + IPMI\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 4 x 360 W = 1 440 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~1 765 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 2 000 W (single 2 kW ATX) — 11.75 % headroom\u003c\/li\u003e\n\u003cli\u003eAbove the 10 % floor but tighter than other 4-GPU builds; dual-PSU upgrade recommended 
for high-duty workloads\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eROMED8-2T fans out 4x16 Gen4 from CPU root complex. 5080 is PCIe Gen5 silicon running Gen4 x16 without bandwidth bottleneck for inference. No PCIe switch. No NVLink — tensor parallel over PCIe.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 64 GB of pooled VRAM across 4 Blackwell cards, this server handles 70B Q4 tensor-parallel, embedding clusters at scale, image and video batch pipelines, and 7-13B multi-tenant serving for 64-128 concurrent users.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-32B\u003c\/strong\u003e Q8 (dense at near-fp16 quality); \u003cstrong\u003eQwen3.5-27B\u003c\/strong\u003e bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-30B-A3B\u003c\/strong\u003e \/ \u003cstrong\u003eQwen3-Coder-30B-A3B\u003c\/strong\u003e bf16 (~60 GB fits tight)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3.5-122B-A10B\u003c\/strong\u003e Q4 
(~70-75 GB — tight, spill to DDR4 RAM)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHunyuan-A13B\u003c\/strong\u003e fp8 (~80 GB native — tight, prefer Q6)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eSeed-OSS-36B\u003c\/strong\u003e bf16 (~72 GB tight)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek-R2\u003c\/strong\u003e 32B sparse MoE bf16 (~64 GB) (~45-60 tok\/s single-stream at Q4 on Blackwell, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM-4.5-Air\u003c\/strong\u003e 106B\/12B Q3_K (~55 GB) — tight KV headroom\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eERNIE-4.5-47B-A3B\u003c\/strong\u003e Q4 (~28 GB with headroom for second model)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 3.3 70B\u003c\/strong\u003e Q4_K_M (~43 GB) — the sweet spot for this pool (~15-20 tok\/s single-stream TP-4 on 4x 5080, estimated)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHermes 3 70B \/ Tulu 3 70B\u003c\/strong\u003e Q4 — open Llama derivatives with full post-training transparency\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Small 3 \/ Magistral \/ Devstral Small 2\u003c\/strong\u003e 24B bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGemma 3 27B\u003c\/strong\u003e bf16 multimodal\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003ePhi-4 14B\u003c\/strong\u003e \/ \u003cstrong\u003eNemotron-Super 49B\u003c\/strong\u003e Q6-Q8\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003egpt-oss-20b\u003c\/strong\u003e MXFP4 (16 GB — 4 instances on 4 cards for parallel tenants); \u003cstrong\u003egpt-oss-120b\u003c\/strong\u003e MXFP4 (80 GB — tight; spill manageable)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 
style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-32B \/ Qwen3-VL-30B-A3B \/ Qwen3-Omni-30B-A3B; InternVL3.5-38B Q6-Q8; Llama 3.2 90B Vision Q4 (~52 GB tight); Pixtral 12B \/ Pixtral Large 124B Q2-Q3; Gemma 3 27B multimodal bf16; PaliGemma 2 28B bf16; Molmo 72B Q4 (~45 GB); Aya Vision 32B bf16.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] \/ [schnell] fp16 — batch-4 parallel (~10-15 seconds per 1024x1024 image at fp8 on Blackwell, published reference); FLUX.1 Kontext [dev] — in-context editing across 4 tenants; SD 3.5 Large (18 GB fp16) — 4 parallel generators; SDXL 1.0 + ControlNet + AnimateDiff stacks x 4; HunyuanImage-2.1 bf16 per-card; AuraFlow v0.3 \/ OmniGen v1 \/ Kolors 2.0 \/ PixArt-Sigma.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 TI2V-5B bf16 on a single card — 4 parallel tenants; Wan 2.1 14B T2V\/I2V Q4-Q6 per card; HunyuanVideo 13B Q4 (~30 GB) tensor-parallel 2-way; HunyuanVideo 1.5 (8.3B) bf16 per card; Open-Sora 2.0 (11B) Q8 per card — 4 parallel generations; CogVideoX-5B int8; Mochi-1 Q4 per card.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFull Western and Chinese audio stack fits per card: Whisper v3 + Parakeet + Canary + Moshi + Step-Audio 2 \/ R1 + CosyVoice 3.0 + Kokoro + Stable Audio Open + MusicGen + AudioGen + SeamlessM4T v2. With 4 cards, each card can host a dedicated speech tenant. 
Whisper v3 turbo runs at ~50x realtime per card (published reference).\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333;font-weight:700;margin-bottom:8px\"\u003eThe target use case. 16 GB per card rewards partitioned workloads:\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eEmbedding cluster:\u003c\/strong\u003e BGE-M3 \/ Nomic \/ Jina-embed \/ E5 \/ Cohere Embed v3 — 4 tenants at high RPS\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003e7-13B serving at scale:\u003c\/strong\u003e 16-32 concurrent users per card via vLLM \/ SGLang; 64-128 concurrent total\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMixed pipeline:\u003c\/strong\u003e Card 1 = Qwen3-14B + reranker; Card 2 = Whisper + Moshi; Card 3 = FLUX.1; Card 4 = Wan 2.2 TI2V\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003e4-way tensor-parallel for 70B Q4\u003c\/strong\u003e — Llama 3.3 70B AWQ INT4 across 4 cards, ~90-130 tok\/s batch aggregate (extrapolated from gf-logic 4x4090 bench)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eBudget multi-GPU AI serving platform for a startup or lab on a capex floor\u003c\/li\u003e\n\u003cli\u003eEmbedding + RAG infrastructure at 4-way horizontal scale\u003c\/li\u003e\n\u003cli\u003eImage \/ video generation batch farm (Stable Diffusion \/ FLUX \/ Wan 2.2)\u003c\/li\u003e\n\u003cli\u003e7-13B small-model serving at scale — 4 independent tenants or 64-128 concurrent pooled\u003c\/li\u003e\n\u003cli\u003eDevelopment staging box for 70B Q4 tensor-parallel workflows\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 
style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003eKentino measured (4x4090 reference) + published 5080 estimates\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003e4x4090 reference: sustained fp16\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e647 TFLOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003e4x4090 reference: vLLM Llama 3.3 70B AWQ (batch-32)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e179.3 tok\/s aggregate\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003e4x4090 reference: llama.cpp 70B Q4_K_M (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e20.3 tok\/s decode\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003e5080 estimated: Llama 3.3 70B Q4 TP-4 single\u003c\/td\u003e\n\u003ctd style=\"padding:10px 
12px;color:#fab400;font-weight:700\"\u003e~15-20 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003e5080 estimated: FLUX.1 fp8 per card\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~2.2-2.8 s per 1024x1024 at 20 steps\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003e5080 INT8 tensor throughput is ~1.35x that of the 4090; single-stream decode is memory-bandwidth-bound (5080 GDDR7 ~960 GB\/s vs 4090 ~1 008 GB\/s, roughly parity), so single-stream decode tracks the 4x4090 reference rather than the TOPS gain.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e70B dense at Q6+ (16 GB per card limits the per-card footprint; the 64 GB pool is tight for Q6)\u003c\/li\u003e\n\u003cli\u003eLong-context MoE flagships (Qwen3-235B, GLM-4.5) — insufficient VRAM even at Q2\u003c\/li\u003e\n\u003cli\u003eSingle-stream latency-sensitive work on very large models (TP overhead eats into 16 GB cards)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv 
style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability and is confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eUpgrade PSU to dual 2 kW ATX synced (4 000 W total; headroom rises to ~55.9 % at the ~1 765 W full load)\u003c\/li\u003e\n\u003cli\u003eNVIDIA ConnectX-5 100 GbE MCX555A-ECAT\u003c\/li\u003e\n\u003cli\u003eUpgrade boot drive to 4 TB NVMe\u003c\/li\u003e\n\u003cli\u003eUpgrade RAM to 384 GB (6x 64 GB) — better multi-model concurrent headroom\u003c\/li\u003e\n\u003cli\u003eRack PDU (C13\/C19 metered) and 3 kVA online UPS\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52927613534536,"sku":null,"price":11940.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/PXL_20260413_071103100.jpg?v=1776441356"},{"product_id":"k-ai-64-rome-5090-3352tops-2x-rtx-5090-entry-blackwell-ai-server","title":"K-AI 64 Rome 5090 3352TOPS — 2x RTX 5090 Entry Blackwell AI Server","description":"\u003cdiv 
style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 64 Rome 5090 3352TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003eEntry Blackwell 2-GPU Server\u003cbr\u003e2x RTX 5090 | EPYC Milan | 3 352 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e3 352\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e64 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eVRAM GDDR7\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003efp8\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003enative tensor\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 
16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003erack\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eready\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eEntry Blackwell 2-GPU server — 64 GB pooled VRAM, 3 352 INT8 TOPS, native fp8. The Ada-to-Blackwell step-up from 2x4090.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA two-GPU Blackwell AI server built on ROMED8-2T \/ EPYC Milan. Two RTX 5090 cards deliver a 64 GB pooled VRAM envelope with native fp8 tensor math: roughly 2.5x the raw TOPS of 2x RTX 4090 in the same chassis footprint. It is the first 2-GPU tier that comfortably runs Llama 3.3 70B Q4 (~43 GB); Qwen3.5-122B-A10B Q4 (~70-75 GB) runs with spill to system RAM, and HunyuanVideo 13B fits at Q4-Q5 (~30 GB), with fp8 tight.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x NVIDIA GeForce RTX 5090 32 GB GDDR7 (575 W, PCIe 5.0 x16, Blackwell)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e64 GB\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr 
style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e128 GB DDR4-2666 ECC RDIMM (2x 64 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e1 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSingle 2 kW ATX PSU\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount, passive Gen4 x16 risers\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP3 tower cooler, 3x 120 mm front intake + 1x 120 mm rear exhaust (industrial fans)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550) + IPMI\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv 
style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 2 x 575 W = 1 150 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~1 475 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 2 000 W (single 2 kW ATX) — 26.25 % headroom\u003c\/li\u003e\n\u003cli\u003eWorkable single-PSU margin; dual-PSU upgrade available for extra headroom\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eROMED8-2T fans out 2x16 Gen4 from CPU root complex. 5090 is Gen5 silicon running Gen4 x16 without bandwidth penalty for inference. No PCIe switch. 
No NVLink on GeForce 5090 — tensor-parallel 2-way P2P uses PCIe.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 64 GB of pooled GDDR7 VRAM across 2 Blackwell cards, this server handles 70B Q4 tensor-parallel, MoE flagships, native fp8 image generation, video AI, and multi-model concurrent serving.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-32B\u003c\/strong\u003e Q8 \/ bf16 (near-fp16 quality) (~40-55 tok\/s single-stream on Blackwell fp8, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwQ-32B\u003c\/strong\u003e bf16; \u003cstrong\u003eQwen3-30B-A3B \/ Coder-30B-A3B\u003c\/strong\u003e bf16 (~60 GB fits)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3.5-122B-A10B\u003c\/strong\u003e Q4 (~70-75 GB, exceeds the 64 GB pool; this MoE flagship runs at Q4 with spill to system RAM)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHunyuan-A13B\u003c\/strong\u003e fp8 (~80 GB tight) or Q6 (~36 GB comfortable)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eSeed-OSS-36B\u003c\/strong\u003e bf16 (~72 GB tight — prefer fp8 ~36 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek-R2\u003c\/strong\u003e 32B sparse MoE 
bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM-4.5-Air\u003c\/strong\u003e 106B\/12B Q4_K_M (~60 GB) — MoE with headroom\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eERNIE-4.5-47B-A3B\u003c\/strong\u003e Q6-Q8\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 3.3 70B\u003c\/strong\u003e Q4_K_M (~43 GB) — the headline workload for this tier (~20-28 tok\/s single-stream on 2x 5090, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHermes 3 70B \/ Tulu 3 70B\u003c\/strong\u003e Q4 — open post-training Llama derivatives\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Small 3 \/ Magistral \/ Devstral Small 2\u003c\/strong\u003e 24B bf16; \u003cstrong\u003eMixtral 8x7B\u003c\/strong\u003e bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGemma 3 27B\u003c\/strong\u003e multimodal bf16 + reasoning headroom\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003ePhi-4 14B\u003c\/strong\u003e bf16; \u003cstrong\u003eNemotron-Super 49B\u003c\/strong\u003e Q6-Q8\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003egpt-oss-20b\u003c\/strong\u003e MXFP4 (16 GB) + \u003cstrong\u003egpt-oss-120b\u003c\/strong\u003e MXFP4 (80 GB — fits tight with short ctx)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOLMo 2 32B\u003c\/strong\u003e \/ \u003cstrong\u003eOLMo 3.1-32B-Think\u003c\/strong\u003e bf16\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-32B \/ Qwen3-VL-30B-A3B \/ Qwen3-Omni-30B-A3B bf16; InternVL3.5-38B bf16; Llama 3.2 90B Vision Q4 (~52 GB); Pixtral 12B bf16; Pixtral Large 124B Q3 (~58 GB tight); 
Gemma 3 27B multimodal bf16; PaliGemma 2 28B bf16; Molmo 72B Q4 (~45 GB).\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003e5090 native fp8 is the speed story — FLUX.1 \/ SD 3.5 \/ HunyuanImage run materially faster than on Ada: FLUX.1 [dev] \/ [schnell] fp8 native (~12 GB) with 2x parallel across cards (~8-12 seconds per 1024x1024 image on Blackwell, published reference); FLUX.1 Kontext [dev]; SD 3.5 Large (18 GB fp16 or 11 GB fp8); SDXL 1.0; HunyuanImage-2.1 bf16 (~34 GB); HunyuanImage-3.0 NF4; AuraFlow v0.3 \/ OmniGen v1 \/ Kolors 2.0.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B \/ I2V-A14B bf16 (~54 GB total) — MoE two-expert at full precision; Wan 2.2 TI2V-5B bf16 per-card, 2 parallel tenants; HunyuanVideo 13B Q4-Q5 (~30 GB), fp8 tight; HunyuanVideo 1.5 (8.3B) bf16 per-card; Open-Sora 2.0 (11B) bf16; CogVideoX-5B \/ 1.5 bf16; Mochi-1 bf16 (~42 GB fits); LTX-Video 2B; NVIDIA Cosmos Predict 2.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eSame full Chinese + Western speech stack as the 4090 tier fits with more headroom: Whisper v3 + Parakeet + Canary + Moshi + Step-Audio 2 \/ R1 + CosyVoice 3.0 + Kokoro + Stable Audio Open + MusicGen + AudioGen + SeamlessM4T v2 + MMS. On fp8-native 5090, Whisper \/ Parakeet decode at materially higher real-time factor. 
Whisper v3 turbo runs at ~75x realtime on Blackwell (published reference).\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eResident stack: Llama 3.3 70B Q4 (~43 GB tensor-parallel 2-way) + FLUX.1 fp8 (~12 GB) + Whisper-turbo + Moshi\u003c\/li\u003e\n\u003cli\u003e2-4 concurrent tenants on 32B class at Q6-Q8 per card\u003c\/li\u003e\n\u003cli\u003eLoRA \/ QLoRA fine-tuning of 7-14B comfortable, 24-32B tight\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eSmall-team developer workstation with 70B Q4 serving headroom\u003c\/li\u003e\n\u003cli\u003eBlackwell step-up from a 2x RTX 4090 box — same chassis, ~2.5x TOPS, fp8 native\u003c\/li\u003e\n\u003cli\u003eImage \/ video generation workstation with FLUX native fp8 speedup\u003c\/li\u003e\n\u003cli\u003eMulti-model concurrent box: 70B Q4 + FLUX + Whisper + Moshi resident simultaneously\u003c\/li\u003e\n\u003cli\u003e4-8 concurrent user inference endpoint for 32B class LLMs\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003ePublished reference | 2x RTX 5090 comparable hardware\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr 
style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eLlama 3.3 70B Q4_K_M llama.cpp decode\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~20-28 tok\/s single-stream\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eQwen3-32B Q8 vLLM single-stream\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~45-60 tok\/s decode at fp8\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eFLUX.1 [dev] fp8 native Blackwell\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1.5-1.9 s per 1024x1024 at 20 steps\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eHunyuanVideo 13B Q5 TP-2\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e5 s 720p in ~5-7 min\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003ePublished, not measured on Kentino hardware. 
Kentino measured reference on 4x RTX 4090: 647 TFLOPS fp16, 179 tok\/s batch-32 aggregate.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e100B+ models at bf16 (DeepSeek-V3, Kimi K2, Mistral Large 3), which need a 256+ GB pool\u003c\/li\u003e\n\u003cli\u003eFrontier video generation at bf16 long-form full-resolution\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes
assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eNVIDIA ConnectX-5 100 GbE MCX555A-ECAT\u003c\/li\u003e\n\u003cli\u003eUpgrade boot drive to 2 TB NVMe — or 4 TB\u003c\/li\u003e\n\u003cli\u003eUpgrade RAM to 256 GB (4x 64 GB) — MoE KV cache headroom \/ multi-model concurrent serving\u003c\/li\u003e\n\u003cli\u003eRack PDU (C13\/C19 metered) and 3 kVA online UPS\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52927642108232,"sku":null,"price":11653.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/PXL_20260413_071103100.jpg?v=1776441356"},{"product_id":"k-ai-96-rome-l4-968tops-4x-nvidia-l4-passive-edge-ai-server","title":"K-AI 96 Rome L4 968TOPS — 4x NVIDIA L4 Passive Edge AI Server","description":"","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52927651905864,"sku":null,"price":19948.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-96-rome-l40-724tops-2x-nvidia-l40-ecc-production-inference-server","title":"K-AI 96 Rome L40 724TOPS — 2x NVIDIA L40 ECC Production Inference Server","description":"\u003cdiv
style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 96 Rome L40 724TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e2x L40 ECC Production Server\u003cbr\u003e96 GB ECC VRAM | EPYC Milan | 724 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e724\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e96 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eECC VRAM\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eECC\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003edatacenter grade\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 
16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e24\/7\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eproduction\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eEntry enterprise ECC 24\/7 box — 2x L40 passive, 96 GB ECC VRAM pool, datacenter-grade alternative to the 4090 tier for regulated deployments.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA two-GPU production-class inference server built on ROMED8-2T \/ EPYC Milan with two passive NVIDIA L40 cards. 96 GB ECC GDDR6 pool at the same VRAM envelope as the 4x RTX 4090 workhorse, but with full datacenter certification, ECC memory on every card, and a thermal design built for 24\/7 duty cycle. The right call where RTX 4090 would raise warranty, reliability or compliance concerns — finance, healthcare, formal verification, and any sustained-production LLM \/ VLM serving.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x NVIDIA L40 48 GB GDDR6 ECC (Ada Lovelace, passive, 300 W, dual-slot, PCIe 4.0 x16)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd 
style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e96 GB ECC (no NVLink)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e256 GB DDR4-2666 ECC RDIMM (4x 64 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e1 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSingle 2 kW ATX PSU\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount, passive Gen4 x16 risers\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP3 tower cooler (Arctic Freezer 4U-M), 3x 120 mm front intake + 1x 120 mm rear exhaust\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd 
style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550) + IPMI\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 2 x 300 W = 600 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~925 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 2 000 W — 53.8 % headroom\u003c\/li\u003e\n\u003cli\u003eComfortable single-PSU margin, quiet operation\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003ePCIe Gen4 x16 at both GPUs (L40 is native Gen4 x16). 16 lanes direct from CPU root complex — no PCIe switch. NVLink not present on L40 — inter-GPU comms via PCIe P2P. 
864 GB\/s memory bandwidth per card.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 96 GB of ECC VRAM across 2 passive L40 cards, this server handles enterprise 24\/7 LLM serving, regulated deployments, image and video generation, and multi-tenant inference where ECC reliability and datacenter warranty matter.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-32B\u003c\/strong\u003e bf16 single-GPU on one L40 with 32k ctx headroom (~18-22 tok\/s single-stream on L40, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3.5-27B\u003c\/strong\u003e bf16; \u003cstrong\u003eQwen3-30B-A3B\u003c\/strong\u003e \/ \u003cstrong\u003eQwen3-Coder-30B-A3B\u003c\/strong\u003e bf16 (~60 GB) 256k ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3.5-122B-A10B\u003c\/strong\u003e Q4 (~70 GB) — MoE flagship, long ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwQ-32B\u003c\/strong\u003e bf16; \u003cstrong\u003eHunyuan-A13B\u003c\/strong\u003e Q6 (~48 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek-R2\u003c\/strong\u003e 32B sparse MoE bf16 — single-GPU capable, two parallel streams\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM-4.5-Air\u003c\/strong\u003e 
106B\/12B Q4-Q5 (60-70 GB comfortable)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eSeed-OSS-36B\u003c\/strong\u003e bf16 — 512k native ctx; \u003cstrong\u003eERNIE-4.5-47B-A3B\u003c\/strong\u003e Q6-Q8\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eBaichuan-M2-32B\u003c\/strong\u003e bf16 (medical reasoning — ECC advantage here)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 3.3 70B\u003c\/strong\u003e Q6 (~58 GB) with KV headroom; Q4_K_M (~43 GB) very long ctx (~15-18 tok\/s single-stream on 2x L40, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHermes 3 70B \/ Tulu 3 70B\u003c\/strong\u003e Q4-Q6; \u003cstrong\u003eLlama 4 Scout\u003c\/strong\u003e 109B\/17B MoE Q4 (~63 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Small 3 \/ Magistral Small 1.2 \/ Devstral Small 2\u003c\/strong\u003e (24B) bf16; \u003cstrong\u003eMixtral 8x22B\u003c\/strong\u003e Q3-Q4\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003egpt-oss-120b\u003c\/strong\u003e MXFP4 (~80 GB) with KV room\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGemma 3 27B\u003c\/strong\u003e multimodal bf16 with 128k ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003ePhi-4 14B\u003c\/strong\u003e \/ \u003cstrong\u003ePhi-4-reasoning\u003c\/strong\u003e \/ \u003cstrong\u003ePhi-4-multimodal\u003c\/strong\u003e bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNemotron-Super 49B\u003c\/strong\u003e Q6-Q8; \u003cstrong\u003eIBM Granite 4.0 H-Small\u003c\/strong\u003e 32B\/9B — enterprise compliance\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eReka Flash 3\u003c\/strong\u003e 21B bf16; \u003cstrong\u003eOLMo 2 32B\u003c\/strong\u003e \/ \u003cstrong\u003eOLMo 
3.1-32B-Think\u003c\/strong\u003e bf16\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-8B \/ 32B, Qwen3-VL-30B-A3B MoE, Qwen3-Omni-30B-A3B; InternVL3 up to 78B Q4 (~48 GB); InternVL3.5-38B bf16; DeepSeek-VL2; ERNIE-4.5-VL-28B-A3B-Thinking; Llama 3.2 11B Vision bf16; Pixtral 12B bf16; Gemma 3 12B \/ 27B multimodal; PaliGemma 2 (3\/10\/28B); MiniCPM-V 2.6 \/ MiniCPM-o 2.6; GLM-4.6V-Flash; Molmo 72B Q4; Aya Vision 32B.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eL40 has Ada tensor cores and 864 GB\/s memory bandwidth per card — solid for production image pipelines: FLUX.1 [dev] \/ [schnell] fp16 (~24 GB) or fp8 (~12 GB) (~15-25 seconds per 1024x1024 image at fp8, published reference); FLUX.1 Kontext [dev]; FLUX Tools (Fill \/ Depth \/ Canny \/ Redux); SD 3.5 Large (18 GB fp16 \/ 11 GB fp8); SDXL 1.0 + ControlNet + AnimateDiff; HunyuanImage-2.1 bf16 (~34 GB); Kolors 2.0; AuraFlow v0.3; OmniGen v1; PixArt-Sigma.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eHunyuanVideo 13B bf16 fits on one L40 at 720p short clip; Wan 2.2 T2V-A14B \/ I2V-A14B bf16 (~54 GB) tensor-parallel 2-way; Wan 2.2 TI2V-5B bf16 per card; Wan 2.1 14B fp8 \/ bf16; HunyuanVideo 1.5 (8.3B) bf16; Open-Sora 2.0 (11B) bf16; CogVideoX-5B \/ 1.5 bf16; Mochi-1 bf16 (~42 GB); LTX-Video 2B; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos Predict 2.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul 
style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo (~50x realtime on single GPU, published reference); Parakeet-TDT 1.1B; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2 \/ Fun-CosyVoice 3.0; Kokoro 82M; Stable Audio Open; Coqui XTTS v2; StyleTTS 2; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi (200 ms latency full-duplex); Step-Audio 2 mini \/ R1 \/ R1.1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX \/ translation:\u003c\/strong\u003e MusicGen; AudioGen; Suno Bark; SeamlessM4T v2; MMS\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e4-8 concurrent users on 32-70B class LLMs via vLLM tensor-parallel or per-card partition\u003c\/li\u003e\n\u003cli\u003eMixed stack: Qwen3-32B + FLUX.1 + Whisper-turbo + Moshi resident with partitioned VRAM\u003c\/li\u003e\n\u003cli\u003eLoRA inference + light fine-tuning of 7-14B; full-param possible on smaller models\u003c\/li\u003e\n\u003cli\u003eRAG pipelines with Command R \/ Qwen3 + BGE-M3 \/ E5 \/ Jina embeddings\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eEnterprise 24\/7 LLM serving — 70B Q4-Q6, Qwen3-32B bf16, Mistral Small 3 bf16\u003c\/li\u003e\n\u003cli\u003eRegulated deployment requiring ECC memory (finance, healthcare, formal verification)\u003c\/li\u003e\n\u003cli\u003eLong-context serving — Seed-OSS-36B 512k 
ctx fits comfortably on the 96 GB pool\u003c\/li\u003e\n\u003cli\u003eMid-tier MoE serving — Hunyuan-A13B Q6, GLM-4.5-Air Q4, Qwen3-30B-A3B bf16\u003c\/li\u003e\n\u003cli\u003eVLM document processing — InternVL3.5-38B, Pixtral 12B bf16, Qwen3-VL-32B\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003ePublished reference | 2x NVIDIA L40 comparable hardware\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eLlama 3.3 70B Q4_K_M across 2x L40 tensor-split\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~15-18 tok\/s single-stream\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eQwen3-32B bf16 single-GPU on one L40\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~18-22 tok\/s single-stream\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM Hunyuan-A13B Q6 on 2x L40 pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 
12px;color:#fab400;font-weight:700\"\u003e~28-34 tok\/s single-stream\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eHunyuanVideo 13B bf16 on one L40\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e720p short clip — fits in 48 GB\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ePer-card metrics\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e362 TOPS INT8, 864 GB\/s, 300 W TDP\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003ePublished, not measured on Kentino hardware.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eCost-per-TFLOPS optimization — 4x RTX 4090 gives 2 644 aggregate TOPS at ~40 % of the component cost (without ECC \/ datacenter warranty)\u003c\/li\u003e\n\u003cli\u003eFrontier 200B+ dense models — 96 GB pool ceiling applies (need 192+ GB SKU)\u003c\/li\u003e\n\u003cli\u003eVideo generation at bf16 long-form full-resolution (Wan 2.2 MoE two-expert wants more VRAM)\u003c\/li\u003e\n\u003cli\u003eTraining from scratch — L40 is inference-certified; use RTX Pro 6000 \/ workstation Blackwell for training\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv 
style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eNVIDIA OEM 3-year datacenter warranty on L40 + Kentino integration warranty (2 years parts, 1 year labor). 
Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eUpgrade to 4x L40 (K-AI 192 Rome L40 1448TOPS) for 192 GB ECC pool and frontier-tier serving\u003c\/li\u003e\n\u003cli\u003eUpgrade RAM to 512 GB (add 4x 64 GB DDR4) for larger embedding \/ reranker stacks\u003c\/li\u003e\n\u003cli\u003eUpgrade NVMe to 4 TB for model library + dataset staging\u003c\/li\u003e\n\u003cli\u003eRedundant PSU upsell (dual 2 kW synced) available on request\u003c\/li\u003e\n\u003cli\u003eRack PDU + 3 kVA online UPS for production colo\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52928513704264,"sku":null,"price":23144.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-96-rome-rtxpro6000-2000tops-single-card-96-gb-blackwell-workstation-server","title":"K-AI 96 Rome RTXPro6000 2000TOPS — Single-Card 96 GB Blackwell Workstation Server","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 96 Rome RTXPro6000 2000TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e96 GB ECC Single-Card Workstation Server\u003cbr\u003e1x RTX Pro 6000 Blackwell | EPYC Milan | 2 000 
TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e2 000\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eINT8 TOPS\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e96 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eECC VRAM\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003esingle\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003ecard design\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003efp8\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003enative Blackwell\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eOne card, 96 GB ECC VRAM, the entire Blackwell tensor pipeline. 
70B dense bf16 on a single GPU — no tensor-parallel overhead.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 4U rack-mount workstation server with a single NVIDIA RTX Pro 6000 Blackwell Workstation card (96 GB ECC GDDR7), one AMD EPYC 7643 Milan CPU (48C\/96T), 256 GB DDR4 ECC, 2 TB NVMe boot, and one 2 kW ATX PSU with 54 % headroom. The simplest software path Kentino ships — no tensor-parallel config, no multi-GPU debugging. vLLM, SGLang, llama.cpp, ComfyUI run single-device and just work.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e1x NVIDIA RTX Pro 6000 Blackwell Workstation 96 GB ECC GDDR7 (600 W, PCIe 5.0 x16)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e96 GB ECC on a single card — no pooling, no tensor-parallel overhead\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 
16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e256 GB DDR4-2666 ECC RDIMM (4x 64 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e1x 2 kW ATX PSU\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount (4-slot capacity, 1 populated — room to expand)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eArctic Freezer 4U-M SP3 tower + 3x 120 mm front intake + 1x 120 mm rear exhaust\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul 
style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 1 x 600 W = 600 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~925 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 2 000 W — 53.8 % headroom\u003c\/li\u003e\n\u003cli\u003eSingle PSU, simple cabling — generous margin for single-card build\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003ePCIe Gen4 x16 at the GPU (card is Gen5 native; Rome board caps at Gen4). Direct root-complex connection — no PCIe switch. No NVLink required — single card, no inter-GPU link at all. Six x16 slots remain open for NIC \/ storage \/ expansion.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 96 GB of ECC VRAM on a single Blackwell card, this server handles 70B dense bf16 on one GPU, open-weight LLMs, vision models, image and video generation, speech AI, and production inference — no tensor-parallel coordination needed.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul 
style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3 \/ Qwen3.5 (Alibaba):\u003c\/strong\u003e Qwen3-32B dense bf16 (~65 GB) with generous KV; Qwen3-72B Q6 (~58 GB, ~25-35 tok\/s single-stream); Qwen3-30B-A3B MoE bf16; Qwen3-Coder-30B-A3B agentic at 256k ctx; Qwen3.5-122B-A10B Q4 (~70 GB) with tight KV; QwQ-32B bf16 reasoning\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek:\u003c\/strong\u003e DeepSeek-R2 32B sparse MoE bf16 (~64 GB, 92.7 % AIME 2025 single-card); DeepSeek-R1-Distill-Qwen-32B bf16; DeepSeek-V2-Lite 16B full precision\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM \/ Z.ai:\u003c\/strong\u003e GLM-4.5-Air 106B\/12B Q4-Q5 (60-70 GB); GLM-4.6V 106B Q4\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTencent Hunyuan:\u003c\/strong\u003e Hunyuan-A13B 80B\/13B MoE Q4-fp8 (~48-80 GB) with 256k ctx and dual-mode reasoning\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eByteDance Seed-OSS-36B\u003c\/strong\u003e bf16 (~72 GB tight) or fp8 (~36 GB) with full 512k native context\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eBaidu ERNIE-4.5-47B-A3B\u003c\/strong\u003e Q4-fp8 with long context\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMeta Llama:\u003c\/strong\u003e Llama 3.3 70B at bf16 (~70 GB) on a single card with 8-16k ctx — the hero config; Llama 3.3 70B Q6 (~58 GB, ~35-50 tok\/s single-stream); Llama 3.1 8B bf16 (~80-120 tok\/s); Llama 3.2 90B Vision Q4 (~52 GB); Llama 4 Scout 109B\/17B MoE Q4 (~63 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral:\u003c\/strong\u003e Mistral Small 3 \/ Magistral Small 1.2 \/ Devstral Small 2 (24B) all at bf16 with 256k ctx; Mixtral 8x7B Q6; Codestral Mamba 7B; Pixtral 12B 
bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOpenAI (open weights):\u003c\/strong\u003e gpt-oss-20b MXFP4 native (16 GB); gpt-oss-120b MXFP4 native (80 GB) — single-card single-stream\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGoogle Gemma 3:\u003c\/strong\u003e 27B multimodal bf16 (~54 GB) with 128k ctx; 12B \/ 4B bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMicrosoft Phi-4\u003c\/strong\u003e 14B dense bf16; Phi-4-reasoning; Phi-4-multimodal\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNVIDIA Nemotron:\u003c\/strong\u003e Llama-3.1-Nemotron-Super 49B Q6 (~40 GB); Nemotron-Nano 8B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e IBM Granite 4.0 H-Small 32B\/9B; OLMo 2 32B; Reka Flash 3 21B; Falcon H1R 7B; Command R 35B\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-8B \/ 32B bf16, Qwen3-VL-30B-A3B MoE bf16, Qwen3-Omni-30B-A3B; InternVL3 up to 78B Q4 (~48 GB); InternVL3.5-38B bf16; DeepSeek-VL2 full range; Llama 3.2 11B Vision bf16; Llama 3.2 90B Vision Q4 (~52 GB); Pixtral 12B bf16; Molmo 72B Q4; Molmo 7B bf16; Gemma 3 12B \/ 27B multimodal; PaliGemma 2 28B; Phi-3.5-Vision; Aya Vision 8B \/ 32B; MiniCPM-V 2.6 \/ MiniCPM-o 2.6; GLM-4.6V.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] \/ [schnell] bf16 (~24 GB) and quantized (~15-25 s\/image at fp8); FLUX.1 Kontext [dev] in-context editing; FLUX Tools (Fill \/ Depth \/ Canny \/ Redux); SD 3.5 Large bf16 (~18 GB); SDXL 1.0; HunyuanImage-2.1 bf16 (~34 GB) at 2K native; HunyuanDiT 1.5B; Kolors \/ Kolors 2.0; AuraFlow v0.3; OmniGen v1; PixArt-Sigma.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 
0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B \/ I2V-A14B MoE bf16 (~54 GB, both experts resident); Wan 2.2 TI2V-5B fast path; HunyuanVideo 13B bf16 (~60-80 GB, tight at 720p); HunyuanVideo 1.5 (8.3B); CogVideoX-5B; Open-Sora 2.0 (11B) bf16; Genmo Mochi-1 bf16 (~42 GB); LTX-Video; Pyramid Flow; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos Predict 2.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo (~50x realtime); NVIDIA Parakeet-TDT 1.1B; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2 \/ Fun-CosyVoice 3.0; Kokoro 82M; Stable Audio Open; Coqui XTTS v2; StyleTTS 2; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi (200 ms full-duplex); Step-Audio 2 mini; Step-Audio-R1 \/ R1.1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX:\u003c\/strong\u003e Meta MusicGen; AudioGen; Suno Bark; SeamlessM4T v2\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eSingle-tenant streaming coding assistant — 70B dense bf16, low latency, no TP penalty\u003c\/li\u003e\n\u003cli\u003eMixed resident stack: Qwen3-32B bf16 + FLUX.1 fp8 + Whisper-turbo + Moshi on one card with partitioned VRAM\u003c\/li\u003e\n\u003cli\u003eFine-tuning: LoRA \/ QLoRA on 13-34B models; full-param on 7B\u003c\/li\u003e\n\u003cli\u003eEmbedding service: BGE-M3 \/ E5 \/ Jina resident alongside a generator 
LLM\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eSingle-tenant streaming coding assistant running Llama 3.3 70B bf16 or Qwen3-Coder-30B-A3B — no TP coordination overhead\u003c\/li\u003e\n\u003cli\u003eDeveloper workstation for a single engineer or tight team needing a 70B-class model with 32-128k context\u003c\/li\u003e\n\u003cli\u003eVideo or image generation lab — HunyuanVideo 13B, Wan 2.2 dual-expert, HunyuanImage-2.1 all at bf16 resident\u003c\/li\u003e\n\u003cli\u003eVLM \/ OCR bench — Qwen3-VL-32B bf16 or InternVL3.5-38B with long-document pipelines\u003c\/li\u003e\n\u003cli\u003eClean appliance for a small LLM API gateway — one model, one card, easy ops\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003ePublished references | NVIDIA RTX Pro 6000 Blackwell datasheet + community benchmarks\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ePer-card INT8 TOPS 
(NVIDIA datasheet)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e2 000 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eVRAM per card\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e96 GB ECC GDDR7\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eMemory bandwidth\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1 800 GB\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eLlama 3.3 70B Q6 single-GPU (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e40-55 tok\/s single-stream\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eLlama 3.3 70B bf16 single-GPU (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e15-25 tok\/s single-stream\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eBlackwell fp8 native\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003eDeepSeek-V3 fp8, Hunyuan-A13B fp8 run without bf16 upcast\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003ePublished external references, not measured on Kentino hardware. 
Kentino will publish first-party numbers after the first customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eTraining large models from scratch (single GPU — no tensor\/pipeline parallelism)\u003c\/li\u003e\n\u003cli\u003eFrontier 200B+ MoE at real quantizations (Qwen3-235B Q4, GLM-4.5\/4.6 — use K-AI 192 RTXPro6000 or larger)\u003c\/li\u003e\n\u003cli\u003eHigh-concurrency multi-tenant inference (single card caps aggregate throughput; 4x RTX 4090 or 4x L40 scale better)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv 
style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eNVIDIA OEM 3-year warranty on RTX Pro 6000 + Kentino integration warranty. Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eUpgrade RAM to 512 GB (add 4x 64 GB DDR4 — four DIMM slots still open)\u003c\/li\u003e\n\u003cli\u003e4 TB NVMe secondary drive for model library \/ dataset staging\u003c\/li\u003e\n\u003cli\u003e24U open cabinet for production rack-mount\u003c\/li\u003e\n\u003cli\u003eFor Gen5 x16 link speed consider the Genoa-platform variant on request\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940156698952,"sku":null,"price":15847.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-128-rome-5090-6704tops-4-rtx-5090-blackwell-ai-server","title":"K-AI 128 Rome 5090 6704TOPS — 4× RTX 5090 Blackwell AI Server","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 128 Rome 5090 6704TOPS\u003c\/p\u003e\n\u003cp 
style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e128 GB VRAM Blackwell Inference Server\u003cbr\u003e4x RTX 5090 | EPYC Milan | 6 704 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e6 704\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eINT8 TOPS\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e128 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eVRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eBlackwell\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003efp8 native\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e2.5x\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003evs 4090 TOPS\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eFour Blackwell RTX 5090 with native fp8\/fp4 
tensor paths. Highest-throughput 4-GPU build on the Rome platform.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 4U rack-mount inference server with four GeForce RTX 5090 pooled to 128 GB VRAM, one AMD EPYC 7643 Milan CPU (48C\/96T), 512 GB DDR4 ECC (all 8 DIMM slots populated for max bandwidth), 2 TB NVMe boot, and dual synchronized 2 kW ATX PSU. Runs vLLM, SGLang, llama.cpp, ComfyUI with Blackwell-native fp8 and MXFP4 inference kernels.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4x NVIDIA GeForce RTX 5090 32 GB GDDR7 (Blackwell, 575 W, PCIe 5.0 x16)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e128 GB total across 4 cards (no NVLink on consumer 5090)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack 
ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e512 GB DDR4-2666 ECC RDIMM (8x 64 GB — all DIMM slots populated)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eDual 2 kW ATX PSU with sync cable + 12VHPWR adapter set\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount, 4x GPU, passive PCIe 4.0 x16 risers\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eArctic Freezer 4U-M SP3 tower + 3x 120 mm front intake + 1x 120 mm rear exhaust\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU 
draw: 4 x 575 W = 2 300 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~2 650 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 4 000 W (dual 2 kW synced) — 33.8 % headroom\u003c\/li\u003e\n\u003cli\u003eDual PSU for split power delivery — each PSU feeds a portion of the system\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eROMED8-2T fans out 128 PCIe Gen4 lanes from the EPYC directly to seven x16 slots; four populated by GPUs at Gen4 x16. No PCIe switch. No NVLink on consumer 5090; inter-GPU peer-to-peer traffic runs over PCIe. Cards are Gen5 native; Rome caps at Gen4.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 128 GB pooled VRAM and Blackwell-native fp8 tensor paths, this server steps up to Qwen3-235B-A22B Q4 and gpt-oss-120b MXFP4 with real KV headroom — beyond what 4x RTX 4090 can reach.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3 \/ Qwen3.5 (Alibaba):\u003c\/strong\u003e Qwen3-235B-A22B Q3-Q4 (~112-132 GB) fits the 128 GB pool with 8-16k 
ctx — the hero config; Qwen3-32B dense bf16 (~65 GB) with massive KV; Qwen3-Coder-30B-A3B agentic at 1M ctx; Qwen3.5-122B-A10B Q6\/fp8 (~75-80 GB); QwQ-32B bf16 reasoning\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek:\u003c\/strong\u003e DeepSeek-V3\/R1\/V3.1\/V3.2 fp8-native Q2 (~215 GB) with RAM spill across 512 GB host — feasible for batch; DeepSeek-R2 32B bf16 multi-stream (4 concurrent, one per card)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM \/ Z.ai:\u003c\/strong\u003e GLM-4.5-Air 106B\/12B fp8 (~106 GB) or Q6 comfortably; GLM-4.5\/4.6\/4.7 Q2_K_XL (~135 GB) tight with MoE offload\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTencent Hunyuan:\u003c\/strong\u003e Hunyuan-A13B fp8 native (~80 GB) — Blackwell runs fp8 without upcast penalty; Hunyuan-Large Q2 with RAM spill\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eByteDance Seed-OSS-36B\u003c\/strong\u003e bf16 with 512k native; ERNIE-4.5-424B Q2 (~150 GB spill)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMeta Llama:\u003c\/strong\u003e Llama 3.3 70B Q4 across 4x 5090 (~30-40 tok\/s single-stream, ~270+ tok\/s batch-32 vLLM); Llama 4 Scout 109B\/17B MoE fp8\/Q6 (~90 GB); Llama 4 Maverick 400B\/17B Q3 (~188 GB spill)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral:\u003c\/strong\u003e Mistral Small 3 \/ Magistral \/ Devstral Small 2 (24B) bf16 multi-stream; Pixtral Large \/ Mistral Large 2 (123B) Q6 (~88 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOpenAI (open weights):\u003c\/strong\u003e gpt-oss-120b MXFP4 native (80 GB) with real KV and long context — Blackwell hero workload; gpt-oss-20b MXFP4\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGoogle Gemma 3:\u003c\/strong\u003e 27B multimodal 
bf16 (~54 GB) two concurrent streams; 12B \/ 4B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMicrosoft Phi-4\u003c\/strong\u003e 14B dense bf16; Phi-4-reasoning distilled\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNVIDIA Nemotron:\u003c\/strong\u003e Llama-3.1-Nemotron Ultra 253B Q3 (~119 GB) tight; Super 49B bf16 (~98 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Cohere Command R+ 104B Q6 (~85 GB); Molmo 72B Q6-bf16 VLM; OLMo 2 32B; IBM Granite 4.0 H-Small\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-235B-A22B Q3-Q4; Qwen3-VL-32B bf16; InternVL3.5-241B-A28B Q4 (~135 GB tight); InternVL3 78B bf16; Llama 3.2 90B Vision Q6 (~74 GB); Pixtral Large 124B Q6 (~88 GB); Molmo 72B Q6\/bf16; Gemma 3 27B multimodal bf16; GLM-4.6V 106B fp8.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] bf16 and fp8 (~10-18 s\/image at fp8); FLUX.1 Kontext [dev]; SD 3.5 Large bf16; HunyuanImage-2.1 bf16 and Q4; HunyuanImage-3.0 base (80B MoE, 13B active) bf16 (~80 GB, hero footprint); HunyuanDiT; Kolors \/ Kolors 2.0; AuraFlow v0.3; OmniGen v1; PixArt-Sigma.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 MoE two-expert bf16 (~54 GB, full ctx); Wan 2.2 TI2V-5B; HunyuanVideo 13B bf16 both experts (~60-80 GB); HunyuanVideo 1.5; CogVideoX-5B bf16; Open-Sora 2.0 11B bf16 (~24 GB); Genmo Mochi-1 bf16 (~42 GB); LTX-Video; Pyramid Flow; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 
0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo (~50x realtime); Parakeet-TDT; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2 \/ 3; Kokoro 82M; Stable Audio Open; XTTS v2; StyleTTS 2; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi 7B; Step-Audio 2 mini\/R1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX:\u003c\/strong\u003e MusicGen \/ AudioGen \/ Bark; SeamlessM4T v2\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e200B MoE at Q4 with batch inference (Qwen3-235B, GLM-4.5\/4.6\/4.7-Air) for 8-16 concurrent users\u003c\/li\u003e\n\u003cli\u003efp8-native frontier — DeepSeek V3 family, Hunyuan-Large fp8 with Blackwell native paths\u003c\/li\u003e\n\u003cli\u003eMixed resident stack: gpt-oss-120b MXFP4 + FLUX.1 + Whisper + Moshi on partitioned VRAM\u003c\/li\u003e\n\u003cli\u003eHigh-throughput 70B — tensor-parallel vLLM \/ SGLang with 200+ tok\/s batch aggregate\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e200B+ MoE production serving at Q3-Q4 with real KV (Qwen3-235B, GLM-4.5-Air 106B)\u003c\/li\u003e\n\u003cli\u003efp8-native frontier inference (DeepSeek V3\/R1 fp8, Hunyuan fp8) — Blackwell runs without upcast\u003c\/li\u003e\n\u003cli\u003eHigh-throughput 70B serving — tensor-parallel 
batch via vLLM or SGLang\u003c\/li\u003e\n\u003cli\u003eVideo generation studio at bf16 (Wan 2.2 dual-expert, HunyuanVideo 13B, Mochi-1)\u003c\/li\u003e\n\u003cli\u003eMulti-tenant mixed workload — 120B MoE + image gen + realtime voice all resident\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eReference performance\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003ePublished references | NVIDIA RTX 5090 datasheet + community benchmarks\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ePer-card INT8 TOPS (NVIDIA datasheet)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e1 676 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eAggregate INT8 TOPS (4 cards)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e6 704 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eMemory bandwidth per card\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1 792 
GB\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eLlama 3.3 70B Q6 via vLLM (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e60-90 tok\/s single-stream, 300+ tok\/s batch\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eQwen3-235B-A22B Q3-Q4\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003eFits 128 GB pool with 8-16k ctx\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003egpt-oss-120b MXFP4 native\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e80 GB — comfortable with KV headroom\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003ePublished external references, not measured on Kentino hardware. 
Kentino will publish first-party numbers after the first customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eFrontier 400B+ at Q4 (Kimi-K2, Mistral Large 3, Intern-S1-Pro — require 8-GPU or 6x RTX Pro 6000)\u003c\/li\u003e\n\u003cli\u003ePCIe Gen5-link-sensitive workloads — pick the Genoa SKU for native Gen5 x16\u003c\/li\u003e\n\u003cli\u003eTraining from scratch (no NVLink on consumer 5090)\u003c\/li\u003e\n\u003cli\u003eECC-sensitive 24\/7 production — consumer 5090 has no ECC; prefer L40 or RTX Pro 6000 Server Edition\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 
days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eUpgrade PSU to dual 2.5 kW (FSP) for sustained worst-case bf16 + video — recommended for 24\/7\u003c\/li\u003e\n\u003cli\u003e4 TB NVMe for model library + MoE weight staging\u003c\/li\u003e\n\u003cli\u003e24U open cabinet for multi-server deployment\u003c\/li\u003e\n\u003cli\u003eConsider the Genoa-platform variant on request for Gen5 x16 link\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940164497736,"sku":null,"price":25372.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-144-rome-l4-1452tops-6-nvidia-l4-epyc-milan","title":"K-AI 144 Rome L4 1452TOPS — 6× NVIDIA L4 — EPYC Milan","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 144 Rome L4 1452TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 
0;line-height:1.3\"\u003e144 GB VRAM Silent Edge Inference Server\u003cbr\u003e6x NVIDIA L4 Passive | EPYC Milan | 1 452 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e1 452\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eINT8 TOPS\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e144 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eVRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e432 W\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eGPU envelope\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003esilent\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003epassive GPUs\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eSix passive L4 datacenter cards. 
Quietest AI server in Kentino's lineup — acceptable for office-edge deployment.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 4U single-socket inference server with six passive NVIDIA L4 cards (24 GB each, 144 GB pool), one AMD EPYC 7643 Milan CPU (48C\/96T), 384 GB DDR4 ECC, 2 TB NVMe boot, and a single 2 kW ATX PSU with 62 % headroom. Dense-edge inference workhorse for embedding fleets, multi-tenant small\/mid-size LLM serving, and watts-per-query deployments near office space.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e6x NVIDIA L4 24 GB (Ada Lovelace, passive, 72 W, single-slot LP, PCIe Gen4 x8)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e144 GB aggregate across 6 cards\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128 PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 
16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e384 GB DDR4-2666 ECC RDIMM (6x 64 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e1x 2 kW ATX PSU\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount (6-card layout)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP3 tower cooler + front-to-back directed airflow (industrial fans)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 6 x 72 W = 432 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full 
load: ~757 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 2 000 W — 62 % headroom\u003c\/li\u003e\n\u003cli\u003eSilent operation, massive thermal margin\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eL4 is PCIe Gen4 x8 native — no bandwidth loss vs host. ROMED8-2T provides 7x x16 slots; one slot left free for NIC upsell. No PCIe switch required. No NVLink.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eAt 144 GB aggregate across 6 physical cards, the sweet spot is concurrent multi-model serving: run a 70B dense at Q4, a 30B MoE, a 14B coder, a VLM and an embedding model concurrently and still have KV headroom.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3 \/ Qwen3.5 (Alibaba):\u003c\/strong\u003e Qwen3-30B-A3B Q4-Q6; QwQ-32B Q6; Qwen3-32B dense Q6; Qwen3.5-122B-A10B Q4-Q5 (~75 GB comfortable); Qwen3-235B-A22B Q3 (~112 GB) tight, short ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek:\u003c\/strong\u003e DeepSeek-R2 32B sparse 
MoE Q4-Q6 (single-card capable, 6x concurrent streams, ~15-20 tok\/s per stream); Seed-OSS-36B Q4-Q6 with 512k native context\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM \/ Z.ai:\u003c\/strong\u003e GLM-4.5-Air Q4-Q5 (60-70 GB comfortable); Hunyuan-A13B Q4-Q6 (~48 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eBaidu ERNIE-4.5-47B-A3B\u003c\/strong\u003e Q4; Step-3.5-Flash Q3-Q4 with some RAM spill\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMeta Llama:\u003c\/strong\u003e Llama 3.3 70B Q4-Q6 (43-58 GB) with generous KV (~10-17 tok\/s single-stream across 6x L4 tensor-parallel); Llama 4 Scout 109B\/17B MoE Q4 (~63 GB) comfortable\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral:\u003c\/strong\u003e Mistral Small 3 \/ Magistral Small 1.2 \/ Devstral Small 2 (24B) at bf16 (~50-65 tok\/s per L4 card); Mixtral 8x22B Q4\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOpenAI (open weights):\u003c\/strong\u003e gpt-oss-120b MXFP4 native (~80 GB) with room to spare; gpt-oss-20b MXFP4\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGoogle Gemma 3:\u003c\/strong\u003e 27B bf16; Phi-4 14B bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNVIDIA Nemotron:\u003c\/strong\u003e Llama-3.1-Nemotron Super 49B Q4-Q6; Pixtral 12B \/ Pixtral Large Q4 (~72 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-8B\/32B, Qwen3-VL-30B-A3B MoE, InternVL3 up to 78B Q4 (~48 GB), InternVL3.5-38B, DeepSeek-VL2, Llama 3.2 11B Vision bf16, Llama 3.2 90B Vision Q4 (~52 GB), Molmo 72B Q4, Gemma 3 12B\/27B multimodal, MiniCPM-V 2.6 \/ 
MiniCPM-o 2.6, GLM-4.6V-Flash.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] \/ [schnell] fp8 (~20-35 s\/image on single L4 at fp8); FLUX.1 Kontext [dev]; FLUX Tools; SD 3.5 Large (18 GB fp16 \/ 11 GB fp8); SDXL 1.0; HunyuanImage-2.1 (~34 GB bf16); HunyuanDiT; Kolors 2.0; AuraFlow v0.3; OmniGen v1; PixArt-Sigma.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B \/ I2V-A14B MoE (tight at bf16 ~54 GB); Wan 2.2 TI2V-5B fast path; HunyuanVideo 13B Q4-Q8 (~30 GB); HunyuanVideo 1.5 (8.3B); CogVideoX-5B; Open-Sora 2.0 Q8 (~16 GB); Mochi-1 Q4 (~18 GB); LTX-Video; Pyramid Flow; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo (~50x realtime); Parakeet-TDT; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2 \/ 3; Kokoro 82M; Stable Audio Open; XTTS v2; StyleTTS 2; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi 7B; Step-Audio 2 mini\/R1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX:\u003c\/strong\u003e MusicGen \/ AudioGen \/ Bark; SeamlessM4T v2\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e6 
concurrent streams of a Q4 model sized for a single 24 GB card (one per card): e.g. 6x Qwen3-14B Q4 agents\u003c\/li\u003e\n\u003cli\u003eMixed fleet: Llama 3.3 70B Q4 (tensor-parallel over 2 cards) + FLUX.1 (1 card) + Whisper-turbo (1 card) + Moshi (1 card) + BGE-M3 embedder (1 card)\u003c\/li\u003e\n\u003cli\u003eEmbedding service at high QPS — 6x parallel embed streams of BGE-M3 \/ E5 \/ Nomic \/ Cohere Embed\u003c\/li\u003e\n\u003cli\u003eVideo transcode farm — 6x parallel NVENC\/NVDEC streams\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eSaaS multi-tenant LLM API — serve 20-40 concurrent users across a 24B\/32B model with room for image and ASR alongside\u003c\/li\u003e\n\u003cli\u003eRAG backend — query-side embedder + 70B Q4 reader + reranker, sub-second latency, 50 QPS\u003c\/li\u003e\n\u003cli\u003eVideo-AI pipeline — live transcode + caption + moderation on 6 parallel streams\u003c\/li\u003e\n\u003cli\u003eEdge AI appliance near the office — low acoustic profile, zero datacenter dependency\u003c\/li\u003e\n\u003cli\u003eMid-tier model R\u0026amp;D bench — rapid iteration on 30-70B fine-tunes, one card per experiment\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eReference performance\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003ePublished references | NVIDIA L4 datasheet + community benchmarks\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr 
style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ePer-card INT8 TOPS (NVIDIA datasheet)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e242 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eAggregate INT8 TOPS (6 cards)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e1 452 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eLlama 3.1 8B Q4 on single L4 (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~35-45 tok\/s single-stream\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eBGE-M3 embedding QPS on L4 (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~800 QPS at 512-token input\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eWhisper v3 turbo realtime factor\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1.5-2x realtime per card\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003ePublished external references, not measured on Kentino hardware. 
Kentino will publish first-party numbers after the first customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eFrontier 200B+ MoE at Q4+ with long context — 4x L40 or 8x RTX 4090 (192 GB pool, contiguous TP) is the right fit\u003c\/li\u003e\n\u003cli\u003eTraining workloads — L4 has no NVLink and too little memory bandwidth for efficient training\u003c\/li\u003e\n\u003cli\u003eSingle-workload peak throughput — per-card compute is modest vs L40 \/ RTX Pro 6000\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv 
style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eNVIDIA OEM 3-year warranty on L4 + Kentino integration warranty. Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e4 TB NVMe upgrade for model library staging\u003c\/li\u003e\n\u003cli\u003e24U open rack cabinet with managed PDU\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940172656968,"sku":null,"price":28681.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-192-rome-l40-1448tops-4-nvidia-l40-epyc-milan","title":"K-AI 192 Rome L40 1448TOPS — 4× NVIDIA L40 — EPYC Milan","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 192 Rome L40 1448TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e192 GB ECC Enterprise Inference Server\u003cbr\u003e4x NVIDIA L40 Passive | EPYC Milan | 1 448 TOPS INT8\u003c\/p\u003e\n\u003cdiv 
style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e1 448\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eINT8 TOPS\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e192 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eECC VRAM\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eECC\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003edatacenter grade\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e24\/7\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003epassive cooled\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eFour passive L40 datacenter cards with ECC memory. 
Same 192 GB pool as 8x RTX 4090 — but datacenter-grade, ECC-protected, and OEM-warrantied.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 4U rack-mount inference server with four passive NVIDIA L40 cards pooled to 192 GB ECC VRAM, one AMD EPYC 7643 Milan CPU (48C\/96T), 256 GB DDR4 ECC, 2 TB NVMe boot, and dual synchronized 2 kW ATX PSU. The L40 is the datacenter sibling of the RTX 4090 — passive-cooled, ECC-equipped, NVENC\/NVDEC hardware encoders on-die, and NVIDIA OEM 3-year warranty. Runs vLLM, SGLang, llama.cpp, Triton, TensorRT-LLM out of the box.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4x NVIDIA L40 48 GB ECC GDDR6 (Ada Lovelace, passive, 300 W, dual-slot, PCIe 4.0 x16)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e192 GB ECC across 4 cards (no NVLink on L40)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd 
style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e256 GB DDR4-2666 ECC RDIMM (4x 64 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eDual 2 kW ATX PSU with sync cable\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount with front-to-back directed airflow\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eArctic Freezer 4U-M SP3 tower + 3x 120 mm front intake + 1x 120 mm rear exhaust\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul 
style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 4 x 300 W = 1 200 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~1 525 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 4 000 W (dual 2 kW synced) — 61.9 % headroom\u003c\/li\u003e\n\u003cli\u003eDual PSU for split power delivery and N+1 capability\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003ePCIe Gen4 x16 per card (L40 is Gen4 native). Direct root-complex connection from a single EPYC — no PCIe switch. No NVLink — inter-GPU traffic runs over PCIe peer-to-peer. Three x16 slots remain for NIC \/ storage expansion.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 192 GB of ECC VRAM across 4 datacenter cards, this server handles 200B+ frontier MoE at Q4, enterprise multi-tenant serving with strict SLAs, and 24\/7 production inference with ECC protection against bit-flip drift.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3 \/ Qwen3.5
(Alibaba):\u003c\/strong\u003e Qwen3-235B-A22B Q4 (~132 GB) with long context — the hero config (~12-18 tok\/s single-stream across 4x L40); Qwen3-Coder-480B-A35B Q2 (~160 GB, tight); Qwen3.5-122B-A10B fp8 (~75 GB) with huge KV; Qwen3-32B dense bf16 with multiple concurrent streams\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek:\u003c\/strong\u003e DeepSeek-V3\/R1\/V3.1\/V3.2 Q2 (~215 GB with minor RAM spill); DeepSeek-R2 32B — 4x concurrent streams, one per card\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM \/ Z.ai:\u003c\/strong\u003e GLM-4.5 \/ 4.6 \/ 4.7 Q4 (~177 GB) — sweet spot for this tier; GLM-4.5-Air 106B\/12B fp8 or bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTencent Hunyuan:\u003c\/strong\u003e Hunyuan-Large Q3 (~160 GB) — 389B MoE with 256k ctx; Hunyuan-A13B fp8 (~80 GB) with huge KV\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Baidu ERNIE-4.5-424B Q3 (~180 GB); InternVL3.5-241B-A28B Q4 (~135 GB); Qwen3.5-397B Q3 (~170 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMeta Llama:\u003c\/strong\u003e Llama 3.3 70B bf16 with massive KV (~15-18 tok\/s single-stream on 4x L40); Llama 4 Scout bf16 (~218 GB, with RAM spill); Llama 4 Maverick 400B\/17B Q3 (~188 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral:\u003c\/strong\u003e Mistral Large 2 \/ Pixtral Large \/ Devstral 2 123B Q6 (~102 GB) comfortable; Mistral Small 3 multi-stream\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOpenAI (open weights):\u003c\/strong\u003e gpt-oss-120b MXFP4 (80 GB) with generous KV\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNVIDIA Nemotron:\u003c\/strong\u003e Llama-3.1-Nemotron Ultra 253B Q4 (~147 GB); Super 49B bf16 multiple
streams\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGoogle Gemma 3:\u003c\/strong\u003e 27B multimodal bf16 — multiple resident streams\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Cohere Command R+ 104B Q6 (~85 GB); OLMo 3.1 32B; Reka Flash 3 21B; IBM Granite 4.0 H-Small\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eInternVL3.5-241B-A28B Q4 (~135 GB); Qwen3-VL-235B-A22B Q4; Qwen3-VL-32B bf16; Llama 3.2 90B Vision bf16 (~180 GB); Pixtral Large 124B Q6-bf16; Molmo 72B bf16; GLM-4.6V 106B fp8; Gemma 3 27B multimodal multiple streams; InternVL3 78B bf16; DeepSeek-VL2 full range.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] \/ [schnell] bf16 with concurrent generation (~3-4 s per 1024x1024 image on L40); FLUX.1 Kontext [dev]; FLUX Tools; SD 3.5 Large bf16 x 2-3 concurrent; HunyuanImage-2.1 bf16 (~34 GB) multi-stream; HunyuanImage-3.0 base (80B MoE, 13B active) bf16 (~80 GB); HunyuanDiT; Kolors \/ Kolors 2.0; AuraFlow; OmniGen v1; PixArt-Sigma.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B \/ I2V-A14B MoE bf16 dual-expert full-context; Wan 2.2 TI2V-5B fast path; HunyuanVideo 13B bf16 both experts; HunyuanVideo 1.5; CogVideoX-5B bf16; Open-Sora 2.0 11B bf16; Mochi-1 bf16 (~42 GB) multi-stream; LTX-Video; Pyramid Flow; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos Predict 2.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul 
style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo (~50x realtime); Parakeet-TDT; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2\/3; Kokoro 82M; XTTS v2; Stable Audio Open; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi 7B; Step-Audio 2 mini\/R1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX:\u003c\/strong\u003e MusicGen \/ AudioGen \/ Bark; SeamlessM4T v2\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eEnterprise production LLM gateway — Qwen3-235B Q4 or GLM-4.5\/4.6 Q4 serving 16-32 concurrent users with strict SLA\u003c\/li\u003e\n\u003cli\u003eMixed resident stack: 235B MoE + FLUX.1 + Whisper-turbo + Moshi with partitioned VRAM and ECC protection\u003c\/li\u003e\n\u003cli\u003eLive video + AI pipeline — NVENC\/NVDEC hardware encoders stream 6-8 parallel captioning + moderation pipelines\u003c\/li\u003e\n\u003cli\u003eMulti-tenant RAG — query-side embedder + 70B reader + reranker at sub-second P99 latency\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e24\/7 production LLM inference at 192 GB pool (Qwen3-235B Q4, GLM-4.5\/4.6\/4.7 Q4, Llama 4 Scout bf16)\u003c\/li\u003e\n\u003cli\u003eEnterprise multi-tenant serving with strict SLA — ECC reliability over long runs\u003c\/li\u003e\n\u003cli\u003eRAG + vector DB serving with high-quality retrieval models 
concurrent\u003c\/li\u003e\n\u003cli\u003eMedia \/ video AI pipelines — NVENC \/ NVDEC hardware path, VFX rendering, transcribe\/translate\u003c\/li\u003e\n\u003cli\u003eDatacenter silent-operation deployments — passive cards, low acoustic profile near office space\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eMeasured performance\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003ePublished references | NVIDIA L40 datasheet + community benchmarks\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ePer-card INT8 TOPS (NVIDIA datasheet)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e362 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eAggregate INT8 TOPS (4 cards)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e1 448 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ePer-card VRAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e48 GB ECC GDDR6, 864 GB\/s 
bandwidth\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eLlama 3.3 70B Q6 via vLLM (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e30-50 tok\/s single-stream, 150+ tok\/s batch-16\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eFLUX.1 [dev] bf16 on L40 (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~3-4 s per 1024x1024 image\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eNVENC \/ NVDEC\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003eGen-8 hardware encoders on-die (video AI pipeline)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003ePublished external references, not measured on Kentino hardware. 
Kentino will publish first-party numbers after the first customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eTraining large models from scratch (no NVLink, limited FP8 tensor compute)\u003c\/li\u003e\n\u003cli\u003eSingle-user budget inference (4x L4 or 2x L40 is materially cheaper)\u003c\/li\u003e\n\u003cli\u003eDense bf16 70B at very long context on one model — prefer 2x RTX Pro 6000 Server Edition (same 192 GB pool, less TP overhead)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv 
style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eNVIDIA OEM 3-year warranty on L40 + Kentino integration warranty. Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eUpgrade RAM to 512 GB (add 4x 64 GB DDR4 — four DIMM slots still open)\u003c\/li\u003e\n\u003cli\u003e4 TB NVMe for model library staging\u003c\/li\u003e\n\u003cli\u003eFull 24U rack cabinet with managed PDU + online UPS 5 kVA\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940180193608,"sku":null,"price":40798.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-192-rome-rtxpro6000-4000tops-2-rtx-pro-6000-blackwell-server-edition-epyc-milan","title":"K-AI 192 Rome RTXPro6000 4000TOPS — 2× RTX Pro 6000 Blackwell Server Edition — EPYC Milan","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 192 Rome RTXPro6000 4000TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 
0;line-height:1.3\"\u003e192 GB ECC Blackwell Flagship Pair\u003cbr\u003e2x RTX Pro 6000 Server Edition | EPYC Milan | 4 000 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e4 000\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eINT8 TOPS\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e192 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eECC VRAM\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eBlackwell\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003efp8 native\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e2-card\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eminimal TP\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eTwo passive RTX Pro 6000 Blackwell Server Edition cards — 96 GB ECC each. 
Less tensor-parallel overhead than 4- or 8-card builds. Datacenter flagship pair.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 4U rack-mount inference server with two passive RTX Pro 6000 Blackwell Server Edition cards (96 GB ECC GDDR7 per card), one AMD EPYC 7643 Milan CPU (48C\/96T), 256 GB DDR4 ECC, 2 TB NVMe boot, and a single 2 kW ATX PSU. For 70B dense bf16 and mid-size MoE, fewer big cards beat more small cards — two-card tensor parallelism has minimal communication overhead, and each 96 GB card can hold a complete 70B-class model at fp8.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x NVIDIA RTX Pro 6000 Blackwell Server Edition 96 GB ECC GDDR7 (passive, 600 W, PCIe 5.0 x16, dual-slot)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e192 GB ECC (96 GB x 2) — each card holds a 70B fp8 model (~70 GB) standalone; 70B bf16 (~140 GB) spans both cards\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128x PCIe 4.0
lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e256 GB DDR4-2666 ECC RDIMM (4x 64 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e1x 2 kW ATX PSU\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount with front-to-back directed airflow\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eArctic Freezer 4U-M SP3 tower + 3x 120 mm front intake + 1x 120 mm rear exhaust\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 
0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 2 x 600 W = 1 200 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~1 525 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 2 000 W (single 2 kW) — 23.7 % headroom\u003c\/li\u003e\n\u003cli\u003eSingle PSU sufficient; optional dual-PSU upgrade for N+1 redundancy\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003ePCIe Gen4 x16 per GPU (card is Gen5 native; Rome board caps at Gen4). Direct root-complex connection — no PCIe switch. No NVLink — inter-GPU peer-to-peer. Five x16 slots remain open for expansion. Gen4 vs Gen5 negligible for inference at this VRAM density.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 192 GB ECC VRAM on just two Blackwell cards with native fp8\/fp4, this is the cleanest path to dense 70B at bf16 and mid-size MoE. 
Two independent 70B fp8 streams — one per card — or 200B MoE across both with minimal 2-way TP overhead.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3 \/ Qwen3.5 (Alibaba):\u003c\/strong\u003e Qwen3-235B-A22B Q4 (~132 GB) comfortable with long ctx (~15-25 tok\/s single-stream across 2 cards); Qwen3-Coder-480B-A35B Q2 (~160 GB); Qwen3.5-122B-A10B fp8 (~75 GB); Qwen3-32B dense bf16 with huge KV; QwQ-32B bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek:\u003c\/strong\u003e DeepSeek-V3\/R1 Q2 (~215 GB with small RAM spill) — Blackwell runs fp8 natively; DeepSeek-R2 32B bf16 two concurrent streams (one per card)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM \/ Z.ai:\u003c\/strong\u003e GLM-4.5 \/ 4.6 \/ 4.7 Q4 (~177 GB) — hero config at this tier; GLM-4.5-Air fp8 or bf16 with huge KV\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTencent Hunyuan:\u003c\/strong\u003e Hunyuan-Large Q3 (~160 GB) — 389B MoE with 256k ctx; Hunyuan-A13B fp8 native (~80 GB) with huge KV\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Baidu ERNIE-4.5-424B Q3 (~180 GB); InternVL3.5-241B-A28B Q4 (~135 GB); MiniMax-M1 Q3 (~180 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMeta Llama:\u003c\/strong\u003e Llama 3.3 70B fp8 (~70 GB) on one card — two independent concurrent 70B streams (~20-30
tok\/s per stream); Llama 4 Scout bf16 (~218 GB, tight); Llama 4 Maverick Q3 (~188 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral:\u003c\/strong\u003e Mistral Large 2 \/ Pixtral Large \/ Devstral 2 123B Q6 (~88 GB) single-card or bf16 across both; Mistral Small 3 multi-stream\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOpenAI (open weights):\u003c\/strong\u003e gpt-oss-120b MXFP4 native (80 GB) — fits on ONE card, two independent concurrent streams\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNVIDIA Nemotron:\u003c\/strong\u003e Llama-3.1-Nemotron Ultra 253B Q4 (~147 GB); Super 49B bf16 on single card\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Cohere Command R+ 104B Q6 (~85 GB) on one card; Google Gemma 3 27B bf16 multiple concurrent streams\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eInternVL3.5-241B-A28B Q4 (~135 GB); Qwen3-VL-235B-A22B Q4; Qwen3-VL-32B bf16 single-card; Pixtral Large 124B bf16 or Q6; Llama 3.2 90B Vision bf16 (~180 GB); Molmo 72B bf16 (~144 GB); GLM-4.6V 106B fp8; Gemma 3 27B multimodal x 2-3 concurrent streams.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] bf16 multiple concurrent streams; FLUX.1 Kontext [dev]; FLUX Tools; SD 3.5 Large bf16 concurrent; HunyuanImage-2.1 bf16 (~34 GB) x 2-4 concurrent; HunyuanImage-3.0 base (80B MoE, 13B active) bf16 — fits on one card; HunyuanDiT; Kolors \/ Kolors 2.0; AuraFlow; OmniGen v1; PixArt-Sigma.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 MoE dual-expert bf16 full 
context — fits on one card, two concurrent generation streams; Wan 2.2 TI2V-5B; HunyuanVideo 13B bf16 both experts; HunyuanVideo 1.5; CogVideoX-5B bf16; Open-Sora 2.0 11B bf16; Mochi-1 bf16 (~42 GB); LTX-Video; Pyramid Flow; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos Predict 2.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo (~50x realtime); Parakeet-TDT; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2\/3; Kokoro 82M; XTTS v2; Stable Audio Open; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi 7B; Step-Audio 2 mini\/R1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX:\u003c\/strong\u003e MusicGen \/ AudioGen \/ Bark; SeamlessM4T v2\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eTwo independent 70B streams — one per card, simplest form of tenant isolation\u003c\/li\u003e\n\u003cli\u003eDense 70B bf16 + supporting stack — LLM on card 1, image\/video\/audio on card 2\u003c\/li\u003e\n\u003cli\u003e200B MoE across both cards — minimal tensor-parallel overhead (2-way split)\u003c\/li\u003e\n\u003cli\u003efp8-native frontier — DeepSeek V3 family, Hunyuan-Large fp8 with Blackwell native paths\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul 
style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eDense 70B bf16 inference — two cards tensor-parallel with minimal overhead, or one model per card for streaming\u003c\/li\u003e\n\u003cli\u003e100-150B MoE at Q4-Q6 (GLM-4.5-Air, Qwen3.5-122B-A10B, Hunyuan-A13B, Llama 4 Scout)\u003c\/li\u003e\n\u003cli\u003eFP8-native frontier inference (DeepSeek V3 family, Hunyuan, Llama 4) — Blackwell runs fp8 natively\u003c\/li\u003e\n\u003cli\u003eImage + video generation studio at bf16 (Wan 2.2 T2V-A14B, HunyuanVideo 13B, FLUX.1 [dev])\u003c\/li\u003e\n\u003cli\u003eLong-context document analysis (MiniMax-M1, Kimi-K2 1.58-bit UD with spill)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eMeasured performance\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003ePublished references | NVIDIA RTX Pro 6000 Blackwell Server Edition datasheet + community benchmarks\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ePer-card INT8 TOPS (NVIDIA datasheet)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e2 000 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd 
style=\"padding:10px 12px;color:#ccc\"\u003eAggregate INT8 TOPS (2 cards)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e4 000 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eMemory bandwidth per card\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1 800 GB\/s, 96 GB ECC GDDR7\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eLlama 3.3 70B bf16 per-card (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e15-25 tok\/s single-stream, 60-90 tok\/s batch\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eDual-card tensor-parallel 70B (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~30-45 tok\/s single-stream expected\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eBlackwell fp8 native\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003eDeepSeek-V3 fp8, Hunyuan-A13B fp8 run without bf16 upcast\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003ePublished external references, not measured on Kentino hardware. 
Kentino will publish first-party numbers after the first customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eVery high concurrency multi-tenant serving — 4x L40 or 6x L4 distributes better across more cards\u003c\/li\u003e\n\u003cli\u003eHeavy KV cache at very long context — step up to K-AI 384 RTXPro6000 8000TOPS\u003c\/li\u003e\n\u003cli\u003eTraining — Kentino does not sell H-class NVLink fabrics\u003c\/li\u003e\n\u003cli\u003eBudget inference at 192 GB pool — 8x RTX 4090 is cheaper (trading ECC and passive cooling for cost)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 
days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eNVIDIA OEM 3-year warranty on RTX Pro 6000 Server Edition + Kentino integration warranty. Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eUpgrade to dual 2 kW synced PSU for N+1 redundancy\u003c\/li\u003e\n\u003cli\u003eUpgrade RAM to 512 GB (4 DIMM slots open)\u003c\/li\u003e\n\u003cli\u003e4 TB NVMe for large weight libraries and model staging\u003c\/li\u003e\n\u003cli\u003eExpand to 4-card configuration (K-AI 384 RTXPro6000 8000TOPS) — chassis has slot capacity\u003c\/li\u003e\n\u003cli\u003e24U rack cabinet + online UPS 5 kVA\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940194447688,"sku":null,"price":25162.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-192-romedual-4090-5288tops-8-rtx-4090-dual-epyc-milan","title":"K-AI 192 RomeDual 4090 5288TOPS — 8× RTX 4090 — Dual EPYC Milan","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp 
style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 192 RomeDual 4090 5288TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e192 GB VRAM 8-GPU Inference Server\u003cbr\u003e8x RTX 4090 | Dual EPYC Milan | 5 288 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e5 288\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eINT8 TOPS\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e192 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eVRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e8-GPU\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003etensor parallel\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003edual\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eCPU 
96C\/192T\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eFlagship 8x gaming-GPU inference box. 192 GB pool at consumer-card economics on a dual-socket EPYC Milan platform.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 7U 8-GPU chassis built around dual EPYC 7643 Milan CPUs (96C\/192T total), ASRock Rack ROME2D32GM-NL dual-SP3 motherboard, 512 GB DDR4 ECC, 2 TB NVMe boot, and a set of five 1200 W server PSUs. The eight GeForce RTX 4090 cards connect through active PCIe Gen4 retimer risers at full x16. The cheapest path to 192 GB frontier MoE inference on Kentino hardware.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e8x NVIDIA GeForce RTX 4090 24 GB GDDR6X (Ada Lovelace, 450 W, PCIe 4.0 x16)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e192 GB total across 8 cards (no NVLink on consumer RTX 4090)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x AMD EPYC 7643 Milan (48C\/96T each — 96C\/192T
total, 225 W each, 2x 128 PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROME2D32GM-NL (dual SP3, PCIe 4.0, 32x DDR4 ECC DIMM slots)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e512 GB DDR4-2666 ECC RDIMM (8x 64 GB — 4 per socket for 8-channel balance)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e5x 1200 W server PSU set (HP-compatible, hot-swap) + full 12VHPWR adapter set\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e7U 8-GPU chassis (up to 10 PCIe cards including risers)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eRisers\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e8x active PCIe Gen4 x16 retimer risers (required over cable length)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x Arctic Freezer 4U-M SP3 tower coolers + rack-mount front-to-back airflow (industrial fans)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 
16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 8 x 450 W = 3 600 W\u003c\/li\u003e\n\u003cli\u003eCPU draw: 2 x 225 W = 450 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~4 200 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 6 000 W all-active (5x 1200 W) — 30.0 % headroom\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eROME2D32GM-NL exposes 2x 128 PCIe Gen4 lanes — one 128-lane pool per EPYC socket — direct to GPU slots. Active Gen4 retimer risers for signal integrity. No PCIe switch. No NVLink. 
Measured 19-22 GB\/s inter-GPU peer-to-peer on 4-GPU bench.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 192 GB across 8 cards, this server handles 200B+ frontier MoE at Q4, 8-way tensor-parallel inference, tenant-isolated multi-model serving, and high-batch throughput at consumer-card economics.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3 \/ Qwen3.5 (Alibaba):\u003c\/strong\u003e Qwen3-235B-A22B Q4 (~132 GB) with long ctx — the hero config (~15-25 tok\/s single-stream on 8x RTX 4090); Qwen3-Coder-480B-A35B Q2 (~160 GB); Qwen3.5-122B-A10B fp8 (~75 GB) multi-stream; Qwen3-32B dense bf16 x multiple concurrent\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek:\u003c\/strong\u003e DeepSeek-V3\/R1 Q2 (~215 GB with 512 GB host spill); DeepSeek-R2 32B bf16 — up to 8 concurrent streams one per card (~30-40 tok\/s per stream)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM \/ Z.ai:\u003c\/strong\u003e GLM-4.5 \/ 4.6 \/ 4.7 Q4 (~177 GB); GLM-4.5-Air fp8 or bf16; GLM-4.6V 106B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTencent Hunyuan:\u003c\/strong\u003e Hunyuan-Large Q3 (~160 GB); Hunyuan-A13B Q4\/Q6 (RTX 4090 is Ada — fp8 upcasts to bf16, use GGUF 
quants)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Baidu ERNIE-4.5-424B Q3 (~180 GB); InternVL3.5-241B-A28B Q4 (~135 GB); Qwen3.5-397B Q3 (~170 GB); MiniMax-M1 Q3 (~180 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMeta Llama:\u003c\/strong\u003e Llama 3.3 70B bf16 with massive KV (~20 tok\/s single-stream Q4, ~179 tok\/s batch-32 vLLM — Kentino measured on 4-GPU bench); Llama 4 Scout bf16 (~218 GB tight); Llama 4 Maverick Q3 (~188 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral:\u003c\/strong\u003e Mistral Large 2 \/ Pixtral Large 123B Q6 comfortable or bf16 (~248 GB spill); Mistral Small 3 multi-stream\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOpenAI (open weights):\u003c\/strong\u003e gpt-oss-120b MXFP4 native (80 GB) with huge KV\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNVIDIA Nemotron:\u003c\/strong\u003e Llama-3.1-Nemotron Ultra 253B Q4 (~147 GB); Super 49B bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Cohere Command R+ 104B Q6 (~85 GB); Google Gemma 3 27B bf16 x multiple streams\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eInternVL3.5-241B-A28B Q4 (~135 GB); Qwen3-VL-235B-A22B Q4; Qwen3-VL-32B bf16 multi-stream; Llama 3.2 90B Vision bf16 (~180 GB); Pixtral Large 124B Q6; Molmo 72B bf16; GLM-4.6V 106B fp8\/Q6; Gemma 3 27B multimodal x multiple streams.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp 
style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] bf16 — up to 8 concurrent generation streams (one per card, ~15-25 s\/image at fp8); FLUX.1 Kontext [dev]; FLUX Tools; SD 3.5 Large bf16 x 8; HunyuanImage-2.1 bf16 (~34 GB) x 2-4 concurrent; HunyuanImage-3.0 base (80B MoE, 13B active) bf16; HunyuanDiT; Kolors \/ Kolors 2.0; AuraFlow; OmniGen v1; PixArt-Sigma.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 MoE dual-expert bf16 with full ctx — multiple concurrent streams; Wan 2.2 TI2V-5B x 8 concurrent; HunyuanVideo 13B bf16 both experts; HunyuanVideo 1.5; CogVideoX-5B bf16; Open-Sora 2.0 11B bf16; Genmo Mochi-1 bf16; LTX-Video x 8 concurrent; Pyramid Flow; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo x 8 concurrent (~50x realtime per stream); Parakeet-TDT; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2\/3; Kokoro 82M; XTTS v2; Stable Audio Open\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi 7B x 8 concurrent voice streams; Step-Audio 2 mini\/R1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX:\u003c\/strong\u003e MusicGen \/ AudioGen \/ Bark; SeamlessM4T v2\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e8-way tensor-parallel inference of 200-250B MoE at Q4 
(Qwen3-235B, GLM-4.5\/4.6\/4.7)\u003c\/li\u003e\n\u003cli\u003eTenant-isolated 8-stream serving — one Q4 model per 24 GB card (e.g. 8x Qwen3-14B agents)\u003c\/li\u003e\n\u003cli\u003eLarge-batch 70B — tensor-parallel vLLM \/ SGLang batch-64 aggregate\u003c\/li\u003e\n\u003cli\u003eMixed fleet: 235B MoE on 4 cards (TP4) + FLUX + video + realtime voice on the remaining 4\u003c\/li\u003e\n\u003cli\u003eFine-tuning lab — 7-34B LoRA \/ QLoRA with large batch\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e8-GPU tensor-parallel inference across the 192 GB pool — Qwen3-235B Q4, GLM-4.5\/4.6\/4.7 Q4, Llama 4 Scout bf16\u003c\/li\u003e\n\u003cli\u003eDense 70B bf16 (Llama 3.3 70B) with massive KV headroom for long ctx and high batch\u003c\/li\u003e\n\u003cli\u003eHigh-throughput batch inference gateway — vLLM \/ SGLang tensor-parallel at large batch\u003c\/li\u003e\n\u003cli\u003eFine-tuning of 7-34B class models with high-batch LoRA \/ QLoRA\u003c\/li\u003e\n\u003cli\u003eWan 2.2 dual-expert \/ HunyuanImage-3.0 \/ FLUX.1 full workflow video-image studio\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eMeasured performance\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003eKentino bench (4-GPU reference) | 2026-04-10 | 4x RTX 4090 + EPYC 7542 + 512 GB DDR4 + ROMED8-2T\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth
style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eSustained compute (fp16, 4-card ref)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e647 TFLOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — Llama 3.3 70B AWQ INT4 (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e8.0 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — Llama 3.3 70B AWQ INT4 (batch-32)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e179 tok\/s aggregate\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ellama.cpp — Llama 3.3 70B Q4_K_M (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e20.3 tok\/s decode\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003e8-GPU aggregate compute (extrapolation)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1 294 TFLOPS fp16 expected (near-linear)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003e235B Q4 tensor-parallel 8-way (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e15-25 tok\/s single-stream on 8x RTX 4090\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 
0 0 0;font-size:13px;color:#666\"\u003e4-card data measured on Kentino hardware. The 8-GPU compute figure is extrapolated from the 4-card result, and the 235B tensor-parallel figure is a published community reference. Kentino will publish first-party 8-GPU numbers after the first customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e5090-generation workloads (Blackwell fp8 native + higher TOPS) — see K-AI 256 TurinDual 5090\u003c\/li\u003e\n\u003cli\u003eTraining from scratch (no NVLink on consumer RTX 4090)\u003c\/li\u003e\n\u003cli\u003eECC-sensitive 24\/7 production — consumer RTX 4090 has no ECC; prefer 4x L40 or 2x RTX Pro 6000 Server Edition\u003c\/li\u003e\n\u003cli\u003eHunyuan \/ DeepSeek fp8 native — RTX 4090 is Ada, fp8 checkpoints upcast to bf16\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv
style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS config with dual-socket NUMA tuning, driver install, burn-in, memtest, full 8-GPU stress test, and LLM environment setup. Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e4 TB additional NVMe for weight staging and MoE offload workloads\u003c\/li\u003e\n\u003cli\u003eNVIDIA ConnectX-5 100 GbE for multi-node serving\u003c\/li\u003e\n\u003cli\u003eRAM upgrade to 1 TB (16x 64 GB) or 2 TB (32x 64 GB) — board supports 32 DIMM slots\u003c\/li\u003e\n\u003cli\u003eFull 24U rack cabinet + online UPS 5 kVA\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940202410312,"sku":null,"price":32280.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-192-rome-arcprob70-tbd-6-intel-arc-pro-b70-epyc-milan-pre-order","title":"K-AI 192 Rome ArcProB70 TBD — 6× Intel Arc Pro B70 — EPYC Milan (Pre-Order)","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 
100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.25);border:2px solid #fab400;border-radius:8px;padding:12px 20px;margin-bottom:20px;text-align:center\"\u003e\n\u003cp style=\"margin:0;font-size:16px;font-weight:800;color:#fab400;text-transform:uppercase;letter-spacing:2px\"\u003eIN PREPARATION\u003c\/p\u003e\n\u003cp style=\"margin:4px 0 0 0;font-size:13px;color:#ccc\"\u003ePre-order — Intel Arc Pro B70 shipping target Q3 2026\u003c\/p\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 192 Rome ArcProB70 TBD\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e192 GB VRAM Intel Xe2 Inference Server\u003cbr\u003e6x Arc Pro B70 | EPYC Milan | TOPS TBD\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eTBD\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eINT8 TOPS\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e192 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eVRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eIntel\u003c\/div\u003e\n\u003cdiv 
style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eXe2 Battlemage\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e6-card\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eOpenVINO \/ SYCL\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eBudget-oriented high-VRAM build targeting the Intel open-source inference stack. Pricing locked at Intel availability.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 4U rack-mount inference server with six Intel Arc Pro B70 Creator cards (32 GB Xe2-HPG \"Battlemage\" each, 192 GB aggregate), one AMD EPYC 7643 Milan CPU (48C\/96T), 384 GB DDR4 ECC, 2 TB NVMe boot, and a 2 kW ATX PSU (dual-PSU upgrade strongly recommended). Built for the Intel software ecosystem: OpenVINO 2025+, IPEX-LLM, llama.cpp SYCL backend, and vLLM-Intel forks. 
CUDA-only workloads do not run on this hardware.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e6x Intel Arc Pro B70 Creator 32 GB (Xe2-HPG \"Battlemage\", 250 W, PCIe 5.0 x16, dual-slot)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e192 GB aggregate across 6 cards (no inter-card fabric — peer traffic over PCIe)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e384 GB DDR4-2666 ECC RDIMM (6x 64 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 
16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e1x 2 kW ATX PSU (dual 2 kW synced upgrade strongly recommended)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount (6-slot layout)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP3 tower cooler (Arctic Freezer 4U-M) + front-to-back directed airflow (industrial fans)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 6 x 250 W = 1 500 W (Intel-published TDP)\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~1 825 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 2 000 W (single) — only 8.75 % headroom\u003c\/li\u003e\n\u003cli\u003eDual 2 kW synced strongly recommended: raises headroom to ~54 % (1 825 W load on 4 000 W capacity)\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv 
style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eROMED8-2T provides 7x PCIe 4.0 x16 slots. Six are populated with GPUs; one stays free for an optional NIC. Arc Pro B70 is PCIe Gen5 native, but the ROMED8-2T links at Gen4; the bandwidth impact is negligible for inference at 32 GB per card. No PCIe switch. No Xe-Link equivalent.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eAll compatibility claims are Intel-software-stack paths (OpenVINO, IPEX-LLM, llama.cpp SYCL, vLLM-Intel). CUDA-only workloads do not run on this hardware. 
All figures cite published external sources and are subject to independent verification when cards ship.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3 \/ Qwen3.5 (Alibaba):\u003c\/strong\u003e Qwen3-235B-A22B Q4 (~132 GB) with long context headroom; Qwen3-Coder-480B-A35B Q2 (~160 GB); Qwen3.5-397B-A17B Q3 (~170 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM \/ Z.ai:\u003c\/strong\u003e GLM-4.5 \/ 4.6 \/ 4.7 Q4 (~177 GB) — fits with moderate KV\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTencent Hunyuan:\u003c\/strong\u003e Hunyuan-Large Q3 (~160 GB); Hunyuan-A13B fp8 (~80 GB) if Xe2 fp8 path exposed in driver\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Baidu ERNIE-4.5-424B Q3 (~180 GB); MiniMax-M1 Q3 (~180 GB); DeepSeek-R2 32B (6x concurrent streams)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMeta Llama:\u003c\/strong\u003e Llama 3.3 70B Q6-Q8 with generous KV; Llama 4 Scout 109B\/17B Q4 (~63 GB) comfortable\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral:\u003c\/strong\u003e Mistral Small 3 \/ Magistral Small \/ Devstral Small 2 (24B) at bf16; Pixtral Large Q4-Q6\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOpenAI (open weights):\u003c\/strong\u003e gpt-oss-120b MXFP4 native (~80 GB) — if MXFP4 dequant available in Intel 
stack\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNVIDIA Nemotron:\u003c\/strong\u003e Llama-3.1-Nemotron Ultra 253B Q4 (~120 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Gemma 3 27B bf16 multimodal; Phi-4 \/ Phi-4-reasoning 14B; Cohere Command R+ 104B Q4\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-8B \/ 32B; Qwen3-VL-30B-A3B MoE; InternVL3 up to 78B; InternVL3.5-38B; Llama 3.2 90B Vision Q4; Pixtral 12B; Molmo 72B Q4; Gemma 3 12B\/27B multimodal; MiniCPM-V 2.6 \/ MiniCPM-o 2.6. Intel's OpenVINO has strong vision-tower support — VLM is a plausible day-one strength.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] \/ [schnell] fp8 or Q4 GGUF via llama.cpp SYCL; SDXL \/ SD 3.5 Large via OpenVINO genAI runtime; HunyuanDiT; HunyuanImage-2.1 bf16 (~34 GB); Kolors 2.0; AuraFlow; OmniGen; PixArt-Sigma.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B \/ I2V-A14B MoE (~54 GB bf16); Wan 2.2 TI2V-5B; HunyuanVideo 13B bf16; HunyuanVideo 1.5; CogVideoX-5B; Open-Sora 2.0; LTX-Video; Pyramid Flow; Mochi-1 Q4. 
Video is the weakest Intel path today; expect it to be functional but not throughput-optimal at ship time.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo via OpenVINO (first-class Intel Whisper support); Parakeet-TDT; Canary; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2\/3; Kokoro 82M; Stable Audio Open; XTTS v2; StyleTTS 2; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi; MusicGen \/ AudioGen \/ Bark; SeamlessM4T v2\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e6 concurrent streams of a 32B Q4 model (one per 32 GB card) — e.g. 
6x Qwen3-32B Q4 agents\u003c\/li\u003e\n\u003cli\u003eEmbedding-fleet at scale — 6x parallel BGE-M3 \/ E5 \/ Nomic Embed streams (OpenVINO-optimized)\u003c\/li\u003e\n\u003cli\u003eMixed residency — 70B Q4 (tensor-parallel over 3 cards) + FLUX.1 (1 card) + Whisper-turbo (1 card) + Moshi (1 card)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eIntel-software evaluation pilot for CUDA-alternative LLM serving\u003c\/li\u003e\n\u003cli\u003eEmbedding \/ reranker backend where VRAM-per-EUR matters more than raw throughput\u003c\/li\u003e\n\u003cli\u003eBudget Q4 frontier-MoE inference (Qwen3-235B, GLM-4.5\/4.6\/4.7) for small internal dev teams\u003c\/li\u003e\n\u003cli\u003eOpenVINO-native model deployment alongside existing Intel Xeon \/ Arc Pro pipelines\u003c\/li\u003e\n\u003cli\u003eVLM \/ OCR \/ document-processing backend (Intel's OpenVINO strength)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished specifications\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003eIntel-published specs | Subject to independent verification when cards ship\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 
12px;text-align:left;color:#888;font-weight:600\"\u003eValue\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eVRAM per card\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e32 GB GDDR6\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eMemory bandwidth class\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~450 GB\/s per card\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eXe Matrix Extensions (XMX)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003eAccelerated via OpenVINO \/ IPEX-LLM\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003efp8 path\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003eXe2 silicon — verify driver exposure at ship time\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003eNo Kentino measured data. Intel-published specs subject to independent verification. 
Kentino will publish first-party tok\/s \/ QPS \/ bandwidth numbers once the first unit passes burn-in.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eCUDA-native workloads — no CUDA on Intel, expect migration friction\u003c\/li\u003e\n\u003cli\u003eProduction SLA-critical deployments until Intel Arc Pro supply and tooling stabilize\u003c\/li\u003e\n\u003cli\u003eFrontier 600B+ MoE at Q4+ (requires 6x RTX Pro 6000 \/ 576 GB pool)\u003c\/li\u003e\n\u003cli\u003eTraining workloads — Arc Pro is inference-first, framework maturity for distributed training is limited\u003c\/li\u003e\n\u003cli\u003eCustomers who require measured benchmarks before purchase — this SKU is pre-order\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv 
style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003eQ3 2026\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003etarget shipping\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eKentino standard warranty (2 years parts, 1 year labor); Intel distribution terms supersede where stricter. Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Reserve your first-wave delivery slot via the Kentino contact form. 30-day price-commit window at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eDual 2 kW synced PSU upgrade (single-PSU headroom is tight at 1 825 W draw — strongly recommended)\u003c\/li\u003e\n\u003cli\u003eUpgrade RAM to 512 GB DDR4 (2x 64 GB — two slots open)\u003c\/li\u003e\n\u003cli\u003e4 TB NVMe secondary drive for model library\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940209783112,"sku":null,"price":20793.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-256-turindual-5090-8-rtx-5090-dual-socket-zen5c-flagship-request-quote-on-cpu","title":"K-AI 256 TurinDual 5090 — 8× RTX 5090 Dual-Socket Zen5c Flagship (Request Quote on CPU)","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv 
style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 256 TurinDual 5090 13408TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e256 GB VRAM Flagship Inference Server\u003cbr\u003e8x RTX 5090 | Dual EPYC Turin | 13 408 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e13 408\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e256 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eVRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003efp8\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eBlackwell native\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv 
style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eGen5\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003ePCIe end-to-end\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:16px;font-size:13px;color:#777\"\u003eCPU pricing finalized at order — Turin 9005-series market moves weekly in Q2 2026.\u003c\/p\u003e\n\u003cp style=\"margin-top:12px;font-size:15px;color:#aaa\"\u003ePublished external references. Not measured on Kentino hardware.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 7U rack-mount flagship inference server with eight GeForce RTX 5090 (32 GB GDDR7, Blackwell, fp8 native) on a dual-socket EPYC Turin (Zen5c, SP5) platform with 768 GB DDR5-4800 ECC across all 12 channels, 2 TB NVMe boot, and 5x 1200 W server PSU. End-to-end PCIe Gen5 at the GPU via active retimer\/redriver risers. Runs vLLM, SGLang, llama.cpp, ComfyUI and every major open-weight inference stack out of the box.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e8x NVIDIA GeForce RTX 5090 32 GB GDDR7 (Blackwell, 575 W TGP, PCIe 5.0 x16, fp8 native, 1676 INT8 TOPS\/card)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 
16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e256 GB aggregate across 8 cards (no NVLink on consumer RTX 5090)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x AMD EPYC Turin 9005-series (Zen5c, SP5, PCIe 5.0) — quote-pending at order\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack TURIN2D24XGM\/500W (dual SP5, PCIe 5.0, 24x DDR5 DIMM)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e768 GB DDR5-4800 ECC RDIMM (12x 64 GB — all 12 channels populated; 12 slots remain for scale to 1.5 TB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e5x 1200 W server PSU set (HP-compatible, 6 kW aggregate)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e7U 8-GPU (up to 10 PCIe slots, separate PSU bays)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x SP5 tower coolers + rack-mount front-to-back airflow (industrial 
fans)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eRisers\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e8x active PCIe Gen5 x16 (retimer\/redriver) — end-to-end Gen5\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard 10 GbE (board-dependent)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 8 x 575 W = 4 600 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~5 520 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 6 000 W (5x 1200 W) — 8% headroom at spec\u003c\/li\u003e\n\u003cli\u003eKentino ships with GPU power capped at 500 W per card, so the total drops to ~4 920 W (~18% headroom)\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eDual Turin exposes up to 160 PCIe Gen5 lanes host-side in a two-socket configuration (some of each CPU's 128 lanes carry the socket-to-socket links). Active Gen5 risers carry Gen5 x16 end-to-end at each GPU, so no PCIe switch is required (one CPU per 4-card bank). 
No NVLink; inter-GPU P2P at Gen5 x16 (~60 GB\/s nominal per link).\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 256 GB of pooled VRAM across 8 Blackwell cards with fp8 native, this server targets frontier 235-480B MoE at Q4 with real context, DeepSeek V3 family at Q2, and Kimi-K2 1.58-bit dynamic-quant at real throughput.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-235B-A22B\u003c\/strong\u003e (Instruct \/ Thinking \/ \"2507\") Q4 (~132 GB) with long context + multi-user batching (~25-40 tok\/s single-stream on 8x RTX 5090, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM-4.5 \/ 4.6 \/ 4.7\u003c\/strong\u003e Q4 (~177 GB) — flagship reasoning\/coding, 200k ctx on 4.6+\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM-5 \/ GLM-5.1\u003c\/strong\u003e Q2 (~260 GB) with minor RAM spill — frontier coding close to Claude Opus 4.6\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek V3 \/ R1 \/ V3.1 \/ V3.2 \/ V3.2-Speciale\u003c\/strong\u003e Q2 (~215 GB) at useful inference speed (~28 tok\/s single-stream on 8x Blackwell, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eKimi-K2\u003c\/strong\u003e 1.58-bit UD-TQ1_0 (~240 GB) — 
trillion-parameter agent at real token throughput (~7-10 tok\/s single-stream, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHunyuan-Large\u003c\/strong\u003e 389B\/52B MoE Q4 (~220 GB); \u003cstrong\u003eERNIE-4.5-424B-A47B\u003c\/strong\u003e Q4 (~240 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-Coder-480B-A35B\u003c\/strong\u003e Q4 (~270 GB tight with RAM spill) — SOTA open coding flagship\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMiniMax-M1 \/ Text-01\u003c\/strong\u003e Q4 (~260 GB) 1M context; \u003cstrong\u003eQwen3.5-397B-A17B\u003c\/strong\u003e Q4 (~214 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Large 3\u003c\/strong\u003e (675B\/41B MoE, Apache 2.0) Q3 (~317 GB with spill) — Western frontier open weights\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 4 Maverick\u003c\/strong\u003e (400B\/17B, 128 experts) Q4 (~232 GB) multimodal\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama-3.1-Nemotron Ultra 253B\u003c\/strong\u003e Q4 (~119 GB) — matches DeepSeek-R1 at half size\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003egpt-oss-120b\u003c\/strong\u003e MXFP4 native (80 GB) comfortably with room for multiple models\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDevstral 2\u003c\/strong\u003e 123B (Modified MIT) Q6 — top open coding, 256k ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 3.3 70B\u003c\/strong\u003e bf16 (~142 GB) multi-tenant serving (~30-40 tok\/s single-stream per RTX 5090 pair TP2, published reference)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp 
style=\"font-size:15px;color:#333\"\u003eQwen3-VL-235B-A22B full bf16 (~240 GB on-card); InternVL3.5-241B-A28B (~135 GB Q4); Llama 3.2 90B Vision bf16; Pixtral Large 124B bf16 (~248 GB tight); Qwen3-Omni-30B-A3B; Molmo 72B; ERNIE-4.5-VL; GLM-4.6V full. Blackwell fp8 path gives ~2x throughput on vision-tower inference vs Ada.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] \/ Kontext \/ Tools full bf16 (~10-18 s\/image at fp8 per card, published reference); SD 3.5 Large; HunyuanImage-2.1 (17B, native 2K); HunyuanImage-3.0 80B\/13B MoE; AuraFlow; OmniGen; multi-worker ComfyUI farms.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B \/ I2V-A14B dual expert bf16 (both high-noise + low-noise resident simultaneously); HunyuanVideo 13B bf16 both experts; Open-Sora 2.0 (11B) bf16; CogVideoX-5B; Mochi-1; LTX-Video; Pyramid Flow; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos Predict 2.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo (~50x realtime); Parakeet-TDT 1.1B; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2 \/ 3; Kokoro; Stable Audio Open; XTTS v2; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi; Step-Audio 2 mini \/ R1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX:\u003c\/strong\u003e MusicGen; AudioGen; Bark; SeamlessM4T 
v2\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eFrontier-inference gateway — 200B+ MoE + concurrent 70B + image + video all resident\u003c\/li\u003e\n\u003cli\u003e8-way tensor-parallel for Kimi-K2 \/ DeepSeek V3 at real context\u003c\/li\u003e\n\u003cli\u003eMulti-tenant LLM API — 50-100 concurrent users on 235B Q4 via vLLM\/SGLang\u003c\/li\u003e\n\u003cli\u003eFull Chinese + Western frontier residency concurrently for evaluation \/ benchmarking\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eFrontier open-weight inference backend for a 100-500 seat org, mixing Qwen3-235B, GLM-4.5+, and DeepSeek V3 Q2\u003c\/li\u003e\n\u003cli\u003eKimi-K2 1.58-bit agent platform at production throughput (tool-use, 200+ sequential calls)\u003c\/li\u003e\n\u003cli\u003eFull-fp8 DeepSeek V3 \/ R1 serving on Blackwell silicon\u003c\/li\u003e\n\u003cli\u003eMulti-node training head with Gen5 100 GbE \/ InfiniBand fabric\u003c\/li\u003e\n\u003cli\u003eDual-role inference + diffusion farm (Qwen3-235B + FLUX.1 + HunyuanVideo 13B concurrently)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003eExternal references | Not measured on Kentino 
hardware\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eRTX 5090 per-card INT8 TOPS\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e1 676 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eRTX 5090 memory bandwidth\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1 800 GB\/s per card\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — Qwen3-235B Q4_K_M on 4x RTX 5090 (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~90 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — Qwen3-235B Q4_K_M on 4x RTX 5090 (batch-32)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~450 tok\/s aggregate\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eSGLang — DeepSeek V3 Q2 on 8x Blackwell (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~28 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ellama.cpp — Kimi-K2 UD-TQ1_0 on 8x Blackwell 256 GB\u003c\/td\u003e\n\u003ctd 
style=\"padding:10px 12px;color:#fff\"\u003e~7-10 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003eKentino will publish first-party tok\/s after the first customer build with the final Turin SKU.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eBudget-conscious deployments (Turin premium vs Genoa or Rome alternatives)\u003c\/li\u003e\n\u003cli\u003eSingle-tenant 70B dense workloads (overkill — 4x RTX 5090 or 4x RTX Pro 6000 is the right tier)\u003c\/li\u003e\n\u003cli\u003eFrontier 600B+ at Q4+ full context (requires a 576 GB+ pool; see 6x RTX Pro 6000)\u003c\/li\u003e\n\u003cli\u003eSustained training from scratch (no NVLink on consumer RTX 5090)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv 
style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability and is confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eScale RAM to 1.5 TB DDR5 (24x 64 GB full population) — required for Kimi-K2 Q4 or DeepSeek V3 Q3 without RAM spill\u003c\/li\u003e\n\u003cli\u003eNVIDIA ConnectX-5 100 GbE MCX555A-ECAT (PCIe 3.0 x16) for cluster-node fabric\u003c\/li\u003e\n\u003cli\u003eMellanox ConnectX-6 25 GbE SFP28 for datacenter fabric\u003c\/li\u003e\n\u003cli\u003e4 TB NVMe Gen4 x4 for boot + model library\u003c\/li\u003e\n\u003cli\u003eFull 24U rack cabinet with managed PDU\u003c\/li\u003e\n\u003cli\u003eOnline UPS 8-10 kVA (critical — 5.5 kW peak draw)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e\n","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940216631624,"sku":null,"price":0.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-288-rome-l40-6-nvidia-l40-passive-enterprise-288-gb-ecc-vram","title":"K-AI 288 Rome L40 — 6× NVIDIA L40 Passive Enterprise (288 GB ECC VRAM)","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe 
UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 288 Rome L40 2172TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e288 GB ECC VRAM Enterprise Server\u003cbr\u003e6x NVIDIA L40 Passive | EPYC Milan | 2 172 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e2 172\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e288 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eECC VRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eECC\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eend-to-end\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv 
style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e24\/7\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eproduction-rated\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003ePublished external references. Not measured on Kentino hardware.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 4U rack-mount enterprise inference server with six NVIDIA L40 Ada Lovelace passive datacenter cards (48 GB ECC each) pooled to 288 GB ECC VRAM, one AMD EPYC 7643 Milan CPU (48C\/96T), 384 GB DDR4-2666 ECC, 2 TB NVMe boot, and dual synchronized 2.5 kW ATX PSU. ECC end-to-end, purpose-built for 24\/7 enterprise production where bit-level integrity and serviceable failure domains matter.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e6x NVIDIA L40 48 GB ECC (Ada Lovelace, passive datacenter, 300 W, PCIe 4.0 x16, dual-slot, 362 INT8 TOPS\/card)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e288 GB aggregate ECC across 6 cards (no NVLink on L40 PCIe 
SKU)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e384 GB DDR4-2666 ECC RDIMM (6x 64 GB — 2 DIMM slots open for upgrade to 512 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x 2.5 kW ATX with dual-PSU sync cable (5 kW aggregate)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount (6-slot layout)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP3 tower cooler (Arctic Freezer 4U-M class) + front-to-back directed airflow (industrial fans)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel 
X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 6 x 300 W = 1 800 W\u003c\/li\u003e\n\u003cli\u003eSystem total under full load: ~2 175 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 5 000 W (dual 2.5 kW synced) — 56.5% headroom\u003c\/li\u003e\n\u003cli\u003eDual PSU for split power delivery — single PSU failure = loss of 2 GPUs or 2 GPUs + motherboard\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eROMED8-2T exposes 7x PCIe 4.0 x16 direct from EPYC Milan. Six slots populated with passive Gen4 x16 risers — one free slot for NIC \/ storage. No PCIe switch required. L40 native link is PCIe 4.0 x16 — no bandwidth loss. No NVLink; inter-GPU traffic runs PCIe peer-to-peer.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 288 GB of pooled ECC VRAM across 6 passive L40 cards, this server handles frontier open-weight LLMs at Q4, multi-model concurrent serving, video\/media pipelines, and 24\/7 enterprise production inference. 
Note: L40 is Ada Lovelace, not Blackwell — fp8 upcasts to bf16. Use GGUF Q4\/Q5 or AWQ\/GPTQ int4 for maximum VRAM efficiency.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-235B-A22B\u003c\/strong\u003e Q4 (~132 GB) with very long context + generous KV budget (~15-20 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM-4.5 \/ 4.6 \/ 4.7\u003c\/strong\u003e Q4 (~177 GB) comfortable on 6-way TP (~12-18 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHunyuan-Large\u003c\/strong\u003e 389B\/52B Q3 (~160 GB); \u003cstrong\u003eERNIE-4.5-424B-A47B\u003c\/strong\u003e Q3 (~180 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-Coder-480B-A35B\u003c\/strong\u003e Q2 (~160 GB) flagship coding agent\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMiniMax-M1 \/ Text-01\u003c\/strong\u003e Q3 (~180 GB) 1M-ctx Lightning Attention\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-30B-A3B \/ QwQ-32B \/ Qwen3-32B\u003c\/strong\u003e — single-card with 6 parallel streams\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek-R2\u003c\/strong\u003e 32B sparse MoE — single card per stream, 6 concurrent sessions\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 3.3 70B\u003c\/strong\u003e bf16 (~142 GB) multi-tenant serving (~17 
tok\/s single, published reference), or Q4 (~43 GB) with 6 concurrent copies\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 4 Scout\u003c\/strong\u003e 109B\/17B bf16 (~218 GB tight) or Q4 (~63 GB) comfortable\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Small 3 \/ Magistral \/ Devstral Small\u003c\/strong\u003e (24B) bf16 (~40-50 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003ePixtral Large \/ Mistral Large 2\u003c\/strong\u003e Q6-Q8 (~90-140 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama-3.1-Nemotron Ultra 253B\u003c\/strong\u003e Q4 (~119 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003egpt-oss-120b\u003c\/strong\u003e MXFP4 (~80 GB via GGUF on Ada — note Ada upcast caveat)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eCohere Command R+\u003c\/strong\u003e 104B Q4 RAG stack\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-235B-A22B Q4; Qwen3-VL-32B; InternVL3.5-78B \/ 241B-A28B Q4 (~135 GB); Llama 3.2 90B Vision bf16 (~180 GB); Pixtral 12B; Molmo 72B; Gemma 3 12B\/27B multimodal; GLM-4.6V full (106B bf16); MiniCPM-o 2.6. 
L40's NVENC\/NVDEC is particularly useful for high-throughput VLM document \/ video pipelines.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] \/ Kontext \/ Tools across multiple workers concurrently (~3.5 s per 1024x1024 image on single L40 fp8, published reference) — 6x ComfyUI worker farm possible; SD 3.5 Large; HunyuanImage-2.1 (17B) bf16; HunyuanDiT; Kolors 2.0; AuraFlow; OmniGen.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B \/ I2V-A14B dual-expert bf16 (~54 GB, ~20-30 s per 4s clip at 720p, published reference); HunyuanVideo 13B bf16 both experts; Open-Sora 2.0 bf16; CogVideoX-5B; Mochi-1; LTX-Video; Pyramid Flow; NVIDIA Cosmos Predict 2. L40's hardware NVENC\/NVDEC handles caption \/ moderation \/ transcode at scale alongside generation.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo; Parakeet-TDT 1.1B; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2\/3; Kokoro 82M; Stable Audio Open; XTTS v2; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi; Step-Audio 2 mini \/ R1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eMulti-model 
residency — Qwen3-235B Q4 + FLUX.1 + HunyuanVideo + Whisper-turbo + Moshi + embedder, all resident\u003c\/li\u003e\n\u003cli\u003e6 concurrent 48 GB-class workloads (one per card): 6x Qwen3-VL-32B, or 6x FLUX.1 workers, or 6x ASR streams\u003c\/li\u003e\n\u003cli\u003e6-way tensor-parallel for 200B+ MoE at Q4 with real context\u003c\/li\u003e\n\u003cli\u003eRAG pipelines — Command R+ \/ Qwen3 + reranker + embedder + image analysis on same host\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e24\/7 production LLM inference backend — 100+ concurrent users on 200B+ MoE at Q4, ECC-protected\u003c\/li\u003e\n\u003cli\u003eMedia-AI pipeline at enterprise scale — caption + moderation + thumbnail + transcode on 6 parallel streams via NVENC\/NVDEC\u003c\/li\u003e\n\u003cli\u003eMulti-tenant SaaS where per-tenant isolation across physical cards matters\u003c\/li\u003e\n\u003cli\u003eRAG backend with Command R+ reader + reranker + embedder + vision fully resident\u003c\/li\u003e\n\u003cli\u003eReliability-first pair replacing the 12x L40 Legacy — two K-AI 288 servers = 576 GB aggregate with independent failure domains\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003eExternal references | Not measured on Kentino hardware\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr 
style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eL40 per-card INT8 TOPS\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e362 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eL40 memory bandwidth\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e864 GB\/s per card\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — Llama 3.3 70B AWQ INT4 on 2x L40 TP (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~25-35 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — Llama 3.3 70B AWQ INT4 on 2x L40 TP (batch-16)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~150-200 tok\/s aggregate\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ellama.cpp — GLM-4.6 Q4 on 6x L40 (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~12-18 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eFLUX.1 [dev] on single L40 fp8\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~3.5 s per 1024x1024 image\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 
0 0 0;font-size:13px;color:#666\"\u003eKentino will publish first-party numbers after the initial customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003efp8-native inference at full speed — Ada upcasts to bf16; use GGUF Q4\/Q5 or AWQ\/GPTQ int4 instead. For fp8 native see K-AI 384 Rome RTXPro6000 (Blackwell)\u003c\/li\u003e\n\u003cli\u003eTraining large models from scratch (no NVLink)\u003c\/li\u003e\n\u003cli\u003eBudget single-user inference — 4x L4 or 4x RTX 5080 is materially cheaper for small workloads\u003c\/li\u003e\n\u003cli\u003eFrontier 600B+ dense at Q4+ (requires a 576 GB+ pool; see 6x RTX Pro 6000)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e3 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eNVIDIA OEM GPU warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv 
style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS configuration, driver install, burn-in, memtest, and functional verification. Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eUpgrade RAM to 512 GB DDR4 (add 2x 64 GB — 2 DIMM slots open) for heavier KV budget\u003c\/li\u003e\n\u003cli\u003e4 TB NVMe Gen4 x4 for model library staging\u003c\/li\u003e\n\u003cli\u003eFull 24U rack cabinet with managed PDU + online UPS (critical for 24\/7 ECC workloads)\u003c\/li\u003e\n\u003cli\u003ePaired second K-AI 288 unit — replaces the 12x L40 Legacy envelope with two independent failure domains\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e\n","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940296782152,"sku":null,"price":59490.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-384-rome-rtxpro6000-4-rtx-pro-6000-blackwell-server-edition-384-gb-ecc-vram","title":"K-AI 384 
Rome RTXPro6000 — 4× RTX Pro 6000 Blackwell Server Edition (384 GB ECC VRAM)","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 384 Rome RTXPro6000 8000TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e384 GB ECC VRAM Datacenter Server\u003cbr\u003e4x RTX Pro 6000 Server Edition | EPYC Milan | 8 000 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e8 000\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e384 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eECC VRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003efp8\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eBlackwell native\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv 
style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003ePassive\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003edatacenter cooling\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003ePublished external references. Not measured on Kentino hardware.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 4U rack-mount inference server with four NVIDIA RTX Pro 6000 Blackwell Server Edition passive datacenter cards (96 GB ECC each) pooled to 384 GB ECC VRAM, one AMD EPYC 7643 Milan CPU (48C\/96T), 384 GB DDR4-2666 ECC, 2 TB NVMe boot, and dual synchronized 2.5 kW ATX PSU. Blackwell silicon with fp8 native acceleration. Passive airflow-directed cooling for datacenter chassis. 
Runs DeepSeek V3 Q3, Mistral Large 3, Qwen3-Coder-480B, and every major frontier open-weight model.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4x NVIDIA RTX Pro 6000 Blackwell Server Edition 96 GB ECC (passive datacenter cooler, 600 W TGP, PCIe 5.0 x16, 2000 INT8 TOPS\/card, fp8 native)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e384 GB aggregate ECC across 4 cards\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e384 GB DDR4-2666 ECC RDIMM (6x 64 GB — 2 DIMM slots open for upgrade to 512 
GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x 2.5 kW ATX with dual-PSU sync cable (5 kW aggregate)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP3 tower cooler (Arctic Freezer 4U-M class) + front-to-back directed airflow (3x 120 mm front intake + 1x 120 mm rear exhaust). 
Passive GPU cards require datacenter chassis airflow.\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 4 x 600 W = 2 400 W\u003c\/li\u003e\n\u003cli\u003eSystem total under full load: ~2 775 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 5 000 W (dual 2.5 kW synced), 44.5% headroom\u003c\/li\u003e\n\u003cli\u003eThe two PSUs split power delivery and are not redundant: a single PSU failure takes down 2 GPUs, or 2 GPUs plus the motherboard\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eROMED8-2T exposes 7x PCIe 4.0 x16 direct from EPYC Milan. Four slots are populated, leaving three free for NIC \/ storage \/ telemetry. RTX Pro 6000 is Gen5-capable silicon; on this platform it runs at Gen4 with a full x16 link, which is not a bandwidth bottleneck for inference. No PCIe switch.
No NVLink.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 384 GB of pooled ECC VRAM on Blackwell fp8 native silicon, this server runs DeepSeek V3 \/ R1 at Q3 comfortably on-card, Mistral Large 3 Q3, GLM-5 Q3, Qwen3-Coder-480B Q3, and Llama 3.3 70B bf16 resident on a single card (96 GB\/card).\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek V3 \/ V3-0324 \/ V3.1 \/ V3.2 \/ R1 \/ R1-0528\u003c\/strong\u003e Q3 (~290 GB) comfortably on-card (~30-40 tok\/s single, published reference); fp8 native (~670 GB) with RAM spill\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-Coder-480B-A35B\u003c\/strong\u003e Q3 (~350 GB tight with RAM spill) — SOTA open coding agent (~18-25 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-235B-A22B\u003c\/strong\u003e Q6\/Q8 (~200-280 GB) with very long ctx and multi-user batching\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM-5 \/ GLM-5.1\u003c\/strong\u003e Q3 (~317 GB) — Chinese frontier, close to Claude Opus 4.6 on coding\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eKimi-K2\u003c\/strong\u003e 1.58-bit UD (~240 GB) — trillion-param agent at real 
throughput\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHunyuan-Large\u003c\/strong\u003e 389B\/52B Q4 (~220 GB), fp8 native (~390 GB spill)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eERNIE-4.5-424B-A47B\u003c\/strong\u003e Q4 (~240 GB); \u003cstrong\u003eMiniMax-M1\u003c\/strong\u003e Q4 (~260 GB) 1M-ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 3.3 70B\u003c\/strong\u003e bf16 resident on a single card (96 GB\/card — no tensor-parallel needed)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Large 3\u003c\/strong\u003e (675B\/41B MoE, Apache 2.0) Q3 (~317 GB) — frontier Western open weights (~20-30 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 4 Maverick\u003c\/strong\u003e (400B\/17B) Q4 (~232 GB) with generous KV budget (~45-55 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama-3.1-Nemotron Ultra 253B\u003c\/strong\u003e Q4-Q6 (~119-207 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003egpt-oss-120b\u003c\/strong\u003e MXFP4 native (80 GB) with massive concurrent fleet headroom\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003ePixtral Large \/ Mistral Large 2\u003c\/strong\u003e bf16 (~248 GB); \u003cstrong\u003eDevstral 2\u003c\/strong\u003e 123B bf16 — 256k top open coding\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 3.3 70B\u003c\/strong\u003e bf16 on a single card; 4x concurrent 70B deployments possible\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-235B-A22B bf16 (~240 GB); 
InternVL3.5-241B-A28B Q4 (~135 GB); Llama 3.2 90B Vision bf16; Pixtral Large 124B bf16 (~248 GB); Qwen3-Omni-30B-A3B; Molmo 72B; ERNIE-4.5-VL; GLM-4.6V 106B bf16 on TP. Blackwell fp8 delivers ~2x throughput on vision-tower inference vs Ada.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] \/ Kontext \/ Tools at fp8 native (~15-20 s per 1024x1024 image on single RTX Pro 6000, published reference); SD 3.5 Large; HunyuanImage-2.1 (17B native 2K); HunyuanImage-3.0 80B\/13B MoE; AuraFlow; OmniGen; 4x concurrent ComfyUI workers.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B \/ I2V-A14B dual expert bf16; HunyuanVideo 13B bf16 both experts; Open-Sora 2.0 (11B) bf16; CogVideoX-5B; Mochi-1; LTX-Video; Pyramid Flow; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos Predict 2.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo; Parakeet-TDT 1.1B; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2\/3; Kokoro; Stable Audio Open; XTTS v2; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi; Step-Audio 2 mini \/ R1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX:\u003c\/strong\u003e MusicGen \/ AudioGen \/ Bark \/ SeamlessM4T\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ 
multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eDeepSeek V3 Q3 + concurrent 70B + FLUX.1 + Whisper all resident\u003c\/li\u003e\n\u003cli\u003e4-way tensor-parallel on 350-400B class at Q4\u003c\/li\u003e\n\u003cli\u003ePer-card tenant isolation — one 96 GB Llama 3.3 70B bf16 per card, 4 independent inference silos\u003c\/li\u003e\n\u003cli\u003eMulti-model RAG: reader + reranker + vision + embedder all on one host\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eFrontier open-weight inference backend — DeepSeek V3 Q3, Qwen3-Coder-480B Q3, GLM-5 Q3\u003c\/li\u003e\n\u003cli\u003eProduction serving of Llama 4 Maverick Q4 multimodal agents with generous context budget\u003c\/li\u003e\n\u003cli\u003e4-tenant per-card isolation — one Llama 3.3 70B bf16 per tenant, zero cross-contamination\u003c\/li\u003e\n\u003cli\u003efp8-native DeepSeek \/ R1 \/ Hunyuan serving on Blackwell silicon\u003c\/li\u003e\n\u003cli\u003eMistral Large 3 Q3 as Western Apache-2.0 frontier open-weight alternative\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003eExternal references | Not measured on Kentino hardware\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth 
style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eRTX Pro 6000 per-card INT8 TOPS\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e2 000 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eRTX Pro 6000 memory bandwidth\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1 800 GB\/s per card\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — DeepSeek V3 Q3 on 4x Blackwell PCIe (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~30-40 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — DeepSeek V3 Q3 on 4x Blackwell PCIe (batch-8)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~200 tok\/s aggregate\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eSGLang — Llama 4 Maverick Q4 on 4x Blackwell (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~45-55 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ellama.cpp — Qwen3-Coder-480B Q3 on 4x Blackwell (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~18-25 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd 
style=\"padding:10px 12px;color:#ccc\"\u003eFLUX.1 [dev] fp8 on single RTX Pro 6000\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1.8 s per 1024x1024 image\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003eKentino will publish first-party numbers after initial customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eSingle-user workloads up to 70B — 4x RTX 5090 is materially cheaper for a 128 GB pool if ECC and passive reliability are not required\u003c\/li\u003e\n\u003cli\u003eSilent lab \/ office-adjacent deployment — passive cooler requires proper datacenter front-to-back airflow. For acoustic-sensitive sites choose the Max-Q turbofan variant (K-AI 384 Rome RTXPro6000MQ)\u003c\/li\u003e\n\u003cli\u003eFrontier training from scratch (no NVLink)\u003c\/li\u003e\n\u003cli\u003eFull DeepSeek V3 Q4 on-card (~404 GB) — upgrade to 6x RTX Pro 6000 \/ 576 GB\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e3 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eNVIDIA OEM GPU warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv 
style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS configuration, driver install, burn-in, memtest, and functional verification. 
Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eUpgrade RAM to 512 GB DDR4 (add 2x 64 GB — 2 DIMM slots open) for RAM-spill headroom on Q3 frontier quants\u003c\/li\u003e\n\u003cli\u003e4 TB NVMe Gen4 x4 for frontier-model library (DeepSeek V3 Q3 alone is ~290 GB on disk)\u003c\/li\u003e\n\u003cli\u003eFull 24U rack cabinet with managed PDU + online UPS\u003c\/li\u003e\n\u003cli\u003eAlternative silhouette: Max-Q turbofan variant (K-AI 384 Rome RTXPro6000MQ) — same silicon, quieter blower cooler, for lab deployments\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e\n","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940310217032,"sku":null,"price":46583.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-384-rome-rtxpro6000mq-4-rtx-pro-6000-blackwell-max-q-turbofan-384-gb-ecc-vram","title":"K-AI 384 Rome RTXPro6000MQ — 4× RTX Pro 6000 Blackwell Max-Q Turbofan (384 GB ECC VRAM)","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 384 Rome RTXPro6000MQ 8000TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e384 GB ECC VRAM Lab Server\u003cbr\u003e4x RTX Pro 6000 Max-Q Turbofan | EPYC Milan | 8 000 TOPS 
INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e8 000\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e384 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eECC VRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003efp8\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eBlackwell native\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eQuiet\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eturbofan cooling\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003ePublished external references. 
Not measured on Kentino hardware.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 4U rack-mount inference server with four NVIDIA RTX Pro 6000 Blackwell Max-Q turbofan (blower) cards (96 GB ECC each) pooled to 384 GB ECC VRAM, one AMD EPYC 7643 Milan CPU (48C\/96T), 384 GB DDR4-2666 ECC, 2 TB NVMe boot, and dual synchronized 2.5 kW ATX PSU. Same Blackwell silicon as the Server Edition — identical inference envelope, identical throughput — with a quieter blower cooler suited to lab, R\u0026amp;D, and office-adjacent environments.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4x NVIDIA RTX Pro 6000 Blackwell Max-Q 96 GB ECC (turbofan \/ blower cooler, 600 W TGP, PCIe 5.0 x16, 2000 INT8 TOPS\/card, fp8 native)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e384 GB aggregate ECC across 4 cards\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 7643 Milan (48C\/96T, 225 W, 128x PCIe 4.0 lanes)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd 
style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e384 GB DDR4-2666 ECC RDIMM (6x 64 GB — 2 DIMM slots open for upgrade to 512 GB)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2 TB NVMe M.2 (PCIe 4.0 x4)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x 2.5 kW ATX with dual-PSU sync cable (5 kW aggregate)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4U rack-mount\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP3 tower cooler (Arctic Freezer 4U-M class) + front-to-back directed airflow (3x 120 mm front intake + 1x 120 mm rear exhaust). 
GPU cards are self-cooled by a turbofan blower (rear exhaust), which is quieter for lab environments.\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 4 x 600 W = 2 400 W\u003c\/li\u003e\n\u003cli\u003eSystem total under full load: ~2 775 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 5 000 W (dual 2.5 kW synced), 44.5% headroom\u003c\/li\u003e\n\u003cli\u003eThe two PSUs split power delivery and are not redundant: a single PSU failure takes down 2 GPUs, or 2 GPUs plus the motherboard\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eThermal profile (Max-Q)\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eMax-Q uses a turbofan (blower) cooler with directional rear-of-card exhaust. Expected GPU hotspot is 72-80 °C under continuous load. It is materially quieter than passive cards in a high-static-pressure chassis, and better suited to non-datacenter airflow, open-rack, or lab \/ office-adjacent placement.
Silicon, TDP, ECC, and performance are identical to the Server Edition.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eIdentical to the Server Edition (K-AI 384 Rome RTXPro6000) — same Blackwell silicon, same 384 GB ECC pool, same fp8 native, same model compatibility. The difference is acoustic, not computational.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek V3 \/ V3-0324 \/ V3.1 \/ V3.2 \/ R1 \/ R1-0528\u003c\/strong\u003e Q3 (~290 GB) comfortably on-card (~30-40 tok\/s single, published reference); fp8 native (~670 GB) with RAM spill\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-Coder-480B-A35B\u003c\/strong\u003e Q3 (~350 GB tight with RAM spill) — SOTA open coding agent (~18-25 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-235B-A22B\u003c\/strong\u003e Q6\/Q8 (~200-280 GB) with long ctx and multi-user batching\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM-5 \/ GLM-5.1\u003c\/strong\u003e Q3 (~317 GB) — Chinese frontier, close to Claude Opus 4.6 on coding\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eKimi-K2\u003c\/strong\u003e 1.58-bit UD (~240 GB) — trillion-param agent at real 
throughput\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHunyuan-Large\u003c\/strong\u003e 389B\/52B Q4 (~220 GB), fp8 native (~390 GB spill)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eERNIE-4.5-424B-A47B\u003c\/strong\u003e Q4 (~240 GB); \u003cstrong\u003eMiniMax-M1\u003c\/strong\u003e Q4 (~260 GB) 1M-ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 3.3 70B\u003c\/strong\u003e bf16 resident on a single card (96 GB\/card)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Large 3\u003c\/strong\u003e (675B\/41B MoE, Apache 2.0) Q3 (~317 GB) — frontier Western open weights (~20-30 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 4 Maverick\u003c\/strong\u003e (400B\/17B) Q4 (~232 GB) with generous KV budget (~45-55 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama-3.1-Nemotron Ultra 253B\u003c\/strong\u003e Q4-Q6 (~119-207 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003egpt-oss-120b\u003c\/strong\u003e MXFP4 native (80 GB) with concurrent fleet headroom\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003ePixtral Large \/ Mistral Large 2\u003c\/strong\u003e bf16 (~248 GB); \u003cstrong\u003eDevstral 2\u003c\/strong\u003e 123B bf16 — 256k top open coding\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 3.3 70B\u003c\/strong\u003e bf16 on a single card; 4x concurrent 70B deployments possible\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-235B-A22B bf16 (~240 GB); InternVL3.5-241B-A28B Q4 (~135 GB); Llama 3.2 90B 
Vision bf16; Pixtral Large 124B bf16; Qwen3-Omni-30B-A3B; Molmo 72B; ERNIE-4.5-VL; GLM-4.6V 106B bf16 on TP. Blackwell fp8 delivers ~2x throughput on vision-tower inference vs Ada.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] \/ Kontext \/ Tools at fp8 native (~15-20 s per 1024x1024 image on single RTX Pro 6000, published reference); SD 3.5 Large; HunyuanImage-2.1 (17B native 2K); HunyuanImage-3.0 80B\/13B MoE; AuraFlow; OmniGen; 4x concurrent ComfyUI workers.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B \/ I2V-A14B dual-expert bf16; HunyuanVideo 13B bf16 both experts; Open-Sora 2.0 (11B) bf16; CogVideoX-5B; Mochi-1; LTX-Video; Pyramid Flow; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos Predict 2.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo; Parakeet-TDT; Canary; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2\/3; Kokoro; Stable Audio Open; XTTS v2; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi; Step-Audio 2 mini \/ R1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX:\u003c\/strong\u003e MusicGen \/ AudioGen \/ Bark \/ SeamlessM4T\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul 
style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eDeepSeek V3 Q3 + concurrent 70B + FLUX.1 + Whisper all resident\u003c\/li\u003e\n\u003cli\u003e4-way tensor-parallel on 350-400B class at Q4\u003c\/li\u003e\n\u003cli\u003ePer-card tenant isolation — one 96 GB Llama 3.3 70B bf16 per card, 4 independent inference silos\u003c\/li\u003e\n\u003cli\u003eMulti-model RAG: reader + reranker + vision + embedder all on one host\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eFrontier open-weight inference for a lab \/ R\u0026amp;D team where acoustic budget matters\u003c\/li\u003e\n\u003cli\u003eSmall-team server room without dedicated datacenter airflow — Max-Q self-cooling tolerates open-rack placement\u003c\/li\u003e\n\u003cli\u003eOffice-adjacent AI workstation for a specialist team (ML research, agentic tools)\u003c\/li\u003e\n\u003cli\u003efp8-native serving (DeepSeek \/ R1 \/ Hunyuan) in lab settings\u003c\/li\u003e\n\u003cli\u003e4-tenant per-card isolation workload with noise budget\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003eExternal references | Same silicon as Server Edition | Not measured on Kentino hardware\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 
12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eRTX Pro 6000 per-card INT8 TOPS\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e2 000 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eRTX Pro 6000 memory bandwidth\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1 800 GB\/s per card\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — DeepSeek V3 Q3 on 4x Blackwell PCIe (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~30-40 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — DeepSeek V3 Q3 on 4x Blackwell PCIe (batch-8)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e~200 tok\/s aggregate\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eSGLang — Llama 4 Maverick Q4 on 4x Blackwell (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~45-55 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ellama.cpp — Qwen3-Coder-480B Q3 on 4x Blackwell (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~18-25 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 
12px;color:#ccc\"\u003eFLUX.1 [dev] fp8 on single RTX Pro 6000\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~15-20 s per 1024x1024 image\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003eKentino will publish first-party numbers after initial customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eProper datacenter rack deployments with established hot-aisle airflow — choose the passive Server Edition (K-AI 384 Rome RTXPro6000) instead: same silicon, simpler mechanically\u003c\/li\u003e\n\u003cli\u003eSingle-user workloads up to 70B (4x RTX 5090 is materially cheaper for a 128 GB pool)\u003c\/li\u003e\n\u003cli\u003eFrontier training from scratch (no NVLink)\u003c\/li\u003e\n\u003cli\u003eFull DeepSeek V3 Q4 on-card (~404 GB) — upgrade to 6x RTX Pro 6000 \/ 576 GB\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e3 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eNVIDIA OEM GPU warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2
years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS configuration, driver install, burn-in, memtest, and functional verification. 
Lead time depends on component availability and is confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eUpgrade RAM to 512 GB DDR4 (add 2x 64 GB — 2 DIMM slots open) for RAM-spill headroom on Q3 frontier quants\u003c\/li\u003e\n\u003cli\u003e4 TB NVMe Gen4 x4 for frontier-model library (DeepSeek V3 Q3 alone is ~290 GB on disk)\u003c\/li\u003e\n\u003cli\u003eFull 24U rack cabinet with managed PDU + online UPS\u003c\/li\u003e\n\u003cli\u003eAlternative form factor: passive Server Edition (K-AI 384 Rome RTXPro6000) — same silicon, for datacenter airflow deployments\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e\n","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940336922952,"sku":null,"price":46583.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-576-genoa-rtxpro6000-12000tops-6-rtx-pro-6000-blackwell-server-edition-ai-frontier-server","title":"K-AI 576 Genoa RTXPro6000 12000TOPS — 6× RTX Pro 6000 Blackwell Server Edition AI Frontier Server","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 576 Genoa RTXPro6000 12000TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e576 GB ECC VRAM Frontier Research Server\u003cbr\u003e6x RTX Pro 6000 Server Edition | EPYC Genoa | 12 000
TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e12 000\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e576 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eECC VRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eBCM\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003ePCIe Gen5 switch\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eFrontier\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eon-prem research\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003ePublished external references. 
Not measured on Kentino hardware.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 7U rack-mount frontier-tier inference platform with six NVIDIA RTX Pro 6000 Blackwell Server Edition passive cards pooled to 576 GB ECC VRAM, one AMD EPYC 9354 Genoa CPU (32C\/64T), 768 GB DDR5-4800 ECC (all 12 channels populated), 4 TB NVMe boot, and 5x 1200 W server PSU. On-board Broadcom PCIe Gen5 switch fans out uniformly to all 6 GPU slots. DeepSeek V3 Q4 (~404 GB) comfortable with long context, Kimi-K2 Q2, Mistral Large 3 Q2-Q3 — the full frontier on-prem.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e6x NVIDIA RTX Pro 6000 Blackwell Server Edition 96 GB ECC (passive, 600 W, PCIe 5.0 x16, 2000 INT8 TOPS per card)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e576 GB total across 6 cards (no NVLink — P2P over PCIe Gen5 at ~55-60 GB\/s per direction)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 9354 Genoa (32C\/64T, 280 W, 128x PCIe 5.0 lanes, 12-channel 
DDR5)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack GENOAD8X-2T\/BCM (SP5 Genoa, integrated Broadcom PEX PCIe Gen5 switch, 12x DDR5, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e768 GB DDR5-4800 ECC RDIMM (12x 64 GB — all channels populated, ~460 GB\/s aggregate)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4 TB NVMe M.2 (PCIe 4.0 x4) — sized for frontier checkpoint staging\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e5x 1200 W server PSU set (HP-compatible, 6 kW total)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e7U 8-GPU rack-mount, 10 PCIe slot capacity, active Gen5 risers\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP5 Genoa tower cooler, 8x 120 mm chassis fans, front-to-back datacenter airflow required. 
Passive GPU cards.\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 6 x 600 W = 3 600 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~4 080 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 6 000 W (5x 1200 W) — 32% headroom\u003c\/li\u003e\n\u003cli\u003eNo power-cap required for steady-state inference\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eGENOAD8X-2T\/BCM integrates a Broadcom PEX PCIe Gen5 switch on-board. 128 Gen5 lanes from the EPYC Genoa root sit upstream of the switch, which fans out uniformly to all 6 GPU slots at Gen5 x16 end-to-end via active risers. Clean single-root topology — simpler NUMA tuning than dual-socket.
No NVLink; P2P at ~55-60 GB\/s per direction.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 576 GB of pooled ECC VRAM on Blackwell fp8 native silicon, this server runs the full Chinese + Western open-weight frontier at research-grade quants: DeepSeek V3 Q4 (~404 GB) with long context, Kimi-K2 Q2, Mistral Large 3 Q2-Q3, GLM-5 Q2, Qwen3-Coder-480B Q4.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek V3 \/ R1 \/ V3.1 \/ V3.2\u003c\/strong\u003e at Q4_K_M (~404 GB) comfortable with long context (~5-8 tok\/s single vLLM TP-6, published reference); fp8 native (~670 GB) with RAM spill\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eKimi-K2\u003c\/strong\u003e (Base \/ Instruct \/ Thinking) at Q2_K (~375 GB) comfortable (~5-8 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM-5 \/ GLM-5.1\u003c\/strong\u003e (~745B\/44B) at Q2_K (~260 GB) comfortable; Q3 (~420 GB) with RAM spill\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-Coder-480B-A35B\u003c\/strong\u003e at Q4_K_M (~270 GB) with long context\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-235B-A22B\u003c\/strong\u003e at bf16 (~470 GB) or fp8 (~240 
GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eERNIE-4.5-424B-A47B\u003c\/strong\u003e at Q4 (~240 GB) with full 128k ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eIntern-S1-Pro\u003c\/strong\u003e (1T\/22B active, SAGE) at Q2_K (~325 GB) comfortable\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eHunyuan-Large\u003c\/strong\u003e A52B at Q4 (~220 GB); \u003cstrong\u003eMiniMax-M1\u003c\/strong\u003e at Q4 (~260 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Large 3\u003c\/strong\u003e (675B\/41B MoE, Apache 2.0) at Q2-Q3 (~243-317 GB) comfortable (~20-30 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 4 Maverick\u003c\/strong\u003e (400B\/17B) at Q4_K_M (~232 GB) with long ctx (~45-55 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama-3.1-Nemotron Ultra 253B\u003c\/strong\u003e at fp8 (~253 GB) or bf16 with RAM spill\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGrok-1\u003c\/strong\u003e 314B at Q4 (~182 GB); \u003cstrong\u003eSnowflake Arctic\u003c\/strong\u003e at Q4 (~278 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDBRX Instruct\u003c\/strong\u003e 132B\/36B at bf16 (~264 GB) or fp8 multi-instance\u003c\/li\u003e\n\u003cli\u003eAll 70-120B class models at bf16 with room to spare\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-235B-A22B flagship VLM; InternVL3.5-241B-A28B Q4 (~135 GB); GLM-4.5V \/ 4.6V 106B bf16 (~210 GB); Llama 3.2 90B Vision bf16; Pixtral Large 124B fp8; Molmo 72B 
bf16.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eHunyuanImage-3.0 Instruct tier (3x 80 GB) — fits with headroom; FLUX.1 [dev] \/ [schnell] \/ Kontext multi-instance (~15-20 s per 1024x1024 image on single RTX Pro 6000 fp8, published reference); SD 3.5 Large; SDXL; AuraFlow; OmniGen; HunyuanImage-2.1; Kolors 2.0.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B \/ I2V-A14B dual-expert MoE bf16 (~54 GB); HunyuanVideo 13B bf16 comfortable; Open-Sora 2.0 (11B) bf16; Mochi-1 (10B) fp16; NVIDIA Cosmos Predict 2 up to 14B; CogVideoX-5B; LTX-Video; Pyramid Flow.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFull stack resident concurrently: Whisper v3 large, Parakeet-TDT 1.1B, Canary 1B, Moshi 7B realtime, Qwen3-Omni, Step-Audio R1, CosyVoice 3.0, Kokoro, Stable Audio Open.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eDeepSeek V3 Q4 inference + FLUX image + HunyuanVideo + Whisper\/Moshi realtime voice all resident simultaneously\u003c\/li\u003e\n\u003cli\u003eConcurrent 70B tensor-parallel + 235B-MoE on separate PCIe domains via the Broadcom switch\u003c\/li\u003e\n\u003cli\u003eResearch A\/B evaluation: 3 frontier open-weight models resident concurrently\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget 
workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eFrontier open-weight research lab — on-prem access to DeepSeek V3 \/ Kimi-K2 \/ Mistral Large 3 class without cloud egress\u003c\/li\u003e\n\u003cli\u003eSovereign AI deployment — EU data residency with an Apache 2.0 \/ MIT model stack\u003c\/li\u003e\n\u003cli\u003eEnterprise multi-model RAG + agentic platform — several 200-400B MoE models resident\u003c\/li\u003e\n\u003cli\u003eModel evaluation \/ safety research comparing frontier Chinese vs Western open weights\u003c\/li\u003e\n\u003cli\u003eInference-at-scale for regulated industries requiring air-gap + ECC + PCIe Gen5\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003eExternal references | Not measured on Kentino hardware\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eRTX Pro 6000 per-card INT8 TOPS\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e2 000 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 
12px;color:#ccc\"\u003evLLM — DeepSeek V3 Q4 on 6x RTX Pro 6000 (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~25-40 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — DeepSeek V3 Q4 on 6x RTX Pro 6000 (batch-32)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e200-400 tok\/s aggregate\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eFLUX.1 [dev] fp8 on single RTX Pro 6000\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~15-20 s per 1024x1024 image\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003eExact figures confirmed at PoC stage. Kentino will publish first-party numbers after initial customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eKimi-K2 \/ DeepSeek V3 at Q4 real-speed production serving — step up to the 768 GB Turin dual\u003c\/li\u003e\n\u003cli\u003eTraining from scratch on frontier-class models — no NVLink, PCIe P2P only\u003c\/li\u003e\n\u003cli\u003ePlug-and-play deployment — frontier MoE serving needs a skilled MLOps team\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv 
style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS config, driver install, burn-in, memtest, functional verification, and LLM environment setup (vLLM \/ SGLang \/ llama.cpp \/ CUDA 13 stack with fp8 Blackwell kernels). 
Lead time depends on component availability and is confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eNVIDIA ConnectX-5 MCX555A-ECAT 100 GbE NIC for multi-node scale-out\u003c\/li\u003e\n\u003cli\u003eSecond 4 TB NVMe for dataset \/ model library\u003c\/li\u003e\n\u003cli\u003eFull 24U rack cabinet with front perforated door\u003c\/li\u003e\n\u003cli\u003eOnline UPS 10 kVA\u003c\/li\u003e\n\u003cli\u003eManaged PDU\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e\n","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52940370116936,"sku":null,"price":106069.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-576-genoa-rtxpro6000mq-12000tops-6-rtx-pro-6000-blackwell-max-q-ai-frontier-server","title":"K-AI 576 Genoa RTXPro6000MQ 12000TOPS — 6× RTX Pro 6000 Blackwell Max-Q AI Frontier Server","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 576 Genoa RTXPro6000MQ 12000TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e576 GB ECC VRAM Frontier Server\u003cbr\u003e6x RTX Pro 6000 Max-Q Turbofan | EPYC Genoa | 12 000 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px
solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e12 000\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e576 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eECC VRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eGen5\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eBroadcom switch\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eQuiet\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eturbofan cooling\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003ePublished external references. 
Not measured on Kentino hardware.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 7U rack-mount frontier-tier inference platform with six NVIDIA RTX Pro 6000 Blackwell Max-Q turbofan cards pooled to 576 GB ECC VRAM, one AMD EPYC 9354 Genoa CPU (32C\/64T), 768 GB DDR5-4800 ECC (all 12 channels populated), 4 TB NVMe boot, and 5x 1200 W server PSU. Same silicon and memory pool as the passive Server Edition build — different cooler. The Max-Q turbofan is self-contained per card, runs quieter, and tolerates less strict chassis airflow. Identical model envelope to its passive sibling.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e6x NVIDIA RTX Pro 6000 Blackwell Max-Q 96 GB ECC (turbofan blower, 600 W TDP spec, PCIe 5.0 x16, 2000 INT8 TOPS per card)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e576 GB total across 6 cards (no NVLink — P2P over PCIe Gen5 at ~55-60 GB\/s per direction)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 9354 Genoa 
(32C\/64T, 280 W, 128x PCIe 5.0 lanes, 12-channel DDR5)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack GENOAD8X-2T\/BCM (SP5 Genoa, integrated Broadcom PEX PCIe Gen5 switch, 12x DDR5, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e768 GB DDR5-4800 ECC RDIMM (12x 64 GB — all channels populated, ~460 GB\/s aggregate)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4 TB NVMe M.2 (PCIe 4.0 x4) — sized for frontier checkpoint staging\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e5x 1200 W server PSU set (HP-compatible, 6 kW total)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e7U 8-GPU rack-mount, 10 PCIe slot capacity, active Gen5 risers\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP5 Genoa tower cooler + 8x 120 mm chassis fans. Per-GPU turbofan blowers are self-contained — datacenter airflow recommended but not strictly required. 
Quieter for lab environments.\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw (spec): 6 x 600 W = 3 600 W\u003c\/li\u003e\n\u003cli\u003eSystem total at spec full load: ~4 080 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 6 000 W (5x 1200 W) — 32% headroom\u003c\/li\u003e\n\u003cli\u003eMax-Q cards typically run 520-550 W sustained, which puts real-world headroom at roughly 37-40%\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eCooling (Max-Q differentiator)\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eEach card pulls air front-to-back via its own blower — self-contained per card. Tolerates mixed-rack \/ open-cabinet deployment. Quieter than an equivalent axial-fan stack. Max-Q firmware profile favours lower sustained power (520-550 W typical in inference). 
Recommended: cabinet with front perforated door and clear rear exhaust path.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eIdentical to the Server Edition sibling — same silicon, same 576 GB pool. DeepSeek V3 Q4 (~404 GB) with long context, Kimi-K2 Q2, Mistral Large 3 Q2-Q3, GLM-5 Q2, Qwen3-Coder-480B Q4.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek V3 \/ R1 \/ V3.1 \/ V3.2\u003c\/strong\u003e at Q4_K_M (~404 GB) comfortable with long context (~5-8 tok\/s single vLLM TP-6, published reference); fp8 native (~670 GB) with RAM spill\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eKimi-K2\u003c\/strong\u003e (Base \/ Instruct \/ Thinking) at Q2_K (~375 GB) comfortable (~5-8 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM-5 \/ GLM-5.1\u003c\/strong\u003e (~745B\/44B) at Q2_K (~260 GB); Q3 (~420 GB) with RAM spill\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-Coder-480B-A35B\u003c\/strong\u003e at Q4_K_M (~270 GB) with long context\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-235B-A22B\u003c\/strong\u003e at bf16 (~470 GB) or fp8 (~240 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eERNIE-4.5-424B-A47B\u003c\/strong\u003e at Q4 
(~240 GB) with 128k ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eIntern-S1-Pro\u003c\/strong\u003e at Q2_K (~325 GB); \u003cstrong\u003eHunyuan-Large\u003c\/strong\u003e at Q4 (~220 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMiniMax-Text-01 \/ M1\u003c\/strong\u003e at Q4 (~260 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Large 3\u003c\/strong\u003e at Q2-Q3 (~243-317 GB) comfortable (~20-30 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 4 Maverick\u003c\/strong\u003e at Q4_K_M (~232 GB) with long ctx (~45-55 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama-3.1-Nemotron Ultra 253B\u003c\/strong\u003e at fp8 (~253 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGrok-1\u003c\/strong\u003e 314B at Q4 (~182 GB); \u003cstrong\u003eSnowflake Arctic\u003c\/strong\u003e at Q4 (~278 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDBRX Instruct\u003c\/strong\u003e 132B\/36B at bf16 (~264 GB) or fp8\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-235B-A22B; InternVL3.5-241B-A28B Q4; GLM-4.5V \/ 4.6V 106B bf16; Llama 3.2 90B Vision bf16; Pixtral Large 124B fp8; Molmo 72B bf16.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eHunyuanImage-3.0 Instruct; FLUX.1 [dev] \/ [schnell] \/ Kontext multi-instance (~15-20 s per 1024x1024 image, published reference); SD 3.5 Large; SDXL; 
AuraFlow; OmniGen; HunyuanImage-2.1; Kolors 2.0.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B dual-expert MoE bf16; HunyuanVideo 13B bf16; Open-Sora 2.0 (11B); Mochi-1 (10B); NVIDIA Cosmos Predict 2 up to 14B; CogVideoX-5B; LTX-Video; Pyramid Flow.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFull stack resident: Whisper v3 large, Parakeet-TDT 1.1B, Canary 1B, Moshi 7B realtime, Qwen3-Omni, Step-Audio R1, CosyVoice 3.0, Kokoro, Stable Audio Open.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eDeepSeek V3 Q4 + FLUX + HunyuanVideo + Whisper\/Moshi realtime all resident\u003c\/li\u003e\n\u003cli\u003eConcurrent 70B tensor-parallel + 235B-MoE on separate PCIe domains\u003c\/li\u003e\n\u003cli\u003e3 frontier models resident for A\/B evaluation\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eFrontier open-weight research lab with mixed \/ non-ideal airflow infra\u003c\/li\u003e\n\u003cli\u003eColocation \/ private-datacenter where per-card turbofan is operationally simpler than full passive airflow\u003c\/li\u003e\n\u003cli\u003eSovereign AI deployment with Apache 2.0 \/ MIT model stack\u003c\/li\u003e\n\u003cli\u003eEnterprise multi-model RAG + agentic platform\u003c\/li\u003e\n\u003cli\u003eLab environments with open 
racks\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003eExternal references | Same silicon as Server Edition | Not measured on Kentino hardware\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eRTX Pro 6000 per-card INT8 TOPS\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e2 000 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — DeepSeek V3 Q4 on 6x RTX Pro 6000 (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~25-40 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — DeepSeek V3 Q4 on 6x RTX Pro 6000 (batch-32)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e200-400 tok\/s aggregate\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eFLUX.1 [dev] fp8 on single RTX Pro 6000\u003c\/td\u003e\n\u003ctd 
style=\"padding:10px 12px;color:#fff\"\u003e~15-20 s per 1024x1024 image\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003eExact figures confirmed at PoC stage. Kentino will publish first-party numbers after initial customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eKimi-K2 \/ DeepSeek V3 at Q4 real-speed production serving — step up to K-AI 768 TurinDual RTXPro6000MQ\u003c\/li\u003e\n\u003cli\u003eTraining from scratch on frontier-class models — no NVLink\u003c\/li\u003e\n\u003cli\u003ePlug-and-play deployment — frontier MoE serving needs a skilled MLOps team\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv 
style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS config, driver install, burn-in, memtest, functional verification, and LLM environment setup. Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eNVIDIA ConnectX-5 MCX555A-ECAT 100 GbE NIC for multi-node scale-out\u003c\/li\u003e\n\u003cli\u003eSecond 4 TB NVMe for dataset \/ model library\u003c\/li\u003e\n\u003cli\u003eFull 24U rack cabinet with front perforated door\u003c\/li\u003e\n\u003cli\u003eOnline UPS 10 kVA\u003c\/li\u003e\n\u003cli\u003eManaged PDU\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e\n","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52942285046088,"sku":null,"price":106069.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-768-turindual-rtxpro6000mq-16000tops-8-rtx-pro-6000-blackwell-max-q-ai-frontier-server-dual-turin","title":"K-AI 768 TurinDual RTXPro6000MQ 16000TOPS — 8× RTX Pro 6000 Blackwell Max-Q AI Frontier Server (Dual Turin)","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv 
style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 768 TurinDual RTXPro6000MQ 16000TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e768 GB ECC VRAM Frontier Flagship\u003cbr\u003e8x RTX Pro 6000 Max-Q | Dual EPYC Turin | 16 000 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e16 000\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eTOPS INT8\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e768 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eECC VRAM pool\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eGen5\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003ePCIe end-to-end\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv 
style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eFlagship\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003efrontier multi-tenant\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:16px;font-size:13px;color:#777\"\u003eCPU pricing finalized at order — Turin 9005-series market moves weekly in Q2 2026.\u003c\/p\u003e\n\u003cp style=\"margin-top:12px;font-size:15px;color:#aaa\"\u003ePublished external references. Not measured on Kentino hardware.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eTop of the Kentino AI server lineup. A 7U rack-mount flagship frontier-tier inference platform with eight NVIDIA RTX Pro 6000 Blackwell Max-Q turbofan cards pooled to 768 GB ECC VRAM, two AMD EPYC Turin 9005-series CPUs (Zen5c, SP5), 1.5 TB DDR5-4800 ECC (all 24 channels populated), 4 TB NVMe boot, and 5x 1200 W server PSU. PCIe Gen5 end-to-end. DeepSeek V3 fp8 native (~670 GB) on-card. Kimi-K2 Q4-Q5. 
4 frontier-class models resident simultaneously.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e8x NVIDIA RTX Pro 6000 Blackwell Max-Q 96 GB ECC (turbofan, 600 W TDP spec, PCIe 5.0 x16, 2000 INT8 TOPS\/card, fp8 native)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e768 GB total across 8 cards (no NVLink — P2P over PCIe Gen5 at ~55-60 GB\/s within socket, cross-socket via CPU interconnect)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x AMD EPYC Turin 9005-series (Zen5c, SP5, PCIe 5.0) — quote-pending, exact SKU confirmed at order\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack TURIN2D24XGM\/500W (dual SP5 Turin, PCIe 5.0, 24x DDR5, 2x 10 GbE, IPMI)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e1.5 TB 
DDR5-4800 ECC RDIMM (24x 64 GB — all 24 channels populated, ~920 GB\/s aggregate)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e4 TB NVMe M.2 (PCIe 4.0 x4) — sized for frontier checkpoints\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e5x 1200 W server PSU set (6 kW total)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e7U 8-GPU rack-mount, 10 PCIe slot capacity, active Gen5 risers\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x SP5 Turin tower coolers + 8x 120 mm Martech chassis fans. 
Per-GPU turbofan blowers self-contained.\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eOnboard dual 10 GbE (Intel X550)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw (spec): 8 x 600 W = 4 800 W\u003c\/li\u003e\n\u003cli\u003eCPU draw: 2 x 360 W = 720 W (Turin mid-tier estimate)\u003c\/li\u003e\n\u003cli\u003eSystem total at spec full load: ~5 720 W\u003c\/li\u003e\n\u003cli\u003ePSU total: 6 000 W — ~4.7% raw headroom at spec\u003c\/li\u003e\n\u003cli\u003eReal-world: Max-Q sustains 520-550 W in inference, lifting sustained headroom to roughly 11-15%\u003c\/li\u003e\n\u003cli\u003eFirmware power-cap at 520 W available for guaranteed headroom\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003eDual Turin provides 2x 128 PCIe Gen5 lanes. TURIN2D24XGM\/500W routes 8 GPU slots direct-attached to the CPUs at Gen5 x16 via active risers — 4 slots per CPU root. No PCIe switch in the GPU path — clean dual-root topology. NUMA tuning required for optimal cross-socket peer-to-peer. 
No NVLink; P2P at ~55-60 GB\/s per direction within socket.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 768 GB of pooled ECC VRAM — the top of the Kentino envelope — this server runs DeepSeek V3 fp8 native (~670 GB) on-card, Kimi-K2 Q4-Q5 (~630 GB) comfortable, and the defining use case: 4 frontier-class models resident simultaneously for multi-tenant production serving.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs — text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier at production quants\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eKimi-K2\u003c\/strong\u003e (Base \/ Instruct \/ Thinking) at Q4_K_M \/ Q5_K_M (~630 GB) comfortable (~15-25 tok\/s single, published reference) — flagship Chinese frontier on a single box at production quants\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek V3 \/ R1 \/ V3.1 \/ V3.2\u003c\/strong\u003e at fp8 native (~670 GB) on-card (~30-50 tok\/s single, published reference) — Blackwell fp8 tensor cores run this natively at speed\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek V3\u003c\/strong\u003e at Q4_K_M (~404 GB) with multiple concurrent large-batch serving instances\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM-5 \/ GLM-5.1\u003c\/strong\u003e (~745B\/44B) at Q3-Q4 (~420-560 GB) comfortable 
on-card\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eIntern-S1-Pro\u003c\/strong\u003e (1T\/22B active, SAGE) at Q3-Q4 (~440-580 GB) comfortable\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-Coder-480B-A35B\u003c\/strong\u003e at Q5-Q6 (~340-400 GB) with 1M ctx\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3-235B-A22B\u003c\/strong\u003e at bf16 (~470 GB) with generous KV for long context\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eERNIE-4.5-424B-A47B\u003c\/strong\u003e at Q6 (~360 GB); \u003cstrong\u003eHunyuan-Large\u003c\/strong\u003e at fp8 (~390 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMiniMax-Text-01 \/ M1\u003c\/strong\u003e at Q5-Q6 (~325-390 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier at production quants\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral Large 3\u003c\/strong\u003e (675B\/41B MoE, Apache 2.0) at Q3-Q4 (~317-404 GB) comfortable (~20-30 tok\/s single, published reference)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama 4 Maverick\u003c\/strong\u003e (400B\/17B, 128 experts) at Q5-Q6 (~290-350 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eLlama-3.1-Nemotron Ultra 253B\u003c\/strong\u003e at bf16 (~506 GB) on-card\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eSnowflake Arctic\u003c\/strong\u003e at Q5-Q6 (~350-420 GB); \u003cstrong\u003eGrok-1\u003c\/strong\u003e at Q5-Q6 (~225-270 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDBRX Instruct\u003c\/strong\u003e 132B\/36B at bf16 (~264 GB) multi-instance\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eQwen3-VL-235B-A22B flagship VLM 
with long context; InternVL3.5-241B-A28B at bf16 (~482 GB); GLM-4.5V \/ 4.6V 106B bf16 multi-instance; Llama 3.2 90B Vision bf16 multi-instance; Pixtral Large 124B bf16; Molmo 72B bf16 multi-instance.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eHunyuanImage-3.0 Instruct concurrent instances; FLUX.1 multi-instance (~15-20 s per 1024x1024 image, published reference); SD 3.5 Large; SDXL; AuraFlow; OmniGen; HunyuanImage-2.1; Kolors 2.0 — full Chinese + Western image stack resident concurrent.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 T2V-A14B \/ I2V-A14B — many concurrent streams; HunyuanVideo 13B bf16 multiple concurrent streams; Open-Sora 2.0 (11B) multi-instance; Mochi-1 (10B) multi-instance; NVIDIA Cosmos Predict 2 up to 14B.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFull stack resident at batch: Whisper v3 large, Parakeet-TDT, Canary 1B, Moshi 7B realtime, Qwen3-Omni, Step-Audio R1, CosyVoice 3.0, Kokoro, Stable Audio Open.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant serving (the defining use case)\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMulti-tenant frontier production:\u003c\/strong\u003e 4 frontier-class models resident simultaneously — e.g. 
DeepSeek V3 fp8 + Kimi-K2 Q4 + Mistral Large 3 Q3 + Qwen3-Coder-480B Q5 — with partitioned VRAM and per-tenant SLOs\u003c\/li\u003e\n\u003cli\u003eConcurrent fp8-native Blackwell inference (DeepSeek V3 \/ R1 family, Hunyuan fp8) + quantized serving on separate PCIe domains\u003c\/li\u003e\n\u003cli\u003eResearch A\/B across 4-5 frontier open-weight models at research-grade quants\u003c\/li\u003e\n\u003cli\u003eAgentic platform with a 400B+ primary + multiple 30-70B specialists resident\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eMulti-tenant frontier open-weight production — multiple frontier models resident concurrently with per-tenant isolation\u003c\/li\u003e\n\u003cli\u003eSovereign frontier AI deployment — on-prem DeepSeek V3 fp8 \/ Kimi-K2 \/ Mistral Large 3 access, EU data residency\u003c\/li\u003e\n\u003cli\u003eFrontier research lab with A\/B evaluation across 4+ frontier open-weight models at research-grade quants\u003c\/li\u003e\n\u003cli\u003eEnterprise agentic platform where 400B+ MoE drives tools + multiple specialist models\u003c\/li\u003e\n\u003cli\u003eAir-gapped regulated-industry inference at frontier scale with ECC + PCIe Gen5\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003ePublished performance references\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003eExternal references | Not measured on Kentino hardware\u003c\/p\u003e\n\u003ctable 
style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eRTX Pro 6000 per-card INT8 TOPS\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e2 000 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — DeepSeek V3 fp8 on 8x RTX Pro 6000 (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~30-50 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003evLLM — DeepSeek V3 fp8 on 8x RTX Pro 6000 (batch-32)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e300-500 tok\/s aggregate\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eKimi-K2 Q4 serving on 8x RTX Pro 6000 (single)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~15-25 tok\/s\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eFLUX.1 [dev] fp8 on single RTX Pro 6000\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~15-20 s per 1024x1024 image\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003eExact figures confirmed at PoC stage. 
Kentino will publish first-party numbers after initial customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eBudget-conscious deployments — flagship SKU at flagship price\u003c\/li\u003e\n\u003cli\u003eTraining from scratch on frontier-class models — no NVLink, PCIe P2P only (for training at this scale H100\/H200 SXM or GB200 NVLink fabric is the right tool)\u003c\/li\u003e\n\u003cli\u003ePlug-and-play deployment — frontier multi-tenant MoE serving requires a skilled MLOps team\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e10-28 days\u003c\/div\u003e\n\u003cdiv 
style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eBuild includes assembly, BIOS config, driver install, burn-in, memtest, functional verification, NUMA tuning, and LLM environment setup (vLLM \/ SGLang \/ llama.cpp \/ CUDA 13 stack with fp8 Blackwell kernels). Lead time depends on component availability, confirmed at order.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eNVIDIA ConnectX-5 MCX555A-ECAT or ConnectX-7 Gen5 100 GbE NIC for multi-node scale-out\u003c\/li\u003e\n\u003cli\u003eMellanox ConnectX-6 25 GbE SFP28 for datacenter fabric\u003c\/li\u003e\n\u003cli\u003eSecond 4 TB NVMe for dataset \/ model library (frontier checkpoints are large — Kimi-K2 bf16 alone is ~1 TB)\u003c\/li\u003e\n\u003cli\u003eFull 24U rack cabinet with front perforated door and managed PDU\u003c\/li\u003e\n\u003cli\u003eOnline UPS 10 kVA (graceful shutdown on power event)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e\n","brand":"Kentino s.r.o.","offers":[{"title":"Default Title","offer_id":52942290059592,"sku":null,"price":0.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"},{"product_id":"k-ai-192-turin2u-rtxpro6000-4000tops-2-rtx-pro-6000-blackwell-server-edition-2u-turin-sp5","title":"K-AI 192 Turin2U RTXPro6000 4000TOPS — 2× RTX Pro 6000 Blackwell Server Edition — 2U Turin SP5","description":"\u003cdiv style=\"font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif;line-height:1.7;color:#1a1a1a\"\u003e\n\n\u003cdiv 
style=\"background:linear-gradient(135deg,#0d0d0d 0%,#1a1a2e 100%);color:#fff;padding:32px;border-radius:12px;margin-bottom:32px\"\u003e\n\u003cp style=\"font-size:18px;margin:0 0 20px 0;color:#ccc\"\u003eK-AI 192 Turin2U RTXPro6000 4000TOPS\u003c\/p\u003e\n\u003cp style=\"font-size:28px;font-weight:700;margin:0 0 16px 0;line-height:1.3\"\u003e192 GB ECC Blackwell Flagship Pair\u003cbr\u003e2x RTX Pro 6000 Server Edition | EPYC Turin SP5 | 4 000 TOPS INT8\u003c\/p\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-top:24px\"\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e4 000\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eINT8 TOPS\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e192 GB\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eECC VRAM\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv style=\"font-size:26px;font-weight:800;color:#fab400\"\u003eBlackwell\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003efp8 native\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"background:rgba(250,180,0,0.15);border:1px solid #fab400;border-radius:8px;padding:14px 16px;text-align:center;flex:1;min-width:100px\"\u003e\n\u003cdiv 
style=\"font-size:26px;font-weight:800;color:#fab400\"\u003e2-card\u003c\/div\u003e\n\u003cdiv style=\"font-size:11px;color:#ccc;text-transform:uppercase;letter-spacing:1px\"\u003eminimal TP\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"margin-top:20px;font-size:15px;color:#aaa\"\u003eTwo passive RTX Pro 6000 Blackwell Server Edition cards -- 96 GB ECC each. Less tensor-parallel overhead than 4- or 8-card builds. Datacenter flagship pair on a Gen5\/DDR5 2U platform with genuine 1+1 redundant power.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:24px\"\u003eA 2U rack-mount inference server with two passive RTX Pro 6000 Blackwell Server Edition cards (96 GB ECC GDDR7 per card), one AMD EPYC 9335 Turin CPU (32C\/64T, 3.0\/4.4 GHz), 512 GB DDR5-4800 ECC, 5.76 TB datacenter Gen5 NVMe, and a 1+1 redundant 2.7 kW 80+ Platinum CRPS power supply. Starting from €56 600 ex VAT. For 70B dense bf16 and mid-size MoE, fewer big cards beat more small cards -- two-card tensor parallelism has minimal communication overhead, and each 96 GB card carries a complete copy of most models.\u003c\/p\u003e\n\n\u003cp style=\"font-size:17px;color:#333;margin-bottom:32px\"\u003eThe same 192 GB Blackwell pair as our 4U Rome build, in a 2U rack-dense ASRock chassis with full Gen5 host-side, DDR5-4800 memory, and a genuine 1+1 redundant 2.7 kW Platinum CRPS power supply. 
Pick this build when rack density matters, when your grant or procurement spec mandates a modern PCIe 5.0 \/ DDR5 platform, or when redundant power is a requirement rather than an upsell.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eHardware\u003c\/h2\u003e\n\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-bottom:24px;font-size:15px\"\u003e\n\u003cthead\u003e\u003ctr style=\"background:#0d0d0d;color:#fff\"\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eComponent\u003c\/th\u003e\n\u003cth style=\"padding:12px 16px;text-align:left\"\u003eDetail\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eGPUs\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2x NVIDIA RTX Pro 6000 Blackwell Server Edition 96 GB ECC GDDR7 (passive, 600 W, PCIe 5.0 x16, dual-slot)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eVRAM pool\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e192 GB ECC (96 GB x 2) -- a 70B model fits on one card at fp8 (~70 GB); 70B bf16 weights (~140 GB) span both cards\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCPU\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eAMD EPYC 9335 Turin (32C\/64T, 3.0\/4.4 GHz, 210 W, SP5, 128x PCIe 5.0 lanes, Zen 5, 128 MB L3)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eMotherboard\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eASRock Rack 2U4G-GENOA\/M3 (SP5, 4x PCIe 5.0 x16 dual-slot GPU, 8x DDR5 1DPC, OCP 3.0, IPMI AST2600)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eSystem 
RAM\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e512 GB DDR5-4800 ECC RDIMM (8x 64 GB, 1DPC fully populated -- max bandwidth configuration)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eBoot \/ storage\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eKioxia CD8-P 3.84 TB Gen5 U.3 (hot-tier, 1 DWPD, ~12 GB\/s read) + Kioxia CD8-P 1.92 TB Gen5 U.3 (boot OS tier) -- 5.76 TB total datacenter Gen5 NVMe\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003ePower supply\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e1+1 redundant 2.7 kW 80+ Platinum CRPS (2x 1350 W at 230 V) -- genuine N+1 redundancy; one PSU sustains full inference load\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eChassis\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003e2U rack-mount with front-to-back directed airflow (80 mm high-static-pressure fans). 
24\/7-capable.\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"background:#f8f8f8\"\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eCooling\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eSP5 active CPU heatsink + 3x 80x38 mm front intake + 1x 80x80 mm rear exhaust (designed for 4x passive GPU thermal load; 2-card layout provides ample thermal headroom)\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 16px;font-weight:600\"\u003eNetwork\u003c\/td\u003e\n\u003ctd style=\"padding:10px 16px\"\u003eIntel X710-T2L PCIe dual 10GBASE-T + OCP 3.0 slot available for 25\/100 GbE upgrade\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:32px\"\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003ePower envelope\u003c\/h3\u003e\n\u003cul style=\"margin:0;padding-left:18px;font-size:14px;color:#444\"\u003e\n\u003cli\u003eGPU draw: 2x 600 W = 1 200 W\u003c\/li\u003e\n\u003cli\u003eSystem total at full load: ~1 510 W\u003c\/li\u003e\n\u003cli\u003ePSU config: 1+1 redundant CRPS, 2x 1350 W at 230 V (2 700 W total)\u003c\/li\u003e\n\u003cli\u003eHeadroom: 44.1 % of the combined 2 700 W rating at the ~1 510 W full-load figure (1 190 W spare)\u003c\/li\u003e\n\u003cli\u003e1+1 redundancy -- one 1 350 W PSU sustains inference load up to its rating; the ~1 510 W full-load figure exceeds a single PSU\u003c\/li\u003e\n\u003c\/ul\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:250px;background:#f4f4f4;border-radius:8px;padding:20px\"\u003e\n\u003ch3 style=\"font-size:16px;font-weight:700;margin:0 0 12px 0\"\u003eLane topology\u003c\/h3\u003e\n\u003cp style=\"margin:0;font-size:14px;color:#444\"\u003ePCIe Gen5 x16 end-to-end -- both host and card native Gen5. Direct root-complex connection, no PCIe switch. 
One PCIe 5.0 x16 single-slot remains available; the PCIe 5.0 x8 slot is occupied by the NIC. No NVLink -- inter-GPU peer-to-peer via PCIe. Gen5 bandwidth eliminates the Gen4 host-cap present in the 4U Rome sibling.\u003c\/p\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWhat you can run\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#fefaf0;border-left:4px solid #fab400;padding:16px 20px;margin-bottom:24px;border-radius:0 8px 8px 0\"\u003e\n\u003cp style=\"margin:0;font-size:15px;color:#333\"\u003eWith 192 GB ECC VRAM on just two Blackwell cards with native fp8\/fp4, this is the cleanest path to dense 70B at bf16 and mid-size MoE. Two independent 70B streams -- one per card -- or 200B MoE across both with minimal 2-way TP overhead.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eLLMs -- text \/ reasoning \/ coding\u003c\/h3\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin-bottom:8px\"\u003eChinese frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eQwen3 \/ Qwen3.5 (Alibaba):\u003c\/strong\u003e Qwen3-235B-A22B Q4 (~132 GB) comfortable with long ctx (~15-25 tok\/s single-stream across 2 cards); Qwen3-Coder-480B-A35B Q2 (~160 GB); Qwen3.5-122B-A10B fp8 (~75 GB); Qwen3-32B dense bf16 with huge KV; QwQ-32B bf16\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eDeepSeek:\u003c\/strong\u003e DeepSeek-V3\/R1 Q2 (~215 GB with small RAM spill) -- Blackwell runs fp8 natively; DeepSeek-R2 32B bf16 two concurrent streams (one per card)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eGLM \/ Z.ai:\u003c\/strong\u003e GLM-4.5 \/ 4.6 \/ 4.7 Q4 (~177 GB) -- hero config at this tier; GLM-4.5-Air fp8 or bf16 with huge 
KV\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTencent Hunyuan:\u003c\/strong\u003e Hunyuan-Large Q3 (~160 GB) -- 389B MoE with 256k ctx; Hunyuan-A13B fp8 native (~80 GB) with huge KV\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Baidu ERNIE-4.5-424B Q3 (~180 GB); InternVL3.5-241B-A28B Q4 (~135 GB); MiniMax-M1 Q3 (~180 GB)\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003cp style=\"font-size:14px;font-weight:700;color:#fab400;text-transform:uppercase;letter-spacing:1px;margin:20px 0 8px 0\"\u003eWestern frontier\u003c\/p\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eMeta Llama:\u003c\/strong\u003e Llama 3.3 70B bf16 on one card -- two independent concurrent 70B streams (~20-30 tok\/s per stream); Llama 4 Scout bf16 (~218 GB, tight); Llama 4 Maverick Q3 (~188 GB)\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMistral:\u003c\/strong\u003e Mistral Large 2 \/ Pixtral Large \/ Devstral 2 123B Q6 (~88 GB) single-card or bf16 across both; Mistral Small 3 multi-stream\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOpenAI (open weights):\u003c\/strong\u003e gpt-oss-120b MXFP4 native (80 GB) -- fits on ONE card, two independent concurrent streams\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNVIDIA Nemotron:\u003c\/strong\u003e Llama-3.1-Nemotron Ultra 253B Q4 (~147 GB); Super 49B bf16 on single card\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eOthers:\u003c\/strong\u003e Cohere Command R+ 104B Q6 (~85 GB) on one card; Google Gemma 3 27B bf16 multiple concurrent streams\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVision-Language Models\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eInternVL3.5-241B-A28B Q4 (~135 GB); Qwen3-VL-235B-A22B Q4; Qwen3-VL-32B bf16 single-card; Pixtral Large 124B bf16 or Q6; Llama 3.2 90B Vision bf16 (~180 GB); Molmo 72B bf16 (~144 GB); 
GLM-4.6V 106B fp8; Gemma 3 27B multimodal x 2-3 concurrent streams.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eImage generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eFLUX.1 [dev] bf16 multiple concurrent streams; FLUX.1 Kontext [dev]; FLUX Tools; SD 3.5 Large bf16 concurrent; HunyuanImage-2.1 bf16 (~34 GB) x 2-4 concurrent; HunyuanImage-3.0 base (80B MoE, 13B active) bf16 -- fits on one card; HunyuanDiT; Kolors \/ Kolors 2.0; AuraFlow; OmniGen v1; PixArt-Sigma.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eVideo generation\u003c\/h3\u003e\n\u003cp style=\"font-size:15px;color:#333\"\u003eWan 2.2 MoE dual-expert bf16 full context -- fits on one card, two concurrent generation streams; Wan 2.2 TI2V-5B; HunyuanVideo 13B bf16 both experts; HunyuanVideo 1.5; CogVideoX-5B bf16; Open-Sora 2.0 11B bf16; Mochi-1 bf16 (~42 GB); LTX-Video; Pyramid Flow; SVD \/ SV3D \/ SV4D; NVIDIA Cosmos Predict 2.\u003c\/p\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eAudio \/ Speech \/ TTS\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003e\n\u003cstrong\u003eASR:\u003c\/strong\u003e Whisper v3 large \/ turbo (~50x realtime); Parakeet-TDT; Canary 1B; Qwen3-ASR; SenseVoice\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTTS:\u003c\/strong\u003e CosyVoice 2\/3; Kokoro 82M; XTTS v2; Stable Audio Open; Step-Audio-EditX\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eRealtime \/ S2S:\u003c\/strong\u003e Kyutai Moshi 7B; Step-Audio 2 mini\/R1; Qwen2.5-Omni-7B\u003c\/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMusic \/ SFX:\u003c\/strong\u003e MusicGen \/ AudioGen \/ Bark; SeamlessM4T v2\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch3 style=\"font-size:18px;font-weight:700;margin:28px 0 12px 0;color:#0d0d0d\"\u003eMulti-model \/ multi-tenant 
serving\u003c\/h3\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eTwo independent 70B streams -- one per card, simplest form of tenant isolation\u003c\/li\u003e\n\u003cli\u003eDense 70B bf16 + supporting stack -- LLM on card 1, image\/video\/audio on card 2\u003c\/li\u003e\n\u003cli\u003e200B MoE across both cards -- minimal tensor-parallel overhead (2-way split)\u003c\/li\u003e\n\u003cli\u003efp8-native frontier -- DeepSeek V3 family, Hunyuan-Large fp8 with Blackwell native paths\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eTarget workloads\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eDense 70B bf16 inference -- two cards tensor-parallel with minimal overhead, or one model per card for streaming\u003c\/li\u003e\n\u003cli\u003e100-150B MoE at Q4-Q6 (GLM-4.5-Air, Qwen3.5-122B-A10B, Hunyuan-A13B, Llama 4 Scout)\u003c\/li\u003e\n\u003cli\u003eFP8-native frontier inference (DeepSeek V3 family, Hunyuan, Llama 4) -- Blackwell runs fp8 natively\u003c\/li\u003e\n\u003cli\u003eScientific computation requiring datacenter-grade Gen5 NVMe throughput and ECC memory\u003c\/li\u003e\n\u003cli\u003eImage + video generation studio at bf16 (Wan 2.2 T2V-A14B, HunyuanVideo 13B, FLUX.1 [dev])\u003c\/li\u003e\n\u003cli\u003eRack-density-constrained deployments -- 2U form factor vs the 4U Rome equivalent at same VRAM\u003c\/li\u003e\n\u003cli\u003eProcurement specs mandating PCIe 5.0 \/ DDR5 platform or redundant PSU\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eMeasured performance\u003c\/h2\u003e\n\n\u003cdiv style=\"background:#0d0d0d;color:#fff;border-radius:12px;padding:24px;margin-bottom:24px\"\u003e\n\u003cp style=\"margin:0 0 4px 
0;font-size:13px;color:#888;text-transform:uppercase;letter-spacing:1px\"\u003ePublished references | NVIDIA RTX Pro 6000 Blackwell Server Edition datasheet + community benchmarks\u003c\/p\u003e\n\u003ctable style=\"width:100%;border-collapse:collapse;margin-top:16px;font-size:14px\"\u003e\n\u003cthead\u003e\u003ctr style=\"border-bottom:1px solid #333\"\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eBenchmark\u003c\/th\u003e\n\u003cth style=\"padding:8px 12px;text-align:left;color:#888;font-weight:600\"\u003eResult\u003c\/th\u003e\n\u003c\/tr\u003e\u003c\/thead\u003e\n\u003ctbody\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003ePer-card INT8 TOPS (NVIDIA datasheet)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e2 000 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eAggregate INT8 TOPS (2 cards)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fab400;font-weight:700\"\u003e4 000 TOPS\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eMemory bandwidth per card\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~1 800 GB\/s, 96 GB ECC GDDR7\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eLlama 3.3 70B bf16 per-card (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e15-25 tok\/s single-stream, 60-90 tok\/s batch -- expected improvement from Gen5 host-side memory path in streaming batch workloads vs Gen4 host\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eGen5 host-side advantage 
(single-card same silicon)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003ePCIe 5.0 x16 end-to-end reduces host-device transfer latency for streaming batch workloads; on-card compute-bound tasks see identical throughput to Gen4-hosted builds\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr style=\"border-bottom:1px solid #222\"\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eDual-card tensor-parallel 70B (community)\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003e~30-45 tok\/s single-stream expected\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003ctr\u003e\n\u003ctd style=\"padding:10px 12px;color:#ccc\"\u003eBlackwell fp8 native\u003c\/td\u003e\n\u003ctd style=\"padding:10px 12px;color:#fff\"\u003eDeepSeek-V3 fp8, Hunyuan-A13B fp8 run without bf16 upcast\u003c\/td\u003e\n\u003c\/tr\u003e\n\u003c\/tbody\u003e\n\u003c\/table\u003e\n\u003cp style=\"margin:12px 0 0 0;font-size:13px;color:#666\"\u003ePublished external references, not measured on Kentino hardware. 
Kentino will publish first-party numbers after the first customer build.\u003c\/p\u003e\n\u003c\/div\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eNot ideal for\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eVery high concurrency multi-tenant serving -- 4x L40 or 6x L4 distributes better across more cards\u003c\/li\u003e\n\u003cli\u003eHeavy KV cache at very long context -- step up to K-AI 576 Genoa RTXPro6000 12000TOPS\u003c\/li\u003e\n\u003cli\u003eTraining -- Kentino does not sell H-class NVLink fabrics\u003c\/li\u003e\n\u003cli\u003eBudget inference at this VRAM pool -- the 4U Rome K-AI 192 RTXPro6000 4000TOPS build is lower-cost if Gen4 host-side is acceptable and PSU redundancy is not required\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eWarranty and lead time\u003c\/h2\u003e\n\u003cdiv style=\"display:flex;gap:16px;flex-wrap:wrap;margin-bottom:24px\"\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e2 years\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003eparts warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e1 year\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elabor warranty\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cdiv style=\"flex:1;min-width:150px;background:#f4f4f4;border-radius:8px;padding:20px;text-align:center\"\u003e\n\u003cdiv 
style=\"font-size:24px;font-weight:800;color:#0d0d0d\"\u003e14-21 days\u003c\/div\u003e\n\u003cdiv style=\"font-size:13px;color:#666;text-transform:uppercase;letter-spacing:1px\"\u003elead time\u003c\/div\u003e\n\u003c\/div\u003e\n\u003c\/div\u003e\n\u003cp style=\"font-size:14px;color:#666\"\u003eNVIDIA OEM 3-year warranty on RTX Pro 6000 Server Edition + 36-month chassis warranty + Kentino integration warranty. Build includes assembly, BIOS\/firmware configuration, IPMI setup, driver install, burn-in testing, and functional verification. Lead time of 14-21 business days reflects reseller order for Turin-class components; confirmed at order placement.\u003c\/p\u003e\n\n\u003ch2 style=\"font-size:22px;font-weight:700;margin:40px 0 16px 0;padding-bottom:8px;border-bottom:3px solid #2563eb\"\u003eRecommended add-ons\u003c\/h2\u003e\n\u003cul style=\"font-size:15px;color:#333;line-height:1.8\"\u003e\n\u003cli\u003eExpand to 4-card configuration -- chassis has 4 GPU bays natively (current build uses 2 of 4), upgrade path to K-AI 384 Turin2U RTXPro6000 8000TOPS\u003c\/li\u003e\n\u003cli\u003eAdd 25 GbE or 100 GbE via OCP 3.0 slot (Mellanox ConnectX-5\/6 OCP variant)\u003c\/li\u003e\n\u003cli\u003eAdditional Kioxia CD8-P NVMe in the 2 remaining U.2 bays for RAID or scratch storage\u003c\/li\u003e\n\u003cli\u003eUpgrade storage tier to Samsung PM1743 or Kioxia CM7-V for higher endurance (3 DWPD)\u003c\/li\u003e\n\u003cli\u003e24U rack cabinet + online UPS 5 kVA\u003c\/li\u003e\n\u003c\/ul\u003e\n\n\u003c\/div\u003e\n","brand":"Kentino s.r.o.","offers":[{"title":"Default 
Title","offer_id":52942435975496,"sku":null,"price":56600.0,"currency_code":"EUR","in_stock":true}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/files\/kentino-ai-server-4-gpu-topdown_6b2c51b2-25c1-479d-929a-29eebe60e5ef.jpg?v=1776940959"}],"thumbnail_url":"\/\/cdn.shopify.com\/s\/files\/1\/0843\/5479\/3800\/collections\/collection-ai-servers.jpg?v=1749222353","url":"https:\/\/kentino.com\/fi\/collections\/ai-servers.oembed?page=2","provider":"Kentino","version":"1.0","type":"link"}