Kentino s.r.o.
K-AI 192 Rome ArcProB70 TBD — 6× Intel Arc Pro B70 — EPYC Milan (Pre-Order)
K-AI 192 Rome ArcProB70 TBD — 6× Intel Arc Pro B70 — EPYC Milan (Pre-Order)
No se pudo cargar la disponibilidad de retiro
IN PREPARATION
Pre-order — Intel Arc Pro B70 shipping target Q3 2026
K-AI 192 Rome ArcProB70 TBD
192 GB VRAM Intel Xe2 Inference Server
6x Arc Pro B70 | EPYC Milan | TOPS TBD
Budget-oriented high-VRAM build targeting the Intel open-source inference stack. Pricing locked at Intel availability.
A 4U rack-mount inference server with six Intel Arc Pro B70 Creator cards (32 GB Xe2-HPG "Battlemage" each, 192 GB aggregate), one AMD EPYC 7643 Milan CPU (48C/96T), 384 GB DDR4 ECC, 2 TB NVMe boot, and a 2 kW ATX PSU (dual-PSU upgrade strongly recommended). Built for the Intel software ecosystem: OpenVINO 2025+, IPEX-LLM, llama.cpp SYCL backend, and vLLM-Intel forks. CUDA-only workloads do not run on this hardware.
Hardware
| Component | Detail |
|---|---|
| GPUs | 6x Intel Arc Pro B70 Creator 32 GB (Xe2-HPG "Battlemage", 250 W, PCIe 5.0 x16, dual-slot) |
| VRAM pool | 192 GB aggregate across 6 cards (no inter-card fabric — peer traffic over PCIe) |
| CPU | AMD EPYC 7643 Milan (48C/96T, 225 W, 128x PCIe 4.0 lanes) |
| Motherboard | ASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI) |
| System RAM | 384 GB DDR4-2666 ECC RDIMM (6x 64 GB) |
| Boot / storage | 2 TB NVMe M.2 (PCIe 4.0 x4) |
| Power supply | 1x 2 kW ATX PSU (dual 2 kW synced upgrade strongly recommended) |
| Chassis | 4U rack-mount (6-slot layout) |
| Cooling | SP3 tower cooler (Arctic Freezer 4U-M) + front-to-back directed airflow (industrial fans) |
| Network | Onboard dual 10 GbE (Intel X550) |
Power envelope
- GPU draw: 6 x 250 W = 1 500 W (Intel-published TDP)
- System total at full load: ~1 825 W
- PSU total: 2 000 W (single) — only 8.75 % headroom
- Dual 2 kW synced strongly recommended — restores ~45 % headroom
Lane topology
ROMED8-2T provides 7x PCIe 4.0 x16 lanes. Six slots populated; one free for NIC upsell. Arc Pro B70 is PCIe Gen5 native; ROMED8-2T runs at Gen4 — bandwidth impact negligible for inference at 32 GB per card. No PCIe switch. No Xe-Link equivalent.
What you can run
All compatibility claims are Intel-software-stack paths (OpenVINO, IPEX-LLM, llama.cpp SYCL, vLLM-Intel). CUDA-only workloads do not run on this hardware. All figures cite published external sources and are subject to independent verification when cards ship.
LLMs — text / reasoning / coding
Chinese frontier
- Qwen3 / Qwen3.5 (Alibaba): Qwen3-235B-A22B Q4 (~132 GB) with long context headroom; Qwen3-Coder-480B-A35B Q2 (~160 GB); Qwen3.5-397B-A17B Q3 (~170 GB)
- GLM / Z.ai: GLM-4.5 / 4.6 / 4.7 Q4 (~177 GB) — fits with moderate KV
- Tencent Hunyuan: Hunyuan-Large Q3 (~160 GB); Hunyuan-A13B fp8 (~80 GB) if Xe2 fp8 path exposed in driver
- Others: Baidu ERNIE-4.5-424B Q3 (~180 GB); MiniMax-M1 Q3 (~180 GB); DeepSeek-R2 32B (6x concurrent streams)
Western frontier
- Meta Llama: Llama 3.3 70B Q6-Q8 with generous KV; Llama 4 Scout 109B/17B Q4 (~63 GB) comfortable
- Mistral: Mistral Small 3 / Magistral Small / Devstral Small 2 (24B) at bf16; Pixtral Large Q4-Q6
- OpenAI (open weights): gpt-oss-120b MXFP4 native (~80 GB) — if MXFP4 dequant available in Intel stack
- NVIDIA Nemotron: Llama-3.1-Nemotron Ultra 253B Q4 (~120 GB)
- Others: Gemma 3 27B bf16 multimodal; Phi-4 / Phi-4-reasoning 14B; Cohere Command R+ 104B Q4
Vision-Language Models
Qwen3-VL-8B / 32B; Qwen3-VL-30B-A3B MoE; InternVL3 up to 78B; InternVL3.5-38B; Llama 3.2 90B Vision Q4; Pixtral 12B; Molmo 72B Q4; Gemma 3 12B/27B multimodal; MiniCPM-V 2.6 / MiniCPM-o 2.6. Intel's OpenVINO has strong vision-tower support — VLM is a plausible day-one strength.
Image generation
FLUX.1 [dev] / [schnell] fp8 or Q4 GGUF via llama.cpp SYCL; SDXL / SD 3.5 Large via OpenVINO genAI runtime; HunyuanDiT; HunyuanImage-2.1 bf16 (~34 GB); Kolors 2.0; AuraFlow; OmniGen; PixArt-Sigma.
Video generation
Wan 2.2 T2V-A14B / I2V-A14B MoE (~54 GB bf16); Wan 2.2 TI2V-5B; HunyuanVideo 13B bf16; HunyuanVideo 1.5; CogVideoX-5B; Open-Sora 2.0; LTX-Video; Pyramid Flow; Mochi-1 Q4. Video is the weakest Intel path today — expect functional but not throughput-optimal at ship time.
Audio / Speech / TTS
- ASR: Whisper v3 large / turbo via OpenVINO (first-class Intel Whisper support); Parakeet-TDT; Canary; SenseVoice
- TTS: CosyVoice 2/3; Kokoro 82M; Stable Audio Open; XTTS v2; StyleTTS 2; Step-Audio-EditX
- Realtime / S2S: Kyutai Moshi; MusicGen / AudioGen / Bark; SeamlessM4T v2
Multi-model / multi-tenant serving
- 6 concurrent streams of a 32 GB Q4 model (one per card) — e.g. 6x Qwen3-32B Q4 agents
- Embedding-fleet at scale — 6x parallel BGE-M3 / E5 / Nomic Embed streams (OpenVINO-optimized)
- Mixed residency — 70B Q4 (tensor-parallel over 3 cards) + FLUX.1 (1 card) + Whisper-turbo (1 card) + Moshi (1 card)
Target workloads
- Intel-software evaluation pilot for CUDA-alternative LLM serving
- Embedding / reranker backend where VRAM-per-EUR dominates throughput requirements
- Budget Q4 frontier-MoE inference (Qwen3-235B, GLM-4.5/4.6/4.7) for small internal dev teams
- OpenVINO-native model deployment alongside existing Intel Xeon / Arc Pro pipelines
- VLM / OCR / document-processing backend (Intel's OpenVINO strength)
Measured performance
Intel-published specs | Subject to independent verification when cards ship
| Spec | Value |
|---|---|
| VRAM per card | 32 GB GDDR6 |
| Memory bandwidth class | ~450 GB/s per card |
| Xe Matrix Extensions (XMX) | Accelerated via OpenVINO / IPEX-LLM |
| fp8 path | Xe2 silicon — verify driver exposure at ship time |
No Kentino measured data. Intel-published specs subject to independent verification. Kentino will publish first-party tok/s / QPS / bandwidth numbers once the first unit passes burn-in.
Not ideal for
- CUDA-native workloads — no CUDA on Intel, expect migration friction
- Production SLA-critical deployments until Intel Arc Pro supply and tooling stabilize
- Frontier 600B+ MoE at Q4+ (requires 6x RTX Pro 6000 / 576 GB pool)
- Training workloads — Arc Pro is inference-first, framework maturity for distributed training is limited
- Customers who require measured benchmarks before purchase — this SKU is pre-order
Warranty and lead time
Kentino standard warranty (2 years parts, 1 year labor); Intel distribution terms supersede where stricter. Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Reserve your first-wave delivery slot via the Kentino contact form. 30-day price-commit window at order.
Recommended add-ons
- Dual 2 kW synced PSU upgrade (single-PSU headroom is tight at 1 825 W draw — strongly recommended)
- Upgrade RAM to 512 GB DDR4 (2x 64 GB — two slots open)
- 4 TB NVMe secondary drive for model library
Share
