Kentino s.r.o.

Kentino AI 192 Rome ArcProB70 TBD — 6× Intel Arc Pro B70 — EPYC Milan (Pre-Order)

Price: 20793.00 EUR
Availability: InStock



Shipping costs are calculated at checkout.


IN PREPARATION

Pre-order — Intel Arc Pro B70 shipping target Q3 2026

Kentino AI 192 Rome ArcProB70 TBD

192 GB VRAM Intel Xe2 Inference Server
6x Arc Pro B70 | EPYC Milan | TOPS TBD

TBD INT8 TOPS | 192 GB VRAM pool | Intel Xe2 Battlemage | 6-card OpenVINO / SYCL

Budget-oriented, high-VRAM build targeting Intel's open-source inference stack. Final pricing is locked once Intel confirms card availability.

A 4U rack-mount inference server with six Intel Arc Pro B70 Creator cards (32 GB Xe2-HPG "Battlemage" each, 192 GB aggregate), one AMD EPYC 7643 Milan CPU (48C/96T), 384 GB DDR4 ECC, 2 TB NVMe boot, and a 2 kW ATX PSU (dual-PSU upgrade strongly recommended). Built for the Intel software ecosystem: OpenVINO 2025+, IPEX-LLM, llama.cpp SYCL backend, and vLLM-Intel forks. CUDA-only workloads do not run on this hardware.

Hardware

Component	Detail
GPUs	6x Intel Arc Pro B70 Creator 32 GB (Xe2-HPG "Battlemage", 250 W, PCIe 5.0 x16, dual-slot)
VRAM pool	192 GB aggregate across 6 cards (no inter-card fabric — peer traffic over PCIe)
CPU	AMD EPYC 7643 Milan (48C/96T, 225 W, 128x PCIe 4.0 lanes)
Motherboard	ASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)
System RAM	384 GB DDR4-2666 ECC RDIMM (6x 64 GB)
Boot / storage	2 TB NVMe M.2 (PCIe 4.0 x4)
Power supply	1x 2 kW ATX PSU (dual 2 kW synced upgrade strongly recommended)
Chassis	4U rack-mount (6-slot layout)
Cooling	SP3 tower cooler (Arctic Freezer 4U-M) + front-to-back directed airflow (industrial fans)
Network	Onboard dual 10 GbE (Intel X550)

Power envelope

GPU draw: 6 x 250 W = 1 500 W (Intel-published TDP)
System total at full load: ~1 825 W
PSU total: 2 000 W (single) — only 8.75 % headroom
Dual 2 kW synced strongly recommended: 4 000 W combined capacity leaves ~54 % headroom (~46 % load)
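
The headroom figures follow from simple arithmetic on the numbers quoted above. The sketch below reproduces them, defining headroom as the unused share of PSU capacity and assuming that two synced 2 kW units contribute their full combined 4 000 W (an assumption; real load sharing may be less ideal). Under that definition the dual-PSU configuration runs at roughly 46 % load, which leaves roughly 54 % of capacity unused.

```python
# Power-headroom arithmetic using the figures quoted in the listing.
GPU_COUNT = 6
GPU_TDP_W = 250        # per-card TDP as quoted
SYSTEM_LOAD_W = 1825   # quoted full-load system total
PSU_W = 2000           # one 2 kW unit


def headroom_pct(capacity_w: float, load_w: float) -> float:
    """Unused share of PSU capacity, in percent."""
    return (capacity_w - load_w) / capacity_w * 100


gpu_draw_w = GPU_COUNT * GPU_TDP_W
single = headroom_pct(PSU_W, SYSTEM_LOAD_W)
# Assumption: two synced units add up to their full combined rating.
dual = headroom_pct(2 * PSU_W, SYSTEM_LOAD_W)

print(f"GPU draw: {gpu_draw_w} W")
print(f"Single PSU headroom: {single:.2f} %")
print(f"Dual PSU headroom: {dual:.1f} %")
```

The same function can be rerun with a measured wall-power figure once a unit is on the bench, since the 1 825 W total is an estimate rather than a measurement.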

Lane topology

ROMED8-2T provides 7x PCIe 4.0 x16 slots. Six slots are populated; one stays free for an add-in NIC. Arc Pro B70 is PCIe Gen5 native but runs at Gen4 on the ROMED8-2T; for inference with weights resident in each card's 32 GB, the bandwidth impact is negligible. No PCIe switch. No Xe-Link equivalent.
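
To put the Gen4 versus Gen5 difference in numbers, the following sketch computes the theoretical one-direction bandwidth of an x16 link from the standard per-lane transfer rates (16 GT/s for Gen4, 32 GT/s for Gen5, both with 128b/130b encoding) and the resulting lower bound on the time to fill one card's 32 GB. It also tallies the lane budget from the hardware table. The figures are theoretical ceilings, not measurements; packet overhead and host-side storage speed will make real transfers slower.

```python
# Theoretical PCIe link bandwidth, one direction, accounting only for
# 128b/130b line encoding (used by PCIe Gen3 through Gen5).
GT_PER_S = {"gen4": 16.0, "gen5": 32.0}  # raw transfer rate per lane
ENCODING = 128 / 130


def link_gb_per_s(gen: str, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth of a PCIe link in GB/s."""
    return GT_PER_S[gen] * ENCODING / 8 * lanes


def load_seconds(gigabytes: float, gen: str) -> float:
    """Lower bound on the time to push `gigabytes` across one x16 link."""
    return gigabytes / link_gb_per_s(gen)


CPU_LANES = 128      # EPYC 7643 lane count from the hardware table
lanes_used = 6 * 16  # six GPUs at x16 each

for gen in ("gen4", "gen5"):
    print(gen, round(link_gb_per_s(gen), 1), "GB/s,",
          round(load_seconds(32, gen), 2), "s to fill 32 GB")
print("GPU lanes used:", lanes_used, "of", CPU_LANES)
```

Once weights are resident, the link mainly carries model loads and peer traffic between cards, so the Gen4 ceiling matters most for tensor-parallel configurations that exchange activations over PCIe.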

What you can run

All compatibility claims are Intel-software-stack paths (OpenVINO, IPEX-LLM, llama.cpp SYCL, vLLM-Intel). CUDA-only workloads do not run on this hardware. All figures cite published external sources and are subject to independent verification when cards ship.

LLMs — text / reasoning / coding

Chinese frontier

Qwen3 / Qwen3.5 (Alibaba): Qwen3-235B-A22B Q4 (~132 GB) with long context headroom; Qwen3-Coder-480B-A35B Q2 (~160 GB); Qwen3.5-397B-A17B Q3 (~170 GB)
GLM / Z.ai: GLM-4.5 / 4.6 / 4.7 Q4 (~177 GB) — fits with moderate KV
Tencent Hunyuan: Hunyuan-Large Q3 (~160 GB); Hunyuan-A13B fp8 (~80 GB) if Xe2 fp8 path exposed in driver
Others: Baidu ERNIE-4.5-424B Q3 (~180 GB); MiniMax-M1 Q3 (~180 GB); DeepSeek-R2 32B (6x concurrent streams)

Western frontier

Meta Llama: Llama 3.3 70B Q6-Q8 with generous KV; Llama 4 Scout 109B/17B Q4 (~63 GB) comfortable
Mistral: Mistral Small 3 / Magistral Small / Devstral Small 2 (24B) at bf16; Pixtral Large Q4-Q6
OpenAI (open weights): gpt-oss-120b MXFP4 native (~80 GB) — if MXFP4 dequant available in Intel stack
NVIDIA Nemotron: Llama-3.1-Nemotron Ultra 253B Q4 (~120 GB)
Others: Gemma 3 27B bf16 multimodal; Phi-4 / Phi-4-reasoning 14B; Cohere Command R+ 104B Q4
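
As a rough way to read the footprints quoted in the lists above, the sketch below subtracts each approximate weight size from the 192 GB pool and reports how much is left for KV cache, activations, and runtime overhead. The sizes are the approximate values from this page; the `fit_report` helper and the even 32 GB-per-card split are illustrative assumptions, since real runtimes rarely shard weights perfectly evenly.

```python
import math

POOL_GB = 192  # aggregate VRAM across six cards
CARD_GB = 32   # VRAM per card

# Approximate weight footprints in GB, as quoted on this page.
MODELS = {
    "Qwen3-235B-A22B Q4": 132,
    "Qwen3-Coder-480B-A35B Q2": 160,
    "Qwen3.5-397B-A17B Q3": 170,
    "GLM-4.5 / 4.6 / 4.7 Q4": 177,
    "Hunyuan-Large Q3": 160,
    "ERNIE-4.5-424B Q3": 180,
    "MiniMax-M1 Q3": 180,
    "Llama 4 Scout Q4": 63,
    "gpt-oss-120b MXFP4": 80,
    "Llama-3.1-Nemotron Ultra 253B Q4": 120,
}


def fit_report(weights_gb: float) -> dict:
    """Minimum card count and VRAM left over in the full pool (hypothetical helper)."""
    leftover = POOL_GB - weights_gb
    return {
        "min_cards": math.ceil(weights_gb / CARD_GB),
        "leftover_gb": leftover,
        "leftover_pct": round(leftover / POOL_GB * 100, 1),
    }


for name, size in sorted(MODELS.items(), key=lambda kv: kv[1]):
    print(f"{name:36s} {size:4d} GB -> {fit_report(size)}")
```

Models that leave only 12 to 15 GB across six cards will be limited to short contexts or small batch sizes, which is the practical meaning of "moderate KV" in the lists.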

Vision-Language Models

Qwen3-VL-8B / 32B; Qwen3-VL-30B-A3B MoE; InternVL3 up to 78B; InternVL3.5-38B; Llama 3.2 90B Vision Q4; Pixtral 12B; Molmo 72B Q4; Gemma 3 12B/27B multimodal; MiniCPM-V 2.6 / MiniCPM-o 2.6. Intel's OpenVINO has strong vision-tower support — VLM is a plausible day-one strength.

Image generation

FLUX.1 [dev] / [schnell] fp8 or Q4 GGUF via stable-diffusion.cpp (SYCL backend); SDXL / SD 3.5 Large via OpenVINO GenAI runtime; HunyuanDiT; HunyuanImage-2.1 bf16 (~34 GB); Kolors 2.0; AuraFlow; OmniGen; PixArt-Sigma.

Video generation

Wan 2.2 T2V-A14B / I2V-A14B MoE (~54 GB bf16); Wan 2.2 TI2V-5B; HunyuanVideo 13B bf16; HunyuanVideo 1.5; CogVideoX-5B; Open-Sora 2.0; LTX-Video; Pyramid Flow; Mochi-1 Q4. Video is the weakest Intel path today — expect functional but not throughput-optimal at ship time.

Audio / Speech / TTS

ASR: Whisper v3 large / turbo via OpenVINO (first-class Intel Whisper support); Parakeet-TDT; Canary; SenseVoice
TTS: CosyVoice 2/3; Kokoro 82M; Stable Audio Open; XTTS v2; StyleTTS 2; Step-Audio-EditX
Realtime / S2S: Kyutai Moshi; MusicGen / AudioGen / Bark; SeamlessM4T v2

Multi-model / multi-tenant serving

6 concurrent streams of a 32B Q4 model (one per card), e.g. 6x Qwen3-32B Q4 agents
Embedding-fleet at scale — 6x parallel BGE-M3 / E5 / Nomic Embed streams (OpenVINO-optimized)
Mixed residency — 70B Q4 (tensor-parallel over 3 cards) + FLUX.1 (1 card) + Whisper-turbo (1 card) + Moshi (1 card)
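
The mixed-residency layout above can be sketched as a simple card-assignment plan. The example hands each workload a contiguous run of card indices and emits a `ZE_AFFINITY_MASK` value per worker process, the Level Zero environment variable that restricts which devices a process can see. The workload names and the `assign_cards` helper are hypothetical, and whether a given serving runtime honors the mask as expected is an assumption to verify on real hardware.

```python
CARD_COUNT = 6


def assign_cards(plan, card_count=CARD_COUNT):
    """Give each workload a contiguous run of card indices, in plan order."""
    assignments = {}
    next_card = 0
    for name, cards_needed in plan:
        if next_card + cards_needed > card_count:
            raise ValueError(
                f"{name} needs {cards_needed} cards; "
                f"only {card_count - next_card} left"
            )
        ids = range(next_card, next_card + cards_needed)
        # Environment for the worker process that serves this workload.
        assignments[name] = {"ZE_AFFINITY_MASK": ",".join(map(str, ids))}
        next_card += cards_needed
    return assignments


# Mixed residency from the list above: 3 + 1 + 1 + 1 = 6 cards.
MIXED = [("llm-70b-q4", 3), ("flux1", 1), ("whisper-turbo", 1), ("moshi", 1)]

for name, env in assign_cards(MIXED).items():
    print(name, env)
```

Each worker would then be launched with its own environment, so the 70B model sees three devices and every other service sees exactly one.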

Target workloads

Intel-software evaluation pilot for CUDA-alternative LLM serving
Embedding / reranker backend where VRAM-per-EUR dominates throughput requirements
Budget Q4 frontier-MoE inference (Qwen3-235B, GLM-4.5/4.6/4.7) for small internal dev teams
OpenVINO-native model deployment alongside existing Intel Xeon / Arc Pro pipelines
VLM / OCR / document-processing backend (Intel's OpenVINO strength)

Measured performance

Intel-published specs | Subject to independent verification when cards ship

Spec	Value
VRAM per card	32 GB GDDR6
Memory bandwidth class	~450 GB/s per card
Xe Matrix Extensions (XMX)	Accelerated via OpenVINO / IPEX-LLM
fp8 path	Xe2 silicon — verify driver exposure at ship time

No Kentino measured data. Intel-published specs subject to independent verification. Kentino will publish first-party tok/s / QPS / bandwidth numbers once the first unit passes burn-in.
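
Until measured numbers exist, the memory-bandwidth figure in the table gives only a theoretical ceiling. A common rule of thumb for single-stream decoding is that every generated token reads all active weights once, so the token rate cannot exceed bandwidth divided by the active weight size. The sketch below applies that rule to the ~450 GB/s class figure; the 20 GB footprint is a hypothetical example, and the result ignores KV-cache reads, compute limits, and multi-card communication, so real throughput will be lower.

```python
BANDWIDTH_GB_S = 450  # "~450 GB/s per card" from the spec table


def decode_ceiling_tok_s(active_weights_gb: float,
                         bandwidth_gb_s: float = BANDWIDTH_GB_S) -> float:
    """Upper bound on single-stream decode rate for a bandwidth-bound model.

    Assumes each token reads all active weights exactly once.
    """
    return bandwidth_gb_s / active_weights_gb


# Hypothetical example: a dense model with about 20 GB of quantized
# weights held entirely on one card.
print(decode_ceiling_tok_s(20))
```

This is a sanity bound for capacity planning, not a performance claim; first-party tok/s figures remain pending burn-in as stated above.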

Not ideal for

CUDA-native workloads — no CUDA on Intel, expect migration friction
Production SLA-critical deployments until Intel Arc Pro supply and tooling stabilize
Frontier 600B+ MoE at Q4+ (requires 6x RTX Pro 6000 / 576 GB pool)
Training workloads: Arc Pro is inference-first, and framework maturity for distributed training is limited
Customers who require measured benchmarks before purchase — this SKU is pre-order

Warranty and lead time

2 years parts warranty | 1 year labor warranty | Q3 2026 target shipping

Kentino standard warranty (2 years parts, 1 year labor); Intel distribution terms supersede where stricter. Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Reserve your first-wave delivery slot via the Kentino contact form. 30-day price-commit window at order.

Recommended add-ons

Dual 2 kW synced PSU upgrade (single-PSU headroom is tight at 1 825 W draw — strongly recommended)
Upgrade RAM to 512 GB DDR4 (add 2x 64 GB in the two open DIMM slots)
4 TB NVMe secondary drive for model library