Kentino s.r.o.

K-AI 48 Rome 4090 1322TOPS — 2x RTX 4090 Entry AI Server

Name: K-AI 48 Rome 4090 1322TOPS — 2x RTX 4090 Entry AI Server
Brand: Kentino s.r.o.
Price: 12050.00 EUR
Availability: InStock



Shipping costs are calculated at checkout.


K-AI 48 Rome 4090 1322TOPS

48 GB VRAM Entry 2-GPU Server
2x RTX 4090 | EPYC Rome | 1 322 TOPS INT8

1 322 TOPS INT8 | 48 GB VRAM pool | 2 GPU tensor parallel | Rack ready

48 GB VRAM pool across two RTX 4090s: the cost floor for 32B-class tensor-parallel inference.

A two-GPU Ada workstation-class AI server built on the ASRock Rack ROMED8-2T and EPYC Rome. Two RTX 4090s give a 48 GB pooled VRAM envelope that comfortably runs 32B dense models at Q6-Q8, Wan 2.1 14B video and Pixtral 12B vision, and fits Hunyuan-A13B at Q4. It is the best all-round model selection per euro that the Kentino lineup offers before stepping up to Blackwell.

Hardware

Component	Detail
GPUs	2x NVIDIA GeForce RTX 4090 24 GB GDDR6X (450 W, PCIe 4.0 x16)
VRAM pool	48 GB (no NVLink — tensor-parallel over PCIe)
CPU	AMD EPYC 7542 Rome (32C/64T, 225 W, 128x PCIe 4.0 lanes)
Motherboard	ASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)
System RAM	128 GB DDR4-2666 ECC RDIMM (2x 64 GB)
Boot / storage	1 TB NVMe M.2 (PCIe 4.0 x4)
Power supply	Single 2 kW ATX PSU
Chassis	4U rack-mount, passive Gen4 x16 risers
Cooling	SP3 tower cooler, 3x 120 mm front intake + 1x 120 mm rear exhaust
Network	Onboard dual 10 GbE (Intel X550) + IPMI

Power envelope

GPU draw: 2 x 450 W = 900 W
System total at full load: ~1 225 W
PSU total: 2 000 W (single 2 kW ATX), leaving 775 W (38.75 %) headroom at full load
Comfortable single-PSU margin
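The headroom figure follows directly from the numbers listed above. A minimal sketch of that arithmetic, using only the nominal values on this page (board power and the quoted system total, not a measured draw):

```python
# Power budget from the nominal figures listed above (not measurements).
GPU_COUNT = 2
GPU_TDP_W = 450          # RTX 4090 board power per card
SYSTEM_LOAD_W = 1225     # quoted full-load system total, GPUs included
PSU_W = 2000             # single 2 kW ATX PSU

gpu_draw_w = GPU_COUNT * GPU_TDP_W
rest_of_system_w = SYSTEM_LOAD_W - gpu_draw_w   # CPU, board, drives, fans
headroom_w = PSU_W - SYSTEM_LOAD_W
headroom_pct = 100 * headroom_w / PSU_W

print(f"GPU draw: {gpu_draw_w} W")
print(f"Rest of system: {rest_of_system_w} W")
print(f"Headroom: {headroom_w} W ({headroom_pct:.2f} %)")
```

Transient GPU power spikes can briefly exceed board power, which is one reason to keep this kind of margin rather than sizing the PSU to the steady-state total.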

Lane topology

The ROMED8-2T routes both x16 slots directly from the CPU root complex, with no PLX switch in between. The consumer RTX 4090 has no NVLink, so tensor-parallel traffic travels over PCIe. Both GPUs run at PCIe Gen4 x16.
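Because inter-GPU traffic goes over PCIe, the Gen4 x16 link rate sets the ceiling for tensor-parallel exchanges. A rough sketch of the theoretical per-direction bandwidth from the PCIe 4.0 signalling constants (16 GT/s per lane, 128b/130b encoding). The 4096-token x 8192-wide fp16 tensor is a hypothetical example sized like a 70B-class hidden state; real throughput is lower once protocol overhead is included:

```python
# Theoretical PCIe 4.0 x16 bandwidth per direction, from the spec constants.
GT_PER_S = 16e9            # PCIe 4.0 signalling rate per lane (transfers/s)
ENCODING = 128 / 130       # 128b/130b line encoding efficiency
LANES = 16

def link_bandwidth_bytes_per_s(gt_per_s: float, lanes: int) -> float:
    """Peak bandwidth per direction, before packet/protocol overhead."""
    # One transfer carries one bit per lane; divide by 8 for bytes.
    return gt_per_s * ENCODING * lanes / 8

bw = link_bandwidth_bytes_per_s(GT_PER_S, LANES)
print(f"PCIe 4.0 x16: {bw / 1e9:.1f} GB/s per direction")

# Hypothetical example: an fp16 activation tensor of 4096 tokens x 8192 hidden
# units, roughly what one tensor-parallel exchange might move during prefill.
tensor_bytes = 4096 * 8192 * 2
print(f"{tensor_bytes / 2**20:.0f} MiB -> {1e3 * tensor_bytes / bw:.2f} ms at peak")
```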

What you can run

With 48 GB of pooled VRAM across 2 cards, this server handles 32B-class dense LLMs at Q6-Q8, mid-size MoE models, image and video generation, speech AI, and multi-tenant serving.

LLMs — text / reasoning / coding

Chinese frontier

Qwen3-32B dense Q6-Q8 (~25-35 tok/s single-stream on 2x 4090, published reference); QwQ-32B Q6; Qwen3.5-27B Q6-Q8
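Qwen3-30B-A3B / Qwen3-Coder-30B-A3B at Q6-Q8 (~25-32 GB); bf16 needs ~61 GB and does not fit the pool
Hunyuan-A13B (80B total / 13B active MoE, 256k ctx) at Q4 (~48 GB, at the limit of the pool); Q6 (~66 GB) and fp8 (~80 GB) do not fit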
Seed-OSS-36B Q6 — 512k native ctx
DeepSeek-R2 32B sparse MoE at Q6 (~45 GB); bf16 (~64 GB) does not fit (~30-40 tok/s single-stream at Q4, published reference)
ERNIE-4.5-47B-A3B Q4 (~28 GB with headroom) / Q6 (~42 GB)
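The GB figures above are, to first order, parameter count times bits per weight. A rough weights-only estimator, assuming the effective rates of the llama.cpp-style formats (Q6_K at 6.5625 bits, Q8_0 at 8.5 bits per weight) and rounded public parameter counts; KV cache, activations and CUDA context come on top, so anything close to 48 GB is already too large in practice:

```python
# Weights-only VRAM estimate: parameters (billions) x bits-per-weight / 8 = GB.
# Bit rates and parameter counts are approximations (assumption).
BPW = {"Q6_K": 6.5625, "Q8_0": 8.5, "fp8": 8.0, "bf16": 16.0}
POOL_GB = 48.0   # 2x RTX 4090

def weight_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BPW[quant] / 8

for name, params, quant in [
    ("Qwen3-32B", 32.8, "Q6_K"),
    ("Qwen3-32B", 32.8, "Q8_0"),
    ("Qwen3-30B-A3B", 30.5, "bf16"),
    ("Seed-OSS-36B", 36.0, "Q6_K"),
]:
    gb = weight_gb(params, quant)
    verdict = "fits" if gb < POOL_GB else "exceeds pool"
    print(f"{name:15s} {quant:5s} ~{gb:5.1f} GB  {verdict}")
```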

Western frontier

Llama 3.3 70B Q4_K_M (~43 GB) tensor-parallel 2-way — the sweet spot of this class (~14-17 tok/s single-stream on 2x 4090, published reference)
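Llama 4 Scout 109B/17B MoE Q3_K (~51 GB; exceeds the 48 GB pool, so part of the model must be offloaded to system RAM)
Mistral Small 3 / Magistral Small / Devstral Small 2 (24B) at Q8 (~25 GB); bf16 (~48 GB) fills the whole pool and leaves no room for KV cache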
Mixtral 8x7B Q6
Gemma 3 27B Q8 (~29 GB; bf16 needs ~54 GB); Phi-4 14B bf16
Nemotron-Super 49B Q4 (~28 GB)
Others: OLMo 2 32B; Reka Flash 3 21B bf16; Falcon H1R 7B
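Fitting the weights is only half of the budget: whatever VRAM is left over holds the KV cache, which caps total context. A sketch for Llama 3.3 70B at Q4_K_M split across both cards, using the published Llama 3.x 70B shape (80 layers, 8 KV heads, head dimension 128) and an fp16 cache. The 1 GB per-card runtime overhead is an assumption, and GB is treated as 1e9 bytes for simplicity:

```python
# Hypothetical VRAM budget for Llama 3.3 70B Q4_K_M split 2-way over two 24 GB cards.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128   # Llama 3.x 70B config (GQA)
KV_BYTES = 2                              # fp16 KV cache entries
WEIGHTS_GB = 43.0                         # Q4_K_M figure quoted above
CARD_GB, CARDS = 24.0, 2
OVERHEAD_GB_PER_CARD = 1.0                # assumed CUDA context / scratch buffers

# Factor 2 for K and V, stored per layer, per KV head, per head dimension.
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES

free_gb = CARDS * (CARD_GB - OVERHEAD_GB_PER_CARD) - WEIGHTS_GB
max_tokens = int(free_gb * 1e9 // kv_bytes_per_token)

print(f"KV cache: {kv_bytes_per_token / 1024:.0f} KiB per token")
print(f"Weights per card: {WEIGHTS_GB / CARDS:.1f} GB")
print(f"Free for KV cache: {free_gb:.1f} GB -> ~{max_tokens} tokens of context")
```

Under these assumptions only a few thousand tokens of context remain, shared across all concurrent sequences, which is why 70B Q4 is described as the sweet spot rather than a roomy fit.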

Vision-Language

Qwen3-VL-32B / Qwen3-VL-30B-A3B MoE / Qwen3-Omni-30B-A3B; InternVL3-38B Q4-Q5; InternVL3.5-38B; DeepSeek-VL2; ERNIE-4.5-VL-28B-A3B-Thinking; Llama 3.2 11B Vision bf16; Pixtral 12B bf16; Gemma 3 27B multimodal; PaliGemma 2 28B Q4; MiniCPM-V 2.6 / MiniCPM-o 2.6.

Image generation

FLUX.1 [dev] / [schnell] fp16 (24 GB) or fp8 (~12 GB) with generous batch (~15-25 seconds per 1024x1024 image at fp8 per card, published reference); FLUX.1 Kontext [dev]; SD 3.5 Large (18 GB fp16); SDXL 1.0 + ControlNet + AnimateDiff; HunyuanImage-2.1 bf16 (~34 GB fits in pool); AuraFlow v0.3 / OmniGen v1 / Kolors 2.0.
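Since FLUX.1 at fp8 (~12 GB) fits on one card, a batch pipeline can run one independent worker per GPU (data parallel) instead of splitting the model. A sketch of the resulting throughput from the 15-25 s per-image range quoted above, assuming perfectly linear scaling across the two cards:

```python
# Throughput with one independent FLUX.1 fp8 worker per card (data parallel).
# Assumes linear scaling and no queueing or I/O stalls.
CARDS = 2

def images_per_hour(seconds_per_image: float, cards: int = CARDS) -> float:
    """Images per hour when every card renders its own images concurrently."""
    return cards * 3600 / seconds_per_image

for s in (15, 25):
    print(f"{s} s/image/card -> {images_per_hour(s):.0f} images/hour")
```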

Video generation

Wan 2.1 14B T2V/I2V Q6/fp8; Wan 2.2 TI2V-5B bf16 single-card; Wan 2.2 T2V-A14B / I2V-A14B Q4 (~32 GB); HunyuanVideo 13B Q4-Q5 (~30 GB); HunyuanVideo 1.5 (8.3B) bf16; Open-Sora 2.0 (11B) Q8; CogVideoX-5B / 1.5 bf16; Mochi-1 Q4-Q8; LTX-Video 2B; Pyramid Flow 2B.

Audio / Speech / TTS

The full 24 GB-tier audio stack fits with room for concurrent use: Whisper v3 large, Parakeet-TDT, Canary 1B, Moshi, Step-Audio 2 mini, CosyVoice 3.0, Kokoro 82M and Stable Audio Open can all be resident simultaneously. Whisper v3 turbo runs at ~50x realtime on a single card (published reference).

Multi-model / multi-tenant

2-4 concurrent users on 32B Q6 class LLMs via vLLM tensor-parallel
Mixed workload: Qwen3-32B Q4_K_M (~20 GB) + FLUX.1 fp8 (~12 GB) + Whisper-turbo (1.6 GB) + Moshi (8 GB) resident across 2 cards
LoRA / QLoRA fine-tuning of 7-14B models comfortably, 24-32B tight
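The mixed workload only fits because the models can be spread over the two physical 24 GB cards without splitting any single model. One way to check such a placement is first-fit-decreasing bin packing. A minimal sketch using the sizes quoted above; the hard 24 GB per-card capacity ignores CUDA context overhead (assumption), and `place` is a hypothetical helper, not part of any serving framework:

```python
# First-fit-decreasing placement of whole models onto two 24 GB cards.
CARD_GB = 24.0
models = {"Qwen3-32B": 20.0, "FLUX.1 fp8": 12.0, "Whisper-turbo": 1.6, "Moshi": 8.0}

def place(models: dict[str, float], cards: int = 2, cap: float = CARD_GB):
    """Assign each model (largest first) to the first card with enough free VRAM."""
    used = [0.0] * cards
    placement = {i: [] for i in range(cards)}
    for name, gb in sorted(models.items(), key=lambda kv: kv[1], reverse=True):
        for i in range(cards):
            if used[i] + gb <= cap:
                used[i] += gb
                placement[i].append(name)
                break
        else:  # no card had room for this model
            raise ValueError(f"{name} ({gb} GB) does not fit on any card")
    return placement, used

placement, used = place(models)
for i in placement:
    print(f"GPU{i}: {placement[i]} -> {used[i]:.1f} / {CARD_GB:.0f} GB")
```

The result pairs the LLM with Whisper on one card and FLUX.1 with Moshi on the other, leaving a few GB per card for KV cache and activations.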

Target workloads

Two-operator AI workstation with mixed LLM + image + audio stacks
32B-class serving endpoint for small-team developer environment (4-8 concurrent users on Qwen3-32B / Gemma 3 27B)
Image generation pipeline (FLUX.1 + SD 3.5 + ControlNet) batch production
Video-gen development box (Wan 2.1 / Wan 2.2 TI2V / HunyuanVideo 1.5)
LoRA / QLoRA fine-tuning research box for 7-34B Chinese + Western weights

Published performance references

Published reference | 2x RTX 4090 comparable hardware

Benchmark	Result
Llama 3.3 70B Q4_K_M llama.cpp decode	~14-17 tok/s single-stream
Qwen3-32B Q6 vLLM single-stream	~35-45 tok/s decode
FLUX.1 [dev] fp8	~2.5-3.0 s per 1024x1024 at 20 steps
vLLM batch-32 aggregate (extrapolated from 4x4090)	~90 tok/s aggregate

Published reference points from comparable 2x4090 hardware. Not measured on Kentino hardware.
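Decode rates are easier to judge as wait times. A sketch converting the single-stream rates in the table into time to stream one reply; the 500-token reply length is an arbitrary example, and prompt prefill time is ignored:

```python
# Time to stream a reply at the quoted single-stream decode rates.
REPLY_TOKENS = 500   # arbitrary example reply length

def reply_seconds(tokens: int, tok_per_s: float) -> float:
    """Decode-only wall-clock time; prefill is not included."""
    return tokens / tok_per_s

rates = {
    "Llama 3.3 70B Q4_K_M (llama.cpp)": (14, 17),
    "Qwen3-32B Q6 (vLLM)": (35, 45),
}

for name, (lo, hi) in rates.items():
    fast, slow = reply_seconds(REPLY_TOKENS, hi), reply_seconds(REPLY_TOKENS, lo)
    print(f"{name}: {fast:.0f}-{slow:.0f} s per {REPLY_TOKENS}-token reply")
```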

Not ideal for

70B dense at Q6+ (~58 GB of weights at Q6, beyond the 48 GB pool; step up to 4x RTX 4090 with 96 GB or 4x RTX 5090 with 128 GB)
Frontier 100B+ MoE at bf16 (GLM-4.5, Kimi K2, Mistral Large 3)

Warranty and lead time

Parts warranty: 2 years
Labor warranty: 1 year
Lead time: 10-28 days

Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability, confirmed at order.