
Kentino s.r.o.

K-AI 48 Rome 4090 1322TOPS — 2x RTX 4090 Entry AI Server


Regular price €11,434.00 EUR
Taxes included. Shipping is calculated at checkout.


48 GB VRAM Entry 2-GPU Server
2x RTX 4090 | EPYC Rome | 1 322 TOPS INT8

1 322 TOPS INT8 | 48 GB VRAM pool | 2 GPUs, tensor parallel | Rack-ready
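The 1 322 TOPS headline appears to be the sum of the two cards' dense (non-sparse) INT8 tensor throughput. A minimal sketch of that arithmetic, assuming the per-card figure of 660.6 dense INT8 TOPS that NVIDIA's Ada architecture whitepaper lists for the RTX 4090; numbers quoted "with sparsity" assume 2:4 structured-sparse weights and are exactly double.

```python
# Assumption: 660.6 dense INT8 tensor TOPS per RTX 4090 (NVIDIA Ada whitepaper);
# the "with sparsity" figure (2:4 structured sparsity) is exactly double.
DENSE_INT8_TOPS_PER_CARD = 660.6
NUM_CARDS = 2

dense_total = NUM_CARDS * DENSE_INT8_TOPS_PER_CARD
sparse_total = 2 * dense_total  # only reachable on 2:4 structured-sparse weights

print(f"dense INT8:  {dense_total:.1f} TOPS")
print(f"sparse INT8: {sparse_total:.1f} TOPS")
```

The dense total comes out at 1321.2 TOPS, which matches the headline once the per-card figure is rounded to 661.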

48 GB of VRAM pooled across two RTX 4090 cards: the cost floor for 32B-class tensor-parallel inference.

A two-GPU, workstation-class Ada AI server built on the ASRock Rack ROMED8-2T with an AMD EPYC Rome CPU. The two RTX 4090 cards give a 48 GB pooled VRAM envelope that comfortably runs 32B dense models at Q6-Q8, Wan 2.1 14B video, and Pixtral 12B vision, and fits Hunyuan-A13B at 4-bit. It offers the best all-round model selection per euro in the Kentino lineup before stepping up to Blackwell.

Hardware

Component | Detail
GPUs | 2x NVIDIA GeForce RTX 4090, 24 GB GDDR6X each (450 W, PCIe 4.0 x16)
VRAM pool | 48 GB (no NVLink; tensor parallelism runs over PCIe)
CPU | AMD EPYC 7542 Rome (32C/64T, 225 W, 128x PCIe 4.0 lanes)
Motherboard | ASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)
System RAM | 128 GB DDR4-2666 ECC RDIMM (2x 64 GB)
Boot / storage | 1 TB NVMe M.2 (PCIe 4.0 x4)
Power supply | Single 2 kW ATX PSU
Chassis | 4U rack-mount, passive Gen4 x16 risers
Cooling | SP3 tower cooler, 3x 120 mm front intake + 1x 120 mm rear exhaust
Network | Onboard dual 10 GbE (Intel X550) + IPMI

Power envelope

  • GPU draw: 2 x 450 W = 900 W
  • System total at full load: ~1 225 W
  • PSU capacity: 2 000 W (single 2 kW ATX), leaving ~775 W (38.75 % of capacity) of headroom at full load
  • Comfortable single-PSU margin
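The 38.75 % figure is headroom measured against PSU capacity, i.e. (2 000 - 1 225) / 2 000. A minimal sketch of that arithmetic, using only the numbers quoted in the power envelope:

```python
# Figures quoted in the power envelope above.
GPU_TGP_W = 450             # per RTX 4090
NUM_GPUS = 2
SYSTEM_FULL_LOAD_W = 1225   # approximate whole-system draw at full load
PSU_CAPACITY_W = 2000       # single 2 kW ATX PSU

gpu_draw_w = NUM_GPUS * GPU_TGP_W
rest_of_system_w = SYSTEM_FULL_LOAD_W - gpu_draw_w   # CPU, RAM, drives, fans, board
headroom_w = PSU_CAPACITY_W - SYSTEM_FULL_LOAD_W
headroom_pct = 100 * headroom_w / PSU_CAPACITY_W     # relative to PSU capacity
load_pct = 100 * SYSTEM_FULL_LOAD_W / PSU_CAPACITY_W

print(f"GPUs {gpu_draw_w} W, rest of system {rest_of_system_w} W")
print(f"headroom {headroom_w} W = {headroom_pct:.2f} % of PSU, load {load_pct:.2f} %")
```

Put the other way round, the PSU runs at about 61 % of its rating at full load; 225 W of the 325 W non-GPU share is the EPYC 7542's TDP.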

Lane topology

The ROMED8-2T routes both GPU slots directly from the CPU root complex with no PLX switch in between, so each GPU gets a full PCIe Gen4 x16 link. The consumer RTX 4090 has no NVLink, so tensor-parallel traffic between the two cards travels over PCIe.
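A rough sketch of the lane budget and link bandwidth behind this topology. The PCIe 4.0 rate (16 GT/s per lane, 128b/130b encoding) is standard. The all-reduce payload assumes Megatron-style tensor parallelism (two fp16 all-reduces of the hidden state per layer per generated token) on a Llama-70B-class shape (hidden size 8192, 80 layers); that model shape is an illustrative assumption, not a measurement on this build.

```python
# PCIe 4.0: 16 GT/s per lane with 128b/130b line encoding.
GT_PER_S = 16e9
ENCODING = 128 / 130

def link_gb_per_s(lanes: int) -> float:
    """Usable bandwidth per direction in GB/s, before packet/protocol overhead."""
    return GT_PER_S * ENCODING * lanes / 8 / 1e9

# Lane budget from the hardware table: two x16 GPUs plus one x4 NVMe drive
# (onboard devices such as the 10 GbE controller are ignored here).
CPU_LANES = 128
lanes_used = 2 * 16 + 4
x16_bw = link_gb_per_s(16)

# Assumed Llama-70B-class shape; Megatron-style TP all-reduces the fp16
# hidden state twice per layer for each generated token.
HIDDEN, LAYERS, BYTES = 8192, 80, 2
allreduce_bytes_per_token = 2 * LAYERS * HIDDEN * BYTES

print(f"lanes used: {lanes_used}/{CPU_LANES}")
print(f"Gen4 x16: {x16_bw:.1f} GB/s per direction")
print(f"all-reduce payload: {allreduce_bytes_per_token / 1e6:.2f} MB/token, "
      f"~{allreduce_bytes_per_token / (x16_bw * 1e9) * 1e6:.0f} us at full link rate")
```

At roughly 31.5 GB/s per direction, the raw transfer time per token is small next to the tens of milliseconds per token implied by the decode rates quoted later. The per-call latency of the 160 small all-reduces is therefore likely to matter more than link bandwidth.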

What you can run

With 48 GB of VRAM pooled across two cards, this server handles 32B-class dense LLMs at Q6-Q8, mid-size MoE models, image and video generation, speech AI, and multi-tenant serving.
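The sizes quoted below mostly follow one rule of thumb: weight footprint ≈ parameters × bits per weight / 8. A minimal sketch, assuming approximate average bits-per-weight values for common llama.cpp GGUF quant types and an arbitrary 4 GB reserve for KV cache and runtime overhead. Both are assumptions; real files and runtimes vary.

```python
# Approximate average bits per weight for llama.cpp GGUF quant types (assumed;
# real files vary a little because some tensors are kept at higher precision).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.5625, "Q8_0": 8.5, "bf16": 16.0}

POOL_GB = 48.0  # 2x RTX 4090, 24 GB each

def weight_gb(params_billion: float, quant: str) -> float:
    """Weight footprint in GB (1e9 bytes); excludes KV cache, activations, CUDA context."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8

def fits(params_billion: float, quant: str, reserve_gb: float = 4.0) -> bool:
    """True if the weights plus a fixed reserve for KV cache/overhead fit the pool."""
    return weight_gb(params_billion, quant) + reserve_gb <= POOL_GB

# Approximate parameter counts in billions.
for name, params, quant in [
    ("Llama 3.3 70B", 70.6, "Q4_K_M"),
    ("Qwen3-32B", 32.8, "Q6_K"),
    ("Gemma 3 27B", 27.0, "bf16"),
]:
    print(f"{name} {quant}: {weight_gb(params, quant):.1f} GB, fits={fits(params, quant)}")
```

This reproduces the ~43 GB quoted for Llama 3.3 70B Q4_K_M, and it shows why 27B-30B models at bf16 (54-61 GB of weights) need a larger pool.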

LLMs — text / reasoning / coding

Chinese frontier

  • Qwen3-32B dense Q6-Q8 (~25-35 tok/s single-stream on 2x 4090, published reference); QwQ-32B Q6; Qwen3.5-27B Q6-Q8
  • Qwen3-30B-A3B / Qwen3-Coder-30B-A3B at Q6 (~25 GB); bf16 needs ~61 GB and does not fit the pool
  • Hunyuan-A13B at 4-bit (~48 GB, fills the pool; Q6 and fp8 do not fit): 80B/13B MoE, 256k ctx
  • Seed-OSS-36B Q6 — 512k native ctx
  • DeepSeek-R2 32B sparse MoE at Q6 (~45 GB); bf16 (~64 GB) exceeds the pool. ~30-40 tok/s single-stream at Q4 (published reference)
  • ERNIE-4.5-47B-A3B Q4 (~28 GB with headroom) / Q6 (~42 GB)

Western frontier

  • Llama 3.3 70B Q4_K_M (~43 GB) tensor-parallel 2-way — the sweet spot of this class (~14-17 tok/s single-stream on 2x 4090, published reference)
  • Llama 4 Scout 109B/17B MoE Q3_K (~51 GB, slightly over the 48 GB pool; some layers must be offloaded to system RAM)
  • Mistral Small 3 / Magistral Small / Devstral Small 2 (24B) at Q8; bf16 weights (~48 GB) would fill the entire pool
  • Mixtral 8x7B Q6
  • Gemma 3 27B Q8 (bf16 weights at ~54 GB exceed the pool); Phi-4 14B bf16
  • Nemotron-Super 49B Q4 (~28 GB)
  • Others: OLMo 2 32B; Reka Flash 3 21B bf16; Falcon H1R 7B
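Weights are only part of the budget; the KV cache grows linearly with context length. A minimal sketch for the Llama 3.3 70B Q4_K_M entry, assuming the published Llama 3 70B attention shape (80 layers, 8 grouped-query KV heads, head dimension 128) and ignoring CUDA context and activation memory, so real limits will be lower:

```python
# Assumed Llama 3.x 70B attention shape: 80 layers, 8 KV heads (GQA), head_dim 128.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 80, 8, 128

def kv_bytes_per_token(bytes_per_elem: int = 2) -> int:
    # One K and one V vector per KV head per layer; 2 bytes/element for fp16/bf16.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem

def max_context_tokens(pool_gb: float, weights_gb: float, bytes_per_elem: int = 2) -> int:
    """Tokens of KV cache that fit in what the weights leave free (no runtime overhead)."""
    free_bytes = (pool_gb - weights_gb) * 1e9
    return int(free_bytes // kv_bytes_per_token(bytes_per_elem))

print(kv_bytes_per_token())                               # bytes per token, fp16 KV
print(max_context_tokens(48.0, 43.0))                     # ~43 GB Q4_K_M weights
print(max_context_tokens(48.0, 43.0, bytes_per_elem=1))   # ~8-bit KV cache
```

At 320 KiB per token, the ~5 GB left after the weights holds roughly 15k tokens at fp16, or about twice that with an 8-bit KV cache. This is why long-context use of 70B models is tight on a 48 GB pool.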

Vision-Language

Qwen3-VL-32B / Qwen3-VL-30B-A3B MoE / Qwen3-Omni-30B-A3B; InternVL3-38B Q4-Q5; InternVL3.5-38B; DeepSeek-VL2; ERNIE-4.5-VL-28B-A3B-Thinking; Llama 3.2 11B Vision bf16; Pixtral 12B bf16; Gemma 3 27B multimodal; PaliGemma 2 28B Q4; MiniCPM-V 2.6 / MiniCPM-o 2.6.

Image generation

FLUX.1 [dev] / [schnell] fp16 (24 GB) or fp8 (~12 GB), leaving headroom for larger batches (~15-25 seconds per 1024x1024 image at fp8 per card, published reference); FLUX.1 Kontext [dev]; SD 3.5 Large (18 GB fp16); SDXL 1.0 + ControlNet + AnimateDiff; HunyuanImage-2.1 bf16 (~34 GB, fits in the pool); AuraFlow v0.3 / OmniGen v1 / Kolors 2.0.

Video generation

Wan 2.1 14B T2V/I2V Q6/fp8; Wan 2.2 TI2V-5B bf16 single-card; Wan 2.2 T2V-A14B / I2V-A14B Q4 (~32 GB); HunyuanVideo 13B Q4-Q5 (~30 GB); HunyuanVideo 1.5 (8.3B) bf16; Open-Sora 2.0 (11B) Q8; CogVideoX-5B / 1.5 bf16; Mochi-1 Q4-Q8; LTX-Video 2B; Pyramid Flow 2B.

Audio / Speech / TTS

The full speech stack from the 24 GB tier fits with room for concurrent use: Whisper large-v3, Parakeet-TDT, Canary 1B, Moshi, Step-Audio 2 mini, CosyVoice 3.0, Kokoro 82M, and Stable Audio Open can all stay resident at the same time. Whisper large-v3-turbo runs at ~50x realtime on a single card (published reference).

Multi-model / multi-tenant

  • 2-4 concurrent users on 32B Q6 class LLMs via vLLM tensor-parallel
  • Mixed workload: Qwen3-32B Q4_K_M (~20 GB) + FLUX.1 fp8 (~12 GB) + Whisper-turbo (1.6 GB) + Moshi (8 GB) resident across 2 cards
  • LoRA / QLoRA fine-tuning: 7-14B models comfortably, 24-32B models tight
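"Resident across 2 cards" means each model must fit on one 24 GB card unless it is explicitly split. A minimal sketch of one way to place the mixed workload, using the sizes quoted in that bullet and first-fit-decreasing bin packing. This is a generic heuristic chosen for illustration, not what vLLM or any particular serving stack does.

```python
from typing import Dict, List, Tuple

CARD_GB = 24.0  # VRAM per RTX 4090

# Resident sizes quoted in the mixed-workload bullet (approximate, weights only).
MODELS: List[Tuple[str, float]] = [
    ("Qwen3-32B", 20.0),
    ("FLUX.1 fp8", 12.0),
    ("Whisper-turbo", 1.6),
    ("Moshi", 8.0),
]

def first_fit_decreasing(models: List[Tuple[str, float]], n_cards: int = 2,
                         card_gb: float = CARD_GB) -> Tuple[Dict[int, List[str]], List[float]]:
    """Place each whole model on the first card with room, largest models first."""
    free = [card_gb] * n_cards
    placement: Dict[int, List[str]] = {i: [] for i in range(n_cards)}
    for name, size in sorted(models, key=lambda m: m[1], reverse=True):
        for i in range(n_cards):
            if size <= free[i]:
                free[i] -= size
                placement[i].append(name)
                break
        else:
            raise ValueError(f"{name} ({size} GB) does not fit on any card")
    return placement, free

placement, free = first_fit_decreasing(MODELS)
for card, names in placement.items():
    print(f"GPU {card}: {names}, {free[card]:.1f} GB free")
```

The LLM and Whisper share one card and FLUX.1 and Moshi share the other. That leaves about 2.4 GB and 4.0 GB respectively for KV cache, activations, and CUDA context, so this mix works only with short contexts and small batches.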

Target workloads

  • Two-operator AI workstation with mixed LLM + image + audio stacks
  • 32B-class serving endpoint for a small-team developer environment (4-8 concurrent users on Qwen3-32B / Gemma 3 27B)
  • Image generation pipeline (FLUX.1 + SD 3.5 + ControlNet) batch production
  • Video-gen development box (Wan 2.1 / Wan 2.2 TI2V / HunyuanVideo 1.5)
  • LoRA / QLoRA fine-tuning research box for 7-34B Chinese + Western weights

Published performance references

Benchmark | Result
Llama 3.3 70B Q4_K_M, llama.cpp decode | ~14-17 tok/s single-stream
Qwen3-32B Q6, vLLM single-stream decode | ~35-45 tok/s
FLUX.1 [dev] fp8, 1024x1024, 20 steps | ~2.5-3.0 s per image
vLLM batch-32 aggregate (extrapolated from 4x RTX 4090) | ~90 tok/s aggregate

Published reference points from comparable 2x4090 hardware. Not measured on Kentino hardware.
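Single-stream decode is largely memory-bandwidth bound, so a ceiling can be estimated from the RTX 4090's 1 008 GB/s of GDDR6X bandwidth. A minimal sketch under that assumption: llama.cpp's default multi-GPU mode splits layers across cards, so the cards take turns and all weights are streamed once per token, while tensor parallelism streams each card's shard concurrently. Both bounds ignore KV-cache reads, kernel overhead, and PCIe communication.

```python
# RTX 4090: 1008 GB/s GDDR6X bandwidth per card (384-bit bus at 21 Gbps).
BW_GB_S = 1008.0

def bound_layer_split(weights_gb: float, bw: float = BW_GB_S) -> float:
    # Layer split: the cards take turns, so per token every weight byte is
    # streamed once, one card at a time.
    return bw / weights_gb

def bound_tensor_parallel(weights_gb: float, n_gpus: int = 2, bw: float = BW_GB_S) -> float:
    # Tensor parallel: each card streams its 1/n shard at the same time.
    return n_gpus * bw / weights_gb

w = 43.0  # Llama 3.3 70B Q4_K_M, as quoted above
print(f"layer split ceiling:     {bound_layer_split(w):.1f} tok/s")
print(f"tensor parallel ceiling: {bound_tensor_parallel(w):.1f} tok/s")
```

The quoted ~14-17 tok/s for llama.cpp sits at roughly 60-73 % of the ~23.4 tok/s layer-split ceiling, a plausible efficiency for this kind of estimate.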

Not ideal for

  • 70B dense at Q6 or higher (~58 GB of weights at Q6 exceeds the 48 GB pool; step up to 4x RTX 4090 with 96 GB or 4x RTX 5090 with 128 GB)
  • Frontier 100B+ MoE at bf16 (GLM-4.5, Kimi K2, Mistral Large 3)

Warranty and lead time

2 years parts warranty | 1 year labor warranty | 10-28 days lead time

Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability, confirmed at order.

Recommended add-ons

  • NVIDIA ConnectX-5 100 GbE MCX555A-ECAT
  • Upgrade boot drive to 2 TB NVMe
  • Upgrade RAM to 256 GB (4x 64 GB) — more KV cache headroom for long-ctx MoE
  • Rack PDU (C13/C19 metered) and 2 kVA online UPS
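A 2 kVA rating is apparent power; the usable watt rating depends on the UPS's output power factor. A minimal sizing sketch against the ~1 225 W full-load figure from the power envelope, trying a few power factors as assumptions (online UPS units are commonly rated between 0.8 and 1.0 W/VA; check the datasheet of the actual unit):

```python
UPS_VA = 2000
SYSTEM_LOAD_W = 1225  # full-load figure from the power envelope

def ups_load_pct(va: float, pf: float, load_w: float) -> float:
    """Load as a percentage of the UPS watt rating (VA x assumed power factor)."""
    return 100 * load_w / (va * pf)

for pf in (0.8, 0.9, 1.0):  # assumed output power factors
    print(f"PF {pf}: {UPS_VA * pf:.0f} W rating, load {ups_load_pct(UPS_VA, pf, SYSTEM_LOAD_W):.0f} %")
```

Across that range the server alone loads the UPS to roughly 61-77 %. Any switches or other gear on the same UPS come out of the remaining margin.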