Kentino s.r.o.

Kentino AI 32 Rome 5090 1676TOPS — 1x RTX 5090 AI Workstation

Name: Kentino AI 32 Rome 5090 1676TOPS — 1x RTX 5090 AI Workstation
Brand: Kentino s.r.o.
Price: 13000.00 EUR
Availability: InStock

Shipping costs are calculated at checkout.

Kentino AI 32 Rome 5090 1676TOPS

Single-GPU Blackwell Workstation
1x RTX 5090 | EPYC Milan | 1 676 TOPS INT8

1 676 TOPS INT8 | 32 GB GDDR7 VRAM | fp8 native tensor | rack ready

Single Blackwell GPU, 32 GB GDDR7, fp8 native — the sharpest single-card AI workstation Kentino builds.

A single-GPU, workstation-class AI server on the ROMED8-2T / EPYC Milan platform. One RTX 5090 delivers 32 GB of GDDR7 VRAM with native fp8 tensor math: the sweet spot for a developer box, a small-team inference endpoint, or an image/video generation workstation where one strong GPU beats two weaker ones. The chassis is 4U rack-mount, but it is also suited to a quiet under-desk office deployment.

Hardware

Component	Detail
GPU	1x NVIDIA GeForce RTX 5090 32 GB GDDR7 (575 W, PCIe 5.0 x16, Blackwell)
VRAM pool	32 GB
CPU	AMD EPYC 7643 Milan (48C/96T, 225 W, 128x PCIe 4.0 lanes)
Motherboard	ASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)
System RAM	128 GB DDR4-2666 ECC RDIMM (2x 64 GB)
Boot / storage	1 TB NVMe M.2 (PCIe 4.0 x4)
Power supply	Single 2 kW ATX PSU
Chassis	4U rack-mount, passive Gen4 x16 riser
Cooling	SP3 tower cooler (Arctic Freezer 4U-M class), 3x 120 mm front intake + 1x 120 mm rear exhaust
Network	Onboard dual 10 GbE (Intel X550) + IPMI

Power envelope

GPU draw: 1 x 575 W = 575 W
System total at full load: ~900 W
PSU total: 2 000 W (single 2 kW ATX) — 55 % headroom
Generous transient margin, silent operation at light load
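
The headroom figure above is plain arithmetic on the quoted wattages. A minimal sketch, assuming the ~900 W full-load total is accurate (the function name is illustrative, not part of any Kentino tooling):

```python
def psu_headroom(psu_watts: float, load_watts: float) -> float:
    """Fraction of PSU capacity left unused at the given load."""
    if psu_watts <= 0:
        raise ValueError("psu_watts must be positive")
    return (psu_watts - load_watts) / psu_watts


GPU_WATTS = 575          # 1 x RTX 5090, from the hardware table
SYSTEM_LOAD_WATTS = 900  # approximate full-load total quoted above
PSU_WATTS = 2000         # single 2 kW ATX PSU

headroom = psu_headroom(PSU_WATTS, SYSTEM_LOAD_WATTS)
non_gpu = SYSTEM_LOAD_WATTS - GPU_WATTS  # CPU, RAM, fans, storage, NICs

print(f"Headroom: {headroom:.0%}")        # Headroom: 55%
print(f"Non-GPU budget: {non_gpu} W")     # Non-GPU budget: 325 W
```

The same function can be reused to check headroom after adding a NIC or more RAM by raising the load figure.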

Lane topology

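PCIe Gen4 x16 at the GPU. The ROMED8-2T is a Gen4 board, so the Gen5-capable 5090 links at Gen4 speed; this halves host-to-GPU bandwidth relative to Gen5 but has little effect on inference once the weights are resident in VRAM. 16 lanes direct from the CPU root complex. No PCIe switch. No NVLink on GeForce 5090.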
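
To put the Gen4 link in numbers, the sketch below computes the theoretical one-direction bandwidth of an x16 slot from the per-lane signalling rate and 128b/130b encoding, then estimates how long a full 32 GB weight upload would take. It ignores protocol overhead and storage speed, so real load times will be longer; the function names are illustrative.

```python
# Per-lane signalling rate in GT/s; Gen4 and Gen5 both use 128b/130b encoding.
GT_PER_S = {4: 16.0, 5: 32.0}


def pcie_gb_per_s(gen: int, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * (128 / 130) / 8 * lanes


def load_seconds(size_gb: float, gen: int, lanes: int = 16) -> float:
    """Best-case time to push size_gb of data across the link."""
    return size_gb / pcie_gb_per_s(gen, lanes)


for gen in (4, 5):
    bw = pcie_gb_per_s(gen)
    t = load_seconds(32, gen)
    print(f"Gen{gen} x16: {bw:.1f} GB/s, 32 GB upload in about {t:.2f} s")
```

Either way the upload is a one-time cost of roughly a second or less at link speed, which is why the Gen4 slot matters little for steady-state inference.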

What you can run

With 32 GB of GDDR7 VRAM and native fp8 tensor math, this workstation handles open-weight LLMs up to 32B dense, image generation with FLUX.1, video generation, speech AI, and single-developer multi-model stacks.

LLMs — text / reasoning / coding

Chinese frontier

Qwen3-32B dense Q6_K — 32k context, flagship general reasoning (~40-55 tok/s single-stream on Blackwell fp8, published reference)
Qwen3-30B-A3B MoE at Q4_K_M with long KV headroom (Qwen3-Coder-30B-A3B agentic, 256k ctx)
QwQ-32B Q6 — reasoning preview
DeepSeek-R2 32B sparse MoE at Q4-Q6 — single-GPU reasoning that scores 92.7 % AIME-2025 (~45-60 tok/s single-stream on Blackwell fp8, published reference)
Qwen3.5-27B dense Q6 (Feb 2026 release)
Hunyuan-A13B at Q4_K_M (~28-30 GB) — 80B/13B MoE, 256k ctx, dual-mode reasoning
Seed-OSS-36B Q4_K_M — 512k native context for long-doc analysis

Western frontier

Llama 3.3 70B at Q2_K (~27 GB tight) or Q3_K (~34 GB with RAM spill) — usable for general chat
Mistral Small 3 / Magistral Small / Devstral Small 2 (24B dense) at Q6-Q8 or bf16
Gemma 3 27B multimodal at Q6 with 128k context
Phi-4 14B / Phi-4-reasoning bf16
Reka Flash 3 (21B Apache 2.0) at bf16
gpt-oss-20b native MXFP4 (~16 GB — fits with generous KV)
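
The size figures in these lists follow from parameter count times bits per weight. A rough sketch for the Llama 3.3 70B entries, assuming effective bit widths of about 3.0 (Q2_K) and 3.9 (Q3_K); these values are assumptions chosen to match the sizes quoted above, and real GGUF files vary with the quantization mix. KV cache and activations come on top of the weights.

```python
VRAM_GB = 32.0  # single RTX 5090


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# Assumed effective bits per weight, not official llama.cpp constants.
ASSUMED_BPW = {"Q2_K": 3.0, "Q3_K": 3.9}

for quant, bpw in ASSUMED_BPW.items():
    size = weight_gb(70, bpw)
    verdict = "fits in VRAM" if size <= VRAM_GB else "spills to system RAM"
    print(f"Llama 3.3 70B {quant}: ~{size:.1f} GB weights, {verdict}")
```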

Vision-Language

Qwen3-VL-8B / -32B at Q4-Q6
Qwen3-VL-30B-A3B MoE
InternVL3.5-8B / -38B Q4
MiniCPM-V 2.6 / MiniCPM-o 2.6 (8B)
Llama 3.2 11B Vision bf16
Pixtral 12B bf16 (24 GB, tight; use Q8)
Gemma 3 12B / 27B multimodal
PaliGemma 2 (3/10B)
Phi-4-multimodal 5.6B
Aya Vision 8B

Image generation

FLUX.1 [dev] / [schnell] fp8 (~12 GB), native Blackwell speedup (~8-12 seconds per 1024x1024 image at 20 steps on Blackwell, published reference)
FLUX.1 Kontext [dev]: in-context editing, character consistency
SD 3.5 Large (18 GB fp16 / 11 GB fp8)
SDXL 1.0 (10-12 GB fp16)
HunyuanImage-2.1 NF4 (~14 GB)
Kolors 2.0 fp8
AuraFlow v0.3 / OmniGen v1 / PixArt-Sigma

Video generation

Wan 2.2 TI2V-5B at ~16 GB: 720p@24fps on a single 5090
Wan 2.1 T2V/I2V 14B at Q4-Q6 (~16 GB)
HunyuanVideo 1.5 (8.3B): 14 GB minimum
CogVideoX-5B / 5B-I2V int8 (~12 GB)
LTX-Video 2B: realtime-class 30 fps
Mochi-1 Q4 (~17-18 GB)

Audio / Speech / TTS

ASR: Whisper v3 large / turbo (~50x realtime on single GPU, published reference); NVIDIA Parakeet-TDT 1.1B; Canary 1B
TTS: CosyVoice 2.0 / Fun-CosyVoice 3.0; Kokoro 82M; Stable Audio Open
Realtime / S2S: Kyutai Moshi (7B) — only open realtime full-duplex voice; Step-Audio 2 mini / R1
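
For capacity planning, the ~50x realtime figure quoted for Whisper converts directly into processing time. A small sketch, treating the factor as constant regardless of audio length and batch size, which is a simplification:

```python
def transcription_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time to transcribe audio at a given realtime multiple."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


one_hour = 3600
print(transcription_seconds(one_hour, 50))  # 72.0 seconds for one hour of audio
```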

Multi-model / multi-tenant

Resident stack for a single developer: either Qwen3-32B Q6 (~20 GB) with FLUX.1 fp8 (~12 GB) swapped in as needed, since the two together are a tight fit, or Qwen3-14B Q6 (~9 GB) + FLUX.1 + Whisper-turbo + Kokoro loaded simultaneously (~20-24 GB pinned)
2-4 concurrent users on 14-32B class LLMs via vLLM / SGLang
LoRA / QLoRA fine-tuning of 7-14B dense models
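
Whether a stack can stay resident comes down to summing footprints against the 32 GB pool. The sketch below does that for the two stacks described above. The Qwen3 and FLUX.1 sizes come from the text; the Whisper-turbo and Kokoro sizes and the 2 GB reserve for KV cache and runtime overhead are assumptions for illustration.

```python
VRAM_GB = 32.0


def plan(stack: dict[str, float], reserve_gb: float = 2.0) -> tuple[float, bool]:
    """Return (total GB, whether the stack fits with the reserve left free)."""
    total = sum(stack.values())
    return total, total + reserve_gb <= VRAM_GB


large_stack = {
    "Qwen3-32B Q6": 20.0,  # from the text
    "FLUX.1 fp8": 12.0,    # from the text
}
small_stack = {
    "Qwen3-14B Q6": 9.0,   # from the text
    "FLUX.1 fp8": 12.0,    # from the text
    "Whisper-turbo": 1.6,  # assumption
    "Kokoro 82M": 0.4,     # assumption
}

for name, stack in (("large", large_stack), ("small", small_stack)):
    total, fits = plan(stack)
    status = "resident" if fits else "needs model swapping"
    print(f"{name} stack: {total:.1f} GB, {status}")
```

With these inputs the large stack fills the card completely and needs swapping, while the small stack lands inside the ~20-24 GB range quoted above.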

Target workloads

Developer workstation for a single AI engineer running mixed inference + image gen
Small-team coding-agent endpoint (Qwen3-Coder-30B-A3B) with 1-4 concurrent users
Content pipeline: FLUX.1 or SD 3.5 Large batch image gen + Wan 2.2 short-form video
On-premises ASR + TTS voice stack (Whisper + Kokoro + Moshi) for a branch office
Prosumer LLM + VLM research box — test Qwen3, Llama 3.3, Gemma 3, Phi-4 on real hardware

Published performance references

Published references from comparable single-RTX 5090 hardware

Benchmark	Result
Llama 3.3 70B Q4_K_M llama.cpp decode	~18-22 tok/s with CPU KV offload
Qwen3-32B Q6 vLLM single-stream	~45-55 tok/s decode at fp8
FLUX.1 [dev] fp8 on Blackwell	~1.7-2.0 s per 1024x1024 image at 20 steps
Wan 2.2 TI2V-5B 720p clip	~3-4 minutes at fp16

Published reference points from comparable single-5090 hardware. Kentino measured numbers will be posted once gf-logic extends bench to single-5090.
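
The decode rates in the table translate into response latency as tokens divided by tokens per second. A minimal sketch, assuming a 500-token response (an arbitrary illustrative length) and ignoring prompt processing time:

```python
def decode_seconds(tokens: int, tok_per_s: float) -> float:
    """Time to generate the given number of tokens at a steady decode rate."""
    return tokens / tok_per_s


RESPONSE_TOKENS = 500  # assumed response length, for illustration only

# (low, high) tok/s ranges taken from the table above.
published = {
    "Qwen3-32B Q6 vLLM": (45, 55),
    "Llama 3.3 70B Q4_K_M llama.cpp": (18, 22),
}

for label, (low, high) in published.items():
    fastest = decode_seconds(RESPONSE_TOKENS, high)
    slowest = decode_seconds(RESPONSE_TOKENS, low)
    print(f"{label}: {fastest:.1f}-{slowest:.1f} s per {RESPONSE_TOKENS}-token reply")
```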

Not ideal for

70B dense models at Q6+ (32 GB is insufficient — use 2x 5090 for proper 64 GB pool)
Multi-user concurrent serving at scale (one GPU, so no tensor-parallel scaling)
Frontier 100B+ MoE (GLM-4.5, Kimi K2, Mistral Large 3 — out of reach on a single consumer card)

Warranty and lead time

Parts warranty: 2 years
Labor warranty: 1 year
Lead time: 10-28 days

Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability and is confirmed at order.

Recommended add-ons

NVIDIA ConnectX-5 100 GbE MCX555A-ECAT
Upgrade boot drive to 2 TB or 4 TB NVMe
Upgrade RAM to 256 GB (4x 64 GB DDR4) for bigger KV cache / multi-model concurrent stacks
Rack PDU (C13/C19 metered) and 2 kVA online UPS