Passa alle informazioni sul prodotto
1 su 14

Kentino s.r.o.

K-AI 32 Rome 5090 1676TOPS — 1x RTX 5090 AI Workstation

K-AI 32 Rome 5090 1676TOPS — 1x RTX 5090 AI Workstation

Prezzo di listino €8.092,00 EUR
Prezzo di listino Prezzo scontato €8.092,00 EUR
In offerta Esaurito
Imposte incluse. Spese di spedizione calcolate al check-out.

K-AI 32 Rome 5090 1676TOPS

Single-GPU Blackwell Workstation
1x RTX 5090 | EPYC Milan | 1 676 TOPS INT8

1 676
TOPS INT8
32 GB
VRAM GDDR7
fp8
native tensor
rack
ready

Single Blackwell GPU, 32 GB GDDR7, fp8 native — the sharpest single-card AI workstation Kentino builds.

A single-GPU, workstation-class AI server on the ROMED8-2T / EPYC Milan platform. One RTX 5090 delivers 32 GB of GDDR7 VRAM with native fp8 tensor math — the sweet spot for a developer box, a small-team inference endpoint, or an image/video generation workstation where one strong GPU beats two weaker ones. 4U rack form factor, but drop-in for a quiet office under-desk deployment.

Hardware

Component Detail
GPU 1x NVIDIA GeForce RTX 5090 32 GB GDDR7 (575 W, PCIe 5.0 x16, Blackwell)
VRAM pool 32 GB
CPU AMD EPYC 7643 Milan (48C/96T, 225 W, 128x PCIe 4.0 lanes)
Motherboard ASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)
System RAM 128 GB DDR4-2666 ECC RDIMM (2x 64 GB)
Boot / storage 1 TB NVMe M.2 (PCIe 4.0 x4)
Power supply Single 2 kW ATX PSU
Chassis 4U rack-mount, passive Gen4 x16 riser
Cooling SP3 tower cooler (Arctic Freezer 4U-M class), 3x 120 mm front intake + 1x 120 mm rear exhaust
Network Onboard dual 10 GbE (Intel X550) + IPMI

Power envelope

  • GPU draw: 1 x 575 W = 575 W
  • System total at full load: ~900 W
  • PSU total: 2 000 W (single 2 kW ATX) — 55 % headroom
  • Generous transient margin, silent operation at light load

Lane topology

PCIe Gen4 x16 at the GPU (ROMED8-2T is Gen4; 5090 is Gen5 silicon running Gen4 without bandwidth penalty for inference). 16 lanes direct from CPU root complex. No PCIe switch. No NVLink on GeForce 5090.

What you can run

With 32 GB of GDDR7 VRAM and native fp8 tensor math, this workstation handles open-weight LLMs up to 32B dense, image generation with FLUX.1, video generation, speech AI, and single-developer multi-model stacks.

LLMs — text / reasoning / coding

Chinese frontier

  • Qwen3-32B dense Q6_K — 32k context, flagship general reasoning (~40-55 tok/s single-stream on Blackwell fp8, published reference)
  • Qwen3-30B-A3B MoE at Q4_K_M with long KV headroom (Qwen3-Coder-30B-A3B agentic, 256k ctx)
  • QwQ-32B Q6 — reasoning preview
  • DeepSeek-R2 32B sparse MoE at Q4-Q6 — single-GPU reasoning that scores 92.7 % AIME-2025 (~45-60 tok/s single-stream on Blackwell fp8, published reference)
  • Qwen3.5-27B dense Q6 (Feb 2026 release)
  • Hunyuan-A13B at Q4_K_M (~28-30 GB) — 80B/13B MoE, 256k ctx, dual-mode reasoning
  • Seed-OSS-36B Q4_K_M — 512k native context for long-doc analysis

Western frontier

  • Llama 3.3 70B at Q2_K (~27 GB tight) or Q3_K (~34 GB with RAM spill) — usable for general chat
  • Mistral Small 3 / Magistral Small / Devstral Small 2 (24B dense) at Q6-Q8 or bf16
  • Gemma 3 27B multimodal at Q6 with 128k context
  • Phi-4 14B / Phi-4-reasoning bf16
  • Reka Flash 3 (21B Apache 2.0) at bf16
  • gpt-oss-20b native MXFP4 (~16 GB — fits with generous KV)

Vision-Language

Qwen3-VL-8B / -32B at Q4-Q6; Qwen3-VL-30B-A3B MoE; InternVL3.5-8B / -38B Q4; MiniCPM-V 2.6 / MiniCPM-o 2.6 (8B); Llama 3.2 11B Vision bf16; Pixtral 12B bf16 (24 GB — tight, use Q8); Gemma 3 12B / 27B multimodal; PaliGemma 2 (3/10B); Phi-4-multimodal 5.6B; Aya Vision 8B.

Image generation

FLUX.1 [dev] / [schnell] fp8 (~12 GB) native Blackwell speedup (~8-12 seconds per 1024x1024 image at 20 steps on Blackwell, published reference); FLUX.1 Kontext [dev] — in-context editing, character consistency; SD 3.5 Large (18 GB fp16 / 11 GB fp8); SDXL 1.0 10-12 GB fp16; HunyuanImage-2.1 NF4 (~14 GB); Kolors 2.0 fp8; AuraFlow v0.3 / OmniGen v1 / PixArt-Sigma.

Video generation

Wan 2.2 TI2V-5B at ~16 GB — 720p@24fps on a single 5090; Wan 2.1 T2V/I2V 14B at Q4-Q6 (~16 GB); HunyuanVideo 1.5 (8.3B) — 14 GB minimum; CogVideoX-5B / 5B-I2V int8 (~12 GB); LTX-Video 2B realtime-class 30 fps; Mochi-1 Q4 (~17-18 GB).

Audio / Speech / TTS

  • ASR: Whisper v3 large / turbo (~50x realtime on single GPU, published reference); NVIDIA Parakeet-TDT 1.1B; Canary 1B
  • TTS: CosyVoice 2.0 / Fun-CosyVoice 3.0; Kokoro 82M; Stable Audio Open
  • Realtime / S2S: Kyutai Moshi (7B) — only open realtime full-duplex voice; Step-Audio 2 mini / R1

Multi-model / multi-tenant

  • Resident stack for a single developer: Qwen3-32B Q6 (~20 GB) + FLUX.1 fp8 (~12 GB fits tight) on swap, or Qwen3-14B Q6 (~9 GB) + FLUX.1 + Whisper-turbo + Kokoro simultaneously (~20-24 GB pinned)
  • 2-4 concurrent users on 14-32B class LLMs via vLLM / SGLang
  • LoRA / QLoRA fine-tuning of 7-14B dense models

Target workloads

  • Developer workstation for a single AI engineer running mixed inference + image gen
  • Small-team coding-agent endpoint (Qwen3-Coder-30B-A3B) with 1-4 concurrent users
  • Content pipeline: FLUX.1 or SD 3.5 Large batch image gen + Wan 2.2 short-form video
  • On-premises ASR + TTS voice stack (Whisper + Kokoro + Moshi) for a branch office
  • Prosumer LLM + VLM research box — test Qwen3, Llama 3.3, Gemma 3, Phi-4 on real hardware

Published performance references

Published reference | single RTX 5090 comparable hardware

Benchmark Result
Llama 3.3 70B Q4_K_M llama.cpp decode ~18-22 tok/s with CPU KV offload
Qwen3-32B Q6 vLLM single-stream ~45-55 tok/s decode at fp8
FLUX.1 [dev] fp8 on Blackwell ~1.7-2.0 s per 1024x1024 image at 20 steps
Wan 2.2 TI2V-5B 720p clip ~3-4 minutes at fp16

Published reference points from comparable single-5090 hardware. Kentino measured numbers will be posted once gf-logic extends bench to single-5090.

Not ideal for

  • 70B dense models at Q6+ (32 GB is insufficient — use 2x 5090 for proper 64 GB pool)
  • Multi-user concurrent serving at scale (single tensor-parallel partition)
  • Frontier 100B+ MoE (GLM-4.5, Kimi K2, Mistral Large 3 — out of reach on a single consumer card)

Warranty and lead time

2 years
parts warranty
1 year
labor warranty
10-28 days
lead time

Build includes assembly, BIOS configuration, driver install, burn-in testing, and functional verification. Lead time depends on component availability, confirmed at order.

Recommended add-ons

  • NVIDIA ConnectX-5 100 GbE MCX555A-ECAT
  • Upgrade boot drive to 2 TB NVMe — or 4 TB
  • Upgrade RAM to 256 GB (4x 64 GB DDR4) for bigger KV cache / multi-model concurrent stacks
  • Rack PDU (C13/C19 metered) and 2 kVA online UPS
Visualizza dettagli completi