Kentino s.r.o.

K-AI 48 Rome L4 484TOPS — 2x NVIDIA L4 Passive Edge AI Server

Regular price: €11.374,00 EUR
Taxes included. Shipping is calculated at checkout.

K-AI 48 Rome L4 484TOPS

Silent 2x L4 Passive Edge Server
48 GB ECC VRAM | EPYC Milan | 484 TOPS INT8

484 TOPS INT8 | 48 GB ECC VRAM | 144 W GPU total | 24/7 datacenter

Silent 2x L4 passive inference box — datacenter-grade warranty path, 72 W per card, 48 GB ECC VRAM for always-on edge deployment.

A 2-GPU edge inference server built around passive NVIDIA L4 cards — the datacenter-class silent option in the Kentino lineup. 48 GB total ECC VRAM, 144 W total GPU draw, single-slot card footprint, and airflow driven entirely by the chassis. For branch offices, broadcast facilities, always-on transcription, and any deployment where acoustic profile and a datacenter warranty path matter more than raw tensor throughput.

Hardware

Component | Detail
GPUs | 2x NVIDIA L4 24 GB GDDR6 passive (72 W, PCIe 4.0 x16, Ada Lovelace, ECC)
VRAM pool | 48 GB ECC
CPU | AMD EPYC 7643 Milan (48C/96T, 225 W, 128x PCIe 4.0 lanes)
Motherboard | ASRock Rack ROMED8-2T (SP3, 7x PCIe 4.0 x16, 8x DDR4 ECC, 2x 10 GbE, IPMI)
System RAM | 128 GB DDR4-2666 ECC RDIMM (2x 64 GB)
Boot / storage | 1 TB NVMe M.2 (PCIe 4.0 x4)
Power supply | Single 2 kW ATX PSU
Chassis | 4U rack-mount, passive Gen4 x16 risers
Cooling | SP3 tower cooler, 3x 120 mm front intake + 1x 120 mm rear exhaust (low-RPM PWM)
Network | Onboard dual 10 GbE (Intel X550) + IPMI

Power envelope

  • GPU draw: 2 x 72 W = 144 W
  • System total at full load: ~469 W
  • PSU total: 2 000 W, about 76.5 % headroom at full load
  • Fans run at low RPM (~35 dBA idle, <45 dBA under sustained inference)
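
The power figures above can be re-derived with a few lines of arithmetic. The sketch below uses only the numbers quoted in this listing. The variable names are illustrative, and the ~469 W system total is the approximate figure from the list, not a measurement.

```python
# Power budget check using the figures quoted in the listing.
GPU_COUNT = 2
GPU_WATTS = 72            # per passive L4 card
SYSTEM_LOAD_WATTS = 469   # quoted full-load system total (approximate)
PSU_WATTS = 2000          # single 2 kW ATX PSU

gpu_total = GPU_COUNT * GPU_WATTS
non_gpu = SYSTEM_LOAD_WATTS - gpu_total        # CPU, RAM, storage, fans, NICs
headroom = 1 - SYSTEM_LOAD_WATTS / PSU_WATTS   # unused fraction of the PSU rating

print(f"GPU total:     {gpu_total} W")
print(f"Non-GPU share: {non_gpu} W")
print(f"PSU headroom:  {headroom:.2%}")
```

The non-GPU share (325 W) is dominated by the 225 W CPU, so the two L4 cards account for less than a third of the full-load draw.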

Lane topology

PCIe Gen4 x16 at both GPUs. The L4 is native Gen4 x16, and the ROMED8-2T routes two x16 links directly from the CPU. No switch, no NVLink. Sustained GPU temperature is 55-65 °C; the passive cards rely entirely on chassis airflow.
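
For a rough sense of what a Gen4 x16 link gives each card, the sketch below computes the theoretical per-direction bandwidth. The 16 GT/s lane rate and 128b/130b encoding come from the PCIe 4.0 specification, not from this listing. Real transfers land below this ceiling because of protocol overhead.

```python
# Theoretical PCIe 4.0 x16 bandwidth per direction (upper bound, not a measurement).
GT_PER_S = 16.0        # PCIe 4.0 raw transfer rate per lane
LANES_PER_GPU = 16
ENCODING = 128 / 130   # 128b/130b line encoding efficiency
CPU_LANES = 128        # EPYC 7643 lane count from the hardware table
GPU_COUNT = 2

def pcie_gbytes_per_s(gt_per_s: float, lanes: int, encoding: float) -> float:
    """Return usable GB/s in one direction for a link of the given width."""
    return gt_per_s * lanes * encoding / 8  # 8 bits per byte

per_gpu = pcie_gbytes_per_s(GT_PER_S, LANES_PER_GPU, ENCODING)
lanes_used = GPU_COUNT * LANES_PER_GPU

print(f"Per-GPU link: {per_gpu:.1f} GB/s per direction")
print(f"CPU lanes used by GPUs: {lanes_used} of {CPU_LANES}")
```

With no NVLink, this link is also the path for tensor-parallel traffic between the two cards.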

What you can run

With 48 GB of ECC VRAM across 2 passive L4 cards, this server handles always-on LLM inference, 24/7 ASR + TTS pipelines, VLM document processing, and edge deployments where silence and datacenter warranty matter.

LLMs — text / reasoning / coding

Chinese frontier

  • Qwen3-32B dense Q6 with 32k ctx (~15-20 tok/s single-stream on L4, published reference)
  • Qwen3-30B-A3B / Qwen3-Coder-30B-A3B Q4-Q6 (MoE, 256k ctx)
  • QwQ-32B Q6; DeepSeek-R2 32B sparse MoE Q4-Q6 (~18-24 tok/s single-stream at Q4 on L4, published reference)
  • Hunyuan-A13B Q6 or fp8 (~48 GB) — 80B/13B MoE, 256k ctx
  • Seed-OSS-36B Q4-Q6 — 512k native ctx
  • ERNIE-4.5-47B-A3B Q4-Q6 (~28-42 GB)

Western frontier

  • Llama 3.3 70B Q4_K_M (~43 GB) tensor-parallel 2-way (~8-12 tok/s single-stream on 2x L4, published reference)
  • Mistral Small 3 / Magistral / Devstral Small 2 (24B) bf16
  • Gemma 3 27B multimodal bf16
  • Phi-4 14B / Phi-4-reasoning bf16
  • Nemotron-Super 49B Q4 (~28 GB)
  • OLMo 2 32B / OLMo 3.1-32B-Think — fully open reasoning research
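
A quick way to sanity-check whether a model fits the 48 GB pool is to estimate its weight footprint from parameter count and quantization width. The sketch below is a rough estimate only. The bits-per-weight values are assumed averages for llama.cpp-style quantizations (real files vary by a few percent), the 70.6B parameter count for Llama 3.3 70B is approximate, and KV cache and runtime overhead are not included.

```python
# Rough weight-footprint estimate; all constants below are approximations.
POOL_GB = 48  # 2 x 24 GB L4

# Assumed average bits per weight for each format (illustrative values).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q6_K": 6.56, "bf16": 16.0}

def weight_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the weights alone, in GB."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8

candidates = [
    ("Llama 3.3 70B", 70.6, "Q4_K_M"),
    ("Llama 3.3 70B", 70.6, "Q6_K"),
]

for name, params, quant in candidates:
    gb = weight_gb(params, quant)
    left = POOL_GB - gb  # what remains for KV cache and overhead
    verdict = "fits" if left > 0 else "does not fit"
    print(f"{name} {quant}: ~{gb:.1f} GB weights, {verdict} ({left:+.1f} GB)")
```

Under these assumptions the Q4_K_M estimate lands near the ~43 GB quoted above and leaves only a few GB for context. The Q6 estimate exceeds the pool.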

Vision-Language

Qwen3-VL-8B / 32B Q4-Q6; InternVL3.5-38B Q4; Pixtral 12B bf16 (24 GB); Llama 3.2 11B Vision bf16; Gemma 3 12B / 27B multimodal; MiniCPM-V 2.6 / MiniCPM-o 2.6; Aya Vision 8B / 32B for 23-language VLM.

Image generation

L4 is inference-tuned — usable for steady-state image pipelines, not batch generation: FLUX.1 [dev] fp8 / Q4 — single image in 8-12 s; SD 3.5 Large fp8 / SDXL 1.0 / SD 3.5 Medium; HunyuanImage-2.1 NF4 (~14 GB); Kolors 2.0 fp8.

Video generation

Not recommended for new video projects on L4 — prefer a 4090/5090 build. For light T2V pipelines: Wan 2.2 TI2V-5B at bf16 — 5 s 720p in ~6-10 minutes; HunyuanVideo 1.5 (8.3B) Wan2GP optimization path.

Audio / Speech / TTS

The L4's real strength — 24/7 ASR + TTS + realtime voice stacks.

  • ASR: Whisper v3 large / turbo (~30x realtime on L4, published reference); NVIDIA Parakeet-TDT 1.1B; Canary 1B
  • TTS: CosyVoice 2.0 / Fun-CosyVoice 3.0; Kokoro 82M; Stable Audio Open
  • Realtime / S2S: Kyutai Moshi (7B, 200 ms latency full-duplex); Step-Audio 2 mini / R1
  • Translation: Meta SeamlessM4T v2 (~100 languages)

Multi-model / multi-tenant

  • Whisper v3 + Kokoro + Moshi + Qwen3-14B Q6 all resident on card 1 (~18-20 GB); card 2 reserved for a second tenant or a VLM
  • 8-16 concurrent ASR sessions on a single L4 at Whisper-turbo real-time
  • RAG endpoint: Qwen3-14B (~20-28 tok/s) or Llama 3.1 8B (~30-40 tok/s single-stream on L4, published reference) + BGE-M3 embeddings + reranker

Target workloads

  • Branch office or broadcast facility silent inference box
  • Always-on ASR + translation pipeline (call centers, lecture transcription, media captioning)
  • Edge RAG endpoint over corporate documents with datacenter warranty path
  • 24/7 multimodal assistant (Qwen3-VL-8B + MiniCPM-o 2.6) for a small office
  • Development staging box for datacenter-class deployments — same L4 silicon as hyperscale edge

Published performance references

Published reference | 2x NVIDIA L4 comparable hardware

Benchmark | Result
Llama 3.1 8B Q4_K_M llama.cpp decode | ~30-40 tok/s single-stream
Qwen3-14B Q6 vLLM decode | ~20-28 tok/s
Whisper v3 large realtime factor | ~15-20x per L4
Parakeet-TDT 1.1B English ASR | ~40-60x real-time
Moshi 7B full-duplex voice | 200 ms latency, fits on single L4

Published, not measured on Kentino hardware.
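
To turn a realtime factor into capacity planning numbers, the sketch below converts the published Whisper v3 large range into audio hours transcribed per day. It is an upper bound under stated assumptions: the published factor holds on this build, both cards are dedicated to ASR, and the `utilization` parameter (hypothetical, default 1.0) stands in for whatever share of the day the GPUs are actually busy.

```python
# Upper-bound transcription capacity from a published realtime factor (RTF).
HOURS_PER_DAY = 24

def audio_hours_per_day(rtf: float, gpus: int = 1, utilization: float = 1.0) -> float:
    """Hours of audio processed per wall-clock day at the given realtime factor."""
    return rtf * HOURS_PER_DAY * gpus * utilization

low = audio_hours_per_day(15, gpus=2)    # lower end of the published range
high = audio_hours_per_day(20, gpus=2)   # upper end of the published range

print(f"Whisper v3 large on 2x L4: {low:.0f}-{high:.0f} audio hours per day (upper bound)")
```

Sharing a card with an LLM or TTS model, as in the multi-tenant layouts above, lowers this figure, so treat it as a ceiling.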

Not ideal for

  • 70B dense at Q6+ (even 48 GB pool is tight — use 4x4090 or 2x5090)
  • Image / video generation batch work at scale (L4 tensor throughput is inference-tuned)
  • LoRA / fine-tuning workflows — use 4090/5090 builds instead

Warranty and lead time

2 years parts warranty | 1 year labor warranty | 10-28 days lead time

The L4 carries an NVIDIA datacenter warranty path, a meaningful advantage over consumer cards for 24/7 SLA deployments. The build includes assembly, BIOS configuration, driver installation, burn-in testing, and functional verification.

Recommended add-ons

  • Upgrade to K-AI 96 Rome L4 968TOPS (4x L4, 96 GB pool) for doubled throughput
  • Upgrade boot drive to 2 TB NVMe
  • Upgrade RAM to 256 GB (4x 64 GB) for multi-model concurrent serving
  • Rack PDU + 2 kVA online UPS for branch deployment