AI corner

Case Study: 4x RTX 4090 AI Workstation

This article documents a complete build commissioned for a research customer who needed a rack-mountable, 24/7-capable LLM inference workstation with enough VRAM to host 70B-class models without cloud dependency. Everything...

TurboQuant: Reading the KV Cache Compression Br...

Reading time: 10 min | How Google's 3-bit compression makes long-context LLMs cheaper, and what it tells us about the next 18 months of AI inference. There is a quiet...

AI Model VRAM Requirements Across Different GPU Configurations

This table provides an overview of approximate model sizes (in billions of parameters) that can be run on various VRAM configurations, along...
