This table provides an overview of the approximate model sizes (in billions of parameters) that can be run on various VRAM configurations, along with examples of well-known models. These are rough estimates that vary with the specific implementation, architecture, and optimizations used; a quick calculation that reproduces them is sketched after the notes below.
VRAM (GB) | FP32 | FP16/BF16 | INT8 | INT4 | INT2 | Example Models |
---|---|---|---|---|---|---|
16 | 3-4B | 6-8B | 12-16B | 24-32B | 48-64B | GPT-2 (1.5B), BERT-Large (340M) |
24 | 5-6B | 10-12B | 20-24B | 40-48B | 80-96B | GPT-J (6B), BLOOM-7B1 |
48 | 10-12B | 20-24B | 40-48B | 80-96B | 160-192B | T5-11B, BLOOM-7B1 (FP32) |
80 | 18-20B | 36-40B | 72-80B | 144-160B | 288-320B | GPT-NeoX-20B, BLOOM-176B² |
96 | 22-24B | 44-48B | 88-96B | 176-192B | 352-384B | BLOOM-176B², Jurassic-1 Jumbo (178B)² |
128 | 30-32B | 60-64B | 120-128B | 240-256B | 480-512B | GPT-3 175B², PaLM 540B² |
160 | 38-40B | 76-80B | 152-160B | 304-320B | 608-640B | PaLM 540B², Megatron-Turing NLG 530B² |
192 | 46-48B | 92-96B | 184-192B | 368-384B | 736-768B | BLOOM-176B (INT8) |
256 | 62-64B | 124-128B | 248-256B | 496-512B | 992-1024B | GPT-3 175B (INT8), LLaMA 2 70B (FP16) |
320 | 78-80B | 156-160B | 312-320B | 624-640B | 1248-1280B | Chinchilla 70B (FP32) |
384 | 94-96B | 188-192B | 376-384B | 752-768B | 1504-1536B | PaLM 540B (INT4) |
512 | 126-128B | 252-256B | 504-512B | 1008-1024B | 2016-2048B | GPT-3 175B (FP16), BLOOM-176B (FP16) |
Notes:
- ¹ Can run in full precision (FP32).
- ² Requires quantization or other optimization techniques.
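The per-precision figures in the table follow directly from how many bytes each format uses to store one weight (4 for FP32, 2 for FP16/BF16, 1 for INT8, 0.5 for INT4, 0.25 for INT2). The short Python sketch below reproduces them under a simplifying assumption: weights dominate memory, and a nominal 10% of VRAM is reserved for overhead. The `overhead_fraction` value is illustrative, not a measured constant.

```python
# Rough upper-bound estimate of how many parameters fit in a given amount of
# VRAM at different precisions. Assumption: weight storage dominates and a
# fixed fraction of VRAM is reserved for activations, KV cache, and framework
# bookkeeping.

BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16/BF16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
    "INT2": 0.25,
}

def max_params_billion(vram_gb: float, precision: str,
                       overhead_fraction: float = 0.1) -> float:
    """Largest model (in billions of parameters) whose weights fit in
    `vram_gb` of VRAM at `precision`, after reserving `overhead_fraction`
    of the memory for everything that is not weights."""
    usable_bytes = vram_gb * 1e9 * (1.0 - overhead_fraction)
    return usable_bytes / BYTES_PER_PARAM[precision] / 1e9

for vram in (16, 24, 48, 80, 96, 128):
    estimates = {p: round(max_params_billion(vram, p), 1) for p in BYTES_PER_PARAM}
    print(f"{vram:>3} GB -> {estimates}")
```

For example, 80 GB at FP32 gives roughly 18B parameters, matching the 18-20B range in the table; the spread in each cell simply reflects how much overhead is assumed.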
Additional Considerations:
- These estimates assume the entire VRAM is available for the model, which is often not the case in practice due to memory used by the framework, operating system, and other processes.
- Model parallelism and other advanced techniques can allow running even larger models by distributing them across multiple GPUs.
- Inference typically requires less memory than training, so larger models can often be run for inference on smaller VRAM configurations (a rough comparison is sketched after this list).
- The exact sizes can vary based on model architecture, implementation details, and specific optimizations used.
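To make the inference-versus-training point concrete, here is a back-of-the-envelope sketch. The 16 bytes/parameter figure for Adam-based mixed-precision training is a commonly cited rule of thumb (FP16 weights and gradients plus FP32 master weights and two FP32 Adam moments), not an exact number, and activation memory and the KV cache are ignored entirely.

```python
# Rough comparison of the memory needed to serve a model versus train it,
# counting only weights, gradients, and optimizer state.

def inference_weights_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Weight memory for FP16/BF16 inference."""
    return params_billion * bytes_per_param

def training_state_gb(params_billion: float, bytes_per_param: float = 16.0) -> float:
    """Weights + gradients + Adam optimizer state for mixed-precision training."""
    return params_billion * bytes_per_param

for size in (7, 13, 70):
    print(f"{size}B parameters: ~{inference_weights_gb(size):.0f} GB to serve in FP16, "
          f"~{training_state_gb(size):.0f} GB of state to train with Adam")
```

A 70B-parameter model, for instance, needs on the order of 140 GB just for FP16 weights at inference time but well over 1 TB of weight, gradient, and optimizer state when trained this way, which is why training is almost always sharded across many GPUs.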
Key Takeaways:
- 16-24GB VRAM: Suitable for most consumer-grade AI tasks and smaller research models.
- 48-96GB VRAM: Enables work with medium to large-scale models, often used in professional and research settings.
- 128-256GB VRAM: Allows running some of the largest publicly available models with various optimizations.
- 320-512GB VRAM: Provides capacity for the largest current models and future developments, often achieved through multi-GPU setups.
This table demonstrates the significant impact of quantization and other optimization techniques in enabling larger models to run on limited VRAM. As AI continues to advance, we can expect further innovations in model compression and memory-efficient architectures to push these boundaries even further.
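As a concrete illustration of that quantization point, the sketch below loads a causal language model with 4-bit weights using the Hugging Face transformers and bitsandbytes libraries (assuming both are installed and a GPU is available). The model ID is a placeholder; the weight footprint ends up at roughly a quarter of the FP16 size, plus quantization constants, activations, and the KV cache.

```python
# Minimal sketch: load a causal LM with 4-bit weight quantization via the
# transformers + bitsandbytes integration. The model ID is a placeholder;
# substitute any causal LM you have access to.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model ID

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit blocks
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPUs/CPU if needed
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```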
And hold on... it doesn't end with transformers.