
AI Model VRAM Requirements Across Different GPU Configurations

This table gives an overview of the approximate model sizes (in billions of parameters) that fit in various VRAM configurations, along with examples of known models. The figures count model weights only and correspond roughly to VRAM divided by bytes per parameter; actual requirements vary with the specific implementation, architecture, and optimizations used. A short sketch after the table reproduces this arithmetic.

| VRAM (GB) | FP32     | FP16/BF16 | INT8     | INT4       | INT2       | Example Models                          |
|-----------|----------|-----------|----------|------------|------------|-----------------------------------------|
| 16        | 3-4B     | 6-8B      | 12-16B   | 24-32B     | 48-64B     | GPT-2 (1.5B), BERT-Large (340M)         |
| 24        | 5-6B     | 10-12B    | 20-24B   | 40-48B     | 80-96B     | GPT-J (6B), BLOOM-7B1                   |
| 48        | 10-12B   | 20-24B    | 40-48B   | 80-96B     | 160-192B   | T5-11B, BLOOM-7B1 (FP32)                |
| 80        | 18-20B   | 36-40B    | 72-80B   | 144-160B   | 288-320B   | GPT-NeoX-20B, BLOOM-176B²               |
| 96        | 22-24B   | 44-48B    | 88-96B   | 176-192B   | 352-384B   | BLOOM-176B², Jurassic-1 Jumbo (178B)²   |
| 128       | 30-32B   | 60-64B    | 120-128B | 240-256B   | 480-512B   | GPT-3 175B², PaLM 540B²                 |
| 160       | 38-40B   | 76-80B    | 152-160B | 304-320B   | 608-640B   | PaLM 540B², Megatron-Turing NLG 530B²   |
| 192       | 46-48B   | 92-96B    | 184-192B | 368-384B   | 736-768B   | BLOOM-176B (FP16)                       |
| 256       | 62-64B   | 124-128B  | 248-256B | 496-512B   | 992-1024B  | GPT-3 175B (INT8), LLaMA 2 70B (FP32)   |
| 320       | 78-80B   | 156-160B  | 312-320B | 624-640B   | 1248-1280B | Chinchilla 70B (FP32)                   |
| 384       | 94-96B   | 188-192B  | 376-384B | 752-768B   | 1504-1536B | PaLM 540B (INT8)                        |
| 512       | 126-128B | 252-256B  | 504-512B | 1008-1024B | 2016-2048B | GPT-3 175B (FP16), BLOOM-176B (FP32)    |
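The arithmetic behind the table is simple: divide the available VRAM by the bytes each parameter occupies at a given precision. A minimal Python sketch (illustrative only, counting weights alone with no activation, KV-cache, or framework overhead):

```python
# Bytes per parameter at each precision shown in the table above.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16/BF16": 2.0, "INT8": 1.0, "INT4": 0.5, "INT2": 0.25}

def max_params_billions(vram_gb: float, precision: str) -> float:
    """Weights-only upper bound on parameter count (in billions) for a given VRAM."""
    return vram_gb / BYTES_PER_PARAM[precision]

# Reproduce a few rows of the table; the printed values match the upper
# bounds of each range (e.g. 16 GB at FP32 -> 4B parameters).
for vram_gb in (16, 24, 48, 80, 512):
    bounds = {p: f"{max_params_billions(vram_gb, p):g}B" for p in BYTES_PER_PARAM}
    print(f"{vram_gb} GB -> {bounds}")
```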

Notes:

  ¹ Can run in full precision (FP32).
  ² Requires quantization or other optimization techniques.
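As a concrete illustration of note ², here is one way to load a model with 4-bit quantized weights, assuming the bitsandbytes integration in Hugging Face Transformers; the BLOOM-7B1 checkpoint is used purely as an example, and the exact memory savings depend on the quantization scheme:

```python
# A minimal sketch of 4-bit quantized loading with Hugging Face Transformers
# and bitsandbytes. Model choice and config values are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit (NF4 by default)
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",                 # ~7.1B params: ~28 GB at FP32, ~3.5 GB at INT4
    quantization_config=bnb_config,
    device_map="auto",                      # place layers on available GPUs automatically
)
```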

Additional Considerations:

  • These estimates assume the entire VRAM is available to the model weights, which is rarely the case in practice: the framework, CUDA context, and other processes all claim memory (see the sketch after this list).
  • Model parallelism and other advanced techniques allow even larger models to run by sharding them across multiple GPUs.
  • Inference typically requires less memory than training, so larger models can often be run for inference on smaller VRAM configurations.
  • The exact sizes can vary based on model architecture, implementation details, and specific optimizations used.
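A minimal sketch of budgeting for these two considerations together: reserving a fraction of VRAM for overhead and pooling memory across GPUs. The 20% overhead figure is an illustrative assumption, not a measured constant:

```python
def usable_params_billions(
    vram_gb_per_gpu: float,
    num_gpus: int = 1,
    bytes_per_param: float = 2.0,    # FP16/BF16
    overhead_fraction: float = 0.2,  # assumed allowance for framework, CUDA context, etc.
) -> float:
    """Parameters (in billions) that fit after reserving a fraction of VRAM
    for non-weight memory, pooled across GPUs via model parallelism."""
    usable_gb = vram_gb_per_gpu * num_gpus * (1 - overhead_fraction)
    return usable_gb / bytes_per_param

# e.g. 8 x 80 GB GPUs at FP16 with 20% reserved -> ~256B parameters of weights
print(round(usable_params_billions(80, num_gpus=8), 1))
```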

Key Takeaways:

  1. 16-24GB VRAM: Suitable for most consumer-grade AI tasks and smaller research models.
  2. 48-96GB VRAM: Enables work with medium to large-scale models, often used in professional and research settings.
  3. 128-256GB VRAM: Allows running some of the largest publicly available models with various optimizations.
  4. 320-512GB VRAM: Provides capacity for the largest current models and future developments, often achieved through multi-GPU setups.

This table demonstrates the significant impact of quantization and other optimization techniques in enabling larger models to run on limited VRAM. As AI continues to advance, we can expect further innovations in model compression and memory-efficient architectures to push these boundaries even further.

And hold on... it does not end with transformers...
