HomeDay 18

High-Bandwidth Memory

HBM, GDDR, LPDDR: memory technologies for AI chips. 2D/3D stacking, interfacing, thermal challenges.

Memory Bandwidth Hierarchy

Register file: ~30 KB, ~100 TB/s bandwidth (local to ALU) SRAM (on-chip): 1-100 MB, 5-50 TB/s HBM3 (stacked): 8-80 GB, 2-5 TB/s DDR5 (CPU memory):16-256 GB, 100 GB/s
TechCapacityBandwidthLatencyPowerUse
HBM38-16 GB2-5 TB/s50-100 ns5-10WGPU/TPU
GDDR6X8-24 GB500-700 GB/s100-200 ns8-15WGaming GPU
LPDDR5X2-8 GB100-200 GB/s200-300 ns1-2WMobile

HBM (High-Bandwidth Memory)

Used by: Google TPU, NVIDIA H100, Apple devices

How HBM Works

Standard DRAM: CPU → 64-bit parallel bus → Memory Controller → DRAM module Bandwidth: 100 GB/s (limited by bus width) HBM (3D stacking): CPU → Many small buses (in parallel) → Stacked DRAM layers Bandwidth: 2 TB/s (16-32 parallel buses × 125 GB/s each) Stacking: Layer 1: DRAM array Layer 2: DRAM array ... Layer 12: DRAM array All connected via Through-Silicon Vias (TSVs)

Design Tradeoffs

TPU v4 HBM Configuration

8 GB HBM3 per chip - 2000 GB/s bandwidth - 12 layers of DRAM (256 Mb each) - TSV connections between layers Cost implications: - Bare HBM die: $150-200 - HBM packaging (BGA): $400-600 - TPU v4 die (logic + HBM): $2000-3000

GDDR (Gaming-Optimized)

Cheaper than HBM, used in consumer GPUs and NVIDIA RTX cards.

LPDDR (Low-Power)

Mobile and edge devices (Apple A17, Qualcomm):

Day 19: Network-on-Chip (NoC) for multi-tile designs.