HomeDay 29

Area & Cost Reduction

Die size, yields, packaging, NRE: the economics of silicon. Trade-offs between performance, power, and cost.

Cost Breakdown: H100 Example

H100 cost structure ($40,000 retail): - NVIDIA R&D: $2,000 (amortized over 100k units) - Die cost: $3,000 (manufacturing) ├─ Wafer: $12,000 ├─ Good dies per wafer: 4 (80mm² die, 300mm wafer) ├─ Yield: 60% (240 dies scrap) └─ Cost per good die: $3,000 - HBM stacking: $800 - Packaging: $400 - PCB, interconnect: $300 - NVIDIA margin: $15,000+

Area Optimization Strategies

1. Systolic Array Sizing

Array SizeDie AreaPeak TFLOPSTFLOPS/mm²
64×645 mm²326.4
128×12820 mm²1316.6
256×25680 mm²5206.5
512×512320 mm²2,0806.5

Insight: Efficiency is constant (roughly), but absolute cost is not. 512×512 has 4× the area but 4× the cost.

2. Memory Hierarchy

SRAM is expensive (area-wise):

Option A: Large SRAM (24 MB) - Faster, fewer HBM accesses - Area: ~10 mm² (large!) - Power: Lower (fewer HBM I/O) Option B: Small SRAM (4 MB) - Slower, more HBM accesses - Area: ~2 mm² - Power: Higher (HBM bandwidth-limited) Trade-off: 8 mm² area savings = $800 cost savings But 50% more HBM I/O = 20W more power = $5k in power supply cost Economics favor Option A (in datacenter)

Yield and Manufacturing Cost

Wafer Yield Formula

Good dies per wafer = π × (R / d) × (d / 2A)^2 × Y R = wafer radius (150 mm for 300mm wafer) d = defect density (defects/cm²) A = die area (mm²) Y = yield coefficient (0.6-0.8) Example: - Die area: 80 mm² (TPU v4) - Defect density: 0.1 defects/cm² (5nm) - Yield: 60% - Good dies: 4 per wafer Cost per good die = $12,000 wafer / 4 dies = $3,000

Cost per TFLOPS

ChipDie AreaCostPeak TFLOPSCost/TFLOPS
A17 ANE1.5 mm²$10017$5.88
TPU v480 mm²$3,000430$6.98
H100815 mm²$5,0001,450$3.45

Tiling for Cost Reduction

Instead of one large die, use multiple smaller dies:

Option A: Single 256×256 systolic (TPU-style) - Die area: 80 mm² - Yield: 60% - Cost: $3,000 Option B: Four 128×128 systolic (tiled) - Die area: 4 × 20 = 80 mm² - Yield: 60% per die = 84% when only 1 is needed - Cost: $2,400 (if smaller die is cheaper to fab) - Benefit: One bad die doesn't kill product Production reality: Google uses multi-chip TPU Pods (not monolithic)

Day 30: RTL to Silicon: the entire journey from HDL to tape-out.