Cost Breakdown: H100 Example
H100 cost structure ($40,000 retail):
- NVIDIA R&D: $2,000 (amortized over 100k units)
- Die cost: $3,000 (manufacturing)
├─ Wafer: $12,000
├─ Good dies per wafer: 4 (80mm² die, 300mm wafer)
├─ Yield: 60% (240 dies scrap)
└─ Cost per good die: $3,000
- HBM stacking: $800
- Packaging: $400
- PCB, interconnect: $300
- NVIDIA margin: $15,000+
Area Optimization Strategies
1. Systolic Array Sizing
| Array Size | Die Area | Peak TFLOPS | TFLOPS/mm² |
|---|---|---|---|
| 64×64 | 5 mm² | 32 | 6.4 |
| 128×128 | 20 mm² | 131 | 6.6 |
| 256×256 | 80 mm² | 520 | 6.5 |
| 512×512 | 320 mm² | 2,080 | 6.5 |
Insight: Efficiency is constant (roughly), but absolute cost is not. 512×512 has 4× the area but 4× the cost.
2. Memory Hierarchy
SRAM is expensive (area-wise):
Option A: Large SRAM (24 MB)
- Faster, fewer HBM accesses
- Area: ~10 mm² (large!)
- Power: Lower (fewer HBM I/O)
Option B: Small SRAM (4 MB)
- Slower, more HBM accesses
- Area: ~2 mm²
- Power: Higher (HBM bandwidth-limited)
Trade-off: 8 mm² area savings = $800 cost savings
But 50% more HBM I/O = 20W more power = $5k in power supply cost
Economics favor Option A (in datacenter)
Yield and Manufacturing Cost
Wafer Yield Formula
Good dies per wafer = π × (R / d) × (d / 2A)^2 × Y
R = wafer radius (150 mm for 300mm wafer)
d = defect density (defects/cm²)
A = die area (mm²)
Y = yield coefficient (0.6-0.8)
Example:
- Die area: 80 mm² (TPU v4)
- Defect density: 0.1 defects/cm² (5nm)
- Yield: 60%
- Good dies: 4 per wafer
Cost per good die = $12,000 wafer / 4 dies = $3,000
Cost per TFLOPS
| Chip | Die Area | Cost | Peak TFLOPS | Cost/TFLOPS |
|---|---|---|---|---|
| A17 ANE | 1.5 mm² | $100 | 17 | $5.88 |
| TPU v4 | 80 mm² | $3,000 | 430 | $6.98 |
| H100 | 815 mm² | $5,000 | 1,450 | $3.45 |
Tiling for Cost Reduction
Instead of one large die, use multiple smaller dies:
Option A: Single 256×256 systolic (TPU-style)
- Die area: 80 mm²
- Yield: 60%
- Cost: $3,000
Option B: Four 128×128 systolic (tiled)
- Die area: 4 × 20 = 80 mm²
- Yield: 60% per die = 84% when only 1 is needed
- Cost: $2,400 (if smaller die is cheaper to fab)
- Benefit: One bad die doesn't kill product
Production reality: Google uses multi-chip TPU Pods (not monolithic)
Day 30: RTL to Silicon: the entire journey from HDL to tape-out.