AI Chip Day 12 Enhanced — Power & Thermal Management

1. Power Consumption Basics

Total Power = Dynamic Power + Static Power

Dynamic Power = C × V² × f
C = effective switched capacitance (grows with transistor count and with how often those transistors switch)
V = supply voltage
f = clock frequency

Static Power = Leakage Current × V
Leakage increases roughly exponentially with temperature

Example (5nm process):
Dynamic: 80% of total power (switching during compute)
Static: 20% of total power (drawn whenever the chip is powered, even when idle)

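Implication: Dynamic power scales with V² × f, and a lower frequency usually allows a lower voltage, so scaling both down together cuts dynamic power roughly with the cube of the scaling factor (large polynomial savings, not exponential ones).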
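A minimal Python sketch of the power model above. The capacitance, voltage, frequency, and leakage values are hypothetical, picked so the split reproduces the 80/20 example; leakage current is held constant here even though it really rises with temperature.

```python
def dynamic_power(c_farads: float, v_volts: float, f_hz: float) -> float:
    """Dynamic power: P = C * V^2 * f (C = effective switched capacitance)."""
    return c_farads * v_volts ** 2 * f_hz


def static_power(i_leak_amps: float, v_volts: float) -> float:
    """Static power: P = I_leak * V (leakage treated as temperature-independent here)."""
    return i_leak_amps * v_volts


# Hypothetical operating point, illustrative values only.
C = 100e-9     # effective switched capacitance: 100 nF
V = 0.8        # supply voltage: 0.8 V
F = 1.5e9      # clock frequency: 1.5 GHz
I_LEAK = 30.0  # total leakage current: 30 A

p_dyn = dynamic_power(C, V, F)
p_static = static_power(I_LEAK, V)
print(f"dynamic={p_dyn:.1f} W static={p_static:.1f} W total={p_dyn + p_static:.1f} W")
# dynamic=96.0 W static=24.0 W total=120.0 W  (an 80% / 20% split)

# Lower V and f together by 20%: dynamic power scales by 0.8**2 * 0.8 = 0.512.
SCALE = 0.8
p_dyn_scaled = dynamic_power(C, V * SCALE, F * SCALE)
print(f"dynamic ratio after scaling: {p_dyn_scaled / p_dyn:.3f}")
# dynamic ratio after scaling: 0.512
```

In this simple model the same 20% cut shrinks static power only by the voltage factor (to 0.8×), which is one reason DVFS is often paired with power gating.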

2. Power by Component

Component	Share of Total Power	Optimization
Compute (MAC units)	40-50%	Use lower precision (INT8 vs FP32)
Memory (SRAM access)	20-30%	Reduce memory traffic through data reuse
Interconnect (data movement)	15-20%	Compute locally, keep weights in on-chip caches
Control & other	10-15%	Keep control logic simple
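To turn the percentages into watts, a small sketch can split a total power figure using the midpoint of each range in the table (the midpoints happen to sum to exactly 100%). The 400 W total and the use of midpoints are assumptions for illustration, not figures for any specific chip.

```python
# Midpoints of the ranges in the table above (45 + 25 + 17.5 + 12.5 = 100%).
SHARE_MIDPOINTS = {
    "compute (MAC units)": 0.45,
    "memory (SRAM access)": 0.25,
    "interconnect (data movement)": 0.175,
    "control & other": 0.125,
}


def split_power(total_watts: float, shares: dict) -> dict:
    """Allocate a total power figure across components, normalizing shares to 1."""
    norm = sum(shares.values())
    return {name: total_watts * s / norm for name, s in shares.items()}


# Hypothetical 400 W datacenter-class chip.
for name, watts in split_power(400.0, SHARE_MIDPOINTS).items():
    print(f"{name:30s} {watts:6.1f} W")
```

With 400 W this gives 180 W, 100 W, 70 W, and 50 W. Memory plus interconnect comes to 42.5% of the total, nearly as much as compute, which is why data reuse and local compute are listed as optimizations.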

3. Power Gating & DVFS

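Power Gating: Cut the supply to unused units entirely

Idle cores → supply switched off (near 0W; unlike clock gating, this also removes leakage)
Helps with heterogeneous workloads where some units sit idle
Trade-off: wake-up latency, plus the energy spent powering the unit back up and restoring its state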
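The wake-up trade-off is commonly reasoned about as a break-even time: gating pays off only if the leakage saved during the idle period exceeds the energy overhead of switching off and on. A minimal sketch, assuming the gated unit's leakage drops to about zero; all numbers are hypothetical.

```python
def break_even_idle_time(leak_w: float, overhead_j: float) -> float:
    """Shortest idle period for which gating saves net energy.

    leak_w: leakage power the unit would otherwise burn while idle.
    overhead_j: energy spent powering the unit down and back up (incl. state restore).
    """
    return overhead_j / leak_w


def should_power_gate(idle_s: float, leak_w: float, overhead_j: float,
                      wake_latency_s: float, latency_slack_s: float) -> bool:
    """Gate only if it saves energy AND the wake-up latency fits the available slack."""
    saves_energy = leak_w * idle_s > overhead_j
    wakes_in_time = wake_latency_s <= latency_slack_s
    return saves_energy and wakes_in_time


# Hypothetical numbers: 0.5 W leakage per core, 50 uJ gating overhead, 10 us wake-up.
print(break_even_idle_time(0.5, 50e-6))                   # about 1e-4 s (100 us)
print(should_power_gate(1e-3, 0.5, 50e-6, 10e-6, 1e-3))   # True: 1 ms idle
print(should_power_gate(50e-6, 0.5, 50e-6, 10e-6, 1e-3))  # False: shorter than break-even
```

Real power managers typically predict idle time rather than knowing it in advance, so the decision above is an idealized version.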

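DVFS (Dynamic Voltage and Frequency Scaling): Adjust voltage and frequency to match the workload

High-throughput or latency-critical task → high voltage/frequency
Light or deadline-tolerant task → lower voltage/frequency, as long as the deadline is still met
Idle → minimum voltage/frequency (dynamic power near 0W; leakage remains unless the unit is also power-gated)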
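A DVFS policy can be sketched as picking, from a table of voltage/frequency operating points, the one that meets the deadline with the least energy. The operating points, cycle count, capacitance, and leakage below are hypothetical, and the power model is the C × V² × f plus leakage model from section 1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    volts: float
    freq_hz: float


def pick_operating_point(points, cycles: float, deadline_s: float,
                         c_eff: float, i_leak: float) -> Optional[tuple]:
    """Return (point, energy_j) for the lowest-energy point that meets the deadline,
    or None if no point is fast enough."""
    best = None
    for p in points:
        runtime_s = cycles / p.freq_hz
        if runtime_s > deadline_s:
            continue  # too slow for this task's deadline
        power_w = c_eff * p.volts ** 2 * p.freq_hz + i_leak * p.volts
        energy_j = power_w * runtime_s
        if best is None or energy_j < best[1]:
            best = (p, energy_j)
    return best


# Hypothetical voltage/frequency table and workload (1e9 cycles).
POINTS = [
    OperatingPoint("high", 1.0, 2.0e9),
    OperatingPoint("mid", 0.85, 1.5e9),
    OperatingPoint("low", 0.7, 1.0e9),
]
for deadline in (0.4, 0.8, 1.2):
    choice = pick_operating_point(POINTS, cycles=1e9, deadline_s=deadline,
                                  c_eff=1e-9, i_leak=0.5)
    print(deadline, None if choice is None else (choice[0].name, round(choice[1], 3)))
```

With these numbers a 0.4 s deadline has no feasible point, 0.8 s selects "mid" (about 1.006 J), and 1.2 s selects "low" (0.84 J). Dynamic energy per task is C × V² × cycles regardless of f, so lower voltage saves energy, while leakage energy grows with runtime.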

4. Thermal Management

Challenge: High power density (watts/mm²) creates heat hotspots

Solutions:

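Heat sinks: Air cooling, a heat sink plus fan-driven airflow (the traditional data center default)
Liquid cooling: Removes more heat than air; used for the highest-power chips
Thermal throttling: Reduce frequency (and voltage) when temperature exceeds a limit
Placement awareness: Floorplan high-power blocks to spread heat evenly across the die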
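Thermal throttling can be sketched with a single lumped thermal resistance: steady-state junction temperature is T_j = T_ambient + P × R_th, and the controller steps frequency down until T_j fits under the limit. This is a simplified model under stated assumptions (fixed voltage, temperature-independent leakage, hypothetical chip numbers); a single R_th cannot see local hotspots, which is why placement still matters.

```python
def junction_temp_c(power_w: float, ambient_c: float, r_th_c_per_w: float) -> float:
    """Steady-state lumped estimate: T_j = T_ambient + P * R_th."""
    return ambient_c + power_w * r_th_c_per_w


def throttle(f_max_hz: float, p_dyn_at_fmax_w: float, p_static_w: float,
             ambient_c: float, r_th_c_per_w: float, t_limit_c: float,
             step: float = 0.05):
    """Step frequency down until the steady-state temperature is within the limit.

    Simplifications: voltage is held fixed (dynamic power scales linearly with f),
    and static power does not change with temperature.
    Returns (frequency_hz, power_w, temperature_c).
    """
    scale = 1.0
    while scale > 0:
        power_w = p_static_w + p_dyn_at_fmax_w * scale
        temp_c = junction_temp_c(power_w, ambient_c, r_th_c_per_w)
        if temp_c <= t_limit_c:
            return f_max_hz * scale, power_w, temp_c
        scale = round(scale - step, 10)  # round to avoid float drift in the steps
    # Frequency reached zero: return the static-only point (it may still exceed the limit).
    return 0.0, p_static_w, junction_temp_c(p_static_w, ambient_c, r_th_c_per_w)


# Hypothetical chip: 300 W dynamic at 2 GHz, 60 W static,
# 35 C ambient, 0.15 C/W junction-to-ambient, 85 C limit.
print(throttle(2.0e9, 300.0, 60.0, 35.0, 0.15, 85.0))
```

At full speed this chip would dissipate 360 W and reach 89 °C; throttling settles at 1.8 GHz, 330 W, and 84.5 °C.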

5. Real-World Power Examples

Device	Peak Power	Approx. Efficiency (pJ/op; varies with precision)	Use Case
Apple Neural Engine	2-5W	10-20 pJ	Mobile (battery)
Google TPU v4	400W	3-8 pJ	Datacenter
NVIDIA H100	700W	2-5 pJ	Datacenter (high perf)
Mobile GPU	10W	30-50 pJ	Smartphones
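The efficiency column can be derived from power and throughput: energy per operation is watts divided by operations per second. A short sketch of that conversion and its reciprocal, TOPS/W (1 pJ/op equals 1 TOPS/W); the 100 W / 50 TOPS accelerator is hypothetical.

```python
def pj_per_op(power_w: float, ops_per_s: float) -> float:
    """Energy per operation in picojoules: (J/s) / (ops/s) = J/op, times 1e12."""
    return power_w / ops_per_s * 1e12


def tops_per_watt(energy_pj_per_op: float) -> float:
    """1 pJ/op = 1e12 ops per joule = 1 TOPS/W, so the two are reciprocals."""
    return 1.0 / energy_pj_per_op


# Hypothetical accelerator: 100 W while sustaining 50 TOPS.
e = pj_per_op(100.0, 50e12)
print(e, tops_per_watt(e))  # about 2.0 pJ/op and 0.5 TOPS/W
```

By this conversion, the table's 3-8 pJ/op range corresponds to roughly 0.125-0.33 TOPS/W. Such figures are only comparable when both use the same definition of an "op" (a MAC is often counted as two ops), the same precision, and sustained rather than peak throughput.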

6. Mobile vs Datacenter Power Trade-offs

Mobile (Apple Neural Engine):

Power budget: < 5W (battery life and phone thermals)
Specialization: On-device neural network inference
Efficiency: 10-20 pJ/op
Performance: Lower peak (suited to single, low-batch inferences)

Datacenter (Google TPU):

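Power budget: Hundreds of watts per chip (set by cooling and rack power, not battery)
Specialization: Matrix multiplication
Efficiency: 3-8 pJ/op (large matrix units amortize control and data-movement energy over many operations)
Performance: Higher peak (training and batched inference)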
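To see what the pJ/op figures mean in practice, a sketch can convert them into energy per inference and, for mobile, inferences per battery charge. The 10 GOPs-per-inference model and the 15 Wh battery are hypothetical, and the battery estimate ignores every other load on the device.

```python
def energy_per_inference_j(ops_per_inference: float, energy_pj_per_op: float) -> float:
    """Energy for one inference in joules."""
    return ops_per_inference * energy_pj_per_op * 1e-12


def inferences_per_battery(battery_wh: float, joules_per_inference: float) -> float:
    """How many inferences a battery could power, ignoring every other load."""
    return battery_wh * 3600.0 / joules_per_inference


# Hypothetical model needing 10 GOPs per inference, on a 15 Wh phone battery.
mobile_j = energy_per_inference_j(1e10, 15.0)      # midpoint of the 10-20 pJ/op range
datacenter_j = energy_per_inference_j(1e10, 5.0)   # inside the 3-8 pJ/op range
print(mobile_j, datacenter_j)                      # about 0.15 J vs 0.05 J
print(round(inferences_per_battery(15.0, mobile_j)))  # about 360,000
```

For mobile the binding constraint is energy per charge; for datacenter chips it is sustained power per chip and rack, so the same pJ/op figure feeds into different budgets.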

7. Power Design Checklist

✅ Define power budget: Mobile < 5W, datacenter 200-700W
✅ Estimate dynamic power: Use C × V² × f analysis
✅ Plan DVFS: Multiple voltage/frequency modes
✅ Consider power gating: For heterogeneous designs
✅ Thermal analysis: Peak power dissipation, hotspot locations
✅ Select cooling solution: Air cooling for moderate power density, liquid cooling for high density
✅ Measure efficiency: Report pJ/operation (or TOPS/W) and compare against similar chips

Next (Day 13): Latency, throughput, and design tradeoffs.
