1. Power Consumption Basics
Total Power = Dynamic Power + Static Power
Dynamic Power = C × V² × f
C = capacitance (depends on transistor count)
V = supply voltage
f = clock frequency
Static Power = Leakage Current × V
Leakage increases exponentially with temperature
Example (5nm process):
Dynamic: 80% of total power (compute)
Static: 20% of total power (always on, even idle)
Implication: Lower voltage and frequency = exponential power savings
2. Power by Component
| Component | Power % | Optimization |
|---|---|---|
| Compute (MAC units) | 40-50% | Use lower precision (INT8 vs FP32) |
| Memory (SRAM access) | 20-30% | Reduce memory bandwidth |
| Interconnect (data movement) | 15-20% | Local compute, cache weights |
| Control & other | 10-15% | Minimal overhead |
3. Power Gating & DVFS
Power Gating: Shut off unused units completely
- Idle cores → turn off voltage (0W)
- Helps with heterogeneous workloads
- Wake-up latency trade-off
DVFS (Dynamic Voltage and Frequency Scaling): Adjust voltage/frequency to workload
- High throughput task → high voltage/frequency
- Low latency task → lower frequency (still meets deadline)
- Idle → minimum frequency (near 0W)
4. Thermal Management
Challenge: High power density (watts/mm²) creates heat hotspots
Solutions:
- Heat sinks: Passive cooling (data center standard)
- Liquid cooling: Active cooling (high performance chips)
- Thermal throttling: Reduce frequency if temp exceeds limit
- Placement awareness: Distribute power evenly across die
5. Real-World Power Examples
| Device | Peak Power | Efficiency (pJ/op) | Use Case |
|---|---|---|---|
| Apple Neural Engine | 2-5W | 10-20 pJ | Mobile (battery) |
| Google TPU v4 | 400W | 3-8 pJ | Datacenter |
| NVIDIA H100 | 700W | 2-5 pJ | Datacenter (high perf) |
| Mobile GPU | 10W | 30-50 pJ | Smartphones |
6. Mobile vs Datacenter Power Trade-offs
Mobile (Apple Neural Engine):
- Power budget: < 5W (battery life)
- Specialization: CNN inference only
- Efficiency: 10-20 pJ/op (excellent)
- Performance: Lower peak (suitable for single inference)
Datacenter (Google TPU):
- Power budget: Unlimited (plug in)
- Specialization: Matrix multiply
- Efficiency: 3-8 pJ/op (even better through scale)
- Performance: Higher peak (training/batch inference)
7. Power Design Checklist
- ✅ Define power budget: Mobile < 5W, datacenter 200-700W
- ✅ Estimate dynamic power: Use C × V² × f analysis
- ✅ Plan DVFS: Multiple voltage/frequency modes
- ✅ Consider power gating: For heterogeneous designs
- ✅ Thermal analysis: Peak power dissipation, hotspot locations
- ✅ Select cooling solution: Passive (datacenter), active (high density)
- ✅ Measure efficiency: Target pJ/operation (industry benchmark)
Next (Day 13): Latency, throughput, and design tradeoffs.