AI Chip Design Day 28

Power Breakdown

Total Power = Dynamic Power + Static Power

Dynamic Power (switching) = α × C × V² × f
- α = activity factor (fraction of gates switching each cycle)
- C = switched capacitance (load)
- V = supply voltage
- f = clock frequency

Static Power = I_leakage × V
- Dominated by subthreshold leakage in modern tech nodes
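A minimal sketch of the breakdown in Python. It treats switching power as the α × C × V² × f dynamic term, so the total is dynamic plus static. The block values (activity, capacitance, leakage current) are hypothetical numbers chosen for easy arithmetic, not measurements from any chip.

```python
def dynamic_power(alpha, c_farads, v_volts, f_hz):
    """Dynamic (switching) power in watts: alpha * C * V^2 * f."""
    return alpha * c_farads * v_volts ** 2 * f_hz


def static_power(i_leak_amps, v_volts):
    """Static (leakage) power in watts: I_leakage * V."""
    return i_leak_amps * v_volts


# Hypothetical block, for illustration only
ALPHA = 0.2      # 20% of gates switch per cycle
C_TOTAL = 1e-9   # 1 nF of switched capacitance
V_DD = 1.0       # volts
F_CLK = 2e9      # 2 GHz
I_LEAK = 0.1     # 100 mA of leakage current

p_dyn = dynamic_power(ALPHA, C_TOTAL, V_DD, F_CLK)
p_static = static_power(I_LEAK, V_DD)
p_total = p_dyn + p_static
print(f"dynamic={p_dyn:.2f} W  static={p_static:.2f} W  total={p_total:.2f} W")
```

With these assumed values, dynamic power is 0.4 W and static power is 0.1 W. Every technique below attacks one of the terms: α (clock gating, sparsity), V and f (voltage scaling), C (precision reduction), or leakage (power gating).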

Technique 1: Clock Gating

Don't clock logic that's not computing

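Without gating:
- Every flip-flop's clock input toggles every cycle (even if the data doesn't change)
- Clock power ∝ f × number of flip-flops

With gating:
- An enable signal answers: "is data incoming?"
- If not, a gating cell (conceptually an AND of clock and enable) blocks the clock to those flip-flops
- Can save on the order of 40% of dynamic power, depending on how often the logic sits idle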
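A toy model of the idea, counting how many clock events reach a bank of flip-flops with and without gating. The register width and the enable pattern (data arriving in 3 of every 5 cycles) are assumptions picked so the saving lands at 40%. Real savings depend on the design's idle behavior, and this counts clock activity only, not data-path switching.

```python
def clock_events(enables, n_flops, gated):
    """Count clock edges delivered to a bank of flip-flops.

    enables[i] is True when new data arrives in cycle i.
    """
    events = 0
    for enable in enables:
        # Ungated: every flop is clocked every cycle.
        # Gated: the clock only passes when enable is high.
        if enable or not gated:
            events += n_flops
    return events


N_FLOPS = 32  # hypothetical register width
# Hypothetical traffic: new data in 3 of every 5 cycles
enables = [(cycle % 5) < 3 for cycle in range(1000)]

ungated = clock_events(enables, N_FLOPS, gated=False)
gated = clock_events(enables, N_FLOPS, gated=True)
saving = 1 - gated / ungated
print(f"ungated={ungated} gated={gated} saving={saving:.0%}")
```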

Technique 2: Voltage Scaling

Lower voltage = quadratic power reduction, but slower clock

Voltage	Relative Power	Max Frequency	Use Case
1.0V (nominal)	1.0×	2.0 GHz	Peak performance
0.9V	0.73×	1.8 GHz	Typical
0.8V	0.57×	1.5 GHz	Low power
0.6V (near threshold)	0.30×	0.5 GHz	Battery mode
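As a cross-check, here is the dynamic-only estimate for the same operating points, using relative power = (V/V_nominal)² × (f/f_nominal). This is a sketch that ignores leakage and any fixed power, which is an assumption and not something the table states.

```python
NOMINAL_V = 1.0       # volts, from the table's nominal row
NOMINAL_F_GHZ = 2.0   # GHz, from the table's nominal row

# (voltage, frequency in GHz) pairs taken from the table
OPERATING_POINTS = [(1.0, 2.0), (0.9, 1.8), (0.8, 1.5), (0.6, 0.5)]


def relative_dynamic_power(v, f_ghz):
    """Dynamic power relative to nominal, from P ~ V^2 * f."""
    return (v / NOMINAL_V) ** 2 * (f_ghz / NOMINAL_F_GHZ)


for v, f_ghz in OPERATING_POINTS:
    rel = relative_dynamic_power(v, f_ghz)
    print(f"{v:.1f} V @ {f_ghz:.1f} GHz -> {rel:.2f}x")
```

The 0.9V row comes out at about 0.73×, matching the table. At 0.8V and 0.6V the dynamic-only estimate is lower than the table's figures. One possible explanation is that the table also counts leakage or other power that does not scale with V² × f, but that is a guess.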

Apple A17 Approach

Dynamic voltage and frequency scaling (DVFS):
- Running heavy inference: 1.0V @ 2.0 GHz → 2 W
- Running light task: 0.7V @ 0.5 GHz → 100 mW
- Idle (block power-gated): <1 mW

A controller monitors the workload and adjusts V/f about 100× per second.
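A rough sketch of the selection step such a controller might run on each tick. This is not Apple's actual algorithm. The two operating points come from the text above, while the `headroom` margin, the demand metric (required GHz of throughput), and the "return None to power-gate" convention are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    volts: float
    ghz: float


# Operating points from the text, sorted by ascending frequency
POINTS = [
    OperatingPoint("light", 0.7, 0.5),
    OperatingPoint("heavy", 1.0, 2.0),
]


def pick_operating_point(demand_ghz, headroom=1.2):
    """Pick the slowest point that covers the demand with some margin.

    Returns None when there is no work, meaning the block can be power-gated.
    """
    if demand_ghz <= 0:
        return None
    for point in POINTS:
        if point.ghz >= demand_ghz * headroom:
            return point
    return POINTS[-1]  # saturate at the fastest point


for demand in (0.0, 0.3, 1.5):
    print(demand, pick_operating_point(demand))
```

A real controller would also add hysteresis so it does not bounce between points, and would raise voltage before frequency when stepping up.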

Technique 3: Precision Reduction

Lower-precision arithmetic uses smaller multipliers → less power

Precision	Multiplier Area	Energy/MAC	Delay
INT8	64 gates	1.0 pJ	1.0 ns
INT16	256 gates	2.5 pJ	1.2 ns
INT32	1024 gates	5.0 pJ	1.5 ns
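The area column follows a bits² pattern (8² = 64, 16² = 256, 32² = 1024), which is the usual first-order scaling for an array multiplier. The sketch below reproduces that and turns the per-MAC figures into energy per inference. The 1 billion MAC workload is a hypothetical model size, and the estimate covers multiplier energy only, not memory traffic.

```python
# Per-MAC energy in picojoules, taken from the table
ENERGY_PJ_PER_MAC = {"INT8": 1.0, "INT16": 2.5, "INT32": 5.0}


def multiplier_gates(bits):
    """First-order multiplier size: grows with the square of the bit width."""
    return bits ** 2


def inference_energy_mj(num_macs, precision):
    """Multiplier energy for one inference, in millijoules."""
    picojoules = num_macs * ENERGY_PJ_PER_MAC[precision]
    return picojoules * 1e-12 * 1e3  # pJ -> J -> mJ


NUM_MACS = 1_000_000_000  # hypothetical model: 1 billion MACs per inference

for precision, bits in (("INT8", 8), ("INT16", 16), ("INT32", 32)):
    energy = inference_energy_mj(NUM_MACS, precision)
    print(f"{precision}: {multiplier_gates(bits)} gates, {energy:.1f} mJ per inference")
```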

Technique 4: Power Gating

Turn off entire blocks when not needed

Systolic array: Can gate unused tiles if batch size < 256
Mobile NPU: Entire unit off when no inference needed
Cost: Retention registers + slow wake-up (tens of μs)
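Because of that cost, gating only pays off when the block stays idle long enough. A minimal break-even sketch: the leakage power, the save/restore energy overhead, and the 20 μs wake-up latency are all hypothetical values, picked to be in line with the "tens of μs" above.

```python
def break_even_idle_s(overhead_energy_j, leakage_power_w):
    """Idle time at which leakage saved equals the energy spent gating."""
    return overhead_energy_j / leakage_power_w


def should_power_gate(expected_idle_s, overhead_energy_j, leakage_power_w, wake_latency_s):
    """Gate only if the idle period beats both the break-even time and the wake-up latency."""
    break_even = break_even_idle_s(overhead_energy_j, leakage_power_w)
    return expected_idle_s > max(break_even, wake_latency_s)


# Hypothetical block, for illustration only
LEAKAGE_W = 0.05    # 50 mW of leakage while powered but idle
OVERHEAD_J = 5e-6   # 5 uJ to save state, switch off, and wake up again
WAKE_S = 20e-6      # 20 us wake-up latency

print(f"break-even idle time: {break_even_idle_s(OVERHEAD_J, LEAKAGE_W) * 1e6:.0f} us")
print("gate for 1 ms idle: ", should_power_gate(1e-3, OVERHEAD_J, LEAKAGE_W, WAKE_S))
print("gate for 50 us idle:", should_power_gate(50e-6, OVERHEAD_J, LEAKAGE_W, WAKE_S))
```

With these assumed numbers the break-even point is about 100 μs, so a 1 ms idle window is worth gating and a 50 μs one is not.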

Real Example: Google TPU v4 Power

TPU v4 (200W sustained):
- Compute (systolic): 100W (50%)
- Memory (HBM): 50W (25%)
- Interconnect (NoC): 30W (15%)
- Control/misc: 20W (10%)

Optimizations:
- Clock gating: Save 20W (reduce activity from 70% to 50%)
- Voltage scaling: Save 10W (lower compute voltage by 0.1V)
- Sparsity: Skip zero multiplies (save 15W on sparse workloads)

Result: 200W → 155W possible (22% reduction)
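The arithmetic behind that result, as a quick check. The numbers are the ones listed above. Treating the three savings as simply additive is an assumption, and the sparsity saving only applies on sparse workloads.

```python
# Power budget in watts, from the breakdown above
budget_w = {"compute": 100, "memory": 50, "interconnect": 30, "control_misc": 20}

# Savings in watts, from the optimizations above (assumed additive)
savings_w = {"clock_gating": 20, "voltage_scaling": 10, "sparsity": 15}

total_w = sum(budget_w.values())
saved_w = sum(savings_w.values())
optimized_w = total_w - saved_w
reduction_pct = 100 * saved_w / total_w

for block, watts in budget_w.items():
    print(f"{block}: {watts} W ({100 * watts / total_w:.0f}%)")
print(f"{total_w} W -> {optimized_w} W ({reduction_pct:.1f}% reduction)")
```

The exact figure is 22.5%, which the summary above rounds to 22%.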

Day 29: Area & cost reduction: how to shrink silicon and manufacturing cost.