1. Power Consumption Sources in VLSI
Every transistor switching, leaking, or conducting contributes to chip power. Modern VLSI designers spend enormous effort on power optimization because:
- Data center chips drive electricity costs of millions of dollars/year
- Mobile chips must achieve days of battery life from a small cell
- IoT devices may run years from coin cells
- Thermal limits cap performance if power density is too high
Total Power = Dynamic Power + Short-Circuit Power + Static Leakage
1. Dynamic (Switching) Power:
P_dynamic = α × C × V_DD² × f
α = activity factor (fraction of clock cycles where node switches)
C = total capacitance (load cap + wire cap)
V_DD = supply voltage
f = clock frequency
Example: 5nm chip core
α = 0.15 (15% average switching activity)
C = 2nF total (all gates + wires)
V_DD = 0.8V
f = 3GHz
P = 0.15 × 2e-9 × (0.8)² × 3e9 = 576mW
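The formula is easy to check numerically. The sketch below is a minimal helper (the name `dynamic_power` is ours, not from any tool) and assumes a total switched capacitance of 2 nF, which is the value that yields 576 mW at these settings.

```python
def dynamic_power(alpha: float, cap_farads: float, vdd: float, freq_hz: float) -> float:
    """Return switching power in watts: alpha * C * V_DD^2 * f."""
    return alpha * cap_farads * vdd ** 2 * freq_hz


# Assumed example values: alpha = 0.15, C = 2 nF, V_DD = 0.8 V, f = 3 GHz
p_watts = dynamic_power(alpha=0.15, cap_farads=2e-9, vdd=0.8, freq_hz=3e9)
print(f"P_dynamic = {p_watts * 1e3:.0f} mW")
```

Because power is linear in C, a capacitance of 2 pF with the same settings would give only about 0.58 mW.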
2. Short-Circuit Power:
P_sc = I_sc × V_DD
I_sc = current when both PMOS and NMOS briefly conduct during transition
Typical: 5–15% of dynamic power (can be minimized with sharp input slews)
3. Static (Leakage) Power:
P_leak = I_leak × V_DD
I_leak = sum of all transistor leakage currents (mainly subthreshold and gate leakage), which flow even when devices are OFF
In 5nm: leakage can be 30–50% of total power at idle!
Dominant in advanced nodes due to thin gate oxide and low Vt
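The three components can be combined in a few lines. This is a rough sketch under stated assumptions: short-circuit power is modeled as a fixed 10% share of dynamic power (inside the 5–15% range quoted above), and the 0.3 A leakage current is a hypothetical figure chosen only for illustration.

```python
def total_power(p_dynamic: float, sc_fraction: float, i_leak: float, vdd: float) -> dict:
    """Split total power (watts) into dynamic, short-circuit, and leakage parts."""
    p_sc = sc_fraction * p_dynamic   # short-circuit modeled as a share of dynamic
    p_leak = i_leak * vdd            # P_leak = I_leak * V_DD
    return {
        "dynamic": p_dynamic,
        "short_circuit": p_sc,
        "leakage": p_leak,
        "total": p_dynamic + p_sc + p_leak,
    }


# Hypothetical inputs: 576 mW dynamic, 10% short-circuit share, 0.3 A leakage at 0.8 V
breakdown = total_power(p_dynamic=0.576, sc_fraction=0.10, i_leak=0.3, vdd=0.8)
for name, watts in breakdown.items():
    print(f"{name:>13}: {watts * 1e3:7.1f} mW")
```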
2. Dynamic Power Reduction Techniques
2.1 Clock Gating
Clock gating is typically the highest-impact dynamic power optimization in a design. When a group of flip-flops doesn't need to update, gating their clock removes the clock-pin and internal switching power of those flip-flops, and because their outputs hold steady, the combinational logic they drive stops toggling as well.
Clock Gating — Integrated Clock Gate (ICG) Cell
WITHOUT Clock Gating (always toggling):
CLK ─────┬──────────────────────────────────► FF1 (always switches)
│
└──────────────────────────────────► FF2 (always switches)
Every cycle: both FFs toggle → waste power if data unchanged
WITH Clock Gating (ICG cell):
CLK ────────────────┐
                   [AND] ──── gated CLK ───┬──► FF1 (clocked only when EN=1)
EN ─────[LATCH]─────┘                      └──► FF2 (clocked only when EN=1)
[LATCH] + [AND] together form the ICG cell
ICG = Integrated Clock Gating cell (latch + AND, glitch-free)
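To see why the latch makes the gate glitch-free, here is a small behavioral model in Python. It is a sketch that assumes the common arrangement where the latch is transparent while CLK is low; the class name `ICG` and the sample stimulus are illustrative, not taken from a cell library.

```python
class ICG:
    """Behavioral model of a latch-based integrated clock gate."""

    def __init__(self) -> None:
        self.latched_en = 0

    def step(self, clk: int, en: int) -> int:
        # The latch is transparent only while CLK is low, so EN changes
        # during the high phase cannot reach the AND gate.
        if clk == 0:
            self.latched_en = en
        return clk & self.latched_en


icg = ICG()
# (clk, en) samples; EN drops part-way through a high phase at the third sample
samples = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0), (0, 1), (1, 1)]
gated = [icg.step(clk, en) for clk, en in samples]
print(gated)
```

In the third sample EN falls while CLK is high, yet the gated clock stays high for the rest of that phase; a bare AND gate would have truncated the pulse.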
Power savings calculation:
Clock gating ratio = fraction of cycles when EN=0
If 70% of cycles gate OFF: Power saved = 0.70 × P_dynamic(FFs)
For a 500mW design where FFs = 30%:
FF power = 150mW
Gated savings = 70% × 150mW = 105mW ← significant!
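The same estimate as a small function, in case you want to try other ratios. This is a first-order sketch: it assumes savings scale linearly with the gated fraction and ignores the power of the ICG cells themselves.

```python
def clock_gating_savings(total_mw: float, ff_fraction: float, gated_ratio: float) -> float:
    """Power saved (mW) when flip-flop clocks are gated for gated_ratio of cycles."""
    ff_power_mw = total_mw * ff_fraction
    return gated_ratio * ff_power_mw


saved_mw = clock_gating_savings(total_mw=500, ff_fraction=0.30, gated_ratio=0.70)
print(f"Saved: {saved_mw:.0f} mW of 500 mW")
```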
Hierarchical gating (coarse + fine):
Block-level gate: turns off entire CPU core when idle
Register-level gate: turns off specific register banks
Cell-level gate: finest granularity, max savings but area overhead
2.2 Operand Isolation
When a functional block output is ignored (e.g., multiplier result not used this cycle), its input operands can be frozen to prevent unnecessary switching through the entire datapath.
Operand Isolation — Multiplier Example
Without isolation (WASTED power every cycle):
A[31:0] ─────────────────────────────────► MULT (64-bit multiply)
B[31:0] ─────────────────────────────────► MULT ↑ switches even
when result unused
With operand isolation (save ~40% mult power):
A[31:0] ─────[AND_32bit]─────────────────► MULT
B[31:0] ─────[AND_32bit]─────────────────► MULT
│
EN ───────┘ (EN=1 only when result needed)
When EN=0: all operand inputs clamped to 0
Multiplier internal nodes don't toggle
Power saved: ~40% of multiplier's dynamic power
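A quick way to build intuition is to count bit toggles on an operand bus with and without isolation. The sketch below uses a made-up operand stream and treats input toggles as a proxy for datapath activity; the real savings depend on the multiplier's internal structure.

```python
from typing import Sequence


def count_toggles(values: Sequence[int]) -> int:
    """Count bit flips between consecutive values on a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


def isolate(values: Sequence[int], enables: Sequence[int]) -> list:
    """AND-style isolation: pass the operand when EN=1, clamp to 0 when EN=0."""
    return [v if en else 0 for v, en in zip(values, enables)]


# Hypothetical operand stream; the result is only needed in the first two cycles
a_bus = [0x0000FFFF, 0x12345678, 0xFFFFFFFF, 0x00000000, 0xAAAAAAAA, 0x55555555]
en = [1, 1, 0, 0, 0, 0]

raw = count_toggles(a_bus)
gated = count_toggles(isolate(a_bus, en))
print(f"toggles without isolation: {raw}, with isolation: {gated}")
```

Note that clamping to 0 costs one extra transition when EN falls, so isolation pays off only when the block stays idle for more than a cycle or two.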
3. Voltage Scaling — DVFS
Dynamic Voltage and Frequency Scaling (DVFS) lowers voltage and frequency together when maximum performance is not needed. Because dynamic power scales with V_DD² × f, it is one of the most powerful techniques for power reduction:
Power Scaling With DVFS:
P_dynamic ∝ V_DD² × f
If V drops from 1.0V to 0.7V and f drops proportionally:
Power ratio = (0.7/1.0)² × (0.7/1.0) = 0.49 × 0.7 = 0.343
Power savings = 65.7% (enormous!)
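The scaling arithmetic above can be reproduced with a one-line function. This sketch covers dynamic power only; leakage also falls with voltage but follows a different law and is not modeled here.

```python
def dvfs_power_ratio(v_new: float, v_old: float, f_new: float, f_old: float) -> float:
    """Dynamic power ratio after scaling: (V_new/V_old)^2 * (f_new/f_old)."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


ratio = dvfs_power_ratio(v_new=0.7, v_old=1.0, f_new=0.7, f_old=1.0)
print(f"power ratio = {ratio:.3f}, savings = {(1 - ratio) * 100:.1f}%")
```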
Example DVFS operating points (illustrative values for a mobile efficiency core, not vendor-published figures; dynamic power from V_DD² × f):
High performance: 2.4GHz @ 1.05V → P = 1.00 (normalized)
Active balanced: 1.8GHz @ 0.90V → P = 0.55
Low power: 1.2GHz @ 0.80V → P = 0.29
Ultra low: 600MHz @ 0.70V → P = 0.11 (9× lower)
DVFS implementation requires:
1. Voltage regulator (integrated on-chip, or a PMIC on a separate die or co-packaged)
2. OS/firmware voltage table (each OPP = Operating Performance Point)
3. Response time: a voltage change takes 5–20µs, fast enough to track workload phases
4. Guardband: extra voltage margin to cover transition uncertainty
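As a rough illustration of how firmware might use an OPP table, the sketch below picks the slowest operating point that still meets a requested frequency. The frequency and voltage pairs are the example operating points listed above; the names `OPP`, `select_opp`, and `relative_dynamic_power` are hypothetical, and a real governor would also handle regulator settling time and guardbands.

```python
from typing import NamedTuple


class OPP(NamedTuple):
    name: str
    freq_ghz: float
    volt: float


OPP_TABLE = [  # sorted slowest to fastest
    OPP("ultra_low", 0.6, 0.70),
    OPP("low_power", 1.2, 0.80),
    OPP("balanced", 1.8, 0.90),
    OPP("high_perf", 2.4, 1.05),
]


def select_opp(required_ghz: float) -> OPP:
    """Pick the slowest OPP that still meets the requested frequency."""
    for opp in OPP_TABLE:
        if opp.freq_ghz >= required_ghz:
            return opp
    return OPP_TABLE[-1]  # saturate at the fastest OPP


def relative_dynamic_power(opp: OPP, ref: OPP = OPP_TABLE[-1]) -> float:
    """Dynamic power relative to the reference OPP, using V^2 * f."""
    return (opp.volt / ref.volt) ** 2 * (opp.freq_ghz / ref.freq_ghz)


for opp in OPP_TABLE:
    print(f"{opp.name:>10}: P = {relative_dynamic_power(opp):.2f}")
```

The printed values are dynamic-only estimates, so they can differ slightly from normalized figures that also include leakage.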
[Figure: DVFS Voltage & Frequency Over Time (Mobile Workload). Three stacked panels share a time axis from 0 to 15 s: Frequency (0–3.0 GHz), Voltage (0.6–1.1 V), and Power (0–8 W).]
Scenario: Mobile app launch → active use → background → idle
Peak: 8W (launch burst), Idle: 0.8W
Average power: ~2.5W → roughly 8 hours on a 20Wh battery (20Wh / 2.5W)
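The battery estimate is a time-weighted average followed by a division. In the sketch below the 8 W peak and 0.8 W idle come from the scenario, while the phase durations and the intermediate power levels are hypothetical values chosen to land near the ~2.5 W average.

```python
def average_power(phases):
    """Time-weighted mean power in watts for (duration_s, power_w) phases."""
    total_time = sum(d for d, _ in phases)
    total_energy = sum(d * p for d, p in phases)  # joules
    return total_energy / total_time


def battery_hours(capacity_wh, avg_power_w):
    """Runtime in hours for a given battery capacity and average draw."""
    return capacity_wh / avg_power_w


# Hypothetical phases: launch, active use, background, idle
phases = [(2, 8.0), (4, 3.0), (4, 1.5), (5, 0.8)]
avg_w = average_power(phases)
hours = battery_hours(20.0, avg_w)
print(f"average = {avg_w:.2f} W, runtime = {hours:.1f} h")
```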
4. Multi-Voltage Power Domains
Different chip blocks need different voltages. Running everything at maximum voltage wastes power. Multi-voltage design assigns the minimum necessary voltage to each block.
| Power Domain | Block Type | Voltage (typical) | Frequency | Power Savings vs Max V |
| VDD_CPU_HIGH | Performance CPU cores | 1.0–1.1V | 3.0–4.0GHz | Baseline |
| VDD_CPU_LOW | Efficiency CPU cores | 0.8–0.9V | 1.5–2.5GHz | 35–50% |
| VDD_GPU | GPU shaders | 0.85–0.95V | 1.5–2.0GHz | 25–40% |
| VDD_MEM | L2/L3 cache SRAM | 0.75–0.85V | 2.0GHz | 40–55% |
| VDD_IO | I/O pads, PHY blocks | 1.8V | N/A | Required for interface |
| VDD_ALWAYS_ON | Power controller, RTC | 0.7–0.8V | 32kHz–100MHz | 70%+ (lowest V) |
Level Shifter — Required at Voltage Domain Boundary
VDD_HIGH (1.1V) │ VDD_LOW (0.8V)
│
Signal A ─────────────────────────│────────────────────► FF_B
(Logic 1 = 1.1V) │ (Needs logic 1 = 0.8V)
│
Without level shifter: FAIL! 1.1V driving a 0.8V input might overstress the gate oxide or violate the I/O spec
WITH level shifter:
Signal A ─────────────────────[LS]─────────────────────► FF_B
(1.1V swing in) → (0.8V swing out)
LS = level shifter cell from the cell library
Level shifter types:
Low-to-High (LH): 0.8V → 1.1V (signal crossing to higher domain)
High-to-Low (HL): 1.1V → 0.8V (signal crossing to lower domain)
Required at EVERY net crossing domain boundary!
Area overhead: ~20–30 µm² per level shifter instance
Timing overhead: 20–50ps (adds to path delay at domain crossing)
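A simple script can classify domain crossings and estimate the area cost. This is a sketch with a hypothetical crossing list; net names are invented, and the 20–30 µm² range is the per-instance figure quoted above.

```python
def level_shifter_type(src_v: float, dst_v: float):
    """Return 'LH', 'HL', or None when both sides share a voltage."""
    if src_v < dst_v:
        return "LH"  # low-to-high crossing
    if src_v > dst_v:
        return "HL"  # high-to-low crossing
    return None


# Hypothetical crossings: (net name, source domain V, destination domain V)
crossings = [("cpu_req", 1.1, 0.8), ("mem_ack", 0.8, 1.1), ("dbg_bus", 0.8, 0.8)]
shifters = {net: level_shifter_type(s, d) for net, s, d in crossings}

count = sum(1 for kind in shifters.values() if kind is not None)
area_range_um2 = (count * 20, count * 30)
print(shifters, f"area: {area_range_um2[0]}-{area_range_um2[1]} um^2")
```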
5. Power Gating — Retention Registers
Power gating cuts VDD entirely to idle blocks, which removes nearly all of their leakage power, a dominant concern in advanced nodes. State must be saved before power-down and restored correctly on wake.
Power Gating Implementation:
Standard block with power gating:
VDD_MAIN ─── HEADER_SWITCH ─── VDD_VIRTUAL ─── Logic
HEADER_SWITCH: Large PMOS transistor controlled by power controller
ON (normal): PMOS conducting, VDD_VIRTUAL ≈ VDD_MAIN - V_drop
OFF (sleep): PMOS cut off, entire block loses power
State retention options:
Option 1: Save state to memory before power down, restore on wake
Wake latency: 1–10µs (all registers must be read back from SRAM)
Use case: deep sleep (seconds–minutes off)
Option 2: Retention flip-flops (shadow latch)
Retention FF = normal FF + always-on shadow latch
Sleep: data saved to shadow latch (tiny always-on power)
Wake: shadow latch restores main FF in 1 clock cycle
Restore latency: ~1ns once the switched rail is stable ← fast!
Use case: fine-grained power gating (microseconds off)
Leakage savings from power gating (7nm):
Active: 500µW leakage per 100K gates
Gated: 5µW (header + retention overhead only)
Savings: 99% leakage reduction on gated block!
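The savings figure follows directly from the two leakage numbers. The sketch below also scales the per-100K-gate figures to a larger block, assuming leakage grows linearly with gate count (a simplification).

```python
def leakage_savings(active_uw: float, gated_uw: float) -> float:
    """Fraction of leakage removed by power gating."""
    return 1.0 - gated_uw / active_uw


def block_leakage_uw(gates: int, uw_per_100k: float) -> float:
    """Leakage of a block, scaled linearly from a per-100K-gate figure."""
    return gates / 100_000 * uw_per_100k


savings = leakage_savings(active_uw=500.0, gated_uw=5.0)
# Hypothetical 1M-gate block, using the per-100K-gate figures above
active = block_leakage_uw(1_000_000, 500.0)
gated = block_leakage_uw(1_000_000, 5.0)
print(f"savings = {savings:.0%}, 1M gates: {active:.0f} uW -> {gated:.0f} uW")
```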
6. Thermal Management
Power is dissipated as heat. Heat causes timing slowdown, reliability failures, and in extreme cases thermal runaway. Physical design must ensure no thermal hotspot exceeds the junction temperature limit.
Thermal Heatmap (Die Top-Down View)
Temperature distribution (°C) at 5W total power, 25°C ambient:
CPU Core 0 CPU Core 1 GPU Shader Array
┌─────────┐┌─────────┐┌───────────────────────┐
Top of die: │ 105°C ││ 103°C ││ 108°C │ ← HOT!
│ Core 0 ││ Core 1 ││ (high activity) │
└─────────┘└─────────┘└───────────────────────┘
▲ ▲ ▲
│ Heat flow through Si → package → ambient
┌─────────────────────────────────────────────┐
Memory (L3): │ 85–90°C │
└─────────────────────────────────────────────┘
┌──────────────┐ ┌────────────────────────────┐
I/O blocks: │ 75°C │ │ 70°C │
└──────────────┘ └────────────────────────────┘
Thermal limits:
Junction temperature limit: 125°C (commercial) / 150°C (automotive)
Thermal resistance die→ambient (θ_JA): ~3–15 °C/W depending on package
Thermal budget: T_junction = T_ambient + P_chip × θ_JA
Example: 85°C + 5W × 8°C/W = 125°C ← right at limit!
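The thermal budget equation can be turned around to ask how much power a package allows. A minimal sketch, using the steady-state θ_JA model above and ignoring transient effects and local hotspots:

```python
def junction_temp(t_ambient_c: float, power_w: float, theta_ja: float) -> float:
    """Steady-state junction temperature: T_ambient + P * theta_JA."""
    return t_ambient_c + power_w * theta_ja


def max_power(t_limit_c: float, t_ambient_c: float, theta_ja: float) -> float:
    """Largest power that keeps the junction at or below the limit."""
    return (t_limit_c - t_ambient_c) / theta_ja


t_j = junction_temp(t_ambient_c=85.0, power_w=5.0, theta_ja=8.0)
budget_hot = max_power(t_limit_c=125.0, t_ambient_c=85.0, theta_ja=8.0)
budget_room = max_power(t_limit_c=125.0, t_ambient_c=25.0, theta_ja=8.0)
print(f"T_j = {t_j:.0f} C, budget at 85 C = {budget_hot:.1f} W, at 25 C = {budget_room:.1f} W")
```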
Hotspot mitigation strategies:
1. Spread high-activity blocks across die (don't cluster)
2. Reduce placement utilization near hot blocks (more whitespace lowers local power density)
3. Use thermal vias (metal-filled vias conduct heat vertically)
4. Co-design with package (heatspreader directly over hotspot)
5. Dynamic thermal management: throttle core if T > 110°C
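Strategy 5 can be sketched as a tiny control step. The 110°C threshold is from the list above; the frequency levels and the sensor trace are hypothetical, and this simplified version only steps down (a real controller would add hysteresis and a recovery path).

```python
THROTTLE_LIMIT_C = 110.0               # throttle threshold from the list above
FREQ_STEPS_GHZ = [0.6, 1.2, 1.8, 2.4]  # hypothetical frequency levels, slowest first


def next_frequency(temp_c: float, current_ghz: float) -> float:
    """Drop one frequency level while the sensor reads above the limit."""
    idx = FREQ_STEPS_GHZ.index(current_ghz)
    if temp_c > THROTTLE_LIMIT_C and idx > 0:
        return FREQ_STEPS_GHZ[idx - 1]
    return current_ghz


# Hypothetical sensor trace in degrees C, one sample per control interval
freq = 2.4
history = []
for temp in [100, 112, 115, 109, 111]:
    freq = next_frequency(temp, freq)
    history.append(freq)
print(history)
```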
Thermal-Aware Placement
EDA tools can perform thermal-aware placement by modeling power density per tile and adjusting cell placement to distribute heat more evenly.
| Block | Power Density | Placement Strategy | Thermal Impact |
| CPU cores (active) | 0.5 W/mm² | Spread across die quadrants | +20°C over baseline |
| GPU shaders | 0.4 W/mm² | Spread in array, not clustered | +18°C |
| L2 cache | 0.15 W/mm² | Wrap around CPU cores | +6°C |
| L3 cache | 0.08 W/mm² | Die periphery (cooler zone) | +3°C |
| I/O PHYs | 0.05 W/mm² | Die edge (near thermal path) | +2°C |
7. Power Analysis Tools and Flow
Power Sign-Off Flow:
1. Activity generation:
Run RTL simulation with representative workloads
Capture switching activity (VCD file — value change dump)
2. Gate-level power analysis:
Apply VCD to gate netlist
Power tools: Synopsys PrimePower, Cadence Joules, Siemens EDA PowerPro (formerly Mentor)
3. Components analyzed:
a) Cell internal power (from liberty models: rise/fall energy)
b) Net switching power (C × V² × f × α per net)
c) Leakage power (Iddq from liberty models, temperature-dependent)
4. Power report example:
──────────────────────────────────────────────────────
Block Dynamic(mW) Leakage(mW) %Total
──────────────────────────────────────────────────────
CPU_P-core 1,200 280 39.5%
CPU_E-core 400 80 12.8%
GPU 800 200 26.7%
Memory Ctrl 150 40 5.1%
L2 Cache 80 30 2.9%
L3 Cache 120 50 4.5%
I/O & PHY 200 20 5.9%
Misc 80 20 2.7%
──────────────────────────────────────────────────────
TOTAL 3,030 720 100%
Peak = 3,750mW Average (mix) = 2,100mW
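A report like this is worth cross-checking with a short script that recomputes the totals and each block's share from the dynamic and leakage columns. The sketch below is plain Python over the numbers in the report, not the output format of any particular power tool.

```python
# (dynamic mW, leakage mW) per block, copied from the report above
report = {
    "CPU_P-core": (1200, 280),
    "CPU_E-core": (400, 80),
    "GPU": (800, 200),
    "Memory Ctrl": (150, 40),
    "L2 Cache": (80, 30),
    "L3 Cache": (120, 50),
    "I/O & PHY": (200, 20),
    "Misc": (80, 20),
}

total_dynamic = sum(dyn for dyn, _ in report.values())
total_leakage = sum(leak for _, leak in report.values())
total = total_dynamic + total_leakage

# Share of total power per block, as a percentage
shares = {blk: 100 * (dyn + leak) / total for blk, (dyn, leak) in report.items()}

for blk, pct in shares.items():
    print(f"{blk:<12} {pct:5.1f}%")
print(f"TOTAL dynamic={total_dynamic} mW leakage={total_leakage} mW peak={total} mW")
```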
8. Production Power Sign-Off Checklist
Power Optimization Checklist
- ✅ Power budget defined: Total chip TDP and per-block budget allocated
- ✅ Clock gating coverage ≥ 85%: Most register groups have ICG cells
- ✅ Operand isolation: All major datapaths have input isolation when idle
- ✅ DVFS OPP table complete: All voltage/frequency operating points validated
- ✅ Multi-voltage verified: Level shifters at all domain crossings, isolation cells inserted
- ✅ Power gating implemented: Retention FFs or SRAM save/restore for all gated domains
- ✅ PDN IR drop analysis: Static + dynamic IR drop within 10% of supply
- ✅ Thermal hotspot analysis: No zone exceeds 125°C at max workload
- ✅ Thermal-aware placement: High-power blocks distributed across die
- ✅ VCD-based power analysis: Representative workload activity captured
- ✅ Peak power estimation: Package and PCB rated for peak current
- ✅ Idle/sleep power: Leakage meets always-on budget
Next — Day 9: Signal integrity and crosstalk mitigation — coupling capacitance, aggressor/victim nets, crosstalk delay, and noise-aware routing strategies.