
Low Power Design — Multi-Vt, Clock Gating & Power Gating

Every milliwatt matters at advanced nodes. Learn the complete toolkit: multi-threshold voltage cell selection, integrated clock gating, MTCMOS power gating with retention, UPF power intent, and low-power signoff with Innovus and Voltus.

By EcrioniX Engineering Team · Published June 19, 2026 · ~5,000 words · 17 min read

1. Power Budget — Where Does the Power Go?

Before applying any low-power technique you need to understand the two fundamentally different components of CMOS power consumption. Getting the breakdown wrong means applying the wrong fix.

Total Power = Dynamic Power + Leakage Power

Dynamic power (switching activity):
  P_dyn = α × C_L × V_DD² × f
    α    = activity factor (0–1, the fraction of nodes, such as FF outputs, that switch each cycle)
    C_L  = load capacitance (wire + gate)
    V_DD = supply voltage
    f    = clock frequency

Leakage power (always on, even when idle):
  P_leak = I_leak × V_DD
    I_leak = subthreshold leakage + gate leakage + junction leakage

Approximate split by node (varies widely with design and workload):
  28nm: Dynamic ≈ 70%, Leakage ≈ 30%
  7nm:  Dynamic ≈ 50%, Leakage ≈ 50%
  3nm:  Dynamic ≈ 40%, Leakage ≈ 60% (leakage dominates)

→ Subthreshold leakage depends exponentially on Vt. As VDD scales down, Vt has to drop with it to preserve speed, so leakage per transistor rises sharply at advanced nodes.
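As a quick sanity check of the two formulas above, here is a minimal Python sketch. The block parameters (activity, capacitance, supply, frequency, leakage current) are hypothetical values chosen only to illustrate the arithmetic, not measurements from any real design.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """P_dyn = alpha * C_L * V_DD^2 * f, in watts."""
    return alpha * c_load * vdd ** 2 * freq


def leakage_power(i_leak, vdd):
    """P_leak = I_leak * V_DD, in watts."""
    return i_leak * vdd


# Hypothetical block: 15% activity, 2 nF total switched capacitance,
# 0.9 V supply, 1 GHz clock, 100 mA total leakage current.
p_dyn = dynamic_power(alpha=0.15, c_load=2e-9, vdd=0.9, freq=1e9)
p_leak = leakage_power(i_leak=0.1, vdd=0.9)
total = p_dyn + p_leak

print(f"dynamic = {p_dyn:.3f} W, leakage = {p_leak:.3f} W")
print(f"leakage share = {p_leak / total:.0%}")
```

With these assumed inputs the block burns about 0.243 W of dynamic power and 0.090 W of leakage, so leakage is roughly 27% of the total. Knowing that split tells you which technique to reach for first.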

Multi-Vt

Targets leakage. Uses HVT cells (slow, low leakage) on non-critical paths. 30–50% leakage reduction.

Clock Gating

Targets dynamic power. ICG cells block clock to idle FFs. 20–40% dynamic power reduction.

Power Gating

Targets leakage in idle blocks. MTCMOS switches cut VDD. Near-zero standby power.

Voltage Scaling

P ∝ V². Reducing VDD from 1.0V to 0.8V cuts dynamic power by 36% at the same frequency. Usually deployed as part of DVFS.
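The voltage-scaling figure follows directly from P ∝ V² · f. The helper below is a small illustrative sketch (the function name is ours) that reproduces it and also shows what happens when frequency is scaled along with voltage, as DVFS does.

```python
def dynamic_power_ratio(v_new, v_old, f_new=1.0, f_old=1.0):
    """Ratio of new to old dynamic power, using P proportional to V^2 * f."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Voltage only: 1.0 V -> 0.8 V at a fixed clock frequency.
voltage_only_saving = 1.0 - dynamic_power_ratio(0.8, 1.0)

# DVFS-style: halve both voltage and frequency.
dvfs_ratio = dynamic_power_ratio(0.5, 1.0, f_new=0.5, f_old=1.0)

print(f"voltage-only saving: {voltage_only_saving:.0%}")
print(f"halving V and f leaves {dvfs_ratio:.3f} of the original dynamic power")
```

Dropping the supply alone saves 36%. Halving both voltage and frequency leaves one eighth of the original dynamic power, at the cost of half the throughput.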

| Technique | Targets | Typical Savings | Area Overhead | When to Use |
|---|---|---|---|---|
| Multi-Vt | Leakage | 30–50% leakage | 0% (cell swap) | Always, as a first step |
| Clock Gating | Dynamic | 20–40% dynamic | 1–2% (ICG cells) | Any block with idle cycles |
| Power Gating | Leakage + Dynamic | 90–99% block power | 2–5% (switches) | Blocks idle >1 ms |
| Voltage Scaling | Dynamic (P∝V²) | Up to 50% dynamic | 0% (voltage rail) | Throughput-flexible blocks |
| Operand Isolation | Dynamic | 5–15% dynamic | <1% | Datapaths with enable signals |

2. Multi-Threshold Voltage (Multi-Vt) Cell Selection

Standard cell libraries at 28nm and below typically ship three or more Vt flavors: HVT (high threshold voltage), SVT (standard), and LVT (low threshold voltage). The names describe the MOSFET threshold voltage, which controls the trade-off between switching speed and leakage current.

Subthreshold leakage current:
  I_sub = I₀ × exp(-(Vt - Vgs) / (n × VT))
    Vt = threshold voltage
    VT = thermal voltage (kT/q ≈ 26 mV at 300K)
    n  = subthreshold slope factor (~1.2–1.5)

HVT cells: Vt ↑ → I_sub ↓↓ → fast: NO, leakage: LOW
SVT cells: balanced
LVT cells: Vt ↓ → I_sub ↑↑ → fast: YES, leakage: HIGH

Typical leakage ratio at 28nm: LVT : SVT : HVT = 10 : 3 : 1
→ replacing an LVT cell with its HVT equivalent reduces that cell's leakage by about 10×
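The exponential in the I_sub expression means a fairly small Vt shift buys a large leakage reduction. The sketch below evaluates that relationship in Python, assuming n = 1.4 (inside the 1.2–1.5 range quoted above) and a fixed Vgs. Both function names are illustrative.

```python
import math

THERMAL_VOLTAGE = 0.026  # kT/q at 300 K, in volts


def leakage_ratio(delta_vt, n=1.4, v_thermal=THERMAL_VOLTAGE):
    """Factor by which I_sub drops when Vt rises by delta_vt volts (Vgs fixed)."""
    return math.exp(delta_vt / (n * v_thermal))


def delta_vt_for_ratio(ratio, n=1.4, v_thermal=THERMAL_VOLTAGE):
    """Vt increase (volts) needed to cut I_sub by the given factor."""
    return n * v_thermal * math.log(ratio)


dvt = delta_vt_for_ratio(10.0)
print(f"Vt shift for 10x lower leakage: {dvt * 1000:.0f} mV")
```

Under these assumptions a 10× leakage gap, like the LVT-to-HVT ratio above, corresponds to a threshold difference of roughly 84 mV.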
[Figure: Multi-Vt Cell Assignment Strategy. Critical path: LVT cells (fast, high leakage), example slack -50ps, needs LVT. Near-critical: SVT cells (balanced), slack 50–200ps. Non-critical: HVT cells (low leakage, slow), 60–70% of all cells, slack >300ps.]
Fig 1: Multi-Vt assignment by slack. LVT on critical paths (slack < 50ps), SVT on near-critical, HVT everywhere else. 60–70% HVT coverage is a common target for 28nm/16nm designs.
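The slack-based rule in Fig 1 can be written as a tiny classifier. This is only a sketch of the decision rule, using the 50 ps and 200 ps boundaries from the figure. The cell names and slack values are hypothetical, and a real optimizer iterates because every swap changes the slack of neighbouring paths.

```python
def assign_vt(slack_ps, lvt_below_ps=50.0, svt_below_ps=200.0):
    """Pick a Vt flavor from the worst slack (in ps) through a cell."""
    if slack_ps < lvt_below_ps:
        return "LVT"   # critical: needs the fastest cells
    if slack_ps < svt_below_ps:
        return "SVT"   # near-critical: balanced cells
    return "HVT"       # plenty of slack: lowest leakage


# Hypothetical worst slack per cell instance, in picoseconds.
cell_slack = {
    "u_alu/add0": -50.0,
    "u_alu/mux3": 120.0,
    "u_cfg/reg7": 450.0,
    "u_cfg/buf2": 900.0,
}

assignment = {cell: assign_vt(slack) for cell, slack in cell_slack.items()}
hvt_share = sum(vt == "HVT" for vt in assignment.values()) / len(assignment)

print(assignment)
print(f"HVT coverage: {hvt_share:.0%}")
```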
Tcl — Innovus multi-Vt leakage optimization
    # Illustrative flow: option and command names vary across Innovus releases,
    # and the swap loop below uses Synopsys-style collection commands as pseudocode.

    # Set leakage optimization mode: swap LVT→HVT where timing allows
    setOptMode -leakagePowerEffort high
    setOptMode -powerEffort high
    setOptMode -holdTargetSlack 0.05   ;# 50ps hold margin

    # Process node and CPU count
    setDesignMode -process 28
    setMultiCpuUsage -localCpu 8

    # Report Vt distribution before optimization (rerun afterwards to compare)
    report_cell_usage -threshold_voltage_type > reports/vt_dist_before.rpt

    # Perform post-place leakage optimization
    optDesign -postPlace -leakagePower

    # Force HVT on non-critical LVT cells (slack headroom > 200ps; slack is in ns)
    foreach_in_collection cell [get_cells -filter "lib_cell.threshold_voltage_type == LVT"] {
        set path  [get_timing_paths -through $cell]   ;# worst path through the cell
        set slack [get_attribute $path slack]
        if {$slack > 0.2} {
            # Assumes the Vt flavor is encoded in the cell name (..._LVT → ..._HVT)
            regsub {LVT} [get_attribute $cell ref_name] {HVT} hvt_ref
            swap_cell $cell [get_lib_cells */$hvt_ref]
        }
    }

    # Post-route leakage optimization
    optDesign -postRoute -leakagePower -setupEffort medium
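To see what a given Vt mix is worth, the 10 : 3 : 1 leakage ratio quoted earlier can be turned into a rough estimate. The sketch below assumes every cell contributes equally (same size and type), which is a simplification. The 65/25/10 mix is a hypothetical example inside the 60–70% HVT target range.

```python
# Relative leakage per cell, from the 28nm ratio LVT : SVT : HVT = 10 : 3 : 1.
RELATIVE_LEAKAGE = {"LVT": 10.0, "SVT": 3.0, "HVT": 1.0}


def mix_leakage(mix):
    """Average relative leakage per cell for a Vt mix given as fractions."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("Vt fractions must sum to 1")
    return sum(RELATIVE_LEAKAGE[vt] * frac for vt, frac in mix.items())


all_lvt = mix_leakage({"LVT": 1.0})
# Hypothetical mix: 10% LVT, 25% SVT, 65% HVT.
mixed = mix_leakage({"LVT": 0.10, "SVT": 0.25, "HVT": 0.65})

print(f"leakage reduction vs all-LVT: {all_lvt / mixed:.1f}x")
```

For this assumed mix the estimate comes out at about 4.2× lower leakage than an all-LVT netlist. Note that the remaining 10% of LVT cells still account for a large share of what is left.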

3. Clock Gating — ICG Cells

Clock gating is usually the single most impactful dynamic power technique in digital design. Integrated Clock Gating (ICG) cells combine a latch and an AND gate in a single standard cell. The latch is transparent while the clock is low and holds the enable value while the clock is high, so the enable seen by the AND gate can never change during the high phase. That prevents glitches on the gated clock output, which is the most critical requirement for correct clock gating.

Why Latch-Based Gating (Not a Simple AND)?

Naive clock gating (WRONG, creates glitches):
  GCLK = CLK AND EN
  Problem: EN can change while CLK=1, causing a glitch on GCLK
  → spurious clock edge at the sink FFs → data corruption

ICG cell (CORRECT):
  EN_latched = latch(EN, transparent while CLK=0)
  GCLK = CLK AND EN_latched
  → EN passes through the latch only while CLK is LOW
  → EN_latched never changes while CLK is HIGH
  → GCLK carries only complete clock pulses, no glitches

Power saving with clock gating:
  Without CG: every FF is clocked every cycle → P_ff = α_full × C × V² × f
  With CG: idle FFs get no clock → P_ff = α_real × C × V² × f
  Saving = (α_full - α_real) / α_full × P_clock_tree
  Typical: 20–40% dynamic power reduction
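The saving formula above is easy to evaluate for a specific block. The Python sketch below does exactly that. The 100 mW clock power and 30% real activity are made-up inputs for illustration.

```python
def clock_gating_saving(p_clock_tree, alpha_full, alpha_real):
    """Power saved by gating: (alpha_full - alpha_real) / alpha_full * P_clock_tree."""
    if not 0.0 <= alpha_real <= alpha_full:
        raise ValueError("expected 0 <= alpha_real <= alpha_full")
    return (alpha_full - alpha_real) / alpha_full * p_clock_tree


# Hypothetical block: 100 mW of clock power, registers need a clock 30% of cycles.
saved_mw = clock_gating_saving(p_clock_tree=100.0, alpha_full=1.0, alpha_real=0.3)
print(f"saved: {saved_mw:.0f} mW")
```

With these assumed numbers gating removes about 70 mW of clock power in that block. How much that is as a share of total dynamic power depends on how large the clock network is relative to the rest of the design.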
[Figure: ICG cell (Integrated Clock Gating). CLK and EN feed a D-latch that is transparent while CLK is low; its output EN_latch is ANDed with CLK to produce GCLK, which drives the FF banks.]
Key constraint: the ICG must be placed upstream in the clock tree. Setup check on EN: EN must be stable T_setup before the CLK rising edge, when the latch closes. Typical: 16 FFs per ICG cell. Minimum gating granularity: 4 FFs (below 4, the ICG overhead exceeds the savings).
Fig 2: ICG cell internal structure. The D-latch passes EN while CLK is low and holds it while CLK is high, preventing glitches. The AND gate lets clock pulses through to GCLK only when EN_latched is high.
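The difference between AND gating and latch-based gating is easiest to see on sampled waveforms. The following Python sketch is a toy discrete-time model, not a timing-accurate simulation. It drives both schemes with an enable that changes in the middle of a clock-high phase and then measures the width of every output pulse.

```python
def clock_wave(cycles, half_period=4):
    """Square wave as a list of 0/1 samples: low half first, then high half."""
    return ([0] * half_period + [1] * half_period) * cycles


def and_gate(clk, en):
    """Naive gating: GCLK = CLK AND EN."""
    return [c & e for c, e in zip(clk, en)]


def icg(clk, en):
    """Latch-based gating: latch is transparent while CLK=0, holds while CLK=1."""
    out, en_latched = [], 0
    for c, e in zip(clk, en):
        if c == 0:
            en_latched = e          # transparent phase
        out.append(c & en_latched)  # latch value is frozen while c == 1
    return out


def pulse_widths(wave):
    """Lengths of each run of consecutive 1 samples."""
    widths, run = [], 0
    for sample in wave + [0]:       # trailing 0 flushes the last run
        if sample:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths


clk = clock_wave(cycles=3)
# EN rises mid-way through the first high phase and falls mid-way through the third.
en = [0] * 6 + [1] * 16 + [0] * 2

print("AND widths:", pulse_widths(and_gate(clk, en)))
print("ICG widths:", pulse_widths(icg(clk, en)))
```

The AND version produces pulses of widths 2, 4, and 2 samples: two truncated pulses that could clock a flip-flop unreliably. The latch-based version produces only full-width pulses of 4 samples.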
Verilog — ICG cell behavioral model
    // ICG cell behavioral model: use this in simulation only.
    // In implementation, use the foundry-provided ICG standard cell.
    module ICG (
      input  logic CLK,   // system clock
      input  logic EN,    // enable (active-high): 1 = let clock through
      input  logic TE,    // test enable (bypass gating during scan)
      output logic GCLK   // gated clock output
    );
      logic en_latch;

      // Low-transparent latch: follows EN|TE while CLK=0, holds while CLK=1
      always_latch begin
        if (!CLK) en_latch <= EN | TE;
      end

      assign GCLK = CLK & en_latch;
    endmodule

    // Usage: RTL clock gating pattern (synthesis tool auto-inserts ICG)
    module reg_bank_with_cg #(parameter W = 32) (
      input  logic         clk, rst_n,
      input  logic         we,   // write enable → becomes ICG enable
      input  logic [W-1:0] d,
      output logic [W-1:0] q
    );
      always_ff @(posedge clk or negedge rst_n)
        if (!rst_n)  q <= '0;
        else if (we) q <= d;   // 'if (we)' pattern → tool inserts ICG
    endmodule
Tcl — Innovus clock gating insertion and analysis
    # Illustrative commands: exact names and options depend on the tool and release.

    # Enable clock gating during synthesis (Design Compiler / Genus)
    set_clock_gating_style -sequential_cell latch \
        -minimum_bitwidth 4 \
        -control_point before \
        -control_signal scan_enable

    # In Innovus: verify clock gating is preserved post-CTS
    report_clock_gating > reports/clock_gating.rpt

    # Check ICG setup timing: EN must arrive T_setup before the CLK rising edge
    check_timing -type clock_gating

    # Power analysis with clock gating enabled
    set_activity -toggle_rate 0.25 [get_nets *]        ;# 25% toggle rate default
    set_activity -toggle_rate 0.0  [get_nets *GCLK*]   ;# gated = 0 when disabled
    report_power -pg_pin_voltage_drop > reports/dynamic_power_post_cg.rpt

4. Power Gating (MTCMOS)

Power gating takes low-power design one step further: instead of stopping the clock, it cuts the power supply to an entire logic block. MTCMOS (Multi-Threshold CMOS) header (PMOS) or footer (NMOS) switch cells are placed in the power rail between VDD/VSS and the virtual VVDD/VVSS that supply the standard cells. When the block is idle, the switches open and leakage through the block falls to the small residual current of the high-Vt switch transistors.

Header vs Footer Switch

| Property | Header Switch (PMOS) | Footer Switch (NMOS) |
|---|---|---|
| Controls | VVDD (virtual VDD) | VVSS (virtual VSS) |
| Transistor type | High-Vt PMOS | High-Vt NMOS |
| Drive strength | Weaker (PMOS ≈ 0.5× NMOS) | Stronger (NMOS preferred) |
| Area overhead | Higher (more cells needed) | Lower |
| Control signal | Active-low (SLEEP_N=0 → OFF) | Active-high (SLEEP=1 → OFF) |
| Use when | Noise-sensitive blocks (real VSS reference kept) | Most logic blocks |
[Figure: MTCMOS power gating, footer switch architecture. From top to bottom: VDD (real supply rail, always powered); the logic block of standard cells and retention FFs (with shadow latch); VVSS (virtual ground, floats when the footer is open); a row of HVT footer switch cells controlled by SLEEP (open = power off); VSS (real ground rail, always connected).]
Fig 3: MTCMOS footer switch architecture. HVT footer cells (yellow row) sit between the standard cells and real VSS. When SLEEP=1, all footer switches open and static current through the block drops to near zero. Retention FFs preserve state across power-off.
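Power gating only pays off if the block stays off long enough to recover the energy spent entering and leaving sleep, which is where rules of thumb such as "idle >1 ms" come from. The sketch below estimates that break-even time with a simple energy balance. The formula is a common first-order approximation, and all numbers are hypothetical.

```python
def break_even_idle_s(overhead_energy_j, leak_on_w, leak_gated_w):
    """Idle time above which gating saves energy: E_overhead / (P_on - P_gated)."""
    saved_power = leak_on_w - leak_gated_w
    if saved_power <= 0:
        raise ValueError("gating must reduce leakage power")
    return overhead_energy_j / saved_power


# Hypothetical block: 20 mW leakage when on, 0.2 mW when gated (99% cut),
# and 15 uJ spent per sleep/wake cycle on save, restore, and rail recharge.
t_be = break_even_idle_s(overhead_energy_j=15e-6, leak_on_w=20e-3, leak_gated_w=0.2e-3)
print(f"break-even idle time: {t_be * 1e3:.2f} ms")
```

For these assumed values the break-even point is about 0.76 ms, so idle periods shorter than that are better served by clock gating alone.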
UPF — power gating domain definition
    # UPF 2.1: power gating domain for the DSP accelerator block.
    # PD_TOP, the VDD/VSS nets, and the supply set PD_DSP_supply are assumed
    # to be defined elsewhere in the power intent.
    create_power_domain PD_DSP \
        -elements {u_dsp_core} \
        -supply {primary PD_DSP_supply}

    # Switched (virtual) ground for the footer-gated domain
    create_supply_net VVSS_DSP -domain PD_DSP
    create_supply_net VSS      -domain PD_TOP

    # Footer power switch (NMOS, high-Vt): connects VVSS_DSP to VSS when not asleep
    create_power_switch SW_DSP \
        -domain PD_DSP \
        -output_supply_port {vout VVSS_DSP} \
        -input_supply_port  {vin VSS} \
        -control_port       {sleep SLEEP_DSP} \
        -on_state           {on_state vin {!sleep}} \
        -off_state          {off_state {sleep}}

    # Isolation cells: must be powered from the always-on supply
    set_isolation ISO_DSP \
        -domain PD_DSP \
        -isolation_power_net VDD \
        -isolation_ground_net VSS \
        -clamp_value 0 \
        -applies_to outputs
    set_isolation_control ISO_DSP -domain PD_DSP \
        -isolation_signal SLEEP_DSP \
        -isolation_sense high \
        -location parent

    # Retention flip-flops for state save/restore
    set_retention RET_DSP \
        -domain PD_DSP \
        -retention_power_net VDD \
        -retention_ground_net VSS \
        -save_signal {SAVE_DSP high} \
        -restore_signal {RESTORE_DSP high}

5. Retention Flip-Flops — Saving State Across Power-Off

When a power domain is gated off, all flip-flop state is lost — the virtual supply collapses. Retention flip-flops solve this by embedding a small "shadow latch" inside each FF that is supplied from the always-on VDD rail. Before gating off, a SAVE pulse copies the master latch state into the shadow latch. After wake-up, a RESTORE pulse copies it back into the master latch.

Retention FF timing sequence:

Normal operation: SAVE=0, RESTORE=0, FF behaves as a standard FF

Power-off sequence:
  1. Assert SAVE=1 → shadow latch captures current state
  2. Wait T_save (typically 2–3 cycles), then deassert SAVE=0
  3. Assert SLEEP=1 → footer switches open, VVSS floats
  4. Block is OFF → all standard logic off, shadow latch holds state on VDD

Wake-up sequence:
  1. Deassert SLEEP=0 → footer switches close, VVSS is pulled back to VSS
  2. Wait T_wakeup → 1–100 µs depending on block capacitance
  3. Assert RESTORE=1 → master latch restores state from shadow
  4. Deassert RESTORE=0 → FF returns to normal clocked operation

Retention FF overhead vs standard FF:
  Area: ~1.5–2× larger (extra shadow latch transistors)
  Power: ~10% more leakage on always-on domain (shadow latch static)
  Use: only on state-holding FFs that must survive power-off (not on datapath pipeline registers, which can reset on wake)
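The ordering rules in this sequence are the kind of thing a power-management testbench checks. Below is a minimal Python sketch of such a checker. The event names are invented for illustration, and the model tracks only ordering (save before sleep, restore only after the rail has recovered), not actual timing.

```python
def check_retention_trace(events):
    """Check a list of event names against the save/sleep/wake/restore ordering.

    Returns a list of human-readable violations (empty if the trace is legal).
    """
    errors = []
    saved = False     # shadow latch holds a valid copy
    asleep = False    # power switches are open
    settled = True    # virtual rail has had time to recover
    for i, ev in enumerate(events):
        if ev == "SAVE":
            if asleep:
                errors.append(f"{i}: SAVE while powered off")
            saved = True
        elif ev == "SLEEP_ON":
            if not saved:
                errors.append(f"{i}: SLEEP asserted before SAVE")
            asleep, settled = True, False
        elif ev == "SLEEP_OFF":
            asleep = False
        elif ev == "WAIT_WAKEUP":
            if not asleep:
                settled = True
        elif ev == "RESTORE":
            if asleep or not settled:
                errors.append(f"{i}: RESTORE before the rail has recovered")
            saved = False
        else:
            errors.append(f"{i}: unknown event {ev!r}")
    return errors


good = ["SAVE", "SLEEP_ON", "SLEEP_OFF", "WAIT_WAKEUP", "RESTORE"]
bad = ["SLEEP_ON", "SLEEP_OFF", "RESTORE"]

print(check_retention_trace(good))
print(check_retention_trace(bad))
```

The first trace passes with no violations. The second is flagged twice: once for sleeping without a save, and once for restoring before the wake-up wait.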

6. UPF Low-Power Design Flow

UPF (Unified Power Format), standardised as IEEE 1801, is the industry-standard language for describing power intent separately from the RTL. The UPF file specifies power domains, supply voltages, isolation cells, level shifters, retention elements, and power state tables. It travels alongside the netlist through synthesis, place-and-route, and verification.
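A power state table is essentially a list of legal combinations of domain states. The Python sketch below models one as a plain dictionary to show how tools use it. The domain names, state names, and rows are hypothetical, and a real PST is written in UPF rather than Python.

```python
# Hypothetical domains and legal states; a real PST comes from the UPF file.
LEGAL_STATES = {
    "run":        {"CPU": "ON", "DSP": "ON",  "GPU": "ON"},
    "dsp_idle":   {"CPU": "ON", "DSP": "OFF", "GPU": "ON"},
    "light_load": {"CPU": "ON", "DSP": "OFF", "GPU": "RETENTION"},
}


def is_legal(snapshot):
    """True if the per-domain snapshot matches one row of the table."""
    return any(snapshot == row for row in LEGAL_STATES.values())


def active_domains(state_name):
    """Domains that are fully ON, so their timing paths must close in this state."""
    return sorted(d for d, s in LEGAL_STATES[state_name].items() if s == "ON")


print(is_legal({"CPU": "ON", "DSP": "OFF", "GPU": "RETENTION"}))
print(is_legal({"CPU": "OFF", "DSP": "ON", "GPU": "ON"}))
print(active_domains("light_load"))
```

Verification tools use the same idea in two directions: they flag any simulated state that matches no row, and they skip timing or isolation checks for combinations the table rules out.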

Complete Low-Power Implementation Flow

Low-power physical design flow:

1. RTL + UPF                       ← designer supplies both
   ↓
2. Synthesis (Genus/DC)            ← reads UPF, inserts isolation/level-shifters
   ↓
3. Place & Route (Innovus/ICC2)    ← power domain floorplanning
   ↓ place_design (VDD rings per domain)
   ↓ add_power_switches (footer/header cells)
   ↓ route_design
   ↓
4. Power analysis (Voltus/PTA)     ← dynamic + static IR drop per domain
   ↓
5. LVS (Calibre)                   ← verify power intent matches netlist
   ↓
6. Low-power simulation (UVM)      ← verify save/restore correctness
   ↓
7. Tapeout
Tcl — Innovus multi-voltage / power gating flow
    # Illustrative script: it mixes legacy and Stylus-style command names,
    # so check each command against your Innovus release.

    # Load power intent along with the netlist
    read_mmmc mmmc_lowpower.view   ;# includes multi-VDD corners
    read_physical -lef {tech.lef cells_hvt.lef cells_lvt.lef switch_cells.lef}
    read_netlist gate_level_iso.v
    read_upf power_intent.upf

    # Floorplan: carve out power domain regions
    floorPlan -site unit -ar 0.6 -coreSW 5 5 5 5
    addPowerDomain -name PD_DSP -box {200 200 600 500}

    # Add power rings for each domain (footer gating: real VDD, switched VVSS_DSP)
    addRing -nets {VDD VSS} -type core_rings -width 8 -spacing 2
    addRing -nets {VDD VVSS_DSP} -type block_rings -width 5 -spacing 1.5 \
        -around PD_DSP

    # Insert power switches using the footer cell from the switch library
    add_power_switches -domain PD_DSP \
        -switch_cell MTCMOS_FOOTER_HVT \
        -control_port sleep \
        -acknowledge_port ack

    # Run placement respecting power domains
    place_design -concurrent_placement
    setOptMode -powerGating true
    optDesign -postPlace -power

    # Verify UPF intent after P&R
    check_pg_connectivity -error_view
    verify_power_domain -upf power_intent.upf

7. Low-Power Signoff Checklist

| Check | Tool | What to Verify | Pass Criterion |
|---|---|---|---|
| Static IR Drop | Voltus / RedHawk | Average voltage drop on each domain | <5% of VDD per domain |
| Dynamic IR Drop | Voltus (vector-based) | Worst-case voltage droop on wake-up | <10% VDD at worst corner |
| Isolation cell check | Conformal LP / PA | All cross-domain outputs isolated | Zero missing isolation cells |
| Level shifter check | Conformal LP | All cross-domain signals have correct shifter | Zero missing level shifters |
| UPF vs netlist LVS | Calibre PV UPF | Power switch connectivity matches UPF | Zero LVS violations |
| Retention FF coverage | Conformal LP | All state-critical FFs are retention type | 100% coverage on specified FFs |
| Power state table STA | PrimeTime PX | Timing paths valid in each power state | WNS > 0 in all active states |
| Save/restore sim | VCS / Questa | State preserved across power-off event | All retention FFs restore correctly |
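The two IR-drop criteria in the checklist are simple threshold checks, and scripting them keeps the pass/fail decision consistent across domains. The sketch below applies the 5% static and 10% dynamic limits to hypothetical per-domain results. The data layout is our own and is not a Voltus or RedHawk report format.

```python
STATIC_LIMIT = 0.05    # static drop must stay below 5% of VDD
DYNAMIC_LIMIT = 0.10   # dynamic droop must stay below 10% of VDD


def ir_drop_ok(vdd, v_min_seen, limit_fraction):
    """True if the worst observed rail voltage stays within the allowed drop."""
    return (vdd - v_min_seen) < limit_fraction * vdd


# Hypothetical results per domain: (nominal VDD, static minimum, dynamic minimum), in volts.
results = {
    "PD_TOP": (0.90, 0.872, 0.830),
    "PD_DSP": (0.90, 0.850, 0.800),
}

report = {
    domain: (ir_drop_ok(vdd, v_static, STATIC_LIMIT),
             ir_drop_ok(vdd, v_dynamic, DYNAMIC_LIMIT))
    for domain, (vdd, v_static, v_dynamic) in results.items()
}

for domain, (static_ok, dynamic_ok) in report.items():
    print(f"{domain}: static {'PASS' if static_ok else 'FAIL'}, "
          f"dynamic {'PASS' if dynamic_ok else 'FAIL'}")
```

In this made-up data PD_TOP passes both checks, while PD_DSP fails both with a 50 mV static drop and a 100 mV dynamic droop against limits of 45 mV and 90 mV.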

Real-World Power Numbers — Apple A17 (3nm)

The Apple A17 Pro implements at least 8 independently power-gated domains, including the CPU performance cores, efficiency cores, GPU, ANE (Neural Engine), memory controller, and I/O. Approximate figures: total chip leakage with all blocks gated off (deep sleep) is under 5 mW, while active power peaks at 8–12 W under GPU-heavy load. Clock gating contributes about 35% of active-mode power savings. Multi-Vt with 80% HVT coverage reduces leakage by an estimated 4× versus an all-LVT design.

8. Interview Q&A — Low Power Design

| # | Question | Answer Points |
|---|---|---|
| 1 | Why do we use a latch in an ICG cell instead of a simple AND gate? | A simple AND of CLK and EN creates glitches when EN changes while CLK=1: the output can produce a narrow pulse that causes spurious FF transitions. The latch is transparent only while CLK=0 and holds its value while CLK=1, so EN_latched is stable for the whole high phase, guaranteeing a glitch-free gated clock. |
| 2 | What is the difference between isolation cells and level shifters? | Isolation cells prevent a powered-down domain's outputs from driving undefined (floating) values into the always-on domain: they clamp to a known logic value (0 or 1) when the source domain is off. Level shifters translate signal voltage from one supply domain to another (e.g. 0.7V→1.0V) when both domains are powered. Both are inserted at domain crossings but serve different purposes. |
| 3 | What happens to flip-flop state when a power domain is gated off? | All state stored in standard flip-flops is lost because their supply collapses. Retention flip-flops solve this: they have a shadow latch always supplied from the parent (always-on) VDD. A SAVE operation copies FF state into the shadow before power-off; a RESTORE copies it back on wake-up. Only critical state registers need retention FFs. Pipeline registers and dead-state FFs can simply reset. |
| 4 | What is a power state table (PST) in UPF? | A PST defines all legal combinations of power domain states (ON/OFF/RETENTION) in the chip and maps them to power supply voltages for each state. The STA tool uses the PST to know which timing paths are active in each state and apply appropriate voltage-corner analysis. Example: CPU=ON (1.0V), DSP=OFF, GPU=RETENTION → only CPU timing paths need to close in this state. |
| 5 | How do you choose between power gating and clock gating for a block? | Clock gating targets dynamic power and has near-zero overhead, so always use it. Power gating targets leakage and has significant overhead (wake-up latency 1–100 µs, area overhead 2–5%, software complexity for save/restore). Power gate a block only if it will be idle for >1 ms (so the power savings outweigh the wake-up cost) and it can tolerate the latency. For blocks that idle for <1 ms, clock gating is sufficient. |
| 6 | What is DVFS and how does it differ from power gating? | DVFS (Dynamic Voltage and Frequency Scaling) reduces both voltage and frequency together when full performance is not needed. P ∝ V²·f, so halving both cuts dynamic power by 8×. Power gating cuts power completely, but the block is completely non-functional. DVFS keeps the block operational at reduced throughput. In modern SoCs both are used: DVFS for active workload scaling, power gating for deep idle states. |

