Every milliwatt matters at advanced nodes. Learn the complete toolkit: multi-threshold voltage cell selection, integrated clock gating, MTCMOS power gating with retention, UPF power intent, and low-power signoff with Innovus and Voltus.
Before applying any low-power technique you need to understand the two fundamentally different components of CMOS power consumption. Getting the breakdown wrong means applying the wrong fix.
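The two components are dynamic (switching) power and static (leakage) power. The sketch below uses the usual first-order model, P_dyn = α·C·V²·f and P_leak = V·I_leak, to show how a breakdown might be computed. The function names and every numeric value are illustrative assumptions, not data from a real design.

```python
def dynamic_power(alpha, c_farads, vdd, f_hz):
    """First-order switching power in watts: alpha * C * V^2 * f."""
    return alpha * c_farads * vdd ** 2 * f_hz


def leakage_power(vdd, i_leak_amps):
    """Static power in watts: supply voltage times total leakage current."""
    return vdd * i_leak_amps


def breakdown(alpha, c_farads, vdd, f_hz, i_leak_amps):
    """Return both components plus the share of total power that is leakage."""
    p_dyn = dynamic_power(alpha, c_farads, vdd, f_hz)
    p_leak = leakage_power(vdd, i_leak_amps)
    return {
        "dynamic_w": p_dyn,
        "leakage_w": p_leak,
        "leakage_fraction": p_leak / (p_dyn + p_leak),
    }


# Hypothetical block: 15% activity, 2 nF switched capacitance,
# 0.9 V supply, 1 GHz clock, 50 mA total leakage current.
report = breakdown(alpha=0.15, c_farads=2e-9, vdd=0.9, f_hz=1e9, i_leak_amps=0.05)
print(report)
```

With these assumed inputs the block is dominated by dynamic power, so clock gating would be the first technique to evaluate; a leakage-dominated block would point toward multi-Vt or power gating instead.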
Multi-Vt: targets leakage. Uses HVT cells (slow, low leakage) on non-critical paths. 30–50% leakage reduction.
Clock gating: targets dynamic power. ICG cells block the clock to idle FFs. 20–40% dynamic power reduction.
Power gating: targets leakage in idle blocks. MTCMOS switches cut VDD. Near-zero standby power.
Voltage scaling: P ∝ V². Reducing VDD from 1.0 V to 0.8 V cuts dynamic power by 36%. Usually combined with frequency scaling (DVFS).
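The 36% figure follows directly from the quadratic dependence of dynamic power on supply voltage. A minimal sketch of the arithmetic, assuming the first-order relation P ∝ V²·f (the function names are illustrative):

```python
def dynamic_power_ratio(v_old, v_new, f_old=1.0, f_new=1.0):
    """Ratio of new to old dynamic power under P ∝ V^2 * f."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


def savings_percent(v_old, v_new, f_old=1.0, f_new=1.0):
    """Percentage of dynamic power saved by the voltage/frequency change."""
    return (1.0 - dynamic_power_ratio(v_old, v_new, f_old, f_new)) * 100.0


# Voltage-only scaling from 1.0 V to 0.8 V at constant frequency.
print(round(savings_percent(1.0, 0.8), 1))  # 36.0
```

Passing a lower `f_new` as well shows why scaling frequency together with voltage compounds the saving.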
| Technique | Targets | Typical Savings | Area Overhead | When to Use |
|---|---|---|---|---|
| Multi-Vt | Leakage | 30–50% leakage | 0% (cell swap) | Always — first step |
| Clock Gating | Dynamic | 20–40% dynamic | 1–2% (ICG cells) | Any block with idle cycles |
| Power Gating | Leakage + Dynamic | 90–99% block power | 2–5% switches | Blocks idle >1 ms |
| Voltage Scaling | Dynamic (P∝V²) | Up to 50% dynamic | 0% (voltage rail) | Throughput-flexible blocks |
| Operand Isolation | Dynamic | 5–15% dynamic | <1% | Datapaths with enable signals |
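
The "blocks idle >1 ms" guideline for power gating can be read as a break-even condition: gating only pays off when the leakage energy saved during the idle period exceeds the energy spent entering and leaving sleep. The sketch below illustrates that calculation. The overhead energy, leakage power, and residual fraction are hypothetical values chosen for illustration, and the function names are not from any tool.

```python
def break_even_idle_s(e_overhead_j, p_leak_on_w, residual_fraction=0.01):
    """Idle time (seconds) at which saved leakage energy equals the sleep/wake overhead.

    e_overhead_j      -- assumed energy cost of one sleep/wake cycle
                         (switch control, rail recharge, save/restore)
    p_leak_on_w       -- leakage power of the block while it stays powered
    residual_fraction -- fraction of leakage that remains when gated off
    """
    saved_w = p_leak_on_w * (1.0 - residual_fraction)
    if saved_w <= 0:
        raise ValueError("power gating saves nothing for this block")
    return e_overhead_j / saved_w


def worth_gating(idle_s, e_overhead_j, p_leak_on_w, residual_fraction=0.01):
    """True when the expected idle period is longer than the break-even time."""
    return idle_s > break_even_idle_s(e_overhead_j, p_leak_on_w, residual_fraction)


# Hypothetical block: 20 uJ per sleep/wake cycle, 20 mW leakage when powered.
t_be = break_even_idle_s(20e-6, 20e-3)
print(f"break-even idle time: {t_be * 1e3:.2f} ms")
print(worth_gating(5e-3, 20e-6, 20e-3))    # 5 ms idle period
print(worth_gating(0.2e-3, 20e-6, 20e-3))  # 0.2 ms idle period
```

With these assumed numbers the break-even point lands near 1 ms; a block with a different overhead or leakage profile would have a different threshold.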
Every standard cell library at 28nm and below ships three or more Vt flavors: HVT (high threshold voltage), SVT (standard), and LVT (low threshold voltage). The names describe the MOSFET threshold voltage, which controls the trade-off between drive strength and leakage current.
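To make the trade-off concrete, here is a simplified sketch of slack-driven Vt assignment: each cell gets the highest-Vt flavor whose extra delay still leaves the required slack margin. The leakage and delay numbers, the cell names, and the assumption that each cell's slack is independent of its neighbours are all illustrative. A real optimizer re-times shared paths after every swap.

```python
# Hypothetical per-cell characteristics relative to the LVT flavor.
VT_FLAVORS = {
    "LVT": {"leak_nw": 10.0, "extra_delay_ps": 0},
    "SVT": {"leak_nw": 3.0, "extra_delay_ps": 15},
    "HVT": {"leak_nw": 1.0, "extra_delay_ps": 40},
}


def assign_vt(cell_slack_ps, margin_ps=0):
    """Pick the highest-Vt flavor per cell that keeps slack >= margin_ps.

    cell_slack_ps maps cell name -> slack in ps when the cell is LVT.
    """
    assignment = {}
    for name, slack in cell_slack_ps.items():
        choice = "LVT"  # fall back to the fastest flavor
        for vt in ("HVT", "SVT"):  # try the lowest-leakage option first
            if slack - VT_FLAVORS[vt]["extra_delay_ps"] >= margin_ps:
                choice = vt
                break
        assignment[name] = choice
    return assignment


def total_leakage_nw(assignment):
    """Sum the assumed leakage of every cell in the assignment."""
    return sum(VT_FLAVORS[vt]["leak_nw"] for vt in assignment.values())


# Hypothetical cells with their LVT slack in picoseconds.
cells = {"u_alu/add0": 5, "u_ctrl/dec3": 120, "u_regs/mux7": 30}
assignment = assign_vt(cells, margin_ps=10)
print(assignment)
print(total_leakage_nw(assignment), "nW versus", 10.0 * len(cells), "nW all-LVT")
```

The critical cell stays LVT, the cell with moderate slack moves to SVT, and the cell with large slack moves to HVT, which is the pattern a leakage optimizer aims for.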
# Set leakage optimization mode — swap LVT→HVT where timing allows
setOptMode -leakagePowerEffort high
setOptMode -powerEffort high
setOptMode -holdTargetSlack 0.05 ;# 50ps hold margin
# Set the process node and the number of local CPUs for the run
setDesignMode -process 28
setMultiCpuUsage -localCpu 8
# Perform post-place leakage optimization
optDesign -postPlace -leakagePower
# Report the Vt distribution before the manual HVT swap below
report_cell_usage -threshold_voltage_type > reports/vt_dist_before.rpt
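# Force HVT on non-critical cells (slack headroom > 200 ps, assuming ns time units)
# Assumes the HVT library name contains "HVT" and reuses the LVT cell names;
# adjust the get_lib_cells pattern to match your library's naming scheme
foreach_in_collection cell [get_cells -filter "lib_cell.threshold_voltage_type == LVT"] {
  set slack [get_attribute [get_timing_paths -through $cell] slack]
  # Skip cells with no constrained path (empty slack) before comparing
  if {$slack ne "" && $slack > 0.2} {
    swap_cell $cell [get_lib_cells *HVT*[get_attribute $cell ref_name]]
  }
}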
# Post-route leakage optimization
optDesign -postRoute -leakagePower -setupEffort medium

Clock gating is the single most impactful dynamic power technique in digital design. Integrated Clock Gating (ICG) cells combine a latch and an AND gate in a single standard cell. The latch is transparent while the clock is low and holds the enable while the clock is high, so the enable seen by the AND gate can never change during the high phase. This prevents glitches on the gated clock output, which is the most critical requirement for correct clock gating.
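The glitch problem is easy to see in a sampled-waveform model. The sketch below compares a plain AND of clock and enable with a latch-based gate when the enable rises in the middle of a clock-high phase. It is a behavioural illustration only: the sample resolution and waveforms are assumptions, and real cells have setup and hold requirements that this model ignores.

```python
def and_gate_clock(clk, en):
    """Naive gating: gated clock is CLK AND EN, sample by sample."""
    return [c & e for c, e in zip(clk, en)]


def icg_clock(clk, en):
    """Latch-based gating: the latch follows EN only while CLK is low."""
    latch = 0
    gated = []
    for c, e in zip(clk, en):
        if c == 0:
            latch = e  # transparent phase
        gated.append(c & latch)  # latch holds its value while CLK is high
    return gated


def high_pulse_widths(wave):
    """Lengths (in samples) of each run of consecutive 1s."""
    widths, run = [], 0
    for v in wave:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Three clock cycles, 4 samples low then 4 samples high per cycle.
clk = ([0] * 4 + [1] * 4) * 3
# EN rises at sample 14, halfway through the second high phase.
en = [0] * 14 + [1] * 10

print(high_pulse_widths(and_gate_clock(clk, en)))  # [2, 4]
print(high_pulse_widths(icg_clock(clk, en)))       # [4]
```

The plain AND produces a half-width runt pulse followed by a full pulse, while the latch-based version passes only full-width pulses, starting with the first complete cycle after the enable rises.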
// ICG cell behavioral model — use this in simulation only
// In implementation, use the foundry-provided ICG standard cell
module ICG (
input logic CLK, // system clock
input logic EN, // enable (active-high): 1 = let clock through
input logic TE, // test enable (bypass gating during scan)
output logic GCLK // gated clock output
);
logic en_latch;
// Active-low latch: transparent while CLK=0, holds the enable while CLK=1
always_latch begin
if (!CLK) en_latch = EN | TE; // transparent when CLK=0
end
assign GCLK = CLK & en_latch;
endmodule
// Usage: RTL clock gating pattern (synthesis tool auto-inserts ICG)
module reg_bank_with_cg #(parameter W=32)(
input logic clk, rst_n,
input logic we, // write enable → becomes ICG enable
input logic [W-1:0] d,
output logic [W-1:0] q
);
always_ff @(posedge clk or negedge rst_n)
if (!rst_n) q <= '0;
else if (we) q <= d; // 'if(we)' pattern → tool inserts ICG
endmodule

# Enable clock gating during synthesis (Design Compiler syntax; Genus uses its own attribute-based settings)
set_clock_gating_style -sequential_cell latch \
-minimum_bitwidth 4 \
-control_point before \
-control_signal scan_enable
# In Innovus: verify clock gating is preserved post-CTS
report_clock_gating > reports/clock_gating.rpt
# Check ICG setup timing: EN must be stable T_setup before the CLK rising edge (when the latch closes)
check_timing -type clock_gating
# Power analysis with clock gating enabled
set_activity -toggle_rate 0.25 [get_nets *] ;# 25% toggle rate default
set_activity -toggle_rate 0.0 [get_nets *GCLK*] ;# gated = 0 when disabled
report_power -pg_pin_voltage_drop > reports/dynamic_power_post_cg.rpt

Power gating takes low-power design one step further: instead of stopping the clock, it cuts the power supply to an entire logic block. MTCMOS (Multi-Threshold CMOS) header (PMOS) or footer (NMOS) switch cells sit between the real VDD/VSS rails and the virtual VVDD/VVSS rails that supply the standard cells. When the block is idle, the switches turn off and leakage falls to the small residual current through the high-Vt switch transistors.
| Property | Header Switch (PMOS) | Footer Switch (NMOS) |
|---|---|---|
| Controls | VVDD (virtual VDD) | VVSS (virtual VSS) |
| Transistor type | High-Vt PMOS | High-Vt NMOS |
| Drive strength | Weaker (PMOS ≈ 0.5× NMOS) | Stronger (NMOS preferred) |
| Area overhead | Higher (more cells needed) | Lower |
| Control signal | Low at the PMOS gate turns the switch ON; high turns it OFF | High at the NMOS gate turns the switch ON; low turns it OFF |
| Use when | Noise-sensitive blocks (shared VSS stays intact) | Most logic blocks |
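
The drive-strength row matters because the number of switch cells is usually set by an IR-drop budget across the switches. A first-order sizing sketch: with n identical switches in parallel, the effective resistance is R_on / n, so n must be at least I_peak · R_on / ΔV. The current, on-resistance, and 2% budget below are hypothetical, and the model ignores rush current and power-grid resistance.

```python
import math


def switches_needed(i_peak_a, r_on_ohm, vdd, drop_budget_fraction):
    """Minimum number of parallel switches that keeps the drop within budget."""
    allowed_drop_v = vdd * drop_budget_fraction
    return math.ceil(i_peak_a * r_on_ohm / allowed_drop_v)


def switch_drop_v(i_peak_a, r_on_ohm, n_switches):
    """Voltage drop across n identical switches sharing the peak current."""
    return i_peak_a * r_on_ohm / n_switches


# Hypothetical block: 0.5 A peak current, 100 ohm per switch cell,
# 0.9 V supply, 2% of VDD allowed across the switches.
n = switches_needed(i_peak_a=0.5, r_on_ohm=100.0, vdd=0.9, drop_budget_fraction=0.02)
print(n, "switches,", round(switch_drop_v(0.5, 100.0, n) * 1e3, 2), "mV drop")
```

A weaker switch type (higher R_on per cell) raises the count proportionally, which is where the area difference between header and footer implementations comes from.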
# UPF 2.1 — Power gating domain for DSP accelerator block
create_power_domain PD_DSP \
-elements {u_dsp_core} \
-supply {primary PD_DSP_supply}
create_supply_net VVSS_DSP -domain PD_DSP
create_supply_net VSS -domain PD_TOP
# Footer power switch (NMOS, high-Vt)
# Supply and control ports need distinct names; the state expressions refer to the control port name
create_power_switch SW_DSP \
-domain PD_DSP \
-output_supply_port {vss_out VVSS_DSP} \
-input_supply_port {vss_in VSS} \
-control_port {sleep SLEEP_DSP} \
-on_state {on vss_in {!sleep}} \
-off_state {off {sleep}}
# Isolation cells — must be in always-on domain
set_isolation ISO_DSP \
-domain PD_DSP \
-isolation_power_net VDD \
-isolation_ground_net VSS \
-clamp_value 0 \
-applies_to outputs
set_isolation_control ISO_DSP -domain PD_DSP \
-isolation_signal SLEEP_DSP \
-isolation_sense high \
-location parent
# Retention flip-flops for state save/restore
set_retention RET_DSP \
-domain PD_DSP \
-retention_power_net VDD \
-retention_ground_net VSS \
-save_signal {SAVE_DSP high} \
-restore_signal {RESTORE_DSP high}

When a power domain is gated off, all flip-flop state is lost because the virtual supply collapses. Retention flip-flops solve this by embedding a small "shadow latch" inside each FF that is supplied from the always-on VDD rail. Before the domain is gated off, a SAVE pulse copies the master latch state into the shadow latch. After wake-up, a RESTORE pulse copies it back into the master latch.
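The save/restore behaviour can be captured in a toy model. The class below is an illustrative sketch, not a timing-accurate simulation: `None` stands in for an unknown value after power loss, and the class and method names are invented for this example.

```python
class RetentionFlop:
    """Toy model of a retention flip-flop with an always-on shadow latch."""

    def __init__(self):
        self.q = 0            # main state, powered from the switched (virtual) rail
        self.shadow = 0       # shadow latch, powered from the always-on rail
        self.powered = True

    def write(self, value):
        if not self.powered:
            raise RuntimeError("cannot write while the domain is powered off")
        self.q = value

    def save(self):
        # SAVE pulse: copy the main state into the shadow latch
        self.shadow = self.q

    def power_off(self):
        self.powered = False
        self.q = None         # main state is lost when the virtual supply collapses

    def power_on(self):
        self.powered = True   # q stays unknown until RESTORE or reset

    def restore(self):
        # RESTORE pulse: copy the shadow latch back into the main state
        if not self.powered:
            raise RuntimeError("cannot restore while the domain is powered off")
        self.q = self.shadow


def sleep_cycle(flop, use_retention):
    """Run one power-off/power-on cycle and return the flop's final state."""
    if use_retention:
        flop.save()
    flop.power_off()
    flop.power_on()
    if use_retention:
        flop.restore()
    return flop.q


kept = RetentionFlop()
kept.write(1)
lost = RetentionFlop()
lost.write(1)
print(sleep_cycle(kept, use_retention=True))   # 1
print(sleep_cycle(lost, use_retention=False))  # None
```

The second flop comes back in an unknown state, which is why registers without retention must be reset after wake-up.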
UPF (Unified Power Format), standardised as IEEE 1801, is the industry-standard language for describing power intent separately from the RTL. The UPF file specifies power domains, supply voltages, isolation cells, level shifters, retention elements, and power state tables. It travels alongside the netlist through synthesis, place-and-route, and verification.
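As a rough illustration of what the power intent implies at domain boundaries, the sketch below decides which special cells a signal crossing needs. The rules are deliberately simplified assumptions: isolation whenever the driving domain can be switched off, and a level shifter whenever the two domain voltages differ. The `Domain` class and the voltages are hypothetical; only the domain names echo the UPF example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    voltage: float      # nominal supply in volts (assumed values)
    switchable: bool    # True if the domain can be power-gated


def required_cells(src, dst):
    """Special cells needed on a signal driven from src into dst."""
    cells = []
    if src.name == dst.name:
        return cells  # no domain crossing, nothing to insert
    if src.switchable:
        # The driver can be off while the receiver is on: clamp the signal.
        cells.append("isolation")
    if abs(src.voltage - dst.voltage) > 1e-9:
        # Different supply levels: translate the signal swing.
        cells.append("level_shifter")
    return cells


pd_top = Domain("PD_TOP", voltage=1.0, switchable=False)
pd_dsp = Domain("PD_DSP", voltage=0.7, switchable=True)

print(required_cells(pd_dsp, pd_top))  # ['isolation', 'level_shifter']
print(required_cells(pd_top, pd_dsp))  # ['level_shifter']
```

Real low-power checkers also account for which domains are off together and for the direction of the voltage shift, which this sketch leaves out.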
# Load power intent along with the netlist
read_mmmc mmmc_lowpower.view ;# includes multi-VDD corners
read_physical -lef {tech.lef cells_hvt.lef cells_lvt.lef switch_cells.lef}
read_netlist gate_level_iso.v
read_upf power_intent.upf
# Floorplan: carve out power domain regions
floorPlan -site unit -ar 0.6 -coreSW 5 5 5 5
addPowerDomain -name PD_DSP -box {200 200 600 500}
# Add power rings for each domain
addRing -nets {VDD VSS} -type core_rings -width 8 -spacing 2
addRing -nets {VDD VVSS_DSP} -type block_rings -width 5 -spacing 1.5 \
-around PD_DSP
# Insert power switches using the footer cell named below (the cell name is library-specific)
add_power_switches -domain PD_DSP \
-switch_cell MTCMOS_FOOTER_HVT \
-control_port sleep \
-acknowledge_port ack
# Run placement respecting power domains
place_design -concurrent_placement
setOptMode -powerGating true
optDesign -postPlace -power
# Verify UPF intent after P&R
check_pg_connectivity -error_view
verify_power_domain -upf power_intent.upf

| Check | Tool | What to Verify | Pass Criterion |
|---|---|---|---|
| Static IR Drop | Voltus / RedHawk | Average voltage drop on each domain | <5% of VDD per domain |
| Dynamic IR Drop | Voltus (vector-based) | Worst-case voltage droop on wake-up | <10% VDD at worst corner |
| Isolation cell check | Conformal LP / PA | All cross-domain outputs isolated | Zero missing isolation cells |
| Level shifter check | Conformal LP | All cross-domain signals have correct shifter | Zero missing level shifters |
| UPF vs netlist LVS | Calibre PV UPF | Power switch connectivity matches UPF | Zero LVS violations |
| Retention FF coverage | Conformal LP | All state-critical FFs are retention type | 100% coverage on specified FFs |
| Power state table STA | PrimeTime | Timing paths valid in each power state | WNS ≥ 0 in all active states |
| Save/restore sim | VCS / Questa | State preserved across power-off event | All retention FFs restore correctly |
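
The two IR-drop rows reduce to simple threshold checks against the domain supply. A minimal sketch, using the 5% and 10% limits from the table; the function name and the measured values are hypothetical, and in practice the measurements would come from the IR-drop tool's report.

```python
# Pass criteria from the signoff table, as fractions of the domain VDD.
IR_DROP_LIMITS = {"static": 0.05, "dynamic": 0.10}


def ir_drop_passes(kind, vdd, measured_drop_v):
    """True when the measured drop is below the limit for this analysis type."""
    return measured_drop_v < IR_DROP_LIMITS[kind] * vdd


# Hypothetical measurements for a 0.9 V domain.
print(ir_drop_passes("static", 0.9, 0.040))   # limit is 45 mV
print(ir_drop_passes("static", 0.9, 0.050))
print(ir_drop_passes("dynamic", 0.9, 0.080))  # limit is 90 mV
```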
The Apple A17 Pro is estimated to implement at least 8 independently power-gated domains, including the CPU performance cores, efficiency cores, GPU, ANE (Apple Neural Engine), memory controller, and I/O. The figures that follow are rough estimates rather than published specifications. Total chip leakage with all blocks gated off (deep sleep): <5 mW. Active use: 8–12 W peak (GPU-heavy). Clock gating contributes ~35% of active-mode power savings. Multi-Vt with 80% HVT coverage reduces leakage by an estimated 4× versus an all-LVT design.
| # | Question | Answer Points |
|---|---|---|
| 1 | Why do we use a latch in an ICG cell instead of a simple AND gate? | A simple AND of CLK and EN glitches when EN changes while CLK=1: the output can produce a narrow pulse that causes spurious FF transitions. The latch is transparent only while CLK=0 and holds its value while CLK=1, so the latched enable cannot change during the high phase. EN only needs to settle before the rising edge of CLK, which guarantees a glitch-free gated clock. |
| 2 | What is the difference between isolation cells and level shifters? | Isolation cells prevent a powered-down domain's outputs from driving undefined (floating) values into the always-on domain — they clamp to a known logic value (0 or 1) when the source domain is off. Level shifters translate signal voltage from one supply domain to another (e.g. 0.7V→1.0V) when both domains are powered. Both are inserted at domain crossings but serve different purposes. |
| 3 | What happens to flip-flop state when a power domain is gated off? | All state stored in standard flip-flops is lost because their supply collapses. Retention flip-flops solve this: they have a shadow latch always supplied from the parent (always-on) VDD. A SAVE operation copies FF state into the shadow before power-off; a RESTORE copies it back on wake-up. Only critical state registers need retention FFs — pipeline registers and dead-state FFs can simply reset. |
| 4 | What is a power state table (PST) in UPF? | A PST defines all legal combinations of power domain states (ON/OFF/RETENTION) in the chip and maps them to power supply voltages for each state. The STA tool uses the PST to know which timing paths are active in each state and apply appropriate voltage-corner analysis. Example: CPU=ON (1.0V), DSP=OFF, GPU=RETENTION → only CPU timing paths need to close in this state. |
| 5 | How do you choose between power gating and clock gating for a block? | Clock gating targets dynamic power and has near-zero overhead — always use it. Power gating targets leakage and has significant overhead (wake-up latency 1–100 µs, area overhead 2–5%, software complexity for save/restore). Power gate a block only if it will be idle for >1 ms (so the power savings outweigh the wake-up cost) and it can tolerate the latency. For blocks that idle for <1 ms, clock gating is sufficient. |
| 6 | What is DVFS and how does it differ from power gating? | DVFS (Dynamic Voltage and Frequency Scaling) reduces both voltage and frequency together when full performance is not needed. Since P ∝ V²·f, halving both cuts dynamic power by 8× (¼ from voltage, ½ from frequency). Power gating removes the block's power entirely, but the block is non-functional while gated. DVFS keeps the block operational at reduced throughput. In modern SoCs both are used: DVFS for active workload scaling, power gating for deep idle states. |
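
The power state table from question 4 can be pictured as a lookup of legal domain-state combinations. The sketch below is an illustrative data model only: the state names and the extra rows are assumptions, and the `CPU_ONLY` row mirrors the example given in the answer.

```python
# Each row is one legal combination of domain states (hypothetical table).
PST = {
    "ALL_ON":   {"CPU": "ON", "DSP": "ON", "GPU": "ON"},
    "CPU_ONLY": {"CPU": "ON", "DSP": "OFF", "GPU": "RETENTION"},
    "SLEEP":    {"CPU": "RETENTION", "DSP": "OFF", "GPU": "OFF"},
}


def is_legal(combination):
    """True when the combination of domain states matches a row of the table."""
    return any(combination == row for row in PST.values())


def domains_to_time(state_name):
    """Domains whose timing paths must close in the given power state."""
    return sorted(d for d, s in PST[state_name].items() if s == "ON")


print(domains_to_time("CPU_ONLY"))                           # ['CPU']
print(is_legal({"CPU": "OFF", "DSP": "ON", "GPU": "ON"}))   # False
```

Any combination missing from the table is treated as illegal, which is how verification tools flag a power controller that can reach an unintended state.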