
Low Power RTL Design

Power is the primary constraint in modern VLSI — from mobile SoCs to data-center accelerators. Learn clock gating, power gating, operand isolation, multi-Vt strategies, and DVFS to build chips that sip microwatts instead of gulping watts.

1. The CMOS Power Equation

Every watt consumed in a CMOS chip traces back to one of two sources: dynamic power from switching and static power from leakage. RTL designers primarily control the dynamic component.

P_total = α · C · V_dd² · f + I_leak · V_dd

Dynamic (switching) + Static (leakage)

Activity Factor (α)

Fraction of clock cycles a node switches. The primary lever for RTL designers — reduce unnecessary transitions.

Capacitance (C)

Switched capacitance, set by the physical size of wires and transistors. Optimized by cell sizing and layout, not directly in RTL.

Voltage (V_dd²)

Quadratic relationship — halving voltage reduces dynamic power by 4×. DVFS exploits this directly.

Leakage (I_leak)

Sub-threshold and gate-oxide leakage. Exponentially sensitive to Vt — controlled via multi-Vt cell selection.
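
A worked instance of the equation may help calibrate intuition. The numbers below are illustrative assumptions (α = 0.1, C = 1 nF of total switched capacitance, V_dd = 1.0 V, f = 1 GHz, I_leak = 10 mA), not measurements from any real chip:

```latex
\begin{align*}
P_{\text{dyn}}    &= \alpha\, C\, V_{dd}^{2}\, f
                   = 0.1 \times 1\,\text{nF} \times (1.0\,\text{V})^{2} \times 1\,\text{GHz}
                   = 100\,\text{mW} \\
P_{\text{static}} &= I_{\text{leak}}\, V_{dd}
                   = 10\,\text{mA} \times 1.0\,\text{V}
                   = 10\,\text{mW} \\
P_{\text{total}}  &= P_{\text{dyn}} + P_{\text{static}} = 110\,\text{mW}
\end{align*}
```

Under the same assumptions, halving α to 0.05 cuts the dynamic term to 50 mW, while halving V_dd at the same frequency cuts it to 25 mW.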

2. Clock Gating

Clock gating is the highest-impact low-power technique available at RTL. By disabling the clock to flip-flops that don't need to update, you eliminate the toggle energy in the clock tree downstream of the gate and the internal switching energy of the flip-flops themselves. Well-implemented clock gating can reduce dynamic power by 20–40% in typical designs.

ICG Cells — the Right Way

Never gate the clock with a combinational AND gate in RTL: glitches on the enable signal create spurious clock edges and cause functional failures. Use a dedicated Integrated Clock Gating (ICG) cell instead. For rising-edge flops, its internal latch is transparent only while the clock is low, so the enable can change only during the low phase and the gated clock stays glitch-free.

✕ Wrong — glitch-prone
    // Never gate the clock this way
    assign gated_clk = clk & en;
    always_ff @(posedge gated_clk)
      data_reg <= next_data;
✓ Correct — ICG inference
    // Synthesis infers an ICG cell
    always_ff @(posedge clk)
      if (en) data_reg <= next_data;

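Tool tip: Most synthesis tools (Synopsys DC, Cadence Genus) automatically extract ICG cells from if (en) guarded always_ff blocks when clock-gating insertion is enabled, for example with compile_ultra -gate_clock in DC (configured through set_clock_gating_style) or the lp_insert_clock_gating attribute in Genus. Check the documentation for your tool version for the exact option.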
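
The latch-plus-AND structure inside an ICG can be sketched behaviorally. This is a simulation-only illustration: the module name icg_model and the test-enable port te are assumptions made for this sketch, and a real flow should use the library's ICG cell or let synthesis infer it.

```systemverilog
// Behavioral model of a latch-based ICG for rising-edge flops.
// Hypothetical module for illustration; not a library cell.
module icg_model (
  input  logic clk,   // free-running clock
  input  logic en,    // functional enable
  input  logic te,    // test (scan) enable, an assumed port
  output logic gclk   // gated clock
);
  logic en_latched;

  // The latch is transparent only while clk is low, so any change
  // (or glitch) on en during the high phase cannot reach the AND gate.
  always_latch
    if (!clk) en_latched <= en | te;

  assign gclk = clk & en_latched;
endmodule
```

Because the enable is frozen for the whole high phase, the enable logic has until the next rising edge to settle, which is the property the plain AND gate lacks.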

Hierarchical Clock Gating

Apply clock gating at multiple levels of hierarchy. A top-level gate can disable an entire subsystem; sub-module gates provide finer granularity. The effectiveness depends on the enable activity factor: an enable that is low 80% of the time removes roughly 80% of the clock-related switching power on those registers, minus the overhead of the ICG cell itself.

    // Module-level enable propagated hierarchically
    module dsp_core (
      input  logic        clk, rst_n, core_en,
      input  logic [15:0] data_in,
      output logic [31:0] result
    );
      logic [15:0] coeff_reg;
      logic [31:0] accum;

      // Coefficient update gated by core_en
      always_ff @(posedge clk)
        if (core_en) coeff_reg <= data_in;

      // Accumulator gated by the same core_en, with asynchronous reset
      always_ff @(posedge clk or negedge rst_n)
        if (!rst_n)       accum <= 32'h0;
        else if (core_en) accum <= accum + (coeff_reg * data_in);

      assign result = accum;
    endmodule

3. Power Gating

Clock gating stops switching, but leakage current continues as long as VDD is applied. Power gating inserts header (PMOS) or footer (NMOS) switch transistors between VDD/GND and the logic block, cutting off power to idle blocks. This eliminates dynamic power and nearly all leakage in the gated block, at the cost of power-up latency and design complexity.

| Aspect | Clock Gating | Power Gating |
|---|---|---|
| Power reduced | Dynamic only | Dynamic + leakage |
| Implementation | ICG cells in clock tree | Header/footer switch cells + power mesh |
| State retention | State preserved automatically | Requires retention flip-flops (balloon flops) |
| Power-up latency | 1 cycle | Hundreds of nanoseconds |
| UPF/CPF required | Optional | Required for multi-domain intent specification |
| Best for | Blocks idle for <10 cycles | Blocks idle for milliseconds or more |

UPF Power Intent

Power gating is specified via Unified Power Format (UPF) at RTL. The implementation of header cells, isolation, and retention is handled by synthesis tools reading the UPF.

    # UPF snippet: power gating a DSP block
    # Assumes the always-on nets VDD and VDD_RET, plus the isolation and
    # retention control signals, are defined elsewhere in the UPF.
    create_power_domain PD_DSP -elements {dsp_core}
    create_supply_net VDD_DSP -domain PD_DSP

    create_power_switch SW_DSP \
      -domain PD_DSP \
      -input_supply_port  {vin VDD} \
      -output_supply_port {vout VDD_DSP} \
      -control_port       {sleep_n pg_sleep_n} \
      -on_state           {on_state vin {sleep_n}} \
      -off_state          {off_state {!sleep_n}}

    set_isolation ISO_DSP \
      -domain PD_DSP -isolation_power_net VDD \
      -clamp_value 0 -applies_to outputs

    set_retention RET_DSP \
      -domain PD_DSP -retention_power_net VDD_RET
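
The UPF only declares the switch, isolation, and retention; something in an always-on domain still has to drive them in a safe order. The sketch below shows one plausible ordering (isolate, save, switch off, then switch on, restore, release isolation). Apart from pg_sleep_n, every name here (pg_sequencer, sleep_req, pwr_ack, iso_en, ret_save, ret_restore) is hypothetical, and real designs typically add programmable delays between steps.

```systemverilog
// Minimal power-gating sequencer sketch; lives in an always-on domain.
// All port names except pg_sleep_n are assumptions for illustration.
module pg_sequencer (
  input  logic clk, rst_n,
  input  logic sleep_req,    // 1 = request power-down, 0 = request wake
  input  logic pwr_ack,      // 1 = switched rail has settled (assumed input)
  output logic iso_en,       // clamp outputs of the gated domain
  output logic ret_save,     // pulse: save state into retention flops
  output logic ret_restore,  // pulse: restore state after power-up
  output logic pg_sleep_n    // 0 = power switch open (domain off)
);
  typedef enum logic [2:0] {
    S_ON, S_ISOLATE, S_SAVE, S_OFF, S_WAKE, S_RESTORE
  } pg_state_t;
  pg_state_t state;

  always_ff @(posedge clk or negedge rst_n)
    if (!rst_n) state <= S_ON;
    else
      case (state)
        S_ON:      if (sleep_req)  state <= S_ISOLATE;
        S_ISOLATE:                 state <= S_SAVE;
        S_SAVE:                    state <= S_OFF;
        S_OFF:     if (!sleep_req) state <= S_WAKE;
        S_WAKE:    if (pwr_ack)    state <= S_RESTORE;
        S_RESTORE:                 state <= S_ON;
        default:                   state <= S_ON;
      endcase

  assign iso_en      = (state != S_ON);      // isolate before off, release last
  assign ret_save    = (state == S_SAVE);
  assign ret_restore = (state == S_RESTORE);
  assign pg_sleep_n  = (state != S_OFF);     // switch is open only in S_OFF
endmodule
```

Isolation is asserted first and released last, so the rest of the chip never sees floating outputs from the gated block.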

4. Operand Isolation

Combinational arithmetic blocks (multipliers, MAC units, dividers) have deep logic cones that glitch heavily as inputs propagate through. If the output of such a block is not consumed in a given cycle (e.g., a multiply-accumulate is idle), its inputs may still toggle, wasting power through pointless switching across thousands of gates.

Operand isolation places a simple AND/OR gate on the datapath inputs to hold them constant when the block is idle. This is especially important for wide multipliers where even a single bit toggle causes a ripple through the partial-product tree.

    module mac_unit #(parameter W = 16) (
      input  logic           clk, rst_n, valid,
      input  logic [W-1:0]   a, b,
      output logic [2*W-1:0] accum
    );
      // Operand isolation: hold multiplier inputs at 0 when not valid
      logic [W-1:0] a_iso, b_iso;
      assign a_iso = a & {W{valid}};  // AND-gate isolation
      assign b_iso = b & {W{valid}};

      // Inputs are static while idle, so the multiplier tree stays quiet
      logic [2*W-1:0] product;
      assign product = a_iso * b_iso;

      // Reset gives accum a known starting value
      always_ff @(posedge clk or negedge rst_n)
        if (!rst_n)     accum <= '0;
        else if (valid) accum <= accum + product;
    endmodule

5. Multi-Threshold Voltage (Multi-Vt) Strategy

Standard cell libraries offer multiple threshold voltage variants of the same logic gate. The threshold voltage (Vt) determines the trade-off between switching speed and leakage current.

| Cell Type | Vt | Speed | Leakage | Typical Use |
|---|---|---|---|---|
| LVT (Low-Vt) | Low | Fast | High (10–100×) | Critical timing paths only |
| RVT (Regular-Vt) | Medium | Moderate | Moderate | Moderately critical paths |
| HVT (High-Vt) | High | Slowest | Lowest | Non-critical / idle logic |
| ULVT (Ultra-Low-Vt) | Very low | Fastest | Extremely high | Ultra-critical paths in high-perf nodes |

Design rule: Start synthesis with all HVT cells. Synthesis/PnR tools automatically swap in RVT/LVT cells only on paths failing timing. A typical balanced design uses 60–80% HVT, achieving significant leakage savings without sacrificing timing closure.
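
The exponential sensitivity of leakage to Vt can be sketched with the usual sub-threshold model. The swing value S = 100 mV/decade used below is an assumed round number; the real figure depends on process and temperature.

```latex
\begin{align*}
I_{\text{sub}} &\propto 10^{-V_t / S} \\
\frac{I_{\text{sub}}(V_t)}{I_{\text{sub}}(V_t + \Delta V_t)} &= 10^{\Delta V_t / S}
  \quad\Rightarrow\quad
  \Delta V_t = 100\,\text{mV} \to 10\times, \qquad
  \Delta V_t = 200\,\text{mV} \to 100\times
\end{align*}
```

Under this assumption, a Vt difference of 100–200 mV between cell flavors is enough to produce a 10–100× leakage ratio, which is why swapping even a modest share of cells to HVT pays off.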

6. Dynamic Voltage & Frequency Scaling (DVFS)

DVFS exploits the quadratic relationship between voltage and dynamic power. When a compute block is not under full load, both its operating frequency and supply voltage can be reduced simultaneously — frequency reduction allows a lower voltage, and the combination saves power cubically (P ∝ V² · f ∝ V³ when f scales with V).

| Operating point | V_dd | f | Relative dynamic power (P ∝ V_dd² · f) |
|---|---|---|---|
| Full Performance | 1.0 V | 1 GHz | 1.0² × 1.0 = 1.0× |
| Mid Power | 0.8 V | 600 MHz | 0.64 × 0.6 = 0.38× |
| Low Power | 0.6 V | 250 MHz | 0.36 × 0.25 = 0.09× |
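
Power and energy are not the same thing here, because a slower clock stretches the run time. For a task with a fixed cycle count N, and ignoring leakage (an assumption), the energy at the three operating points above works out as:

```latex
\begin{align*}
E &= P \cdot t \;\propto\; \left(V_{dd}^{2}\, f\right) \cdot \frac{N}{f} = N\, V_{dd}^{2} \\
E_{\text{mid}} &\propto 0.8^{2} = 0.64\times \quad (\text{run time } 1/0.6 \approx 1.67\times) \\
E_{\text{low}} &\propto 0.6^{2} = 0.36\times \quad (\text{run time } 1/0.25 = 4\times)
\end{align*}
```

So the low-power point draws 0.09× the power but uses 0.36× the energy per task, and leakage accumulated over the longer run time narrows that gain further.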

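In RTL, DVFS manifests as clock domain crossings (CDCs) between voltage/frequency islands, each with its own power domain in UPF. The RTL designer must ensure proper synchronizers at all inter-island interfaces; level shifters between islands running at different voltages are specified in the UPF rather than in RTL.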
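
For a single-bit control signal crossing between islands, the usual starting point is a two-flop synchronizer. The sketch below assumes hypothetical names (sync_2ff, clk_dst, d_async, q_sync) and a slowly changing input; multi-bit buses need a different scheme such as a handshake or Gray-coded pointers.

```systemverilog
// Two-flop synchronizer for one bit entering the clk_dst domain.
// Hypothetical module and port names, for illustration only.
module sync_2ff (
  input  logic clk_dst,  // destination-island clock
  input  logic rst_n,    // destination-domain reset, active low
  input  logic d_async,  // signal launched from the other island
  output logic q_sync    // safe to use in the clk_dst domain
);
  logic meta;  // first stage; may go metastable, never used directly

  always_ff @(posedge clk_dst or negedge rst_n)
    if (!rst_n) {q_sync, meta} <= 2'b00;
    else        {q_sync, meta} <= {meta, d_async};
endmodule
```

The second flop gives the first stage a full destination clock period to resolve before the value is consumed, at the cost of two cycles of latency.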

7. FSM Encoding for Low Power

The state register can switch on every clock cycle, so its encoding directly affects switching power. Choosing the right encoding minimizes the Hamming distance (the number of bits that change) between consecutive states.

| Encoding | Bits Used | Hamming Distance (worst) | Best For |
|---|---|---|---|
| Binary | ⌈log₂ N⌉ | Up to ⌈log₂ N⌉ | Area-minimal (small FSMs) |
| Gray Code | ⌈log₂ N⌉ | 1 (for sequential transitions) | Linear state sequences, low power |
| One-Hot | N | 2 (one goes 1→0, one goes 0→1) | High-speed FSMs, FPGA |
| One-Cold | N | 2 | Low-speed power-critical FSMs |

For a pipeline FSM with 8 sequential states, Gray encoding guarantees only 1 bit toggles per transition, versus up to 3 bits for binary (e.g., 3→4 is 011→100, 3 flips). Specify the encoding in RTL with explicit enum values or a tool-specific synthesis attribute:

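    // SystemVerilog: explicit Gray-coded state values
    // Full 3-bit Gray sequence: 0→1→3→2→6→7→5→4
    typedef enum logic [2:0] {
      IDLE  = 3'b000,
      FETCH = 3'b001,
      EXEC  = 3'b011,
      WRITE = 3'b010,
      DONE  = 3'b110
    } state_t;

    // Optional tool-specific attribute, e.g. (* syn_encoding = "gray" *) in
    // Synplify; attribute names and support vary by synthesis tool.
    // Note: DONE -> IDLE (110 -> 000) flips 2 bits, since only 5 of 8 codes are used.
    state_t state, next_state;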
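
For a sequence that walks through all 2^W states, the Gray codes do not need to be typed by hand. The sketch below keeps the state register itself in Gray code and derives the next value with XOR-based conversions. The module and function names (gray_seq, bin2gray, gray2bin, advance) are hypothetical.

```systemverilog
// Gray-coded sequence register: exactly one state bit toggles per step,
// including the wrap from the last code back to 0.
module gray_seq #(parameter int W = 3) (
  input  logic         clk, rst_n,
  input  logic         advance,     // step to the next state when high
  output logic [W-1:0] state_gray
);
  // Binary to Gray: each bit is XORed with its more significant neighbor.
  function automatic logic [W-1:0] bin2gray(input logic [W-1:0] b);
    return b ^ (b >> 1);
  endfunction

  // Gray to binary: XOR of the code with all of its right shifts.
  function automatic logic [W-1:0] gray2bin(input logic [W-1:0] g);
    logic [W-1:0] b;
    b = g;
    for (int i = 1; i < W; i++) b ^= (g >> i);
    return b;
  endfunction

  always_ff @(posedge clk or negedge rst_n)
    if (!rst_n)       state_gray <= '0;
    else if (advance) state_gray <= bin2gray(gray2bin(state_gray) + 1'b1);
endmodule
```

The conversion logic adds some combinational switching of its own, so the saving is mainly worthwhile when the state bits fan out widely or drive long wires.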

8. Low-Power RTL Design Checklist

Worked example: the binary step 7→8 (0111 → 1000) toggles 4 bits, while the same step in Gray code (0100 → 1100) toggles 1 bit. Same logical transition, 4× fewer state-bit toggles.

FAQ

What is clock gating and how does it reduce power?
Clock gating uses an ICG cell to disable the clock to flip-flops that don't need to update. Since dynamic power scales with switching activity (α), eliminating unnecessary clock edges can reduce dynamic power by 20–40% in typical designs. The latch inside the ICG cell is transparent only while the clock is low, so the enable cannot change during the high phase and the gated clock is glitch-free.

What is the difference between clock gating and power gating?
Clock gating stops the clock to idle flip-flops, but leakage current continues because VDD remains applied. Power gating cuts VDD from the entire logic block using header/footer transistors, eliminating dynamic power and nearly all leakage. Power gating requires complex power-up/down sequencing and retention flip-flops to preserve state, while clock gating is near-instantaneous.

What is operand isolation?
Operand isolation prevents unnecessary switching of datapath inputs (e.g., a multiplier's a/b inputs) when the output result is not being consumed. A simple AND gate holds inputs constant when a valid/enable signal is low, preventing power-wasting glitches from propagating through expensive arithmetic units with deep logic cones.

What are multi-Vt cells?
Multi-threshold voltage cells come in High-Vt (HVT), Regular-Vt (RVT), and Low-Vt (LVT) variants. HVT cells have a higher switching threshold, so they are slower but have exponentially lower leakage. Synthesis tools start with all-HVT and swap in RVT/LVT only on timing-critical paths. A well-optimized design uses 60–80% HVT cells to minimize leakage while meeting timing.