Logic Synthesis · Step 3

Area, Power & Timing Optimization

Synthesis optimization strategies: compile_ultra, retiming, multi-Vt selection, clock gating, leakage minimization — and how to close timing when WNS is negative.

Power, Performance, Area — PPA Triangle

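[Figure: PPA triangle with Performance, Power and Area at the corners and "Balanced" at the center.]
Performance (speed): higher clock frequency, WNS at or above zero. Power: lower switching and leakage power. Area: lower cell count and cell area (µm²).
Trade-offs shown along the edges: LVT cells are faster but leak more; higher optimization effort improves timing but costs area; clock gating lowers power and can lower area, but adds ICG cell overhead.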
Every synthesis optimization involves trade-offs in the PPA triangle — improving one axis often degrades another. The goal is balanced PPA for the target specification.

Fixing Timing — Negative Slack (WNS < 0)

[Figure: two-stage pipeline before and after retiming; 1 GHz target = 1.0 ns clock period.]
Before retiming: FF → Stage A logic (1.5 ns, fails) → FF → Stage B logic (0.5 ns) → FF.
After retiming: FF → Stage A′ logic (1.0 ns, passes) → FF → Stage B′ logic (1.0 ns, passes) → FF. Both stages meet 1 GHz.
Retiming moves logic (and flip-flop boundaries) to balance pipeline stage delays — without changing the circuit's functional behavior.
| Technique | How It Helps Timing | DC Command |
| --- | --- | --- |
| Cell upsizing | Replace an X1 cell with an X2/X4 drive-strength variant: stronger output drive and faster transitions, at the cost of area, power and input capacitance | size_cell u_and NAND2X2 |
| Buffer insertion | Split a high-fanout net so that each buffer drives fewer loads | set_max_fanout 20 [current_design] (buffers are then added during compile) |
| Logic restructuring | Reduce the number of logic levels, e.g. rebalance a serial gate chain into a tree | compile_ultra (automatic) |
| Logic duplication | Duplicate a cell that feeds many fanout points so that each copy drives fewer loads | compile_ultra (automatic) |
| Retiming | Move flip-flops across combinational logic to balance stage delays | compile_ultra -retime |
| LVT cell swap | Use the low-Vt variant (faster, more leakage) on critical-path cells | size_cell <critical cell> <LVT lib cell> |
| Pipelining | Add a register stage to break the path: latency increases, but a higher Fmax becomes possible | Manual RTL change |
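
The retiming figure above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it treats slack as clock period minus combinational stage delay and ignores setup time, clock-to-Q delay and clock uncertainty, which a real timing report includes. The function names are hypothetical helpers, not tool commands.

```python
# Illustrative slack check for the two-stage pipeline in the retiming figure.
# Assumption: slack = clock period - stage delay (setup, clk-to-Q and
# uncertainty are ignored to keep the arithmetic visible).

CLOCK_PERIOD_NS = 1.0  # 1 GHz target


def stage_slacks(stage_delays_ns, period_ns=CLOCK_PERIOD_NS):
    """Return the setup slack of each pipeline stage in ns."""
    return [round(period_ns - delay, 3) for delay in stage_delays_ns]


def wns(slacks):
    """Worst negative slack: the minimum slack, or 0.0 if every stage passes."""
    return min(min(slacks), 0.0)


before = stage_slacks([1.5, 0.5])  # Stage A and Stage B before retiming
after = stage_slacks([1.0, 1.0])   # Stage A' and Stage B' after retiming

print("before:", before, "WNS =", wns(before))
print("after: ", after, "WNS =", wns(after))
```

Total logic delay is 2.0 ns in both cases. Retiming does not remove delay; it redistributes it so that no single stage exceeds the clock period.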

Dynamic & Leakage Power Reduction

[Figure: clock distribution without and with ICG clock gating.]
Without clock gating: CLK always reaches every FF, so all FFs are clocked every cycle and dynamic power is wasted when EN=0. P_dynamic = α · C · V² · f with α = 1.
With ICG clock gating: CLK passes through an ICG (latch + AND, controlled by EN) before the FFs, so the FFs are clocked only when EN=1 and 20–40% of dynamic power is saved. P_dynamic = α · C · V² · f with α < 1 (reduced activity).
ICG (Integrated Clock Gating) cells gate the clock off when the enable signal is low — flip-flops don't toggle and dynamic power drops 20–40%.
Dynamic Power

P = α · C · V² · f

α = activity factor (switching probability)
C = net capacitance (pF)
V = supply voltage
f = clock frequency

Reduce by: clock gating (↓ α), voltage scaling (↓ V), operand isolation, power gating idle blocks.
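
As a rough worked example of the formula, the sketch below compares an ungated clock load with a gated one. Every number in it (100 pF of switched capacitance, 0.9 V, 1 GHz, enable active 25% of the time) is an assumed illustration value, not data from a real design.

```python
def dynamic_power_w(alpha, cap_f, vdd_v, freq_hz):
    """Dynamic power in watts: P = alpha * C * V^2 * f."""
    return alpha * cap_f * vdd_v ** 2 * freq_hz


# Hypothetical operating point (assumed values for illustration only)
CAP_F = 100e-12   # 100 pF of switched clock capacitance
VDD_V = 0.9       # supply voltage
FREQ_HZ = 1e9     # 1 GHz clock

ungated = dynamic_power_w(1.0, CAP_F, VDD_V, FREQ_HZ)   # clock toggles every cycle
gated = dynamic_power_w(0.25, CAP_F, VDD_V, FREQ_HZ)    # enable active 25% of cycles
saving = 1.0 - gated / ungated

# Voltage scaling acts quadratically: 0.9 V -> 0.8 V at the same alpha, C, f
scaled = dynamic_power_w(1.0, CAP_F, 0.8, FREQ_HZ)

print(f"ungated: {ungated * 1e3:.1f} mW, gated: {gated * 1e3:.2f} mW")
print(f"saving on the gated load: {saving:.0%}")
print(f"0.8 V vs 0.9 V power ratio: {scaled / ungated:.2f}")
```

The saving computed here applies only to the capacitance behind the clock gate. The whole-chip figure is smaller because ungated logic, the ICG cells themselves and the clock tree above the gates keep switching.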

Leakage Power

P_leak = I_leak × V_dd

Subthreshold and gate leakage currents flow through transistors that are nominally "off", so leakage power is consumed whenever the supply is on, even when the circuit is idle.

Reduce by: HVT cells on non-critical paths (higher Vt = lower leakage), multi-Vt strategy, power gating (sleep transistor), reverse body biasing in standby mode (negative V_BS for NMOS; n-well raised above V_dd for PMOS).

## Design Compiler — Power Optimization

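# Enable clock gating inference (gate only register banks of 4 bits or more)
set_clock_gating_style -minimum_bitwidth 4 \
  -control_point before \
  -control_signal scan_enable

# Leakage optimization: cap low-Vt usage so the tool prefers HVT/SVT
# wherever slack allows. Group names must match the threshold_voltage_group
# attributes in the libraries; the 10% cap is an example value.
# Option names differ between releases, so check the man pages for your version.
set_multi_vth_constraint -lvth_groups {LVT} \
  -lvth_percentage 10 \
  -type soft
set_leakage_optimization true

# Compile with ICG insertion. Tcl has no trailing "#" comments:
# use ";#" or put the comment on its own line.
compile_ultra -gate_clock

# Report power
report_power -hierarchy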

Multi-Threshold Voltage Cell Selection

| Cell Type | Threshold Voltage | Speed | Leakage | When to Use |
| --- | --- | --- | --- | --- |
| LVT (Low-Vt) | Low (~0.25 V) | Fast | High (roughly 10–100× HVT) | Critical timing paths only (negative or near-zero slack) |
| SVT (Standard) | Medium (~0.40 V) | Medium | Medium | Default for paths with moderate slack |
| HVT (High-Vt) | High (~0.55 V) | Slowest | Lowest | Non-critical paths with large positive slack |
| ULVT (Ultra-LVT) | Very low | Fastest | Highest | Only the most critical endpoints; use sparingly |

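Rule of thumb: start with all SVT. Run leakage-aware synthesis (in Design Compiler, leakage optimization guided by set_multi_vth_constraint) so that cells whose slack exceeds a chosen threshold are swapped to HVT automatically. Then swap cells on the remaining negative-slack paths to LVT. This "HVT flooding" approach can reduce leakage by 30–50%.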
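
The rule of thumb can be sketched as a simple classification by slack. This is a deliberately naive model, not what a synthesis tool does internally: a real optimizer re-runs timing after every swap, because moving a cell to HVT slows it and consumes slack on every path through it. The cell names and the 100 ps threshold are assumptions made for illustration.

```python
def assign_vt(cell_slacks_ns, hvt_slack_threshold_ns=0.1):
    """Map each cell name to a Vt class based on its worst slack in ns.

    Negative slack       -> LVT (speed needed)
    Slack above threshold -> HVT (slack to spare, save leakage)
    Otherwise            -> SVT (keep the default)
    """
    assignment = {}
    for cell, slack in cell_slacks_ns.items():
        if slack < 0:
            assignment[cell] = "LVT"
        elif slack > hvt_slack_threshold_ns:
            assignment[cell] = "HVT"
        else:
            assignment[cell] = "SVT"
    return assignment


# Hypothetical cells with their worst slack in ns
cells = {"u_mul/U12": -0.05, "u_ctrl/U3": 0.02, "u_cfg/U7": 0.45}
result = assign_vt(cells)
print(result)
```

With these inputs, the multiplier cell goes to LVT, the control cell stays SVT and the configuration cell moves to HVT.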

Area Reduction Techniques

| Technique | How It Reduces Area | Trade-off |
| --- | --- | --- |
| Boolean minimization | Reduces each logic cone to a minimal SOP/POS representation | May slow paths if the result is a deeper logic tree |
| Resource sharing | Multiple operations (e.g. 3 additions) share one adder with muxes at its inputs | The muxes add delay and may cause timing failures |
| Constant propagation | Logic driven by constants is evaluated at compile time and removed | None: effectively a free optimization |
| Hierarchy flattening | Merges sub-modules so cross-boundary optimization can find more redundancy | Longer compile time; hierarchy is lost |
| set_max_area | set_max_area 0 directs the tool to minimize area once timing is met | The tool swaps larger cells for smaller ones only where slack allows |
| DesignWare components | Highly optimized arithmetic components (e.g. DW01_add, DW02_mult) can be smaller than generic RTL-inferred equivalents | Requires a DesignWare license |

Optimization Interview Questions

What is multi-Vt optimization and why is it important?
Modern standard cell libraries offer each cell at several threshold voltages (LVT, SVT, HVT). LVT cells are faster but leak more; HVT cells are slower but leak far less. Synthesis places LVT cells only on timing-critical paths and uses HVT elsewhere. This multi-Vt strategy holds timing while reducing leakage power, typically by 30–50%. In Design Compiler it is driven by leakage optimization together with set_multi_vth_constraint, which limits how much of the design may use low-Vt cells.
How does clock gating save power and how does the synthesizer insert it?
Clock gating stops the clock to flip-flops whose data will not change. The activity factor α of the gated clock pins drops from 1 (toggling every cycle) to the fraction of cycles in which the enable is active, and P_dynamic = α·C·V²·f falls in proportion. The synthesizer detects load-enable register patterns (if (EN) Q <= D) and replaces the feedback mux in front of each DFF with one ICG cell (integrated latch + AND gate) that drives the clock pins of the whole register bank. The ICG latch is transparent while the clock is low and holds EN while the clock is high, so an EN change during the high phase cannot create a glitch or a truncated pulse on the gated clock. Typical savings: 20–40% of total chip dynamic power. A minimum bitwidth threshold (commonly 4–8 bits) avoids paying the ICG overhead for small register groups.
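
A small behavioral model makes the glitch-free property concrete. The sketch below assumes a latch-based ICG for rising-edge flip-flops and samples the clock and enable at discrete time steps. It is a simplified illustration (no delays, no test-enable pin), and the class and function names are hypothetical.

```python
class IcgModel:
    """Behavioral model of a latch-based ICG: low-transparent latch + AND gate."""

    def __init__(self):
        self.latched_en = 0

    def step(self, clk, en):
        # The latch is transparent only while the clock is low, so an EN
        # change during the high phase cannot reach the AND gate.
        if clk == 0:
            self.latched_en = en
        return clk & self.latched_en


def simulate(clk_wave, en_wave):
    """Return the gated clock for sampled clk/en waveforms of equal length."""
    icg = IcgModel()
    return [icg.step(c, e) for c, e in zip(clk_wave, en_wave)]


# Sampled waveforms; EN rises at index 4 while the clock is already high
clk = [0, 1, 0, 1, 1, 0, 1]
en = [1, 1, 0, 0, 1, 1, 1]

gated_clk = simulate(clk, en)
naive_and = [c & e for c, e in zip(clk, en)]  # AND gate with no latch

print("ICG gated clock:", gated_clk)
print("plain AND gate: ", naive_and)
```

At index 4 the plain AND gate emits a truncated clock pulse because EN rises in the middle of the high phase. The latch in the ICG holds the old enable value until the clock goes low again, so the gated clock stays clean.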
What is the difference between compile and compile_ultra in Design Compiler?
compile runs at medium optimization effort by default: technology mapping, basic timing-driven optimization, and standard cell sizing. compile_ultra enables the high-effort flow: more aggressive timing-driven restructuring, datapath extraction and optimization (DesignWare component recognition), automatic ungrouping and boundary optimization, multiple incremental optimization passes, and register retiming when -retime is specified. In topographical mode it also uses estimated placement for more accurate wire delays. compile_ultra typically improves timing QoR by 10–20% over compile at 3–10× the runtime. Use compile for early design exploration and compile_ultra for final synthesis.
How do you fix timing violations that remain after compile_ultra?
When compile_ultra cannot fix the remaining violations: (1) Review the critical path. Often a long wire causes delay that synthesis cannot see (wire-load model mismatch). (2) Manually upsize critical cells. (3) Restructure the RTL on the critical path: break it into pipeline stages, reduce logic levels, or replace behavioral arithmetic with DesignWare components. (4) Apply set_false_path or set_multicycle_path to paths that are genuinely false or multicycle by design, so the optimizer stops spending effort on them; never use exceptions to hide real violations. (5) Over-constrain during synthesis, for example with a larger clock uncertainty, to force more aggressive optimization, then restore the real constraints for sign-off timing. (6) Hand the design to P&R with the current netlist and perform ECO synthesis after routing, when real wire delays are known.
What is the compile_ultra -no_autoungroup flag and why is it used?
By default, compile_ultra automatically ungroups (flattens) module hierarchies where doing so enables cross-boundary optimization. This can destroy the hierarchical structure needed for P&R (partition-based placement, hierarchical routing). -no_autoungroup preserves the hierarchy as specified in the RTL. Some modules still benefit from ungrouping: use set_ungroup on the specific sub-modules where timing is tight across their boundaries. The P&R engineer typically specifies which hierarchical partitions must be preserved.