What is multi-Vt cell selection in synthesis?

Modern standard cell libraries offer each cell at several threshold voltages: HVT (high-Vt) cells leak less but switch slower; LVT (low-Vt) cells switch faster but leak more; SVT (standard-Vt) sits in between. The synthesizer places LVT cells only on critical timing paths that need the speed and uses HVT cells elsewhere to save leakage power. This multi-Vt strategy can reduce leakage by 30–50% with minimal timing impact.

What is clock gating in synthesis and how does it save power?

Clock gating inserts an ICG (integrated clock gating) cell, a latch plus an AND gate, in the clock path of a group of flip-flops. When the enable signal is low, the gated clock stays low, so the flip-flop clock pins and the clock network downstream of the ICG stop toggling, which removes their switching power. The synthesizer inserts ICG cells automatically when clock gating is enabled and it detects enabled registers such as: always @(posedge CLK) if (EN) Q <= D. Savings depend on the design and its activity; 20–40% of total chip dynamic power is a typical figure.

What is retiming in synthesis?

Retiming moves flip-flops across combinational logic boundaries to balance pipeline stage delays without changing the circuit's input-output behavior. If stage A has 1.5 ns path and stage B has 0.5 ns in a 1 GHz design, retiming can redistribute logic to give both stages 1.0 ns — both now meet the 1 ns clock period. Enabled by compile_ultra -retime in Design Compiler.
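The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, each stage is reduced to a single delay number, and the rebalancing assumes logic can be split at any point, which real retiming cannot do because of gate granularity and register setup and clock-to-Q overhead.

```python
def min_clock_period(stage_delays_ns):
    """The clock period is limited by the slowest pipeline stage."""
    return max(stage_delays_ns)


def ideal_retime(stage_delays_ns):
    """Idealized retiming: spread the total logic delay evenly over the
    same number of stages (assumes logic can be split at any point)."""
    n = len(stage_delays_ns)
    return [sum(stage_delays_ns) / n] * n


before = [1.5, 0.5]            # stage A and stage B from the text, in ns
after = ideal_retime(before)   # same total delay, same number of stages

print("before:", before, "-> min period", min_clock_period(before), "ns")
print("after: ", after, "-> min period", min_clock_period(after), "ns")
```

The total logic delay is unchanged; only the position of the register boundary moves, which is why the input-output behavior stays the same.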

How do you fix timing violations in synthesis?

Fix synthesis timing violations by: (1) upsizing cells on the critical path (larger drive strength = faster); (2) reducing fanout by inserting buffers or duplicating logic; (3) restructuring logic to reduce levels (fewer gate stages on the path); (4) moving to LVT cells on critical paths; (5) retiming to redistribute logic across FFs; (6) relaxing constraints on non-critical paths so the optimizer can spend its effort and area budget on the critical ones. As a last resort, pipeline the design to insert registers and break long paths.

Area, Power & Timing Optimization in Synthesis

The Trade-off Triangle

Power, Performance, Area — PPA Triangle

Every synthesis optimization involves trade-offs in the PPA triangle — improving one axis often degrades another. The goal is balanced PPA for the target specification.

Timing Optimization

Fixing Timing — Negative Slack (WNS < 0)

Retiming moves logic (and flip-flop boundaries) to balance pipeline stage delays — without changing the circuit's functional behavior.

Technique	How It Helps Timing	DC Command
Cell upsizing	Replace X1 with X2/X4 drive-strength cell: more drive current, faster output transition	size_cell u_and NAND2X2
Buffer insertion	Split a high-fanout net so each buffer drives fewer loads and switches faster	set_max_fanout 20 [current_design]
Logic restructuring	Reduce logic levels, e.g. rebalance a serial chain of gates into a shallower tree	compile_ultra (auto)
Logic duplication	Duplicate a cell that feeds many loads so each copy drives fewer of them	compile_ultra (auto)
Retiming	Move FFs across combo to balance stages	compile_ultra -retime
LVT cell swap	Use low-Vt variant (faster, more leakage) on critical path cells	set_attribute [critical cells] lib_cell LVT_variant
Pipelining	Add register stage to break path — increases latency, allows higher Fmax	Manual RTL change
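Every technique in the table is aimed at the same numbers: per-path slack, and the worst and total negative slack reported by the tool. A minimal sketch of how those figures relate, using made-up path data and hypothetical function names (real tools derive required and arrival times from the SDC constraints and cell delays):

```python
def slack_ns(required_ns, arrival_ns):
    """Setup slack: positive means the path meets timing."""
    return required_ns - arrival_ns


def wns_tns(paths):
    """Return (worst slack, total negative slack) for a list of
    (required_time, arrival_time) pairs, all in ns."""
    slacks = [slack_ns(req, arr) for req, arr in paths]
    wns = min(slacks)                       # worst (most negative) slack
    tns = sum(s for s in slacks if s < 0)   # sum over violating paths only
    return wns, tns


# Hypothetical endpoints in a 1 ns clock domain: (required, arrival)
paths = [(1.0, 1.25), (1.0, 0.75), (1.0, 1.5)]

wns, tns = wns_tns(paths)
print("WNS =", wns, "ns, TNS =", tns, "ns")
```

WNS shows how far the worst path is from closing; TNS indicates how widespread the violations are, which helps decide between local fixes such as upsizing and structural ones such as pipelining.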

Power Optimization

Dynamic & Leakage Power Reduction

ICG (Integrated Clock Gating) cells gate the clock off when the enable signal is low — flip-flops don't toggle and dynamic power drops 20–40%.

Dynamic Power

P = α · C · V² · f

α = activity factor (switching probability)
C = net capacitance (pF)
V = supply voltage
f = clock frequency

Reduce by: clock gating (↓ α), voltage scaling (↓ V), operand isolation, power gating idle blocks.
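To see how the terms of the formula trade off, the sketch below evaluates it for a baseline and for two of the reductions listed above. All numbers (1 nF switched capacitance, 1.0 V, 1 GHz, activity 0.2) are invented for illustration and are not taken from any library or design.

```python
def dynamic_power_w(alpha, cap_f, vdd_v, freq_hz):
    """Dynamic power P = alpha * C * V^2 * f, in watts."""
    return alpha * cap_f * vdd_v ** 2 * freq_hz


# Hypothetical block: 1 nF total switched capacitance, 1.0 V, 1 GHz.
base = dynamic_power_w(alpha=0.20, cap_f=1e-9, vdd_v=1.0, freq_hz=1e9)

# Clock gating modeled as halving the activity factor (assumed value).
gated = dynamic_power_w(alpha=0.10, cap_f=1e-9, vdd_v=1.0, freq_hz=1e9)

# Voltage scaling from 1.0 V to 0.9 V at the same activity and frequency.
scaled = dynamic_power_w(alpha=0.20, cap_f=1e-9, vdd_v=0.9, freq_hz=1e9)

print(f"baseline       : {base:.3f} W")
print(f"clock gated    : {gated:.3f} W ({gated / base:.0%} of baseline)")
print(f"voltage scaled : {scaled:.3f} W ({scaled / base:.0%} of baseline)")
```

Because V enters squared, a 10% supply reduction removes about 19% of dynamic power, while activity and frequency scale it linearly.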

Leakage Power

P_leak = I_leak × V_dd

Subthreshold + gate leakage when transistors are "off". Flows even when circuit is idle.

Reduce by: HVT cells for non-critical paths (higher Vt = lower leakage), multi-Vt strategy, power gating (sleep transistor), reverse body biasing in standby mode (NMOS body driven below source, i.e. negative Vbs; PMOS n-well driven above VDD).

## Design Compiler — Power Optimization

# Enable clock gating inference (minimum register size = 4)
set_clock_gating_style -minimum_bitwidth 4 \
  -control_point before \
  -control_signal scan_enable

# Run power-focused compile
compile_ultra -gate_clock         ;# enables ICG insertion

# Leakage optimization
# (option names differ between DC releases; check the man pages of your version)
set_multi_vth_constraint -threshold_voltage_groups {HVT SVT LVT} \
  -cell_slack_limit 0.1         ;# use HVT where slack > 100ps

compile_ultra -leakage_power     ;# swap non-critical cells to HVT

# Report power
report_power -hierarchy

Multi-Vt Strategy

Multi-Threshold Voltage Cell Selection

Cell Type	Threshold Voltage	Speed	Leakage	When to Use
LVT (Low-Vt)	Low (~0.25V)	Fast	High (roughly 10–100× HVT)	Critical timing paths only (slack negative or close to 0)
SVT (Standard)	Medium (~0.40V)	Medium	Medium	Default — moderate timing paths
HVT (High-Vt)	High (~0.55V)	Slowest	Lowest	Non-critical paths with large positive slack
ULVT (Ultra-LVT)	Very low	Ultra fast	Very high	Only most critical endpoints; use sparingly

Rule of thumb: start with all SVT. Run compile_ultra -leakage_power to automatically swap cells whose slack exceeds a threshold to HVT. Then swap cells on paths that still have negative slack to LVT. This "HVT flooding" approach can reduce leakage 30–50%.
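The rule of thumb can be sketched as a simple slack-based classifier. Everything here is an assumption for illustration: the relative leakage values, the 0.1 ns slack limit, the cell names, and the slack figures are invented, and the sketch ignores the fact that swapping a cell to HVT slows it down and consumes some of its slack, which a real tool re-times after every swap.

```python
# Hypothetical relative leakage per cell (HVT = 1); not real library data.
REL_LEAKAGE = {"HVT": 1.0, "SVT": 3.0, "LVT": 10.0}


def assign_vt(slack_ns, hvt_slack_limit_ns=0.1):
    """Pick a Vt flavor from cell slack: LVT if violating, HVT if there
    is comfortable positive slack, SVT otherwise."""
    if slack_ns < 0:
        return "LVT"
    if slack_ns > hvt_slack_limit_ns:
        return "HVT"
    return "SVT"


def total_leakage(vt_types):
    return sum(REL_LEAKAGE[vt] for vt in vt_types)


# Made-up slacks (ns) for ten cells; most cells are far from critical.
slacks = [-0.05, 0.02, 0.30, 0.45, 0.20, 0.60, 0.15, 0.35, 0.50, 0.25]
assignment = {f"u{i}": assign_vt(s) for i, s in enumerate(slacks)}

all_svt = total_leakage(["SVT"] * len(slacks))
mixed = total_leakage(assignment.values())

print(assignment)
print("all-SVT leakage:", all_svt, " multi-Vt leakage:", mixed)
```

The saving comes entirely from the proportion of cells that can move to HVT; a design with many near-critical paths gains far less than one dominated by relaxed logic.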

Area Optimization

Area Reduction Techniques

Technique	How It Reduces Area	Trade-off
Boolean minimization	Reduces logic cone to minimum SOP/POS representation	May slow paths if deeper logic tree
Resource sharing	Mutually exclusive operations (e.g. 3 additions in different branches) share one adder with muxes on its inputs	Muxes add delay and may cause timing failures
Constant propagation	Compile-time evaluation removes unreachable logic	None — free optimization
Hierarchy flattening	Merge sub-modules — cross-boundary optimization finds more redundancy	Longer compile time; hierarchy lost
set_max_area	Directs tool to optimize for area after timing met: set_max_area 0	Tool swaps larger cells for smaller after timing closure
DesignWare components	Highly optimized arithmetic blocks (e.g. DW01_add, DW02_mult), often smaller than generic RTL-inferred logic	Requires DesignWare license
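Resource sharing in the table relies on the operations never being needed in the same cycle. The sketch below models two mutually exclusive additions in software and checks that one adder with muxed inputs computes the same result; the function names are illustrative and the model says nothing about delay, only about functional equivalence.

```python
from itertools import product


def unshared(sel, a, b, c, d):
    """Two adders, with a mux selecting between their outputs."""
    return (a + b) if sel else (c + d)


def shared(sel, a, b, c, d):
    """One adder, with muxes selecting its two operands."""
    x = a if sel else c
    y = b if sel else d
    return x + y


# Exhaustive check over small operand values (0..3) and both select values.
equivalent = all(
    unshared(sel, a, b, c, d) == shared(sel, a, b, c, d)
    for sel in (0, 1)
    for a, b, c, d in product(range(4), repeat=4)
)
print("shared adder is functionally equivalent:", equivalent)
```

The shared version trades one adder for two operand muxes, which is where the extra delay in the trade-off column comes from.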

Interview Q&A

Optimization Interview Questions

What is multi-Vt optimization and why is it important?

Modern standard cell libraries offer cells at multiple threshold voltages (LVT, SVT, HVT). LVT cells are faster but leak more; HVT cells are slower but much less leaky. Synthesis assigns LVT cells only to timing-critical paths and uses HVT elsewhere — this "multi-Vt strategy" maintains timing performance while reducing leakage power by 30–50%. The synthesizer automates this with compile_ultra -leakage_power (DC) or set_multi_vth_constraint.

How does clock gating save power and how does the synthesizer insert it?

Clock gating removes the clock from flip-flops when their data won't change. An ungated clock pin toggles every cycle; with gating it toggles only in cycles where the register is enabled, so the activity factor α of the gated clock network drops and P_dynamic = α·C·V²·f falls proportionally for that network. The synthesizer detects enabled register patterns (if (EN) Q <= D) and replaces the feedback mux in front of each DFF with one ICG cell (integrated latch + AND gate) shared by the register bank. For rising-edge flip-flops, the ICG latch is transparent while the clock is low and holds EN while the clock is high, so changes or glitches on EN during the high phase cannot create or truncate gated-clock pulses. Typical savings: 20–40% of total chip dynamic power. A minimum bitwidth threshold (4–8 bits) avoids spending an ICG on register groups too small to repay its area and power overhead.
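The reason an ICG uses a latch rather than a bare AND gate can be shown with a small time-step model. This is a behavioral sketch under simplifying assumptions (rising-edge flip-flops, ideal zero-delay gates, waveforms sampled at four points per clock cycle); the function names and waveforms are made up for illustration.

```python
def icg(clk_wave, en_wave):
    """Latch-based ICG: the latch follows EN while clk is low and holds
    its value while clk is high; the output is clk AND latched EN."""
    latched_en = 0
    gclk = []
    for clk, en in zip(clk_wave, en_wave):
        if clk == 0:
            latched_en = en      # latch transparent during clock-low phase
        gclk.append(clk & latched_en)
    return gclk


def naive_and(clk_wave, en_wave):
    """Gating with a plain AND gate and no latch."""
    return [c & e for c, e in zip(clk_wave, en_wave)]


def rising_edges(wave):
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
en  = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # EN rises mid-way through a high phase

print("ICG      :", icg(clk, en))
print("AND only :", naive_and(clk, en))
```

In this waveform the plain AND gate emits a half-width pulse when EN rises during the clock-high phase, which could spuriously clock the registers; the latch-based version delays the enable until the next low phase and produces only the full-width pulse.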

What is the difference between compile and compile_ultra in Design Compiler?

compile uses medium optimization effort by default: technology mapping, basic timing-driven optimization, and standard cell sizing. compile_ultra is the high-effort flow: it adds timing-driven high-level and datapath optimization (DesignWare component recognition and arithmetic restructuring), automatic ungrouping with boundary optimization, optional register retiming (-retime), and, in topographical mode, placement-based wire delay estimates in place of wire-load models. compile_ultra usually gives better timing QoR than compile at the cost of longer runtime; figures such as 10–20% better timing for 3–10× the runtime are design dependent. Use compile for early design exploration and compile_ultra for final synthesis.

How do you fix timing violations that remain after compile_ultra?

When compile_ultra can't fix remaining violations: (1) Review the critical path; the delay often comes from a long wire that synthesis cannot see (wire-load model mismatch). (2) Manually upsize critical cells. (3) Restructure the RTL on the critical path: break it into pipeline stages, reduce logic levels, or replace behavioral arithmetic with DesignWare components. (4) Check that paths which are genuinely false or multicycle are declared with set_false_path or set_multicycle_path so the optimizer does not waste effort on them; never use these exceptions to hide real violations. (5) Increase the clock uncertainty margin to over-constrain the design and force more aggressive optimization. (6) Hand the current netlist to P&R and perform ECO synthesis after routing, when real wire delays are known.

What is the compile_ultra -no_autoungroup flag and why is it used?

By default, compile_ultra flattens (autoungroups) module hierarchies for cross-boundary optimization. This can destroy the hierarchical structure needed for P&R (partition-based placement, hierarchical route). -no_autoungroup preserves the hierarchy as specified in RTL. Some modules should still be ungrouped for better optimization — use set_ungroup on specific sub-modules where timing is tight across their boundaries. The P&R engineer typically specifies which hierarchical partitions must be preserved.
