Power, Performance, Area — PPA Triangle
Fixing Timing — Negative Slack (WNS < 0)
| Technique | How It Helps Timing | DC Command |
|---|---|---|
| Cell upsizing | Replace an X1 cell with an X2/X4 drive-strength variant: more drive current, faster output transition | size_cell u_and NAND2X2 |
| Buffer insertion | Break up a high-fanout net so that each buffered copy drives fewer loads and switches faster | set_max_fanout 20 [current_design] |
| Logic restructuring | Reduce logic levels, e.g. rebalance a serial chain of gates into a shallower tree | compile_ultra (auto) |
| Logic duplication | Duplicate a cell that feeds multiple fanout points so that each copy drives fewer loads | compile_ultra (auto) |
| Retiming | Move flip-flops across combinational logic to balance stage delays | compile_ultra -retime |
| LVT cell swap | Use low-Vt variant (faster, more leakage) on critical path cells | set_attribute [critical cells] lib_cell LVT_variant |
| Pipelining | Add register stage to break path — increases latency, allows higher Fmax | Manual RTL change |
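Every technique in the table aims to raise the slack of the worst paths. As a minimal sketch of how the headline numbers are derived (assuming setup slack is simply required time minus arrival time, and using made-up endpoint values), WNS and TNS can be computed as follows. The function names and the path data are hypothetical, not part of any tool API.

```python
def slack(required_ns, arrival_ns):
    """Setup slack of one endpoint: positive means the path meets timing."""
    return required_ns - arrival_ns


def wns_tns(paths):
    """Return (WNS, TNS) for a list of (required_ns, arrival_ns) endpoints.

    WNS is the smallest slack over all endpoints.
    TNS sums only the negative slacks, so it is 0 when timing is met.
    """
    slacks = [slack(req, arr) for req, arr in paths]
    wns = min(slacks)
    tns = sum(s for s in slacks if s < 0)
    return wns, tns


# Hypothetical endpoints for a 2.0 ns clock period (illustrative numbers only).
paths = [(2.0, 1.6), (2.0, 2.3), (2.0, 2.1)]
wns, tns = wns_tns(paths)
print(f"WNS = {wns:.2f} ns, TNS = {tns:.2f} ns")
```

Here two of the three endpoints violate timing, so WNS is negative and the fixes above would be applied to those paths first.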
Dynamic & Leakage Power Reduction
P_dyn = α · C · V² · f
α = activity factor (switching probability)
C = net capacitance (pF)
V = supply voltage
f = clock frequency
Reduce by: clock gating (↓ α), voltage scaling (↓ V), operand isolation, power gating idle blocks.
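To make the relative leverage of each knob concrete, the sketch below evaluates the dynamic power formula for a baseline block, the same block with its activity factor halved (as clock gating might achieve), and the same block at a lower supply voltage. All numeric values are hypothetical and chosen only for illustration; capacitance is passed in farads so the result is in watts.

```python
def dynamic_power(alpha, c_farads, vdd, freq_hz):
    """Dynamic power in watts: alpha * C * V^2 * f."""
    return alpha * c_farads * vdd ** 2 * freq_hz


# Hypothetical block: 100 pF switched capacitance, 1 GHz clock.
C = 100e-12
F = 1e9

base = dynamic_power(alpha=0.2, c_farads=C, vdd=1.0, freq_hz=F)
gated = dynamic_power(alpha=0.1, c_farads=C, vdd=1.0, freq_hz=F)   # activity halved
scaled = dynamic_power(alpha=0.2, c_farads=C, vdd=0.8, freq_hz=F)  # supply lowered 20%

for name, p in [("baseline", base), ("clock gated", gated), ("voltage scaled", scaled)]:
    print(f"{name:15s} {p * 1e3:.1f} mW")
```

Because power depends on the square of the supply voltage, a 20% voltage reduction saves 36% of dynamic power in this model, while halving the activity factor saves 50%.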
P_leak = I_leak × V_dd
Subthreshold and gate leakage currents flow while transistors are nominally "off", so this power is consumed even when the circuit is idle.
Reduce by: HVT cells on non-critical paths (higher Vt = lower leakage), a multi-Vt strategy, power gating (sleep transistors), and reverse body biasing in standby mode (negative V_bs for NMOS, n-well biased above V_dd for PMOS).
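The effect of raising Vt can be estimated with a first-order model. The sketch below assumes that subthreshold leakage current falls by a factor of 10 for every "subthreshold swing" of added threshold voltage, and takes the swing as 100 mV per decade; both the swing value and the per-cell reference current are assumptions for illustration, not library data.

```python
def leakage_power(i_leak_amps, vdd):
    """Leakage power in watts: I_leak * V_dd."""
    return i_leak_amps * vdd


def scale_leakage_current(i_ref_amps, vt_ref, vt_new, swing_v_per_decade=0.1):
    """Assumed first-order model: leakage current drops 10x for every
    `swing_v_per_decade` volts by which Vt is raised above the reference."""
    return i_ref_amps * 10 ** (-(vt_new - vt_ref) / swing_v_per_decade)


# Hypothetical cell leaking 1 nA at Vt = 0.40 V, with a 1.0 V supply.
i_ref = 1e-9
i_high_vt = scale_leakage_current(i_ref, vt_ref=0.40, vt_new=0.55)

p_ref = leakage_power(i_ref, vdd=1.0)
p_high_vt = leakage_power(i_high_vt, vdd=1.0)
ratio = p_ref / p_high_vt
print(f"leakage reduction from +150 mV Vt: about {ratio:.0f}x")
```

Under these assumptions a 150 mV increase in Vt cuts leakage by roughly 30×, which is why swapping non-critical cells to a higher-Vt variant is such an effective lever.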

    ## Design Compiler: Power Optimization
    # Enable clock gating inference (minimum register bank width = 4 bits)
    set_clock_gating_style -minimum_bitwidth 4 \
        -control_point before \
        -control_signal scan_enable

    # Run power-focused compile
    compile_ultra -gate_clock   ;# enables ICG insertion

    # Leakage optimization
    set_multi_vth_constraint -threshold_voltage_groups {HVT SVT LVT} \
        -cell_slack_limit 0.1   ;# use HVT where slack > 100ps
    compile_ultra -leakage_power   ;# swap non-critical cells to HVT

    # Report power
    report_power -hierarchy

Multi-Threshold Voltage Cell Selection
| Cell Type | Threshold Voltage | Speed | Leakage | When to Use |
|---|---|---|---|---|
| LVT (Low-Vt) | Low (~0.25V) | Fast | High (10–100×) | Critical timing paths only (slack near or below 0) |
| SVT (Standard) | Medium (~0.40V) | Medium | Medium | Default — moderate timing paths |
| HVT (High-Vt) | High (~0.55V) | Slowest | Lowest | Non-critical paths with large positive slack |
| ULVT (Ultra-LVT) | Very low | Fastest | Highest | Only the most critical endpoints; use sparingly |
Rule of thumb: start with all SVT. Run compile_ultra -leakage_power to automatically swap cells whose slack exceeds the threshold to HVT, then swap cells on the remaining negative-slack paths to LVT. This "HVT flooding" approach can reduce leakage by 30–50%.
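The selection rule can be sketched as a simple classifier over per-cell slack. This is only an illustration of the decision logic: the function name, the cell names, and the 0.1 ns limit are hypothetical, and the sketch ignores the fact that each swap changes path delay, which is why a real flow re-runs timing after swapping.

```python
def assign_vt(cell_slacks_ns, hvt_slack_limit_ns=0.1):
    """Pick a Vt flavor per cell from the worst slack through that cell.

    - slack above the limit  -> HVT (timing margin can be traded for leakage)
    - negative slack         -> LVT (speed is needed)
    - anything in between    -> stay on the SVT default
    """
    choice = {}
    for cell, slack_ns in cell_slacks_ns.items():
        if slack_ns > hvt_slack_limit_ns:
            choice[cell] = "HVT"
        elif slack_ns < 0:
            choice[cell] = "LVT"
        else:
            choice[cell] = "SVT"
    return choice


# Hypothetical cells with the worst slack (ns) of any path through them.
cells = {"u_add": 0.35, "u_mux": 0.05, "u_cmp": -0.12}
result = assign_vt(cells)
print(result)
```

In this example only `u_cmp` sits on a violating path and moves to LVT, while `u_add` has enough margin to move to HVT.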
Area Reduction Techniques
| Technique | How It Reduces Area | Trade-off |
|---|---|---|
| Boolean minimization | Reduces a logic cone to its minimum SOP/POS representation | May slow paths if the minimized logic is deeper |
| Resource sharing | Multiple operations (e.g. 3 adders) share one adder with mux at input | Mux adds delay — may fail timing |
| Constant propagation | Compile-time evaluation removes unreachable logic | None — free optimization |
| Hierarchy flattening | Merge sub-modules — cross-boundary optimization finds more redundancy | Longer compile time; hierarchy lost |
| set_max_area | Directs tool to optimize for area after timing met: set_max_area 0 | Tool swaps larger cells for smaller after timing closure |
| DesignWare components | Highly optimized arithmetic components (DW01_add, DW02_mult) that can be smaller than RTL-inferred equivalents | Requires DesignWare license |
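The resource-sharing row can be illustrated with a rough area estimate. The sketch assumes the shared operations are mutually exclusive (never needed in the same cycle) and that sharing costs one multiplexer per operand port; the area figures are hypothetical units, not values from any cell library.

```python
def unshared_area(n_ops, adder_area):
    """Area when every operation gets its own adder."""
    return n_ops * adder_area


def shared_area(adder_area, mux_area_per_bit, width_bits, n_operand_ports=2):
    """Area of one shared adder plus an input mux on each operand port."""
    return adder_area + n_operand_ports * mux_area_per_bit * width_bits


# Hypothetical 16-bit design: adder = 120 units, 3:1 mux = 3 units per bit.
separate = unshared_area(n_ops=3, adder_area=120)
shared = shared_area(adder_area=120, mux_area_per_bit=3, width_bits=16)
saving = 1 - shared / separate
print(f"separate: {separate}, shared: {shared}, saving: {saving:.0%}")
```

With these numbers sharing saves 40% of the area, but the mux now sits in front of the adder on every path, which is the timing trade-off noted in the table.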