DAY 12 · TIMING CLOSURE & SIGN-OFF

Fixing Setup Violations

Q: What is the most effective first step for fixing setup violations?

The most effective first step is cell upsizing on the critical path. Replacing a weak driver (e.g., BUFX2) with a stronger one (BUFX8) reduces gate delay and output slew, which also reduces the next cell's delay. This is fast to implement and does not change the logic. However, upsizing increases power and may worsen congestion, so it's a tool for addressing the worst few paths, not a brute-force solution for hundreds of paths.

Q: What is an LVT cell swap for setup fixing?

LVT (low threshold voltage) cells have a lower threshold voltage than standard-Vt (SVT) cells, so their transistors turn on at a lower gate voltage and switch faster. Swapping a critical path cell from SVT to LVT reduces its delay, improving setup slack. The trade-off is significantly higher leakage current (5–10×), which increases static power. LVT cells are used sparingly, on the most critical paths only.

Q: When is pipeline register insertion used for setup?

Pipeline register insertion splits a long combinational path into two shorter stages by inserting a flip-flop in the middle. Each stage still gets a full clock period but now contains only about half the logic. This is the most powerful technique for fundamentally long paths (e.g., floating-point arithmetic, complex ALU paths), but it adds one cycle of latency to the computation, which must be acceptable at the architecture level.

Q: What is useful skew for setup violation fixing?

Useful skew intentionally delays the capture flip-flop's clock arrival relative to the launch flip-flop's clock. By making the capture clock arrive later, the effective required time increases, giving the data more time to arrive. The clock tree is adjusted to deliver the clock slightly later to the capture FF. The amount of useful skew must be carefully limited to avoid creating hold violations on the skewed path.

Q: When should you accept a timing exception instead of fixing a setup violation?

A timing exception (set_multicycle_path or set_false_path) is only acceptable when there is a genuine architectural justification: the path is logically impossible to sensitise, or data on this path legitimately takes multiple cycles. Using exceptions to hide real violations is dangerous and can cause silicon failures. A valid multicycle path must be documented with which architectural condition guarantees the multi-cycle property.

By EcrioniX · Updated June 2026

Setup violations mean your critical paths are too slow for the target clock frequency. Fixing them is the central activity of timing closure — an iterative, systematic process that demands understanding which technique to apply to which type of violation. The goal is to reduce path delay until slack ≥ 0 at every corner, while minimising the collateral damage to power, area, and hold timing.

1. The timing closure loop

Setup closure is never done in one pass. It is an iterative loop that converges as violations are progressively fixed: run STA → analyse WNS/TNS → fix the worst category of violations → verify → repeat.

Each iteration should address a specific category of violations — not random individual paths. Fixing one path often shifts the critical path to the next-worst, so working on the statistically worst group each iteration converges faster than fixing paths one by one.
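As a rough illustration of why the worst group can matter more than the single worst path, the sketch below computes WNS, TNS, and the total slack per category from a small violation list. It is plain Tcl (tclsh 8.5 or later, no STA tool required). The proc name `summarize_violations`, the category labels, and every slack value are hypothetical.

```tcl
# Hypothetical violation summary: {endpoint category slack_ps}.
# In a real flow these rows would be extracted from the STA report.
set paths {
    {ep1 logic_depth -180}
    {ep2 logic_depth -120}
    {ep3 wire_delay  -250}
    {ep4 weak_driver  -40}
    {ep5 wire_delay    30}
}

# Return WNS, TNS, per-category total slack, and the worst category.
proc summarize_violations {paths} {
    set wns 0
    set tns 0
    set per_cat [dict create]
    foreach p $paths {
        lassign $p endpoint category slack
        if {$slack >= 0} { continue }           ;# only violating paths count
        if {$slack < $wns} { set wns $slack }
        incr tns $slack
        dict incr per_cat $category $slack
    }
    # The worst category is the one with the most negative total slack.
    set worst ""
    set worst_total 0
    dict for {cat total} $per_cat {
        if {$total < $worst_total} {
            set worst $cat
            set worst_total $total
        }
    }
    return [dict create wns $wns tns $tns worst_category $worst category_tns $per_cat]
}

set summary [summarize_violations $paths]
puts "WNS: [dict get $summary wns] ps, TNS: [dict get $summary tns] ps"
puts "Fix first: [dict get $summary worst_category]"
```

With these made-up numbers the single worst path is a wire-delay path (-250 ps), yet the logic-depth group carries the larger total (-300 ps), so that group would be the one to address in this iteration.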

2. Fix techniques ordered by cost and impact

Technique	Delay reduction	Area cost	Power cost	When to use
Buffer removal on critical path	20–50ps per buffer	Saves area	Saves power	When excessive buffering (for fanout/SI) is on the critical path
Cell upsizing	30–100ps per cell	+10–30%	+10–30%	First response for setup violations on critical cells
LVT cell swap (SVT→LVT)	50–150ps	Same	+400–900% leakage (5–10×)	Last resort for critical cells when upsizing is insufficient
Useful skew	20–80ps effective	CTS change	Negligible	When timing budget can be shifted from one path pair to another
Logic restructuring	Path-specific	Variable	Variable	When logic depth is the fundamental limit (re-synthesis)
Placement optimisation	30–200ps (wire)	No change	No change	When wire delay dominates the critical path
Pipeline insertion	Full path split	+FF area	+FF power	When path fundamentally exceeds one clock cycle even after all other fixes

3. Cell upsizing

Cell upsizing replaces a cell with a larger drive-strength version of the same function. A stronger driver has lower output resistance, which charges the output capacitance faster. This reduces both cell delay and output slew, which also speeds up subsequent cells in the path.
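The effect can be approximated with a first-order RC model, where switching delay is roughly 0.69 × R_drive × C_load. The sketch below is plain Tcl with hypothetical resistance and capacitance values (none come from a real library). It also assumes the larger cell presents more input capacitance to the gate that drives it, which takes back part of the gain.

```tcl
# First-order switching delay: t ~ 0.69 * R_drive * C_load.
# Units: ohms * fF = fs, kept as integers to avoid rounding noise.
proc rc_delay_fs {r_ohm c_ff} {
    return [expr {69 * $r_ohm * $c_ff / 100}]
}

# Hypothetical values, not taken from any real library:
set c_load_ff  20     ;# load driven by the cell being resized
set r_x2_ohm   4000   ;# output resistance of the weak driver (X2)
set r_x8_ohm   1000   ;# output resistance of the strong driver (X8)
set r_prev_ohm 2000   ;# output resistance of the upstream driver
set cin_x2_ff  2      ;# input capacitance of the X2 cell
set cin_x8_ff  8      ;# input capacitance of the X8 cell

# Gain on the resized stage itself
set own_gain [expr {[rc_delay_fs $r_x2_ohm $c_load_ff] - [rc_delay_fs $r_x8_ohm $c_load_ff]}]
# Extra delay on the previous stage, which now drives a larger input pin
set upstream_penalty [rc_delay_fs $r_prev_ohm [expr {$cin_x8_ff - $cin_x2_ff}]]
set net_gain [expr {$own_gain - $upstream_penalty}]

puts "stage gain: $own_gain fs, upstream penalty: $upstream_penalty fs, net: $net_gain fs"
```

Under these assumptions the net gain is about 33 ps. The upstream penalty term is one reason upsizing is best reserved for the worst few cells rather than applied along a whole path.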

Cell upsizing ECO in PrimeTime

## Identify cells contributing most delay on worst path
report_timing -nworst 1 -path_type full

## Target the cell with highest Incr value on the critical path
## Example: AND2X2 has 0.09 ns delay; try AND2X8 (stronger)
size_cell [get_cells u_core/U_and1] AND2X8

## Verify improvement
update_timing
report_timing -from [get_cells u_core/ff_A] -to [get_cells u_core/ff_B] -nworst 1

## Automated sizing: let the ECO engine resize cells on violating setup paths
## (available options vary by PrimeTime version; check the man page)
fix_eco_timing -type setup -methods {size_cell}

## Check for hold side effects from upsizing
## Upsizing = faster cell = shorter min delay = more hold risk
report_timing -delay_type min -nworst 10

## Size multiple cells at once (ECO approach)
## Print each cell's current reference first, then apply the new sizes
foreach cell {u_core/U1 u_core/U2 u_core/U4} {
    set cur [get_attribute [get_cells $cell] ref_name]
    puts "Sizing $cell from $cur"
}
size_cell [get_cells u_core/U1] BUFX8
size_cell [get_cells u_core/U2] OR2X8
size_cell [get_cells u_core/U4] AND3X4

4. LVT cell swapping

Low-Vt (LVT) cells are the same logic function as standard-Vt (SVT) but with a lower threshold voltage. Lower Vt means transistors turn on earlier, giving faster switching at the cost of dramatically higher subthreshold leakage.

Typical speed gain from SVT→LVT: 15–30% delay reduction per cell
Typical leakage increase: 5–10× per cell
Best strategy: use LVT only on the top 5–10% of critical cells; do not blanket-swap
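One way to picture the "top few cells only" strategy is a greedy pass that swaps the slowest cells first and stops as soon as slack is recovered. The sketch below is plain Tcl. The proc name `pick_lvt_swaps` and the cell data are hypothetical, and the 20% delay gain and 8× leakage multiplier are simply picked from inside the ranges quoted above.

```tcl
# cells: list of {name delay_ps svt_leakage_nw}
# Returns {swapped_cells resulting_slack_ps added_leakage_nw}.
proc pick_lvt_swaps {cells slack_ps {gain_pct 20} {leak_mult 8}} {
    set swaps {}
    set extra_leak 0
    # Slowest cells first: each swap buys the most delay for one leakage hit
    foreach c [lsort -integer -decreasing -index 1 $cells] {
        if {$slack_ps >= 0} { break }           ;# stop once timing is met
        lassign $c name delay leak
        incr slack_ps   [expr {$delay * $gain_pct / 100}]
        incr extra_leak [expr {$leak * ($leak_mult - 1)}]
        lappend swaps $name
    }
    return [list $swaps $slack_ps $extra_leak]
}

# Hypothetical path: four cells, 30 ps of negative slack
set cells {{U1 120 10} {U2 90 10} {U3 60 10} {U4 40 10}}
lassign [pick_lvt_swaps $cells -30] swapped new_slack added_leak
puts "swap: $swapped, slack: $new_slack ps, added leakage: $added_leak nW"
```

Here two of the four cells are enough to close the path, and the other two stay SVT. A real flow would also recheck hold after each swap, which this sketch does not model.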

LVT cell swapping in post-synthesis ECO

## Identify cells on timing-critical paths for LVT swap
## Typical: swap cells contributing >50% of path delay

## List cells on worst path by walking its timing points
set worst_path [get_timing_paths -delay_type max -nworst 1]
set critical_cells {}
foreach_in_collection pt [get_attribute $worst_path points] {
    set obj [get_attribute $pt object]
    ## Ports have no parent cell; keep only pins
    if {[get_attribute $obj object_class] eq "pin"} {
        set critical_cells [add_to_collection -unique $critical_cells [get_cells -of_objects $obj]]
    }
}

## Check current Vt of cells (collections need foreach_in_collection)
foreach_in_collection cell $critical_cells {
    puts "[get_object_name $cell]: [get_attribute $cell ref_name]"
}

## Swap specific cell from SVT to LVT
## Library naming (library-specific): BUFX4 -> BUFX4_LVT
size_cell [get_cells u_core/U_crit] BUFX4_LVT
size_cell [get_cells u_core/U_and2] AND2X4_LVT

## After LVT swap, verify leakage impact
report_power -cell [get_cells {u_core/U_crit u_core/U_and2}]

## Check hold: LVT is faster, so hold slack decreases
report_timing -delay_type min -through [get_cells u_core/U_crit] -nworst 5

5. Buffer removal on the critical path

Paradoxically, removing buffers can improve setup timing. Buffers added for fanout control or signal integrity add delay to the path. If the fanout situation has changed (for example, after placement optimisation) or the buffer was inserted on an overly pessimistic estimate, removing it saves 20–50ps per stage.

Look for buffers in the timing report where: (1) the buffer’s fanout is 1 (only one receiver), or (2) the net capacitance after the buffer is small (the buffer wasn’t needed), or (3) the buffer was inserted during CTS but doesn’t carry a clock signal.
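The first two criteria are simple to screen for once fanout and load are known per buffer. The sketch below is plain Tcl over a hand-written list. The buffer names, the loads, and the 5 fF "small load" threshold are hypothetical, and the third criterion (CTS-inserted buffers on non-clock nets) is not modelled.

```tcl
# Each entry: {buffer_name fanout load_cap_ff}; all values are made up.
set buffers {
    {u_core/buf_a 1 12}
    {u_core/buf_b 6 40}
    {u_core/buf_c 3 3}
    {u_core/buf_d 1 2}
}

# Flag a buffer when it drives a single receiver (criterion 1)
# or when the net it drives is lightly loaded (criterion 2).
proc removal_candidates {buffers {small_cap_ff 5}} {
    set out {}
    foreach b $buffers {
        lassign $b name fanout cap
        if {$fanout == 1 || $cap < $small_cap_ff} {
            lappend out $name
        }
    }
    return $out
}

puts "candidates: [removal_candidates $buffers]"
```

The result is only a shortlist. Each removal would still need a check that transition and capacitance limits hold on the merged net.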

6. Logic restructuring

When a path has too many logic levels (gates in series), reducing the gate count through restructuring is more powerful than any ECO fix. Common techniques:

Logic flattening — re-synthesise a logic cone without hierarchy constraints; the synthesiser may find a more parallel implementation
Retiming — move registers across combinational logic to balance path lengths (done in synthesis with compile_ultra -retime)
Carry-lookahead for adders — replace a ripple-carry adder with a parallel prefix adder if adder paths are critical
Factor extraction — re-express logic in a form that allows more sharing, reducing depth

Logic restructuring in Design Compiler

## In Design Compiler: identify cells with highest fanin contributing to violations
report_timing -path_type full -nworst 10

## For a specific block, re-synthesise with restructuring enabled
compile_ultra -retime ; # retime registers across paths
compile_ultra -incremental ; # incremental optimisation only

## Target a specific path for restructuring
group_path -name critical_adder \
    -from [get_cells u_alu/add_in_reg/*] \
    -to [get_cells u_alu/result_reg/*]
## Re-optimise timing with the new path group
## (a design-rule-only compile would not touch timing)
compile_ultra -incremental

## In Genus: raise optimisation effort, then re-optimise
## (attribute name and values depend on the Genus version; treat as a sketch)
set_db syn_opt_effort high
syn_opt
report_timing ; # check if restructuring helped

7. Pipeline register insertion

If a path still has too many logic levels after cell sizing, LVT swapping, and placement optimisation, the remaining fix is to add a pipeline register that splits it into two stages. Each stage still has a full clock period to meet, but carries only about half the logic depth when the register sits near the midpoint.

The cost is one additional clock cycle of latency on that computation. This must be architecturally acceptable: downstream logic must be updated to account for the extra cycle. Once the register exists, retiming in synthesis can move it across the combinational logic to balance the two stages automatically.
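How many stages a path needs can be estimated before touching RTL. The sketch below is plain Tcl. The proc name `stages_needed` and the numbers are hypothetical, and it assumes each stage loses a fixed register overhead (clk-to-Q plus setup plus uncertainty) and that the logic can be cut into roughly equal pieces.

```tcl
# Number of pipeline stages needed so that each stage fits in one period.
# Usable time per stage = period - register overhead.
proc stages_needed {logic_ps period_ps overhead_ps} {
    set usable [expr {$period_ps - $overhead_ps}]
    if {$usable <= 0} { error "overhead leaves no time for logic" }
    # Integer ceiling of logic_ps / usable
    return [expr {($logic_ps + $usable - 1) / $usable}]
}

# Hypothetical example: 1.0 ns clock, 150 ps of register overhead
foreach logic_ps {1600 2400} {
    set n [stages_needed $logic_ps 1000 150]
    puts "$logic_ps ps of logic -> $n stages, [expr {$n - 1}] extra cycle(s) of latency"
}
```

With 850 ps usable per stage, 1600 ps of logic fits in two stages, while 2400 ps needs three. This is a reminder that a single inserted register is not always enough.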

Pipeline insertion (Tcl ECO + synthesis)

## Identify a long path candidate for pipelining
## Slack: -0.950 ns, path delay: 1.950 ns, clock period: 1.0 ns
## -> path needs to be split into 2 stages of ~0.975 ns each
##    (before clk-to-Q and setup overhead of the new register)

## In RTL (Verilog): insert a pipeline register
## Before: result = A + B + C + D (long combinational chain)
## After:
##   always @(posedge clk) pipe_stage <= A + B;           // Stage 1
##   always @(posedge clk) result <= pipe_stage + C + D;  // Stage 2
## (C and D must also be delayed one cycle so they line up with pipe_stage)

## In synthesis (post-RTL fix): use retiming to balance the two stages
set_optimize_registers true -design [get_designs ADDER]
compile_ultra -retime

## Verify after retiming
report_timing -nworst 5 -group adder_path
all_registers -level_sensitive ; # should return nothing: no latches created

8. Useful skew for setup

Useful skew intentionally introduces a clock arrival time difference between launch and capture flip-flops to benefit timing. If the capture FF’s clock arrives slightly later than the launch FF’s clock, the required time effectively increases, giving data more margin.

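Useful skew is applied during CTS by adjusting clock buffer delays to individual clock sinks. The maximum useful skew is limited by the hold requirement on the same path: a later capture clock shrinks that path's hold margin by the same amount, so too much skew creates a hold violation. It also takes setup margin from any path launched by the skewed flip-flop, because that path's launch clock now arrives later too.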
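The bound can be written as simple arithmetic: the skew applied is the amount needed to close setup, capped by the hold slack on the same path and by the setup slack of the path launched from the capture flip-flop. The sketch below is plain Tcl with a hypothetical proc name and made-up slack values, all in picoseconds.

```tcl
# setup_slack      : setup slack of the violating path (negative = violation)
# hold_slack       : hold slack of the same path
# next_setup_slack : setup slack of the path launched by the capture FF
proc apply_useful_skew {setup_slack hold_slack next_setup_slack} {
    set needed [expr {max(0, -$setup_slack)}]
    set limit  [expr {max(0, min($hold_slack, $next_setup_slack))}]
    set skew   [expr {min($needed, $limit)}]
    # Delaying the capture clock by 'skew' moves all three slacks
    return [dict create \
        skew       $skew \
        setup      [expr {$setup_slack + $skew}] \
        hold       [expr {$hold_slack - $skew}] \
        next_setup [expr {$next_setup_slack - $skew}]]
}

# Hypothetical path: 60 ps setup violation, 90 ps hold slack,
# 75 ps setup slack on the downstream path
set r [apply_useful_skew -60 90 75]
puts "skew [dict get $r skew] ps -> setup [dict get $r setup], hold [dict get $r hold], next setup [dict get $r next_setup]"
```

In this example 60 ps of skew closes the setup violation and leaves 30 ps of hold slack and 15 ps on the downstream path. If the downstream path had only 40 ps of slack, the skew would be capped at 40 ps and the rest of the violation would need another technique.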

Useful skew constraints for CTS (ICC2 / Innovus)

## In ICC2: apply useful skew target for specific flip-flops
## Pre-CTS model: delay capture FF clock by 100 ps to gain 100 ps setup margin
## (network latency is set on the clock pin, not on the cell)
set_clock_latency 0.100 [get_pins u_core/ff_capture/CK]

## Or give CTS a per-sink balance-point offset for the capture sink
# (Tool will adjust buffering to hit this offset)
# Sign convention: the value is treated as delay beyond the pin, so a negative
# value requests a later clock arrival; confirm against your tool version
set_clock_balance_points -object_list [get_pins u_core/ff_capture/CK] -delay -0.100

## After CTS: verify skew and check hold
report_clock_timing -type skew -clock clk_core -nworst 10
report_timing -delay_type min -nworst 10 ; # check hold on this path!

Every setup improvement can create a hold violation

Every technique that reduces the data arrival time (faster data) or increases the required time (later capture clock) also reduces hold margin on the same path. After any setup fix, always run report_timing -delay_type min on the affected paths. Cell upsizing, LVT swap, and useful skew are the three most common inadvertent creators of hold violations during setup closure.

9. Physical optimisation

After gate-level fixes, placement and routing can also help:

Place critical cells closer — shorter wire = lower wire RC = less wire delay; 30–100ps possible
Remove routing detours — reroute congested nets to take a shorter path
Net widening — wider wires have lower resistance; helps long nets on critical paths
Layer upgrade — upper metal layers have lower resistance; route critical nets on higher metal

Day 12 Key Takeaways

Timing closure is iterative: run STA → analyse WNS/TNS → fix worst category → verify → repeat
Cell upsizing — fastest ECO fix; increases drive strength; use for the worst few cells on critical paths
LVT swap — 15–30% delay reduction per cell; 5–10× leakage penalty; use sparingly on most critical cells
Buffer removal — paradoxically reduces delay on over-buffered critical paths; check fanout first
Useful skew — delays capture clock arrival to give data more time; limited by hold margin on same path
Pipeline insertion — architectural fix for fundamentally long paths; costs one cycle of latency
Every setup fix risks hold: always check hold timing after applying any setup optimisation

Frequently Asked Questions

What is the most effective first step for fixing setup violations?

The most effective immediate step is cell upsizing on the critical path. A stronger driver reduces gate delay and output slew, improving multiple downstream cells. It doesn’t change logic, doesn’t affect functionality, and is fast to implement in an ECO flow. Follow it by checking hold violations on the same path, as upsizing speeds up the cell and can reduce hold margin.

What is an LVT cell swap for setup fixing?

LVT (Low Threshold Voltage) cells switch faster than SVT cells because transistors turn on at a lower gate voltage. Swapping a critical path cell from SVT to LVT can reduce its delay by 15–30%. The trade-off is 5–10× higher leakage current on that cell, increasing static power. LVT should be used only on the most critical cells where other techniques have been exhausted.

When is pipeline register insertion used for setup?

Pipeline insertion is used when a path is fundamentally too long: even after upsizing, LVT swapping, and placement optimisation, the path delay exceeds one clock period. Inserting a flip-flop in the middle of the path splits it into two shorter stages. Each stage still gets a full clock period but carries only about half the logic. The cost is one additional clock cycle of latency, which must be architecturally acceptable.

What is useful skew for setup violation fixing?

Useful skew delays the capture flip-flop’s clock arrival relative to the launch flip-flop, effectively increasing the required time for that path. By arriving later, the capture window opens later, giving the data more time to propagate. The CTS tool can implement this by adjusting local buffer delays. The maximum amount of useful skew is bounded by the hold margin on the same path.

When should you accept a timing exception instead of fixing a setup violation?

A multicycle path exception is appropriate when the path is architecturally guaranteed to have multiple clock cycles before data is used. The justification must be explicit: which handshake, enable signal, or protocol ensures the multi-cycle property? Never apply exceptions to avoid fixing real single-cycle violations — that causes silicon failures. Document every exception with the architectural proof of why it is safe.