DAY 12 · TIMING CLOSURE & SIGN-OFF

Fixing Setup Violations

By EcrioniX · Updated June 2026

Setup violations mean your critical paths are too slow for the target clock frequency. Fixing them is the central activity of timing closure — an iterative, systematic process that demands understanding which technique to apply to which type of violation. The goal is to reduce path delay until slack ≥ 0 at every corner, while minimising the collateral damage to power, area, and hold timing.

1. The timing closure loop

Setup closure is never done in one pass. It is an iterative loop that converges as violations are progressively fixed:

Figure: Iterative timing closure loop. Run STA (report_timing) → analyse WNS, TNS and paths → apply fix (ECO / re-synth) → verify: WNS ≥ 0? If not clean, iterate; otherwise done.

Each iteration should address a specific category of violations rather than random individual paths. Fixing one path often exposes the next-worst path as the new critical path, so targeting the worst group of related paths in each iteration converges faster than fixing paths one by one.
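The bookkeeping behind each pass can be sketched in Python (the Tcl in this lesson only runs inside the STA tool). The endpoint names, path-group names, and slack values below are hypothetical. The sketch computes WNS (worst negative slack) and TNS (total negative slack) and picks the path group with the largest total violation as the next target.

```python
from collections import defaultdict

# Hypothetical setup results: (endpoint, path group, slack in ns).
# In a real flow these would be exported from the STA tool's timing reports.
endpoint_slacks = [
    ("u_alu/result_reg[0]/D", "alu_to_alu", -0.120),
    ("u_alu/result_reg[1]/D", "alu_to_alu", -0.095),
    ("u_mem/rdata_reg[3]/D", "mem_to_core", -0.030),
    ("u_core/ctrl_reg/D", "core_to_core", 0.045),
]


def wns_tns(slacks):
    """WNS is the most negative slack; TNS sums only the negative slacks."""
    wns = min(slacks)
    tns = sum(s for s in slacks if s < 0)
    return wns, tns


def worst_group(records):
    """Return the path group with the most negative total slack, or None if clean."""
    tns_by_group = defaultdict(float)
    for _endpoint, group, slack in records:
        if slack < 0:
            tns_by_group[group] += slack
    if not tns_by_group:
        return None
    return min(tns_by_group, key=tns_by_group.get)


wns, tns = wns_tns([slack for _, _, slack in endpoint_slacks])
print(f"WNS = {wns:.3f} ns, TNS = {tns:.3f} ns")
print("Next target group:", worst_group(endpoint_slacks))
```

After the worst group is fixed, rerun STA and recompute both numbers. WNS and TNS can move independently, which is why the loop tracks both.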

2. Fix techniques compared by cost and impact

| Technique | Delay reduction | Area cost | Power cost | When to use |
|---|---|---|---|---|
| Buffer removal on critical path | 20–50ps per buffer | Saves area | Saves power | When excessive buffering (for fanout/SI) is on the critical path |
| Cell upsizing | 30–100ps per cell | +10–30% | +10–30% | First response for setup violations on critical cells |
| LVT cell swap (SVT→LVT) | 50–150ps | Same | +300–1000% leakage | Last resort for critical cells when upsizing is insufficient |
| Useful skew | 20–80ps effective | CTS change | Negligible | When timing budget can be shifted from one path pair to another |
| Logic restructuring | Path-specific | Variable | Variable | When logic depth is the fundamental limit (re-synthesis) |
| Placement optimisation | 30–200ps (wire) | No change | No change | When wire delay dominates the critical path |
| Pipeline insertion | Full path split | +FF area | +FF power | When path fundamentally exceeds one clock cycle even after all other fixes |

3. Cell upsizing

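Cell upsizing replaces a cell with a larger drive-strength version of the same function. A stronger driver has lower output resistance, so it charges its output capacitance faster. This reduces both the cell's delay and its output slew, and the sharper slew also speeds up the cells downstream. The cost is a larger input capacitance, which loads the previous stage and slows it down, so upsizing pays off most on cells that drive a large load.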
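A linear RC delay model makes this trade-off concrete. The sketch assumes that a cell's output resistance scales as 1/size and its input capacitance scales with size. The unit values are hypothetical and not taken from any library. It compares an X2 and an X8 version of the same cell, both driven by an X2 previous stage, under a heavy load and a light load.

```python
# Linear RC delay model for intuition only; all constants are hypothetical.
R_UNIT_KOHM = 4.0       # output resistance of an X1 cell
C_IN_UNIT_FF = 2.0      # input capacitance of an X1 cell
T_INTRINSIC_NS = 0.015  # parasitic delay per cell


def cell_delay_ns(size, c_load_ff):
    """Intrinsic delay plus R_drive * C_load (kOhm * fF = ps, scaled by 1e-3 to ns)."""
    r_drive_kohm = R_UNIT_KOHM / size
    return T_INTRINSIC_NS + r_drive_kohm * c_load_ff * 1e-3


def two_stage_delay_ns(prev_size, size, c_wire_ff, c_load_ff):
    """Delay of the previous stage (driving wire + this cell's input) and of this cell."""
    c_in_ff = C_IN_UNIT_FF * size  # larger cell = larger input pin load
    d_prev = cell_delay_ns(prev_size, c_wire_ff + c_in_ff)
    d_this = cell_delay_ns(size, c_load_ff)
    return d_prev, d_this


for c_load in (40.0, 4.0):  # heavy vs light load in fF
    for size in (2, 8):
        d_prev, d_this = two_stage_delay_ns(prev_size=2, size=size,
                                            c_wire_ff=5.0, c_load_ff=c_load)
        print(f"load={c_load:4.0f} fF  X{size}: prev={d_prev:.3f} ns  "
              f"cell={d_this:.3f} ns  total={d_prev + d_this:.3f} ns")
```

With the 40 fF load, X8 cuts the two-stage delay from 0.128 ns to 0.092 ns even though the previous stage slows down. With a 4 fF load, the extra input capacitance makes X8 slower overall. This is why upsizing is aimed at heavily loaded cells, and why the previous stage sometimes needs upsizing as well.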

Cell upsizing ECO in PrimeTime
```tcl
## Identify cells contributing most delay on worst path
report_timing -nworst 1 -path_type full

## Target the cell with highest Incr value on the critical path
## Example: AND2X2 has 0.09 ns delay; try AND2X8 (stronger)
size_cell [get_cells u_core/U_and1] AND2X8

## Verify improvement
update_timing
report_timing -from [get_cells u_core/ff_A] -to [get_cells u_core/ff_B] -nworst 1

## Automated sizing: let the PrimeTime ECO engine fix setup by sizing cells
fix_eco_timing -type setup -methods {size_cell}

## Check for hold side effects from upsizing
## Upsizing = faster cell = shorter min delay = more hold risk
report_timing -delay_type min -nworst 10

## Size multiple cells at once (ECO approach): cell / new lib cell pairs
foreach {cell new_ref} {u_core/U1 BUFX8 u_core/U2 OR2X8 u_core/U4 AND3X4} {
    set cur [get_attribute [get_cells $cell] ref_name]
    puts "Sizing $cell from $cur to $new_ref"
    size_cell [get_cells $cell] $new_ref
}
```

4. LVT cell swapping

Low-Vt (LVT) cells are the same logic function as standard-Vt (SVT) but with a lower threshold voltage. Lower Vt means transistors turn on earlier, giving faster switching at the cost of dramatically higher subthreshold leakage.
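Two standard first-order models show why this trade is so lopsided: the alpha-power law for delay, and the exponential dependence of subthreshold leakage on Vt. The supply voltage, threshold voltages, alpha, and slope factor below are hypothetical round numbers, not values from any process.

```python
import math

# Hypothetical, process-independent parameters for illustration.
VDD = 0.80            # supply voltage (V)
ALPHA = 1.3           # alpha-power-law velocity-saturation index
N_SLOPE = 1.5         # subthreshold slope factor
V_THERMAL = 0.02585   # kT/q at 300 K (V)
VT_SVT, VT_LVT = 0.35, 0.28  # assumed threshold voltages (V)


def relative_delay(vt):
    """Alpha-power law: gate delay is proportional to VDD / (VDD - Vt)**alpha."""
    return VDD / (VDD - vt) ** ALPHA


def relative_leakage(vt):
    """Subthreshold leakage is proportional to exp(-Vt / (n * kT/q))."""
    return math.exp(-vt / (N_SLOPE * V_THERMAL))


delay_ratio = relative_delay(VT_LVT) / relative_delay(VT_SVT)
leak_ratio = relative_leakage(VT_LVT) / relative_leakage(VT_SVT)
print(f"LVT delay   = {delay_ratio:.2f} x SVT delay")
print(f"LVT leakage = {leak_ratio:.1f} x SVT leakage")
```

With these assumed numbers, a 70 mV lower Vt buys roughly a 17% delay reduction for about 6× the leakage. The delay gain is polynomial in the Vt change, while the leakage penalty is exponential.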

LVT cell swapping in post-synthesis ECO
```tcl
## Identify cells on timing-critical paths for LVT swap
## Typical: swap cells contributing >50% of path delay

## List cells on worst path: walk the path's timing points and keep the pins
set worst_path [get_timing_paths -delay_type max -nworst 1]
set path_pins {}
foreach_in_collection pt [get_attribute $worst_path points] {
    set obj [get_attribute $pt object]
    if {[get_attribute $obj object_class] eq "pin"} {
        append_to_collection path_pins $obj
    }
}
set critical_cells [get_cells -of_objects $path_pins]

## Check current Vt of cells (the Vt flavour is encoded in ref_name)
foreach_in_collection cell $critical_cells {
    puts "[get_object_name $cell]: [get_attribute $cell ref_name]"
}

## Swap specific cell from SVT to LVT
## Library naming: BUFX4 -> BUFX4_LVT
size_cell [get_cells u_core/U_crit] BUFX4_LVT
size_cell [get_cells u_core/U_and2] AND2X4_LVT

## After LVT swap, verify leakage impact
report_power -cell [get_cells {u_core/U_crit u_core/U_and2}]

## Check hold: LVT is faster, so hold slack decreases
report_timing -delay_type min -through [get_cells u_core/U_crit] -nworst 5
```

5. Buffer removal on the critical path

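Paradoxically, removing buffers can improve setup timing. Every buffer inserted for fanout control or signal integrity adds its own cell delay to the path. If the load has changed since the buffer was inserted (for example, placement optimisation moved the receivers closer to the driver), or if the buffer was never really needed, removing it saves 20–50ps per stage. The driving cell then sees the full downstream load, so confirm that it can still drive that load without a slew or max-capacitance violation.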

Look for buffers in the timing report where: (1) the buffer’s fanout is 1 (only one receiver), or (2) the net capacitance after the buffer is small (the buffer wasn’t needed), or (3) the buffer was inserted during CTS but doesn’t carry a clock signal. A fanout-1 buffer can still be a legitimate repeater on a long wire, so check the wire length before removing it.
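Whether a buffer earns its place can be estimated with a linear delay model by comparing driver-plus-buffer delay against the driver alone seeing the whole load. The resistances, capacitances, and intrinsic delays below are hypothetical. Wire resistance is ignored (lumped capacitance), so this sketch cannot capture the long-wire repeater case.

```python
# Lumped-capacitance, linear RC delay model; all constants are hypothetical.
DRIVER_R_KOHM, DRIVER_T0_NS = 4.0, 0.015   # weak driving cell
BUF_R_KOHM, BUF_T0_NS = 0.5, 0.020         # strong buffer
BUF_C_IN_FF = 3.0                          # buffer input pin capacitance


def delay_ns(t0_ns, r_kohm, c_load_ff):
    """Intrinsic delay + R * C (kOhm * fF = ps, scaled by 1e-3 to ns)."""
    return t0_ns + r_kohm * c_load_ff * 1e-3


def with_buffer_ns(c_wire_before_ff, c_wire_after_ff, c_sinks_ff):
    d_driver = delay_ns(DRIVER_T0_NS, DRIVER_R_KOHM, c_wire_before_ff + BUF_C_IN_FF)
    d_buffer = delay_ns(BUF_T0_NS, BUF_R_KOHM, c_wire_after_ff + c_sinks_ff)
    return d_driver + d_buffer


def without_buffer_ns(c_wire_before_ff, c_wire_after_ff, c_sinks_ff):
    total_c = c_wire_before_ff + c_wire_after_ff + c_sinks_ff
    return delay_ns(DRIVER_T0_NS, DRIVER_R_KOHM, total_c)


cases = {
    "single receiver, short net": (2.0, 3.0, 2.0),
    "high fanout, long net": (5.0, 30.0, 60.0),
}
for name, caps in cases.items():
    kept, removed = with_buffer_ns(*caps), without_buffer_ns(*caps)
    verdict = "remove buffer" if removed < kept else "keep buffer"
    print(f"{name}: with={kept:.4f} ns, without={removed:.4f} ns -> {verdict}")
```

On the short single-receiver net, the buffer's own delay outweighs what it saves. On the high-fanout net, the weak driver alone would be far slower, so the buffer stays.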

6. Logic restructuring

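When a path has too many logic levels (gates in series), reducing the level count through restructuring is more powerful than any gate-level ECO fix. In practice this means re-synthesising the affected logic, for example with retiming, incremental optimisation, or a dedicated path group that focuses the optimiser on the critical path: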

Logic restructuring in Design Compiler
```tcl
## In Design Compiler: identify cells with highest fanin contributing to violations
report_timing -path_type full -nworst 10

## For a specific block, re-synthesise with restructuring enabled
compile_ultra -retime ; # retime registers across paths
compile_ultra -incremental ; # incremental optimisation only

## Target a specific path for restructuring
group_path -name critical_adder \
    -from [get_cells u_alu/add_in_reg/*] \
    -to [get_cells u_alu/result_reg/*]
compile_ultra -incremental ; # not -only_design_rule, which fixes DRCs only, not timing

## In Genus: raise optimisation effort and re-run optimisation
set_db syn_opt_effort high
syn_opt
report_timing ; # check if restructuring helped
```
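The leverage of restructuring comes from logic depth. As a rough illustration, the sketch counts only 2-input operators and ignores the fact that each adder is itself several gate levels deep. It compares the levels needed to combine n operands as a left-to-right chain, as in A + B + C + D, against a balanced tree, as in (A + B) + (C + D). Rebalancing of this kind is one transformation a re-synthesis run can apply to associative logic.

```python
def chain_depth(n_operands):
    """Levels of 2-input operators when operands are combined left to right."""
    return n_operands - 1


def tree_depth(n_operands):
    """Levels of 2-input operators in a balanced reduction tree: ceil(log2(n))."""
    depth = 0
    while n_operands > 1:
        n_operands = (n_operands + 1) // 2  # pair up operands; an odd one passes through
        depth += 1
    return depth


for n in (4, 8, 16):
    print(f"{n} operands: chain = {chain_depth(n)} levels, "
          f"balanced tree = {tree_depth(n)} levels")
```

The gap grows quickly: 16 operands need 15 chained levels but only 4 in a balanced tree, a saving that no amount of cell sizing can match.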

7. Pipeline register insertion

If a path fundamentally has too many logic levels and still fails even after cell sizing, LVT swapping, and placement optimisation, the only remaining fix is to add a pipeline register that splits it into two stages. If the split is placed near the middle, each stage has roughly half the original logic depth.

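The cost is one additional clock cycle of latency on that computation. This must be architecturally acceptable: downstream logic must be updated to account for the extra cycle. Once the register is in the RTL, retiming in synthesis can move it across the combinational logic to balance the stage delays automatically. Retiming itself does not add latency; it only repositions existing registers.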
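Choosing where to insert the register is a balancing problem: each stage must fit within the clock period once the register's own clock-to-Q and setup overhead are added. The gate delays, overhead, and period in this sketch are hypothetical. It tries every cut point along a single path and keeps the one that minimises the slower stage.

```python
# Hypothetical per-gate delays (ns) along one combinational path.
gate_delays = [0.12, 0.18, 0.10, 0.15, 0.20, 0.11, 0.14, 0.13, 0.12]
REG_OVERHEAD_NS = 0.08   # clock-to-Q of launching FF + setup of capturing FF (assumed)
CLOCK_PERIOD_NS = 1.0


def best_split(delays, overhead):
    """Cut before gate k: gates [0, k) form stage 1, gates [k, n) form stage 2.
    Returns (k, worst_stage_ns, stage1_ns, stage2_ns) minimising the worst stage."""
    best = None
    for k in range(1, len(delays)):
        stage1 = sum(delays[:k]) + overhead
        stage2 = sum(delays[k:]) + overhead
        worst = max(stage1, stage2)
        if best is None or worst < best[1]:
            best = (k, worst, stage1, stage2)
    return best


unpipelined = sum(gate_delays) + REG_OVERHEAD_NS
k, worst, s1, s2 = best_split(gate_delays, REG_OVERHEAD_NS)
print(f"Unpipelined: {unpipelined:.2f} ns vs period {CLOCK_PERIOD_NS} ns")
print(f"Cut before gate {k}: stage 1 = {s1:.2f} ns, stage 2 = {s2:.2f} ns, "
      f"worst slack = {CLOCK_PERIOD_NS - worst:.2f} ns")
```

The unpipelined path needs about 1.33 ns. The best cut gives stages of 0.63 ns and 0.78 ns, both inside the 1.0 ns period. Synthesis retiming performs a more general version of this balancing across the whole netlist.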

Pipeline insertion (Tcl ECO + synthesis)
```tcl
## Identify a long path candidate for pipelining
## Slack: -0.250 ns, path delay: 1.250 ns, clock period: 1.0 ns (setup/uncertainty ignored)
## -> path needs to be split into 2 stages of ~0.625 ns each

## In RTL (Verilog): insert a pipeline register
##   Before: result = A + B + C + D   (long combinational chain)
##   After (register both partial sums so the operands stay cycle-aligned):
##     always @(posedge clk) begin
##       sum_ab <= A + B;            // Stage 1
##       sum_cd <= C + D;            // Stage 1
##       result <= sum_ab + sum_cd;  // Stage 2
##     end

## In synthesis (post-RTL fix): use retiming to balance the two stages
set_optimize_registers true -design [get_designs ADDER]
compile_ultra -retime -sequential_area_recovery

## Verify after retiming
report_timing -nworst 5 -group adder_path
report_register -level_sensitive ; # ensure no latches created
```

8. Useful skew for setup

Useful skew intentionally introduces a clock arrival time difference between launch and capture flip-flops to benefit timing. If the capture FF’s clock arrives slightly later than the launch FF’s clock, the required time effectively increases, giving data more margin.

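Useful skew is applied during CTS by adjusting clock buffer delays to individual clock sinks. Two checks limit the maximum useful skew. The first is the hold requirement on the same path: delaying the capture clock reduces hold slack by the same amount. The second is the setup slack of the next path launched by the capture flip-flop, because that launch is delayed too. Useful skew therefore borrows time from the neighbouring stage rather than creating it.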
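These bounds follow directly from the basic setup and hold equations. In the sketch below, all timing values are hypothetical and clock uncertainty is ignored. The flip-flop shared by an upstream path (where it captures) and a downstream path (where it launches) has its clock delayed by s. The upstream path gains s of setup slack and loses s of hold slack, while the downstream path loses s of setup slack.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    t_cq: float     # launch FF clock-to-Q (ns)
    d_max: float    # max combinational delay (ns)
    d_min: float    # min combinational delay (ns)
    t_setup: float  # capture FF setup time (ns)
    t_hold: float   # capture FF hold time (ns)


def setup_slack(p, period, skew):
    """skew = capture clock arrival minus launch clock arrival (ns)."""
    return period + skew - (p.t_cq + p.d_max + p.t_setup)


def hold_slack(p, skew):
    return p.t_cq + p.d_min - p.t_hold - skew


def useful_skew_window(upstream, downstream, period):
    """Range [s_min, s_max] of extra clock delay on the shared FF that keeps
    upstream setup, upstream hold, and downstream setup all >= 0."""
    s_min = max(0.0, -setup_slack(upstream, period, 0.0))
    s_max = min(hold_slack(upstream, 0.0), setup_slack(downstream, period, 0.0))
    return s_min, s_max


period = 1.0
upstream = PathTiming(t_cq=0.05, d_max=1.00, d_min=0.20, t_setup=0.03, t_hold=0.02)
downstream = PathTiming(t_cq=0.05, d_max=0.70, d_min=0.10, t_setup=0.03, t_hold=0.02)

s_min, s_max = useful_skew_window(upstream, downstream, period)
if s_min <= s_max:
    print(f"Delay the shared FF clock by {s_min:.2f} to {s_max:.2f} ns")
else:
    print("No useful-skew solution: fix the logic instead")
```

Here the upstream path is 80 ps short, and the window is 0.08 to 0.22 ns. The upper bound is set by the downstream setup slack, which is slightly tighter than the upstream hold slack.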

Useful skew constraints for CTS (ICC2 / Innovus)
```tcl
## In ICC2: apply useful skew target for specific flip-flops
## Delay capture FF clock by 100 ps to gain 100 ps setup margin
## (ideal latency on this sink: 0.650 ns vs 0.550 ns nominal for the domain)
set_clock_latency 0.650 [get_pins u_core/ff_capture/CK]

## Or set target insertion delay for the capture sink
## (Tool will adjust buffer sizing to hit this target)
set_clock_balance_point -delay 0.650 [get_pins u_core/ff_capture/CK]
## vs nominal insertion delay of 0.550 for other FFs in this domain

## After CTS: verify skew and check hold
report_clock_timing -type skew -clock clk_core -nworst 10
report_timing -delay_type min -nworst 10 ; # check hold on this path!
```

Every setup improvement can create a hold violation

Every technique that reduces the data arrival time (faster data) or increases the required time (later capture clock) also reduces hold margin on the same path. After any setup fix, always run report_timing -delay_type min on the affected paths. Cell upsizing, LVT swap, and useful skew are the three most common inadvertent creators of hold violations during setup closure.
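A post-ECO hold regression check can be as simple as comparing min-delay slacks before and after the change. The endpoint names, slacks, and the 40 ps speed-up below are hypothetical. The sketch assumes the speed-up reduces hold slack one-for-one on every min path through the modified cell, and it flags only endpoints that the ECO newly broke.

```python
# Hypothetical hold slacks (ns) before the ECO, e.g. from report_timing -delay_type min.
hold_before = {
    "u_core/ff_B/D": 0.025,
    "u_core/ff_C/D": 0.080,
    "u_core/ff_D/D": -0.010,   # pre-existing violation, not caused by this ECO
}
# Endpoints whose min path passes through the upsized / LVT-swapped cell (assumed known).
affected = {"u_core/ff_B/D", "u_core/ff_C/D"}
SPEEDUP_NS = 0.040  # assumed reduction in the cell's min delay


def apply_speedup(slacks, affected_endpoints, speedup):
    """Faster data arrival reduces hold slack one-for-one on affected paths."""
    return {ep: s - speedup if ep in affected_endpoints else s
            for ep, s in slacks.items()}


def new_hold_violations(before, after):
    """Endpoints that passed hold before the ECO and fail after it."""
    return sorted(ep for ep, s in after.items() if s < 0 <= before[ep])


hold_after = apply_speedup(hold_before, affected, SPEEDUP_NS)
print("New hold violations:", new_hold_violations(hold_before, hold_after))
```

Separating new violations from pre-existing ones makes it clear which setup fix needs to be revisited or compensated with hold buffering.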

9. Physical optimisation

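After gate-level fixes, placement and routing changes can also recover timing, particularly when wire delay dominates the critical path. Pulling the cells of a critical path closer together shortens the nets between them and reduces their RC delay.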
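Wire delay responds strongly to placement because the Elmore time constant of a distributed RC wire grows with the square of its length: both its resistance and its capacitance scale with length. The sketch below uses hypothetical per-micron values and no driver or load, purely to show the scaling.

```python
def wire_elmore_ps(length_um, r_ohm_per_um, c_ff_per_um, r_drive_ohm=0.0, c_load_ff=0.0):
    """Elmore time constant of a driver + uniform RC wire + lumped load.
    ohm * fF = 1 fs, so the result is scaled by 1e-3 to give ps."""
    r_wire = r_ohm_per_um * length_um
    c_wire = c_ff_per_um * length_um
    tau_fs = (r_drive_ohm * (c_wire + c_load_ff)   # driver charges wire + load
              + 0.5 * r_wire * c_wire              # distributed wire term
              + r_wire * c_load_ff)                # wire resistance into the load
    return tau_fs * 1e-3


R_PER_UM, C_PER_UM = 2.0, 0.2   # hypothetical metal parameters
for length in (1000.0, 500.0):
    print(f"{length:.0f} um: {wire_elmore_ps(length, R_PER_UM, C_PER_UM):.1f} ps")
```

Halving the wire from 1000 µm to 500 µm cuts its term from 200 ps to 50 ps. This is why a placement move that shortens a long net can recover more time than resizing the cells on either end.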

Day 12 Key Takeaways

Frequently Asked Questions

What is the most effective first step for fixing setup violations?

The most effective immediate step is cell upsizing on the critical path. A stronger driver reduces gate delay and output slew, improving multiple downstream cells. It doesn’t change logic, doesn’t affect functionality, and is fast to implement in an ECO flow. Follow it by checking hold violations on the same path, as upsizing speeds up the cell and can reduce hold margin.

What is an LVT cell swap for setup fixing?

LVT (Low Threshold Voltage) cells switch faster than SVT cells because transistors turn on at a lower gate voltage. Swapping a critical path cell from SVT to LVT can reduce its delay by 15–30%. The trade-off is 5–10× higher leakage current on that cell, increasing static power. LVT should be used only on the most critical cells where other techniques have been exhausted.

When is pipeline register insertion used for setup?

Pipeline insertion is used when a path is fundamentally too long: even after upsizing, LVT swapping, and placement optimisation, the path delay exceeds one clock period. Inserting a flip-flop in the middle of the path splits it into two shorter stages. Each stage carries roughly half of the original logic delay and must fit within one clock period. The cost is one additional clock cycle of latency, which must be architecturally acceptable.

What is useful skew for setup violation fixing?

Useful skew delays the capture flip-flop’s clock arrival relative to the launch flip-flop, effectively increasing the required time for that path. Because the capture edge arrives later, the data has more time to propagate. The CTS tool can implement this by adjusting local buffer delays. The amount of useful skew is bounded by the hold margin on the same path and by the setup slack of the next path launched from the capture flip-flop, whose launch is delayed by the same amount.

When should you accept a timing exception instead of fixing a setup violation?

A multicycle path exception is appropriate when the path is architecturally guaranteed to have multiple clock cycles before data is used. The justification must be explicit: which handshake, enable signal, or protocol ensures the multi-cycle property? Never apply exceptions to avoid fixing real single-cycle violations — that causes silicon failures. Document every exception with the architectural proof of why it is safe.
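When a multicycle exception is justified, the hold side must be handled as well. In SDC, `set_multicycle_path N -setup` moves the setup check to the N-th capture edge. The hold check moves with it, to one cycle earlier, unless a companion `set_multicycle_path M -hold` pulls it back; the usual pairing is M = N - 1. The sketch below assumes a single clock with the same period at both ends and computes where each check lands.

```python
def multicycle_check_edges(period_ns, setup_n=1, hold_m=0):
    """Capture-edge times (ns, launch edge at 0) used by the setup and hold checks
    for `set_multicycle_path setup_n -setup` plus `set_multicycle_path hold_m -hold`,
    assuming launch and capture share one clock with the same period."""
    setup_edge = setup_n * period_ns
    hold_edge = (setup_n - 1 - hold_m) * period_ns  # one cycle before setup, minus hold_m
    return setup_edge, hold_edge


T = 1.0
print("default          :", multicycle_check_edges(T))
print("-setup 2 only    :", multicycle_check_edges(T, setup_n=2))
print("-setup 2, -hold 1:", multicycle_check_edges(T, setup_n=2, hold_m=1))
```

With only the setup multicycle applied, the hold check lands a full cycle after launch, which demands an unrealistically long minimum delay. Adding the `-hold 1` companion returns the hold check to the launch edge.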
