What is timing closure in VLSI?

Timing closure is the process of iteratively fixing setup and hold timing violations in a chip design until all timing paths meet their constraints at every signoff corner. It combines placement optimization, gate sizing, buffering, routing adjustments, and RTL/logic restructuring, guided by the reports from the static timing analysis (STA) tool.

How do you fix a setup violation?

Setup violations mean the data path is too slow for the clock period. Fix strategies in order of preference: (1) Logic restructuring: reduce the levels of logic on the critical path. (2) Gate sizing: upsize gates on the critical path so they drive their loads faster. (3) Buffer/inverter insertion: split long or heavily loaded nets to cut RC delay. (4) Retiming: move registers across logic boundaries to balance delays between stages. (5) Relax the clock period (lower the frequency) as a last resort.

How do you fix a hold violation?

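Hold violations mean data arrives at the capture flop too soon: new data reaches it before the hold window after the capture clock edge has closed. The main fix is inserting delay (buffers or delay cells) on the short path; downsizing cells or swapping in slower (higher-Vt) cells also works. Unlike setup, hold does NOT depend on clock frequency, so it cannot be fixed by slowing the clock. In multi-corner designs, hold is usually most critical at the best-case (fast) corner, but it must be met at every signoff corner.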

What is OCV in timing analysis?

On-Chip Variation (OCV) accounts for the fact that identical cells at different locations on a chip experience slightly different process/voltage/temperature conditions. In STA, OCV applies derate factors pessimistically: for a setup check the launch clock and data path are made slower and the capture clock faster; for a hold check it is the reverse. Common Path Pessimism Removal (CPPR, called clock reconvergence pessimism removal or CRPR in PrimeTime) removes the artificial pessimism introduced where the launch and capture clock paths share the same clock buffers, which cannot be slow and fast at the same time.

What is the difference between setup slack and hold slack?

Setup slack = data required time − data arrival time. Positive slack means timing is met; negative slack means a setup violation. Hold slack = data arrival time − data required time (for the hold check). Positive hold slack means the data stays stable long enough after the clock edge. A hold violation means the data changes too soon: the new value launched by a clock edge races to the capture flop and overwrites the value that same edge was supposed to capture.


Timing Closure: Fix Setup & Hold Violations

EcrioniX · STA Series · ~22 min read · Real-world techniques

[Figure: Setup violation, data arrives too late]

What Is Timing Closure?

Timing closure is the iterative process of eliminating all setup and hold violations in a chip design until every path meets its timing constraint at every signoff corner (process, voltage, temperature). It sits at the intersection of RTL, synthesis, and physical design — and is one of the most time-consuming phases of a real chip tape-out.

The STA tool (PrimeTime, Tempus, etc.) reports slack for every path:

Positive slack — path meets timing (margin to spare)
Zero slack — exactly meeting the constraint
Negative slack — timing violation; must be fixed before tapeout

Understanding the Slack Equations

Timing Equations

== SETUP CHECK ==
Data Arrival Time    = launch_clk_edge + Tclk2q + Tcomb + Tnet
Data Required Time   = capture_clk_edge + Tclk_skew - Tsetup
Tclk_skew            = capture clock latency - launch clock latency

Setup Slack = Required − Arrival     (must be ≥ 0)

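== HOLD CHECK ==  (capture_clk_edge is the same edge that launched the data)
Data Arrival Time    = launch_clk_edge + Tclk2q_min + Tcomb_min + Tnet_min
Data Required Time   = capture_clk_edge + Tclk_skew_max + Thold
Tclk_skew_max        = worst case for hold: capture clock late, launch clock early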

Hold Slack = Arrival − Required      (must be ≥ 0)

== EXAMPLE (setup violation) ==
Launch edge:   0ns
Tclk2q:        0.15ns
Tcomb:         1.80ns   ← too long — this is the bottleneck
Tnet:          0.12ns
Arrival:       2.07ns

Capture edge:  2.00ns   (500 MHz = 2ns period)
Tclk_skew:     0.00ns   (ideal clock assumed)
Tsetup:        0.05ns
Required:      1.95ns

Setup Slack = 1.95 − 2.07 = −0.12ns (120ps violation)
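The equations above can be checked with a few lines of Python. This is a minimal sketch that takes every term as an already-known number in nanoseconds (a real STA tool derives them from liberty tables, parasitics, and clock-tree timing). The setup call reproduces the example with zero skew assumed; the hold numbers are hypothetical values chosen only to show a failing hold check.

```python
def setup_slack(launch_edge, t_clk2q, t_comb, t_net,
                capture_edge, t_skew, t_setup):
    """Setup slack = required - arrival (all values in ns)."""
    arrival = launch_edge + t_clk2q + t_comb + t_net
    required = capture_edge + t_skew - t_setup
    return required - arrival


def hold_slack(launch_edge, t_clk2q_min, t_comb_min, t_net_min,
               capture_edge, t_skew, t_hold):
    """Hold slack = arrival - required (all values in ns).

    For a single-cycle path the hold check uses the same clock edge for
    launch and capture, so the clock period never appears here.
    """
    arrival = launch_edge + t_clk2q_min + t_comb_min + t_net_min
    required = capture_edge + t_skew + t_hold
    return arrival - required


# Setup example from the text: 500 MHz clock, ideal clock (zero skew) assumed
s = setup_slack(0.0, 0.15, 1.80, 0.12, capture_edge=2.00, t_skew=0.0, t_setup=0.05)
print(f"setup slack = {s:+.2f} ns")   # setup slack = -0.12 ns

# Hypothetical hold case: very little logic and 0.15 ns of capture-clock skew
h = hold_slack(0.0, 0.08, 0.02, 0.01, capture_edge=0.0, t_skew=0.15, t_hold=0.03)
print(f"hold slack  = {h:+.2f} ns")   # hold slack  = -0.07 ns
```

Changing the capture edge (the clock period) moves the setup result, but the hold function has no period argument at all, which is why hold cannot be fixed by slowing the clock.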

Fixing Setup Violations — Techniques in Order

① Logic Restructuring (Best: Little or No Area Cost)

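Reduce the number of logic levels on the critical path: restructure Boolean equations, share subexpressions, move late-arriving signals closer to the output, or, if that is not enough, pipeline the path (see ④). Synthesis tools already rebalance simple gate expressions, so hand restructuring pays off most for arithmetic, wide muxes, and late-arriving signals. This is the most efficient fix, but it requires RTL or synthesis changes.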

Verilog — Reduce logic levels

// BEFORE: serial chain, 4 levels of 2-input AND from a to out
assign out = (((a & b) & c) & d) & e;

// AFTER: balanced tree, same function, 3 levels
wire ab = a & b;   // level 1
wire cd = c & d;   // level 1, computed in parallel with ab
assign out = (ab & cd) & e;   // levels 2 and 3
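Why a balanced tree helps can be quantified with a simple count. The sketch below assumes only 2-input gates with equal delay (a deliberate simplification; real libraries offer wider gates and unequal pin delays) and compares the depth of a serial chain with a balanced tree for an n-input AND/OR reduction.

```python
def chain_depth(n_inputs: int) -> int:
    """Levels of 2-input gates when n inputs are combined one after another."""
    return max(n_inputs - 1, 0)


def tree_depth(n_inputs: int) -> int:
    """Levels of 2-input gates in a balanced reduction tree, ceil(log2(n)).

    (n - 1).bit_length() equals ceil(log2(n)) for n >= 2 without floating point.
    """
    return (n_inputs - 1).bit_length() if n_inputs > 1 else 0


for n in (5, 8, 32):
    print(f"{n:2d} inputs: chain = {chain_depth(n):2d} levels, tree = {tree_depth(n)} levels")
```

For 5 inputs the chain needs 4 levels and the tree 3; at 32 inputs the gap is 31 versus 5 levels.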

② Gate Sizing — Upsize Cells on Critical Path

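Replace a cell on the critical path with a higher-drive-strength version of the same function. A stronger driver charges its load capacitance faster, reducing the cell delay (Tpd) and the output transition seen by the next stage. The cost is more area and leakage plus higher input capacitance, which slows the upstream driver. Synthesis and place-and-route tools do this automatically, and it can also be applied manually via ECO.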
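The sizing trade-off can be illustrated with a first-order linear delay model, delay = intrinsic + R_drive × C_load. All cell names and values below are hypothetical (real numbers come from the liberty timing tables), and the model ignores input slew; it only shows why upsizing speeds up the resized stage while adding load to the stage that drives it.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    intrinsic_ps: float   # delay with zero load
    r_drive_kohm: float   # effective output resistance
    c_in_ff: float        # input pin capacitance


def stage_delay_ps(cell: Cell, c_load_ff: float) -> float:
    # kOhm * fF = ps, so the RC term is already in picoseconds
    return cell.intrinsic_ps + cell.r_drive_kohm * c_load_ff


def path_delay_ps(upstream: Cell, candidate: Cell, c_load_ff: float) -> float:
    """Delay of an upstream driver plus the cell being sized.

    The upstream cell sees the candidate's input capacitance, so a larger
    candidate slows the previous stage even as it speeds up its own.
    """
    return (stage_delay_ps(upstream, candidate.c_in_ff)
            + stage_delay_ps(candidate, c_load_ff))


# Hypothetical library values
drv = Cell("BUF_X1", intrinsic_ps=10.0, r_drive_kohm=2.0, c_in_ff=1.0)
x1 = Cell("NAND2_X1", intrinsic_ps=10.0, r_drive_kohm=2.0, c_in_ff=1.0)
x4 = Cell("NAND2_X4", intrinsic_ps=12.0, r_drive_kohm=0.5, c_in_ff=4.0)

for cell in (x1, x4):
    print(cell.name, path_delay_ps(drv, cell, c_load_ff=40.0), "ps")
```

With a 40 fF load the X4 version cuts this two-stage delay from 102 ps to 50 ps, even though the upstream stage itself slows from 12 ps to 18 ps.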

③ Buffer / Inverter Insertion

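Long nets have high capacitance and resistance, which slows the gate driving them, and distributed RC wire delay grows roughly with the square of the net length. Inserting repeater buffers along the net splits it into shorter segments so the delay grows roughly linearly instead; a buffer can also isolate a critical sink from a large non-critical fanout load. On clock nets this is done by CTS; on data nets it is done during placement and post-route optimization.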
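The benefit of repeaters follows from the Elmore delay of a distributed RC wire, roughly 0.5·r·c·L², which grows with the square of length. Splitting the wire into k equal segments divides that term by k at the cost of (k − 1) buffer delays. The sketch assumes illustrative per-mm wire values and a fixed buffer delay, and ignores driver resistance and buffer input capacitance, so it shows the trend rather than a real repeater plan.

```python
def wire_delay_ps(length_mm, r_kohm_per_mm, c_ff_per_mm, segments=1, t_buf_ps=0.0):
    """Elmore delay of a wire split into equal segments by (segments - 1) repeaters."""
    seg_len = length_mm / segments
    # Each distributed-RC segment contributes 0.5 * r * c * seg_len^2 (kOhm * fF = ps)
    seg_delay = 0.5 * r_kohm_per_mm * c_ff_per_mm * seg_len ** 2
    return segments * seg_delay + (segments - 1) * t_buf_ps


# Hypothetical 10 mm route: 0.2 kOhm/mm, 200 fF/mm, 30 ps per repeater
params = dict(length_mm=10.0, r_kohm_per_mm=0.2, c_ff_per_mm=200.0, t_buf_ps=30.0)

unbuffered = wire_delay_ps(**params)
# Try 1..20 segments and keep the fastest
best_k = min(range(1, 21), key=lambda k: wire_delay_ps(**params, segments=k))
print(f"unbuffered: {unbuffered:.0f} ps")
print(f"best: {best_k} segments -> {wire_delay_ps(**params, segments=best_k):.0f} ps")
```

Under these assumptions the unbuffered route takes 2000 ps, while 8 segments (7 repeaters) bring it to 460 ps; adding more repeaters beyond that costs more buffer delay than it saves in wire delay.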

④ Retiming — Move Registers Across Logic

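Push registers forward (toward the output) or backward (toward the input) across combinational logic to balance path delays between pipeline stages. Retiming preserves the functional behavior and cycle latency while redistributing delay. Adding a new register stage (pipelining), as in the example below, is a closely related fix that also splits a long path but adds a cycle of latency.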

Verilog — Pipeline to fix setup

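// BEFORE: 8-level combinational path, critical
// (complex_8_level_logic, first_4_levels, last_4_levels are placeholder functions)
always @(posedge clk)
  result <= complex_8_level_logic(a, b, c);

// AFTER: split into 2 pipeline stages of ~4 levels each
logic [7:0]          mid_stage;  // width is illustrative
logic [$bits(c)-1:0] c_d;        // c delayed one cycle to stay aligned with mid_stage
always @(posedge clk) begin
  mid_stage <= first_4_levels(a, b);
  c_d       <= c;
end
always @(posedge clk)
  result <= last_4_levels(mid_stage, c_d);
// Latency increases by 1 cycle; throughput unchanged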
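A pipelining change is easy to get subtly wrong: every input consumed by the second stage must be delayed by the same number of cycles as the first-stage result. The Python model below is a cycle-level sketch in which hypothetical stage functions, first_stage(a, b) = a + b and last_stage(m, c) = m * c, stand in for the Verilog placeholder functions. It checks that the pipelined output equals the original output one cycle later, and shows what goes wrong if c is not delayed.

```python
def first_stage(a, b):
    return a + b          # stand-in for first_4_levels


def last_stage(mid, c):
    return mid * c        # stand-in for last_4_levels


def run_original(stream):
    """One register stage: result <= last_stage(first_stage(a, b), c)."""
    return [last_stage(first_stage(a, b), c) for a, b, c in stream]


def run_pipelined(stream, delay_c=True):
    """Two register stages; all registers update together on each clock edge."""
    mid = c_d = result = None
    outputs = []
    for a, b, c in stream:
        # Next-state values are computed from the current register contents
        c_for_stage2 = c_d if delay_c else c
        next_result = None if mid is None else last_stage(mid, c_for_stage2)
        mid, c_d, result = first_stage(a, b), c, next_result
        outputs.append(result)
    return outputs


stream = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
print(run_original(stream))                  # [9, 54, 135]
print(run_pipelined(stream))                 # [None, 9, 54]
print(run_pipelined(stream, delay_c=False))  # [None, 18, 81]
```

With c delayed, the pipelined stream is the original stream shifted by one cycle; without it, the second stage combines a from one cycle with c from the next and produces wrong values.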

⑤ Multi-Cycle Path (MCP) Exception — If Data Rate is Slower

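If the data only needs to be valid every N clock cycles (e.g., the source register is updated only every second cycle through a divide-by-2 enable), tell the STA tool to relax the constraint. This is not a physical fix; it is a specification correction, and it is only safe when the design logic actually guarantees the slower data rate.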

SDC — Multi-cycle path

# Data valid every 2 clocks — relax setup by 1 extra period
set_multicycle_path -setup 2 -from [get_cells u_div/q_reg] \
                              -to   [get_cells u_proc/data_reg]

# Move the hold check back to the launch edge; without this line, hold is
# checked one cycle late and the tool over-inserts hold buffers
set_multicycle_path -hold 1  -from [get_cells u_div/q_reg] \
                              -to   [get_cells u_proc/data_reg]
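The effect of the two commands on a single-clock path can be written down directly. Under the usual SDC semantics for a path launched and captured by the same clock, -setup N moves the setup capture edge to N periods after launch, the default hold check sits one period before that setup edge, and -hold M moves the hold edge M periods back toward launch. The helper below is only a sketch of that arithmetic, not a tool API.

```python
def mcp_check_edges(period_ns, setup_mult=1, hold_mult=0):
    """Return (setup_edge, hold_edge) in ns, relative to the launch edge at 0.

    Assumes launch and capture use the same clock (no phase shift).
    """
    setup_edge = setup_mult * period_ns
    # Default hold check: one cycle before the setup edge, moved back
    # further by the -hold multiplier
    hold_edge = (setup_mult - 1 - hold_mult) * period_ns
    return setup_edge, hold_edge


T = 2.0  # ns, 500 MHz
print(mcp_check_edges(T))                             # (2.0, 0.0)  single-cycle path
print(mcp_check_edges(T, setup_mult=2))               # (4.0, 2.0)  hold checked a cycle late
print(mcp_check_edges(T, setup_mult=2, hold_mult=1))  # (4.0, 0.0)  hold back at launch
```

With only -setup 2, the path would need more than a full 2 ns period of minimum delay to pass hold, which is exactly the hold-buffer over-insertion the second command avoids.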

Fixing Hold Violations

⚠️

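Critical insight: Hold violations are fixed by adding delay, not removing it. You NEVER fix hold by making the data path faster (that makes it worse). The usual fix is adding delay on the short path; downsizing cells or swapping in higher-Vt cells on that path also helps. Hold checks do not depend on clock frequency, so a hold failure in silicon cannot be rescued by slowing the clock. Hold is usually most critical at the fastest process/voltage corner, but it must be met at every signoff corner.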

① Buffer / Delay Cell Insertion on Short Path

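Insert delay buffers (or dedicated delay cells such as X_DELAY; names vary by library) on the data path between the launch and capture flops so that the data arrives at least Thold after the capture clock edge, including clock skew. Place them close to the capture flop where possible, so other paths sharing the launch-side logic are not slowed, and recheck setup afterward. P&R tools do this automatically in hold-fixing mode.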
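Hold buffering always has a setup cost, because the same delay cell is also slower at the slow corner. The sketch below, with hypothetical picosecond numbers, picks the smallest number of delay cells that closes hold at the fast corner and then checks the setup slack left at the slow corner. It assumes a single delay-cell type with a fixed min/max delay and ignores the extra load each inserted cell adds.

```python
def plan_hold_fix(hold_slack_ps, setup_slack_ps, cell_min_ps, cell_max_ps):
    """Return (cells_needed, setup_slack_after) or None if setup would break.

    cell_min_ps: delay of one delay cell at the fast (hold) corner
    cell_max_ps: delay of the same cell at the slow (setup) corner
    """
    if hold_slack_ps >= 0:
        return 0, setup_slack_ps
    deficit = -hold_slack_ps
    cells = -(-deficit // cell_min_ps)   # ceiling division on integers
    setup_after = setup_slack_ps - cells * cell_max_ps
    if setup_after < 0:
        return None   # needs a different fix (e.g. skew, placement, cell choice)
    return cells, setup_after


print(plan_hold_fix(-70, 300, cell_min_ps=30, cell_max_ps=80))  # (3, 60)
print(plan_hold_fix(-70, 200, cell_min_ps=30, cell_max_ps=80))  # None
```

The second call shows the classic squeeze: a path with both a hold violation and little setup margin cannot be fixed by delay insertion alone.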

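② Watch for Very Short Register-to-Register Paths

Direct register-to-register connections with no combinational logic between them (shift registers, delay lines) have only clock-to-Q and wire delay, so they are the first paths to fail hold when the capture clock arrives late. They are legitimate RTL, but plan for them: keep the two flops close together on the same clock-tree branch so skew stays small, or expect the tool to add hold buffers.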

③ Use set_false_path or set_multicycle_path for CDC Paths

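Asynchronous clock domain crossing paths (through synchronizers) should be excluded from timing in SDC with set_false_path or set_clock_groups -asynchronous. The STA tool cannot perform meaningful setup or hold analysis across unrelated clocks, and analyzing them anyway forces unnecessary hold buffer insertion. Crossings between related (synchronous) clocks, such as a clock and its divided version, are real timing paths: constrain them with multicycle paths where appropriate, never with false paths.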

SDC — Hold fixes

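# Asynchronous CDC: cut both directions (removes setup AND hold checks)
set_false_path -from [get_clocks clk_a] -to [get_clocks clk_b]
set_false_path -from [get_clocks clk_b] -to [get_clocks clk_a]
# Often used instead:
# set_clock_groups -asynchronous -group [get_clocks clk_a] -group [get_clocks clk_b]

# Require at least 0.1ns of path delay. This adds no delay by itself;
# it makes the hold-fixing step insert delay until the path meets it.
set_min_delay 0.1 -from [get_cells ff_launch] -to [get_cells ff_capture]

# Hold (min) at the fast corner, setup (max) at the slow corner.
# Condition names are examples; they must exist in the loaded .lib files.
read_sdc design.sdc
set_operating_conditions -min ff_0p95v_-40c -max ss_0p85v_125c
report_timing -delay_type min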

OCV and CPPR

Concept	What It Does	Effect on Slack
OCV (On-Chip Variation)	Applies flat derates: launch path slower and capture path faster for setup, the reverse for hold	Reduces slack by adding pessimism
AOCV (Advanced OCV)	Derate depends on logic depth and distance: deeper paths get smaller derates (random variation averages out), paths spread over a larger area get larger ones	More accurate than flat OCV
POCV (Parametric OCV)	Statistical approach: models each cell delay as a mean plus a Gaussian sigma and combines them statistically along the path	Reduces unnecessary pessimism
CPPR (Common Path Pessimism Removal; CRPR in PrimeTime)	Removes the double derate on clock buffers shared by the launch and capture paths (the common tree up to the divergence point)	Recovers pessimistic slack
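The interaction between flat OCV derates and CPPR is easiest to see with numbers. The sketch assumes a hypothetical clock tree where 400 ps of buffering is shared by launch and capture before two 100 ps private branches, with a late derate of 1.08 and an early derate of 0.92, and computes the clock-path penalty a setup check sees with and without removing the common-path pessimism.

```python
def derated_skew_penalty_ps(common_ps, launch_branch_ps, capture_branch_ps,
                            late=1.08, early=0.92, cppr=True):
    """Launch clock arrival minus capture clock arrival under flat OCV derates.

    For a setup check the launch clock is derated late (slower) and the
    capture clock early (faster); a positive result reduces setup slack.
    """
    launch_clock = (common_ps + launch_branch_ps) * late
    capture_clock = (common_ps + capture_branch_ps) * early
    penalty = launch_clock - capture_clock
    if cppr:
        # The shared buffers cannot be slow and fast at the same time,
        # so the derate difference on the common segment is credited back
        penalty -= common_ps * (late - early)
    return penalty


without = derated_skew_penalty_ps(400, 100, 100, cppr=False)
with_cppr = derated_skew_penalty_ps(400, 100, 100, cppr=True)
print(f"without CPPR: {without:.1f} ps, with CPPR: {with_cppr:.1f} ps")
```

Here the nominal skew is zero, so the whole 80 ps is derate pessimism. CPPR credits back the 64 ps attributed to the shared 400 ps segment, leaving 16 ps from the private branches, which really can vary independently.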

Timing Closure Workflow

Timing Closure — Iteration Flow

1. Run STA at worst-case corner (ss_0p85v_125c)
   report_timing -slack_lesser_than 0 -max_paths 100

2. Sort violations by worst negative slack (WNS) and total negative slack (TNS)
   → WNS = slack of the single worst path
   → TNS = sum of the worst negative slack at every violating endpoint (indicates volume of work)

3. Group violations by clock domain and module
   → Most violations in one module? → RTL restructure
   → Spread across chip? → Placement/routing issue

4. Apply fixes (priority order):
   a. SDC corrections (missing false paths or MCPs, paths that are analyzed incorrectly)
   b. RTL restructuring (pipelining, logic rebalancing)
   c. Synthesis over-constraint (e.g., tighten the clock by ~0.1ns of extra margin)
   d. Physical: re-floorplan, re-place critical cells
   e. ECO: manual gate sizing, buffer insertion

5. Re-run STA after each fix — check for introduced hold violations

6. Sign off at all required corners, for example:
   Setup: ss_0p85v_125c  (slow-slow, hot, low voltage)
   Hold:  ff_0p95v_-40c  (fast-fast, cold, high voltage)
   Leakage: tt_0p9v_25c  (typical; worst-case leakage is at a fast, hot corner)
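Steps 2 and 3 amount to simple bookkeeping on the per-endpoint slacks an STA report provides. The sketch below uses hypothetical endpoint names and slack values (in ns), computes WNS and TNS, and counts violations per top-level instance, the grouping that hints at an RTL versus placement problem.

```python
from collections import Counter


def summarize(endpoint_slacks):
    """endpoint_slacks: {endpoint_pin: worst slack at that endpoint, in ns}."""
    slacks = endpoint_slacks.values()
    wns = min(slacks)
    tns = sum(s for s in slacks if s < 0)
    # Group violating endpoints by the first hierarchy level, e.g. "u_alu"
    by_block = Counter(pin.split("/")[0]
                       for pin, s in endpoint_slacks.items() if s < 0)
    return wns, tns, by_block


report = {
    "u_alu/acc_reg[3]/D": -0.12,
    "u_alu/acc_reg[4]/D": -0.08,
    "u_dma/addr_reg[0]/D": -0.02,
    "u_dma/cnt_reg[1]/D": 0.05,
}
wns, tns, by_block = summarize(report)
print(f"WNS = {wns:.2f} ns, TNS = {tns:.2f} ns")
print(by_block.most_common())
```

A large TNS concentrated in one block points toward restructuring that block's RTL; a similar TNS spread thinly across many blocks points toward floorplan or placement.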

Common Timing Closure Mistakes

Mistake	Consequence	Fix
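Fixing setup by adding buffers to the clock	Delays the capture clock edge: helps this path's setup, but hurts hold on the same path and setup on paths launched from that flop	Fix the data path first; use useful skew only after checking all affected paths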
Setting false_path on a real path	Masks a real violation — silicon will fail	Verify path is truly asynchronous before marking false
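Missing hold fix at fast corner	Design fails at -40°C or with fast process lots, and lowering the frequency cannot recover it	Always run hold analysis at the fast corner (e.g., ff_0p95v_-40c), plus all other signoff corners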
Applying OCV without CPPR	Over-pessimistic slack — unnecessary over-engineering	Enable CPPR in PrimeTime: set_app_var timing_remove_clock_reconvergence_pessimism true
Ignoring clock domain crossings in SDC	STA times asynchronous CDC paths as if they were synchronous, producing false violations	Explicitly cut all async CDC paths with set_false_path or set_clock_groups -asynchronous
