Timing Closure: Fix Setup & Hold Violations
What Is Timing Closure?
Timing closure is the iterative process of eliminating all setup and hold violations in a chip design until every path meets its timing constraint at every signoff corner (process, voltage, temperature). It sits at the intersection of RTL, synthesis, and physical design — and is one of the most time-consuming phases of a real chip tape-out.
The STA tool (PrimeTime, Tempus, etc.) reports slack for every path:
- Positive slack: the path meets timing, with margin to spare
- Zero slack: the path exactly meets the constraint
- Negative slack: a timing violation that must be fixed before tape-out
Understanding the Slack Equations
    == SETUP CHECK ==
    Data Arrival Time  = launch_clk_edge + Tclk2q + Tcomb + Tnet
    Data Required Time = capture_clk_edge + Tclk_skew - Tsetup
    Setup Slack        = Required − Arrival   (must be ≥ 0)

    == HOLD CHECK ==
    (capture_clk_edge here is the same edge that launched the data, not the next one)
    Data Arrival Time  = launch_clk_edge + Tclk2q_min + Tcomb_min + Tnet_min
    Data Required Time = capture_clk_edge + Tclk_skew_max + Thold   (largest capture-minus-launch skew)
    Hold Slack         = Arrival − Required   (must be ≥ 0)

    == EXAMPLE (setup violation) ==
    Launch edge:   0ns
    Tclk2q:        0.15ns
    Tcomb:         1.80ns   ← too long: this is the bottleneck
    Tnet:          0.12ns
    Arrival:       2.07ns

    Capture edge:  2.00ns   (500 MHz = 2ns period)
    Tclk_skew:     0.00ns   (taken as zero)
    Tsetup:        0.05ns
    Required:      1.95ns

    Setup Slack = 1.95 − 2.07 = −0.12ns   (120ps violation)
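The slack equations are easy to check by hand, but a small script helps when experimenting with different numbers. The Python sketch below is only an illustration: the function and parameter names are made up for this example, and the hold numbers are hypothetical rather than taken from the article.

```python
def setup_slack(capture_edge, t_skew, t_setup, launch_edge, t_clk2q, t_comb, t_net):
    """Setup slack = required - arrival, all values in ns."""
    arrival = launch_edge + t_clk2q + t_comb + t_net
    required = capture_edge + t_skew - t_setup
    return required - arrival


def hold_slack(capture_edge, t_skew, t_hold, launch_edge, t_clk2q_min, t_comb_min, t_net_min):
    """Hold slack = arrival - required, using minimum (fast) data delays."""
    arrival = launch_edge + t_clk2q_min + t_comb_min + t_net_min
    required = capture_edge + t_skew + t_hold
    return arrival - required


# Numbers from the setup example above (skew taken as 0).
s = setup_slack(capture_edge=2.00, t_skew=0.0, t_setup=0.05,
                launch_edge=0.0, t_clk2q=0.15, t_comb=1.80, t_net=0.12)
print(f"setup slack = {s:+.2f} ns")   # -0.12 ns, the 120 ps violation

# Hypothetical hold numbers: same-edge check, so capture_edge equals launch_edge.
h = hold_slack(capture_edge=0.0, t_skew=0.05, t_hold=0.03,
               launch_edge=0.0, t_clk2q_min=0.06, t_comb_min=0.00, t_net_min=0.01)
print(f"hold slack  = {h:+.2f} ns")
```

In the hypothetical hold case the data arrives 0.07 ns after the edge but must not change until 0.08 ns after it, so the path would need a little extra delay.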
Fixing Setup Violations — Techniques in Order
Reduce the number of logic levels in the critical path. Restructure Boolean equations, share subexpressions, or pipeline the path. This is the most efficient fix but requires RTL/synthesis changes.
    // BEFORE: one nested expression; mapped to 2-input gates this is 4 levels
    assign out = ((a & b & c) | (d & e & f)) & (g | h);

    // AFTER: same function with the three terms computed in parallel;
    // with 3-input AND cells this is 3 levels (AND3 -> OR2 -> AND2)
    wire abc = a & b & c;
    wire def = d & e & f;
    wire gh  = g | h;
    assign out = (abc | def) & gh;
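Whenever a Boolean expression is restructured by hand, it is worth confirming that the function has not changed. The Python sketch below mirrors the BEFORE and AFTER expressions and compares them over every input combination; the function names are illustrative, and in a real flow this job belongs to a formal equivalence checker.

```python
from itertools import product


def before(a, b, c, d, e, f, g, h):
    # Original single-expression form.
    return ((a & b & c) | (d & e & f)) & (g | h)


def after(a, b, c, d, e, f, g, h):
    # Restructured form with named intermediate terms.
    abc = a & b & c
    def_ = d & e & f
    gh = g | h
    return (abc | def_) & gh


# Exhaustive comparison over all 2^8 input vectors.
mismatches = [bits for bits in product((0, 1), repeat=8)
              if before(*bits) != after(*bits)]
print(f"checked {2 ** 8} input vectors, {len(mismatches)} mismatches")
```

Exhaustive comparison only scales to a handful of inputs, which is why it is a sanity check rather than a replacement for equivalence checking.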
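Replace a cell on the critical path with a higher-drive-strength version. A stronger driver charges its load capacitance faster, which reduces Tpd, but it also presents more input capacitance to the previous stage, so upsizing has diminishing returns. Synthesis and place-and-route tools do this automatically, and it can also be applied manually via ECO.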
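One thing to keep in mind when upsizing is that a larger cell loads its own driver more heavily, so the best size is rarely the largest one. The Python sketch below uses a deliberately crude two-stage RC model in arbitrary units; every parameter value is hypothetical and chosen only to make the trade-off visible.

```python
def path_delay(size, r_prev=2.0, r_unit=2.0, c_in_unit=1.0, c_load=20.0, c_par=1.0):
    """Two-stage RC delay estimate (arbitrary units) for a gate of relative `size`.

    Stage 1: the previous gate (fixed resistance r_prev) drives the sized gate's input.
    Stage 2: the sized gate (resistance r_unit / size) drives its own parasitic
             capacitance plus the fixed load c_load.
    """
    stage1 = r_prev * (c_in_unit * size)
    stage2 = (r_unit / size) * (c_par * size + c_load)
    return stage1 + stage2


sizes = [1, 2, 4, 8, 16]
delays = {s: path_delay(s) for s in sizes}
best = min(delays, key=delays.get)
for s in sizes:
    print(f"X{s:<2} total delay = {delays[s]:.1f}")
print(f"best drive strength in this model: X{best}")
```

With these made-up numbers the delay falls until X4 and then rises again, because the extra input capacitance slows the previous stage more than the stronger drive helps.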
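Long nets have high resistance and capacitance, and the delay of an unbuffered wire grows roughly with the square of its length. Inserting repeater buffers along the net breaks it into shorter segments so that total delay grows roughly linearly instead. On clock paths this is done by CTS; on data paths it is done during placement and routing optimization.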
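The benefit of repeaters can be estimated with a first-order distributed-RC model, in which a wire segment's delay is about 0.38 times its total resistance times its total capacitance. The Python sketch below is a simplification under stated assumptions: the per-millimetre resistance and capacitance and the buffer delay are hypothetical, and buffer drive resistance and input capacitance are ignored.

```python
def wire_delay_ps(length_mm, n_segments, r_per_mm=1000.0, c_per_mm=0.2e-12, t_buf_ps=20.0):
    """Delay of a wire split into n_segments equal pieces by (n_segments - 1) repeaters.

    Each piece is modelled as a distributed RC line with delay ~0.38 * R * C.
    r_per_mm [ohm/mm], c_per_mm [F/mm] and t_buf_ps are hypothetical values.
    """
    seg_len = length_mm / n_segments
    seg_delay_s = 0.38 * (r_per_mm * seg_len) * (c_per_mm * seg_len)
    return n_segments * seg_delay_s * 1e12 + (n_segments - 1) * t_buf_ps


length_mm = 2.0
delays = {n: wire_delay_ps(length_mm, n) for n in range(1, 7)}
best_n = min(delays, key=delays.get)
for n, d in delays.items():
    print(f"{n} segment(s): {d:6.1f} ps")
print(f"lowest delay with {best_n} segments ({best_n - 1} repeaters)")
```

In this model the unbuffered 2 mm wire costs about 304 ps, and the total keeps improving until the added buffer delay outweighs the wire delay saved.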
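Push registers forward (toward the output) or backward (toward the input) across combinational logic to balance path delays between pipeline stages. Retiming preserves the functional behavior and latency while redistributing delay. When moving existing registers is not enough, add a new pipeline stage in the middle of the path, at the cost of one extra cycle of latency.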
    // BEFORE: 8-level combinational path (critical)
    always @(posedge clk)
        result <= complex_8_level_logic(a, b, c);

    // AFTER: split into 2 pipeline stages of 4 levels each
    logic [7:0] mid_stage;
    logic [7:0] c_d;  // c delayed one cycle to stay aligned with mid_stage (width assumed)
    always @(posedge clk) begin
        mid_stage <= first_4_levels(a, b);
        c_d       <= c;
    end
    always @(posedge clk)
        result <= last_4_levels(mid_stage, c_d);
    // Latency increases by 1 cycle; throughput is unchanged
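A detail worth checking when a path is split is that every input consumed by the second stage is delayed by the same number of cycles as the intermediate result. The Python sketch below is a cycle-level model of the split above; `first_half` and `second_half` are arbitrary stand-ins for the real logic, and `c_d` is a hypothetical one-cycle-delayed copy of `c`.

```python
def first_half(a, b):
    return (a + b) & 0xFF          # stand-in for the first 4 logic levels


def second_half(mid, c):
    return (mid ^ c) & 0xFF        # stand-in for the last 4 logic levels


def simulate(inputs, pipelined):
    """Return the value of `result` after each clock edge (None until defined)."""
    result, mid, c_d = None, None, None
    trace = []
    for a, b, c in inputs:
        # Non-blocking semantics: compute every next-state value from the current state first.
        if pipelined:
            next_result = second_half(mid, c_d) if mid is not None else None
            next_mid, next_c_d = first_half(a, b), c
        else:
            next_result = second_half(first_half(a, b), c)
            next_mid, next_c_d = None, None
        result, mid, c_d = next_result, next_mid, next_c_d
        trace.append(result)
    return trace


stimulus = [(1, 2, 4), (10, 20, 5), (7, 7, 7), (0, 255, 1)]
flat = simulate(stimulus, pipelined=False)
piped = simulate(stimulus, pipelined=True)
print("single stage:", flat)
print("two stages:  ", piped)
```

The two-stage trace is the single-stage trace shifted by one cycle, which is the latency cost of the extra register; one new result still appears on every clock.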
If the data only needs to be valid every N clock cycles (e.g., a divide-by-2 path), tell the STA tool to relax the constraint. This is not a physical fix — it's a specification correction.
    # Data valid every 2 clocks: relax setup by 1 extra period
    set_multicycle_path -setup 2 -from [get_cells u_div/q_reg] \
                                 -to   [get_cells u_proc/data_reg]

    # Also relax hold to avoid over-insertion of hold buffers
    set_multicycle_path -hold 1 -from [get_cells u_div/q_reg] \
                                -to   [get_cells u_proc/data_reg]
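It helps to see which clock edges the setup and hold checks actually use once the multipliers are applied. The Python sketch below assumes a single clock for launch and capture with the launch at t = 0; the function name is illustrative, not part of any tool.

```python
def check_edges(period_ns, setup_mult=1, hold_mult=0):
    """Return (setup_capture_edge, hold_capture_edge) in ns for a launch at t = 0.

    Assumes launch and capture use the same clock. By default the hold check
    sits one cycle before the setup capture edge; a hold multiplier moves it
    earlier by that many cycles.
    """
    setup_edge = setup_mult * period_ns
    hold_edge = (setup_mult - 1 - hold_mult) * period_ns
    return setup_edge, hold_edge


period = 2.0  # ns, i.e. a 500 MHz clock
cases = [("default", 1, 0), ("-setup 2 only", 2, 0), ("-setup 2 -hold 1", 2, 1)]
for label, s_mult, h_mult in cases:
    s_edge, h_edge = check_edges(period, s_mult, h_mult)
    print(f"{label:<18} setup check at {s_edge:.1f} ns, hold check at {h_edge:.1f} ns")
```

With `-setup 2` alone the hold check moves to 2.0 ns, which would demand a full period of minimum delay on the path; adding `-hold 1` returns it to the launch edge at 0.0 ns.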
Fixing Hold Violations
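Insert delay buffers (or dedicated delay cells, X_DELAY) on the data path, preferably close to the capturing flop's D pin so that other paths from the same launch flop are not slowed, so that new data arrives no earlier than Thold after the capture clock edge. P&R tools do this automatically in hold-fixing mode.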
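In RTL, minimize direct register-to-register connections with no combinational logic between them when both flops share the same clock edge. These are the shortest paths in the design, with only Tclk2q and net delay between launch and capture, so they are the first to need hold buffers when capture-clock skew is significant.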
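Asynchronous clock domain crossing paths should be marked as false paths in SDC. The STA tool cannot perform meaningful setup or hold analysis between unrelated clocks, and leaving these paths constrained forces unnecessary hold buffer insertion. Correct behavior on such paths comes from synchronizers, not from STA.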
    # Mark CDC path as false: no hold check across async domains
    set_false_path -from [get_clocks clk_a] -to [get_clocks clk_b]

    # Set min delay to avoid hold violation (add 0.1ns minimum delay)
    set_min_delay 0.1 -from [get_cells ff_launch] -to [get_cells ff_capture]

    # Check hold at fast corner (best-case timing = worst hold)
    read_sdc design.sdc
    set_operating_conditions -min ff_0p95v_-40c -max ss_0p85v_125c
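Hold buffers are not free: every buffer that fixes hold at the fast corner also eats into setup slack at the slow corner. The Python sketch below estimates both effects for a single path; the function name and all delay values are hypothetical, and a real tool would pick from several delay-cell sizes instead of one.

```python
import math


def hold_buffers_needed(hold_slack_ns, setup_slack_ns, buf_delay_min_ns, buf_delay_max_ns):
    """Return (buffer_count, setup_slack_after) for fixing one hold violation.

    The buffer count is sized with the buffer's fast-corner (min) delay; the
    setup cost is charged at the slow-corner (max) delay of the same buffers.
    """
    if hold_slack_ns >= 0:
        return 0, setup_slack_ns
    count = math.ceil(-hold_slack_ns / buf_delay_min_ns)
    return count, setup_slack_ns - count * buf_delay_max_ns


count, setup_after = hold_buffers_needed(
    hold_slack_ns=-0.07, setup_slack_ns=0.40,
    buf_delay_min_ns=0.03, buf_delay_max_ns=0.08)
print(f"{count} buffers, setup slack afterwards {setup_after:+.2f} ns")
```

If the setup slack after the fix goes negative, the path cannot be repaired with data-path delay alone and the clock skew or the constraints need another look.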
OCV and CPPR
| Concept | What It Does | Effect on Slack |
|---|---|---|
| OCV (On-Chip Variation) | Applies derate: launch path slower, capture path faster (setup) or vice versa | Reduces slack by adding pessimism |
| AOCV (Advanced OCV) | Distance/depth-aware derate — cells far apart have more variation | More accurate than flat OCV |
| POCV (Parametric OCV) | Statistical approach — uses Gaussian delay distributions | Reduces unnecessary pessimism |
| CPPR (Clock Path Pessimism Removal) | Removes double-derate on shared clock buffers (launch + capture share the same buffer tree up to fork point) | Recovers pessimistic slack |
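To make the OCV and CPPR rows concrete, the Python sketch below computes a setup slack with flat derates and then applies the CPPR credit for the shared clock segment. It is a simplified model: the derate factors and delays are hypothetical, the setup time is not derated, and real tools apply derates per cell and per net.

```python
def setup_slack_ocv(period, t_common, t_launch_branch, t_capture_branch,
                    t_data, t_setup, late=1.05, early=0.95, cppr=True):
    """Setup slack with flat OCV derates; all delays in ns, derates hypothetical."""
    launch_clk = (t_common + t_launch_branch) * late      # launch clock arrives late
    capture_clk = (t_common + t_capture_branch) * early   # capture clock arrives early
    arrival = launch_clk + t_data * late
    required = period + capture_clk - t_setup
    slack = required - arrival
    if cppr:
        # The shared segment cannot be both late and early at once: give back the difference.
        slack += t_common * (late - early)
    return slack


args = dict(period=2.0, t_common=1.0, t_launch_branch=0.2,
            t_capture_branch=0.2, t_data=1.8, t_setup=0.05)
without_cppr = setup_slack_ocv(cppr=False, **args)
with_cppr = setup_slack_ocv(cppr=True, **args)
print(f"OCV only:   {without_cppr:+.3f} ns")
print(f"OCV + CPPR: {with_cppr:+.3f} ns")
```

With these numbers the path fails by 60 ps under OCV alone and passes by 40 ps once the 100 ps of double-counted pessimism on the common clock segment is removed.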
Timing Closure Workflow
    1. Run STA at worst-case corner (ss_0p85v_125c)
         report_timing -slack_lesser_than 0 -max_paths 100

    2. Sort violations by worst negative slack (WNS) and total negative slack (TNS)
         → WNS = worst single path slack
         → TNS = sum of all negative slacks (indicates volume of work)

    3. Group violations by clock domain and module
         → Most violations in one module? → RTL restructure
         → Spread across chip? → Placement/routing issue

    4. Apply fixes (priority order):
         a. SDC corrections (false paths, MCPs wrongly analyzed)
         b. RTL restructuring (pipelining, logic rebalancing)
         c. Synthesis constraint tightening (add -0.1ns margin)
         d. Physical: re-floorplan, re-place critical cells
         e. ECO: manual gate sizing, buffer insertion

    5. Re-run STA after each fix and check for introduced hold violations

    6. Sign off at all required corners:
         Setup:   ss_0p85v_125c  (slow-slow, hot, low voltage)
         Hold:    ff_0p95v_-40c  (fast-fast, cold, high voltage)
         Leakage: tt_0p9v_25c    (typical)
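Steps 2 and 3 of the workflow are simple bookkeeping once the endpoint slacks have been extracted from a timing report. The Python sketch below assumes the slacks are already available as (endpoint, slack) pairs; the endpoint names are invented for illustration, and parsing a real report is left out because the format is tool-specific.

```python
from collections import defaultdict


def summarize(paths):
    """paths: iterable of (endpoint, slack_ns). Returns (wns, tns, per_module).

    WNS is reported as 0.0 when nothing violates (a convention chosen for this sketch).
    """
    violating = [(ep, s) for ep, s in paths if s < 0]
    wns = min((s for _, s in violating), default=0.0)
    tns = sum(s for _, s in violating)
    per_module = defaultdict(lambda: {"count": 0, "tns": 0.0})
    for ep, s in violating:
        module = ep.split("/")[0]          # top-level instance name
        per_module[module]["count"] += 1
        per_module[module]["tns"] += s
    return wns, tns, dict(per_module)


# Hypothetical endpoint slacks in ns.
paths = [
    ("u_alu/res_reg[3]/D", -0.12),
    ("u_alu/res_reg[7]/D", -0.08),
    ("u_lsu/addr_reg[0]/D", -0.02),
    ("u_lsu/addr_reg[1]/D", 0.15),
]
wns, tns, per_module = summarize(paths)
print(f"WNS = {wns:.2f} ns, TNS = {tns:.2f} ns")
for module, stats in sorted(per_module.items(), key=lambda kv: kv[1]["tns"]):
    print(f"{module}: {stats['count']} violations, TNS {stats['tns']:.2f} ns")
```

Sorting modules by TNS puts the block with the most work at the top, which is usually where RTL restructuring pays off first.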
Common Timing Closure Mistakes
| Mistake | Consequence | Fix |
|---|---|---|
| Fixing setup by adding buffers to the clock | Moves clock edge — helps setup but creates hold violations elsewhere | Size data path instead |
| Setting false_path on a real path | Masks a real violation — silicon will fail | Verify path is truly asynchronous before marking false |
| Missing hold fix at fast corner | Design fails at -40°C or with fast process lots | Always run hold analysis at ff_0p95v_-40c |
| Applying OCV without CPPR | Over-pessimistic slack — unnecessary over-engineering | Enable CPPR in PrimeTime: set_app_var timing_remove_clock_reconvergence_pessimism true |
| Ignoring clock domain crossings in SDC | STA tries to analyze unconstrained CDC paths — false violations | Explicitly set_false_path on all async CDC paths |