STA — Path Analysis

Timing Paths &
Critical Path Analysis

Every timing violation in VLSI is a story about a specific path — a chain of logic gates between two registers that takes too long. Understanding how STA classifies, traces, and reports these paths is essential for reading timing reports, identifying bottlenecks, and applying the right fix to the right place.


The Four Types of Timing Paths

STA classifies every timing path by its start point and end point. Every path in the design falls into one of four categories. Understanding which type you are looking at tells you immediately what constraints apply and what the fix space is.

Register → Register

From a flip-flop's clock pin (launch) through its Q output, through combinational logic, to the next flip-flop's D input (capture). The most common path type — all internal datapath timing is R2R. Constrained by create_clock period.

Input → Register

From a primary input port through combinational logic to a flip-flop's D input. Constrained by set_input_delay. The external logic that drives the input port consumes part of the clock period before the signal reaches the chip.

Register → Output

From a flip-flop's Q output through combinational logic to a primary output port. Constrained by set_output_delay. The downstream chip receiving the output needs the data to arrive before its own setup deadline.

Input → Output

A purely combinational path with no sequential element — from input port directly to output port. Constrained by both set_input_delay and set_output_delay together. The available budget is the clock period minus both delays.

Path type   Launch point   Capture point   Constraint         Limits frequency?
R2R         FF clock pin   FF D pin        create_clock       Yes — limits fmax
I2R         Input port     FF D pin        set_input_delay    Yes
R2O         FF Q pin       Output port     set_output_delay   Yes
I2O         Input port     Output port     Both I/O delays    Yes
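As an illustrative sketch (not tool code), the four categories above reduce to a lookup on the kind of start point and end point; the function name and labels here are hypothetical:

```python
def classify_path(startpoint: str, endpoint: str) -> str:
    """Classify a timing path by its start- and end-point kinds.

    startpoint: "ff_clk" (FF clock pin) or "input_port"
    endpoint:   "ff_d"   (FF D pin)     or "output_port"
    """
    table = {
        ("ff_clk", "ff_d"): "R2R",             # constrained by create_clock
        ("input_port", "ff_d"): "I2R",         # constrained by set_input_delay
        ("ff_clk", "output_port"): "R2O",      # constrained by set_output_delay
        ("input_port", "output_port"): "I2O",  # constrained by both I/O delays
    }
    return table[(startpoint, endpoint)]

print(classify_path("ff_clk", "ff_d"))  # → R2R
```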

How STA Builds the Timing Graph

STA works on a timing graph — a directed acyclic graph (DAG) derived from the gate-level netlist. Every logic gate becomes a node, every net becomes a directed edge with propagation delay, and every flip-flop acts as both a sink (at its D input) and a source (at its Q output), cutting the graph into register-bounded segments.

  Timing Graph Construction:

  1. Parse gate-level netlist (after synthesis or P&R)
     └── Every cell → node (with delay from .lib timing arcs)
     └── Every net → edge (wire delay from parasitic extraction)

  2. Annotate delays
     └── Cell delays: from timing library at the specified PVT corner
     └── Wire delays: from SPEF/SPICE parasitic extraction (post-layout)
     └── Pre-layout: wire load models (estimated)

  3. Identify start/end points
     └── Start points: FF clock pins, primary input ports
     └── End points:   FF D pins, primary output ports

  4. Enumerate all paths (conceptually)
     └── In practice: forward/backward traversal to compute
         arrival time and required time at each node

  5. Compute slack at every end point
     └── Setup slack = required time − arrival time
     └── Report worst slack paths (top-N violators)

Why STA is fast: STA does not simulate — it does not apply test vectors or propagate logic values. It computes timing mathematically on the graph using static delay values from the library. This means it analyzes every possible path simultaneously in minutes, while simulation would take weeks for the same coverage.

Arrival Time, Required Time, and Slack

Every node in the timing graph has two key numbers: arrival time (when does the signal actually get here?) and required time (when does it need to be here?). The difference between them is slack.

Arrival time at node N = max(arrival at all inputs) + cell delay(N)
Required time at node N = required time at output − cell delay(N)
Slack at end point = Required time − Arrival time
Critical path = path with the smallest (most negative) slack

Forward propagation (arrival time)

STA propagates arrival times forward from all start points through the graph. At each gate, the arrival time = max(arrivals at all inputs) + gate delay. This max selects the latest-arriving input, which is the constraining input for that gate.

Backward propagation (required time)

STA propagates required times backward from all end points. At each gate, required time at input = required time at output − gate delay. This identifies how late a signal can be at any point and still meet the downstream deadline.

Setup slack (full equation):
 Slack = (T_capture_clk + T_period − T_su) − (T_launch_clk + T_cq + T_comb)
fmax = 1 / (T_cq + T_comb_critical + T_su − T_skew)

Critical path ≠ longest wire. The critical path is the path with the worst (least) timing slack, not necessarily the physically longest wire. A short wire through many slow high-fanout gates can be more critical than a long wire through fast buffers. Logic depth and gate drive strength matter more than wire length alone.
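The forward/backward propagation described above can be sketched on a toy graph. All gate names and delays here are hypothetical, clocks are ideal, and wire delay is folded into each receiving gate's delay for simplicity:

```python
from collections import defaultdict

# gate -> its delay in ns; "SRC" models the launch FF's Q pin (clock-to-Q),
# "END" models the wire into the capture FF's D pin
delays = {"SRC": 0.18, "and0": 0.11, "xor1": 0.13, "add": 0.37, "END": 0.03}
fanin  = {"and0": ["SRC"], "xor1": ["and0"], "add": ["xor1"], "END": ["add"]}
order  = ["SRC", "and0", "xor1", "add", "END"]   # topological order

fanout = defaultdict(list)
for n, fins in fanin.items():
    for f in fins:
        fanout[f].append(n)

# Forward pass: arrival = max over fanin arrivals + own delay
arrival = {}
for n in order:
    arrival[n] = max((arrival[f] for f in fanin.get(n, [])), default=0.0) + delays[n]

# Backward pass: required at an end point = clock period − library setup time;
# elsewhere, required = min over fanouts of (fanout's required − fanout's delay)
T_period, T_setup = 1.0, 0.045
required = {}
for n in reversed(order):
    if not fanout[n]:
        required[n] = T_period - T_setup
    else:
        required[n] = min(required[m] - delays[m] for m in fanout[n])

slack = {n: required[n] - arrival[n] for n in order}
print({n: round(slack[n], 3) for n in order})  # slack is a uniform 0.135 ns along this single path
```

Because this toy graph is a single chain, every node carries the same slack; with reconvergent fanout, the min/max operators make slack differ per node, and the chain of worst-slack nodes is the critical path.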

Reading a PrimeTime Timing Report

Every STA engineer spends significant time reading timing path reports from tools like Synopsys PrimeTime or Cadence Tempus. The report structure is standardized and once you know how to read it, you can instantly locate the bottleneck cell.

  ============================================================
  Timing Path Report: Setup Check
  Path Group: CLK
  Path Type:  max (setup)
  ============================================================

  Point                        Incr    Path
  ─────────────────────────────────────────────────────────
  clock CLK (rise edge)        0.000   0.000
  clock network delay (ideal)  0.000   0.000
  u_pipe_A/clk (DFF_X1)        0.000   0.000 r   ← launch FF clock pin

  u_pipe_A/Q (DFF_X1)          0.180   0.180 f   ← clock-to-Q delay
  u_and0/A (AND2_X1)           0.042   0.222 f   ← wire + gate delay
  u_and0/Z (AND2_X1)           0.065   0.287 r
  u_xor1/A (XOR2_X2)           0.038   0.325 r
  u_xor1/Z (XOR2_X2)           0.092   0.417 f   ← gate delay
  u_add/A[3] (ADDER_X1)        0.031   0.448 f
  u_add/SUM[3] (ADDER_X1)      0.340   0.788 r   ← adder is slow!
  u_reg_B/D (DFF_X1)           0.025   0.813 r   ← wire to capture FF

  data arrival time                     0.813     ← total path delay

  ─────────────────────────────────────────────────────────
  clock CLK (rise edge)        1.000   1.000
  clock network delay (ideal)  0.000   1.000
  u_reg_B/clk (DFF_X1)        0.000   1.000 r   ← capture FF clock
  library setup time          -0.045   0.955     ← setup time subtracted

  data required time                    0.955

  ─────────────────────────────────────────────────────────
  data required time                    0.955
  data arrival time                    -0.813
  ─────────────────────────────────────────────────────────
  slack (MET)                          +0.142    ← positive = pass

Reading the "Incr" column

The "Incr" column shows the incremental delay at each step — wire delay + cell delay. The largest single increment is the bottleneck gate. In the example above, u_add/SUM[3] adds 0.340 ns — the adder is the critical gate on this path.

r / f annotations

"r" = rising transition, "f" = falling transition. Cell delays are different for rising and falling edges (asymmetric PMOS/NMOS drive). STA reports the worst-case transition. XOR and adder paths often have long chains of inversion that create alternating r/f transitions.

Locating the bottleneck: Sort the "Incr" column mentally. The cell with the largest single increment is the bottleneck. Typical suspects: wide adders, multipliers, long mux chains, high-fanout nets with heavy load, and cells driving large wire capacitances. Fix that cell first — upsizing its drive strength or breaking the path with pipelining has the biggest impact.
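The same "sort the Incr column" idea can be sketched in code, using (point, increment) pairs transcribed from the example report above; the report parsing itself is assumed already done:

```python
# (point, incr_ns) pairs from the example timing report's data path
path = [
    ("u_pipe_A/Q",   0.180),  # clock-to-Q
    ("u_and0/A",     0.042),
    ("u_and0/Z",     0.065),
    ("u_xor1/A",     0.038),
    ("u_xor1/Z",     0.092),
    ("u_add/A[3]",   0.031),
    ("u_add/SUM[3]", 0.340),  # largest single increment: the bottleneck
    ("u_reg_B/D",    0.025),
]

bottleneck = max(path, key=lambda p: p[1])      # cell with the largest Incr
total = sum(incr for _, incr in path)           # should match "data arrival time"
print(bottleneck[0], round(total, 3))           # → u_add/SUM[3] 0.813
```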

Logic Depth and Gate-Level Analysis

Logic depth is the number of logic gate levels on the combinational path between two registers. Each gate level adds propagation delay. A path with 20 gate levels at 50 ps average delay contributes 1 ns of combinational delay — if your clock period is 1 ns, there is zero budget left for setup time and clock-to-Q.

Logic function             Typical gate levels   Typical delay (28nm)   Optimization strategy
2:1 Mux                    1–2                   80–120 ps              Synthesis restructuring
8-bit adder (RCA)          16–18                 800–1000 ps            Replace with CLA/carry-select
8-bit adder (CLA)          6–8                   300–400 ps             Gate sizing, VT swap
16-bit comparator          8–10                  400–500 ps             Tree structure
32-bit multiplier          20–30                 1.5–3 ns               Pipeline, Booth encoding
Priority encoder (16-bit)  4–6                   200–300 ps             Restructure OR tree

# PrimeTime: report the ten worst setup paths, sorted by slack
report_timing -max_paths 10 -nworst 1 \
              -path_type full_clock \
              -delay_type max \
              -sort_by slack

# Show every pin on the path (counting cell output pins gives the logic depth)
report_timing -max_paths 5 -input_pins -nets -transition_time \
              -capacitance -crosstalk_delta

# Sweep near-critical paths too (slack below 0.5 ns), not just the single worst
report_timing -max_paths 20 -group_count 5 \
              -slack_lesser_than 0.5

The number of logic levels is reported as "data path / logic levels" in PrimeTime and Tempus. A path with 25+ logic levels at advanced nodes almost always needs pipelining — no amount of gate sizing will fix a 25-level path if the target clock period is 1 ns.
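The budget arithmetic above can be sketched as a quick feasibility check. The clock-to-Q and setup overheads used here (80 ps and 45 ps) are hypothetical placeholders, not library values:

```python
def fits(levels, avg_gate_delay_ps, period_ps, t_cq_ps=80, t_su_ps=45):
    """Does a path of `levels` gate levels fit in the clock period?

    The combinational budget is the period minus clock-to-Q and setup
    overhead (assumed values; ideal clocks, no skew or derates).
    """
    budget = period_ps - t_cq_ps - t_su_ps
    return levels * avg_gate_delay_ps <= budget

print(fits(20, 50, 1000))  # 1000 ps of logic vs an 875 ps budget → False
print(fits(20, 50, 1200))  # 1000 ps of logic vs a 1075 ps budget → True
```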

Critical Path Fixing — Ordered by Impact

Not all timing fixes have the same impact. These techniques are ordered from most structural (largest impact, done early) to most local (smallest impact, done late in the flow).

1. Pipelining

Insert a flip-flop in the middle of a long combinational path, splitting it into two shorter paths. Each half now has the full clock period. Increases latency by 1 cycle and nearly doubles the maximum achievable frequency for that path (each new stage re-pays T_cq and setup time, so the gain is slightly under 2×). Best applied early during RTL design.

2. Logic restructuring

Rebalance the logic tree to reduce the longest path. Example: a ripple-carry adder replaced by a carry-lookahead adder cuts gate depth by ~60%. Requires RTL or synthesis script changes. Works best on arithmetic-heavy paths.

3. Gate sizing (drive strength)

Replace a weak cell (e.g., AND2_X1) with a stronger version (AND2_X4). Larger drive strength charges downstream capacitance faster, reducing delay. Each upsizing step reduces delay by ~15–25%. Limited by area/power budget.

4. VT swapping

Replace High-Vt cells (slower, lower leakage) with Low-Vt cells (faster, higher leakage) on the critical path. Typically 10–20% delay reduction per swap. Tools do this automatically during timing optimization with a leakage power constraint.

5. Useful skew

Delay the capture clock slightly (positive skew) to give the data path more time. Directly adds to setup slack on that path. Must be balanced against hold margin. See the Clock Tree page for details.

6. Physical optimization

Move critical cells closer together to reduce wire delay. Reroute high-fanout nets with wider wires. Add repeater buffers to break long RC chains. These are post-placement fixes performed by the P&R tool during timing-driven optimization.

Fix order matters: Always fix the most negative slack path first. After each fix, re-run timing — fixing one path sometimes reveals a new critical path that was previously hidden by a larger violation. Work through the violation list iteratively rather than trying to fix all paths simultaneously.
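As a rough sketch of why pipelining (technique 1) has the largest impact, the single-path fmax formula from earlier can be evaluated before and after splitting a path; all delay values below are illustrative:

```python
def fmax_ghz(t_cq, t_comb, t_su):
    """fmax = 1 / (T_cq + T_comb + T_su); times in ns, ideal clocks."""
    return 1.0 / (t_cq + t_comb + t_su)

before = fmax_ghz(0.18, 1.20, 0.045)  # one long 1.2 ns combinational path
after  = fmax_ghz(0.18, 0.60, 0.045)  # split in half by a pipeline register
print(round(before, 2), round(after, 2))  # roughly 0.70 GHz → 1.21 GHz
```

The speedup is less than a clean 2× because every new stage pays the T_cq + T_su overhead again, which is why very deep pipelines hit diminishing returns.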

Near-Critical Paths and Timing Margin

A design that meets timing with only 10 ps of slack on hundreds of paths is fragile. Any post-sign-off variation — ECO buffers added for functional bugs, slight floorplan changes, or different parasitic extraction — can push those paths into violation. Good timing closure targets a healthy margin above zero.

Slack range   Status               Action
< 0 ps        Failing — must fix   Apply fixes before tapeout
0–50 ps       Marginally passing   Monitor; any ECO may cause violations
50–200 ps     Healthy margin       Safe for most ECOs and sign-off
> 200 ps      Over-designed        Consider frequency increase or power reduction

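A small sketch that applies the slack buckets above to a list of end-point slacks; the thresholds mirror the table, and the example slack values are made up:

```python
def bucket(slack_ps):
    """Map an end-point slack (ps) to its sign-off health bucket."""
    if slack_ps < 0:
        return "failing"
    if slack_ps <= 50:
        return "marginal"
    if slack_ps <= 200:
        return "healthy"
    return "over-designed"

slacks = [-12, 8, 142, 315]        # hypothetical end-point slacks in ps
print([bucket(s) for s in slacks]) # → ['failing', 'marginal', 'healthy', 'over-designed']
```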
# Report top-20 near-critical paths (slack < 0.2 ns)
report_timing -max_paths 20 \
              -slack_lesser_than 0.2 \
              -delay_type max

# Check path count per slack bucket: dump the report to a file first
report_timing -max_paths 1000 -delay_type max > timing.rpt
# then, from a Unix shell:
#   grep "slack" timing.rpt | awk '{print $NF}' | sort -n | uniq -c

# Interactive: highlight critical paths in GUI
gui_highlight_timing_path [get_timing_paths -max_paths 5 -nworst 1]

Interactive Lab — Critical Path Analyzer
Build a logic chain stage by stage. Watch the path delay accumulate and see whether the path meets timing — and which gate is the bottleneck.

Frequently Asked Questions

What is the critical path, and why does it limit frequency?

The critical path is the timing path with the smallest (or most negative) setup slack in the design. It limits the maximum operating frequency because the clock period must be long enough for data to propagate through the longest combinational path: fmax = 1 / (T_cq + T_comb_max + T_su). Every other path in the design has more margin. Improving fmax means only the critical path needs to be shortened — the rest are already fine. After fixing the critical path, the next-worst path becomes the new critical path, and the process repeats.
Why does setup analysis use maximum delays while hold analysis uses minimum delays?

Setup analysis asks: can the data arrive in time for the capture FF? The worst case for this question is when the data path is as slow as possible — maximum delay. STA uses max (late) delay values from the slow (SS) corner to find the worst-case arrival time for setup checks. Hold analysis asks: does the data stay stable long enough after the clock edge? The worst case here is when the data changes as fast as possible — minimum delay. STA uses min (early) delay values from the fast (FF) corner for hold checks. Running both checks across their respective corners (best-case/worst-case analysis, typically refined with on-chip-variation derates) ensures each check is evaluated pessimistically.
What is a path group?

A path group is a collection of timing paths that share the same capture clock. By default, STA tools create one path group per clock domain. Path groups determine how the tool reports and optimizes paths — report_timing shows results per group, and synthesis/optimization tools improve each group's worst path independently. You can create custom path groups to give specific paths higher reporting priority, or to separate I/O paths from register-to-register paths for different optimization treatment.
How do I identify the bottleneck on a failing path?

In the timing report, look at the "Incr" column — this shows the incremental delay at each step (wire + cell delay combined). The single largest increment on the data path is the bottleneck gate or wire segment. Common bottlenecks: wide adders/multipliers with many gate levels, high-fanout nets driving many cells (the load capacitance slows the driver), long wires with high RC (post-layout), and cells operating at High-Vt for power savings on a timing-critical path. Fix the bottleneck first — upsizing it or replacing it with a faster implementation gives the maximum slack recovery for the minimum change.
What is the difference between WNS and TNS?

Worst Negative Slack (WNS) is the slack of the single most-failing path — the most negative number in the entire design. Total Negative Slack (TNS) is the sum of all negative slacks across all failing paths. WNS tells you how far your worst path is from closure. TNS tells you the overall volume of timing work remaining. A design with WNS = −0.5 ns and TNS = −0.5 ns has one failing path. A design with WNS = −0.5 ns and TNS = −50 ns has a hundred failing paths of similar severity. Both numbers together give a complete picture of timing health.
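The WNS/TNS distinction fits in a few lines of code; the slack values below are invented for illustration:

```python
# End-point slacks in ns (hypothetical); negative means failing
slacks = [-0.5, -0.3, -0.1, 0.2, 0.7]

neg = [s for s in slacks if s < 0]
wns = min(slacks)    # worst negative slack: the single worst end point
tns = sum(neg)       # total negative slack: volume of work remaining
print(wns, round(tns, 1))  # → -0.5 -0.9
```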
Can a chip fail in silicon even though STA passed?

Yes — several scenarios cause this. (1) Wrong SDC: false paths masking real violations, or overly loose I/O delays. (2) Missing corners: timing only closed at TT, but the SS corner fails. (3) Post-silicon variation beyond modeled limits: real silicon varies more than library models. (4) IR drop: power grid resistance causes the supply voltage to sag under load, slowing cells below library specs. (5) Crosstalk (SI): an aggressor net switching couples noise onto a victim net, changing its delay in ways STA may underestimate. Real silicon sign-off requires multi-corner analysis, SI-aware STA, and IR-drop-aware timing — not just nominal STA.

Explore Further

← SDC Constraints