The Four Types of Timing Paths
STA classifies every timing path by its start point and end point. Every path in the design falls into one of four categories. Understanding which type you are looking at tells you immediately what constraints apply and what the fix space is.
Register → Register
From a flip-flop's clock pin (launch) through its Q output, through combinational logic, to the next flip-flop's D input (capture). The most common path type — all internal datapath timing is R2R. Constrained by create_clock period.
Input → Register
From a primary input port through combinational logic to a flip-flop's D input. Constrained by set_input_delay. The external logic that drives the input port consumes part of the clock period before the signal reaches the chip.
Register → Output
From a flip-flop's clock pin (launch) through its Q output and combinational logic to a primary output port. Constrained by set_output_delay. The downstream chip receiving the output needs the data to arrive before its own setup deadline.
Input → Output
A purely combinational path with no sequential element — from input port directly to output port. Constrained by both set_input_delay and set_output_delay together. The available budget is the clock period minus both delays.
| Path Type | Launch point | Capture point | Constraint | Limits clock frequency? |
|---|---|---|---|---|
| R2R | FF clock pin | FF D pin | create_clock | Yes (sets internal fmax) |
| I2R | Input port | FF D pin | set_input_delay | Yes |
| R2O | FF clock pin | Output port | set_output_delay | Yes |
| I2O | Input port | Output port | Both I/O delays | Yes |
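As a rough illustration of how the constraints in the table turn into a delay budget for the combinational logic, the Python sketch below computes that budget for each path type. The function name, its parameters, and the example numbers are hypothetical, and the model assumes a single-cycle path with an ideal clock; real tools also account for skew, uncertainty, and per-cell library data.

```python
def combinational_budget(path_type, period, input_delay=0.0, output_delay=0.0,
                         clk_to_q=0.0, setup=0.0):
    """Return the time (ns) left for combinational logic on a single-cycle path.

    Simplified model: ideal clock, no skew, no clock uncertainty.
    """
    if path_type == "R2R":
        # Launch FF clock-to-Q and capture FF setup both eat into the period.
        return period - clk_to_q - setup
    if path_type == "I2R":
        # External logic consumes input_delay before the signal reaches the port.
        return period - input_delay - setup
    if path_type == "R2O":
        # The downstream device reserves output_delay after the port.
        return period - clk_to_q - output_delay
    if path_type == "I2O":
        # Both external delays are taken out of the same clock period.
        return period - input_delay - output_delay
    raise ValueError(f"unknown path type: {path_type}")


# Hypothetical numbers: 1.0 ns clock, 0.3 ns input delay, 0.4 ns output delay.
i2o_budget = combinational_budget("I2O", 1.0, input_delay=0.3, output_delay=0.4)
r2r_budget = combinational_budget("R2R", 1.0, clk_to_q=0.18, setup=0.045)
```

With these assumed values the feed-through path has far less room than the internal path, which is why I2O paths are often the first to fail when I/O delays are tightened.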
How STA Builds the Timing Graph
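STA works on a timing graph: a directed acyclic graph (DAG) derived from the gate-level netlist. In the simplified view used here, every logic gate becomes a node and every wire becomes a directed edge with a propagation delay; production tools work at a finer grain, with pins as nodes and cell and net timing arcs as edges. Every flip-flop is split into a source (its clock pin, launching through Q) and a sink (its D pin), which keeps the graph acyclic even when registers sit in feedback loops.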
Timing Graph Construction:
1. Parse gate-level netlist (after synthesis or P&R)
└── Every cell → node (with delay from .lib timing arcs)
└── Every net → edge (wire delay from parasitic extraction)
2. Annotate delays
└── Cell delays: from timing library at the specified PVT corner
└── Wire delays: from SPEF/SPICE parasitic extraction (post-layout)
└── Pre-layout: wire load models (estimated)
3. Identify start/end points
└── Start points: FF clock pins, primary input ports
└── End points: FF D pins, primary output ports
4. Enumerate all paths (conceptually)
└── In practice: forward/backward traversal to compute
arrival time and required time at each node
5. Compute slack at every end point
└── Setup slack = required time − arrival time
└── Report worst slack paths (top-N violators)
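Steps 1 to 4 above can be sketched in a few lines of Python. This is a toy model, not how any particular tool stores its graph: the edge list, pin names, and delays are invented for illustration, each node is a pin or port, and each edge is one delay arc. It relies on the standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical timing arcs: (from_pin, to_pin, delay_ns).
EDGES = [
    ("FF_A/CK", "FF_A/Q", 0.18),   # clock-to-Q arc of the launch flip-flop
    ("FF_A/Q", "U1/Z", 0.10),      # net + cell delay folded into one arc
    ("IN0", "U1/Z", 0.05),         # primary input feeding the same gate
    ("U1/Z", "FF_B/D", 0.07),      # path to the capture flip-flop
    ("U1/Z", "OUT0", 0.12),        # path to a primary output
]


def build_graph(edges):
    """Return (fanin map, start points, end points) for a list of timing arcs."""
    fanin = {}
    drivers, loads = set(), set()
    for src, dst, delay in edges:
        fanin.setdefault(dst, {})[src] = delay
        fanin.setdefault(src, {})
        drivers.add(src)
        loads.add(dst)
    start_points = sorted(drivers - loads)   # nodes nothing drives
    end_points = sorted(loads - drivers)     # nodes that drive nothing
    return fanin, start_points, end_points


fanin, start_points, end_points = build_graph(EDGES)

# A topological order lets one forward sweep visit every node after its inputs,
# so paths never have to be enumerated one by one.
order = list(TopologicalSorter({n: set(p) for n, p in fanin.items()}).static_order())
```

Here the start points come out as the flip-flop clock pin and the input port, and the end points as the flip-flop D pin and the output port, matching step 3.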
Arrival Time, Required Time, and Slack
Every node in the timing graph has two key numbers: arrival time (when does the signal actually get here?) and required time (when does it need to be here?). The difference between them is slack.
Forward propagation (arrival time)
STA propagates arrival times forward from all start points through the graph. At each gate, arrival time at the output = max(arrival times at all inputs) + gate delay. The max selects the latest-arriving input, which is the constraining input for setup analysis; hold analysis uses min instead, to track the earliest arrival.
Backward propagation (required time)
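STA propagates required times backward from all end points. At each gate, required time at input = required time at output − gate delay. When a node fans out to several gates, its required time is the minimum over all branches, because the tightest downstream deadline wins. The result shows how late a signal can be at any point and still meet every downstream deadline.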
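The two propagation passes can be illustrated with a small Python sketch. The graph, pin names, and delays below are hypothetical, delays are attached to edges rather than gates (so the max is taken over "input arrival + arc delay"), and only setup analysis is modeled. It uses graphlib from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical graph: node -> {predecessor: arc delay in ns}.
FANIN = {
    "FF_A/CK": {},
    "FF_A/Q": {"FF_A/CK": 0.18},
    "U1/Z": {"FF_A/Q": 0.10},
    "U2/Z": {"FF_A/Q": 0.30},
    "U3/Z": {"U1/Z": 0.05, "U2/Z": 0.08},
    "FF_B/D": {"U3/Z": 0.02},
}
# Required time at the end point: 1.0 ns period minus 0.045 ns setup (assumed).
REQUIRED_AT_END = {"FF_B/D": 0.955}

order = list(TopologicalSorter({n: set(p) for n, p in FANIN.items()}).static_order())

# Forward pass: latest arrival over all inputs (start points arrive at t = 0).
arrival = {}
for node in order:
    arrival[node] = max(
        (arrival[pred] + delay for pred, delay in FANIN[node].items()),
        default=0.0,
    )

# Backward pass: tightest deadline over all fanout branches.
required = dict(REQUIRED_AT_END)
for node in reversed(order):
    for pred, delay in FANIN[node].items():
        candidate = required[node] - delay
        required[pred] = min(required.get(pred, float("inf")), candidate)

slack = {node: required[node] - arrival[node] for node in order}
```

In this example every node on the slow branch through U2 ends up with the same slack as the end point, while U1 on the faster branch has more, which is how per-node slack exposes the critical path without listing paths explicitly.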
Reading a PrimeTime Timing Report
Every STA engineer spends significant time reading timing path reports from tools like Synopsys PrimeTime or Cadence Tempus. The reports follow a consistent layout across tools, and once you know how to read it, you can quickly locate the bottleneck cell.
============================================================
Timing Path Report: Setup Check
Path Group: CLK
Path Type: max (setup)
============================================================
Point Incr Path
─────────────────────────────────────────────────────────
clock CLK (rise edge) 0.000 0.000
clock network delay (ideal) 0.000 0.000
u_pipe_A/clk (DFF_X1) 0.000 0.000 r ← launch FF clock pin
u_pipe_A/Q (DFF_X1) 0.180 0.180 f ← clock-to-Q delay
u_and0/A (AND2_X1) 0.042 0.222 f ← net (wire) delay
u_and0/Z (AND2_X1) 0.065 0.287 f ← gate delay (AND is non-inverting)
u_xor1/A (XOR2_X2) 0.038 0.325 f
u_xor1/Z (XOR2_X2) 0.092 0.417 f ← gate delay
u_add/A[3] (ADDER_X1) 0.031 0.448 f
u_add/SUM[3] (ADDER_X1) 0.340 0.788 r ← adder is slow!
u_reg_B/D (DFF_X1) 0.025 0.813 r ← wire to capture FF
data arrival time 0.813 ← total path delay
─────────────────────────────────────────────────────────
clock CLK (rise edge) 1.000 1.000
clock network delay (ideal) 0.000 1.000
u_reg_B/clk (DFF_X1) 0.000 1.000 r ← capture FF clock
library setup time -0.045 0.955 ← setup time subtracted
data required time 0.955
─────────────────────────────────────────────────────────
data required time 0.955
data arrival time -0.813
─────────────────────────────────────────────────────────
slack (MET) +0.142 ← positive = pass
Reading the "Incr" column
The "Incr" column shows the incremental delay added at each step: rows for cell input pins carry the net (wire) delay, and rows for cell output pins carry the cell delay. The largest single increment points to the bottleneck. In the example above, u_add/SUM[3] adds 0.340 ns, so the adder is the critical cell on this path.
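The same reading can be done mechanically. The Python sketch below re-derives the arrival time, bottleneck, and slack from the Incr values of the example report; the rows are copied by hand from that report, and the variable names are illustrative only.

```python
# (point, incr_ns) rows from the data path of the example report.
ROWS = [
    ("u_pipe_A/Q", 0.180),
    ("u_and0/A", 0.042),
    ("u_and0/Z", 0.065),
    ("u_xor1/A", 0.038),
    ("u_xor1/Z", 0.092),
    ("u_add/A[3]", 0.031),
    ("u_add/SUM[3]", 0.340),
    ("u_reg_B/D", 0.025),
]

CLOCK_PERIOD_NS = 1.000
SETUP_NS = 0.045

# Data arrival time is the running sum of the Incr column.
arrival = sum(incr for _, incr in ROWS)

# The bottleneck is the row with the largest single increment.
worst_point, worst_incr = max(ROWS, key=lambda row: row[1])

# Setup slack = required time - arrival time.
required = CLOCK_PERIOD_NS - SETUP_NS
slack = required - arrival

print(f"arrival={arrival:.3f} worst={worst_point} ({worst_incr:.3f}) slack={slack:.3f}")
```

The adder alone accounts for roughly 40% of the total data path delay, which is why it is the first place to look.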
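r / f annotations
"r" = rising transition, "f" = falling transition. Cell delays differ for rising and falling edges because pull-up (PMOS) and pull-down (NMOS) drive strengths are asymmetric, and STA reports the worst-case transition at each pin. Inverting cells (INV, NAND, NOR) flip the edge, non-inverting cells (BUF, AND, OR) preserve it, and non-unate cells such as XOR gates or adder sum outputs can produce either edge, so the r/f annotation often changes along arithmetic paths.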
Logic Depth and Gate-Level Analysis
Logic depth is the number of logic gate levels on the combinational path between two registers. Each gate level adds propagation delay. A path with 20 gate levels at 50 ps average delay contributes 1 ns of combinational delay — if your clock period is 1 ns, there is zero budget left for setup time and clock-to-Q.
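The arithmetic in the paragraph above can be turned around to ask how many gate levels a clock period can afford. The Python sketch below is a back-of-the-envelope estimate only: the function name is hypothetical, the clock-to-Q and setup values are borrowed from the example report, and a uniform average gate delay is assumed.

```python
def max_logic_levels(period_ps, clk_to_q_ps, setup_ps, avg_gate_delay_ps):
    """Estimate how many gate levels fit between two registers.

    Assumes an ideal clock and the same average delay for every level.
    """
    budget_ps = period_ps - clk_to_q_ps - setup_ps
    return budget_ps // avg_gate_delay_ps


# 1 ns clock, 180 ps clock-to-Q, 45 ps setup, 50 ps per gate level.
levels = max_logic_levels(1000, 180, 45, 50)

# Combinational delay of the 20-level path from the text.
deep_path_delay_ps = 20 * 50
```

Under these assumptions only about 15 levels fit, so the 20-level path in the text fails by several gate delays even before wire delay is considered.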
| Logic function | Typical gate levels | Typical delay (28nm) | Optimization strategy |
|---|---|---|---|
| 2:1 Mux | 1–2 | 80–120 ps | Synthesis restructuring |
| 8-bit adder (RCA) | 16–18 | 800–1000 ps | Replace with CLA/carry-select |
| 8-bit adder (CLA) | 6–8 | 300–400 ps | Gate sizing, VT swap |
| 16-bit comparator | 8–10 | 400–500 ps | Tree structure |
| 32-bit multiplier | 20–30 | 1.5–3 ns | Pipeline, Booth encoding |
| Priority encoder (16-bit) | 4–6 | 200–300 ps | Restructure OR tree |
# PrimeTime command to report logic levels on critical paths
report_timing -max_paths 10 -nworst 1 \
-path_type full_clock \
-delay_type max \
-sort_by slack
# Show per-stage detail (input pins, nets, transition times, capacitance, crosstalk deltas)
report_timing -max_paths 5 -input_pins -nets -transition_time \
-capacitance -crosstalk_delta
# Report paths by logic depth (useful to find structurally deep paths)
report_timing -max_paths 20 -group_count 5 \
-slack_lesser_than 0.5 ;# near-critical paths too
The number of logic levels is reported as "data path / logic levels" in PrimeTime and Tempus. A path with 25+ logic levels at advanced nodes almost always needs pipelining — no amount of gate sizing will fix a 25-level path if the target clock period is 1 ns.
Critical Path Fixing — Ordered by Impact
Not all timing fixes have the same impact. These techniques are ordered from most structural (largest impact, done early) to most local (smallest impact, done late in the flow).
1. Pipelining
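Insert a flip-flop in the middle of a long combinational path, splitting it into two shorter paths, each of which gets the full clock period. This adds one cycle of latency and, when the split is well balanced, can come close to doubling the achievable frequency for that path; the gain is always somewhat less than 2x because each stage still pays clock-to-Q and setup time. Best applied early during RTL design.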
2. Logic restructuring
Rebalance the logic tree to reduce the longest path. Example: a ripple-carry adder replaced by a carry-lookahead adder cuts gate depth by ~60%. Requires RTL or synthesis script changes. Works best on arithmetic-heavy paths.
3. Gate sizing (drive strength)
Replace a weak cell (e.g., AND2_X1) with a stronger version (AND2_X4). Larger drive strength charges downstream capacitance faster, reducing delay. Each upsizing step reduces delay by ~15–25%. Limited by area/power budget.
4. VT swapping
Replace High-Vt cells (slower, lower leakage) with Low-Vt cells (faster, higher leakage) on the critical path. Typically 10–20% delay reduction per swap. Tools do this automatically during timing optimization with a leakage power constraint.
5. Useful skew
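Delay the capture clock slightly (positive skew) to give the data path more time. The added delay contributes directly to setup slack on that path, but it reduces hold margin on the same path and takes the same amount of time away from the paths launched by the delayed flip-flop. See the Clock Tree page for details.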
6. Physical optimization
Move critical cells closer together to reduce wire delay. Reroute high-fanout nets with wider wires. Add repeater buffers to break long RC chains. These are post-placement fixes performed by the P&R tool during timing-driven optimization.
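Two of the fixes above, pipelining and useful skew, lend themselves to quick arithmetic checks. The Python sketch below uses simplified textbook formulas with hypothetical function names and numbers; it ignores clock uncertainty, on-chip variation, and wire delay changes, so treat it as an estimate rather than a sign-off calculation.

```python
def fmax_mhz(comb_delay_ns, clk_to_q_ns=0.18, setup_ns=0.045):
    """Maximum frequency of a register-to-register stage (ideal clock)."""
    return 1000.0 / (clk_to_q_ns + comb_delay_ns + setup_ns)


def setup_slack(period, clk_to_q, comb_max, setup, capture_skew=0.0):
    """Setup slack; delaying the capture clock moves the deadline later."""
    return (period + capture_skew) - (clk_to_q + comb_max) - setup


def hold_slack(clk_to_q, comb_min, hold, capture_skew=0.0):
    """Hold slack; delaying the capture clock makes the hold check harder."""
    return (clk_to_q + comb_min) - (hold + capture_skew)


# Pipelining: split a 2.0 ns combinational path into two balanced 1.0 ns halves.
speedup = fmax_mhz(1.0) / fmax_mhz(2.0)   # below 2x because of register overhead

# Useful skew: 50 ps of capture-clock delay on a path failing setup by 25 ps.
setup_before = setup_slack(1.0, 0.18, 0.80, 0.045)
setup_after = setup_slack(1.0, 0.18, 0.80, 0.045, capture_skew=0.05)
hold_before = hold_slack(0.18, 0.02, 0.03)
hold_after = hold_slack(0.18, 0.02, 0.03, capture_skew=0.05)
```

In this example the skew turns a 25 ps setup violation into 25 ps of positive slack while the hold margin shrinks by the same 50 ps, which is the trade-off a useful-skew fix has to respect.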
Near-Critical Paths and Timing Margin
A design that meets timing with only 10 ps of slack on hundreds of paths is fragile. Any post-sign-off variation — ECO buffers added for functional bugs, slight floorplan changes, or different parasitic extraction — can push those paths into violation. Good timing closure targets a healthy margin above zero.
| Slack range | Status | Action |
|---|---|---|
| < 0 ps | Failing — must fix | Apply fixes before tapeout |
| 0 – 50 ps | Marginally passing | Monitor; any ECO may cause violations |
| 50 – 200 ps | Healthy margin | Safe for most ECOs and sign-off |
| > 200 ps | Over-designed | Consider frequency increase or power reduction |
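The slack ranges in the table can be applied to a list of end-point slacks with a small helper. The Python sketch below is illustrative: the function name and the sample slack values are made up, and the table does not say which bucket a value of exactly 50 ps or 200 ps belongs to, so the boundary handling here is an assumption.

```python
from collections import Counter


def classify_slack(slack_ps):
    """Map a setup slack in picoseconds to the status names used in the table."""
    if slack_ps < 0:
        return "failing"
    if slack_ps < 50:          # assumption: exactly 50 ps counts as healthy
        return "marginal"
    if slack_ps <= 200:        # assumption: exactly 200 ps counts as healthy
        return "healthy"
    return "over-designed"


# Hypothetical end-point slacks in ps, e.g. extracted from a timing report.
slacks_ps = [-12, 8, 35, 120, 180, 450]
histogram = Counter(classify_slack(s) for s in slacks_ps)
```

A histogram like this gives a quick view of how many paths sit in the fragile 0 to 50 ps band rather than just the single worst slack.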
# Report top-20 near-critical paths (slack < 0.2 ns)
report_timing -max_paths 20 \
-slack_lesser_than 0.2 \
-delay_type max
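# Check path count per slack value: pt_shell does not support Unix pipes,
# so write the report to a file (timing.rpt is an example name) and post-process it
report_timing -max_paths 1000 -delay_type max > timing.rpt
# In a Unix shell: grep "slack" timing.rpt | awk '{print $NF}' | sort -n | uniq -c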
# Interactive: highlight critical paths in GUI
gui_highlight_timing_path [get_timing_paths -max_paths 5 -nworst 1]
Frequently Asked Questions
report_timing shows results per group, and synthesis/optimization tools improve each group's worst path independently. You can create custom path groups to give specific paths higher reporting priority, or to separate I/O paths from register-to-register paths for different optimization treatment.
Explore Further
RTL Pipelining
Pipelining is the most effective fix for a deep critical path — see how to identify where to insert pipeline registers in RTL to split a long combinational path into two short ones.
RTL Design Techniques
Explore RTL-level strategies that directly reduce critical path depth — logic restructuring, carry-lookahead arithmetic, and resource sharing that synthesis tools use to optimize timing.
CDC in RTL Design
Clock Domain Crossing paths are a special category of timing paths that STA cannot analyze with standard checks — see how synchronizer paths are handled separately from the main timing graph.