
Static Timing Analysis & Timing Closure

Complete STA guide: setup/hold analysis, OCV derating, multi-corner verification, critical path fixing techniques, and production timing sign-off for VLSI chip design.

By EcrioniX Engineering Team · Published June 14, 2026 · ~4,900 words · 15 min read

1. What is Static Timing Analysis?

Static Timing Analysis (STA) is the exhaustive, simulation-free verification that every register-to-register path in a design meets its timing constraints. Unlike functional simulation, which exercises only the paths its test vectors happen to activate, STA needs no vectors and analyzes every path.

STA operates on a directed acyclic graph (DAG) of timing arcs:

Timing Path Graph (STA View)
        CLK
         │
         ▼
    ┌─────────┐   D1    ┌─────┐   D2    ┌─────┐   D3    ┌─────────┐
    │   FF1   │──────►  │ AND │──────►  │ OR  │──────►  │   FF2   │
    │  (reg)  │  15ps   └─────┘  12ps   └─────┘  10ps   │  (reg)  │
    └─────────┘                                         └─────────┘
         │                                                   │
    Clock(FF1)            Wire delays:                  Clock(FF2)
    arrival: T_c1         D1→AND: 5ps                   arrival: T_c2
                          AND→OR: 4ps
                          OR→FF2: 6ps

Skew = T_c2 - T_c1
Total combinational delay: 15+12+10+5+4+6 = 52ps

STA checks:
  Setup: 52 + T_c1 ≤ T_period + T_c2 - T_setup
  Hold:  52 + T_c1 ≥ T_c2 + T_hold
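The graph view can be sketched in a few lines of Python: arrival times are propagated through the timing arcs in topological order, keeping the latest arrival (used for setup) and the earliest arrival (used for hold) at each pin. This is only an illustration, not how a production STA engine is built. The pin names are hypothetical, and reading the 15/12/10ps figures as the FF1 clock-to-Q, AND, and OR cell delays is an assumption about the diagram.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# (from_pin, to_pin, delay_ps). Pin names are hypothetical; delays come from
# the diagram, with 15/12/10ps assumed to be FF1 clock-to-Q, AND and OR delays.
ARCS = [
    ("FF1/CK", "FF1/Q", 15),
    ("FF1/Q", "AND/A", 5),
    ("AND/A", "AND/Y", 12),
    ("AND/Y", "OR/A", 4),
    ("OR/A", "OR/Y", 10),
    ("OR/Y", "FF2/D", 6),
]


def arrival_times(arcs):
    """Return (latest, earliest) arrival time per pin, in ps from the path start."""
    fanin = {}
    for src, dst, delay in arcs:
        fanin.setdefault(src, [])
        fanin.setdefault(dst, []).append((src, delay))

    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    order = TopologicalSorter(
        {pin: [src for src, _ in arcs_in] for pin, arcs_in in fanin.items()}
    ).static_order()

    latest, earliest = {}, {}
    for pin in order:
        if not fanin[pin]:  # a start point of the graph
            latest[pin] = earliest[pin] = 0
        else:
            latest[pin] = max(latest[s] + d for s, d in fanin[pin])
            earliest[pin] = min(earliest[s] + d for s, d in fanin[pin])
    return latest, earliest


latest, earliest = arrival_times(ARCS)
print(latest["FF2/D"], earliest["FF2/D"])  # 52 52 for this single-path example
```

With only one path the latest and earliest arrivals coincide. They diverge as soon as a pin has more than one fan-in arc, which is why a setup check and a hold check on the same endpoint can be limited by different paths.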

2. Setup Time Analysis

Setup time is the minimum time data must be stable BEFORE the capturing clock edge. If data arrives too late — setup violation — the flip-flop may capture an intermediate (metastable) value.

Setup Timing Equation:

  T_data_arrival ≤ T_clock_arrival - T_setup - T_uncertainty

Where:
  T_data_arrival  = launch_clock_edge + logic_delay + wire_delay
  T_clock_arrival = capture_clock_edge + clock_network_delay(FF2)
  T_setup         = flip-flop's minimum setup requirement (library)
  T_uncertainty   = clock jitter + OCV margin + cross-talk guard

Setup Slack = T_clock_arrival - T_setup - T_uncertainty - T_data_arrival
  Positive slack → PASS (timing margin available)
  Negative slack → FAIL (timing violation, fix required)

Example (1GHz clock, 1ns period):
  T_period = 1000ps
  Launch clock edge = 0ps
  Logic + wire delay = 750ps
  T_data_arrival = 0 + 750 = 750ps
  Capture edge = 1000ps
  Clock network delay(FF2) = 150ps
  T_clock_arrival = 1000 + 150 = 1150ps
  T_setup = 50ps
  T_uncertainty = 80ps
  Setup slack = 1150 - 50 - 80 - 750 = +270ps ✓ PASS (good margin)

Tight example (same path, 2GHz clock):
  T_period = 500ps
  T_data_arrival = 750ps (same path)
  T_clock_arrival = 500 + 150 = 650ps
  Setup slack = 650 - 50 - 80 - 750 = -230ps ✗ FAIL (needs fix!)
Setup Violation Waveform
At 1GHz (period = 1000ps):
  Launch clock edge at FF1: 0ps (next edge at 1000ps)
  Data at FF2 D pin: holds the previous value, then transitions to the new value at 750ps
  Capture clock edge at FF2: 1150ps (skewed late by 150ps)
  Data must be stable by: 1150 - 50 = 1100ps
  Data arrives at: 750ps, stable before 1100ps ✓ No violation

At 2GHz (period = 500ps):
  Capture clock edge at FF2: 500 + 150 = 650ps
  Data must be stable by: 650 - 50 = 600ps
  Data arrives at: 750ps ← 150ps TOO LATE! ✗ Setup violation, flip-flop captures wrong data

Note: the waveform shows only the setup window. Subtracting the 80ps clock uncertainty as well gives the slack values from the equations (+270ps and -230ps).
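The setup check reduces to a single subtraction, sketched below with the numbers from the two worked examples. The function and parameter names are illustrative only, and the launch clock is assumed to have zero network delay, as in the example.

```python
def setup_slack(period_ps, data_delay_ps, capture_clock_delay_ps,
                t_setup_ps, t_uncertainty_ps, launch_edge_ps=0):
    """Setup slack in ps; positive passes, negative is a violation."""
    data_arrival = launch_edge_ps + data_delay_ps
    clock_arrival = launch_edge_ps + period_ps + capture_clock_delay_ps
    return clock_arrival - t_setup_ps - t_uncertainty_ps - data_arrival


# Same 750ps path checked at 1GHz and at 2GHz.
for period in (1000, 500):
    slack = setup_slack(period, data_delay_ps=750, capture_clock_delay_ps=150,
                        t_setup_ps=50, t_uncertainty_ps=80)
    print(f"period={period}ps slack={slack:+d}ps {'PASS' if slack >= 0 else 'FAIL'}")
```

Because the period appears only in the capture clock arrival, every picosecond added to the period adds a picosecond of setup slack. That is why a setup violation can always be bought off by running slower.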

3. Hold Time Analysis

Hold time is the minimum time data must remain stable AFTER the clock edge. Hold violations are particularly dangerous — they cannot be fixed by simply reducing the clock frequency, and they cause permanent functional failures.

Hold Timing Equation:

  T_data_arrival ≥ T_clock_arrival + T_hold

Hold Slack = T_data_arrival - T_clock_arrival - T_hold
  Positive → PASS
  Negative → FAIL (must fix, cannot use slower clock!)

Hold violation example:
  T_data_arrival = 25ps (very short combinational path)
  T_clock_arrival(FF2) = 100ps (FF2 clock arrives late relative to the launch clock)
  T_hold = 30ps
  Hold slack = 25 - 100 - 30 = -105ps ✗ FAIL!

  Data changes at 25ps, but FF2 needs data stable until 100+30=130ps
  Data is GONE before hold window ends!

Fixes for hold violation (add delay to data path):
  1. Insert delay cell (DELAYX1, DELAYX2) in path
  2. Use a smaller drive strength cell (slower into the same load)
  3. Increase wire length (more capacitance = more delay)

Note: These deliberately slow data, the opposite of a setup fix!
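A small sketch of the hold check, using the numbers from the example, makes it clear why slowing the clock does not help: the clock period never enters the calculation. The function names are illustrative only.

```python
def hold_slack(data_arrival_ps, capture_clock_arrival_ps, t_hold_ps):
    """Hold slack in ps. Note that the clock period is not an input."""
    return data_arrival_ps - capture_clock_arrival_ps - t_hold_ps


def padding_needed(slack_ps):
    """Minimum extra data-path delay (ps) that brings a failing path to zero slack."""
    return max(0, -slack_ps)


slack = hold_slack(data_arrival_ps=25, capture_clock_arrival_ps=100, t_hold_ps=30)
print(slack, padding_needed(slack))  # -105 105
```

Any delay added to fix hold also consumes setup slack on the same path, so hold padding is normally checked against the setup corners before it is accepted.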

4. On-Chip Variation (OCV)

Real chips have process variation across the die. Two nominally identical gates at different die locations experience different transistor thresholds (Vt), oxide thickness, and metal resistivity. OCV modeling accounts for this by derating path delays, either with flat pessimistic factors or, in the advanced methods, statistically.

OCV Derating — Pessimistic Timing Model
Without OCV (ideal, all gates identical), 2GHz clock (period = 500ps):
  Launch path delay: 500ps
  Capture clock delay: 200ps
  T_clock_arrival = 500 + 200 = 700ps
  Setup slack = 700 - 50 - 80 - 500 = +70ps

With OCV (realistic derating):
  Launch path uses SLOW (late) derating:
    Cells: × 1.10 (10% slower)
    Wires: × 1.05 (5% more RC)
    Treating the whole 500ps as cell delay (a simplification): 500ps × 1.10 = 550ps (worst case)
  Capture clock uses FAST (early) derating:
    Clock to FF2: 200ps × 0.95 = 190ps (5% faster → earlier clock)
    (Earlier capture edge = tighter setup check)

OCV-derated setup check:
  T_data_arrival = 550ps (launch, worst slow)
  T_clock_arrival = 500ps + 190ps = 690ps (capture, fast)
  Setup slack = 690 - 50 - 80 - 550 = +10ps (was +70ps without OCV!)
  OCV significantly reduces slack margins.

Advanced OCV Methods:
  AOCV (Advanced OCV): Path-depth-dependent derating (shorter paths = more variation)
  SOCV (Statistical OCV): Propagates per-cell delay sigmas statistically, less pessimistic than AOCV
  POCV (Parametric OCV): Uses sigma-based statistical models (most accurate)
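Flat OCV derating can be sketched as two scale factors applied in opposite directions: a late factor on the launch data path and an early factor on the capture clock path. The sketch below uses this example's numbers (500ps data path, 200ps capture clock delay, 500ps period, 50ps setup, 80ps uncertainty). It assumes a single derate per path rather than separate cell and wire factors, and the names are illustrative.

```python
def ocv_setup_slack(data_delay_ps, capture_clock_delay_ps, period_ps,
                    t_setup_ps, t_uncertainty_ps,
                    late_derate=1.0, early_derate=1.0):
    """Setup slack with flat OCV derates.

    late_derate scales the launch data path (made slower);
    early_derate scales the capture clock path (made faster).
    """
    data_arrival = data_delay_ps * late_derate
    clock_arrival = period_ps + capture_clock_delay_ps * early_derate
    return clock_arrival - t_setup_ps - t_uncertainty_ps - data_arrival


nominal = ocv_setup_slack(500, 200, 500, 50, 80)
derated = ocv_setup_slack(500, 200, 500, 50, 80, late_derate=1.10, early_derate=0.95)
print(round(nominal, 3), round(derated, 3))  # approximately 70 and 10
```

With these inputs the two derates together remove about 60ps of slack: 50ps from the slower data path and 10ps from the earlier capture clock.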

5. Multi-Corner Multi-Mode (MCMM)

MCMM verifies timing across ALL combinations of operating conditions simultaneously. A chip that passes at only one corner can still fail in silicon at a different temperature, voltage, or process lot.

Dimension | Corners | Setup Check Corner | Hold Check Corner
Process | SS / TT / FF (slow/typical/fast) | Slow (SS) | Fast (FF)
Voltage | 0.72V / 0.8V / 0.88V (±10%) | Low V (0.72V) | High V (0.88V)
Temperature | -40°C / 25°C / 125°C | Depends on node | Depends on node
Operating Mode | Functional / Scan / Low-Power | All modes | All modes
MCMM Corner Set (Typical Production Sign-Off):

Setup corners (worst for setup: slow data, tight clock window):
  SLOW_0.72V_125C (slow process, low voltage, hot)
  SLOW_0.72V_-40C (slow process, low voltage, cold; temperature inversion can make cold the slower condition at low voltage)

Hold corners (worst for hold: fast data, late capture clock):
  FAST_0.88V_-40C (fast process, high voltage, cold)
  FAST_0.88V_25C (fast process, high voltage, room temp)

Total corners in sign-off: typically 8–16 corner/mode combos
Run time per corner: 2–8 hours (distributed compute)
Total MCMM run: 16 corners × 6 hours = 96 CPU-hours
Distributed across 32-core cluster: wall time ~3 hours per iteration
Typical iterations to clean sign-off: 5–10
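Sign-off across corners amounts to taking the worst slack per check type over every corner/mode combination. A minimal sketch follows. The corner names are the ones listed above, but the slack values are hypothetical placeholders, not results from any real run, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CornerResult:
    corner: str
    mode: str
    check: str     # "setup" or "hold"
    wns_ps: float  # worst negative slack reported for this corner/mode


# Hypothetical slack values, for illustration only.
RESULTS = [
    CornerResult("SLOW_0.72V_125C", "functional", "setup", 12.0),
    CornerResult("SLOW_0.72V_-40C", "functional", "setup", -8.0),
    CornerResult("FAST_0.88V_-40C", "functional", "hold", 5.0),
    CornerResult("FAST_0.88V_25C", "scan", "hold", 3.0),
]


def worst_per_check(results):
    """Return the limiting corner/mode result for each check type."""
    worst = {}
    for r in results:
        if r.check not in worst or r.wns_ps < worst[r.check].wns_ps:
            worst[r.check] = r
    return worst


def signoff_clean(results):
    """A design signs off only if no corner/mode has negative slack."""
    return all(r.wns_ps >= 0 for r in results)


def wall_hours(n_corners, cpu_hours_per_corner, cores):
    """Ideal wall time, assuming the work parallelizes perfectly across cores."""
    return n_corners * cpu_hours_per_corner / cores


for check, r in worst_per_check(RESULTS).items():
    print(check, r.corner, r.mode, r.wns_ps)
print("clean:", signoff_clean(RESULTS))
print("wall hours:", wall_hours(16, 6, 32))  # 3.0
```

The `wall_hours` helper reproduces the 96 CPU-hour arithmetic above. Real runs rarely scale perfectly, so treat its result as a lower bound.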

6. Clock Uncertainty

Clock signals are not perfect. Three distinct sources of uncertainty eat into your timing budget at every cycle:

Clock Uncertainty Sources
Ideal clock: period = 1000ps

1. Jitter (cycle-to-cycle period variation):
   Ideal: equally spaced edges
   Real:  edges shift by ±30ps
   ±30ps jitter

2. Skew (spatial variation across die):
   FF1 clock arrival: 150ps
   FF2 clock arrival: 185ps
   Skew = 35ps

3. OCV on clock tree itself:
   Derating adds ±20ps uncertainty to buffered clock paths

Total uncertainty added to timing check:
  Jitter: 30ps
  Skew:   35ps
  OCV:    20ps
  Guard:  15ps (margin)
  Total:  100ps subtracted from clock period for setup

Effective timing budget = 1000 - 100 = 900ps available for logic
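The budget arithmetic can be written as a short sketch that sums the contributions linearly, as the figure does. The dictionary keys and the function name are illustrative.

```python
# Contributions in ps, taken from the figure above.
UNCERTAINTY_PS = {"jitter": 30, "skew": 35, "ocv": 20, "guard": 15}


def logic_budget(period_ps, contributions):
    """Clock period left for logic after subtracting every uncertainty term."""
    return period_ps - sum(contributions.values())


print(sum(UNCERTAINTY_PS.values()), logic_budget(1000, UNCERTAINTY_PS))  # 100 900
```

The same 100ps costs twice as much in relative terms at a 500ps period (20% of the cycle instead of 10%). This is one reason uncertainty budgets get more scrutiny as clock frequency rises.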

7. Critical Path Fixing Techniques

When a path fails timing, the fix must address the specific bottleneck. STA reports show which gate or wire contributes the most delay — target those first.

7.1 Cell Upsizing

Cell Upsizing Effect on Critical Path
Original path (timing FAIL, slack = -50ps, 750ps total path delay). Segment shown:

  FF1_Q ─[BUF_X1]─ 30ps ─[AND_X1]─ 40ps ─[OR_X1]─ 35ps ─ FF2_D
         (drive X1)       (drive X1)       (drive X1)

  Segment delay: 30 + 40 + 35 = 105ps (the rest of the 750ps path is not shown)
  Bottleneck: AND_X1 has 40ps delay due to high fan-out (10 loads)

Fix: Upsize AND_X1 → AND_X4 (4× drive strength):
  AND_X4 drives same 10 loads at: 15ps (instead of 40ps)
  New segment delay: 105 - (40-15) = 80ps (saved 25ps)
  Wire delay also decreases because the stronger driver produces faster slews
  Effective savings: 30ps total

New path: FF1_Q ─[BUF_X1]─[AND_X4]─[OR_X1]─ FF2_D = 750 - 30 = 720ps total
New slack: -50 + 30 = -20ps ✗ still failing; the remaining 20ps must come from a further fix such as buffering or restructuring
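Picking the stage to fix first is a matter of ranking per-stage delays, which a timing report lists stage by stage. The sketch below uses the three stage delays from the example. The list-of-tuples representation and the function names are illustrative, not a tool report format.

```python
# (cell, delay_ps) for the segment shown in the example.
STAGES = [("BUF_X1", 30), ("AND_X1", 40), ("OR_X1", 35)]


def bottleneck(stages):
    """Return the (cell, delay_ps) stage that contributes the most delay."""
    return max(stages, key=lambda stage: stage[1])


def swap_cell(stages, old_cell, new_cell, new_delay_ps):
    """Return a new stage list with one cell replaced by a resized variant."""
    return [(new_cell, new_delay_ps) if cell == old_cell else (cell, delay)
            for cell, delay in stages]


def total_delay(stages):
    return sum(delay for _, delay in stages)


worst_cell, worst_delay = bottleneck(STAGES)
upsized = swap_cell(STAGES, worst_cell, "AND_X4", 15)
print(worst_cell, total_delay(STAGES), total_delay(upsized))  # AND_X1 105 80
```

Upsizing also raises the input capacitance seen by the previous stage, so the gain at the resized cell should be checked against any slowdown in its driver.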

7.2 Buffer Insertion (Logical Effort)

Long wires with many fanout loads benefit from intermediate buffering to reduce delay.

Buffer Insertion Formula (Logical Effort):

Optimal number of buffer stages for a load:
  n = log(h) / log(e) = ln(h)
  h = electrical effort (output capacitance / input capacitance)
  e = Euler's number (~2.718)

For h = 64 (64× more capacitive load than driving gate):
  n = log(64) / log(2.718) ≈ 4.16 → insert 4 buffers
  Each buffer stage has effort: 64^(1/4) = 2.83×
  Vs. direct drive effort: 64× (much worse)

  Delay with 4 buffers: proportional to 4 × 2.83 = 11.3 effort units
  Delay direct: 64 effort units
  Improvement: 64/11.3 = 5.7× faster with optimal buffering!

  (This form ignores gate parasitic delay. Including it pushes the best stage effort toward about 4 and reduces the stage count.)

Practical: EDA tools handle this automatically via "buffer tree synthesis"
Manual intervention needed when auto-fix breaks constraints
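The buffer-count arithmetic can be checked with a short sketch. It follows the simplified model used here (effort delay only, no parasitic delay), and the function name is illustrative.

```python
import math


def buffer_chain(h):
    """Stage count, per-stage effort and effort delay for electrical effort h.

    Simplified logical-effort model: delay is n * h**(1/n), parasitics ignored.
    """
    n = max(1, round(math.log(h)))  # n = ln(h), rounded to a whole stage count
    stage_effort = h ** (1 / n)
    delay = n * stage_effort
    return n, stage_effort, delay


n, stage_effort, delay = buffer_chain(64)
print(n, round(stage_effort, 2), round(delay, 1), round(64 / delay, 1))
```

For h = 64 this gives 4 stages with a per-stage effort of about 2.83 and roughly a 5.7× reduction in effort delay compared with driving the load directly.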

7.3 Gate Restructuring

Rebalancing logic depth, that is, converting a serial chain of gates into a balanced binary tree, can cut delay by 40–60% on deep combinational paths.

Logic Restructuring — Reduce Depth
Before (sequential chain, 4 levels deep):

  A ─[AND]─ AB ─[AND]─ ABC ─[AND]─ ABCD ─[AND]─ ABCDE
      5ps        5ps         5ps          5ps
  (B, C, D, E enter at successive gates)
  Total: 20ps delay (4 gate levels)

After (balanced binary tree, 3 levels deep):

  A ─┐
     [AND]─ AB ──┐
  B ─┘           [AND]─ ABCD ─┐
  C ─┐           │            [AND]─ ABCDE
     [AND]─ CD ──┘            │
  D ─┘                        │
  E ──────────────────────────┘
      5ps          5ps           5ps
  Total: 5+5+5 = 15ps delay (3 gate levels) ← 25% faster!

Tools:
  Logic synthesis restructuring (Synopsys Design Compiler)
  Post-route ECO gate replacement (Cadence Innovus)
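The depth saving generalizes beyond five inputs. The sketch below assumes 2-input gates with a uniform delay per gate, as in the example, and compares chain depth with balanced-tree depth. The function names are illustrative.

```python
def chain_depth(n_inputs):
    """Gate levels when n inputs are combined one at a time with 2-input gates."""
    return max(0, n_inputs - 1)


def tree_depth(n_inputs):
    """Gate levels of a balanced tree of 2-input gates: ceil(log2(n))."""
    return max(0, n_inputs - 1).bit_length()


def delay_saving(n_inputs, gate_delay_ps=5):
    before = chain_depth(n_inputs) * gate_delay_ps
    after = tree_depth(n_inputs) * gate_delay_ps
    return before, after, (before - after) / before


print(delay_saving(5))   # (20, 15, 0.25)
print(delay_saving(16))  # (75, 20, ...) the saving grows with input count
```

Under these assumptions the 5-input case saves 25%, and the saving grows quickly with width, since chain depth is linear in the input count while tree depth is logarithmic.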

8. Timing Sign-Off — TSMC N5 Example

Apple M2 Chip (TSMC N5)

Timing Slack Distribution at Sign-Off (Apple M2)
[Figure: histogram of number of paths (millions) versus slack (ps), with slack axis ticks at -50, 0, +20, +50, +100, +200, +500. All paths must move to the right of 0.]

Percentage of violating paths over the course of timing closure:
  Week 0: 8.3% (millions of violations)
  Week 2: 2.1%
  Week 4: 0.3%
  Week 6: 0.02%
  Week 8: 0% ← CLEAN SIGN-OFF

Tools used:
  Synopsys PrimeTime (STA engine)
  Cadence Tempus (STA + ECO integration)
  Custom timing scripts (internal, proprietary)

9. STA Tools and ECO Flows

Tool | Vendor | Strength | Key Feature
PrimeTime | Synopsys | Industry reference for STA | POCV, AOCV, graph-based analysis
Tempus | Cadence | ECO-integrated STA | Auto ECO with placement awareness
GoldTime | Siemens | Fast signoff verification | Correlation with Calibre extraction
StarRC | Synopsys | Parasitic extraction | Best accuracy for parasitic delays
Quantus | Cadence | RC extraction | Fast extraction with Innovus integration

10. Timing Closure Checklist

Production Timing Sign-Off Checklist
