This article covers what clock skew is, how balanced clock trees are built with buffers, the clock tree synthesis (CTS) flow, the H-tree topology, and how skew affects setup and hold timing margins.
Figure: H-tree balanced clock distribution, in which every FF sees an identical clock path length.
Clock skew is the spatial difference in clock arrival time between two flip-flops in a digital circuit. Ideally, all flip-flops receive the clock edge simultaneously. In practice, differences in wire lengths, buffer delays, and process/voltage/temperature (PVT) variations cause the clock to arrive at different times at different FFs.
Positive skew: the capture FF clock arrives later than the launch FF clock. Helps setup timing, hurts hold timing.
Negative skew: the capture FF clock arrives earlier than the launch FF clock. Hurts setup timing, helps hold timing.
Zero skew: both FFs receive the clock at the same time. This is the CTS goal, achieved by balancing buffer paths.
Useful skew: intentional skew added to a critical path to borrow time from a nearby non-critical path.
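The sign convention above can be sketched in a few lines of Python. This is a minimal illustration, assuming skew is measured as capture arrival minus launch arrival; the function names and the arrival times are hypothetical and chosen only for the example.

```python
def clock_skew_ps(launch_arrival_ps, capture_arrival_ps):
    """Skew in ps: capture clock arrival minus launch clock arrival."""
    return capture_arrival_ps - launch_arrival_ps


def classify_skew(skew_ps, tolerance_ps=0):
    """Label a skew value; anything within +/- tolerance counts as zero skew."""
    if skew_ps > tolerance_ps:
        return "positive"   # capture clock is later: helps setup, hurts hold
    if skew_ps < -tolerance_ps:
        return "negative"   # capture clock is earlier: hurts setup, helps hold
    return "zero"           # the standard CTS target


# Hypothetical (launch, capture) clock arrival times in ps.
for launch, capture in [(400, 450), (450, 400), (420, 420)]:
    skew = clock_skew_ps(launch, capture)
    print(f"launch={launch} capture={capture} skew={skew} -> {classify_skew(skew)}")
```

The `tolerance_ps` parameter reflects that real trees never reach exactly zero skew; a tool would treat anything inside its skew target as balanced.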
| Term | Definition | Cause | STA Modeling |
|---|---|---|---|
| Clock Skew | Spatial difference in clock arrival between two FFs | Unequal wire length, buffer mismatch | Captured in clock tree analysis |
| Clock Jitter | Cycle-to-cycle variation at a single clock point | PLL noise, power supply, thermal | set_clock_uncertainty |
| Clock Latency | Total delay from clock source to FF clock pin | Buffer delays + wire delay | Source + network latency in CTS |
| Insertion Delay | Delay inserted by the clock tree (buffers + wires) | CTS buffer chain | Balanced to minimize skew |
| Transition Time | Clock signal rise/fall time at FF clock pin | Buffer drive strength, capacitance | CTS targets: 60–200ps |
Clock skew directly modifies the effective timing window available for data to propagate between flip-flops.
| Skew Type | Setup Timing | Hold Timing | When useful |
|---|---|---|---|
| Positive Skew (+) | Relaxed (extra window) | Tightened (harder) | Fix setup violations on slow paths |
| Negative Skew (−) | Tightened (harder) | Relaxed (extra window) | Fix hold violations (rare useful case) |
| Zero Skew (≈0) | Nominal | Nominal | Standard CTS target |
Hold violations are dangerous: a hold check compares launch and capture at the same clock edge, so it does not depend on the clock period, and a hold failure cannot be fixed by slowing down the clock. Fixing one requires adding delay buffers on the data path. Excessive positive skew creates hold violations, so CTS engineers must carefully balance skew optimization against hold risk.
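The effect of skew on both checks can be worked through numerically. The sketch below uses the standard single-cycle setup and hold slack relations with skew defined as capture minus launch arrival. The `Path` fields and all delay values are assumptions made for illustration, and the same clock-to-Q delay is used for both checks to keep it short.

```python
from dataclasses import dataclass


@dataclass
class Path:
    clk_to_q_ps: int    # launch FF clock-to-Q delay
    logic_max_ps: int   # slowest data path delay (used by the setup check)
    logic_min_ps: int   # fastest data path delay (used by the hold check)
    setup_ps: int       # capture FF setup time
    hold_ps: int        # capture FF hold time


def setup_slack_ps(path, period_ps, skew_ps):
    """Required time (next edge + skew - setup) minus latest data arrival."""
    required = period_ps + skew_ps - path.setup_ps
    arrival = path.clk_to_q_ps + path.logic_max_ps
    return required - arrival


def hold_slack_ps(path, skew_ps):
    """Earliest data arrival minus (hold + skew); the period does not appear."""
    arrival = path.clk_to_q_ps + path.logic_min_ps
    required = path.hold_ps + skew_ps
    return arrival - required


# Hypothetical path at 1 GHz (1000 ps period).
path = Path(clk_to_q_ps=100, logic_max_ps=880, logic_min_ps=20,
            setup_ps=50, hold_ps=30)

for skew in (-50, 0, 50, 100):
    print(f"skew={skew:+4d} ps  "
          f"setup slack={setup_slack_ps(path, 1000, skew):+4d} ps  "
          f"hold slack={hold_slack_ps(path, skew):+4d} ps")
```

With these numbers, +50 ps of skew turns a 30 ps setup violation into 20 ps of positive slack, while +100 ps pushes the hold slack to -10 ps. Changing the period changes only the setup result.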
CTS is a key physical design step that builds the clock distribution network from the clock source to all FF sinks.
| Step | Action | Goal |
|---|---|---|
| 1 | Define clock tree source (PLL output / clock port) | Establish tree root |
| 2 | Clock tree topology selection (H-tree, fishbone, mesh) | Geometric balance |
| 3 | Buffer insertion — CLKBUF, CLKINV pairs, ICG cells | Drive strength + delay balance |
| 4 | Skew balancing — adjust buffer sizes, add filler buffers | Skew < target (e.g., 50ps) |
| 5 | Transition time fixing — size up buffers on high-cap nets | Slew < 150ps at FF clock pins |
| 6 | Post-CTS timing analysis (setup + hold with real clock latency) | No setup/hold violations |
| 7 | Useful skew optimization (optional) — intentional skew for timing | Recover timing on critical paths |
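Step 4 of the flow can be illustrated with a toy balancing pass. This is only a sketch, assuming a single fixed-delay padding buffer and known per-sink latencies. Real CTS tools also resize buffers, reroute wires, and account for load and slew. The sink names and delays are hypothetical.

```python
def skew_ps(latency_ps):
    """Global skew: largest minus smallest clock latency across all sinks."""
    return max(latency_ps.values()) - min(latency_ps.values())


def pad_fast_branches(latency_ps, buffer_delay_ps):
    """Add whole delay buffers to fast branches without overshooting the slowest.

    Returns (balanced latencies, buffers added per sink). The residual skew is
    always smaller than one buffer delay.
    """
    slowest = max(latency_ps.values())
    balanced, buffers_added = {}, {}
    for sink, latency in latency_ps.items():
        count = (slowest - latency) // buffer_delay_ps  # floor: never overshoot
        buffers_added[sink] = count
        balanced[sink] = latency + count * buffer_delay_ps
    return balanced, buffers_added


# Hypothetical clock latencies (ps) after initial buffer insertion.
before = {"ff_a": 420, "ff_b": 350, "ff_c": 395, "ff_d": 310}
after, added = pad_fast_branches(before, buffer_delay_ps=20)

target_ps = 50
print("skew before:", skew_ps(before), "ps")
print("skew after: ", skew_ps(after), "ps", "(meets target)" if skew_ps(after) <= target_ps else "")
print("buffers added:", added)
```

Padding only the fast branches keeps the slowest branch as the reference, so total insertion delay does not grow. The trade-off is extra buffer area and clock power.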
H-tree: recursive H-shaped branching. Geometrically balanced, with equal wire length to all sinks. Best for regular layouts (memory arrays, datapath).
Fishbone (spine): horizontal spine with vertical branches. Common in modern place-and-route tools. Less geometrically balanced, but easily adaptable to irregular designs.
Clock mesh: grid of clock wires with buffers at intersections. Very low skew but high power consumption. Used in high-performance CPUs (Intel, AMD).
Multi-source: multiple clock tree roots driven from different PLLs. Common in multi-domain SoCs. Each domain is balanced independently.
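The geometric balance of an H-tree can be checked with a short recursive sketch. It assumes an idealized H-tree in which each level runs a horizontal segment and then a vertical segment of half the current span, and it ignores buffers and routing blockages. The function name and dimensions are hypothetical.

```python
def h_tree_sinks(x, y, span, levels, wire_len=0.0):
    """Return (sink_x, sink_y, wire_length_from_root) for a recursive H-tree.

    At each level the clock runs half a span horizontally, then half a span
    vertically, reaching the four corners of an H centered on (x, y).
    """
    if levels == 0:
        return [(x, y, wire_len)]
    half = span / 2
    sinks = []
    for dx in (-half, half):        # left and right ends of the horizontal bar
        for dy in (-half, half):    # top and bottom of each vertical leg
            sinks += h_tree_sinks(x + dx, y + dy, half, levels - 1,
                                  wire_len + abs(dx) + abs(dy))
    return sinks


# Two levels rooted at the origin with a 400 um top-level span: 4**2 sinks.
sinks = h_tree_sinks(0.0, 0.0, span=400.0, levels=2)
lengths = {length for _, _, length in sinks}

print("sink count:", len(sinks))
print("distinct root-to-sink wire lengths (um):", sorted(lengths))
```

Every sink reports the same root-to-sink wire length, which is the property that gives the H-tree its low skew on regular layouts.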
| Cell Type | Function | When Used |
|---|---|---|
| CLKBUF | Non-inverting clock buffer — high drive, symmetric rise/fall | All clock tree levels |
| CLKINV pair | Two inverters (double inversion = non-inverting) for precise delay | Fine-tuning delay in branches |
| ICG (Integrated Clock Gate) | AND gate + latch for glitch-free clock gating | Power management, clock enables |
| CLKDIV | Clock divider — generates /2, /4 derived clocks | Multi-frequency clock domains |
| MUX (clock mux) | Selects between two clock sources | DFT, scan, test modes |
    # Define primary clock: 1 GHz (period in ns)
    create_clock -name clk -period 1.0 -waveform {0 0.5} [get_ports CLK]

    # Clock uncertainty: jitter + OCV margin (pre-CTS)
    set_clock_uncertainty -setup 0.15 [get_clocks clk]
    set_clock_uncertainty -hold 0.05 [get_clocks clk]

    # Post-CTS: reduce uncertainty (clock tree is now modeled precisely)
    set_clock_uncertainty -setup 0.08 [get_clocks clk]
    set_clock_uncertainty -hold 0.03 [get_clocks clk]

    # Set clock source latency (from PLL to chip pin)
    set_clock_latency -source 0.5 [get_clocks clk]

    # Define clock as ideal (pre-CTS, no tree modeled)
    set_ideal_network [get_ports CLK]

    # After CTS: propagate clock (use actual tree delays)
    set_propagated_clock [get_clocks clk]

    # Useful skew: apply intentional skew to a path.
    # Left commented out: set_clock_skew is a legacy command, not standard SDC;
    # check your tool's documentation for its useful-skew mechanism.
    # set_clock_skew -setup 0.1 -hold 0 [get_cells launch_ff]
CTS Rule of Thumb: Every 100µm of additional wire length adds ~5–10ps of clock skew (depending on metal layer and load). At 1GHz (1ns period), 50ps of skew consumes 5% of the timing budget. This is why balanced H-trees minimize wire length imbalance.
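The rule of thumb is simple arithmetic. The sketch below reproduces it, assuming the 5–10 ps per 100 µm range quoted above. The function names and the 500 µm imbalance are hypothetical.

```python
def skew_from_wire_imbalance_ps(extra_wire_um, ps_per_100um):
    """Estimated skew from extra wire length on one branch (linear rule of thumb)."""
    return extra_wire_um / 100 * ps_per_100um


def budget_fraction(skew_ps, freq_ghz):
    """Fraction of the clock period consumed by the given skew."""
    period_ps = 1000 / freq_ghz
    return skew_ps / period_ps


# Hypothetical 500 um length imbalance between two branches.
low = skew_from_wire_imbalance_ps(500, ps_per_100um=5)
high = skew_from_wire_imbalance_ps(500, ps_per_100um=10)
print(f"estimated skew: {low:.0f} to {high:.0f} ps")

# 50 ps of skew at 1 GHz, as in the text.
print(f"budget consumed at 1 GHz: {budget_fraction(50, 1.0):.0%}")
```

The linear model is only a first-order estimate. Actual wire delay also depends on the metal layer, the load, and the buffering along the route.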
Clock skew is the difference in arrival time of the clock signal at two different flip-flops: Skew = t_capture_clock − t_launch_clock. Positive skew helps setup and hurts hold. Zero skew is the CTS target. Typical post-CTS skew is below 50–100 ps.
A balanced clock tree uses carefully sized and positioned buffers (CLKBUFs) to ensure every FF sees the same total clock path delay from the source. CTS tools insert buffers iteratively to equalize branch delays, targeting near-zero skew across the design.
The clock tree source (root) is the starting point — typically a clock input port or PLL output buffer. CTS tools start here and fan out through buffers to all FF clock pins. The source has zero (reference) latency; all downstream FFs accumulate insertion delay from the source.
Skew = static spatial difference between two FFs (different locations see different arrival times). Jitter = dynamic temporal variation at one point (clock edge shifts cycle-to-cycle). Both are captured in STA: skew from CTS analysis, jitter via set_clock_uncertainty.
Useful skew is skew introduced intentionally to fix timing. Delaying the clock to the capture FF (positive skew) increases the setup margin on that path, at the cost of hold margin on the same path and setup margin on the next path launched by that FF. CTS tools or a timing ECO can apply useful skew to rescue critical paths without re-routing data paths.
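The borrowing trade-off can be sketched for a chain FF1 → FF2 → FF3 in which the clock to FF2 is delayed. This is a minimal illustration, assuming the slacks before adjustment are already known. The function names and slack values are hypothetical.

```python
def apply_useful_skew(setup_in_ps, setup_out_ps, hold_in_ps, delay_ps):
    """Effect of delaying FF2's clock by delay_ps on the paths around FF2.

    setup_in / hold_in: slacks of the path ending at FF2 (FF1 -> FF2).
    setup_out: setup slack of the path starting at FF2 (FF2 -> FF3).
    """
    return {
        "setup_in": setup_in_ps + delay_ps,    # FF2 captures later: setup gains
        "setup_out": setup_out_ps - delay_ps,  # FF2 launches later: setup loses
        "hold_in": hold_in_ps - delay_ps,      # FF2 captures later: hold loses
    }


def choose_useful_skew_ps(setup_in_ps, setup_out_ps, hold_in_ps):
    """Smallest clock delay that fixes the incoming setup violation, or None
    if it would push the outgoing setup slack or incoming hold slack negative."""
    needed = max(0, -setup_in_ps)
    allowed = min(setup_out_ps, hold_in_ps)
    return needed if needed <= allowed else None


# Hypothetical slacks (ps): FF1->FF2 fails setup by 40 ps; FF2->FF3 has 120 ps spare.
delay = choose_useful_skew_ps(setup_in_ps=-40, setup_out_ps=120, hold_in_ps=70)
print("chosen delay:", delay, "ps")
print(apply_useful_skew(-40, 120, 70, delay))
```

If the incoming hold slack were only 30 ps, the same 40 ps delay would create a hold violation and the function would return `None`. This is the hold risk that limits how much time can be borrowed.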