Why the Clock Needs a Tree
In RTL simulation, a clock signal arrives simultaneously at every flip-flop. In silicon, it does not. A single wire from the PLL output to a million flip-flops would have an enormous fanout — the driving cell would need to charge the combined input capacitance of every FF at once. The required drive current would be impractical, the wire resistance would add huge RC delay, and the signal would arrive far too late with a badly degraded slew rate.
Clock Tree Synthesis solves this by inserting a balanced tree of buffers and inverters between the clock source and the flip-flops. Each buffer drives only a small number of downstream loads, the tree branches progressively, and the total delay from source to every leaf is made as equal as possible.
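Because each buffer drives only a bounded number of loads, the depth of the tree grows only logarithmically with the number of sinks. A back-of-the-envelope sketch (hypothetical helper, not a tool API):

```python
def tree_levels(num_sinks: int, max_fanout: int) -> int:
    """Minimum number of buffer levels needed so that no stage
    drives more than max_fanout downstream loads."""
    levels, reach = 0, 1
    while reach < num_sinks:
        reach *= max_fanout   # each level multiplies the reachable endpoints
        levels += 1
    return levels
```

With a fanout limit of 16, a million flip-flops need only 5 buffer levels; tightening the limit to 8 raises that to 7 levels, trading insertion delay against per-buffer load.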
What CTS controls
Insertion delay (total latency from source to FF clock pin), clock skew (arrival time difference between FFs), slew rate (transition time at each leaf), and fanout per buffer stage. All four are specified as CTS constraints before the tool runs.
What CTS cannot fix
Floorplan problems that force clock wires to travel long distances, extremely high clock fanout from ungated always-on clocks, and fundamental setup violations caused by too-long combinational paths — these must be resolved before CTS.
Clock Tree Topologies
Different designs use different physical structures for the clock tree, each with trade-offs in skew, power, and routability.
H-Tree
Routes the clock through wire segments shaped like nested letter Hs. Each branch has the same wire length from the center, giving inherent geometric symmetry that minimizes skew before any buffers are inserted. Used in regular, high-performance blocks such as SRAMs, CPU cores, and FPGA fabric.
Balanced Binary Tree
A standard buffer tree where each node drives exactly two children. CTS tools equalize delay by sizing buffers and adjusting wire lengths. Most common in ASIC design — tools like Innovus and ICC2 build this automatically from the placed netlist.
Mesh (Grid)
A global metal grid is driven from multiple points, creating a low-impedance distributed clock network. Extremely low skew (near zero) because every point on the mesh is connected. Very high power due to large capacitance. Used in ultra-high-performance designs like server CPUs (Intel, AMD).
| Topology | Skew | Power | Routability | Typical use |
|---|---|---|---|---|
| H-Tree | Very Low | Low | Requires regular layout | Memory, FPGA fabric, custom blocks |
| Balanced Binary | Low | Medium | Handles irregular floorplan | Standard ASIC, SoC design |
| Mesh | Near Zero | Very High | Needs dedicated metal layers | High-performance server CPUs |
| Hybrid | Low | Medium | Moderate | Mixed-block SoCs with multiple clocks |
Clock Skew — Definition and Types
Clock skew is the difference in clock arrival time between the launch flip-flop and the capture flip-flop on a timing path. It arises from differences in buffer delays, wire lengths, and RC parasitics through the clock tree branches connecting source to each leaf.
Local Skew
The skew between two flip-flops that are directly connected by a timing path (launch FF → capture FF). This is what STA tools analyze per-path. Local skew directly appears in setup and hold equations. Modern CTS targets local skew < 50–100 ps at 7nm.
Global Skew
The maximum skew across the entire clock domain — the difference between the latest and earliest clock arrival among all flip-flops. A useful health metric for the clock tree, but not directly used in per-path STA. Global skew is always at least as large as any local skew.
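The two definitions differ only in which arrival times are compared. A minimal sketch, with hypothetical post-CTS arrival times for illustration:

```python
# Hypothetical clock arrival times at each FF clock pin after CTS (ns)
arrival = {"FF_A": 0.38, "FF_B": 0.46, "FF_C": 0.41, "FF_D": 0.52}

def global_skew(arrivals: dict) -> float:
    # Latest minus earliest arrival across the whole clock domain
    return max(arrivals.values()) - min(arrivals.values())

def local_skew(arrivals: dict, launch: str, capture: str) -> float:
    # Per-path skew between one launch/capture FF pair (what STA uses)
    return arrivals[capture] - arrivals[launch]
```

Here the global skew is 0.14 ns, while the FF_A → FF_B path sees a local skew of only +0.08 ns — the tree can look worse globally than any single timing path experiences.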
| Skew direction | Effect on setup slack | Effect on hold slack | Intuition |
|---|---|---|---|
| Positive (+) — capture clock later | Improves | Tightens | Data has more time to arrive before the capture clock edge |
| Negative (−) — capture clock earlier | Tightens | Improves | Capture edge moves left — data must arrive sooner |
| Zero | Neutral | Neutral | Ideal CTS target — balanced tree |
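The table's two directions follow directly from the setup and hold inequalities. A minimal sketch (variable names are illustrative, times in ns):

```python
def setup_slack(t_clk, skew, t_cq, t_comb_max, t_setup):
    # Capture edge arrives at t_clk + skew; data needs t_cq + t_comb_max,
    # plus the setup window before that edge
    return (t_clk + skew) - (t_cq + t_comb_max + t_setup)

def hold_slack(skew, t_cq, t_comb_min, t_hold):
    # The same skew that helped setup is subtracted from hold margin
    return (t_cq + t_comb_min) - (t_hold + skew)
```

With a 1.0 ns period, +0.08 ns skew, 0.18 ns clock-to-Q, 0.62 ns of logic, and a 0.05 ns setup time, the path has +0.23 ns setup slack — 0.08 ns more than it would at zero skew, and exactly 0.08 ns taken from its hold margin.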
Clock Insertion Delay (Clock Latency)
Clock insertion delay is the total propagation time from the clock source to a flip-flop's clock pin, measured through the PLL, clock network, and buffer tree. It has two components that STA treats separately.
Source Latency
The delay from the PLL or oscillator output to the clock definition point in the design (usually the top-level clock port). This includes off-chip PCB traces, package inductance, and on-chip wiring from the pad to the first clock buffer. Specified in SDC using set_clock_latency -source.
Network Latency
The delay from the clock definition port through all clock buffers and inverters in the clock tree to the flip-flop's clock pin. This is what CTS physically builds. After CTS, the tool annotates actual network latency from extraction. Before CTS, an ideal clock model is used with zero or estimated latency.
Before CTS, STA uses the set_clock_latency estimates. After CTS, the actual annotated latency is used. If estimated and actual latencies differ significantly, timing that passed pre-CTS may fail post-CTS. Always re-run STA with propagated clocks after CTS.

```tcl
# SDC: specify clock latency before CTS
set_clock_latency -source 0.5 [get_clocks CLK]   ;# source: PLL to port
set_clock_latency 1.2 [get_clocks CLK]           ;# network: estimated tree delay

# After CTS: use propagated clocks (actual network delay from extraction)
set_propagated_clock [all_clocks]
```
CTS Goals, Constraints, and Flow
Before running CTS, the designer specifies a set of targets that the tool must meet. These are set as CTS constraints in the tool's configuration or in the SDC.
| Constraint | Typical target | What happens if violated |
|---|---|---|
| Max clock skew | 50–150 ps (7nm–28nm) | Setup/hold timing margin is reduced; failing paths may appear |
| Max insertion delay | 500 ps – 2 ns | Data path timing uses higher latency in STA, may cause setup failures |
| Max slew (transition time) | 100–300 ps | Slow slew increases noise susceptibility and clock-to-Q delay |
| Max fanout per buffer | 8–20 FFs | Too high fanout degrades slew; too low wastes buffer area and power |
| Max capacitance per node | Per-cell library limit | Excessive load slows the buffer, degrading slew and delay |
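A CTS tool enforces limits like these per clock-tree node; the check itself is simple. A toy version, using hypothetical limit values taken from the table above:

```python
# Illustrative limits (a real flow reads these from the CTS spec / library)
CTS_LIMITS = {"max_fanout": 16, "max_slew_ps": 150}

def check_clock_node(name, fanout, slew_ps, limits=CTS_LIMITS):
    """Return human-readable constraint violations for one clock buffer."""
    violations = []
    if fanout > limits["max_fanout"]:
        violations.append(f"{name}: fanout {fanout} exceeds {limits['max_fanout']}")
    if slew_ps > limits["max_slew_ps"]:
        violations.append(f"{name}: slew {slew_ps} ps exceeds {limits['max_slew_ps']} ps")
    return violations
```

A buffer driving 8 loads at 90 ps slew passes clean, while one driving 24 loads at 180 ps reports both a fanout and a slew violation — the kind of node CTO splits into two stages.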
The CTS flow runs in these steps:
```text
1. Clock tree specification
   └── Define: clock roots, exceptions (don't-touch cells, clock gating),
       skew targets, slew targets, buffer/inverter cell list
2. Virtual tree construction
   └── Tool builds a virtual balanced tree ignoring physical layout
3. Physical tree construction
   └── Cells are placed, wires are routed (in-CTS routing)
   └── Buffer sizes are chosen to match delays across branches
4. Clock tree optimization (CTO)
   └── Iterative fixing of skew hotspots, slew violations,
       max-fanout violations
5. Post-CTS timing analysis
   └── STA with propagated clocks (actual annotated delays)
   └── Fix remaining setup/hold violations from skew imbalance
6. Incremental CTS for ECO
   └── Fix specific skew issues without re-running full CTS
```
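Step 2, the virtual tree, amounts to recursively clustering sinks until every node respects the fanout limit. A placement-unaware sketch (hypothetical helpers, not a tool algorithm):

```python
def build_virtual_tree(sinks, max_fanout):
    """Cluster sinks into a balanced buffer tree, ignoring physical
    placement (as the virtual-tree step does). Leaves are sink names;
    internal nodes are ("buf", children) tuples."""
    if len(sinks) <= max_fanout:
        return ("buf", list(sinks))
    group_size = -(-len(sinks) // max_fanout)   # ceiling division
    groups = [sinks[i:i + group_size] for i in range(0, len(sinks), group_size)]
    return ("buf", [build_virtual_tree(g, max_fanout) for g in groups])

def count_leaves(node):
    _, children = node
    return sum(count_leaves(c) if isinstance(c, tuple) else 1 for c in children)
```

Every sink ends up under exactly one leaf cluster and no node exceeds the fanout limit; the physical step then re-clusters by location and sizes each buffer to equalize branch delays.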
Clock Gating Cells in the Clock Tree
Clock gating is the most effective technique for reducing dynamic power in VLSI chips — by stopping the clock to idle registers, switching activity drops to near zero for those flip-flops. However, inserting clock gates into the clock tree has direct implications for CTS and timing.
Integrated Clock Gating (ICG) Cell
An ICG cell is a latch-based AND gate designed specifically for clock gating. The latch samples the enable signal on the clock's low phase, and the AND gate combines the latched enable with the clock. The latch eliminates glitches that a plain AND gate would produce when the enable changes while the clock is high.
ICG placement in the tree
ICG cells are treated as part of the clock tree. CTS must balance clock arrival time through ICG cells just as it balances through regular buffers. An ICG cell adds its own insertion delay (typically 50–150 ps), which must be accounted for in setup and hold analysis for all flip-flops downstream of the gate.
```systemverilog
// RTL clock gating — synthesized into an ICG cell
always_ff @(posedge clk) begin
    if (en) data_reg <= data_in;   // synthesis tool infers ICG from the enable
end

// Or explicit ICG instantiation in RTL (cell name is library-specific)
ICGX1 u_icg (.CLK(clk), .EN(enable), .SE(scan_en), .GCLK(gated_clk));
// Downstream FFs use gated_clk — they stop switching when enable = 0
```
ICG cells must be declared to the tools — via set_clock_gating_check in SDC or as special cells in the CTS spec. If not, the tool may balance through them incorrectly, causing unexpected skew on downstream flip-flop clusters.

During scan test mode, the SE (Scan Enable) pin of the ICG forces the gate open so the scan clock can propagate to all FFs — critical for structural DFT test coverage.
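The latch's glitch-killing role can be seen in a tiny behavioral model: sampling the enable only while the clock is low means the gated clock can never produce a partial pulse, while a plain AND can. (Illustrative Python model; waveforms are sampled two points per clock phase.)

```python
def icg_gate(clk_wave, en_wave):
    """Latch-based ICG: enable is captured during the clock-low phase, so
    the gated clock only changes at a clean rising edge of clk."""
    latched, out = 0, []
    for clk, en in zip(clk_wave, en_wave):
        if clk == 0:
            latched = en            # latch is transparent while clk is low
        out.append(clk & latched)   # AND of clk with the held enable
    return out

def plain_and_gate(clk_wave, en_wave):
    # Naive gating: enable changes pass straight through to the clock
    return [c & e for c, e in zip(clk_wave, en_wave)]

clk = [0, 0, 1, 1, 0, 0, 1, 1]      # two clock cycles, two samples per phase
en  = [0, 0, 0, 1, 1, 1, 1, 1]      # enable rises in the middle of a high phase
```

The plain AND outputs [0, 0, 0, 1, 0, 0, 1, 1] — a runt pulse starting mid-phase, which downstream FFs may see as a spurious clock edge. The ICG outputs [0, 0, 0, 0, 0, 0, 1, 1]: the enable takes effect one cycle later, as a full clean pulse.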
Useful Skew — Intentional Timing Slack
Useful skew is the deliberate introduction of unequal clock arrival times to improve timing margins on specific critical paths. Instead of targeting zero skew everywhere, the CTS tool or timing engineer intentionally delays the capture clock on a setup-critical path — effectively giving data more time to travel through the combinational logic.
When to use useful skew
When a timing path has negative setup slack that cannot be fixed by gate sizing or logic restructuring — typically late in the design cycle when netlist changes are risky. Also used at the block level to trade slack from paths with positive margin to paths that are failing.
Useful skew risks
Every ps of positive skew added for setup removes 1 ps from hold margin for the same path pair. Aggressive useful skew can cause hold violations in the FF/FF path, requiring hold buffer insertion — which increases area, power, and routing congestion.
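That one-for-one trade can be made explicit with a trivial helper (illustrative, times in ns):

```python
def apply_useful_skew(setup_slack_ns, hold_slack_ns, added_skew_ns):
    """Delaying the capture clock by added_skew_ns buys exactly that much
    setup slack and spends the same amount of hold slack on the path pair."""
    return setup_slack_ns + added_skew_ns, hold_slack_ns - added_skew_ns
```

A path at −0.05 ns setup / +0.12 ns hold becomes +0.03 / +0.04 after 0.08 ns of useful skew — both positive. Start from only +0.06 ns of hold margin, and the same move leaves −0.02 ns of hold slack: a new violation that must be repaired with hold buffers.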
Useful skew is applied through commands such as set_clock_skew or through CTS-aware timing optimization (CCOPT in Innovus). The tool automatically balances setup gain against hold risk, inserting hold buffers as needed.

How Skew Appears in STA Reports
After CTS, every timing path report from tools like PrimeTime includes the clock arrival times for both the launch and capture flip-flops. The skew is visible as the difference between these two numbers.
```text
Timing Path Report — Setup Check
─────────────────────────────────────────────────────────
Path: FF_A/Q → [comb logic] → FF_B/D

Launch clock path:
  Clock source (PLL out)        0.00 ns
  Buffer BUF1                   0.15 ns
  Buffer BUF2 (launch, FF_A)    0.38 ns  ← launch latency

Data path:
  FF_A clock-to-Q               0.18 ns
  Combinational delay           0.62 ns
  ──────────────────────────────────────
  Data arrival                  1.18 ns  (0.38 + 0.18 + 0.62)

Capture clock path:
  Buffer BUF3 (capture, FF_B)   0.46 ns  ← capture latency
  Clock period                  1.00 ns
  Capture edge                  1.46 ns  (1.00 + 0.46)
  FF_B setup time              −0.05 ns
  Data required arrival         1.41 ns  (1.46 − 0.05)

Skew = 0.46 − 0.38 = +0.08 ns  (positive — helps setup)
Setup slack = 1.41 − 1.18 = +0.23 ns  ✓ PASS
```
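The slack arithmetic can be replayed directly from the clock and data path numbers — note that data arrival must include the launch clock latency, a common source of confusion when reading reports:

```python
# Values from the example path (ns)
launch_latency, capture_latency = 0.38, 0.46
t_cq, t_comb, t_setup, t_clk = 0.18, 0.62, 0.05, 1.00

data_arrival  = launch_latency + t_cq + t_comb    # when data reaches FF_B/D
capture_edge  = t_clk + capture_latency           # next capture clock edge
data_required = capture_edge - t_setup            # latest allowed arrival
skew          = capture_latency - launch_latency  # positive skew helps setup
setup_slack   = data_required - data_arrival      # positive slack → path passes
```

Dropping the launch latency from data_arrival would overstate the slack by 0.38 ns — the report only balances because both the launch and capture branches of the clock tree are counted.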
Common CTS Problems and Fixes
| Problem | Root cause | Fix |
|---|---|---|
| High local skew on critical path | Unequal buffer stages or wire length to launch vs capture FF | Add/resize buffers on the shorter branch to equalize delay; use useful skew carefully |
| Clock slew violation | Too much capacitance on a clock node (high fanout or long wire) | Insert additional buffer stage to reduce per-buffer load; upsize buffer drive strength |
| Hold violations post-CTS | Positive skew from CTS or useful skew tightened hold margin below zero | Insert delay buffers (filler buffers) on the data path of the violating paths |
| Clock tree power too high | Overly deep buffer tree or insufficient clock gating | Add clock gating (ICG) on idle register banks; target minimum insertion delay in CTS spec |
| Skew mismatch between corners | Different RC extraction or cell delay at SS vs FF corner | Run CTS with worst-case extraction; use AOCV/POCV derating on clock tree cells |
| Post-route skew degradation | Clock wires re-routed during detailed routing, changing RC | Use clock net shielding and NDR (Non-Default Rules) for clock wires; re-run CTO post-route |
Explore Further
Clock Gating (ICG)
Study the Integrated Clock Gating cell that CTS must handle specially — how latch-based AND gates prevent glitches and how the SE pin opens the gate during scan test.
Glitch-Free Clock Mux
Learn how to safely switch between two clock sources without glitches — the exact topology CTS uses for muxed clock trees and how SDC constrains each selection.
Clock Domain Crossing
Clock skew between independent clock domains is unbounded — see how 2-FF synchronizers, handshake protocols, and async FIFOs safely cross asynchronous clock boundaries.