VLSI · Physical Design · Stage 3 of 5

Clock Tree Synthesis (CTS)

Distribute the clock from source to every flip-flop with minimal skew and controlled insertion delay — the heartbeat of every synchronous design.

Stage: After Placement, Before Routing
Tools: Innovus (clock_design), ICC2 (synthesize_clock_trees)
Goal: Skew < 100ps, meet the insertion delay target
Output: Clock tree netlist + updated DEF

1. What is Clock Tree Synthesis?

In any synchronous digital design, every flip-flop must be clocked. A single clock source drives thousands (or millions) of flip-flop clock pins, far more capacitance than one driver can switch with a sharp edge. If the clock arrives at different flip-flops at different times, that difference (skew) eats into setup or hold margin and can cause timing violations. CTS builds a physical network of buffers and inverters that distributes the clock with balanced delay to every sink.

Input: Post-placement DEF + SDC clock definitions + CTS spec (target skew, insertion delay)
Output: Updated netlist with clock buffers/inverters inserted, DEF with their placement
Done before: Signal routing (clock nets need special treatment — NDR rules, shielding)
[Figure: CTS: Clock Distribution from Source to Flip-Flops. A clock source (PLL / crystal oscillator) drives CLK_BUF_L1, which fans out to branch buffers CLK_BUF_L2_A/B/C; leaf buffers (CLK_BUF_L3) drive sinks FF_A1–FF_A4, FF_B1 and FF_C1–FF_C4. Annotations: insertion delay runs from the clock source to FF.clk; ideally all FFs receive the clock at the same time (skew = 0ps); clock skew = T_arrival(FF_latest) − T_arrival(FF_earliest); target skew < 100ps; insertion delay typically 400–800ps for 1GHz+ designs.]
3-level clock tree: trunk buffer → branch buffers → leaf buffers → flip-flop sinks. Balanced path lengths minimize skew.

2. Key CTS Metrics

Clock Skew

Skew = T_arrival(capturing FF) − T_arrival(launching FF)
Skew type | Definition | Effect on setup | Effect on hold
Zero skew | Both FFs receive the clock simultaneously | Neutral | Neutral
Positive skew | Capturing FF gets the clock later | Relaxed (more time for data) | Tightened (less hold margin)
Negative skew | Capturing FF gets the clock earlier | Tightened (less time for data) | Relaxed (more hold margin)
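A minimal Python sketch of the two skew conventions in use on this page: local skew for one launch/capture pair (capture minus launch) and the global spread across all sinks (latest minus earliest arrival). The sink names and arrival times are hypothetical example values, not tool output.

```python
def local_skew(t_launch_ps: float, t_capture_ps: float) -> float:
    """Skew of one launch->capture pair: T_arrival(capture) - T_arrival(launch)."""
    return t_capture_ps - t_launch_ps


def global_skew(arrivals_ps: dict) -> float:
    """Spread over every sink: latest clock arrival minus earliest."""
    return max(arrivals_ps.values()) - min(arrivals_ps.values())


def classify(skew_ps: float) -> str:
    """Map a local skew value to its effect, following the table above."""
    if skew_ps > 0:
        return "positive: setup relaxed, hold tightened"
    if skew_ps < 0:
        return "negative: setup tightened, hold relaxed"
    return "zero: neutral"


# Hypothetical post-CTS clock arrival times (ps) at four sinks
arrivals = {"FF_A1": 512.0, "FF_A2": 530.0, "FF_B1": 498.0, "FF_C1": 545.0}

s = local_skew(arrivals["FF_A1"], arrivals["FF_C1"])  # path FF_A1 -> FF_C1
print(s, classify(s))          # 33.0 positive: setup relaxed, hold tightened
print(global_skew(arrivals))   # 47.0
```

Note that the two numbers answer different questions: the local value drives the timing check of one path, while the global value is what a CTS skew target usually constrains.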

Setup Timing with Skew

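Setup slack = T_clk + Skew − T_data_path − T_setup

Hold Timing with Skew

Hold slack = T_data_path − T_hold − Skew

Here T_data_path is the launching FF's clock-to-Q delay plus the combinational delay to the capturing FF, T_setup and T_hold are the capturing FF's setup and hold requirements, and Skew is T_arrival(capture) − T_arrival(launch) as defined above.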
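The two formulas as a minimal Python sketch. STA evaluates setup with the longest data-path delay and hold with the shortest, so the sketch takes both; all values are hypothetical picoseconds. Sweeping the skew shows it moving margin from one check to the other.

```python
def setup_slack(t_clk, skew, t_data_max, t_setup):
    """Setup slack = T_clk + Skew - T_data_path(longest) - T_setup."""
    return t_clk + skew - t_data_max - t_setup


def hold_slack(t_data_min, t_hold, skew):
    """Hold slack = T_data_path(shortest) - T_hold - Skew."""
    return t_data_min - t_hold - skew


# Hypothetical 1 GHz path (ps): longest data path 900, shortest 60,
# setup requirement 50, hold requirement 30
for skew in (-40.0, 0.0, 40.0):
    print(skew, setup_slack(1000.0, skew, 900.0, 50.0), hold_slack(60.0, 30.0, skew))
# -40.0 10.0 70.0
# 0.0 50.0 30.0
# 40.0 90.0 -10.0
```

Here +40ps of skew buys exactly 40ps of setup slack and costs exactly 40ps of hold slack, which turns the short path into a hold violation.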

Insertion Delay (Latency)

Insertion delay is the time from the clock source pin to a flip-flop's clock pin. For a balanced tree, all FFs should have approximately equal insertion delay.
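Insertion delay per sink is the sum of stage delays along its source-to-pin path, and skew is the spread of those sums. A minimal Python sketch on a cut-down tree; the node names echo the buffer naming used earlier, and every delay is a hypothetical buffer-plus-wire number, not characterized library data.

```python
# node -> [(child, stage_delay_ps)]; hypothetical delays for a cut-down tree
TREE = {
    "CLK_SRC": [("CLK_BUF_L1", 120.0)],
    "CLK_BUF_L1": [("CLK_BUF_L2_A", 150.0), ("CLK_BUF_L2_B", 140.0)],
    "CLK_BUF_L2_A": [("FF_A1", 180.0), ("FF_A2", 185.0)],
    "CLK_BUF_L2_B": [("FF_B1", 200.0)],
}


def insertion_delays(tree, root):
    """Sum stage delays from root to every sink (a node with no children)."""
    latency = {}
    stack = [(root, 0.0)]
    while stack:
        node, t = stack.pop()
        children = tree.get(node, [])
        if not children:
            latency[node] = t
        for child, delay in children:
            stack.append((child, t + delay))
    return latency


lat = insertion_delays(TREE, "CLK_SRC")
print(dict(sorted(lat.items())))  # {'FF_A1': 450.0, 'FF_A2': 455.0, 'FF_B1': 460.0}
print(max(lat.values()) - min(lat.values()))  # 10.0
```

The FF_B1 branch uses a faster L2 stage (140ps) to offset its slower leaf stage (200ps); that kind of compensation is what keeps a tree balanced.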

Metric | Typical range | Impact
Clock skew | 0–200ps (target <100ps) | Directly reduces setup/hold margin
Insertion delay | 200ps–1ns | Not directly a timing violation, but affects power and jitter
Clock slew | 50–200ps | Slow slew → excess power, jitter, noise coupling
Clock jitter | 10–100ps | Reduces effective setup margin → must be budgeted in STA

3. Clock Tree Topologies

[Figure: three clock tree topologies. H-Tree: equal path lengths from the root, skew ≈ 0, good for regular clock domains (ARM Cortex, GPU cores). Fishbone / Clock Spine: a trunk spine with branches, good for linear layouts, skew depends on spine tap-point spacing (linear arrays, DRAM row circuits). Clock Mesh: a global grid with multiple drive paths, very low skew but high power from many buffers (high-performance Intel/AMD desktop CPUs).]
Three clock tree topologies: H-tree (equal path lengths, near-zero skew), Fishbone/spine (linear layouts), Clock mesh (lowest skew, highest power)
Topology | Skew | Power | Area | Best for
H-Tree | Very low (~0) | Low–Medium | Medium | Regular, symmetric floorplans (GPU shader cores, ARM Cortex)
Fishbone/Spine | Low–Medium | Low | Low | Linear cell arrays, less dense designs
Clock Mesh | Near zero | High | High | High-performance CPUs needing <50ps skew (Intel/AMD)
Hybrid | Low | Medium | Medium | Irregular floorplans: tree to sink clusters, mesh within cluster
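Why an H-tree gives near-zero skew is easy to check geometrically: every root-to-leaf route has the same wirelength. A minimal Python sketch that generates H-tree tap points and their wirelengths; it models geometry only, so equal wirelength becomes equal delay only if buffers and sink loads are matched as well.

```python
def h_tree_leaves(x, y, half_len, levels, horizontal=True, wire=0.0):
    """Return (x, y, wirelength_from_root) for every leaf of an H-tree.

    Each level branches half_len to both sides, alternating horizontal and
    vertical; the span halves after every vertical level (one full "H").
    """
    if levels == 0:
        return [(x, y, wire)]
    leaves = []
    for sign in (-1, 1):
        if horizontal:
            nx, ny, next_half = x + sign * half_len, y, half_len
        else:
            nx, ny, next_half = x, y + sign * half_len, half_len / 2
        leaves += h_tree_leaves(nx, ny, next_half, levels - 1,
                                not horizontal, wire + half_len)
    return leaves


# Two full H levels over a 16 x 16 region centred on the root at (0, 0)
leaves = h_tree_leaves(0.0, 0.0, 4.0, levels=4)
print(len(leaves))                         # 16
print(sorted({w for _, _, w in leaves}))   # [12.0]
```

The 16 taps land at the centres of a 4 × 4 grid of equal cells, which is why H-trees suit evenly populated floorplans and lose balance when flip-flops cluster unevenly.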

4. CTS Implementation Flow

CTS follows a structured flow in commercial tools. Each step refines clock distribution quality.

Step | Action | Goal
1. Pre-CTS setup | Define clock sources, set CTS spec (target skew, max insertion delay, buffer lib) | Constrain the builder
2. Clock tree build | Tool inserts clock buffers/inverters, builds balanced tree from source to sinks | Achieve balanced latency
3. Clock tree optimization | Resize buffers, adjust tap points, insert local inverters to fix skew imbalances | Meet skew target
4. Post-CTS timing | STA with actual clock latencies on both launch and capture paths | Verify setup + hold with skew
5. Hold fixing | Insert delay buffers (hold buffers) on paths with negative hold slack | Fix hold violations introduced by skew
6. Clock routing | Route clock nets with NDR rules (wider wire, more spacing, shielding) | Protect from noise, SI
Critical order: CTS must happen after placement and before signal routing. Clock nets are routed first (with NDR rules), then signal nets fill remaining space.
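Commercial tree builders are proprietary, but step 2 can be illustrated with a classic academic algorithm, the method of means and medians (MMM): recursively bisect the sinks at the median coordinate and tap each subtree at the centroid of its sinks. A minimal Python sketch, not any tool's actual method; buffer sizing, delay balancing and legalization are left out, and the sink coordinates are hypothetical.

```python
def build_mmm(sinks, depth=0):
    """Method of means and medians: split sinks at the median coordinate
    (alternating x and y) and tap each subtree at the mean (centroid) of its sinks.

    sinks: non-empty list of (name, x, y). Returns a sink name for a single
    sink, else ("buf", (cx, cy), [left_subtree, right_subtree]).
    """
    if len(sinks) == 1:
        return sinks[0][0]
    axis = 1 if depth % 2 == 0 else 2          # 1 -> x, 2 -> y
    ordered = sorted(sinks, key=lambda s: s[axis])
    mid = len(ordered) // 2                    # median split: equal halves
    cx = sum(s[1] for s in sinks) / len(sinks)
    cy = sum(s[2] for s in sinks) / len(sinks)
    return ("buf", (cx, cy),
            [build_mmm(ordered[:mid], depth + 1), build_mmm(ordered[mid:], depth + 1)])


# Hypothetical sink coordinates (um)
sinks = [("FF_A1", 0, 0), ("FF_A2", 10, 0), ("FF_B1", 0, 10), ("FF_C1", 10, 10)]
print(build_mmm(sinks))
# ('buf', (5.0, 5.0), [('buf', (0.0, 5.0), ['FF_A1', 'FF_B1']), ('buf', (10.0, 5.0), ['FF_A2', 'FF_C1'])])
```

Splitting at the median keeps each buffer's fanout balanced; placing the buffer at the mean keeps its wires to the sinks short.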

5. NDR Rules, Shielding & Clock Net Special Treatment

Clock nets carry the most sensitive signals in the chip. They need special routing rules to prevent noise coupling and ensure clean signal integrity.

Non-Default Routing Rules (NDR)

Rule | Typical value | Reason
Wire width | 2× minimum width | Lower resistance → less RC delay and slew degradation
Wire spacing | 2× minimum spacing | Reduce capacitive coupling from adjacent signal nets
Via doubling | 2 vias per layer change | Redundancy: a single-via open causes clock failure, i.e. a dead chip
Layer preference | Top metal layers (M5+) | Lower resistance per unit length, less congestion
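A first-order Python sketch of what the width and spacing rules buy. The per-micron resistance and capacitance values are hypothetical round numbers, not from any process; real values come from parasitic extraction.

```python
def wire_rc(length_um, width_x, spacing_x,
            r_ohm_per_um=2.0, c_area=0.05, c_fringe=0.05, c_couple=0.08):
    """First-order wire R (ohm) and C (fF) under width/spacing multipliers.

    Hypothetical per-um values at 1x rules: R scales as 1/width, area cap
    scales with width, fringe cap is held constant, and coupling cap to each
    of the two neighbours scales as 1/spacing.
    """
    r = r_ohm_per_um / width_x * length_um
    c_cpl = 2 * c_couple / spacing_x * length_um
    c_total = (c_area * width_x + c_fringe) * length_um + c_cpl
    return r, c_total, c_cpl


def distributed_rc_delay_ps(r_ohm, c_ff):
    # Elmore delay of a distributed RC line is R*C/2; ohm * fF = 1e-3 ps
    return 0.5 * r_ohm * c_ff * 1e-3


for label, w, s in (("default 1x/1x", 1, 1), ("NDR 2x/2x", 2, 2)):
    r, c, c_cpl = wire_rc(500, w, s)
    print(f"{label}: R={r:.0f} ohm  C={c:.1f} fF  "
          f"coupling={c_cpl / c:.0%}  delay={distributed_rc_delay_ps(r, c):.2f} ps")
```

With these made-up numbers a 500um wire drops from 65ps to about 29ps, and coupling falls from about 62% to about 35% of its total capacitance; the exact figures depend entirely on the assumed per-micron values.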

Clock Shielding

Shielding surrounds the clock wire with VDD or VSS wires on both sides. This prevents switching noise from adjacent data signals from coupling onto the clock net (which would cause jitter).
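One first-order way to see why shielding reduces jitter uses the Miller coupling factor (MCF): a neighbour switching in the same direction as the clock makes the coupling capacitance look like 0×Cc, a quiet neighbour 1×Cc, and one switching in the opposite direction 2×Cc. The spread between those cases shows up as edge-to-edge delay variation on the clock. A minimal Python sketch with a lumped 0.69·R·C delay estimate; the drive resistance and capacitances are hypothetical.

```python
def lumped_delay_ps(r_drive_ohm, c_ground_ff, c_couple_ff, mcf):
    # 50% delay of a lumped RC stage is about 0.69*R*C; ohm * fF = 1e-3 ps.
    # mcf: 0 = neighbour switches same direction, 1 = quiet, 2 = opposite.
    return 0.69 * r_drive_ohm * (c_ground_ff + mcf * c_couple_ff) * 1e-3


def crosstalk_spread_ps(r_drive_ohm, c_ground_ff, c_couple_ff):
    """Worst-case minus best-case delay caused by switching neighbours."""
    return (lumped_delay_ps(r_drive_ohm, c_ground_ff, c_couple_ff, 2)
            - lumped_delay_ps(r_drive_ohm, c_ground_ff, c_couple_ff, 0))


R = 500.0
# Unshielded: 40 fF couples to switching DATA wires, 60 fF to ground
print(crosstalk_spread_ps(R, c_ground_ff=60.0, c_couple_ff=40.0))
# Shielded: the same 40 fF now couples to static VSS rails, i.e. counts as ground
print(crosstalk_spread_ps(R, c_ground_ff=100.0, c_couple_ff=0.0))
```

In this idealization the unshielded wire sees about 27.6ps of data-dependent delay spread and the shielded one none, at the same total capacitance.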

[Figure: Clock Net: NDR Rules vs Standard Signal Routing. Left, standard signal net: CLK routed between DATA wires at 1× width and 1× spacing with single vias; noise couples onto CLK. Right, clock net with NDR + shielding: CLK at 2× width and 2× spacing with double vias, VSS shields on both sides, DATA beyond the shields; no noise coupling.]
Left: Standard routing — adjacent signals couple noise onto clock wire. Right: NDR rules + VSS shielding eliminate coupling, 2× width lowers resistance.

6. Useful Skew (Intentional Skew)

Instead of fighting all skew, useful skew deliberately introduces it to fix timing violations. The key insight: if a launch→capture path has a setup violation, delaying the capturing FF's clock gives the data more time to arrive. That time is borrowed from the next stage: the path launched by the same capturing FF now has correspondingly less time.

[Figure: Useful Skew: Fixing Setup Violation with Intentional Delay. BEFORE: FF_launch and FF_capture both clocked at T=0; data arrives at T=480ps, next clock edge at T=500ps; setup slack = −30ps (violation). AFTER: FF_capture clock delayed by +50ps; data still arrives at T=480ps, next clock edge at T=550ps; setup slack = +20ps (fixed). The slack values imply a 50ps setup time.]
Useful skew: delaying capturing FF's clock by 50ps converts a −30ps setup violation into +20ps slack. Data path unchanged — only clock distribution adjusted.
Useful skew creates a coupling between setup and hold: fixing setup on one path by adding positive skew TIGHTENS hold margin on the same path. Always check hold after applying useful skew.
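The useful-skew example, redone as a minimal Python sketch that also runs the hold check the warning asks for. The 50ps setup time is not stated in the example; it is inferred from the −30ps slack (500 − 480 − 50). The earliest data arrival and the hold time are hypothetical. Solving both inequalities gives the window of capture-clock delay that satisfies setup and hold at once.

```python
T_CLK = 500.0          # ps, next clock edge in the example
T_ARRIVAL = 480.0      # ps, latest data arrival at the capture FF
T_SETUP = 50.0         # ps, inferred from the -30 ps slack in the example
T_ARRIVAL_MIN = 90.0   # ps, hypothetical earliest data arrival (shortest path)
T_HOLD = 20.0          # ps, hypothetical hold requirement


def setup_slack(skew):
    return T_CLK + skew - T_ARRIVAL - T_SETUP


def hold_slack(skew):
    return T_ARRIVAL_MIN - T_HOLD - skew


def useful_skew_window():
    """Capture-clock delay range where both setup and hold slack are >= 0."""
    return T_ARRIVAL + T_SETUP - T_CLK, T_ARRIVAL_MIN - T_HOLD


print(setup_slack(0.0), hold_slack(0.0))    # -30.0 70.0
print(setup_slack(50.0), hold_slack(50.0))  # 20.0 20.0
print(useful_skew_window())                 # (30.0, 70.0)
```

Any capture delay between 30 and 70ps closes this path. The sketch checks only this one path, so the paths launched by the delayed FF still need their own setup check.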

7. CTS Commands — Innovus & ICC2

Tool | Command | Purpose
Innovus | clock_design | Full CTS: build + optimize clock tree
Innovus | set_ccopt_property target_skew 100 | Set target skew to 100ps
Innovus | set_ccopt_property max_fanout 20 | Limit buffer fanout in clock tree
Innovus | set_ccopt_property insertion_delay 500 | Target insertion delay 500ps
Innovus | report_clock_timing -type skew | Report clock skew across all paths
Innovus | report_clock_timing -type latency | Report insertion delay per clock endpoint
ICC2 | synthesize_clock_trees | Build the clock tree
ICC2 | set_clock_tree_options -target_skew 0.1 | Set skew target (in ns)
ICC2 | clock_opt -from build_clock -to build_clock | CTS build phase only
ICC2 | report_clock_qor | Clock tree quality report
OpenROAD | clock_tree_synthesis | CTS via TritonCTS

Pre-CTS SDC Clock Definitions

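# Ideal (pre-CTS) clock: 1.0 ns period (1 GHz, assuming ns library units) on port clk
create_clock -name CLK -period 1.0 [get_ports clk]
# Pre-CTS uncertainty budgets for the skew and jitter the real tree will add
set_clock_uncertainty -setup 0.05 [get_clocks CLK]
set_clock_uncertainty -hold 0.02 [get_clocks CLK]
# Estimated clock slew at the sinks while clocks are still ideal
set_clock_transition 0.1 [get_clocks CLK]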
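With ideal clocks, the uncertainty values stand in for the skew and jitter the real tree will add, so they come straight off the data-path budget. A minimal Python sketch of that arithmetic for the constraints above, assuming nanosecond library units and a hypothetical 0.03 ns setup and 0.01 ns hold requirement.

```python
PERIOD = 1.0      # ns, create_clock -period 1.0
SETUP_UNC = 0.05  # ns, set_clock_uncertainty -setup 0.05
HOLD_UNC = 0.02   # ns, set_clock_uncertainty -hold 0.02


def max_data_delay_ns(t_setup_ns):
    """Longest clock-to-Q + logic delay that still meets setup with ideal clocks."""
    return PERIOD - SETUP_UNC - t_setup_ns


def min_data_delay_ns(t_hold_ns):
    """Shortest clock-to-Q + logic delay that still meets hold with ideal clocks."""
    return t_hold_ns + HOLD_UNC


print(round(max_data_delay_ns(0.03), 3))  # 0.92
print(round(min_data_delay_ns(0.01), 3))  # 0.03
```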

CTS Best Practices

  • Fix all DRVs (max transition, max capacitance) before CTS — bad slew propagates through clock tree
  • Gate clocks only with dedicated ICG (integrated clock-gating) cells, not generic AND/OR logic, so the CTS tool can recognize the gates and balance the gated branches
  • Use only CTS-approved buffer/inverter cells (matched drive strength pairs)
  • Apply NDR rules before CTS so the router knows which nets need special treatment
  • Check post-CTS hold violations immediately — they are cheaper to fix before routing
  • Shield all clock nets from trunk to leaf — partial shielding still couples at unshielded segments
  • Target insertion delay variation < 20ps across all domains for multi-domain chips

8. Interview Questions & Answers

Physical Design · CTS Basics
What is Clock Tree Synthesis and why is it needed?
CTS is the step that builds a physical clock distribution network from the clock source (PLL output) to every flip-flop's clock pin in the design. Without CTS, the clock signal would reach different flip-flops at different times — causing clock skew which can violate setup and hold timing for paths that have nothing wrong with their combinational logic.

CTS inserts a tree of clock buffers and inverters, sized and placed so that each path from source to any FF clock pin has approximately equal RC delay. This minimizes skew and ensures all timing analysis is meaningful.
Physical Design · Skew & Latency
What is the difference between clock skew and clock insertion delay?
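Clock skew = difference in arrival time between two FFs. Skew = T_arrive(FF_B) − T_arrive(FF_A), with FF_A launching and FF_B capturing. Skew trades margin between the two checks: positive skew relaxes setup but tightens hold, and negative skew does the reverse. Target: <100ps.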

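Insertion delay (latency) = absolute time from clock source to a single FF's clock pin. A large insertion delay shifts the entire clock domain. For intra-domain paths, latency common to launch and capture cancels out of the setup/hold checks, but it matters for cross-domain paths and I/O timing. A longer tree also costs clock power and accumulates more on-chip-variation (OCV) pessimism on the parts of the launch and capture clock paths that are not shared. Typical: 300–800ps for 1GHz designs.

Key point: Low skew is critical for timing. Insertion delay matters mainly at boundaries (I/O, cross-domain) and indirectly through clock power and OCV.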
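A minimal Python sketch of the cancellation argument, and of why it is not the whole story: once the clock paths are derated for on-chip variation, longer latency costs slack. The derates (late 1.05, early 0.95) and delays are hypothetical. The launch and capture clock paths are assumed to share no segment, the worst case, because common-path pessimism removal (CRPR) cancels the derate on any shared part; the data path is not derated, for simplicity.

```python
def setup_slack_ps(latency, t_data=900.0, period=1000.0, t_setup=50.0,
                   late=1.0, early=1.0):
    """Setup slack with equal nominal launch and capture latency.

    The launch clock path is derated late and the capture clock path early,
    the pessimistic direction for setup; the data path is not derated here.
    """
    arrival = latency * late + t_data
    required = period + latency * early - t_setup
    return required - arrival


for lat in (300.0, 800.0):
    ideal = setup_slack_ps(lat)
    ocv = setup_slack_ps(lat, late=1.05, early=0.95)
    print(f"latency {lat:.0f} ps: no derate {ideal:.1f} ps, with derate {ocv:.1f} ps")
```

Without derating both latencies give the same 50ps of slack; with it, every 100ps of unshared latency costs 10ps here, so the 800ps tree fails where the 300ps tree passes.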
Physical Design · NDR Rules
What are Non-Default Routing Rules (NDR) and why are they applied to clock nets?
NDR rules override the default minimum-width/spacing routing rules for specific nets. For clock nets:

2× width: Reduces wire resistance → lower RC delay → better slew preservation through the tree
2× spacing: Increases distance to neighboring wires → reduces capacitive coupling (SI noise) that would inject jitter
Double vias: Two vias at every layer transition → redundancy. A single-via open on a clock net = chip fails; double via dramatically reduces this risk
Layer preference: Top metal layers for clock trunk (lower resistance per unit length)

Without NDR, signal integrity problems cause clock jitter, which directly reduces effective timing margin.
Physical Design · Useful Skew
What is useful skew and when would you apply it?
Useful skew is intentional clock skew introduced to fix timing violations without modifying the logic or data path. If FF_launch → FF_capture has a setup violation, deliberately delaying the capture FF's clock arrival by Δt effectively extends the launch-to-capture time window by Δt, resolving the violation.

When to apply: Post-CTS setup violations that data-path optimization (sizing, buffering, restructuring) cannot close. Typically a last resort before ECO.

Trade-off: Positive skew (delaying capture) fixes setup but tightens hold on the same path. Must always check hold after applying useful skew. Also, skewing one FF affects all paths through it, potentially creating new violations on other paths sharing that FF.
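The last point is easy to see with two paths that share one flip-flop: delaying FF_B's clock helps the path it captures and hurts the path it launches. A minimal Python sketch; the FF names, data delays and latencies are hypothetical.

```python
PERIOD, T_SETUP = 1000.0, 50.0
# (launch FF, capture FF, clock-to-Q + logic delay in ps)
PATHS = [("FF_A", "FF_B", 990.0), ("FF_B", "FF_C", 900.0)]


def setup_slacks(latency):
    """Setup slack of every path given each FF's clock latency (ps)."""
    return {(launch, capture): PERIOD + latency[capture] - (latency[launch] + delay) - T_SETUP
            for launch, capture, delay in PATHS}


base = {"FF_A": 400.0, "FF_B": 400.0, "FF_C": 400.0}
for extra in (0.0, 40.0, 60.0):
    lat = dict(base, FF_B=base["FF_B"] + extra)   # delay only FF_B's clock
    print(extra, setup_slacks(lat))
```

With 40ps of useful skew the A→B violation (−40ps) closes and B→C drops from +50ps to +10ps; at 60ps the fix over-corrects and B→C becomes −10ps. Hold on both paths would need the same check.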
Physical Design · Hold Fixing
Why do hold violations often appear after CTS and how do you fix them?
Before CTS (pre-CTS), STA assumes ideal clocks: zero insertion delay, zero skew. Post-CTS, real clock latencies are applied. If the capturing FF gets the clock after the launching FF (positive skew), new data launched by the earlier edge can race through a short path and reach the capturing FF before its delayed edge has finished sampling the old value, so the hold check tightens:

Hold slack = T_data_path − T_hold − Skew

Positive skew reduces hold slack. A path that had +200ps hold margin pre-CTS would show a −50ps hold violation post-CTS if the skew is +250ps.

Fix: Insert delay buffers (hold buffers, also called delay cells or DLY cells) on the data path to lengthen it. These are sized buffers whose purpose is to add propagation delay. Fix hold violations before routing — post-route hold fixing requires ECO routes and is much harder.
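A minimal Python sketch of sizing a hold fix: insert just enough delay cells to clear the violation, then confirm the same path still meets setup. It assumes one hypothetical delay-cell value applied to both checks; in real STA a cell's delay differs between the hold (fast) and setup (slow) corners, and a cell on a shared segment delays every path through it.

```python
import math


def hold_fix_cells(hold_slack_ps, setup_slack_ps, cell_delay_ps):
    """Delay cells needed to close hold, or None if setup cannot absorb them."""
    if hold_slack_ps >= 0:
        return 0
    n = math.ceil(-hold_slack_ps / cell_delay_ps)
    return n if setup_slack_ps - n * cell_delay_ps >= 0 else None


print(hold_fix_cells(-50.0, 200.0, 30.0))  # 2  (adds 60 ps, setup keeps 140 ps)
print(hold_fix_cells(-50.0, 40.0, 30.0))   # None (would leave setup at -20 ps)
```

A None result means the setup and hold requirements on this path conflict, so delay insertion alone cannot close it.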
Physical Design · Topology
When would you choose a clock mesh over an H-tree topology?
Clock mesh: a global grid of clock wires driven by multiple leaf buffers. Any FF taps the nearest grid wire. Multiple supply paths mean skew is extremely low (grid averages out RC variation). Used when target skew is <50ps (e.g., high-performance CPUs at 3–5GHz: Intel, AMD).

When to choose mesh: (1) Very high frequency (>2GHz), (2) Irregular FF distribution that makes H-tree unbalanced, (3) Design has many clock domains at same frequency (share mesh), (4) Robustness requirement — process variation has less impact on mesh than tree.

Trade-off: Mesh consumes 2–5× more clock power than H-tree (more buffers, more wire switching), uses significant routing resources on top metal layers. Only justified when skew target is extremely tight.
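The power side of this trade-off follows from the dynamic-power relation P = α·C·V²·f with α = 1, since a clock net charges and discharges once every cycle. A minimal Python sketch; the capacitance, voltage and frequency are hypothetical, and the 3× mesh factor is simply a point inside the 2–5× range quoted above.

```python
def clock_power_w(c_total_f, vdd_v, freq_hz, activity=1.0):
    # Dynamic switching power: P = activity * C * V^2 * f
    return activity * c_total_f * vdd_v ** 2 * freq_hz


C_TREE = 150e-12   # F, hypothetical total switched capacitance of an H-tree
VDD, FREQ = 0.8, 3e9

tree = clock_power_w(C_TREE, VDD, FREQ)
mesh = clock_power_w(3 * C_TREE, VDD, FREQ)   # assume 3x the capacitance
print(f"tree {tree:.3f} W, mesh {mesh:.3f} W")
```

Because power scales linearly with switched capacitance at fixed voltage and frequency, the mesh's extra wire and driver capacitance translates one-for-one into clock power.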