Physical Design Flow
Physical design transforms a synthesized gate-level netlist into a manufacturable GDSII layout. Every decision — where to place each cell, how to route every wire, how to distribute the clock — directly determines timing, power, area, and yield of the final chip.
1. Physical Design Flow Overview
The physical design (PD) flow is a sequence of transformations, each adding more geometric detail to the abstract netlist. Modern PD tools (Cadence Innovus, Synopsys IC Compiler II) run these stages in order. Iterative feedback loops return to earlier stages when timing, congestion, or power targets are not met:
Import & Design Setup
Read the netlist (Verilog), physical technology and cell data (LEF/tech files), timing libraries (Liberty), and constraints (SDC), then initialize the design database.
Floorplanning
Define die and core area. Place macros (memories, IPs). Establish power rings and trunk routes. Set aspect ratio and utilization target (typically 70–80%).
Power Planning
Create the power grid: VDD/GND rings, stripes, and standard-cell rails. Analyze IR drop to confirm that the grid delivers enough current without excessive voltage droop at any cell.
Placement
Place standard cells in rows within the core area. Global placement minimizes wirelength. Legalization and detailed placement then move cells to non-overlapping, site-aligned positions and refine them locally.
Clock Tree Synthesis (CTS)
Insert clock buffers and inverters to distribute the clock with balanced latency to all flip-flop clock pins, keeping skew within the target (typically 50–100 ps).
Routing
Connect all nets with metal wires that obey the foundry design rules. Global routing assigns nets to coarse routing regions; detailed routing assigns exact tracks and vias.
Sign-off Verification
DRC, LVS, post-route STA with extracted parasitics (SPEF) for final timing sign-off, IR drop analysis, and electromigration (EM) checks.
GDSII Tape-out
Stream out the verified layout in GDSII format to the foundry for mask generation and fabrication.
2. Floorplanning
Floorplanning is the highest-impact stage — decisions made here echo through every subsequent step. Key objectives:
Die and Core Area
The die includes the I/O pads and ESD structures around the perimeter. The core (interior) holds standard cells, macros, and power structures.
Macro Placement
Memories and hard IPs are typically placed along the core edges or in corners. This keeps the central standard-cell area contiguous, reduces routing congestion, and lets the macros align cleanly with the power stripes.
Utilization
Core utilization = (cell area) / (core area). 70–80% is typical: higher utilization causes routing congestion, and lower utilization wastes die area.
Placement Blockages
Placement blockages around macros keep standard cells out of electrically sensitive regions and routing channels.
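Because utilization is defined as cell area divided by core area, the core dimensions for a target utilization follow directly. The sketch below is a back-of-the-envelope calculation, not a tool command. It assumes a rectangular core and takes aspect ratio as height/width (conventions differ between tools). It also ignores core-to-die margins and macro halos. `core_dimensions` is a hypothetical helper.

```python
import math


def core_dimensions(cell_area_um2, utilization, aspect_ratio=1.0):
    """Return (width, height) in um of a core that meets the target utilization.

    Assumes utilization = cell_area / core_area and aspect_ratio = height / width.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    core_area = cell_area_um2 / utilization      # required core area
    width = math.sqrt(core_area / aspect_ratio)  # width * (width * AR) = core_area
    height = width * aspect_ratio
    return width, height


# 560,000 um^2 of standard cells at 70% utilization, square core
w, h = core_dimensions(560_000, 0.70)
print(f"{w:.1f} x {h:.1f} um")  # 894.4 x 894.4 um
```

Raising the target to 80% shrinks the same core to roughly 837 µm per side. That saves area but leaves less whitespace for routing and buffering.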
3. Placement
Placement determines where each standard cell sits within the core area. Finding the optimal placement is NP-hard, so tools use heuristic algorithms that find good, though not provably optimal, solutions.
| Stage | Goal | Algorithm | Constraint |
|---|---|---|---|
| Global Placement | Minimize total half-perimeter wirelength (HPWL) | Analytical / simulated annealing | Cells may overlap |
| Legalization | Move cells into legal row sites without overlap | Abacus / Tetris | No overlaps, site-aligned |
| Detailed Placement | Local optimization: timing, routability | Swap, shift, mirror | Legal positions only |
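Legalization can be illustrated with a simplified Tetris-style pass. Cells are visited in order of their global-placement x coordinate, and each one is dropped into the row whose leftmost free slot gives the smallest displacement. This is a minimal sketch under stated assumptions, not Abacus and not any tool's legalizer:

- Coordinates are in site-width units, and cell widths are whole sites.
- Every row starts at x = 0 and has the same width.
- `Cell` and `tetris_legalize` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    x: float    # global-placement x of the lower-left corner (site units)
    y: float    # global-placement y of the lower-left corner
    width: int  # cell width in whole sites


def tetris_legalize(cells, row_ys, row_width):
    """Simplified Tetris-style legalization.

    Visits cells in order of global-placement x and puts each one in the row
    whose leftmost free, site-aligned slot gives the smallest Manhattan
    displacement. Returns {cell name: (x, y)} of legal lower-left corners.
    """
    frontier = [0] * len(row_ys)  # first free site in each row
    legal = {}
    for cell in sorted(cells, key=lambda c: c.x):
        best = None  # (displacement, row index, x)
        for r, row_y in enumerate(row_ys):
            # Snap up to the site grid without overlapping cells already placed.
            x = max(frontier[r], math.ceil(cell.x))
            if x + cell.width > row_width:
                continue  # no room left in this row
            cost = abs(x - cell.x) + abs(row_y - cell.y)
            if best is None or cost < best[0]:
                best = (cost, r, x)
        if best is None:
            raise RuntimeError(f"no legal position for {cell.name}")
        _, r, x = best
        legal[cell.name] = (x, row_ys[r])
        frontier[r] = x + cell.width
    return legal


cells = [Cell("a", 0.4, 1.0, 3), Cell("b", 1.2, 0.5, 2), Cell("c", 2.7, 9.0, 2)]
print(tetris_legalize(cells, row_ys=[0, 8, 16], row_width=20))
# {'a': (1, 0), 'b': (4, 0), 'c': (3, 8)}
```

Each row tracks only its rightmost occupied site, so this greedy pass is fast. However, it can leave gaps and push late cells far from their targets. Abacus reduces this by re-solving the positions of the cells already in a row to minimize total displacement.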
Timing-driven placement: modern tools weight each net by its timing criticality during global placement. Nets on critical paths are shortened aggressively at the cost of extra wirelength on non-critical nets. Criticality comes from timing analysis against the SDC constraints, and nets on paths with little or negative slack get higher weights.
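HPWL for a net is half the perimeter of its pins' bounding box, and a timing-driven placer minimizes a weighted sum of it. The sketch below computes that weighted objective. The slack-to-weight mapping in `net_weight` (linear in slack, capped at 5×) is a hypothetical illustration, not the formula any particular tool uses.

```python
def hpwl(pins):
    """Half-perimeter wirelength: half the perimeter of the pins' bounding box."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def net_weight(slack_ps, period_ps, max_boost=4.0):
    """Hypothetical criticality weight: 1.0 when slack >= one period,
    rising linearly to 1 + max_boost when slack is zero or negative."""
    criticality = min(1.0, max(0.0, 1.0 - slack_ps / period_ps))
    return 1.0 + max_boost * criticality


def weighted_wirelength(nets, period_ps):
    """Timing-weighted placement objective: sum of weight * HPWL over all nets."""
    return sum(net_weight(slack, period_ps) * hpwl(pins) for pins, slack in nets)


# Each net: (pin coordinates in um, worst slack in ps of any path through it)
nets = [
    ([(0, 0), (10, 5), (4, 12)], 500),  # HPWL 22, moderately critical
    ([(2, 2), (8, 3)], -20),            # HPWL 7, failing timing
]
print(hpwl(nets[0][0]), weighted_wirelength(nets, period_ps=1000))  # 22 101.0
```

The failing net is only a third as long as the first one, yet it contributes 35 of the 101 units. Shortening it therefore pays off far more than its raw length suggests.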
4. Clock Tree Synthesis (CTS)
After placement, the clock distribution network is built. Ideally the clock reaches every flip-flop at the same instant. At 1 GHz (a 1000 ps period), 100 ps of skew already consumes 10% of the cycle budget. CTS inserts clock buffers and inverters to balance arrival times across all sinks.
Key CTS Metrics
Clock Skew
Maximum difference in clock arrival time between any two flip-flops in the same clock domain. Target: < 50–100 ps.
Clock Latency
Total delay from the clock source to a sink flip-flop. Consistent latency across related clock domains matters for timing paths that cross between them.
Insertion Delay
Total buffer-chain delay added by CTS. Larger designs with more sinks need longer chains, which increases latency.
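Skew and latency fall straight out of the per-sink clock arrival times, and the single-cycle setup equation shows how skew eats into the cycle budget. The sink names and delays below are made-up values. The setup check is the standard flop-to-flop form, ignoring clock uncertainty and on-chip-variation derates.

```python
def clock_metrics(arrival_ps):
    """Return (global skew, maximum latency) from per-sink clock arrival times."""
    times = list(arrival_ps.values())
    return max(times) - min(times), max(times)


def setup_slack(period_ps, launch_clk_ps, capture_clk_ps,
                clk_to_q_ps, data_delay_ps, setup_ps):
    """Single-cycle setup slack: required time minus data arrival time."""
    data_arrival = launch_clk_ps + clk_to_q_ps + data_delay_ps
    data_required = period_ps + capture_clk_ps - setup_ps
    return data_required - data_arrival


# Hypothetical clock arrival times (ps) at four sinks after CTS
arrival = {"ff_a": 420, "ff_b": 470, "ff_c": 520, "ff_d": 455}
skew, latency = clock_metrics(arrival)
print(skew, latency)  # 100 520

# 1 GHz path launched by the latest sink (ff_c) and captured by the earliest (ff_a)
print(setup_slack(1000, arrival["ff_c"], arrival["ff_a"], 60, 800, 40))  # 0
# Same path with zero skew (both flops clocked at 520 ps)
print(setup_slack(1000, 520, 520, 60, 800, 40))  # 100
```

Here the 100 ps of skew consumes exactly the 100 ps of slack (10% of the 1000 ps period) that the path would otherwise have. The sign matters: a capture clock that arrives later than the launch clock helps setup but hurts hold.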
H-Tree Topology
The H-tree distributes the clock through H-shaped branches whose wires have equal length at each level. Each split point sees identical wire delay to both children, which yields near-zero geometric skew. Real sink loads and wire parasitics are never perfectly uniform, so some skew remains. Inserting a buffer on an early branch delays every sink below it, reducing skew at the cost of added latency. The goal is to bring global skew under the target (for example, < 100 ps).
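The equal-length property of the H-tree can be checked geometrically. The sketch below recursively builds an H-tree over a square region, halving the H at each level, and reports each sink's wire length from the root. `h_tree_sinks` is a hypothetical helper. It measures only Manhattan wire length, so equal length implies equal delay only under the additional assumption of uniform wire RC and equal sink loads. It also ignores the feed from the clock source to the root.

```python
def h_tree_sinks(x, y, half_w, half_h, levels, path_len=0.0):
    """Return [(x, y, wire_length_from_root)] for the sinks of an H-tree.

    At each level the H centered at (x, y) has a horizontal bar of length
    2*half_w and vertical bars of length 2*half_h; its four tips become the
    centers of the next, half-sized H.
    """
    if levels == 0:
        return [(x, y, path_len)]
    sinks = []
    for dx in (-half_w, half_w):        # left / right end of the horizontal bar
        for dy in (-half_h, half_h):    # bottom / top end of each vertical bar
            seg = abs(dx) + abs(dy)     # wire from the H center to this tip
            sinks += h_tree_sinks(x + dx, y + dy, half_w / 2, half_h / 2,
                                  levels - 1, path_len + seg)
    return sinks


# Two-level H-tree over a 1000 x 1000 um region centered at (500, 500)
sinks = h_tree_sinks(500.0, 500.0, 250.0, 250.0, levels=2)
lengths = {round(length, 6) for _, _, length in sinks}
print(len(sinks), lengths)  # 16 {750.0}
```

With two levels, the 16 sinks land at the centers of a 4×4 grid. Each is 750 µm of wire from the root: 500 µm for the first level and 250 µm for the second.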
5. Routing
Routing connects all nets with metal wires on the available metal layers. Modern designs use 10–15 metal layers. Lower layers (M1–M3) handle local connections; upper layers (M6–M10+) handle global signals and power distribution.
| Stage | What It Does | Output |
|---|---|---|
| Global Routing | Assigns nets to coarse routing regions (G-cells); estimates congestion | G-cell routing guide |
| Track Assignment | Assigns global routes to specific metal tracks | Track-assigned routes |
| Detailed Routing | Determines exact wire shapes, vias, and connections, targeting full DRC compliance | Fully routed layout, possibly with residual DRC violations |
| Search and Repair | Iteratively fixes remaining DRC violations | DRC-clean layout |
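Lee's algorithm is the classic maze router. It expands a breadth-first wave from the source across the routing grid until it reaches the target, then backtraces a shortest path. The sketch below routes one two-pin net on a single-layer grid with unit cost per cell, where `1` marks a blocked cell (for example, a full G-cell or a macro). Production global and detailed routers use far more elaborate cost functions, multiple layers, and rip-up-and-reroute. `lee_route` is a hypothetical helper.

```python
from collections import deque


def lee_route(grid, src, dst):
    """Lee-style maze routing: BFS wave expansion from src, then backtrace.

    grid[r][c] == 1 marks a blocked cell. Returns the shortest path as a list
    of (row, col) cells, or None if dst is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}          # visited cells and the cell each was reached from
    wave = deque([src])
    while wave:
        cell = wave.popleft()
        if cell == dst:
            break
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                wave.append((nr, nc))
    if dst not in prev:
        return None
    path, cell = [], dst
    while cell is not None:     # backtrace from target to source
        path.append(cell)
        cell = prev[cell]
    return path[::-1]


grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],   # blockage forces a detour through column 3
    [0, 0, 0, 0],
]
print(lee_route(grid, (0, 0), (2, 0)))
# [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (2, 2), (2, 1), (2, 0)]
```

BFS guarantees a shortest path in cells, but its memory and runtime grow with grid area. This is one reason real flows first route coarsely on G-cells before detailed routing.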
6. Physical Sign-Off
Before tape-out, the layout must pass a rigorous set of checks. All checks run on the final routed layout. Timing and power analyses also use parasitics extracted from that layout, typically delivered as a SPEF file.
Design Rule Check — verifies layout geometry satisfies all foundry rules: minimum width, spacing, enclosure, density. Zero violations required for tape-out.
Layout vs. Schematic — verifies connectivity and device parameters match the netlist. Detects opens, shorts, missing vias, and parameter mismatches.
Static Timing Analysis with RC parasitics extracted from the layout. All setup/hold paths must meet timing at all PVT corners.
Power grid IR drop must stay below 5–10% of VDD. Electromigration (EM) checks ensure that metal and via current densities stay within limits over the target lifetime (typically 10+ years).
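A first-order feel for IR drop comes from a single power rail fed from one end, with equal current taps spaced along it. Each segment carries the current of every tap downstream, so the drop accumulates toward the far end. The sketch below uses that one-dimensional resistive-ladder model with made-up values. Real sign-off tools solve the full multi-layer grid with static or dynamic current profiles. `rail_ir_drop` is a hypothetical helper.

```python
def rail_ir_drop(vdd_v, n_taps, tap_current_a, seg_res_ohm):
    """Voltage at each tap along a rail fed from one end at vdd_v.

    Segment j (1-based) sits between tap j-1 (tap 0 is the feed point) and
    tap j, and carries the current of all taps from j to n_taps.
    """
    voltages, v = [], vdd_v
    for j in range(1, n_taps + 1):
        downstream_taps = n_taps - j + 1
        v -= seg_res_ohm * downstream_taps * tap_current_a
        voltages.append(v)
    return voltages


vdd = 0.8
taps = rail_ir_drop(vdd, n_taps=10, tap_current_a=1e-3, seg_res_ohm=0.5)
worst_drop = vdd - taps[-1]
print(f"{worst_drop * 1000:.1f} mV ({100 * worst_drop / vdd:.2f}% of VDD)")
# 27.5 mV (3.44% of VDD)
print(worst_drop <= 0.05 * vdd)  # True: within a 5% budget
```

In this model the worst-case drop is R·I·N(N+1)/2, so it grows roughly with the square of the number of taps on a rail. This is why power grids feed rails from many points through frequent stripes.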