Chapter 4 of 10

Placement

With the floorplan set and macros fixed, tens of thousands of standard cells must each be assigned exact (x, y) coordinates. Placement quality determines routing success, the difficulty of timing closure, and the final chip's performance.

In this chapter
  1. Global vs Detailed Placement
  2. Timing-Driven Placement
  3. Congestion-Driven Placement
  4. Legalization
  5. Scan Chain Reordering
  6. Placement Blockages & Halos
  7. Interactive: Placement Visualizer
  8. Interpreting Congestion Maps & Timing
  9. Key Takeaways

1. Global vs Detailed Placement

Placement happens in two main stages. Global placement assigns approximate locations to cells, optimizing for wire length and congestion at a coarse level; cells may overlap at this stage. Legalization then resolves overlaps and snaps cells to legal placement rows, and detailed placement performs local optimization on the legal result.

| Stage | Objective | Cell overlap? | Method |
| --- | --- | --- | --- |
| Global placement | Minimize total wire length (HPWL), spread cells evenly | Allowed (cells are "fuzzy") | Force-directed, analytical (ePlace, NTUplace) |
| Legalization | Resolve overlaps, snap to placement rows | None after this step | Abacus algorithm, greedy row packing |
| Detailed placement | Local cell swaps to improve timing/WL | None | Simulated annealing, dynamic programming |
| Post-placement opt. | Fix timing violations found after legalization | None | Cell sizing, buffering, topology changes |

Modern tools run all stages from a single command (place_opt_design in Cadence Innovus, place_opt in Synopsys ICC2), but they iterate through these phases internally. Understanding the phases helps when debugging placement-related timing or congestion issues.
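
The HPWL objective named in the table above can be illustrated with a few lines of Python. This is a minimal sketch, not how any particular placer implements it: the net names and pin coordinates are hypothetical, and each net's wire length is approximated by the half-perimeter of the bounding box around its pins.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) pin location in micrometres


def hpwl(pins: List[Point]) -> float:
    """Half-perimeter wire length of one net: bounding-box width plus height."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets: Dict[str, List[Point]]) -> float:
    """Sum of per-net HPWL, the quantity global placement tries to minimize."""
    return sum(hpwl(pins) for pins in nets.values())


# Hypothetical two-net example (coordinates are made up for illustration).
nets = {
    "n1": [(0.0, 0.0), (3.0, 4.0)],              # box 3 x 4
    "n2": [(1.0, 1.0), (2.0, 5.0), (6.0, 2.0)],  # box 5 x 4
}
print(total_hpwl(nets))
```

HPWL is cheap to evaluate and exact for two- and three-pin nets routed without detours, which is why it is a common proxy for routed wire length during global placement.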

2. Timing-Driven Placement

Standard wire-length minimization (HPWL) treats all nets equally. Timing-driven placement weights critical nets more heavily — cells on the critical path are pulled closer together to minimize delay on those specific connections.

The key metric is criticality: a net's criticality is proportional to how close its slack is to the worst negative slack (WNS). A net on the critical path has criticality = 1.0; a net with large positive slack has criticality ≈ 0.

Timing-driven net weighting:
Net weight = 1 + α × criticality
where α is the timing weight factor (typically 2–5). With α = 5, a fully critical net (criticality = 1.0) gets weight 6, six times the weight of a non-critical net (weight 1).
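
The weighting rule above can be sketched in Python. The text only says that criticality grows as slack approaches WNS, so the linear ramp and the `window_ps` parameter below are assumptions made for illustration; real placers use their own criticality models.

```python
def criticality(slack_ps: float, wns_ps: float, window_ps: float = 100.0) -> float:
    """Map slack to [0, 1]: 1.0 at the worst slack, 0.0 once slack is
    window_ps or more above it. The linear ramp is an assumed model."""
    c = 1.0 - (slack_ps - wns_ps) / window_ps
    return max(0.0, min(1.0, c))


def net_weight(slack_ps: float, wns_ps: float, alpha: float = 5.0) -> float:
    """Net weight = 1 + alpha * criticality, as in the formula above."""
    return 1.0 + alpha * criticality(slack_ps, wns_ps)


wns = -50.0  # hypothetical worst negative slack in ps
for slack in (-50.0, 0.0, 200.0):
    print(slack, net_weight(slack, wns))
```

With these assumed values, the net at WNS gets weight 6.0, a net 50 ps better gets 3.5, and a net with large positive slack falls back to the baseline weight of 1.0.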

After global placement, the placer runs incremental STA using estimated wire delays (based on placed positions). Cells with negative slack are moved closer to their fanin/fanout, trading some wire-length optimality for timing improvement.

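# Innovus (legacy syntax): run timing- and congestion-aware placement.
# Option names vary between releases; check the command reference for your version.
setPlaceMode \
  -place_global_timing_effort high \
  -place_global_cong_effort medium
place_opt_design

# After placement: report the 10 worst setup (late) paths
report_timing \
  -max_paths 10 \
  -path_type full \
  -late

# Check worst slack (WNS) and total negative slack (TNS) before CTS
timeDesign -preCTS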

3. Congestion-Driven Placement

Routing congestion occurs when the demand for routing tracks in a local region exceeds supply. A router that cannot find tracks must detour wires, which increases delay and can cause DRC violations. Placement prevents this by spreading cells to avoid local density hotspots.

How the placer estimates congestion

The die is divided into a routing estimation grid whose tiles are typically 5–10× the standard cell height on a side. For each tile, the placer estimates (a) supply, the number of routing tracks available, and (b) demand, the number of wires that need to pass through, estimated from net cuts. Overflow = max(0, demand − supply). The placer penalizes placements that increase overflow.
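
The per-tile overflow calculation can be sketched as follows. The grid values are hypothetical, and the summary percentage here is defined as the share of tiles with any overflow, which is only one possible definition; tools may report global overflow differently.

```python
from typing import Dict, List


def overflow_report(supply: List[List[int]], demand: List[List[int]]) -> Dict:
    """Compute overflow = max(0, demand - supply) for every tile of the grid."""
    total_overflow = 0
    hot_tiles = 0
    num_tiles = 0
    worst_overflow = 0
    worst_tile = None
    for r, (s_row, d_row) in enumerate(zip(supply, demand)):
        for c, (s, d) in enumerate(zip(s_row, d_row)):
            num_tiles += 1
            ov = max(0, d - s)
            total_overflow += ov
            if ov > 0:
                hot_tiles += 1
            if ov > worst_overflow:
                worst_overflow, worst_tile = ov, (r, c)
    return {
        "total_overflow": total_overflow,   # tracks short, summed over tiles
        "worst_overflow": worst_overflow,   # tracks short in the worst tile
        "worst_tile": worst_tile,           # (row, col) of the worst tile
        # Assumed definition: percentage of tiles with any overflow.
        "overflow_tile_pct": 100.0 * hot_tiles / num_tiles,
    }


# Hypothetical 3 x 3 grid with 10 tracks available per tile.
supply = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
demand = [[8, 12, 9], [10, 11, 7], [5, 6, 10]]
report = overflow_report(supply, demand)
print(report)
```

A tile whose demand exactly equals supply has zero overflow but no slack left, so a placer would typically start spreading cells before a tile reaches that point.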

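# Check congestion after placement.
# Command names in this block are illustrative; real tools differ
# (for example, reportCongestion in Innovus and report_congestion in ICC2).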
report_placement_congestion

## Example output:
# Routing congestion summary:
# Layer M3 (horizontal): max overflow = 2 tracks at (450, 320)
# Layer M4 (vertical):   max overflow = 0
# Global overflow:        4.2%  ← target < 1% for clean routing

# Visualize congestion heatmap
display_congestion_map -layer M3

# If congestion hotspot found, add local spreading:
refine_placement \
  -focus_area {400 280 500 360} \
  -congestion_effort ultra

4. Legalization

After global placement, cells are at approximate positions and may overlap. Legalization is the process of moving cells to legal positions: snapped to placement row boundaries, non-overlapping, and within the placement area.

Placement rows

Standard cells must sit in predefined placement rows: horizontal stripes across the core area whose height equals the standard cell height (e.g., roughly 0.24–0.30 µm in a 7nm technology). Each row has a VDD rail along one edge and a VSS rail along the other, and alternate rows are flipped vertically so that cells in adjacent rows share a rail. This "rail-sharing" structure is what makes standard cell layout efficient.

Legalization must also respect:

Legalization displacement: The quality of global placement determines how much cells move during legalization. A poor global solution forces cells to displace far from their optimal positions, destroying the timing and congestion improvements made in global placement. Displacement > 10% of core edge length is a warning sign.
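
A simple greedy row-packing legalizer, in the spirit of the "greedy row packing" method listed earlier, might look like the sketch below. It is not the Abacus algorithm and not what any commercial tool runs. It assumes cell widths are whole numbers of sites, measures x in site widths and y in row heights, and uses a cost that simply adds the two displacements; all names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cell:
    name: str
    x: float    # desired x from global placement, in site widths
    y: float    # desired y from global placement, in row heights
    width: int  # cell width in sites


def legalize(cells: List[Cell], num_rows: int, sites_per_row: int) -> Dict[str, Tuple[int, int]]:
    """Place cells left to right; each cell takes the row/site with least displacement."""
    frontier = [0] * num_rows  # next free site in each row
    placed: Dict[str, Tuple[int, int]] = {}
    for cell in sorted(cells, key=lambda c: c.x):
        best = None
        for row in range(num_rows):
            # Snap to the site grid, keep the cell inside the row,
            # and never go left of cells already packed into this row.
            snapped = min(round(cell.x), sites_per_row - cell.width)
            site = max(frontier[row], snapped)
            if site + cell.width > sites_per_row:
                continue  # row is full at this position
            cost = abs(site - cell.x) + abs(row - cell.y)
            if best is None or cost < best[0]:
                best = (cost, row, site)
        if best is None:
            raise ValueError(f"no legal slot for {cell.name}")
        _, row, site = best
        frontier[row] = site + cell.width
        placed[cell.name] = (row, site)
    return placed


# Hypothetical overlapping global-placement result: 2 rows of 10 sites.
cells = [
    Cell("a", 1.2, 0.1, 3),
    Cell("b", 2.1, 0.3, 2),
    Cell("c", 2.4, 0.9, 2),
    Cell("d", 8.6, 0.2, 3),
]
placed = legalize(cells, num_rows=2, sites_per_row=10)
print(placed)
```

Because each row only ever grows to the right, the result is overlap-free by construction. The price is that a crowded region pushes later cells far from their global positions, which is the displacement problem described above.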

5. Scan Chain Reordering Post-Placement

For DFT (Design for Test), flip-flops are connected into one or more scan chains — a shift register that allows test vectors to be loaded serially. The scan chain order is originally determined during synthesis, but after placement, the physical positions of FFs are known.

If the scan chain order doesn't match the physical order of FFs, the scan-in wire must zigzag across the chip to reach FFs in sequence. This adds significant wire length and can cause timing violations on the scan path (which must meet its own timing constraints during test mode).

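# Reorder scan chains to minimize scan wire length post-placement.
# Command names here are illustrative and vary by tool
# (Innovus legacy syntax, for example, uses setScanReorderMode and scanReorder).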
set_scan_reorder_mode -effort high -optimize scan_wire_length
reorder_scan

# Verify scan chain connectivity after reordering
report_scan_chain -detail

## Before reorder: scan wire length = 18.4 mm  (zigzag)
## After reorder:  scan wire length = 3.1 mm   (physical order)
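
To see why physical ordering shortens the scan path, the sketch below reorders a chain with a greedy nearest-neighbor heuristic. This is a stand-in chosen for brevity, not the algorithm a production tool uses, and it ignores real constraints such as clock-domain boundaries and fixed chain segments. The flip-flop names and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) in micrometres


def manhattan(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def chain_length(order: List[str], pos: Dict[str, Point]) -> int:
    """Total Manhattan length of the scan path visiting FFs in the given order."""
    return sum(manhattan(pos[a], pos[b]) for a, b in zip(order, order[1:]))


def reorder_nearest_neighbor(start: str, pos: Dict[str, Point]) -> List[str]:
    """Greedy heuristic: always hop to the closest flip-flop not yet in the chain."""
    remaining = set(pos) - {start}
    order = [start]
    while remaining:
        last = pos[order[-1]]
        # Tie-break on name so the result is deterministic.
        nxt = min(remaining, key=lambda n: (manhattan(pos[n], last), n))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical placement: synthesis order alternates between two distant clusters.
pos = {"ff0": (0, 0), "ff1": (90, 10), "ff2": (10, 0), "ff3": (80, 10), "ff4": (20, 0)}
synthesis_order = ["ff0", "ff1", "ff2", "ff3", "ff4"]
physical_order = reorder_nearest_neighbor("ff0", pos)

print(chain_length(synthesis_order, pos))  # zigzag path
print(chain_length(physical_order, pos))   # path after reordering
```

In this made-up case the synthesis order crosses between the two clusters on every hop, while the reordered chain crosses only once.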

6. Placement Blockages & Halos

Certain regions of the core must be kept free of standard cells. These are defined using placement blockages:

| Blockage type | Effect | Typical use case |
| --- | --- | --- |
| Hard blockage | No cells allowed inside the region | Macro keepout, analog isolation zone |
| Soft blockage | Placer avoids the region but can use it if needed | Preferred keepout (placer may override under congestion) |
| Partial blockage | Reduce density to a specified % in the region | Buffer zones around macros for routing access |
| Halo | Automatically follows a macro as it moves | Pre-placement macro margins (applied in floorplan) |

7. Interactive: Placement Visualizer

[Interactive visualizer: about 40 cells transition through three placement phases, starting from a random pre-placement state. Cells are colored by timing criticality (critical path, near-critical, high slack). Readouts show estimated wire length (mm), worst slack (ps), and routing overflow (%).]

8. Interpreting Congestion Maps & Timing After Placement

Reading a congestion heatmap

Congestion heatmaps display routing overflow as a color gradient (green → yellow → red). A red hotspot at a specific (x, y) location means the number of wires that need to pass through that tile exceeds available routing tracks. Solutions:

Timing after placement

Post-placement timing uses estimated wire delays (from RC estimation based on placed cell positions). This is less accurate than post-route STA but provides an early indicator of timing problems. Key metrics to check:

| Metric | Target | Action if violated |
| --- | --- | --- |
| WNS (Worst Negative Slack) | ≥ 0 ps (or close to 0 with margin) | Resize critical cells, move closer, pipeline |
| TNS (Total Negative Slack) | 0 ps | Reduce endpoint count in violation |
| Max transition time | < 200 ps (7nm typical) | Upsize driver cell or add buffer |
| Max capacitance | Per library cell limit | Upsize driver or split net |

Post-placement timing is optimistic. Actual post-route wire delays are typically 10–30% higher than the placement-stage estimate, because the router must detour around congestion. As a rough guide, a WNS of −50 ps after placement can degrade to around −80 to −100 ps after routing, so fix timing aggressively at the placement stage.
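
The WNS and TNS metrics in the table above can be computed from a list of endpoint slacks, as in this minimal sketch. The endpoint names and slack values are hypothetical; in practice they would come from the timing report.

```python
from typing import Dict, Tuple


def wns_tns(endpoint_slacks_ps: Dict[str, float]) -> Tuple[float, float, int]:
    """Return (WNS, TNS, number of violating endpoints) from per-endpoint slack."""
    wns = min(endpoint_slacks_ps.values())                  # worst single endpoint
    violating = [s for s in endpoint_slacks_ps.values() if s < 0]
    tns = sum(violating)                                    # only negative slacks count
    return wns, tns, len(violating)


# Hypothetical post-placement slacks in picoseconds.
slacks = {"u1/D": -50.0, "u2/D": -12.0, "u3/D": 35.0, "u4/D": 0.0}
wns, tns, num_violating = wns_tns(slacks)
print(wns, tns, num_violating)
```

WNS shows how bad the worst path is, while TNS together with the violating endpoint count shows how widespread the problem is. A design with a small WNS but a large TNS usually needs broad fixes rather than work on a single path.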

9. Key Takeaways
