With the floorplan set and macros fixed, every standard cell (tens of thousands in a small block, millions in a large design) must be assigned its exact (x, y) coordinates. Placement quality largely determines routing success, the difficulty of timing closure, and the final chip's performance.
Placement happens in two main stages. Global placement assigns approximate locations to cells and optimizes wire length and congestion at a coarse level. Cells may overlap at this stage. Detailed placement then legalizes the result, which means resolving overlaps and snapping cells to legal placement rows, and performs local optimization. The table below breaks the flow into its individual steps.
| Stage | Objective | Cell overlap? | Method |
|---|---|---|---|
| Global placement | Minimize total half-perimeter wire length (HPWL); spread cells evenly | Allowed (cells are "fuzzy") | Force-directed, analytical (ePlace, NTUplace) |
| Legalization | Resolve overlaps, snap to placement rows | None after this step | Abacus algorithm, greedy row packing |
| Detailed placement | Local cell swaps to improve timing/WL | None | Simulated annealing, dynamic programming |
| Post-placement opt. | Fix timing violations found after legalization | None | Cell sizing, buffering, topology changes |
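The global-placement objective in the table is easier to picture with numbers. Below is a minimal sketch that computes HPWL for nets given as lists of pin coordinates. The net names and coordinates are hypothetical. Real analytical placers do not optimize this non-differentiable form directly. They optimize a smooth approximation of it, such as a log-sum-exp or weighted-average model.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # pin location (x, y) in µm


def net_hpwl(pins: List[Point]) -> float:
    """Half-perimeter of the bounding box enclosing all pins of one net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets: Dict[str, List[Point]]) -> float:
    """Sum of HPWL over all nets: the basic global-placement cost."""
    return sum(net_hpwl(pins) for pins in nets.values())


# Hypothetical nets: "n1" has a 4 x 3 bounding box, "n2" a 5 x 4 bounding box.
nets = {
    "n1": [(0.0, 0.0), (4.0, 3.0)],
    "n2": [(1.0, 1.0), (2.0, 5.0), (6.0, 2.0)],
}
print(total_hpwl(nets))  # 16.0
```

HPWL is exact for two- and three-pin nets. For larger nets it is a lower bound on the minimum rectilinear Steiner tree length. It is still used because it is cheap to compute.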
Modern tools run all of these stages through a single command (place_opt_design in Cadence Innovus, place_opt in Synopsys ICC2) and iterate through the phases internally. Understanding the phases helps when debugging placement-related timing or congestion issues.
Standard wire-length minimization (HPWL) treats all nets equally. Timing-driven placement weights critical nets more heavily — cells on the critical path are pulled closer together to minimize delay on those specific connections.
The key metric is criticality: a net's criticality rises as its slack approaches the worst negative slack (WNS). A net on the most critical path has criticality = 1.0, and a net with large positive slack has criticality ≈ 0.
After global placement, the placer runs incremental STA using wire delays estimated from the placed positions. Cells on paths with negative slack are pulled closer to their fanin and fanout, trading some wire-length optimality for timing improvement.
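One way to turn criticality into numbers is a linear normalization between the WNS and a chosen slack ceiling, followed by a net weight that grows with criticality. The sketch below assumes both the linear mapping and the weighting function, including the `alpha` and `exponent` parameters. Commercial placers use their own formulas.

```python
def criticality(slack: float, wns: float, slack_max: float) -> float:
    """Assumed linear map of slack to [0, 1]: 1.0 at WNS, 0.0 at or above slack_max."""
    if slack_max <= wns:
        raise ValueError("slack_max must be greater than wns")
    c = (slack_max - slack) / (slack_max - wns)
    return min(1.0, max(0.0, c))  # clamp to [0, 1]


def net_weight(crit: float, alpha: float = 4.0, exponent: float = 2.0) -> float:
    """Hypothetical net weight: 1.0 for non-critical nets, up to 1 + alpha at crit = 1."""
    return 1.0 + alpha * crit ** exponent


# Slacks in ps. WNS = -50 ps; nets at or above +200 ps count as non-critical.
for slack in (-50.0, 75.0, 300.0):
    c = criticality(slack, wns=-50.0, slack_max=200.0)
    print(slack, c, net_weight(c))
# -50.0 1.0 5.0
# 75.0 0.5 2.0
# 300.0 0.0 1.0
```

The exponent keeps moderately critical nets close to the default weight. This way, only nets near the WNS distort the wire-length objective.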
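Here is how a heavier weight "pulls" a cell. Consider a quadratic model in which one movable cell minimizes the weighted sum of squared distances to its fixed neighbors. The optimum is then the weighted centroid of those neighbors. This is a simplification chosen for illustration: real analytical placers add density penalties and use smoothed wire-length models. The coordinates are hypothetical.

```python
from typing import List, Tuple


def weighted_centroid(neighbors: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Position minimizing sum(w * squared distance) to fixed neighbors given as (x, y, w)."""
    total_w = sum(w for _, _, w in neighbors)
    cx = sum(nx * w for nx, _, w in neighbors) / total_w
    cy = sum(ny * w for _, ny, w in neighbors) / total_w
    return cx, cy


# A cell with a fanin at (0, 0) and a fanout at (10, 0), equal net weights.
print(weighted_centroid([(0, 0, 1.0), (10, 0, 1.0)]))  # (5.0, 0.0)
# The fanout net becomes timing-critical and its weight rises to 4.
print(weighted_centroid([(0, 0, 1.0), (10, 0, 4.0)]))  # (8.0, 0.0)
```

The cell slides toward the critical fanout, which shortens that connection at the cost of a longer fanin wire.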
```tcl
# Run placement with timing optimization.
# Illustrative only: Innovus uses place_opt_design, ICC2 uses place_opt,
# and option names differ by tool and release.
place_opt \
    -effort high \
    -timing_driven \
    -congestion_effort medium

# After placement: check timing estimate
report_timing \
    -max_paths 10 \
    -path_type full \
    -delay_type max

# Check worst slack and total negative slack
report_design -timing_summary
```
Routing congestion occurs when the demand for routing tracks in a local region exceeds supply. A router that cannot find tracks must detour wires, which increases delay and can cause DRC violations. Placement prevents this by spreading cells to avoid local density hotspots.
The die is divided into a grid of routing-estimation tiles (global routing cells), each typically 5–10× the standard cell height on a side. For each tile, the placer estimates (a) the number of routing tracks available (supply) and (b) the number of wires that must pass through it (demand, estimated from the nets crossing the tile). Overflow = max(0, demand − supply). The placer penalizes placements that increase overflow.
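To make the supply/demand/overflow bookkeeping concrete, the sketch below uses RUDY (Rectangular Uniform wire DensitY) to estimate demand. RUDY is a common probabilistic estimator that spreads each net's HPWL uniformly over its bounding box; it stands in here for the tool's own estimator. The sketch makes these assumptions:

- demand is measured in µm of wire per tile;
- supply is a single hypothetical value per tile, equal to tracks × tile length;
- horizontal and vertical layers are lumped together;
- degenerate bounding boxes are widened to one tile.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # net bounding box (x_lo, y_lo, x_hi, y_hi) in µm


def rudy_demand(nets: List[BBox], nx: int, ny: int, tile: float) -> List[List[float]]:
    """Estimate wire length (µm) per tile by spreading each net's HPWL over its bbox."""
    demand = [[0.0] * nx for _ in range(ny)]
    for x0, y0, x1, y1 in nets:
        w = max(x1 - x0, tile)  # assumption: widen zero-width boxes to one tile
        h = max(y1 - y0, tile)
        x1, y1 = x0 + w, y0 + h
        density = (w + h) / (w * h)  # wire length per unit area inside the bbox
        for j in range(ny):
            for i in range(nx):
                ox = min(x1, (i + 1) * tile) - max(x0, i * tile)
                oy = min(y1, (j + 1) * tile) - max(y0, j * tile)
                if ox > 0 and oy > 0:
                    demand[j][i] += density * ox * oy
    return demand


def overflow(demand: List[List[float]], supply_per_tile: float) -> List[List[float]]:
    """Per-tile overflow = max(0, demand - supply)."""
    return [[max(0.0, d - supply_per_tile) for d in row] for row in demand]


# 2 x 2 grid of 10 µm tiles; one net spans all four tiles, one sits in tile (0, 0).
d = rudy_demand([(0.0, 0.0, 20.0, 20.0), (0.0, 0.0, 10.0, 10.0)], nx=2, ny=2, tile=10.0)
print(overflow(d, supply_per_tile=25.0))  # [[5.0, 0.0], [0.0, 0.0]]
```

Only the bottom-left tile overflows, so a congestion-aware placer would push cells out of it. The nested loop visits every tile for every net, which is fine for a sketch. Real implementations only touch the tiles each bounding box covers.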
```tcl
# Illustrative commands; exact names and options depend on the tool.

# Check congestion after placement
report_placement_congestion

## Example output:
# Routing congestion summary:
#   Layer M3 (horizontal): max overflow = 2 tracks at (450, 320)
#   Layer M4 (vertical):   max overflow = 0
#   Global overflow: 4.2%   ← target < 1% for clean routing

# Visualize congestion heatmap
display_congestion_map -layer M3

# If a congestion hotspot is found, re-spread cells locally:
refine_placement \
    -focus_area {400 280 500 360} \
    -congestion_effort ultra
```
After global placement, cells are at approximate positions and may overlap. Legalization is the process of moving cells to legal positions: snapped to placement row boundaries, non-overlapping, and within the placement area.
Standard cells must sit in predefined placement rows: horizontal stripes across the core area whose height equals the standard cell height (for example, 0.27 µm for the 7.5-track cells of the ASAP7 predictive 7nm PDK). Each row carries a VDD rail along one edge and a VSS rail along the other. Alternate rows are flipped vertically so that adjacent rows share a rail. This rail-sharing structure is what makes standard cell layout efficient.
Legalization must also keep each cell aligned to the placement-site grid within its row and outside any placement blockages.
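The "greedy row packing" entry in the table can be illustrated with a Tetris-style legalizer. It visits cells from left to right and puts each one in the row and site position that minimizes its displacement without overlapping cells already placed. This is simpler than Abacus, which also shifts previously placed cells within a row. The `Cell` fields, unit row height, and site width are hypothetical, and cell widths are assumed to be whole multiples of the site width.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Cell:
    name: str
    x: float      # global-placement x of lower-left corner (µm)
    y: float      # global-placement y of lower-left corner (µm)
    width: float  # µm, assumed to be a multiple of the site width


def tetris_legalize(cells: List[Cell], num_rows: int, row_height: float,
                    site: float, row_width: float) -> Dict[str, Tuple[float, float]]:
    """Greedy legalization: each cell goes to the legal row/site with least displacement."""
    row_end = [0.0] * num_rows  # first free x coordinate in each row
    result = {}
    for c in sorted(cells, key=lambda cell: cell.x):
        best = None
        for r in range(num_rows):
            # Snap to the site grid, but never left of cells already in this row.
            x = max(row_end[r], round(c.x / site) * site)
            if x + c.width > row_width:
                continue  # no room left in this row
            cost = abs(x - c.x) + abs(r * row_height - c.y)
            if best is None or cost < best[0]:
                best = (cost, r, x)
        if best is None:
            raise RuntimeError(f"no legal position for {c.name}")
        _, r, x = best
        result[c.name] = (x, r * row_height)
        row_end[r] = x + c.width
    return result


# A and B overlap after global placement; C sits just to their right.
cells = [Cell("A", 0.2, 0.1, 2.0), Cell("B", 1.0, 0.2, 2.0), Cell("C", 1.4, 0.3, 1.0)]
print(tetris_legalize(cells, num_rows=2, row_height=1.0, site=1.0, row_width=10.0))
# {'A': (0.0, 0.0), 'B': (1.0, 1.0), 'C': (2.0, 0.0)}
```

B moves up one row rather than right by two sites, because that displaces it less. A greedy scheme like this is fast but can leave large displacements late in the sweep, which is why detailed placement follows.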
For DFT (Design for Test), flip-flops are connected into one or more scan chains: shift registers that allow test vectors to be loaded serially. The scan chain order is first determined during synthesis, before the flip-flops have physical positions. After placement, those positions are known and the order can be revisited.
If the scan chain order doesn't match the physical order of FFs, the scan-in wire must zigzag across the chip to reach FFs in sequence. This adds significant wire length and can cause timing violations on the scan path (which must meet its own timing constraints during test mode).
```tcl
# Reorder scan chains post-placement to minimize scan wire length
# (illustrative command names; check your tool's DFT reference)
set_scan_reorder_mode -effort high -optimize scan_wire_length
reorder_scan

# Verify scan chain connectivity after reordering
report_scan_chain -detail

## Before reorder: scan wire length = 18.4 mm (zigzag)
## After reorder:  scan wire length = 3.1 mm (physical order)
```
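Reordering is essentially a shortest-path tour through the flip-flop locations. The sketch below uses a nearest-neighbor heuristic, a simple stand-in for whatever the tool does internally: starting from the scan-in port, it always hops to the closest unvisited flip-flop under Manhattan distance. The flip-flop names and one-dimensional positions are hypothetical. The sketch also ignores the real-world constraints a tool must honor, such as clock domains, scan partitions, and fixed chain segments.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # flip-flop location (x, y) in µm


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def chain_length(order: List[str], pos: Dict[str, Point], scan_in: Point) -> float:
    """Manhattan wire length of scan_in -> FF1 -> FF2 -> ... in the given order."""
    total, prev = 0.0, scan_in
    for ff in order:
        total += manhattan(prev, pos[ff])
        prev = pos[ff]
    return total


def nearest_neighbor_order(pos: Dict[str, Point], scan_in: Point) -> List[str]:
    """Greedy reorder: always connect the closest unvisited flip-flop next."""
    remaining = set(pos)
    order, cur = [], scan_in
    while remaining:
        # Break distance ties by name so the result is deterministic.
        nxt = min(remaining, key=lambda ff: (manhattan(cur, pos[ff]), ff))
        order.append(nxt)
        remaining.remove(nxt)
        cur = pos[nxt]
    return order


pos = {"ff_a": (10.0, 0.0), "ff_b": (1.0, 0.0), "ff_c": (9.0, 0.0), "ff_d": (2.0, 0.0)}
synth_order = ["ff_a", "ff_b", "ff_c", "ff_d"]
print(chain_length(synth_order, pos, (0.0, 0.0)))  # 34.0 (zigzag)
new_order = nearest_neighbor_order(pos, (0.0, 0.0))
print(new_order, chain_length(new_order, pos, (0.0, 0.0)))
# ['ff_b', 'ff_d', 'ff_c', 'ff_a'] 10.0
```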
Certain regions of the core must be kept free of standard cells or limited to a reduced cell density. These are defined using placement blockages:
| Blockage type | Effect | Typical use case |
|---|---|---|
| Hard blockage | No cells allowed inside the region | Macro keepout, analog isolation zone |
| Soft blockage | Placer avoids the region but can use it if needed | Preferred keepout (placer may override under congestion) |
| Partial blockage | Reduce density to a specified % in the region | Buffer zones around macros for routing access |
| Halo | Automatically follows a macro as it moves | Pre-placement macro margins (applied in floorplan) |
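A partial blockage is a density cap rather than a keep-out. The sketch below checks whether the standard-cell area overlapping a blockage region stays within a given fraction of the region's area. It assumes the percentage means the maximum allowed utilization; tools differ on whether they specify the allowed or the blocked fraction. The rectangles are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi) in µm


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def partial_blockage_ok(region: Rect, cells: List[Rect], max_density: float) -> bool:
    """True if cell area inside the region is at most max_density of the region's area."""
    region_area = (region[2] - region[0]) * (region[3] - region[1])
    used = sum(overlap_area(region, c) for c in cells)
    return used <= max_density * region_area


region = (0.0, 0.0, 10.0, 10.0)                      # 100 µm² partial blockage
cells = [(0.0, 0.0, 4.0, 1.0), (5.0, 5.0, 15.0, 6.0)]  # 4 µm² + 5 µm² inside the region
print(partial_blockage_ok(region, cells, 0.50))  # True  (9% used, 50% allowed)
print(partial_blockage_ok(region, cells, 0.05))  # False (9% used, 5% allowed)
```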
Congestion heatmaps display routing overflow as a color gradient (green → yellow → red). A red hotspot at a specific (x, y) location means the number of wires that need to pass through that tile exceeds the available routing tracks. One remedy is to re-spread cells around the hotspot (for example, refine_placement with local effort).
Post-placement timing uses estimated wire delays (from RC estimation based on placed cell positions). This is less accurate than post-route STA but gives an early indication of timing problems. Key metrics to check:
| Metric | Target | Action if violated |
|---|---|---|
| WNS (Worst Negative Slack) | ≥ 0 ps, ideally with positive margin for post-route degradation | Resize critical cells, move them closer, or pipeline |
| TNS (Total Negative Slack) | 0 ps | Fix every violating endpoint (TNS sums the negative slack of all failing endpoints) |
| Max transition time | < 200 ps (7nm typical) | Upsize driver cell or add buffer |
| Max capacitance | Per library cell limit | Upsize driver or split net |
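The first two rows of the table follow directly from per-endpoint slacks. WNS is the worst endpoint slack, and TNS is the sum of all negative endpoint slacks. The sketch below computes both, plus the violating-endpoint count. Some tools clamp WNS to 0 when every endpoint passes; this version returns the raw worst slack. The endpoint names and slack values are hypothetical.

```python
from typing import Dict


def timing_summary(endpoint_slack_ps: Dict[str, float]) -> dict:
    """Return WNS (worst endpoint slack), TNS (sum of negative slacks), and violation count."""
    slacks = list(endpoint_slack_ps.values())
    if not slacks:
        raise ValueError("no timing endpoints")
    return {
        "wns": min(slacks),
        "tns": sum(s for s in slacks if s < 0),
        "violating_endpoints": sum(1 for s in slacks if s < 0),
    }


print(timing_summary({"r1/D": -35.0, "r2/D": -10.0, "r3/D": 40.0}))
# {'wns': -35.0, 'tns': -45.0, 'violating_endpoints': 2}
```

A design can improve WNS while TNS gets worse, or the reverse, so both are worth tracking. TNS reflects how widespread the violations are, and WNS reflects how severe the worst one is.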