Your RTL is an intent. The synthesizer's job is to turn that intent into actual gates from the foundry's library — while meeting your timing, area, and power goals.
Logic synthesis is a three-stage transformation: Elaboration → Generic optimization → Technology mapping.
| Stage | Input | Output | Tool action |
|---|---|---|---|
| Elaboration | RTL (Verilog/SV) | GTECH netlist | Parse, resolve hierarchy, infer registers/memories |
| Generic opt. | GTECH netlist | Optimized GTECH | Boolean optimization, constant propagation, dead logic removal |
| Tech mapping | Optimized GTECH | Mapped netlist (.v) | Replace generic gates with PDK standard cells, meet timing |
The mapped netlist references cells from the standard cell library, such as AND2X1, INVX2, and DFFX1. Each cell is characterized for timing (setup, hold, propagation delay) and power at every supported PVT (process, voltage, temperature) corner, and has a fixed area.
The synthesizer decomposes your Boolean logic into a network of AND/OR/NOT gates, then maps that network to cells in the standard cell library using a tree-covering algorithm (dynamic programming over the logic cone).
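The tree-covering idea can be sketched in a few lines of Python. This is a minimal illustration rather than what any production tool does: the library, the area numbers, and the `best_cover` and `match` helpers are all hypothetical, the cost is area only, and AND/OR nodes are assumed to be binary.

```python
from functools import lru_cache

# Hypothetical library: (cell name, area cost, pattern over AND/OR/NOT).
# A "?" leaf in a pattern matches any subtree (a cell input).
LIBRARY = [
    ("INVX1",   1.0, ("NOT", "?")),
    ("AND2X1",  2.0, ("AND", "?", "?")),
    ("OR2X1",   2.0, ("OR", "?", "?")),
    ("NAND2X1", 1.5, ("NOT", ("AND", "?", "?"))),
    ("NOR2X1",  1.5, ("NOT", ("OR", "?", "?"))),
    # Only matches when the AND is the first operand of the OR;
    # a fuller matcher would also try the commuted form.
    ("AOI21X1", 2.5, ("NOT", ("OR", ("AND", "?", "?"), "?"))),
]

def match(pattern, node):
    """Return the subtrees bound to the pattern's "?" leaves, or None."""
    if pattern == "?":
        return [node]
    if isinstance(node, str) or node[0] != pattern[0] or len(node) != len(pattern):
        return None
    bound = []
    for sub_pattern, sub_node in zip(pattern[1:], node[1:]):
        sub_bound = match(sub_pattern, sub_node)
        if sub_bound is None:
            return None
        bound.extend(sub_bound)
    return bound

@lru_cache(maxsize=None)
def best_cover(node):
    """Return (total area, cells used) for the cheapest cover of this tree."""
    if isinstance(node, str):      # primary input: nothing to map
        return 0.0, ()
    best = None
    for cell, area, pattern in LIBRARY:
        inputs = match(pattern, node)
        if inputs is None:
            continue
        total, cells = area, (cell,)
        for sub in inputs:         # optimal sub-covers are reused (memoized)
            sub_area, sub_cells = best_cover(sub)
            total += sub_area
            cells += sub_cells
        if best is None or total < best[0]:
            best = (total, cells)
    return best

# Logic cone for f = NOT((a AND b) OR c)
cone = ("NOT", ("OR", ("AND", "a", "b"), "c"))
area, cells = best_cover(cone)
print(area, cells)   # 2.5 ('AOI21X1',)
```

With these made-up areas, a single AOI21X1 (2.5) beats NOR2X1 + AND2X1 (3.5) and INVX1 + OR2X1 + AND2X1 (5.0). A real mapper applies the same recurrence with a cost that also accounts for delay and power.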
Each cell has a cost in delay, area, and power. The tool picks cell combinations that minimize a weighted combination of these costs while meeting timing constraints (setup slack ≥ 0). A fast cell (e.g., AND2X4, with 4× drive strength) is large and leaky; a slow cell (AND2X1) is small and low-power. The synthesizer selects drive strengths based on fanout load and required arrival time.
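As a rough sketch of drive-strength selection, the snippet below picks the smallest cell that meets a required delay under a simple linear delay model (intrinsic delay plus output resistance times load). The cell parameters and the `pick_drive` helper are hypothetical; real libraries typically use lookup tables indexed by input slew and output load rather than a single resistance value.

```python
# Hypothetical AND2 variants: (name, area in um^2, intrinsic delay in ns,
# output resistance in ns/pF). Values are illustrative, not from a real library.
AND2_VARIANTS = [
    ("AND2X1", 1.0, 0.05, 2.0),
    ("AND2X2", 1.5, 0.05, 1.0),
    ("AND2X4", 2.5, 0.06, 0.5),
]

def pick_drive(load_pf, required_delay_ns, variants=AND2_VARIANTS):
    """Return (name, delay) for the smallest-area variant that meets the target."""
    for name, area, intrinsic, resistance in sorted(variants, key=lambda v: v[1]):
        delay = intrinsic + resistance * load_pf   # simple linear delay model
        if delay <= required_delay_ns:
            return name, delay
    return None  # nothing is fast enough: the tool would buffer or restructure

print(pick_drive(load_pf=0.01, required_delay_ns=0.10))  # light load: AND2X1 suffices
print(pick_drive(load_pf=0.04, required_delay_ns=0.10))  # heavier load: needs AND2X2
print(pick_drive(load_pf=0.20, required_delay_ns=0.10))  # None
```

Trying candidates in order of increasing area reflects the trade-off described above: the larger cell is chosen only when the load or the timing target demands it.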
-- Example: typical standard cell naming (illustrative; exact names vary by library and foundry) --
AND2X1 → 2-input AND, drive strength 1× (smallest, slowest)
AND2X2 → 2-input AND, drive strength 2×
INVX8 → Inverter, drive strength 8× (large, drives long wire)
DFFRHQX1 → D flip-flop, Reset, High-drive, Q output, strength 1×
MX2X2 → 2:1 MUX, drive strength 2×
The synthesizer can only meet timing if you tell it what timing means. Synopsys Design Constraints (SDC) is the de facto standard format for this, accepted by all major synthesis and timing tools. Without correct SDC, the tool produces a netlist that may fail in silicon.
```tcl
# ── Clock definition ──────────────────────────────────────
create_clock -name clk -period 2.0 [get_ports clk]
# 2.0 ns period = 500 MHz. Register-to-register paths must fit within
# one period, less clock uncertainty and setup time.

# ── Clock uncertainty (jitter + skew margin) ──────────────
set_clock_uncertainty -setup 0.1 [get_clocks clk]
set_clock_uncertainty -hold 0.05 [get_clocks clk]

# ── Input/output delays ───────────────────────────────────
set_input_delay -max 0.4 -clock clk [get_ports {data_in valid}]
set_output_delay -max 0.3 -clock clk [get_ports {data_out ready}]
# Input delay: time consumed outside this block (external FF clock-to-Q
# plus any external logic) before data reaches the input port
# Output delay: time needed outside this block after the output port
# (external logic plus the receiving FF's setup time)

# ── Drive strength / load ─────────────────────────────────
set_driving_cell -lib_cell INVX2 [get_ports {data_in}]
set_load 0.05 [get_ports {data_out}]  ;# pF, assuming the library's capacitance unit is pF

# ── False paths (no timing requirement) ──────────────────
set_false_path -from [get_ports rst_n]  ;# async reset: no timing
set_false_path -from [get_clocks clk_a] -to [get_clocks clk_b]
# clk_a and clk_b are assumed to be defined elsewhere

# ── Multi-cycle paths ─────────────────────────────────────
set_multicycle_path 2 -setup -from [get_cells slow_div/*]
# Divider takes 2 clock cycles: relax setup by 1 cycle
set_multicycle_path 1 -hold -from [get_cells slow_div/*]
# Commonly paired with the setup exception so the hold check stays at the launch edge
```
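The constraints above imply a simple time budget for the logic inside the block. The sketch below works through that arithmetic using the values from the SDC (2.0 ns period, 0.4 ns input delay, 0.3 ns output delay, 0.1 ns setup uncertainty). The `internal_budget` helper is hypothetical, and it deliberately ignores clock latency and the library setup and clock-to-Q times that a real tool also accounts for.

```python
def internal_budget(period_ns, external_delay_ns, setup_uncertainty_ns, cycles=1):
    """Time left for logic inside the block on a constrained path.

    Simplified: ignores clock latency and the library setup / clock-to-Q
    times that a real timing report also subtracts.
    """
    return cycles * period_ns - external_delay_ns - setup_uncertainty_ns

period = 2.0
print(f"frequency:           {1e3 / period:.0f} MHz")                                   # 500 MHz
print(f"input path budget:   {internal_budget(period, 0.4, 0.1):.2f} ns")              # 1.50 ns
print(f"output path budget:  {internal_budget(period, 0.3, 0.1):.2f} ns")              # 1.60 ns
print(f"2-cycle path budget: {internal_budget(period, 0.0, 0.1, cycles=2):.2f} ns")    # 3.90 ns
```

The last line shows why the multicycle exception matters: doubling the number of cycles roughly doubles the time available to the divider logic.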
Always specify set_input_delay and set_output_delay. Without them, the tool assumes every input arrives at time 0 (too optimistic) and leaves output paths unconstrained, as if they had unlimited time. Real silicon then fails because the external FF has setup requirements that your netlist does not meet.

```text
Startpoint: reg_a (rising edge-triggered FF, clocked by clk)
Endpoint:   reg_b (rising edge-triggered FF, clocked by clk)
Path Group: clk

clock clk (rise edge)                 0.00    0.00
clock network delay (propagated)      0.12    0.12
reg_a/CK (DFFX1)                      0.00    0.12 r
reg_a/Q (DFFX1)                       0.18    0.30 r   ← CK→Q delay
U1/A (AND2X1)                         0.00    0.30 r
U1/Z (AND2X1)                         0.09    0.39 r   ← gate delay
U2/A (OR2X1)                          0.00    0.39 r
U2/Z (OR2X1)                          0.11    0.50 r
reg_b/D (DFFX1)                       0.00    0.50 r
data arrival time                             0.50

clock clk (rise edge)                 2.00    2.00
clock uncertainty                    -0.10    1.90
reg_b/CK (DFFX1)                      0.12    2.02
library setup time                   -0.08    1.94
data required time                            1.94

slack (MET)                                   1.44   ← positive = timing met
```
Slack = Required time − Arrival time. Positive slack = timing met. Negative slack = violation. The critical path is the path with the worst (least positive or most negative) slack.
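To make the formula concrete, this short sketch recomputes the slack from the numbers in the timing report above. The `setup_slack` function and its parameter names are illustrative only, not part of any tool's API.

```python
def setup_slack(clock_edge, capture_latency, uncertainty, setup_time, arrival):
    """Return (slack, required time) for a setup check; all values in ns."""
    required = clock_edge + capture_latency - uncertainty - setup_time
    return required - arrival, required

# Data arrival: launch clock latency + CK->Q + two gate delays
arrival = 0.12 + sum([0.18, 0.09, 0.11])

slack, required = setup_slack(
    clock_edge=2.00,        # next rising edge of clk
    capture_latency=0.12,   # clock network delay to reg_b/CK
    uncertainty=0.10,       # set_clock_uncertainty -setup
    setup_time=0.08,        # library setup time of DFFX1
    arrival=arrival,
)
print(f"arrival  = {arrival:.2f} ns")    # 0.50
print(f"required = {required:.2f} ns")   # 1.94
print(f"slack    = {slack:.2f} ns")      # 1.44 (MET)
```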
```text
Combinational area:        4823.2 µm²
Noncombinational area:     8104.6 µm²   ← registers (FFs)
Total cell area:          12927.8 µm²
Net interconnect area:    ~estimated post-route
```
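A quick sanity check on an area report is to confirm that the parts add up to the total and to see how much of the area goes to registers. The snippet below does this for the numbers above; the variable names are just for illustration.

```python
combinational = 4823.2      # um^2
noncombinational = 8104.6   # um^2 (registers)

total = combinational + noncombinational
register_share = 100 * noncombinational / total

print(f"total cell area: {total:.1f} um^2")               # 12927.8
print(f"registers: {register_share:.1f}% of cell area")   # 62.7%
```

In this example the design is register-dominated, so area effort aimed at the combinational logic alone can recover at most about 37% of the cell area.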
| Optimization goal | What the tool does | Side effect |
|---|---|---|
| Meet timing (setup) | Use faster (larger) cells, add buffers, restructure logic | ↑ Area, ↑ Power |
| Reduce area | Use minimum-drive cells, share logic, reduce register count | ↑ Delay, may miss timing |
| Reduce power | Use low-Vt only where needed, add clock gating, reduce activity | ↑ Area (clock gate cells) |
| Reduce hold violations | Insert delay buffers on short paths | ↑ Area, ↑ Power |
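The hold-fixing row can be illustrated with a small calculation. This is a simplified sketch for a same-edge hold check: the `hold_slack` and `buffers_needed` helpers and all numeric values are assumptions for illustration, and real tools also weigh the effect of each added buffer on setup timing.

```python
import math

def hold_slack(min_arrival_ns, capture_latency_ns, hold_time_ns, hold_uncertainty_ns):
    """Hold slack: earliest data arrival minus the time data must stay stable."""
    required = capture_latency_ns + hold_time_ns + hold_uncertainty_ns
    return min_arrival_ns - required

def buffers_needed(slack_ns, buffer_delay_ns):
    """Number of delay buffers to insert on the short path to reach slack >= 0."""
    if slack_ns >= 0:
        return 0
    return math.ceil(-slack_ns / buffer_delay_ns)

# Hypothetical short path: data arrives 0.15 ns after the launch edge
slack = hold_slack(
    min_arrival_ns=0.15,
    capture_latency_ns=0.12,
    hold_time_ns=0.03,
    hold_uncertainty_ns=0.05,
)
print(f"hold slack = {slack:.2f} ns")                        # -0.05 (VIOLATED)
print(f"buffers needed = {buffers_needed(slack, 0.03)}")     # 2
```

Each inserted buffer adds area and switching power, which is the side effect listed in the table.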
Use compile_ultra (DC) or syn_generic; syn_map; syn_opt (Genus) with effort levels. Higher effort = longer runtime but better QoR (Quality of Results). For tapeout, always use high effort on critical paths.
… set_clock_gating_style and ensure the enable pattern is recognized.

… set_dont_touch or preserve attributes.

… create_clock, set_input_delay, set_output_delay