RTL Development — Complete Guide
RTL development is the process of describing hardware at the register transfer level — what data moves between which registers each clock cycle. This guide covers the full flow from specification to sign-off, with Verilog patterns and rules that apply on real chips.
The RTL Development Flow
RTL development is not just writing Verilog — it spans from a blank architecture spec to a signed-off gate netlist ready for physical design. Each step has specific tools and exit criteria.
| Phase | Goal | Exit Criteria |
|---|---|---|
| Microarch Spec | Define what to build, interfaces, pipeline, power plan | Spec reviewed and signed off by architect + DV |
| RTL Coding | Write synthesizable Verilog/SV matching the spec | 0 lint errors, no latch warnings, CDC clean |
| Simulation | Functional correctness — directed + constrained-random | 100% functional coverage, all assertions passing |
| Lint | Catch coding rule violations before synthesis | 0 severity-1 lint violations (SpyGlass clean) |
| CDC Analysis | Verify all clock crossings are properly synchronized | 0 CDC violations — no unsynchronized crossings |
| Synthesis | Map RTL to target library gate netlist | Meets timing at target frequency, area budget |
| STA | Verify setup/hold closure at all PVT corners | All paths positive slack, false paths documented |
Synthesizable Verilog Rules
✓ DO — Synthesizable
- always @(posedge clk) — flip-flop
- always @(*) — combinational logic
- Non-blocking <= in sequential blocks
- Blocking = in combinational blocks
- if/case with else/default (no latch)
- assign for continuous assignment
- for loops with constant bounds
- parameters and localparams
- Generate blocks (genvar)
- Synchronous or async reset
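✗ DON'T: Simulation-Only or Hazardous
- #10 delays inside design
- initial blocks in RTL modules (ignored by ASIC synthesis)
- $display, $monitor in design
- forever loops without clock edge
- Internal tri-state buses (keep tri-states at the chip I/O pads)
- Real/float data types
- Loop bounds that depend on run-time variables
- force / release statements
- casex (synthesizable, but X and Z wildcards can mask bugs; prefer casez with ? or case ... inside)
- Missing else in combinational logic (synthesizes, but infers a latch!)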
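Two of the DO items, constant-bound for loops and generate blocks, are easier to judge with an example. The following is a minimal sketch, not taken from any particular design: the module names `bit_reverse` and `delay_line` and their parameters are hypothetical, and `delay_line` assumes `N >= 1`.

```systemverilog
// Hypothetical example: reverse the bit order of a W-bit bus.
module bit_reverse #(parameter W = 8) (
  input  wire [W-1:0] in,
  output reg  [W-1:0] out
);
  // Constant-bound loop: W is a parameter, so synthesis can unroll
  // the loop into W plain wire connections.
  always @(*) begin
    for (int i = 0; i < W; i++)
      out[i] = in[W-1-i];
  end
endmodule

// Hypothetical example: N-stage register delay line built with generate.
// Assumes N >= 1.
module delay_line #(parameter W = 8, parameter N = 3) (
  input  wire         clk,
  input  wire [W-1:0] d,
  output wire [W-1:0] q
);
  reg [W-1:0] stage [0:N-1];

  genvar g;
  generate
    for (g = 0; g < N; g++) begin : g_stage
      if (g == 0) begin : g_first
        // First stage samples the module input
        always @(posedge clk) stage[0] <= d;
      end else begin : g_rest
        // Later stages sample the previous stage
        always @(posedge clk) stage[g] <= stage[g-1];
      end
    end
  endgenerate

  assign q = stage[N-1];
endmodule
```

Both loops are resolved at elaboration time, which is what makes them synthesizable: the amount of hardware is fixed by the parameters, not by any run-time value.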
// ✓ GOOD: Proper sequential + combinational separation
module good_rtl #(parameter W=8)(
  input wire clk, rst_n, en,
  input wire [W-1:0] d,
  output reg [W-1:0] q
);
  // Sequential block: always use <=
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) q <= '0;   // active-low async reset: clears q when rst_n falls
    else if (en) q <= d;   // clock enable: synthesis can map this to a clock gate
  end
endmodule
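// ✗ BAD: Mixed blocking/non-blocking in a sequential block. DO NOT DO THIS
always @(posedge clk) begin
  a = b + c; // blocking: a updates immediately (like a software variable)
  d <= a;    // non-blocking: the RHS is read now, so d gets the NEW a (b + c)
  // Hazard: any other clocked block that reads a races with this one, so
  // simulation results depend on block ordering and may not match synthesis.
  // Fix: compute a in a combinational block and use only <= here.
end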
// ✓ GOOD: Combinational with full case coverage (no latch)
always @(*) begin
  case (op)
    2'b00: out = a + b;
    2'b01: out = a - b;
    2'b10: out = a & b;
    default: out = '0; // MUST have default (2'b11 is uncovered), otherwise a latch is inferred!
  endcase
end
// ✗ BAD: Missing else → LATCH INFERRED (synthesis warning)
always @(*) begin
  if (en) out = data; // if en=0, what is out? → latch holds old value
  // Fix: add else out = '0;
end
// ✓ GOOD: FSM, 3-block style (state register + next-state + output)
typedef enum logic [1:0] {IDLE, BUSY, DONE} state_t;
state_t curr_state, next_state;

// Block 1: state register
always @(posedge clk or negedge rst_n)
  if (!rst_n) curr_state <= IDLE;
  else        curr_state <= next_state;

// Block 2: next-state logic (combinational)
always @(*) begin
  next_state = curr_state; // default: stay in current state
  case (curr_state)
    IDLE: if (start) next_state = BUSY;
    BUSY: if (done_flag) next_state = DONE;
    DONE: next_state = IDLE;
    default: next_state = IDLE;
  endcase
end

// Block 3: output logic (Moore: based on state only)
always @(*) begin
  busy_out = (curr_state == BUSY);
  valid_out = (curr_state == DONE);
end
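The simulation-only constructs from the DON'T list are not forbidden everywhere; they belong in the testbench. The sketch below is a minimal self-checking testbench for the `good_rtl` enable register shown earlier (copied here so the example is self-contained). The testbench name `good_rtl_tb`, the 10 ns clock period, and the stimulus values are assumptions chosen for illustration.

```systemverilog
`timescale 1ns/1ps

// DUT: copy of the enable register from this guide (synthesizable)
module good_rtl #(parameter W = 8) (
  input  wire         clk, rst_n, en,
  input  wire [W-1:0] d,
  output reg  [W-1:0] q
);
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n)  q <= '0;
    else if (en) q <= d;
  end
endmodule

// Testbench: never synthesized, so delays, initial blocks and
// system tasks are all fine here.
module good_rtl_tb;
  logic       clk = 0, rst_n = 0, en = 0;
  logic [7:0] d = '0;
  logic [7:0] q;

  good_rtl #(.W(8)) dut (.clk(clk), .rst_n(rst_n), .en(en), .d(d), .q(q));

  always #5 clk = ~clk;              // free-running clock, 10 ns period

  initial begin
    #12 rst_n = 1;                   // release reset away from a clock edge
    @(negedge clk); d = 8'hA5; en = 1;   // drive stimulus on the falling edge
    @(negedge clk); en = 0; d = 8'h3C;   // one rising edge has passed with en=1
    if (q !== 8'hA5) $error("load failed: q=%h", q);
    @(negedge clk);                  // one rising edge has passed with en=0
    if (q !== 8'hA5) $error("hold failed: q=%h", q);
    $display("test finished");
    $finish;
  end
endmodule
```

Driving inputs on the falling edge keeps stimulus changes well away from the rising edge the DUT samples on, which avoids testbench-induced races.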
Common RTL Bugs & How to Avoid Them
| Bug | Cause | How to Catch | Fix |
|---|---|---|---|
| Unintended Latch | Missing else/default in combinational always | Lint (SpyGlass LATCH warning) | Add else/default with a defined output |
| Blocking/NB Mix | Using = instead of <= in sequential block | Lint; RTL vs gate-level simulation mismatch | Use <= exclusively in clocked blocks |
| Metastability | Async signal sampled without 2-FF sync | CDC tool (SpyGlass CDC) | Add 2-FF synchronizer or async FIFO |
| Reset missing | Register not assigned in the reset branch | Simulation (X-prop) | Add to reset condition in always block |
| Width mismatch | 8-bit = 16-bit silently truncates MSBs | Lint (WIDTH violation) | Explicit cast or zero-extend/sign-extend |
| Counter overflow | N-bit counter rolls to 0 unexpectedly | Simulation (directed test) | Add saturate logic or wider counter |
| Glitch on output | Combinational output changes mid-cycle | Timing simulation with SDF | Register outputs at block boundary |
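Two of the fixes in the table can be sketched in a few lines each. This is a minimal illustration under stated assumptions: the module names `sync_2ff` and `sat_counter` and all port names are hypothetical, and the synchronizer is only suitable for a single-bit, slowly changing signal. Multi-bit buses need an async FIFO or a handshake, as the table notes.

```systemverilog
// Hypothetical 2-FF synchronizer for one asynchronous bit.
module sync_2ff (
  input  wire clk_dst,    // destination clock domain
  input  wire rst_n,
  input  wire async_in,   // signal from another clock domain
  output wire sync_out
);
  reg meta, stable;

  always @(posedge clk_dst or negedge rst_n) begin
    if (!rst_n) begin
      meta   <= 1'b0;
      stable <= 1'b0;
    end else begin
      meta   <= async_in; // first flop may go metastable
      stable <= meta;     // second flop gives it a full cycle to settle
    end
  end

  assign sync_out = stable;
endmodule

// Hypothetical saturating counter: stops at all-ones instead of wrapping.
module sat_counter #(parameter W = 4) (
  input  wire         clk, rst_n, inc,
  output reg  [W-1:0] count
);
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
      count <= '0;
    else if (inc && count != {W{1'b1}})
      count <= count + 1'b1;   // increment only while below the maximum
  end
endmodule
```

Whether a counter should saturate, wrap, or be widened is a spec decision; the point is that the overflow behavior is chosen deliberately rather than left implicit.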
Power Optimization in RTL Development
Power is a first-class concern in RTL: decisions made at RTL time account for an estimated 70% of dynamic power. Physical design can only do so much after the fact.
// ✓ Clock Gating: most impactful RTL power technique
// Instead of: always @(posedge clk) if (en) q <= d;
// Instantiate an ICG (Integrated Clock Gate) explicitly:
module icg_example(
  input wire clk, en,
  input wire [7:0] d,
  output reg [7:0] q
);
  wire gated_clk;
  // CLKGATE_CELL is a placeholder for the target library's ICG cell;
  // the real cell and port names depend on the library.
  // ICG: latch the enable while clk is low, then AND it with the clock
  CLKGATE_CELL icg (.CLK(clk), .EN(en), .GCLK(gated_clk));
  always @(posedge gated_clk) q <= d; // q is only clocked when en=1
endmodule
// ✓ Operand Isolation — prevent glitching through inactive datapaths
// Without isolation: even when unit is off, operands toggle the adder
// With isolation: clamp inputs to 0 when unit disabled
assign add_a_iso = unit_en ? a : '0;
assign add_b_iso = unit_en ? b : '0;
assign result = add_a_iso + add_b_iso; // no glitch power when unit_en=0
// ✓ One-Hot encoding for FSM: faster decode, lower glitch switching
// vs binary encoding (smaller state register, more decode logic)
// One-hot: exactly 2 bits change per transition (one clears, one sets).
// Gray encoding is the scheme in which only 1 bit changes per transition.
typedef enum logic [3:0] {
  S_IDLE = 4'b0001,
  S_READ = 4'b0010,
  S_PROC = 4'b0100,
  S_DONE = 4'b1000
} state_t; // tool-specific synthesis attribute, e.g. (* fsm_encoding = "one_hot" *)
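The comment "latch enable on level, AND with clock" in the clock-gating example describes what an ICG cell does internally. The following is a behavioral sketch of that structure for simulation and understanding only; the module name `icg_model` and its ports are hypothetical, and a real design would use the library's characterized cell, which often adds a test-enable input for scan.

```systemverilog
// Hypothetical behavioral model of a latch-based clock gate
// for a rising-edge-clocked design.
module icg_model (
  input  wire clk,
  input  wire en,
  output wire gclk
);
  logic en_latched;

  // The latch is transparent only while clk is low, so a change on en
  // during the high phase cannot chop or glitch the gated clock.
  always_latch begin
    if (!clk) en_latched <= en;
  end

  // The gated clock pulses only in cycles where the latched enable is 1.
  assign gclk = clk & en_latched;
endmodule
```

A plain `assign gclk = clk & en;` without the latch would let `en` transitions during the high phase create runt pulses, which is why the latch is part of the cell.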
Frequently Asked Questions
Is RTL the same as hardware description language (HDL)?
RTL is an abstraction level; HDL is the language used to describe it. Verilog and VHDL are HDLs that support both RTL-level and gate-level descriptions. When engineers say "RTL" they typically mean synthesizable HDL code written at the register-transfer level — not gate-level netlists (which are also valid Verilog) and not behavioral models with delays.
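The distinction is easiest to see with the same function written at both levels in the same language. This is an illustrative sketch: the module names are hypothetical, and the gate-level version uses Verilog's built-in primitives, whereas a real post-synthesis netlist would instantiate cells from the target library.

```systemverilog
// RTL level: describes the behavior, synthesis picks the gates.
module mux2_rtl (
  input  wire a, b, sel,
  output wire y
);
  assign y = sel ? b : a;
endmodule

// Gate level: the same 2:1 mux as an explicit netlist of primitives.
module mux2_gates (
  input  wire a, b, sel,
  output wire y
);
  wire sel_n, a_path, b_path;

  not u_inv  (sel_n,  sel);            // sel_n  = ~sel
  and u_and0 (a_path, a, sel_n);       // a_path = a & ~sel
  and u_and1 (b_path, b, sel);         // b_path = b & sel
  or  u_or   (y,      a_path, b_path); // y      = a_path | b_path
endmodule
```

Both modules are valid Verilog and compute the same output; only the first is what engineers usually mean by "RTL".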
What is the difference between RTL design and RTL verification?
RTL design (or RTL development) means writing the synthesizable Verilog that describes the hardware. RTL verification means checking that the RTL is functionally correct: writing testbenches, running simulations, writing assertions, and measuring coverage. On modern chips, verification effort is often estimated at around 70% of the total engineering time. Both roles require deep Verilog knowledge, but verification additionally requires SystemVerilog UVM/class-based verification skills.
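As a small taste of the verification side, the sketch below shows SystemVerilog assertions for an enable register with the same ports as the `good_rtl` module in this guide. The checker module name `en_reg_checker` and the property labels are hypothetical, and the `bind` line in the comment is one possible way to attach it, not the only one.

```systemverilog
// Hypothetical checker for an enable register (ports mirror good_rtl).
module en_reg_checker #(parameter W = 8) (
  input wire         clk, rst_n, en,
  input wire [W-1:0] d, q
);
  // If en was low at a clock edge, q must be unchanged at the next edge.
  a_hold: assert property (@(posedge clk) disable iff (!rst_n)
                           !en |=> $stable(q))
    else $error("q changed while en was low");

  // If en was high at a clock edge, q must equal the d sampled at that edge.
  a_load: assert property (@(posedge clk) disable iff (!rst_n)
                           en |=> q == $past(d))
    else $error("q did not capture d");

  // Coverage: confirm the tests actually exercised a load.
  c_load: cover property (@(posedge clk) disable iff (!rst_n) en);
endmodule

// Possible attachment from a testbench or bind file (assumes a module
// named good_rtl with matching port names):
//   bind good_rtl en_reg_checker #(.W(W)) u_chk (.*);
```

Keeping the assertions in a separate checker module leaves the synthesizable RTL free of verification code, which matches the split between the two roles described above.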
How long does RTL development take for a real chip?
For a mid-sized chip (phone SoC or GPU block), RTL development for one major block typically takes 6–18 months from spec to synthesis sign-off. The full chip RTL (with 10–50 blocks) takes 2–4 years from architecture to tapeout. This is why semiconductor companies employ hundreds of RTL engineers per chip and why FPGA prototyping is critical — it lets software teams start development while RTL is still being verified.