RTL Design Fundamentals

Blocking vs Non-Blocking Assignments in Verilog

The single most important distinction in Verilog RTL. Blocking (=) executes immediately and sequentially — like software. Non-blocking (<=) evaluates now but updates all registers simultaneously at the end of the time step — like real hardware. Using the wrong one in the wrong context causes simulation mismatches, race conditions, and broken synthesis.

Use = in always @(*) — combinational
Use <= in always @(posedge clk) — sequential
Never mix both in the same always block

The Two Golden Rules — Memorize These

Golden Rules for Synthesis-Safe RTL

  • Combinational logic (always @(*) or always_comb) → use blocking (=). Sequential order of statements correctly computes intermediate values.
  • Sequential / clocked logic (always @(posedge clk) or always_ff) → use non-blocking (<=). All flip-flops sample old values and update simultaneously.
  • Never mix = and <= inside the same always block. Ever.
  • Non-blocking assignments are not legal in continuous assign statements or, under Verilog (IEEE 1364), in functions. Use them only in procedural always/initial code.

What Each Assignment Type Actually Does

Blocking  =

Executes Immediately, In Order

When the simulator reaches a blocking assignment, it evaluates the RHS and immediately updates the LHS variable before executing the next statement. Subsequent statements in the same always block see the updated value. Behaviour is identical to a software programming language assignment.

Non-Blocking  <=

Evaluates Now, Updates Later

When the simulator reaches a non-blocking assignment, it evaluates the RHS immediately using current signal values, then schedules the LHS update for the NBA (Non-Blocking Assignment) region — after all active events at this time step are resolved. All non-blocking updates take effect simultaneously.
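The difference can be seen with a two-statement swap. The sketch below is a minimal illustration, not part of the original text; the module name swap_demo and all register names are made up for the example.

```verilog
// Hypothetical demo module; all names are illustrative.
module swap_demo;
  reg       clk;
  reg [3:0] a_blk, b_blk;   // swapped with blocking assignments
  reg [3:0] a_nba, b_nba;   // swapped with non-blocking assignments

  // Blocking: a_blk is overwritten before the second statement reads it.
  always @(posedge clk) begin
    a_blk = b_blk;
    b_blk = a_blk;          // reads the NEW a_blk, so no swap happens
  end

  // Non-blocking: both RHS values are sampled first, then both LHS update.
  always @(posedge clk) begin
    a_nba <= b_nba;
    b_nba <= a_nba;         // reads the OLD a_nba, so the values swap
  end

  initial begin
    clk = 0;
    a_blk = 4'd1; b_blk = 4'd2;
    a_nba = 4'd1; b_nba = 4'd2;
    #5 clk = 1;             // a single rising edge
    #1 $display("blocking:     a=%0d b=%0d", a_blk, b_blk);  // a=2 b=2
       $display("non-blocking: a=%0d b=%0d", a_nba, b_nba);  // a=2 b=1
    $finish;
  end
endmodule
```

The blocking pair loses the original value of a, while the non-blocking pair exchanges the two values, which is what two cross-coupled flip-flops would do.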

Verilog Time Step

Active Region → NBA Region

A Verilog simulation time step has two key phases: the Active region (where blocking assignments execute and NBA RHS expressions are evaluated) and the NBA region (where all scheduled non-blocking LHS updates are applied). This two-phase model is what makes <= correctly model synchronous hardware.

The Verilog Event Scheduler

Understanding why non-blocking works requires seeing how the simulator processes events. The RHS of both assignment types is evaluated in the Active region; only the LHS update of a non-blocking assignment is deferred.

One Verilog simulation time step (e.g., rising edge of CLK)

ACTIVE REGION
  • Evaluate all blocking RHS: a = b + 1; → b+1 computed, a updated NOW
  • Evaluate non-blocking RHS (schedule LHS): q <= d; → d read NOW, q update SCHEDULED
  • Continuous assign / gate evaluation: propagate through combinational logic
  • Repeat until no more active events
    ↓ all done
NBA REGION (Non-Blocking Assignment Update)
  • Apply all scheduled LHS updates: ALL flip-flop outputs update simultaneously
  • This is what makes <= model real hardware
  • New values may trigger more Active events (e.g., always @(*) sees updated flop outputs)

Cycle repeats until simulation is stable, then advance to next time step

Fig 1 — The Verilog simulation time step. Blocking (=) assignments complete in the Active region. Non-blocking (<=) RHS is read in the Active region but LHS updates happen in the NBA region — making all registers appear to update simultaneously, just like real flip-flops.
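One way to observe the two regions is to print the same register twice in one time step. The sketch below is an illustration under assumptions: the module name region_demo and its signals are hypothetical. It relies on the standard system tasks $display, which prints immediately, and $strobe, which prints at the end of the current time step after the NBA updates have been applied.

```verilog
// Hypothetical demo module; all names are illustrative.
module region_demo;
  reg clk, d, q;

  always @(posedge clk) begin
    q <= d;                               // RHS read now, LHS update scheduled
    $display("active region: q=%b", q);   // prints the old value: q=0
    $strobe ("after NBA:     q=%b", q);   // prints the updated value: q=1
  end

  initial begin
    clk = 0; d = 1; q = 0;
    #5 clk = 1;                           // single rising edge
    #1 $finish;
  end
endmodule
```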

The 4-Stage Shift Register — Where Everyone Goes Wrong

This is the most famous Verilog example for demonstrating the difference. The goal is a 4-stage shift register: on each clock edge, data propagates one stage forward.

WRONG — Blocking = in clocked block
// WRONG: blocking in clocked always
always @(posedge clk) begin
  b = a;   // b updated IMMEDIATELY
  c = b;   // sees NEW b, not old b!
  d = c;   // sees NEW c!
  e = d;   // sees NEW d!
end

// Result: ALL stages get value of 'a'
// in same clock cycle — it's just
// one register, not a shift register!
CORRECT — Non-Blocking <= in clocked block
// CORRECT: non-blocking in clocked always
always @(posedge clk) begin
  b <= a;   // RHS read: old 'a'
  c <= b;   // RHS read: old 'b'
  d <= c;   // RHS read: old 'c'
  e <= d;   // RHS read: old 'd'
end

// Result: all RHS read SIMULTANEOUSLY
// then all LHS updated SIMULTANEOUSLY
// → correct 4-stage shift register ✓

Why the wrong version fails: With blocking assignments, b = a executes completely before c = b runs. So when c = b runs, b already holds the new value of a. Every stage receives the same value in one clock cycle, so you get a 1-stage register, not a 4-stage one. Synthesis follows the same statement order, so the netlist is wrong in the same way as the simulation.

[Waveform: CLK, a (in), b (<=), c (<=), d (<=) over Cycle 0 to Cycle 5. After a goes to 1, b=1 follows with 1 clk delay, c=1 with 2 clk delay, d=1 with 3 clk delay; each change occurs at a posedge.]

Fig 2 — Correct 4-stage shift register waveform using non-blocking (<=). Each stage is delayed by exactly one clock cycle. With blocking (=) all four stages would update to the same value of 'a' in the same clock cycle.
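The waveform can be checked with a small self-checking testbench. This is a sketch under assumptions: the module names shift4 and shift4_tb, the check task, and the 10-time-unit clock are all invented for the example; only the four non-blocking assignments come from the text.

```verilog
// Device under test: the non-blocking shift register from the text.
module shift4 (
  input  wire clk,
  input  wire a,
  output reg  b, c, d, e
);
  always @(posedge clk) begin
    b <= a;
    c <= b;
    d <= c;
    e <= d;
  end
endmodule

// Hypothetical testbench; names and timing are illustrative.
module shift4_tb;
  reg     clk, a;
  wire    b, c, d, e;
  integer errors;

  shift4 dut (.clk(clk), .a(a), .b(b), .c(c), .d(d), .e(e));

  always #5 clk = ~clk;             // free-running clock, period 10

  // Compare all four stages against the expected pattern.
  task check(input [3:0] expected);
    begin
      if ({b, c, d, e} !== expected) begin
        errors = errors + 1;
        $display("FAIL t=%0t: got %b, expected %b",
                 $time, {b, c, d, e}, expected);
      end
    end
  endtask

  initial begin
    errors = 0; clk = 0; a = 0;
    repeat (4) @(posedge clk);      // flush unknown power-up values with zeros
    @(negedge clk) a = 1;           // change the input away from the active edge
    @(negedge clk) check(4'b1000);  // one edge later: only b holds the 1
    @(negedge clk) check(4'b1100);
    @(negedge clk) check(4'b1110);
    @(negedge clk) check(4'b1111);
    if (errors == 0) $display("PASS: one stage per clock");
    $finish;
  end
endmodule
```

Replacing the four <= operators in shift4 with = makes the first check fail, because all four outputs already read 1111 one edge after a rises.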

The Right Tool for the Right Job

Using blocking assignments correctly in combinational logic — order of statements matters for computed intermediates:
COMBINATIONAL — use blocking (=)
always @(*) begin                 // or always_comb
  // Intermediate values: order matters, = is correct
  // (assumes a, b are 8 bits and sum is declared 9 bits wide to keep the carry-out)
  sum    = a + b;
  carry  = (sum > 8'hFF);       // uses updated sum ✓
  result = carry ? sum[7:0] : sum; // uses updated carry ✓
end

// Priority MUX — correct with blocking:
always @(*) begin
  out = 8'h00;           // default (prevents latch!)
  if      (sel[2]) out = c;
  else if (sel[1]) out = b;
  else if (sel[0]) out = a;
end
Correct D flip-flop and register designs — always non-blocking in clocked blocks:
SEQUENTIAL — use non-blocking (<=)
// D flip-flop with async reset
always @(posedge clk or negedge rst_n) begin
  if (!rst_n) q <= 1'b0;
  else        q <= d;
end

// Multi-bit register with enable
always @(posedge clk) begin
  if (rst)      data_r <= 8'h00;
  else if (en) data_r <= data_in;
  // no else needed — hold is implied for a register
end

// State machine — always <= for state register
always @(posedge clk) begin
  if (rst) state <= IDLE;
  else     state <= next_state; // next_state from comb block
end
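The state register above takes next_state from a combinational block that the snippet does not show. A minimal sketch of the full two-block pattern follows, assuming a hypothetical three-state machine: the module name fsm_sketch, the RUN and FINISH states, and the start, done and busy signals are all invented for illustration.

```verilog
// Hypothetical FSM; only state, next_state, IDLE, clk and rst come from the text.
module fsm_sketch (
  input  wire clk,
  input  wire rst,      // synchronous, active-high, as in the snippet above
  input  wire start,    // hypothetical input
  input  wire done,     // hypothetical input
  output reg  busy      // hypothetical output
);
  localparam IDLE = 2'd0, RUN = 2'd1, FINISH = 2'd2;  // hypothetical encoding
  reg [1:0] state, next_state;

  // Combinational next-state and output logic: blocking, defaults first.
  always @(*) begin
    next_state = state;   // default: hold the current state (prevents a latch)
    busy       = 1'b0;
    case (state)
      IDLE:    if (start) next_state = RUN;
      RUN:     begin
                 busy = 1'b1;
                 if (done) next_state = FINISH;
               end
      FINISH:  next_state = IDLE;
      default: next_state = IDLE;
    endcase
  end

  // State register: non-blocking.
  always @(posedge clk) begin
    if (rst) state <= IDLE;
    else     state <= next_state;
  end
endmodule
```

Keeping the two blocks separate means each one follows exactly one of the golden rules, so there is never a reason to mix operators.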

Pitfalls That Break Simulation and Synthesis

ANTI-PATTERN — Mixing = and <= in same block
// DANGEROUS: mixing blocking and non-blocking
always @(posedge clk) begin
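  temp =  a + b;   // blocking: temp updates immediately
  out  <= temp;    // non-blocking: reads the NEW temp, so out gets a + b
end
// Result: inside this one block the behaviour is well
// defined. The danger is that temp changes in the Active
// region, so any other block that reads temp on the same
// clock edge races with this one, and the mixed style is
// easy to misread. Avoid it.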

// ALSO WRONG: non-blocking in combinational block
always @(*) begin
  y <= a & b;    // <= in combinational: later statements would read
end            // stale values and simulation needs extra passes. Use =
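The stale-value problem shows up as soon as a combinational block has an intermediate variable. The following sketch is an illustration with hypothetical names (comb_nba_demo, tmp_bad, tmp_ok and so on); it uses an explicit sensitivity list so that the failure is visible in the final output.

```verilog
// Hypothetical demo module; all names are illustrative.
module comb_nba_demo;
  reg a, b, c;
  reg tmp_bad, y_bad;
  reg tmp_ok,  y_ok;

  // Non-blocking: y_bad is computed from the OLD tmp_bad.
  always @(a or b or c) begin
    tmp_bad <= a & b;
    y_bad   <= tmp_bad | c;   // tmp_bad has not been updated yet
  end

  // Blocking: tmp_ok is updated before it is read.
  always @(a or b or c) begin
    tmp_ok = a & b;
    y_ok   = tmp_ok | c;
  end

  initial begin
    a = 0; b = 0; c = 0;
    #1 a = 1;
    #1 b = 1;                 // the correct answer is now (1 & 1) | 0 = 1
    #1 $display("y_bad=%b y_ok=%b", y_bad, y_ok);  // y_bad=0 y_ok=1
    $finish;
  end
endmodule
```

With always @(*) the implicit sensitivity list includes tmp_bad, so the block is triggered again after the NBA update and y_bad eventually settles to the right value, at the cost of an extra evaluation pass. Blocking assignments avoid both the stale read and the extra pass.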
ANTI-PATTERN — Race condition between two always blocks
// Two always blocks both write 'x' with blocking
always @(posedge clk) x = a;  // Block 1
always @(posedge clk) x = b;  // Block 2

// The final value of 'x' depends on which block
// the simulator executes last, and IEEE 1364 does
// not define that order.

// The NON-BLOCKING version does not fix this:
always @(posedge clk) x <= a; // Block 1 schedules
always @(posedge clk) x <= b; // Block 2 schedules
// IEEE 1364 orders NBA updates only within a single
// process. Across two always blocks the last-scheduled
// update wins, but which block schedules last is still
// up to the simulator. Drive 'x' from one block only.
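Whichever operator is used, the usual remedy is to give x exactly one driving block and make the choice between sources explicit. A minimal sketch, assuming a hypothetical select input named use_b that is not part of the original example:

```verilog
// Hypothetical module; use_b is an invented select signal.
module single_driver (
  input  wire clk,
  input  wire use_b,   // chooses which source loads x
  input  wire a,
  input  wire b,
  output reg  x
);
  // One always block owns x, so there is no ordering question to resolve.
  always @(posedge clk) begin
    if (use_b) x <= b;
    else       x <= a;
  end
endmodule
```

A single driver also matches what synthesis expects, since a register written from two always blocks is normally reported as a multiple-driver error.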

What Synthesis Tools See

Context | Assignment Used | Synthesizes To | Correct?
always @(*) | = blocking | Combinational gates (AND, OR, MUX…) | ✓ Correct
always @(posedge clk) | <= non-blocking | D flip-flops + any required combo logic | ✓ Correct
always @(posedge clk) | = blocking | Flip-flops, but statement order changes the result and RTL simulation can race with other blocks | ✗ Wrong
always @(*) | <= non-blocking | Usually the same gates, but RTL simulation can read stale intermediate values; tools may warn | ✗ Wrong
always @(*), no default | = blocking (incomplete if/case) | Inferred latch (unintentional) | ⚠ Latch
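The last row is easiest to see side by side. The sketch below uses hypothetical names (latch_vs_comb, y_latch, y_comb) and shows the same if statement with and without a default assignment.

```verilog
// Hypothetical module; all names are illustrative.
module latch_vs_comb (
  input  wire       en,
  input  wire [7:0] din,
  output reg  [7:0] y_latch,
  output reg  [7:0] y_comb
);
  // Incomplete assignment: when en is 0, y_latch must hold its value,
  // so synthesis infers a level-sensitive latch.
  always @(*) begin
    if (en) y_latch = din;
  end

  // Default assignment first: every path assigns y_comb, so no latch.
  always @(*) begin
    y_comb = 8'h00;
    if (en) y_comb = din;
  end
endmodule
```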

Synthesis tools are forgiving, but don't rely on it. Synthesis tools (Synopsys DC, Cadence Genus, Vivado) accept blocking assignments in clocked blocks and build flip-flops from them. When a register written with = is read by another clocked block, the netlist can behave like a proper pipeline while RTL simulation depends on block execution order, so RTL sim and gate-level sim disagree and sign-off flags a mismatch. Write RTL that is correct in simulation, not just in synthesis.

Side-by-Side Comparison

Property | Blocking (=) | Non-Blocking (<=)
When LHS updates | Immediately, before next statement | End of time step (NBA region), simultaneously
RHS evaluated | Immediately | Immediately (but LHS deferred)
Models | Procedural software variable | Hardware flip-flop / register
Use in always @(*) | ✓ Correct | ✗ Avoid
Use in always @(posedge clk) | ✗ Avoid | ✓ Correct
Mix both in same block | ✗ Never (applies to both): invites races with other blocks and is easy to misread
Intermediate variables | Works correctly (sequential) | Must be careful, old values are used
Allowed in function | ✓ Yes | ✗ No

Frequently Asked Questions

The = (blocking) assignment executes immediately and sequentially — the LHS is updated before the next statement in the always block runs, just like a variable assignment in C or Python.

The <= (non-blocking) assignment evaluates the RHS immediately but defers the LHS update to the NBA (Non-Blocking Assignment) region at the end of the current simulation time step. All non-blocking assignments in all always blocks at the same time step take effect simultaneously — correctly modeling how all flip-flops in real hardware clock their outputs at the same edge.

With blocking assignments in a clocked always block, each statement executes before the next. So b = a updates b immediately, then c = b reads the already-updated b. The result is that all stages get the value of the first input (a) in the same clock cycle — you have a 1-stage register that copies to everything at once, not a shift register.

Fix: replace every = with <= in the clocked always block. Non-blocking reads all RHS values simultaneously (old values) then writes all LHS simultaneously — giving you one clock cycle of propagation per stage.

The rule does not change with the statements inside the block. The always block type determines the rule: if you're inside always @(posedge clk), use <= everywhere inside it, in if-else branches, case statements, and loops. The only exception would be local temporary variables that you use as intermediate calculations (which some styles allow with blocking), but even then mixing is risky and most coding guidelines forbid it entirely.

A race condition occurs when two always blocks triggered at the same simulation time access the same signal through blocking assignments: one block writes it while another reads it, or both write it. The result depends on which block the simulator happens to evaluate first, which is not guaranteed by the IEEE standard. Different simulators, or even different versions or settings of the same simulator, can produce different results.

Non-blocking assignments prevent this for registers: all RHS expressions are evaluated in the Active region (reading old values), and all LHS updates happen in the NBA region simultaneously. No block can "see" another block's non-blocking update during the Active region — eliminating the ordering dependency entirely.
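A reader/writer race of this kind can be reproduced with a two-stage pipeline split across always blocks. The sketch below is illustrative only; the module name race_demo and all signal names are hypothetical.

```verilog
// Hypothetical demo module; all names are illustrative.
module race_demo;
  reg clk, d;
  reg q1_blk, q2_blk;   // two-stage pipeline written with =
  reg q1_nba, q2_nba;   // same pipeline written with <=

  // Blocking across two blocks: q2_blk may see the old or the new q1_blk,
  // depending on which block the simulator runs first.
  always @(posedge clk) q1_blk = d;
  always @(posedge clk) q2_blk = q1_blk;

  // Non-blocking: q2_nba always samples the old q1_nba.
  always @(posedge clk) q1_nba <= d;
  always @(posedge clk) q2_nba <= q1_nba;

  initial begin
    clk = 0; d = 1;
    q1_blk = 0; q2_blk = 0; q1_nba = 0; q2_nba = 0;
    #5 clk = 1;           // single rising edge
    #1 $display("blocking:     q2=%b (0 or 1, simulator-dependent)", q2_blk);
       $display("non-blocking: q2=%b (0 after one edge)", q2_nba);
    $finish;
  end
endmodule
```

Synthesis typically builds two flip-flops in series for both versions, which is why the blocking form can pass or fail in RTL simulation while the gates behave like the non-blocking form.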

Some experienced RTL engineers use blocking for local temporary variables inside clocked blocks to avoid declaring separate always @(*) blocks. For example: temp = a + b; result <= temp; — here temp is a local variable only used within this block, so the blocking update is safe. However, most coding guidelines (including ARM and Intel RTL guidelines) prohibit mixing to avoid confusion. The safest rule for beginners and professionals alike: never mix = and <= in the same always block.
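For teams that do allow this style, one way to keep it contained is to declare the temporary inside a named block so that no other always block can read it. This is a sketch under assumptions: the module name temp_style, the block label add_stage, and the signal widths are invented for the example.

```verilog
// Hypothetical module; names and widths are illustrative.
module temp_style (
  input  wire       clk,
  input  wire [7:0] a,
  input  wire [7:0] b,
  output reg  [8:0] result
);
  always @(posedge clk) begin : add_stage   // a named block may declare locals
    reg [8:0] temp;       // local to add_stage
    temp   = a + b;       // blocking: pure intermediate, written before it is read
    result <= temp;       // non-blocking: the actual register
  end
endmodule
```

Because temp is always written before it is read, it does not need storage of its own, and only result becomes a register. Projects that forbid mixing would instead compute the sum in a separate always @(*) block or a continuous assign.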

The choice affects both simulation and synthesis. Synthesis tools infer flip-flops from the structure of clocked always blocks, whichever assignment operator you use. Inside a single block they follow the statement order, so a blocking bug such as the shift register above ends up in the netlist as well. Across blocks the picture is worse: when one clocked block reads a register that another block writes with =, RTL simulation depends on block execution order while the synthesized flip-flops behave like a proper pipeline. Gate-level (post-synthesis) simulation then disagrees with RTL simulation; this is called a simulation-synthesis mismatch and is a serious sign-off issue in ASIC design. Using non-blocking assignments for all clocked registers keeps RTL sim, gate-level sim, and silicon consistent.