Tutorial 14 · Verilog Series

Verilog RTL Design Patterns

Production RTL is built from a small set of reusable structural patterns: pipelines, handshakes, FIFOs, arbiters, and synchronizers. Understanding these patterns lets you read production RTL, design at a higher level of abstraction, and avoid the class of bugs that only appears once several modules are integrated.

Topics: pipeline, valid-ready, sync FIFO, round-robin arbiter, clock enable, 2-FF synchronizer, skid buffer
[Figure: Pipeline, cutting the critical path. Stage 1, Stage 2, and Stage 3 are combinational blocks, each followed by a flip-flop (FF); clk advances every FF one stage per cycle. Throughput: 1/cycle · Latency: N cycles · Fmax ↑ (shorter stages)]

1. Pipeline Register Stages

Pipelining inserts flip-flops between logic stages to reduce the critical path. Each stage completes in one clock cycle; the whole computation takes N cycles but new data enters every cycle.

// 3-stage pipeline: multiply → add offset → clamp
module mac_pipe (
  input         clk, rst_n,   // rst_n is unused: the datapath registers need no reset
  input  [7:0]  a, b,
  output [15:0] result
);
  // Stage 1: multiply (8 x 8 -> 16 bits)
  reg [15:0] s1_product;
  always @(posedge clk)
    s1_product <= a * b;

  // Stage 2: add a constant offset
  // (kept below 16'h0100; a larger offset would make stage 3 clamp every value)
  reg [15:0] s2_accum;
  always @(posedge clk)
    s2_accum <= s1_product + 16'h0010;

  // Stage 3: saturate / clamp to the 8-bit range
  reg [15:0] s3_out;
  always @(posedge clk)
    s3_out <= (s2_accum > 16'h00FF) ? 16'h00FF : s2_accum;

  assign result = s3_out;
endmodule
Pipeline validity: The first valid output appears after N cycles of latency. When stalling a pipeline, all stages must hold their data simultaneously. A common mistake is stalling only the last stage while earlier stages keep advancing — corrupting data in flight.
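The stall rule can be sketched as a pipeline in which every stage shares one enable. This is a minimal sketch under stated assumptions, not part of the design above: the module name pipe_stall, the stall, in_valid and out_valid ports, and the per-stage valid bits are all hypothetical, and the stage arithmetic is only a placeholder.

```verilog
// Hypothetical 3-stage pipeline with a shared stall and per-stage valid bits.
module pipe_stall (
  input         clk, rst_n,
  input         stall,        // 1 = freeze every stage this cycle
  input         in_valid,     // qualifies a and b
  input  [7:0]  a, b,
  output        out_valid,    // qualifies result
  output [15:0] result
);
  reg [15:0] s1_data, s2_data, s3_data;
  reg [2:0]  valid;           // valid[0] = stage 1 ... valid[2] = stage 3

  // Valid bits are reset so nothing is reported as valid right after reset.
  always @(posedge clk or negedge rst_n)
    if (!rst_n)      valid <= 3'b000;
    else if (!stall) valid <= {valid[1:0], in_valid};

  // The same !stall enable guards all data registers:
  // either every stage advances or every stage holds.
  always @(posedge clk)
    if (!stall) begin
      s1_data <= a * b;                                      // stage 1
      s2_data <= s1_data + 16'h0010;                         // stage 2
      s3_data <= (s2_data > 16'h00FF) ? 16'h00FF : s2_data;  // stage 3
    end

  assign out_valid = valid[2];
  assign result    = s3_data;
endmodule
```

Because the valid bits shift under the same enable as the data, out_valid rises three unstalled cycles after in_valid, and a stall delays both by the same amount.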

2. Valid-Ready Handshake

The valid-ready (AXI-style) handshake decouples producers from consumers. A transaction happens only when both valid and ready are high at the clock edge.

// Transfer rule: data moves when valid AND ready are both 1
wire in_xfer  = valid_in  && ready_in;   // handshake on input
wire out_xfer = valid_out && ready_out;  // handshake on output

// Capture input data only on a completed input handshake
always @(posedge clk)
  if (in_xfer) data_buf <= data_in;

// Critical property: valid must NOT deassert unless handshake occurred
// (AXI rule: once valid is raised, hold until ready arrives)
// valid_out has a single driver: this block.
always @(posedge clk or negedge rst_n) begin
  if (!rst_n)            valid_out <= 0;
  else if (produce_data) valid_out <= 1;
  else if (out_xfer)     valid_out <= 0;  // deassert AFTER handshake
end

Signal | Direction | Meaning
valid | Producer → Consumer | Producer is presenting valid data
ready | Consumer → Producer | Consumer can accept data now

Transfer: valid && ready both high at the clock edge
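
The hold rule for valid is easy to break by accident, so it can help to check it in simulation. The following is a rough sketch of a simulation-only monitor; the module name vr_monitor, its ports, and the additional data-stability check are assumptions made for illustration, not part of the interface described above.

```verilog
// Hypothetical simulation-only monitor for one valid-ready channel.
module vr_monitor #(parameter W = 8) (
  input         clk, rst_n,
  input         valid, ready,
  input [W-1:0] data
);
  reg         pending;     // last cycle: valid was high but ready was low
  reg [W-1:0] data_prev;   // data as sampled on the previous clock edge

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      pending <= 1'b0;
    end else begin
      // A stalled transfer must still be offered on the next edge...
      if (pending && !valid)
        $display("%0t: ERROR valid dropped before handshake", $time);
      // ...and (assumed rule) with unchanged data.
      if (pending && valid && (data !== data_prev))
        $display("%0t: ERROR data changed while waiting for ready", $time);
      pending   <= valid && !ready;
      data_prev <= data;
    end
  end
endmodule
```

The monitor has no outputs; it would be instantiated next to the interface in a testbench and left out of synthesis.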

3. Synchronous FIFO

A synchronous FIFO uses one clock domain, a RAM array, and pointer arithmetic. The extra MSB trick distinguishes full from empty when the pointers are equal.

module sync_fifo #(
  parameter WIDTH = 8,
  parameter DEPTH = 16
)(
  input               clk, rst_n,
  input               wr_en, rd_en,
  input  [WIDTH-1:0]  wr_data,
  output [WIDTH-1:0]  rd_data,
  output              full, empty
);
  localparam PTR_W = $clog2(DEPTH);

  reg [WIDTH-1:0]   mem [0:DEPTH-1];
  reg [PTR_W:0]     wr_ptr, rd_ptr;  // extra MSB bit

  assign full    = ((wr_ptr[PTR_W] != rd_ptr[PTR_W]) &&
                    (wr_ptr[PTR_W-1:0] == rd_ptr[PTR_W-1:0]));
  assign empty   = (wr_ptr == rd_ptr);
  assign rd_data = mem[rd_ptr[PTR_W-1:0]];

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      wr_ptr <= 0; rd_ptr <= 0;
    end else begin
      if (wr_en && !full) begin
        mem[wr_ptr[PTR_W-1:0]] <= wr_data;
        wr_ptr <= wr_ptr + 1;
      end
      if (rd_en && !empty)
        rd_ptr <= rd_ptr + 1;
    end
  end
endmodule
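MSB trick: Pointers are (PTR_W+1) bits wide, one bit more than the address. When both the MSB and the lower bits are equal, the FIFO is empty. When the lower bits match but the MSBs differ, the write pointer is exactly DEPTH entries ahead and the FIFO is full. This handles pointer wrap-around without a separate counter, but it requires DEPTH to be a power of two.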
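
The same extra bit also gives the fill level for free, because subtracting the two pointers wraps modulo 2^(PTR_W+1). A small sketch, assuming pointers of the width described above; the module name fifo_level and the level output are hypothetical additions:

```verilog
// Hypothetical helper: fill level from two (PTR_W+1)-bit FIFO pointers.
module fifo_level #(
  parameter DEPTH = 16,                  // assumed to be a power of two
  parameter PTR_W = $clog2(DEPTH)
)(
  input  [PTR_W:0] wr_ptr, rd_ptr,
  output [PTR_W:0] level,                // number of entries in use, 0..DEPTH
  output           full, empty
);
  // The subtraction wraps modulo 2^(PTR_W+1), so the result stays
  // correct after either pointer has rolled over.
  assign level = wr_ptr - rd_ptr;
  assign empty = (level == 0);
  assign full  = (level == DEPTH);
endmodule
```

For DEPTH = 16, wr_ptr = 5'b10010 and rd_ptr = 5'b00010 give level = 16: the lower bits match, the MSBs differ, and the FIFO is full.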

4. Round-Robin Arbiter

A round-robin arbiter grants access to N requestors in rotating order. The last-served index is remembered; the next grant searches forward circularly for the next asserted request.

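module rr_arbiter #(parameter N=4) (
  input              clk, rst_n,
  input      [N-1:0] req,
  output reg [N-1:0] grant
);
  reg [$clog2(N)-1:0] last;   // index of last granted requestor
  reg                 found;  // blocking temporary: a grant was already chosen
  integer i, idx;

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      grant <= 0; last <= 0;
    end else begin
      grant <= 0;
      found = 0;
      for (i=1; i<=N; i=i+1) begin
        idx = (last+i)%N;          // search forward from last+1, wrapping
        if (req[idx] && !found) begin
          grant[idx] <= 1;
          last       <= idx;
          found      = 1;          // visible to later iterations immediately
        end
      end
    end
  end
endmodule

The for loop unrolls at synthesis time; it is not a loop in hardware. The found flag ensures only one bit is set even if multiple requests arrive simultaneously. It must be a blocking temporary: testing grant == 0 inside the loop would read the previous cycle's registered grant, because non-blocking assignments do not take effect until the end of the time step, so the guard would not stop several bits from being set.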

5. Clock Enable Pattern

Never gate the clock with AND logic in RTL. Instead, use the flip-flop's clock enable input (or an if (en) guard before a non-blocking assignment).

// WRONG: clock gating in RTL (can cause glitches, timing violations)
wire gated_clk = clk & enable;  // DO NOT DO THIS
always @(posedge gated_clk) q <= d;

// CORRECT: clock enable — synthesis infers CE pin on the flip-flop
always @(posedge clk)
  if (enable) q <= d;             // q holds when enable=0

// Equivalent with explicit CE:  q <= enable ? d : q;
Never AND the clock in RTL. Clock gating is best left to the implementation tools, which insert integrated clock gating (ICG) cells whose built-in latch keeps the enable from glitching the gated clock. In RTL, use CE; the synthesizer can map it to an ICG cell automatically when clock-gating power optimization is enabled.
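
The same idea applies to logic that has to run at a slower rate: rather than dividing the clock, generate a one-cycle enable strobe and keep every flip-flop on clk. A minimal sketch, with a hypothetical module name (ce_tick), divider parameter, and ports chosen only for illustration:

```verilog
// Hypothetical example: a slow-rate register driven by an enable strobe
// instead of a divided clock.
module ce_tick #(parameter DIV = 4) (   // DIV >= 2 assumed
  input            clk, rst_n,
  input      [7:0] d,
  output reg [7:0] q,
  output reg       tick                 // high for one clk cycle every DIV cycles
);
  reg [$clog2(DIV)-1:0] cnt;

  always @(posedge clk or negedge rst_n)
    if (!rst_n) begin
      cnt  <= 0;
      tick <= 1'b0;
    end else begin
      tick <= (cnt == DIV-1);
      cnt  <= (cnt == DIV-1) ? 0 : cnt + 1'b1;
    end

  // Everything stays on clk; tick only decides when q updates.
  always @(posedge clk)
    if (tick) q <= d;
endmodule
```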

6. 2-FF CDC Synchronizer

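Any single-bit signal crossing a clock domain boundary must pass through a 2-FF synchronizer to tame metastability. For multi-bit signals, use Gray-coded counters or handshake protocols, never a plain 2-FF per bit: each bit can resolve in a different destination cycle, so the receiver may sample a value the source never sent.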

// 2-FF synchronizer. Keep the tool from optimizing or retiming these FFs.
module sync_2ff #(parameter STAGES=2) (
  input  clk_dst, rst_n, d,
  output q
);
  // An attribute must precede the declaration it applies to.
  // KEEP / ASYNC_REG are the Xilinx spellings; other tools use their own.
  (* KEEP = "TRUE", ASYNC_REG = "TRUE" *) reg [STAGES-1:0] chain;

  always @(posedge clk_dst or negedge rst_n)
    if (!rst_n) chain <= 0;
    else        chain <= {chain[STAGES-2:0], d};

  assign q = chain[STAGES-1];
endmodule

// Edge detector on synchronized signal
module edge_detect (input clk, sig_sync, output rise, fall);
  reg sig_d;
  always @(posedge clk) sig_d <= sig_sync;
  assign rise = sig_sync & ~sig_d;
  assign fall = ~sig_sync & sig_d;
endmodule
For FPGA designs, mark synchronizer flip-flops with (* ASYNC_REG = "TRUE" *) (Xilinx) or altera_attribute (Intel) to prevent the tool from retiming or merging them. On Xilinx devices the attribute also asks the placer to keep the flip-flops close together, which leaves the most time for a metastable value to resolve before the second flip-flop samples it.
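
For the multi-bit case, the Gray-code conversion mentioned above can be sketched as follows. The module names bin2gray and gray2bin and the width parameter W are hypothetical; this shows only the encoding, not a complete crossing.

```verilog
// Hypothetical helpers for Gray-coded counter or pointer crossings.
module bin2gray #(parameter W = 4) (
  input  [W-1:0] bin,
  output [W-1:0] gray
);
  // Adjacent binary values map to codes that differ in exactly one bit.
  assign gray = bin ^ (bin >> 1);
endmodule

module gray2bin #(parameter W = 4) (
  input      [W-1:0] gray,
  output reg [W-1:0] bin
);
  integer i;
  always @* begin
    // bin[i] is the XOR of gray[W-1] down to gray[i].
    for (i = 0; i < W; i = i + 1)
      bin[i] = ^(gray >> i);
  end
endmodule
```

Since only one bit changes per increment, a synchronizer on each Gray bit yields either the old or the new count, never a mixture. The Gray value should be registered in the source domain before it is synchronized, so the crossing never sees combinational glitches.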

7. Skid Buffer

A skid buffer allows a handshake interface to pipeline its ready signal. Without it, ready must be combinationally derived from downstream, creating long timing paths. The skid buffer breaks this with a one-element elastic buffer.

module skid_buffer #(parameter W=8) (
  input        clk, rst_n,
  input        s_valid, m_ready,
  input  [W-1:0] s_data,
  output reg   s_ready, m_valid,
  output reg [W-1:0] m_data
);
  reg [W-1:0] buf_data;
  reg         buf_valid;

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      s_ready  <= 1; m_valid  <= 0; buf_valid <= 0;
    end else begin
      if (m_ready || !m_valid) begin       // output register is being read or is empty
        if (buf_valid) begin               // drain buffer first
          m_data    <= buf_data;
          m_valid   <= 1;
          buf_valid <= 0;
          s_ready   <= 1;
        end else if (s_valid) begin       // pass through
          m_data  <= s_data;
          m_valid <= 1;
        end else m_valid <= 0;
      end else if (s_valid && s_ready) begin  // consumer stalled
        buf_data  <= s_data;
        buf_valid <= 1;
        s_ready   <= 0;                // stop accepting
      end
    end
  end
endmodule

8. Pattern Selection Guide

Situation | Pattern to use
Critical path too long for target Fmax | Pipeline: add register stages to cut the path
Producer and consumer at different rates | Valid-ready handshake + skid buffer
Decoupling two blocks in the same clock domain | Synchronous FIFO
Multiple masters sharing one slave | Round-robin arbiter
Conditional register update | Clock enable (if/CE; never gate the clock)
Single-bit signal crossing clock domains | 2-FF synchronizer
Multi-bit bus crossing clock domains | Async FIFO with Gray-coded pointers
Registered ready to break combinational path | Skid buffer