What is a pipeline in Verilog RTL design?

A pipeline inserts registers between combinational logic stages to cut the critical path, allowing a higher clock frequency. Each stage processes one data item per clock cycle and the throughput is one result per cycle after an initial latency equal to the number of stages.

What is the valid-ready handshake in Verilog?

The valid-ready handshake (AXI-style) uses two signals: valid (producer has data) and ready (consumer can accept). A transfer occurs only when both are high at the same clock edge. This allows either side to stall without losing data.

How do you build a synchronous FIFO in Verilog?

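A synchronous FIFO uses a single clock, a dual-port RAM (or register array), a write pointer, a read pointer, and full/empty flags. Each pointer carries one extra MSB beyond the address bits. The FIFO is empty when write_ptr == read_ptr (all bits equal) and full when the address bits match but the extra MSBs differ. Without the extra bit, the usual alternative is to declare full when write_ptr+1 == read_ptr, which sacrifices one storage slot.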

What is a round-robin arbiter in Verilog?

A round-robin arbiter grants access to a shared resource among N requestors in a rotating order. The last granted index is stored; the next grant goes to the next requesting input in circular order. This prevents starvation of any requestor.

What is clock enable and why use it instead of gating the clock?

Clock enable (CE) passes an enable signal to the flip-flop's CE pin rather than AND-ing the clock signal. Clock gating can cause glitches on the clock net and setup/hold violations. Using CE is safe, synthesis-friendly, and is the standard method for conditionally updating registers.

What is a 2-FF synchronizer for clock domain crossing?

A 2-FF synchronizer passes a signal through two flip-flops in the destination clock domain before use. The first FF may metastabilize, but the second FF samples it after one full destination clock period, giving it time to resolve. This dramatically reduces the probability of metastability propagation.

What is the difference between a Mealy and Moore output in RTL?

A Moore output depends only on the current state, so it changes only after a clock edge. When it is taken directly from a register it is glitch-free, but it reacts to an input one cycle later. A Mealy output is a combinational function of state and inputs, so it can respond within the same cycle, but it may glitch and it lengthens the timing path from input to output. Register Mealy outputs when they drive other FSMs or cross module boundaries.

Tutorial 14 · Verilog Series

Verilog RTL Design Patterns

Production RTL is built from a small set of reusable structural patterns — pipelines, handshakes, FIFOs, arbiters, synchronizers. Understanding these patterns lets you read industry netlists, design at higher abstraction, and avoid the class of bugs that only appear in real-world multi-module designs.

pipeline · valid-ready · sync FIFO · round-robin arbiter · clock enable · 2-FF synchronizer · skid buffer

1. Pipeline Register Stages

Pipelining inserts flip-flops between logic stages to reduce the critical path. Each stage completes in one clock cycle; the whole computation takes N cycles but new data enters every cycle.

// 3-stage pipeline: multiply → accumulate → clamp
module mac_pipe (
  input         clk, rst_n,
  input  [7:0]  a, b,
  output [15:0] result
);
  // Stage 1: multiply
  reg [15:0] s1_product;
  always @(posedge clk)
    s1_product <= a * b;

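  // Stage 2: add a constant offset (stand-in for an accumulate step).
  // The offset stays below 16'h100 so the stage-3 clamp is not always active.
  reg [15:0] s2_accum;
  always @(posedge clk)
    s2_accum <= s1_product + 16'h0010;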

  // Stage 3: saturate / clamp to 8-bit
  reg [15:0] s3_out;
  always @(posedge clk)
    s3_out <= (s2_accum > 16'hFF) ? 16'hFF : s2_accum;

  assign result = s3_out;
endmodule

Pipeline validity: The first valid output appears after N cycles of latency. When stalling a pipeline, all stages must hold their data simultaneously. A common mistake is stalling only the last stage while earlier stages keep advancing — corrupting data in flight.
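The stall rule above can be sketched as a pipeline in which every stage shares one enable and a valid bit travels alongside the data. This is a minimal illustration, not part of the original example: the module name pipe_stall and the ports stall, in_valid and out_valid are assumptions made for this sketch.

```verilog
// Sketch: 2-stage pipeline with valid bits and a shared stall.
// Module and port names are illustrative, not from the tutorial.
module pipe_stall (
  input         clk, rst_n,
  input         stall,        // 1 = freeze every stage this cycle
  input         in_valid,
  input  [7:0]  a, b,
  output        out_valid,
  output [15:0] out_data
);
  reg        v1, v2;          // valid bit travelling with each stage
  reg [15:0] s1, s2;

  // Valid bits are reset so nothing is flagged valid right after reset.
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      v1 <= 1'b0;
      v2 <= 1'b0;
    end else if (!stall) begin
      v1 <= in_valid;
      v2 <= v1;
    end
  end

  // Data registers use the same enable, so all stages hold together.
  always @(posedge clk) begin
    if (!stall) begin
      s1 <= a * b;            // stage 1: multiply
      s2 <= s1 + 16'h0010;    // stage 2: add offset
    end
  end

  assign out_valid = v2;
  assign out_data  = s2;
endmodule
```

While stall is high the module ignores its inputs, so whatever drives a, b and in_valid has to hold them for the same cycles.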

2. Valid-Ready Handshake

The valid-ready (AXI-style) handshake decouples producers from consumers. A transaction happens only when both valid and ready are high at the clock edge.

// Transfer rule: data moves when valid AND ready are both 1
always @(posedge clk) begin
  if (valid_in && ready_in)    // handshake on input: capture the word
    data_buf <= data_in;
end

// Critical property: valid must NOT deassert unless handshake occurred
// (AXI rule: once valid is raised, hold until ready arrives)
// valid_out is driven from this one block only. produce_data should pulse
// only when the output slot is free (!valid_out || ready_out); otherwise
// the pending word would be overwritten.
always @(posedge clk or negedge rst_n) begin
  if (!rst_n) valid_out <= 0;
  else if (produce_data) valid_out <= 1;
  else if (valid_out && ready_out) valid_out <= 0;  // deassert AFTER handshake
end

Signal	Direction	Meaning
`valid`	Producer → Consumer	Producer is driving valid data
`ready`	Consumer → Producer	Consumer can accept data now
Transfer	—	`valid && ready` both high at clock edge
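To make the transfer rule concrete, here is a minimal sketch of a producer that obeys it: valid and data are held unchanged until the consumer raises ready. The module name counter_source and the send input are hypothetical, introduced only for this illustration.

```verilog
// Sketch of a valid-ready source that sends an incrementing count.
// counter_source and send are hypothetical names for illustration.
module counter_source #(parameter W = 8) (
  input              clk, rst_n,
  input              send,     // request to offer a value
  input              ready,    // from consumer
  output reg         valid,
  output reg [W-1:0] data
);
  wire xfer = valid && ready;  // a transfer completes on this clock edge

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      valid <= 1'b0;
      data  <= {W{1'b0}};
    end else if (xfer) begin
      data  <= data + 1'b1;    // advance only after the word was taken
      valid <= send;           // offer the next word at once if requested
    end else if (!valid) begin
      valid <= send;           // idle: raise valid when asked
    end
    // valid && !ready: neither branch runs, so valid and data are held
  end
endmodule
```

Because valid is a register that never looks at ready combinationally, the producer cannot create a combinational loop with a consumer whose ready depends on valid.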

3. Synchronous FIFO

A synchronous FIFO uses one clock domain, a RAM array, and pointer arithmetic. The extra MSB trick distinguishes full from empty when the pointers are equal.

module sync_fifo #(
  parameter WIDTH = 8,
  parameter DEPTH = 16
)(
  input               clk, rst_n,
  input               wr_en, rd_en,
  input  [WIDTH-1:0]  wr_data,
  output [WIDTH-1:0]  rd_data,
  output              full, empty
);
  localparam PTR_W = $clog2(DEPTH);  // DEPTH must be a power of 2, at least 2

  reg [WIDTH-1:0]   mem [0:DEPTH-1];
  reg [PTR_W:0]     wr_ptr, rd_ptr;  // extra MSB bit

  assign full    = ((wr_ptr[PTR_W] != rd_ptr[PTR_W]) &&
                    (wr_ptr[PTR_W-1:0] == rd_ptr[PTR_W-1:0]));
  assign empty   = (wr_ptr == rd_ptr);
  assign rd_data = mem[rd_ptr[PTR_W-1:0]];

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      wr_ptr <= 0; rd_ptr <= 0;
    end else begin
      if (wr_en && !full) begin
        mem[wr_ptr[PTR_W-1:0]] <= wr_data;
        wr_ptr <= wr_ptr + 1;
      end
      if (rd_en && !empty)
        rd_ptr <= rd_ptr + 1;
    end
  end
endmodule

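MSB trick: Pointers are (PTR_W+1) bits wide, one more bit than the RAM address. When the MSBs and the lower bits are all equal, the FIFO is empty. When the lower bits match but the MSBs differ, the write pointer has wrapped once more than the read pointer and the FIFO is full. This handles pointer wrap-around without a separate counter, but it relies on DEPTH being a power of two.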
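The same pointer arithmetic also yields the fill level, which can be handy for almost-full thresholds. The sketch below is an illustration under the same power-of-two assumption; the module name fifo_level and the level output are not part of the FIFO above.

```verilog
// Sketch: fill level from (PTR_W+1)-bit pointers; names are illustrative.
module fifo_level #(parameter PTR_W = 4) (
  input  [PTR_W:0] wr_ptr, rd_ptr,
  output [PTR_W:0] level,      // 0 .. 2**PTR_W entries
  output           full, empty
);
  // Subtraction modulo 2**(PTR_W+1) gives the entry count even after
  // either pointer has wrapped.
  assign level = wr_ptr - rd_ptr;
  assign empty = (level == 0);
  // Assuming the FIFO never overflows or underflows, the MSB of level
  // is set only when level == 2**PTR_W, i.e. the FIFO is full.
  assign full  = level[PTR_W];
endmodule
```

For example, with PTR_W = 4, wr_ptr = 18 and rd_ptr = 2 give a level of 16 (full), while wr_ptr = 1 and rd_ptr = 30 give a level of 3 because the write pointer has already wrapped.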

4. Round-Robin Arbiter

A round-robin arbiter grants access to N requestors in rotating order. The last-served index is remembered; the next grant searches forward circularly for the next asserted request.

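module rr_arbiter #(parameter N=4) (
  input              clk, rst_n,
  input      [N-1:0] req,
  output reg [N-1:0] grant
);
  reg [$clog2(N)-1:0] last;   // index of last granted
  reg                 found;  // blocking temporary: set once a winner is picked
  integer i, idx;

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      grant <= 0; last <= 0;
    end else begin
      grant <= 0;
      found = 0;
      for (i=1; i<=N; i=i+1) begin
        idx = (last+i)%N;     // search forward from last, wrapping around
        if (req[idx] && !found) begin
          grant[idx] <= 1;
          last       <= idx;
          found      = 1;
        end
      end
    end
  end
endmodule

The for loop unrolls at synthesis time; it is not a loop in hardware. The found flag is a blocking temporary, so it updates immediately inside the loop and ensures only one bit is set even if multiple requests arrive simultaneously. Testing grant == 0 instead would not work: grant is written with non-blocking assignments, so the loop would still see the previous cycle's value. The loop increment is written i=i+1 because i++ is SystemVerilog, not Verilog.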

5. Clock Enable Pattern

Never gate the clock with AND logic in RTL. Instead, use the flip-flop's clock enable input (or an if (en) guard before a non-blocking assignment).

// WRONG: clock gating in RTL (can cause glitches, timing violations)
wire gated_clk = clk & enable;  // DO NOT DO THIS
always @(posedge gated_clk) q <= d;

// CORRECT: clock enable — synthesis infers CE pin on the flip-flop
always @(posedge clk)
  if (enable) q <= d;             // q holds when enable=0

// Equivalent with explicit CE:  q <= enable ? d : q;

Never AND the clock in RTL. Clock gating is best left to the synthesis and implementation tools, which insert integrated clock gating (ICG) cells whose built-in latch keeps a changing enable from glitching the gated clock. In RTL, use CE: ASIC synthesis tools can map enable-guarded registers to ICG cells when clock-gating power optimization is enabled.

6. 2-FF CDC Synchronizer

Any single-bit signal crossing a clock domain boundary must pass through a 2-FF synchronizer to tame metastability. For multi-bit signals, use Gray-coded counters or handshake protocols — never a plain 2-FF.

// 2-FF synchronizer: the tool must not optimize or retime these FFs
module sync_2ff #(parameter STAGES=2) (
  input  clk_dst, rst_n, d,
  output q
);
  // Xilinx-style attributes on the declaration itself;
  // other tools use their own equivalents and ignore unknown attributes
  (* KEEP = "TRUE", ASYNC_REG = "TRUE" *) reg [STAGES-1:0] chain;

  always @(posedge clk_dst or negedge rst_n)
    if (!rst_n) chain <= 0;
    else        chain <= {chain[STAGES-2:0], d};

  assign q = chain[STAGES-1];
endmodule

// Edge detector on synchronized signal
module edge_detect (input clk, sig_sync, output rise, fall);
  reg sig_d;
  always @(posedge clk) sig_d <= sig_sync;
  assign rise = sig_sync & ~sig_d;
  assign fall = ~sig_sync & sig_d;
endmodule

For FPGA designs, mark synchronizer flip-flops with (* ASYNC_REG = "TRUE" *) (Xilinx) or the equivalent altera_attribute setting (Intel) so the tool does not retime or merge them. The Xilinx tools also place ASYNC_REG flip-flops close together, which keeps the routing delay between them short and leaves most of the clock period for a metastable first stage to settle.
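For the multi-bit case mentioned at the start of this section, the Gray-coded approach rests on two small conversions. The sketch below shows one common way to write them; the module names bin2gray and gray2bin are illustrative. It assumes the source value only ever steps by one, and that the Gray value is registered in the source domain before each bit goes through its own 2-FF synchronizer.

```verilog
// Sketch: binary <-> Gray conversion for a counter crossing clock domains.
// Module names are illustrative.
module bin2gray #(parameter W = 4) (
  input  [W-1:0] bin,
  output [W-1:0] gray
);
  // Adjacent binary values map to Gray codes that differ in one bit.
  assign gray = bin ^ (bin >> 1);
endmodule

module gray2bin #(parameter W = 4) (
  input      [W-1:0] gray,
  output reg [W-1:0] bin
);
  integer i;
  always @* begin
    bin[W-1] = gray[W-1];                // MSB is the same in both codes
    for (i = W-2; i >= 0; i = i - 1)
      bin[i] = bin[i+1] ^ gray[i];       // XOR down from the MSB
  end
endmodule
```

Because only one bit changes per increment, a destination flip-flop that samples mid-transition sees either the old or the new count, never an unrelated value.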

7. Skid Buffer

A skid buffer allows a handshake interface to pipeline its ready signal. Without it, ready must be combinationally derived from downstream, creating long timing paths. The skid buffer breaks this with a one-element elastic buffer.

module skid_buffer #(parameter W=8) (
  input        clk, rst_n,
  input        s_valid, m_ready,
  input  [W-1:0] s_data,
  output reg   s_ready, m_valid,
  output reg [W-1:0] m_data
);
  reg [W-1:0] buf_data;
  reg         buf_valid;

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      s_ready  <= 1; m_valid  <= 0; buf_valid <= 0;
    end else begin
      if (m_ready || !m_valid) begin       // output register empty or being consumed
        if (buf_valid) begin               // drain buffer first
          m_data    <= buf_data;
          m_valid   <= 1;
          buf_valid <= 0;
          s_ready   <= 1;
        end else if (s_valid) begin       // pass through
          m_data  <= s_data;
          m_valid <= 1;
        end else m_valid <= 0;
      end else if (s_valid && s_ready) begin  // consumer stalled
        buf_data  <= s_data;
        buf_valid <= 1;
        s_ready   <= 0;                // stop accepting
      end
    end
  end
endmodule

8. Pattern Selection Guide

Situation	Pattern to use
Critical path too long for target Fmax	Pipeline — add register stages to cut the path
Producer and consumer at different rates	Valid-ready handshake + skid buffer
Decoupling two blocks in the same clock domain	Synchronous FIFO
Multiple masters sharing one slave	Round-robin arbiter
Conditional register update	Clock enable (if/CE — never gate the clock)
Single-bit signal crossing clock domains	2-FF synchronizer
Multi-bit bus crossing clock domains	Async FIFO with Gray-coded pointers
Registered ready to break combinational path	Skid buffer
