Production RTL is built from a small set of reusable structural patterns: pipelines, handshakes, FIFOs, arbiters, and synchronizers. Understanding these patterns lets you read industry RTL, design at a higher level of abstraction, and avoid the class of bugs that only appears when multiple modules are integrated.
Pipelining inserts flip-flops between logic stages to shorten the critical path. Each stage completes in one clock cycle, so an N-stage computation has a latency of N cycles but still accepts new data and produces a result every cycle.

    // 3-stage pipeline: multiply → accumulate → clamp
    module mac_pipe (
        input         clk, rst_n,   // rst_n is unused: the datapath registers need no reset
        input  [7:0]  a, b,
        output [15:0] result
    );
        // Stage 1: multiply
        reg [15:0] s1_product;
        always @(posedge clk) s1_product <= a * b;

        // Stage 2: accumulate (a constant bias stands in for the accumulator;
        // it is kept below 16'h100 so the clamp in stage 3 is not always active)
        reg [15:0] s2_accum;
        always @(posedge clk) s2_accum <= s1_product + 16'h0010;

        // Stage 3: saturate / clamp to 8-bit
        reg [15:0] s3_out;
        always @(posedge clk)
            s3_out <= (s2_accum > 16'hFF) ? 16'hFF : s2_accum;

        assign result = s3_out;
    endmodule

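A pipeline's N-cycle latency means downstream logic has to know which output cycles carry real results. One common way to track that is a valid flag that is delayed by the same number of stages as the data. The sketch below is an illustration only: the module name `pipe_valid`, the `STAGES` parameter, and the `in_valid`/`out_valid` ports are hypothetical and not part of the pipeline above.

```verilog
// Hypothetical helper: delays a valid flag by STAGES clock cycles so that it
// lines up with data emerging from a STAGES-deep pipeline.
module pipe_valid #(parameter STAGES = 3) (
    input  clk,
    input  rst_n,
    input  in_valid,    // high in the cycle the operands are presented
    output out_valid    // high in the cycle the matching result appears
);
    reg [STAGES-1:0] vld;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            vld <= {STAGES{1'b0}};          // no results in flight after reset
        else
            vld <= (vld << 1) | in_valid;   // newest flag enters at bit 0
    end

    assign out_valid = vld[STAGES-1];       // oldest flag leaves at the MSB
endmodule
```

With `STAGES = 3`, a one-cycle pulse on `in_valid` reappears on `out_valid` three clock edges later, which is when a 3-stage datapath presents the corresponding result.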
The valid-ready (AXI-style) handshake decouples producers from consumers. A transaction happens only when both valid and ready are high at the clock edge.

    // Transfer rule: data moves when valid AND ready are both 1
    always @(posedge clk) begin
        if (valid_in && ready_in)           // handshake on input
            data_buf <= data_in;
    end

    // Critical property: valid must NOT deassert unless a handshake occurred
    // (AXI rule: once valid is raised, hold it until ready arrives).
    // valid_out is assigned in this block only, so it has a single driver.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            valid_out <= 0;
        else if (produce_data)
            valid_out <= 1;
        else if (valid_out && ready_out)    // handshake on output
            valid_out <= 0;                 // deassert AFTER handshake
    end

| Signal | Direction | Meaning |
|---|---|---|
| valid | Producer → Consumer | Producer has valid data ready |
| ready | Consumer → Producer | Consumer can accept data now |
| Transfer | — | valid && ready both high at clock edge |
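
The transfer rule in the table can be packaged as a single registered stage with a valid/ready pair on each side. This is a minimal sketch under assumptions: the module name `hs_stage` and all port names are hypothetical, and `in_ready` is derived combinationally from `out_ready`, so it does not break the ready timing path.

```verilog
// Hypothetical single-register stage with valid/ready on both sides.
module hs_stage #(parameter W = 8) (
    input              clk,
    input              rst_n,
    input              in_valid,
    output             in_ready,
    input      [W-1:0] in_data,
    output reg         out_valid,
    input              out_ready,
    output reg [W-1:0] out_data
);
    // Accept a new word when the register is empty or is drained this cycle.
    assign in_ready = !out_valid || out_ready;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            out_valid <= 1'b0;
        end else if (in_ready) begin
            out_valid <= in_valid;      // load a word, or go empty if none is offered
            if (in_valid)
                out_data <= in_data;
        end
        // Otherwise out_valid is high and out_ready is low:
        // hold out_valid and out_data steady until the handshake completes.
    end
endmodule
```

Once `out_valid` is raised it stays high, with `out_data` unchanged, until a cycle in which `out_ready` is also high, which is the hold requirement described above.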
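A synchronous FIFO uses one clock domain, a RAM array, and pointer arithmetic. Each pointer carries one extra MSB beyond the address bits: when the two pointers are fully equal the FIFO is empty, and when they differ only in that MSB the write pointer has lapped the read pointer and the FIFO is full. The trick assumes DEPTH is a power of two.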

    module sync_fifo #(
        parameter WIDTH = 8,
        parameter DEPTH = 16
    )(
        input              clk, rst_n,
        input              wr_en, rd_en,
        input  [WIDTH-1:0] wr_data,
        output [WIDTH-1:0] rd_data,
        output             full, empty
    );
        localparam PTR_W = $clog2(DEPTH);

        reg [WIDTH-1:0] mem [0:DEPTH-1];
        reg [PTR_W:0]   wr_ptr, rd_ptr;     // extra MSB bit

        assign full    = ((wr_ptr[PTR_W] != rd_ptr[PTR_W]) &&
                          (wr_ptr[PTR_W-1:0] == rd_ptr[PTR_W-1:0]));
        assign empty   = (wr_ptr == rd_ptr);
        assign rd_data = mem[rd_ptr[PTR_W-1:0]];

        always @(posedge clk or negedge rst_n) begin
            if (!rst_n) begin
                wr_ptr <= 0;
                rd_ptr <= 0;
            end else begin
                if (wr_en && !full) begin
                    mem[wr_ptr[PTR_W-1:0]] <= wr_data;
                    wr_ptr <= wr_ptr + 1;
                end
                if (rd_en && !empty)
                    rd_ptr <= rd_ptr + 1;
            end
        end
    endmodule

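The extra pointer bit also gives the fill level for free, because subtracting the pointers modulo 2·DEPTH yields the number of stored words. The sketch below shows that arithmetic in isolation. It assumes DEPTH is a power of two, and the module name `fifo_level` and its ports are hypothetical rather than part of the FIFO above.

```verilog
// Hypothetical helper: derives occupancy and flags from FIFO pointers that
// carry one extra MSB. Assumes DEPTH is a power of two.
module fifo_level #(parameter DEPTH = 16) (
    input  [$clog2(DEPTH):0] wr_ptr,
    input  [$clog2(DEPTH):0] rd_ptr,
    output [$clog2(DEPTH):0] level,   // number of stored words, 0..DEPTH
    output                   full,
    output                   empty
);
    // The subtraction wraps modulo 2*DEPTH, so it stays correct after
    // either pointer rolls over.
    assign level = wr_ptr - rd_ptr;
    assign empty = (level == 0);
    assign full  = (level == DEPTH);
endmodule
```

For DEPTH = 16, pointers `5'b10010` and `5'b00010` differ only in the MSB and give a level of 16, which matches the full condition used in the FIFO above.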
A round-robin arbiter grants access to N requestors in rotating order. The last-served index is remembered; the next grant searches forward circularly for the next asserted request.
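
    module rr_arbiter #(parameter N=4) (
        input              clk, rst_n,
        input      [N-1:0] req,
        output reg [N-1:0] grant
    );
        reg [$clog2(N)-1:0] last;   // index of last granted
        reg                 found;  // set once this cycle's winner is chosen
        integer i, idx;

        always @(posedge clk or negedge rst_n) begin
            if (!rst_n) begin
                grant <= 0;
                last  <= 0;
            end else begin
                grant <= 0;
                found = 0;          // blocking: visible to later loop iterations
                for (i = 1; i <= N; i = i + 1) begin
                    idx = (last + i) % N;
                    if (req[idx] && !found) begin
                        grant[idx] <= 1;
                        last       <= idx;
                        found      = 1;
                    end
                end
            end
        end
    endmodule

The for loop unrolls at synthesis time; it is not a loop in hardware. The found flag is written with a blocking assignment so that each unrolled iteration sees whether an earlier one already won, which ensures at most one grant bit is set even if multiple requests arrive simultaneously. A grant == 0 test cannot serve as this guard: grant is updated with non-blocking assignments, so inside the loop it still reads as the previous cycle's value.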
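The circular search can also be expressed without a loop, by rotating the request vector so that the requester after the last-served one sits at bit 0, picking the lowest set bit, and rotating the result back. This is a different technique from the loop-based arbiter, shown only as a combinational sketch: the module name `rr_rotate` is hypothetical, and the registers for `grant` and `last` are left out.

```verilog
// Hypothetical combinational next-grant logic using rotate + lowest-set-bit.
module rr_rotate #(parameter N = 4) (
    input  [$clog2(N)-1:0] last,    // index of the last granted requester
    input  [N-1:0]         req,
    output [N-1:0]         grant    // one-hot, or all zeros if req is zero
);
    // Rotation amount 1..N: requester last+1 gets the highest priority.
    wire [$clog2(N):0] shift = last + 1'b1;

    // Rotate right: rot[k] holds req[(k + shift) mod N].
    wire [2*N-1:0] rot_full = {req, req} >> shift;
    wire [N-1:0]   rot      = rot_full[N-1:0];

    // Isolate the lowest set bit (two's-complement trick).
    wire [N-1:0] pick = rot & (~rot + 1'b1);

    // Rotate left by the same amount to restore the original bit positions.
    wire [2*N-1:0] back_full = {pick, pick} << shift;
    assign grant = back_full[2*N-1:N];
endmodule
```

For example, with `last = 1` and `req = 4'b1011`, the search order is 2, 3, 0, 1; requester 2 is idle, so the grant is `4'b1000`.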
Never gate the clock with AND logic in RTL. Instead, use the flip-flop's clock enable input (or an if (en) guard before a non-blocking assignment).

    // WRONG: clock gating in RTL (can cause glitches, timing violations)
    wire gated_clk = clk & enable;          // DO NOT DO THIS
    always @(posedge gated_clk) q <= d;

    // CORRECT: clock enable, synthesis infers the CE pin on the flip-flop
    always @(posedge clk)
        if (enable) q <= d;                 // q holds when enable=0
    // Equivalent with explicit CE: q <= enable ? d : q;

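The same idea covers logic that should run at a fraction of the clock rate: rather than dividing the clock, keep every flip-flop on `clk` and pulse an enable. The sketch below is one way to do that, assuming `DIV` is at least 2. The module name `tick_enable` and its ports are hypothetical.

```verilog
// Hypothetical example: a register that updates once every DIV cycles,
// driven by an enable pulse instead of a divided clock. Assumes DIV >= 2.
module tick_enable #(parameter DIV = 4) (
    input            clk,
    input            rst_n,
    input      [7:0] d,
    output reg       tick,  // one clk-cycle pulse every DIV cycles
    output reg [7:0] q      // captures d only in cycles where tick is high
);
    reg [$clog2(DIV)-1:0] cnt;

    // Free-running counter that raises tick for one cycle per wrap.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            cnt  <= 0;
            tick <= 1'b0;
        end else begin
            tick <= (cnt == DIV-1);
            cnt  <= (cnt == DIV-1) ? 0 : cnt + 1;
        end
    end

    // Same clock as everything else; tick acts as the clock enable.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= 8'd0;
        else if (tick)
            q <= d;
    end
endmodule
```

Because `q` is still clocked by `clk`, it stays in the same timing domain as the rest of the design and needs no extra clock constraints.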
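Any single-bit signal crossing a clock domain boundary must pass through a 2-FF synchronizer to tame metastability. For multi-bit signals, use Gray-coded counters or handshake protocols, never a plain 2-FF synchronizer on each bit: the bits can resolve in different cycles, so the receiving domain may briefly see a value that was never sent.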

    // 2-FF synchronizer: synthesize with DONT_TOUCH / no_opt
    module sync_2ff #(parameter STAGES=2) (
        input  clk_dst, rst_n, d,
        output q
    );
        reg [STAGES-1:0] chain;
        // synthesis attribute to prevent retiming of these FFs
        // (* KEEP = "TRUE", ASYNC_REG = "TRUE" *)
        always @(posedge clk_dst or negedge rst_n)
            if (!rst_n) chain <= 0;
            else        chain <= {chain[STAGES-2:0], d};

        assign q = chain[STAGES-1];
    endmodule

    // Edge detector on synchronized signal
    module edge_detect (input clk, sig_sync, output rise, fall);
        reg sig_d;
        always @(posedge clk) sig_d <= sig_sync;

        assign rise =  sig_sync & ~sig_d;
        assign fall = ~sig_sync &  sig_d;
    endmodule

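Tag the synchronizer flip-flops with (* ASYNC_REG = "TRUE" *) (Xilinx) or the equivalent altera_attribute setting (Intel) to prevent the tool from retiming or merging them. On Xilinx devices the attribute also asks the placer to keep the two FFs adjacent, which shortens the path between them and leaves the first FF more time to resolve before the second one samples it.
A skid buffer lets a handshake interface register its ready signal. Without it, ready must be derived combinationally from the downstream ready, which creates long timing paths. The skid buffer breaks this path with a one-element elastic buffer that catches the word already in flight when the consumer stalls.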

    module skid_buffer #(parameter W=8) (
        input              clk, rst_n,
        input              s_valid, m_ready,
        input      [W-1:0] s_data,
        output reg         s_ready, m_valid,
        output reg [W-1:0] m_data
    );
        reg [W-1:0] buf_data;
        reg         buf_valid;

        always @(posedge clk or negedge rst_n) begin
            if (!rst_n) begin
                s_ready   <= 1;
                m_valid   <= 0;
                buf_valid <= 0;
            end else begin
                // Load the output register when it is being drained OR is empty,
                // so m_valid never waits for m_ready to rise first.
                if (m_ready || !m_valid) begin
                    if (buf_valid) begin            // drain buffer first
                        m_data    <= buf_data;
                        m_valid   <= 1;
                        buf_valid <= 0;
                        s_ready   <= 1;
                    end else if (s_valid) begin     // pass through
                        m_data  <= s_data;
                        m_valid <= 1;
                    end else
                        m_valid <= 0;
                end else if (s_valid && s_ready) begin  // consumer stalled
                    buf_data  <= s_data;
                    buf_valid <= 1;
                    s_ready   <= 0;                 // stop accepting
                end
            end
        end
    endmodule

| Situation | Pattern to use |
|---|---|
| Critical path too long for target Fmax | Pipeline — add register stages to cut the path |
| Producer and consumer at different rates | Valid-ready handshake + skid buffer |
| Decoupling two blocks in the same clock domain | Synchronous FIFO |
| Multiple masters sharing one slave | Round-robin arbiter |
| Conditional register update | Clock enable (if/CE — never gate the clock) |
| Single-bit signal crossing clock domains | 2-FF synchronizer |
| Multi-bit bus crossing clock domains | Async FIFO with Gray-coded pointers |
| Registered ready to break combinational path | Skid buffer |
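
The multi-bit row in the table depends on Gray-coded pointers: consecutive Gray values differ in exactly one bit, so a pointer sampled mid-transition is off by at most one count rather than landing on an arbitrary value. A minimal sketch of the two conversions follows. The module name `gray_conv` and its ports are hypothetical, and the pointer registers and per-bit synchronizers that a real async FIFO needs are left out.

```verilog
// Hypothetical helper: binary <-> Gray conversion for CDC pointers.
module gray_conv #(parameter W = 4) (
    input      [W-1:0] bin_in,
    output     [W-1:0] gray_out,  // Gray encoding of bin_in
    input      [W-1:0] gray_in,
    output reg [W-1:0] bin_out    // binary decoding of gray_in
);
    // Binary to Gray: each bit is XORed with its more significant neighbour.
    assign gray_out = bin_in ^ (bin_in >> 1);

    // Gray to binary: bit i is the XOR of Gray bits W-1 down to i.
    integer i;
    always @(*) begin
        for (i = 0; i < W; i = i + 1)
            bin_out[i] = ^(gray_in >> i);
    end
endmodule
```

As a check, binary 7 (`4'b0111`) encodes to `4'b0100` and binary 8 (`4'b1000`) encodes to `4'b1100`: the binary values differ in all four bits, while the Gray codes differ in only one.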