Synchronous FIFO Design
Pointer Logic · Full/Empty · Verilog
The fundamental data buffer behind every bus protocol, memory controller, and streaming interface — learn to build one from scratch with correct full and empty detection.
1. What is a FIFO?
A FIFO (First-In First-Out) buffer is a queue: the first word written is the first word read out. It decouples a producer that writes data from a consumer that reads it, absorbing mismatches in rate or timing.
| Application | Why a FIFO? |
|---|---|
| AXI / APB bus bridges | Decouple fast master from slow peripheral |
| UART TX buffer | CPU writes bytes; serializer drains them at baud rate |
| DDR write buffer | Burst writes absorb latency of memory controller |
| Video line buffer | Pixel pipeline: one line written, next line read |
| CDC (dual-clock) | Async FIFO passes data between clock domains safely |
2. Synchronous FIFO Architecture
Three components: a register array (the storage), a write pointer (wr_ptr), and a read pointer (rd_ptr). On each clock:
| Signal | Direction | Description |
|---|---|---|
| clk | Input | Single clock for all operations |
| rst_n | Input | Active-low synchronous reset — zeros both pointers |
| wr_en | Input | Write enable — write wr_data when asserted and not full |
| wr_data[W-1:0] | Input | Data to write (W = WIDTH parameter) |
| rd_en | Input | Read enable — advance rd_ptr when asserted and not empty |
| rd_data[W-1:0] | Output | Data at read pointer (registered or combinational) |
| full | Output | Asserted when no more writes can be accepted |
| empty | Output | Asserted when no data is available to read |
| count[$clog2(DEPTH):0] | Output | Number of valid entries currently in the FIFO |
!full and every read with !empty. Violating this causes pointer corruption — data is lost or corrupted silently.3. Pointer Logic & Full/Empty Detection
Both pointers start at zero. Each write increments wr_ptr; each read increments rd_ptr. Both wrap modulo DEPTH. The count is simply wr_ptr − rd_ptr.
Naive approach — and the problem
With N-bit pointers that wrap at DEPTH (a power of 2):
// Naive — BROKEN when DEPTH is not a power of 2, or ambiguous
assign empty = (wr_ptr == rd_ptr);
assign full = (wr_ptr == rd_ptr); // same condition! ambiguous
When both pointers are equal the FIFO could be completely empty or completely full — we've gone all the way around. A simple == comparison cannot tell these cases apart.
One common fix — separate count register
always @(posedge clk) begin
if (!rst_n) count <= 0;
else if (wr && !rd) count <= count + 1;
else if (rd && !wr) count <= count - 1;
end
assign full = (count == DEPTH);
assign empty = (count == 0);
Works but requires an extra adder. The MSB trick below gives the same result using only pointer comparison — no counter logic at all.
4. The MSB Pointer Trick
Full and empty rules
| Condition | Expression | Meaning |
|---|---|---|
| Empty | wr_ptr == rd_ptr | All bits equal — same lap, same position |
| Full | wr_ptr[MSB] != rd_ptr[MSB] | MSBs differ (write lapped read once) but address bits are equal |
| Count | wr_ptr - rd_ptr | Subtraction works correctly across the wrap due to two's complement |
Example: DEPTH=4, pointer width=3 bits (2 address bits + 1 MSB). Pointers 3'b100 and 3'b000 — MSBs differ (1 vs 0), address bits equal (00 vs 00) → FULL.
Walking example — DEPTH=4
| Action | wr_ptr (3-bit) | rd_ptr (3-bit) | Count | Status |
|---|---|---|---|---|
| Reset | 000 | 000 | 0 | EMPTY |
| Write A | 001 | 000 | 1 | — |
| Write B | 010 | 000 | 2 | — |
| Write C | 011 | 000 | 3 | — |
| Write D | 100 | 000 | 4 | FULL — MSBs 1≠0, addr 00=00 |
| Read A | 100 | 001 | 3 | — |
| Read B,C,D | 100 | 100 | 0 | EMPTY — all bits equal |
5. Parameterized Verilog Implementation
module sync_fifo #(
parameter WIDTH = 8,
parameter DEPTH = 16 // must be a power of 2
)(
input clk,
input rst_n,
input wr_en,
input [WIDTH-1:0] wr_data,
input rd_en,
output reg [WIDTH-1:0] rd_data,
output full,
output empty,
output [$clog2(DEPTH):0] count
);
// ── Storage ──────────────────────────────────────────────────
reg [WIDTH-1:0] mem [0:DEPTH-1];
// ── Pointers: clog2(DEPTH)+1 bits (MSB = lap bit) ─────────
reg [$clog2(DEPTH):0] wr_ptr, rd_ptr;
// ── Write ─────────────────────────────────────────────────────
always @(posedge clk) begin
if (!rst_n)
wr_ptr <= 0;
else if (wr_en && !full) begin
mem[wr_ptr[$clog2(DEPTH)-1:0]] <= wr_data;
wr_ptr <= wr_ptr + 1;
end
end
// ── Read ──────────────────────────────────────────────────────
always @(posedge clk) begin
if (!rst_n)
rd_ptr <= 0;
else if (rd_en && !empty) begin
rd_data <= mem[rd_ptr[$clog2(DEPTH)-1:0]];
rd_ptr <= rd_ptr + 1;
end
end
// ── Status flags ──────────────────────────────────────────────
localparam ADDR_W = $clog2(DEPTH);
assign empty = (wr_ptr == rd_ptr);
assign full = (wr_ptr[ADDR_W] != rd_ptr[ADDR_W]) &&
(wr_ptr[ADDR_W-1:0] == rd_ptr[ADDR_W-1:0]);
assign count = wr_ptr - rd_ptr;
endmodule
Registered vs combinational read data
The implementation above registers rd_data — it appears one cycle after rd_en. For fall-through / lookahead behaviour (data valid the same cycle rd_en is asserted):
// Combinational (fall-through) read — replace the rd always block:
assign rd_data = mem[rd_ptr[$clog2(DEPTH)-1:0]];
always @(posedge clk) begin
if (!rst_n) rd_ptr <= 0;
else if (rd_en && !empty) rd_ptr <= rd_ptr + 1;
end
Trade-off: combinational read creates a longer timing path from mem through rd_ptr mux to the output. Registered read adds one cycle latency but is easier to meet timing.
6. Self-Checking Testbench
module tb_sync_fifo;
parameter WIDTH = 8;
parameter DEPTH = 8;
reg clk, rst_n, wr_en, rd_en;
reg [WIDTH-1:0] wr_data;
wire [WIDTH-1:0] rd_data;
wire full, empty;
wire [$clog2(DEPTH):0] count;
sync_fifo #(.WIDTH(WIDTH), .DEPTH(DEPTH)) dut (.*);
initial clk = 0;
always #5 clk = ~clk;
integer i; reg [WIDTH-1:0] exp_q [0:DEPTH-1]; integer head, tail, qcnt;
task automatic fifo_write(input [WIDTH-1:0] d);
@(posedge clk); #1;
if (full) begin $display("ERROR: write to full FIFO"); $finish; end
wr_en = 1; wr_data = d;
exp_q[tail % DEPTH] = d; tail++; qcnt++;
@(posedge clk); #1; wr_en = 0;
endtask
task automatic fifo_read(input [WIDTH-1:0] expected);
@(posedge clk); #1;
if (empty) begin $display("ERROR: read from empty FIFO"); $finish; end
rd_en = 1;
@(posedge clk); #1; rd_en = 0;
if (rd_data !== expected)
$display("FAIL: got %0h, expected %0h", rd_data, expected);
else
$display("PASS: read %0h", rd_data);
endtask
initial begin
$dumpfile("wave.vcd"); $dumpvars(0, tb_sync_fifo);
{clk,rst_n,wr_en,rd_en,wr_data} = 0;
repeat(2) @(posedge clk);
rst_n = 1;
// Fill to full
for (i=0; i<DEPTH; i++) fifo_write(i * 8'h11);
$display("full=%b (expect 1)", full);
// Drain all — check order
for (i=0; i<DEPTH; i++) fifo_read(i * 8'h11);
$display("empty=%b (expect 1)", empty);
// Simultaneous read+write
fifo_write(8'hAB); fifo_write(8'hCD);
@(posedge clk); #1;
wr_en=1; rd_en=1; wr_data=8'hEF;
@(posedge clk); #1; wr_en=0; rd_en=0;
fifo_read(8'hCD); fifo_read(8'hEF);
$display("All tests done."); $finish;
end
endmodule
Simulate with Icarus Verilog
iverilog -o fifo_sim sync_fifo.v tb_sync_fifo.v
vvp fifo_sim
gtkwave wave.vcd
7. Timing & Waveform Behaviour
| Cycle | wr_en | wr_data | rd_en | rd_data | count | Flags |
|---|---|---|---|---|---|---|
| 0 | 0 | — | 0 | — | 0 | empty |
| 1 | 1 | A | 0 | — | 1 | — |
| 2 | 1 | B | 0 | — | 2 | — |
| 3 | 0 | — | 1 | A | 1 | — |
| 4 | 1 | C | 1 | B | 1 | simultaneous read+write |
| 5 | 0 | — | 1 | C | 0 | empty after this cycle |
Note: rd_data reflects the registered output — it is valid on the cycle after rd_en. Row 3 shows data A appearing the cycle after the read at cycle 3 (using registered read path).
rd_data is one cycle late. If your consumer needs fall-through behaviour, switch to the combinational read variant shown in Section 5.8. FIFO Depth Calculation
The minimum FIFO depth needed to prevent overflow during a burst:
| Scenario | Fill | Drain | Burst | Min Depth |
|---|---|---|---|---|
| USB FS → slow SPI | 12 Mb/s | 1 Mb/s | 64 B | 59 bytes |
| DDR burst to APB | 3200 MT/s | 200 MT/s | 16 | 15 entries |
| UART TX (CPU writes) | CPU speed | baud rate | 256 B | ~256 bytes |
Always round up to the next power of 2 to use the MSB trick. Add 20–50% margin for real designs.
9. Variants & Extensions
| Variant | Key change | Use case |
|---|---|---|
| Asynchronous (dual-clock) FIFO | Gray-code pointers + 2-FF sync per bit | Crossing clock domains (see CDC page) |
| Show-ahead / Fall-through FIFO | Combinational rd_data | Zero-latency consumer interfaces |
| FWFT FIFO (First Word Fall-Through) | Pre-read first word after write | AXI-Stream source |
| Almost-full / Almost-empty | Extra threshold comparators on count | Flow control with lookahead |
| Programmable depth | Run-time DEPTH register | DMA channel rings |
| ECC-protected FIFO | SECDED on each word in mem | Safety-critical / radiation hardened |
| Skid buffer | Depth=2, valid-ready handshake wrapper | AXI-Stream pipeline stage (see RTL patterns) |