FIFO Design

FIFO Depth Calculation

How to size a FIFO correctly — burst analysis, synchronous vs. asynchronous formulas, and why a depth that looks sufficient can still cause overflow under worst-case traffic.

1. Why FIFO Depth Matters

A FIFO (First-In First-Out) buffer decouples a producer from a consumer that operate at different rates, or in different clock domains. The single most critical parameter at design time is its depth — the number of words the buffer can hold simultaneously.

Undersize the FIFO and you get overflow: the writer tries to push a word into a full buffer and data is silently dropped. Oversize it and you waste silicon area and power for unused memory. Both outcomes are expensive after tape-out, so depth must be calculated analytically before RTL is frozen.

The core insight: a FIFO only needs to hold the words that accumulate during the worst-case period when the writer is faster than the reader. The fill level peaks at the end of the burst; after that, the reader drains the excess and the fill level returns to zero. That peak fill level is the required depth.

2. Synchronous FIFO — Same Clock Domain

When both the writer and reader share the same clock but the reader cannot accept data every cycle (due to backpressure, protocol overhead, or stall cycles), the depth depends on the duty-cycle mismatch during a burst.

Scenario: Writer writes every cycle, reader reads every N cycles

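If the writer pushes one word per clock for a burst of B words, and the reader can only accept a word every N clocks on average (read rate = 1/N), the fill level grows by (1 − 1/N) words per clock during the burst. Expressed as word rates, 1/N = Frd/Fwr. In the example below, both sides share a 100 MHz clock and the reader accepts 2 words every 5 clocks, so Fwr = 100 Mwords/s and Frd = 40 Mwords/s.

| Parameter | Symbol | Example |
|---|---|---|
| Burst length (words) | B | 32 |
| Write rate (Mwords/s) | Fwr | 100 |
| Read rate (Mwords/s) | Frd | 40 |
| Words written in burst time | B | 32 |
| Words read in same burst time | B × (Frd / Fwr) | 32 × 0.4 = 12.8 |
| Required depth | ceil(B × (1 − Frd/Fwr)) | ceil(32 × 0.6) = 20 |

// Synchronous FIFO depth formula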
// Depth = ceil( B × (1 − Frd / Fwr) )
// where B = burst size, Fwr = write freq, Frd = read freq

// Example: B=32, Fwr=100, Frd=40
depth_required = ceil(32 × (1 − 40/100))
               = ceil(32 × 0.6)
               = ceil(19.2)
               = 20  // round up to next power-of-2 → 32
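
As a quick cross-check, the formula can be evaluated in a few lines of Python. This is a minimal sketch: the function names `fifo_depth` and `next_pow2` are illustrative, not part of any library, and exact fractions are used so that floating-point round-off cannot push the ceil across an integer boundary.

```python
import math
from fractions import Fraction

def fifo_depth(burst, f_wr, f_rd):
    """Words that accumulate during a burst: ceil(B * (1 - Frd/Fwr))."""
    if f_rd >= f_wr:
        return 1  # reader keeps up; a minimal buffer is enough
    excess = burst * (1 - Fraction(f_rd) / Fraction(f_wr))
    return math.ceil(excess)

def next_pow2(n):
    """Smallest power of 2 that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

depth = fifo_depth(32, 100, 40)
print(depth, next_pow2(depth))  # 20 32
```

Only the ratio Frd/Fwr matters, so the two rates can be given in any common unit.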

3. Asynchronous FIFO — Clock Domain Crossing

Async FIFOs cross between two completely unrelated clocks. The same burst-analysis approach applies, but now the write and read time bases are physically different. The calculation asks: "during the time it takes to write B words at Fwr, how many words can be read at Frd?"

The formula is algebraically identical to the synchronous case, but the physical interpretation differs: Fwr and Frd are now asynchronous frequencies that may be completely unrelated (e.g., 83.33 MHz and 27 MHz on a display interface).

CDC margin: Add 2–4 extra words to account for the 2-FF synchronizer latency. The Gray code pointer synchronized into the other domain is always slightly stale. The writer sees a read pointer that is about 2 write-clock cycles old, so it may see the FIFO as a few entries fuller than it really is and assert full early. The reader likewise sees a stale write pointer and may see the FIFO as emptier than it is. A common rule of thumb is a safety margin of ceil(Fwr / Frd) + 2 words.

4. Worst-Case Burst Analysis

The formula above assumes the writer starts immediately and the reader starts reading from cycle 0. Real protocols often have gaps between bursts and acknowledgment latencies. The worst case for FIFO depth is:

  1. Writer sends maximum burst B back-to-back (no gaps, full throughput)
  2. Reader is blocked for maximum stall cycles before it begins reading
  3. Both extremes occur simultaneously (simultaneous worst-case assumption)

On AMBA interfaces, the writer can push data at full clock rate while the reader is stalled by a de-asserted ready signal: HREADY on AHB, PREADY on APB, or WREADY/RREADY on AXI. Always design to the combined worst case, not the average case.
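
The effect of an initial reader stall can be explored with a small cycle-stepping model. The sketch below is illustrative only: `peak_fill` and `stall_cycles` are hypothetical names, the stall is measured in write-clock cycles, and the reader is modelled as an average rate rather than a real handshake.

```python
from fractions import Fraction

def peak_fill(burst, f_wr, f_rd, stall_cycles=0):
    """Step through write-clock cycles and return the highest fill level seen.

    The writer pushes one word per cycle for `burst` cycles. The reader is
    blocked for `stall_cycles` cycles, then pops Frd/Fwr words per cycle on
    average (modelled with a fractional credit accumulator). Assumes f_rd > 0.
    """
    rate = Fraction(f_rd) / Fraction(f_wr)
    fill = peak = written = cycle = 0
    credit = Fraction(0)
    while written < burst or fill > 0:
        if written < burst:          # writer side: one word per cycle
            fill += 1
            written += 1
        peak = max(peak, fill)
        if cycle >= stall_cycles:    # reader side: active after the stall
            credit += rate
            if credit >= 1 and fill > 0:
                credit -= 1
                fill -= 1
        cycle += 1
    return peak

print(peak_fill(32, 100, 40))                   # 20, matches the formula
print(peak_fill(32, 100, 40, stall_cycles=10))  # 24, the stall costs 4 more words
```

With no stall the model reproduces the formula result. A 10-cycle stall removes four reads from the burst window, so the peak rises from 20 to 24.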

Practical rule: Calculate the formula depth, add 2 words for CDC margin, then round up to the next power of 2. Add one more power-of-2 step if the interface protocol has unpredictable stall insertion. A depth that is 2× oversized costs only a few percent more area on a modern process but eliminates a class of hard-to-reproduce overflow bugs.
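
The practical rule can be written as a single helper. This is a sketch under the assumptions stated in the rule itself (a flat 2-word CDC margin and an optional extra doubling); `practical_depth` and its parameters are illustrative names.

```python
import math
from fractions import Fraction

def practical_depth(burst, f_wr, f_rd, cdc_margin=2, extra_step=False):
    """Formula depth plus CDC margin, rounded up to a power of 2.

    extra_step doubles the result for protocols with unpredictable stalls.
    """
    accumulation = math.ceil(burst * (1 - Fraction(f_rd) / Fraction(f_wr)))
    needed = max(accumulation, 0) + cdc_margin
    depth = 1 << (needed - 1).bit_length()   # next power of 2 >= needed
    return depth * 2 if extra_step else depth

print(practical_depth(32, 100, 40))                   # 32  (20 + 2 = 22 -> 32)
print(practical_depth(32, 100, 40, extra_step=True))  # 64
```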

5. Power-of-2 Requirement

FIFO depth must be a power of 2 whenever pointers cross clock domains using the standard binary-reflected Gray code. This is not an optional convention; it is a correctness requirement.

Why Gray code needs power-of-2 depth

A Gray code sequence only changes one bit per count. For a standard binary counter of width N, the Gray sequence cycling through 0 → 2^N − 1 → 0 has exactly one bit change at every transition, including the wrap from (2^N − 1) back to 0.

If the depth is not a power of 2 — say 12 — the pointer counter would need to reset at count 12 instead of 16. That non-power-of-2 modulo operation breaks the single-bit-change property at the wrap boundary, causing multi-bit transitions that invalidate the 2-FF synchronizer.
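
The wrap-boundary problem is easy to check numerically. The sketch below (with illustrative function names) converts a wrapping counter to binary-reflected Gray code and reports the largest number of bits that change between consecutive values.

```python
def gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def max_bits_changed(depth):
    """Largest number of bits that flip between consecutive Gray values
    when a counter counts 0 .. depth-1 and then wraps back to 0."""
    return max(
        bin(gray(i) ^ gray((i + 1) % depth)).count("1")
        for i in range(depth)
    )

print(max_bits_changed(16))  # 1: every step, including the wrap, flips one bit
print(max_bits_changed(12))  # 3: the 11 -> 0 wrap flips three bits
```

For the count-to-12 case, the pointer goes from Gray 1110 (binary 11) to 0000, so three bits are in flight at once when the synchronizer samples them.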

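| Calculated Depth | Round to | Pointer Width (N+1) | Address Bits (N) |
|---|---|---|---|
| 1–2 | 2 | 2 | 1 |
| 3–4 | 4 | 3 | 2 |
| 5–8 | 8 | 4 | 3 |
| 9–16 | 16 | 5 | 4 |
| 17–32 | 32 | 6 | 5 |
| 33–64 | 64 | 7 | 6 |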
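
The rows of the table follow directly from the calculated depth. A minimal sketch, using the hypothetical helper name `pointer_sizing`:

```python
def pointer_sizing(calculated_depth):
    """Return (rounded depth, pointer width N+1, address bits N)."""
    address_bits = max((calculated_depth - 1).bit_length(), 1)
    return 1 << address_bits, address_bits + 1, address_bits

for depth in (2, 3, 8, 9, 20, 33):
    print(depth, pointer_sizing(depth))
# 2 (2, 2, 1)
# 3 (4, 3, 2)
# 8 (8, 4, 3)
# 9 (16, 5, 4)
# 20 (32, 6, 5)
# 33 (64, 7, 6)
```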

6. Full and Empty Flag Generation

Getting the full and empty flags wrong is one of the most common FIFO bugs. The N+1 bit pointer scheme (one extra MSB beyond the address width) eliminates the ambiguity between a full and an empty buffer:

// Full/empty using N+1 bit Gray code pointers
// wptr_gray: write pointer in Gray code (N+1 bits)
// rptr_gray: read pointer in Gray code (N+1 bits)
// wptr_sync: write ptr synchronized into read domain
// rptr_sync: read ptr synchronized into write domain

assign empty = (rptr_gray == wptr_sync);

// Full: top two bits differ, remaining bits match
// (the write pointer is exactly one full lap ahead of the read pointer)
assign full  = (wptr_gray[N]   != rptr_sync[N])
            & (wptr_gray[N-1] != rptr_sync[N-1])
            & (wptr_gray[N-2:0] == rptr_sync[N-2:0]);
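
The flag equations can be sanity-checked against the actual occupancy with a small model. The sketch below is a software check only: it ignores synchronizer latency, compares the two pointers directly, and uses illustrative names (`flags`, `bit`).

```python
def gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def bit(value, index):
    """Extract a single bit."""
    return (value >> index) & 1

def flags(wptr_gray, rptr_gray, n):
    """Empty/full from two (n+1)-bit Gray pointers, mirroring the assigns above."""
    low_mask = (1 << (n - 1)) - 1                      # bits N-2 .. 0
    empty = wptr_gray == rptr_gray
    full = (
        bit(wptr_gray, n) != bit(rptr_gray, n)               # bit N differs
        and bit(wptr_gray, n - 1) != bit(rptr_gray, n - 1)   # bit N-1 differs
        and (wptr_gray & low_mask) == (rptr_gray & low_mask) # rest match
    )
    return empty, full

n = 3                      # address bits: depth 8, 4-bit pointers
size = 1 << (n + 1)
for w in range(size):
    for r in range(size):
        empty, full = flags(gray(w), gray(r), n)
        occupancy = (w - r) % size
        assert empty == (occupancy == 0)
        assert full == (occupancy == (1 << n))
print("flag logic matches occupancy for all pointer pairs")
```

The exhaustive loop confirms that `full` is raised exactly when the write pointer is one full lap (2^N entries) ahead, and `empty` exactly when the pointers are equal.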

7. Parameterized RTL Implementation

module async_fifo_depth_calc #(
  parameter DATA_W = 8,
  parameter DEPTH  = 16   // must be a power of 2, and at least 4
) (
  input  logic              wr_clk, wr_rst_n, wr_en,
  input  logic [DATA_W-1:0] wr_data,
  input  logic              rd_clk, rd_rst_n, rd_en,
  output logic [DATA_W-1:0] rd_data,
  output logic              full, empty
);
  localparam AW = $clog2(DEPTH);       // address bits (N)
  localparam PW = AW + 1;              // pointer width (N+1)

  logic [DATA_W-1:0] mem [0:DEPTH-1];
  logic [PW-1:0] wbin, wgray, rbin, rgray;
  logic [PW-1:0] wbin_nxt, wgray_nxt, rbin_nxt, rgray_nxt;
  logic [PW-1:0] wgray_s1, wgray_sync;  // write ptr synced to rd_clk
  logic [PW-1:0] rgray_s1, rgray_sync;  // read ptr synced to wr_clk
  logic          do_wr, do_rd;

  // Write domain. The Gray pointer is registered so that only flop outputs
  // (never glitch-prone combinational XOR outputs) feed the synchronizer.
  assign do_wr     = wr_en && !full;
  assign wbin_nxt  = wbin + PW'(do_wr);
  assign wgray_nxt = wbin_nxt ^ (wbin_nxt >> 1);

  always_ff @(posedge wr_clk or negedge wr_rst_n)
    if (!wr_rst_n) begin
      wbin  <= '0;
      wgray <= '0;
    end else begin
      wbin  <= wbin_nxt;
      wgray <= wgray_nxt;
    end

  // Memory write: kept out of the reset block so the array can map to a RAM
  always_ff @(posedge wr_clk)
    if (do_wr) mem[wbin[AW-1:0]] <= wr_data;

  // Read domain (Gray pointer registered for the same reason)
  assign do_rd     = rd_en && !empty;
  assign rbin_nxt  = rbin + PW'(do_rd);
  assign rgray_nxt = rbin_nxt ^ (rbin_nxt >> 1);

  always_ff @(posedge rd_clk or negedge rd_rst_n)
    if (!rd_rst_n) begin
      rbin  <= '0;
      rgray <= '0;
    end else begin
      rbin  <= rbin_nxt;
      rgray <= rgray_nxt;
    end
  assign rd_data = mem[rbin[AW-1:0]];

  // 2-FF synchronizers
  always_ff @(posedge rd_clk or negedge rd_rst_n)
    if (!rd_rst_n) {wgray_s1, wgray_sync} <= '0;
    else          {wgray_s1, wgray_sync} <= {wgray, wgray_s1};

  always_ff @(posedge wr_clk or negedge wr_rst_n)
    if (!wr_rst_n) {rgray_s1, rgray_sync} <= '0;
    else          {rgray_s1, rgray_sync} <= {rgray, rgray_s1};

  // Full/empty flags
  // empty: pointers identical; full: top two Gray bits inverted, rest equal
  assign empty = (rgray == wgray_sync);
  assign full  = (wgray == {~rgray_sync[PW-1:PW-2], rgray_sync[PW-3:0]});

endmodule

8. Worked Examples

Example A: AXI crossbar, 200 → 100 MHz

A 200 MHz AXI master bursts 64 beats into a 100 MHz slave domain. Read rate is half the write rate, so the accumulation is 64 × (1 − 100/200) = 32 words. Add the CDC margin before rounding: 32 + 2 = 34, and the next power of 2 gives depth = 64.

Example B: PCIe TLP to DDR, 250 → 200 MHz

Write: 250 MHz, Read: 200 MHz, Burst: 512 beats (max TLP). Accumulation = 512 × (1 − 200/250) = 512 × 0.2 = 102.4 → ceil = 103. With CDC margin: 103 + 3 = 106 → round up to 128. One further power-of-2 step to cover unpredictable stalls gives 256. TLP FIFOs are commonly 256–512 deep for this reason.

Example C: UART receiver, 16× oversampling

A UART at 115200 baud with a 16× oversampling clock (1.8432 MHz) writes one byte every ~160 clocks (10 bit times per 8N1 frame × 16). If the CPU reads via polling with up to 1 ms latency at 48 MHz: bytes arriving in 1 ms = 115200 / 10 / 1000 = 11.52 ≈ 12 bytes. FIFO depth: 16 (includes margin). Classic embedded UART FIFOs are 16 bytes deep, in line with this estimate.
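
The UART case sizes the FIFO by service latency rather than by burst ratio, so the arithmetic is slightly different. A minimal sketch, assuming 8N1 framing (10 bit times per byte) and using the illustrative helper name `bytes_during_latency`:

```python
def bytes_during_latency(baud, latency_us, bits_per_frame=10):
    """Worst-case bytes received while the reader is not servicing the FIFO.

    bits_per_frame=10 assumes 8N1 framing (start + 8 data + stop).
    Integer ceiling division keeps the result exact.
    """
    bit_times = baud * latency_us                 # bit times scaled by 1e6
    per_byte = bits_per_frame * 1_000_000
    return -(-bit_times // per_byte)

pending = bytes_during_latency(115_200, 1_000)    # 1 ms polling latency
depth = 1 << (pending - 1).bit_length()           # next power of 2
print(pending, depth)  # 12 16
```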

Frequently Asked Questions

How do I calculate the depth of an async FIFO?
For an async FIFO between write clock Fwr and read clock Frd with burst size B: Depth = ceil(B × (1 − Frd/Fwr)). This is the number of words that accumulate during the burst because the reader is slower. Add 2–4 extra words for the CDC synchronizer pipeline delay, then round up to the next power of 2.

Why must the depth be a power of 2?
Two reasons. First, power-of-2 depth lets pointer arithmetic use a simple bitwise AND mask instead of division. Second, and more critically, Gray code pointer generation for CDC only produces a valid single-bit-change sequence when the counter wraps at a power-of-2 boundary. Non-power-of-2 sizes break Gray code integrity and allow multi-bit transitions that defeat the 2-FF synchronizer.

How do N+1 bit pointers distinguish full from empty?
A FIFO of depth 2^N uses N+1 bit pointers, one extra MSB beyond the memory address width. The FIFO is empty when all N+1 bits of the two pointers match. It is full when the write pointer has lapped the read pointer: with binary pointers only the MSB differs and the lower N bits match, while with Gray code pointers the two MSBs differ and the remaining N − 1 bits match. The extra bit distinguishes the two states unambiguously without any subtraction or fill counter.

What happens when a FIFO overflows?
Behavior depends on the design. Most RTL implementations simply assert a full flag and block further writes, so the word is silently dropped. Some designs assert an overflow sticky bit that must be explicitly cleared. Either way, downstream logic sees a gap in the data stream. In protocol-facing FIFOs (AXI, PCIe) a single dropped word corrupts the entire transaction.

Can the reader be faster than the writer?
Yes. When Frd ≥ Fwr, the depth formula gives a zero or negative result because the reader drains at least as fast as the writer fills. In this case the minimum depth, before any CDC margin, is 1 (or 2 for registered output). The FIFO never overflows, though it will frequently run empty between bursts. The depth sizing problem only matters when Frd < Fwr during a burst.