Clock Domain Crossing

Async FIFO Design

The canonical solution for crossing multi-bit data between asynchronous clock domains — Gray code pointers, 2-FF synchronizers, and safe full/empty generation.

1. Why Async FIFOs Exist

Modern SoCs integrate subsystems running at different frequencies: a CPU at 1 GHz, a DSP at 600 MHz, a memory controller at 400 MHz, and a USB PHY at 480 MHz. When data must flow between two of these domains, it cannot simply be registered and passed across. The sender's clock has no fixed phase relationship to the receiver's, so the receiving flip-flop's setup and hold times cannot be guaranteed.

Passing a single-bit signal between domains is solved with a 2-FF synchronizer, which tolerates metastability by giving the first flip-flop roughly one destination clock period to resolve before the second flip-flop samples it. Passing a multi-bit bus is more dangerous: each bit may resolve independently, so the receiver can capture a value that is neither the old nor the new one. For example, binary 7 transitioning to 8 (0111→1000) could be sampled mid-flip as 1010.

The asynchronous FIFO solves this by never passing a multi-bit address directly across the domain boundary. Instead, it passes Gray-coded pointers — one bit changes at a time — making 2-FF synchronization safe.
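A small SystemVerilog sketch makes the difference concrete. The package name `cdc_demo_pkg` and its helper functions are illustrative, not from any library. Assuming every changing bit can independently resolve to its old or its new value, a transition that changes k bits can be captured as any of 2^k values. Binary 7→8 on a 5-bit pointer changes four bits (16 possible samples, only two of them valid), while the Gray-coded version changes one bit (2 possible samples, both valid).

```systemverilog
// Illustrative helpers (hypothetical names) for comparing binary and Gray
// transitions as seen by an asynchronous sampler.
package cdc_demo_pkg;
  localparam int W = 5;  // pointer width, e.g. ASIZE + 1 for a 16-deep FIFO

  // Binary-to-Gray: each Gray bit is the XOR of two adjacent binary bits.
  function automatic logic [W-1:0] bin2gray(input logic [W-1:0] b);
    return b ^ (b >> 1);
  endfunction

  // Number of bit positions that differ between two values.
  function automatic int bits_changed(input logic [W-1:0] a, input logic [W-1:0] b);
    logic [W-1:0] d;
    int n;
    d = a ^ b;
    n = 0;
    for (int i = 0; i < W; i++) n += d[i];
    return n;
  endfunction

  // If each changing bit may land on its old or new value independently,
  // a sampler can capture any of 2^k values for k changing bits.
  function automatic int num_possible_samples(input logic [W-1:0] old_v,
                                              input logic [W-1:0] new_v);
    return 1 << bits_changed(old_v, new_v);
  endfunction
endpackage
```

The same property holds at the wrap point: Gray 31→0 (10000→00000) also changes a single bit, which is why the pointer can count around the ring indefinitely.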

2. Architecture Overview

An async FIFO consists of five key components, each residing in a specific clock domain:

| Component | Domain | Function |
|---|---|---|
| Dual-port SRAM | Shared (write port on wclk, read port in the rclk domain) | Storage: written by wclk, read by rclk |
| Write pointer (wptr) | Write (wclk) | Binary counter addresses the next write location; its Gray-encoded copy crosses to the read domain |
| Read pointer (rptr) | Read (rclk) | Binary counter addresses the next read location; its Gray-encoded copy crosses to the write domain |
| W2R synchronizer | Read (rclk) | 2-FF synchronizer: wptr_gray → rclk domain for empty generation |
| R2W synchronizer | Write (wclk) | 2-FF synchronizer: rptr_gray → wclk domain for full generation |

3. The Pointer Width Trick

A FIFO with 2^N locations uses (N+1)-bit pointers. The extra MSB resolves the ambiguity between full and empty: in both conditions the low N address bits of the two pointers are equal. With the extra bit:

empty = (wptr_gray_sync == rptr_gray)   // read domain: all N+1 bits match
full  = (wptr_gray[N:N-1] == ~rptr_gray_sync[N:N-1]) && (wptr_gray[N-2:0] == rptr_gray_sync[N-2:0])   // write domain: top 2 Gray bits inverted, rest match

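The full condition works in Gray code because adding 2^N (one full lap) to an (N+1)-bit binary count toggles only its MSB. Gray bit N equals binary bit N, and Gray bit N-1 is binary bits N and N-1 XORed together, so that single binary toggle inverts both of the top two Gray bits and leaves the remaining N-1 bits unchanged. The full test therefore matches exactly when the write pointer is 2^N ahead of the read pointer.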

Why N+1 bits? With only N address bits, pointer 0 after reset and pointer 0 after a full wrap-around look identical, so equal pointers could mean either empty or full. The extra bit disambiguates: if the write pointer has wrapped one more time than the read pointer, the MSBs differ, indicating full rather than empty.
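The two comparisons can be captured as SystemVerilog functions and checked exhaustively. This is a sketch assuming a 16-deep FIFO (N = 4, so 5-bit pointers); the package and function names are illustrative. Sweeping every read-pointer value confirms that equal pointers give empty, a write pointer exactly 2^N ahead gives full, and any other distance gives neither.

```systemverilog
// Hypothetical flag-comparison helpers for an async FIFO with 2^N entries
// and (N+1)-bit Gray pointers.
package fifo_flags_pkg;
  localparam int N = 4;  // address bits: depth = 2^N = 16

  function automatic logic [N:0] bin2gray(input logic [N:0] b);
    return b ^ (b >> 1);
  endfunction

  // Read domain: empty when the synchronized write pointer equals the read pointer.
  function automatic logic is_empty(input logic [N:0] wptr_gray_sync,
                                    input logic [N:0] rptr_gray);
    return wptr_gray_sync == rptr_gray;
  endfunction

  // Write domain: full when the top two Gray bits are inverted and the rest
  // match, i.e. the write pointer is exactly one lap (2^N entries) ahead.
  function automatic logic is_full(input logic [N:0] wptr_gray,
                                   input logic [N:0] rptr_gray_sync);
    return wptr_gray == {~rptr_gray_sync[N:N-1], rptr_gray_sync[N-2:0]};
  endfunction
endpackage
```

Because Gray encoding is a one-to-one mapping, the full comparison matches exactly one write-pointer value for each read pointer: the one a full lap ahead.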

4. Gray Code Synchronization Flow

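The synchronization flow for both pointers is identical in structure: binary counter (source clock) → binary-to-Gray conversion → registered Gray pointer → 2-FF synchronizer (destination clock) → comparison against the destination domain's own Gray pointer to generate empty or full.

Never use a synchronized Gray pointer as a memory address, and prefer comparing it directly in Gray code rather than decoding it to binary first. The synchronized value is one or two destination-clock cycles stale, so it may lag the true pointer. That lag is safe for flag generation because it only makes empty and full conservative, but decoding it to binary and using it as an SRAM address would cause data corruption: the memory must always be addressed by the local domain's own binary counter.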

5. RTL Implementation

// Top-level async FIFO (16-deep, 8-bit data, 5-bit pointers)
module async_fifo #(
  parameter DSIZE = 8,
  parameter ASIZE = 4   // depth = 2^ASIZE = 16
)(
  input  logic           wclk, wrst_n, winc,
  input  logic [DSIZE-1:0] wdata,
  output logic           wfull,
  input  logic           rclk, rrst_n, rinc,
  output logic [DSIZE-1:0] rdata,
  output logic           rempty
);
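  logic [ASIZE-1:0] waddr, raddr;                   // binary SRAM addresses
  logic [ASIZE:0]   wptr, rptr, wq2_rptr, rq2_wptr; // Gray pointers and synchronized copies

  // Read-domain pointer → synced into write domain (for full)
  sync_r2w #(ASIZE) i_sync_r2w (.wq2_rptr, .rptr, .wclk, .wrst_n);

  // Write-domain pointer → synced into read domain (for empty)
  sync_w2r #(ASIZE) i_sync_w2r (.rq2_wptr, .wptr, .rclk, .rrst_n);

  // Dual-port memory, addressed by the local binary pointers (never Gray bits)
  fifomem  #(DSIZE,ASIZE) i_mem (
    .rdata, .wdata,
    .waddr, .raddr,
    .wclken(winc & ~wfull), .wclk   // write only on an accepted push
  );

  // Write pointer logic + full flag
  wptr_full  #(ASIZE) i_wptr (.wfull, .waddr, .wptr, .wq2_rptr, .winc, .wclk, .wrst_n);

  // Read pointer logic + empty flag
  rptr_empty #(ASIZE) i_rptr (.rempty, .raddr, .rptr, .rq2_wptr, .rinc, .rclk, .rrst_n);
endmodule

Write Pointer Module

module wptr_full #(parameter ASIZE = 4) (
  output logic             wfull,
  output logic [ASIZE-1:0] waddr,     // binary SRAM write address
  output logic [ASIZE:0]   wptr,      // Gray pointer, sent to the read domain
  input  logic [ASIZE:0]   wq2_rptr,  // read pointer synchronized into wclk
  input  logic             winc, wclk, wrst_n
);
  logic [ASIZE:0] wbin, wbinnext, wgraynext;
  logic           wfull_val;

  // Binary and Gray pointers are both registered; only the Gray copy crosses
  always_ff @(posedge wclk or negedge wrst_n)
    if (!wrst_n) {wbin, wptr} <= '0;
    else         {wbin, wptr} <= {wbinnext, wgraynext};

  assign waddr     = wbin[ASIZE-1:0];
  assign wbinnext  = wbin + (winc & ~wfull);
  assign wgraynext = wbinnext ^ (wbinnext >> 1);

  // Full: top 2 Gray bits inverted, rest match (write pointer one lap ahead)
  assign wfull_val = (wgraynext == {~wq2_rptr[ASIZE:ASIZE-1], wq2_rptr[ASIZE-2:0]});

  // Register the flag: a combinational wfull would loop back through wbinnext
  always_ff @(posedge wclk or negedge wrst_n)
    if (!wrst_n) wfull <= 1'b0;
    else         wfull <= wfull_val;
endmodule

2-FF Synchronizer

module sync_w2r #(parameter ASIZE = 4) (
  output logic [ASIZE:0] rq2_wptr,  // write pointer after two rclk stages
  input  logic [ASIZE:0] wptr,      // Gray write pointer from the write domain
  input  logic           rclk, rrst_n
);
  logic [ASIZE:0] rq1_wptr;         // first stage: may go metastable

  always_ff @(posedge rclk or negedge rrst_n)
    if (!rrst_n) {rq2_wptr, rq1_wptr} <= '0;
    else         {rq2_wptr, rq1_wptr} <= {rq1_wptr, wptr};

  // Synthesis: mark both stages as synchronizer flops, e.g. (* ASYNC_REG = "TRUE" *)
  // in Vivado, or the equivalent synchronizer cell/attribute in an ASIC flow
endmodule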
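The top level also instantiates `sync_r2w`, `rptr_empty`, and `fifomem`, which are not listed. The sketches below fill them in under stated assumptions. The port lists are assumed: each pointer module exports a binary SRAM address (`waddr` or `raddr`) alongside its Gray pointer, and a synchronized pointer is named for the domain it lands in, so `wq2_rptr` is the read pointer after two `wclk` stages. The memory is modeled as a register array rather than an SRAM macro.

```systemverilog
// Sketch: read pointer → write domain synchronizer (mirror of sync_w2r).
module sync_r2w #(parameter ASIZE = 4) (
  output logic [ASIZE:0] wq2_rptr,  // read pointer after two wclk stages
  input  logic [ASIZE:0] rptr,      // Gray read pointer from the read domain
  input  logic           wclk, wrst_n
);
  logic [ASIZE:0] wq1_rptr;         // first stage: may go metastable
  always_ff @(posedge wclk or negedge wrst_n)
    if (!wrst_n) {wq2_rptr, wq1_rptr} <= '0;
    else         {wq2_rptr, wq1_rptr} <= {wq1_rptr, rptr};
endmodule

// Sketch: read pointer + empty flag, same structure as the write side.
module rptr_empty #(parameter ASIZE = 4) (
  output logic             rempty,
  output logic [ASIZE-1:0] raddr,     // binary SRAM read address
  output logic [ASIZE:0]   rptr,      // Gray pointer, sent to the write domain
  input  logic [ASIZE:0]   rq2_wptr,  // write pointer synchronized into rclk
  input  logic             rinc, rclk, rrst_n
);
  logic [ASIZE:0] rbin, rbinnext, rgraynext;

  always_ff @(posedge rclk or negedge rrst_n)
    if (!rrst_n) {rbin, rptr} <= '0;
    else         {rbin, rptr} <= {rbinnext, rgraynext};

  assign raddr     = rbin[ASIZE-1:0];
  assign rbinnext  = rbin + (rinc & ~rempty);  // never advance past empty
  assign rgraynext = rbinnext ^ (rbinnext >> 1);

  // Empty when the next read pointer equals the synchronized write pointer.
  // Registered, and reset to 1 because a freshly reset FIFO is empty.
  always_ff @(posedge rclk or negedge rrst_n)
    if (!rrst_n) rempty <= 1'b1;
    else         rempty <= (rgraynext == rq2_wptr);
endmodule

// Sketch: dual-port memory as a register array (synchronous write,
// combinational read). Real designs typically map this to an SRAM macro or FPGA RAM.
module fifomem #(parameter DSIZE = 8, parameter ASIZE = 4) (
  output logic [DSIZE-1:0] rdata,
  input  logic [DSIZE-1:0] wdata,
  input  logic [ASIZE-1:0] waddr, raddr,
  input  logic             wclken, wclk
);
  logic [DSIZE-1:0] mem [0:(1<<ASIZE)-1];

  assign rdata = mem[raddr];        // read port, addressed from the rclk domain

  always_ff @(posedge wclk)         // write port, clocked by wclk
    if (wclken) mem[waddr] <= wdata;
endmodule
```

Registering `rempty` keeps the flag from feeding back combinationally into the read-pointer increment, and the `rinc & ~rempty` gating is what makes a read request on an empty FIFO harmless.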

6. Full/Empty Flag Conservatism

Both flags are intentionally conservative due to synchronization latency:

Empty Flag (Read Domain)

May assert empty even when data is available, because the synchronized write pointer is 1–2 rclk cycles stale. The FIFO never underflows (it never reads a slot that has not been written), but may under-utilize throughput slightly.

Full Flag (Write Domain)

May assert full even when space is available — the read pointer is 1–2 wclk cycles stale. The FIFO never overflows, but may have slightly less usable depth in practice.

Safety Guarantee

Conservative flags guarantee data integrity. A non-conservative flag that reported "not full" when the FIFO was actually full would let a write overwrite unread data, causing silent data corruption, the worst class of hardware bug.

7. Synthesis and CDC Sign-Off


Frequently Asked Questions

An async FIFO is a dual-port buffer where data is written by one clock domain and read by a different, asynchronous clock domain. It solves the multi-bit CDC problem by never passing a multi-bit address directly across the clock boundary. Instead, it passes Gray-coded pointers through 2-FF synchronizers — since Gray code changes only one bit per count, the synchronized pointer is always a valid count value.
Binary pointers change multiple bits simultaneously on each count. When sampled by a 2-FF synchronizer across a clock boundary, the sampling flip-flop may see a mid-transition state — for example, binary 7→8 as 0111→1000 could be sampled as 1010. Gray code guarantees only one bit changes per count, so a metastable sampling produces at most an off-by-one pointer. The FIFO reads slightly stale count information, but never a garbage address.
Empty is detected in the read domain: when the synchronized write pointer (rq2_wptr) equals the read pointer (rptr) in Gray code, with all N+1 bits matching. Full is detected in the write domain: when the write pointer's top two Gray bits are the inverse of the synchronized read pointer's top two Gray bits, and the remaining bits match. The extra (N+1)th bit disambiguates full (the write pointer is exactly one lap ahead of the read pointer) from empty (the pointers are truly equal).
The synchronized pointer in each domain is 1–2 clock cycles stale due to the 2-FF synchronizer latency. The empty flag may assert "empty" even when new data has just been written but not yet synchronized. The full flag may assert "full" even when data has just been read but not yet synchronized. This conservatism is intentional — it prevents overflows and underflows at the cost of a small reduction in peak throughput, which is always the safe tradeoff.

Async FIFO in Production ASIC Design — Engineering Realities

Bus Protocol Integration: AXI-to-Async-FIFO Patterns

In most real SoCs, the async FIFO sits between two bus-attached subsystems rather than between raw producer/consumer logic. A common pattern is an AXI4 write path (AW and W channels in the write clock domain) feeding an async FIFO, with the FIFO output read by a DDR memory controller in its own clock domain. This integration requires careful handling of the AXI handshake at the FIFO boundary. The VALID/READY rule is one-sided: once the master asserts VALID, it must hold VALID and its payload stable until the handshake completes, while the slave may hold READY low for as long as it needs. Driving WREADY from the inverted FULL flag is therefore protocol-legal, but it stalls a burst mid-transfer, which can tie up a shared write data channel in the interconnect. Designs that want bursts to flow without stalls either size the FIFO so that FULL never asserts during a maximum-length AXI burst, or have the AXI slave interface in the write clock domain buffer the entire burst before asserting AWREADY, so the AXI handshakes never wait on the FIFO's FULL flag. Neither approach is free: the first increases FIFO depth, the second adds burst-buffer area, and the trade-off is specified explicitly in the subsystem's design document.

FIFO Depth Calculation Including Synchronizer Latency

The minimum FIFO depth calculation must account for the 2-FF synchronizer's pipeline delay. When the write domain writes new words, the read domain cannot see them until the write pointer has propagated through two flip-flop stages in the read clock domain, a latency of 2 read clock cycles. Similarly, when the read domain drains words, the write domain cannot see the freed space until 2 write clock cycles have elapsed. For a burst of B words written at F_wr and drained at F_rd (one word per read cycle), a minimum-depth estimate that includes this latency is D_min = ceil(B × (1 − F_rd / F_wr)) + 2, where the +2 accounts for the synchronizer latency on the full-flag path. For high clock-ratio cases (F_wr >> F_rd), the +2 is negligible. But for near-equal clock frequencies, where the burst accumulation term is small, the synchronizer latency can represent a significant fraction of the total depth. Omitting it produces a FIFO that overflows precisely when the clock ratio is closest to 1:1, a failure that is nearly impossible to reproduce in RTL simulation because most simulation environments use exact frequency ratios rather than the realistic spread that occurs in silicon.
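A worked instance of the estimate, using illustrative numbers that are assumptions rather than values from any specific design: a 64-word burst, a 100 MHz write clock, and a read side that drains one word per cycle at either 80 MHz or 99 MHz.

```latex
% Ratio far from 1:1 (F_wr = 100 MHz, F_rd = 80 MHz): burst term dominates
D_{\min} = \left\lceil 64\left(1 - \tfrac{80}{100}\right)\right\rceil + 2
         = \lceil 12.8 \rceil + 2 = 15
         \;\Rightarrow\; \text{round up to } 2^4 = 16 \text{ entries}

% Near-equal clocks (F_wr = 100 MHz, F_rd = 99 MHz): synchronizer term dominates
D_{\min} = \left\lceil 64\left(1 - \tfrac{99}{100}\right)\right\rceil + 2
         = \lceil 0.64 \rceil + 2 = 3
         \;\Rightarrow\; \text{round up to } 2^2 = 4 \text{ entries}
```

In the second case the synchronizer term supplies two of the three required words, which is exactly the near-1:1 regime where omitting it causes overflow. The power-of-two rounding follows from the 2^N-entry Gray pointer scheme used throughout.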

Physical Design Constraints for Async FIFOs

The 2-FF synchronizer cells in an async FIFO have special physical design requirements. First, they must be placed close together: the launch flip-flop and the first capture flip-flop should be within 5–10 microns to minimize routing delay and bit-to-bit skew, and the two synchronizer stages should sit next to each other so that nearly the whole clock period is available for metastability resolution (the crossing has no meaningful setup relationship between the domains, but the synchronizer does have a minimum resolution time requirement). Second, the synchronizer flip-flops should use high-drive-strength, low-metastability-window cells from the standard cell library; many foundry libraries offer dedicated "sync" cells with improved MTBF characteristics. Third, the CDC timing constraint must be applied: in place of the normal setup check, the path from the write-domain Gray pointer flip-flops to the first read-domain synchronizer stage should carry a datapath-only max-delay exception (for example, set_max_delay -datapath_only in Vivado XDC; ASIC SDC flows express the same intent with their own max-delay options), while design rules such as max transition are still checked. Bounding the delay, typically to about one clock period, keeps the skew between pointer bits below one increment, so the receiver still sees at most one changing Gray bit at a time; a plain false path would drop that guarantee. If the exception is omitted, the STA tool flags the synchronizer path as a timing violation, which it is not structurally, but the tool cannot distinguish it from a true CDC violation without the designer's explicit annotation.