What is an Asynchronous FIFO?

An asynchronous FIFO (First-In First-Out buffer) allows data to be written and read using two completely independent clocks — there is no shared clock signal between the writer and the reader. This makes it the fundamental building block for crossing clock domain boundaries in modern SoC designs.

Common use cases include: PCIe controller ↔ NVMe SSD, CPU ↔ DDR PHY, USB MAC ↔ application logic, Ethernet MAC ↔ system fabric, and any interface between two IPs clocked at different frequencies.

Key problem: You cannot simply connect wires between two clock domains. A flip-flop in domain B sampling a signal from domain A may violate setup/hold timing and enter metastability, an indeterminate output that resolves to 0 or 1 unpredictably. The async FIFO solves this by passing the data through a dual-port RAM and synchronizing only the read and write pointers, which are Gray-coded so that a single bit changes per increment.

| Property | Synchronous FIFO | Asynchronous FIFO |
|---|---|---|
| Clock domains | Single shared clock | Independent WCLK and RCLK |
| Pointer crossing | Not needed | 2-FF synchronizer with Gray code |
| Full/Empty logic | Simple subtraction | Gray code comparison |
| Complexity | Low | Medium (5 modules) |
| Use case | Same-clock buffering | Clock domain crossing (CDC) |
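
The "2-FF synchronizer" in the table is two back-to-back flip-flops clocked by the destination domain. A minimal sketch is shown below. The module name sync_2ff and its port names are illustrative assumptions, not part of the design discussed in this article.

```verilog
// Minimal 2-FF synchronizer sketch (module and port names are assumptions).
// d_async should come directly from a flip-flop in the source domain and
// change at most one bit at a time (for example, a Gray-coded pointer).
module sync_2ff #(
    parameter WIDTH = 1
)(
    input  wire             clk,      // destination clock
    input  wire             rst_n,    // destination-domain reset, active low
    input  wire [WIDTH-1:0] d_async,  // signal launched from the other domain
    output wire [WIDTH-1:0] q_sync    // safe to use in the destination domain
);
    reg [WIDTH-1:0] ff1, ff2;

    always @(posedge clk or negedge rst_n)
        if (!rst_n) begin
            ff1 <= {WIDTH{1'b0}};
            ff2 <= {WIDTH{1'b0}};
        end else begin
            ff1 <= d_async;  // first stage: may go metastable
            ff2 <= ff1;      // second stage: samples ff1 one clock period later
        end

    assign q_sync = ff2;
endmodule
```

The first stage is never used by downstream logic; only the second stage output is, which gives the first stage a full destination clock period to settle.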

Architecture

A properly designed async FIFO (Cliff Cummings style) has five modules:

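1. Write pointer logic (WCLK domain): generates wptr_gray and the FULL flag
2. Read pointer logic (RCLK domain): generates rptr_gray and the EMPTY flag
3. Dual-port RAM: mem[0:DEPTH-1], written in the WCLK domain and read in the RCLK domain
4. 2-FF synchronizer W→R: carries wptr_gray across the clock boundary into the RCLK domain
5. 2-FF synchronizer R→W: carries rptr_gray across the clock boundary into the WCLK domain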

Why Gray Code?

This is the single most important insight in async FIFO design. When you synchronize a multi-bit signal across clock domains with 2 flip-flops, you are betting that the signal is stable when the destination FF samples it.

With a binary counter, transitioning from 3 (011) to 4 (100) changes all three bits simultaneously. If the destination FF samples during this transition, it might see 111, 010, 101 — any combination — not just 3 or 4. This corrupt value can cause a catastrophic FIFO over/underflow.

Gray code rule: Only 1 bit changes between any two consecutive values. Even if that 1 bit is caught mid-transition (metastability), it resolves to either the old pointer or the new one. Reading the old pointer is a conservative error: the read side thinks the FIFO holds slightly less data than it really does, and the write side thinks it has slightly less free space than it really does. Both are safe.
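
| Count | Binary | Bits changed (binary) | Gray code | Bits changed (Gray) |
|---|---|---|---|---|
| 0 | 000 | - | 000 | - |
| 1 | 001 | 1 bit | 001 | 1 bit ✓ |
| 2 | 010 | 2 bits! | 011 | 1 bit ✓ |
| 3 | 011 | 1 bit | 010 | 1 bit ✓ |
| 4 | 100 | 3 bits!!! | 110 | 1 bit ✓ |
| 5 | 101 | 1 bit | 111 | 1 bit ✓ |
| 6 | 110 | 2 bits! | 101 | 1 bit ✓ |
| 7 | 111 | 1 bit | 100 | 1 bit ✓ |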
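
The Gray column can be reproduced, and the one-bit-change rule checked, with a small simulation-only sketch. The module name gray_check and the helper functions bin2gray and gray2bin are assumptions made for illustration; only the expression b ^ (b >> 1) is the conversion used by the FIFO itself.

```verilog
// Self-checking sketch of the Gray-code properties in the table.
// Simulation only; all names here are illustrative.
module gray_check;
    localparam W = 3;

    // Binary to Gray: XOR each bit with its more significant neighbour
    function [W-1:0] bin2gray;
        input [W-1:0] b;
        bin2gray = b ^ (b >> 1);
    endfunction

    // Gray to binary: running XOR from the MSB downwards
    function [W-1:0] gray2bin;
        input [W-1:0] g;
        integer k;
        begin
            gray2bin[W-1] = g[W-1];
            for (k = W-2; k >= 0; k = k - 1)
                gray2bin[k] = gray2bin[k+1] ^ g[k];
        end
    endfunction

    integer i, j, flips, errors;
    reg [W-1:0] b, cur, nxt, diff;

    initial begin
        errors = 0;
        for (i = 0; i < (1 << W); i = i + 1) begin
            b    = i;                    // truncates to W bits
            cur  = bin2gray(b);
            nxt  = bin2gray(b + 1'b1);   // count 7 wraps back to 0
            diff = cur ^ nxt;
            flips = 0;
            for (j = 0; j < W; j = j + 1)
                flips = flips + diff[j];
            $display("count=%0d binary=%b gray=%b", i, b, cur);
            if (flips != 1)          errors = errors + 1;  // one-bit rule
            if (gray2bin(cur) != b)  errors = errors + 1;  // round trip
        end
        if (errors == 0) $display("PASS: every step changes exactly one Gray bit");
        else             $display("FAIL: %0d check(s) failed", errors);
        $finish;
    end
endmodule
```

The loop also covers the wrap from 7 back to 0, which matters for a FIFO pointer because the counter runs continuously.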

Full & Empty Flag Logic

Pointer Width

For a FIFO of depth 2^N, use (N+1)-bit pointers. The lower N bits address the memory, and the extra MSB acts as a wrap-around indicator, allowing full/empty distinction when both pointers point to the same address.
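
To see what the extra MSB buys, the sketch below compares plain binary pointers as if both lived in one clock domain. This is an illustration only (the module name ptr_wrap_demo and the choice N = 3 are assumptions); the real async FIFO performs the equivalent comparison on Gray-coded, synchronized pointers.

```verilog
// Illustration only: binary pointers compared directly, ignoring CDC.
module ptr_wrap_demo;
    localparam N = 3;                  // depth = 2^N = 8
    reg [N:0] wptr, rptr;              // (N+1)-bit pointers

    wire same_addr = (wptr[N-1:0] == rptr[N-1:0]);
    wire empty     = same_addr && (wptr[N] == rptr[N]);  // same wrap count
    wire full      = same_addr && (wptr[N] != rptr[N]);  // one wrap apart

    initial begin
        wptr = 4'b0000; rptr = 4'b0000; #1;   // nothing written: empty
        $display("w=%b r=%b empty=%b full=%b", wptr, rptr, empty, full);
        wptr = 4'b1000; rptr = 4'b0000; #1;   // 8 writes, 0 reads: full
        $display("w=%b r=%b empty=%b full=%b", wptr, rptr, empty, full);
        wptr = 4'b1011; rptr = 4'b0011; #1;   // 11 writes, 3 reads: full
        $display("w=%b r=%b empty=%b full=%b", wptr, rptr, empty, full);
        wptr = 4'b1011; rptr = 4'b1011; #1;   // 11 writes, 11 reads: empty
        $display("w=%b r=%b empty=%b full=%b", wptr, rptr, empty, full);
        $finish;
    end
endmodule
```

Without the extra bit, the second and fourth cases would be indistinguishable from each other, because the address bits alone are equal in both.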

EMPTY Flag (Read Domain)

empty = (rptr_gray == wptr_gray_sync)
// Both pointers at same gray value → no unread data
// wptr_gray_sync = wptr_gray delayed by 2 RCLK cycles (2-FF sync)

FULL Flag (Write Domain)

full = (wptr_gray == { ~rptr_gray_sync[N:N-1], rptr_gray_sync[N-2:0] })
// Top 2 MSBs inverted — detects wrap-around in Gray code space
// rptr_gray_sync = rptr_gray delayed by 2 WCLK cycles (2-FF sync)

The top-2-MSB inversion works because in Gray code, a pointer that has wrapped around once differs from the "same address" unwrapped pointer in exactly the top 2 MSBs. The remaining lower bits are identical in Gray code.
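
The claim can be checked exhaustively for a small pointer width. The sketch below is a simulation-only check under assumed names (full_pattern_check, bin2gray) with N = 3; it compares the Gray-code pattern against the binary definition of full, namely that the write pointer is exactly DEPTH entries ahead of the read pointer.

```verilog
// Exhaustive check of the top-2-MSB rule for N = 3 (names are illustrative).
module full_pattern_check;
    localparam N     = 3;              // depth = 2^N
    localparam PTR_W = N + 1;

    function [PTR_W-1:0] bin2gray;
        input [PTR_W-1:0] b;
        bin2gray = b ^ (b >> 1);
    endfunction

    integer w, r, errors;
    reg [PTR_W-1:0] wb, rb, wg, rg, diff;
    reg expect_full, gray_full;

    initial begin
        errors = 0;
        for (w = 0; w < (1 << PTR_W); w = w + 1)
            for (r = 0; r < (1 << PTR_W); r = r + 1) begin
                wb = w;
                rb = r;
                wg = bin2gray(wb);
                rg = bin2gray(rb);
                // Reference: writer is DEPTH entries ahead, modulo 2^(N+1)
                diff        = wb - rb;
                expect_full = (diff == (1 << N));
                // Rule under test: top two Gray bits inverted, rest equal
                gray_full   = (wg == {~rg[PTR_W-1:PTR_W-2], rg[PTR_W-3:0]});
                if (gray_full !== expect_full) begin
                    errors = errors + 1;
                    $display("mismatch: w=%b r=%b", wb, rb);
                end
            end
        if (errors == 0) $display("PASS: Gray full pattern matches for all pointer pairs");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

The reason it holds: adding DEPTH flips only binary bit N, and since Gray bit N equals binary bit N while Gray bit N-1 equals the XOR of binary bits N and N-1, exactly those two Gray bits flip.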

Verilog RTL

// Async FIFO: synthesizable, Cliff Cummings style
module async_fifo #(
    parameter DATA_W     = 8,
    parameter DEPTH_BITS = 3              // depth = 2^3 = 8; must be >= 2
)(
    input  wire              wclk, wrst_n, wen,
    input  wire [DATA_W-1:0] wdata,
    output wire              full,
    input  wire              rclk, rrst_n, ren,
    output reg  [DATA_W-1:0] rdata,
    output wire              empty
);
    localparam DEPTH = 1 << DEPTH_BITS;
    localparam PTR_W = DEPTH_BITS + 1;

    reg [DATA_W-1:0] mem [0:DEPTH-1];
    reg [PTR_W-1:0]  wptr, rptr;          // binary pointers (address + wrap bit)

    // Gray-coded pointers. These are registered rather than combinational so
    // that the synchronizers never sample glitches from the XOR logic.
    reg  [PTR_W-1:0] wgray, rgray;
    wire [PTR_W-1:0] wptr_nxt = wptr + 1'b1;
    wire [PTR_W-1:0] rptr_nxt = rptr + 1'b1;

    // 2-FF synchronizers
    reg [PTR_W-1:0] wg_r1, wg_r2;         // wgray → read domain
    reg [PTR_W-1:0] rg_w1, rg_w2;         // rgray → write domain

    // Write logic
    always @(posedge wclk or negedge wrst_n)
        if (!wrst_n) begin
            wptr  <= 0;
            wgray <= 0;
        end else if (wen && !full) begin
            mem[wptr[DEPTH_BITS-1:0]] <= wdata;
            wptr  <= wptr_nxt;
            wgray <= wptr_nxt ^ (wptr_nxt >> 1);   // binary → Gray
        end

    // Read logic
    always @(posedge rclk or negedge rrst_n)
        if (!rrst_n) begin
            rptr  <= 0;
            rgray <= 0;
        end else if (ren && !empty) begin
            rdata <= mem[rptr[DEPTH_BITS-1:0]];
            rptr  <= rptr_nxt;
            rgray <= rptr_nxt ^ (rptr_nxt >> 1);   // binary → Gray
        end

    // Sync wgray into read domain
    always @(posedge rclk or negedge rrst_n)
        if (!rrst_n) {wg_r2, wg_r1} <= 0;
        else         {wg_r2, wg_r1} <= {wg_r1, wgray};

    // Sync rgray into write domain
    always @(posedge wclk or negedge wrst_n)
        if (!wrst_n) {rg_w2, rg_w1} <= 0;
        else         {rg_w2, rg_w1} <= {rg_w1, rgray};

    // FULL (write domain): top 2 MSBs inverted in comparison
    assign full  = (wgray == {~rg_w2[PTR_W-1:PTR_W-2], rg_w2[PTR_W-3:0]});

    // EMPTY (read domain)
    assign empty = (rgray == wg_r2);
endmodule
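
A quick way to exercise the module is a self-checking testbench with two unrelated clocks. The sketch below is one possible setup, not a complete verification plan: the testbench name async_fifo_tb, the clock periods, and the word count NUM are all assumptions chosen for illustration. The writer is faster than the reader, so the FULL flag is exercised as well as EMPTY.

```verilog
`timescale 1ns/1ps
// Simulation-only sketch: pushes NUM incrementing bytes and checks they come
// out in order. Clock periods and NUM are arbitrary illustrative choices.
module async_fifo_tb;
    localparam DATA_W = 8;
    localparam NUM    = 32;             // more than the FIFO depth of 8

    reg               wclk = 0, rclk = 0;
    reg               wrst_n = 0, rrst_n = 0;
    reg               wen = 0, ren = 0;
    reg  [DATA_W-1:0] wdata = 0;
    wire [DATA_W-1:0] rdata;
    wire              full, empty;

    async_fifo #(.DATA_W(DATA_W), .DEPTH_BITS(3)) dut (
        .wclk(wclk), .wrst_n(wrst_n), .wen(wen), .wdata(wdata), .full(full),
        .rclk(rclk), .rrst_n(rrst_n), .ren(ren), .rdata(rdata), .empty(empty)
    );

    always #5 wclk = ~wclk;             // 10 ns write clock period
    always #7 rclk = ~rclk;             // 14 ns read clock period

    integer wcount = 0, rcount = 0, errors = 0;
    reg     pop_d = 0;                  // a read was accepted on the previous edge

    // Writer: offer the next value until NUM words have been accepted
    always @(posedge wclk)
        if (!wrst_n) begin
            wen <= 0; wdata <= 0; wcount <= 0;
        end else if (wen && !full) begin        // DUT accepts a word on this edge
            wdata  <= wdata + 1'b1;
            wcount <= wcount + 1;
            if (wcount == NUM-1) wen <= 0;      // that was the last word
        end else
            wen <= (wcount < NUM);

    // Reader: always ready; check rdata one cycle after each accepted read
    always @(posedge rclk)
        if (!rrst_n) begin
            ren <= 0; pop_d <= 0; rcount <= 0;
        end else begin
            ren   <= 1;
            pop_d <= ren && !empty;             // DUT performs a read on this edge
            if (pop_d) begin
                if (rdata !== rcount[DATA_W-1:0]) begin
                    errors <= errors + 1;
                    $display("mismatch: got %0d, expected %0d", rdata, rcount);
                end
                rcount <= rcount + 1;
            end
        end

    initial begin
        #22 wrst_n = 1; rrst_n = 1;             // release both resets
        wait (rcount == NUM);
        #50;
        if (errors == 0 && empty) $display("PASS: %0d words read back in order", NUM);
        else                      $display("FAIL: errors=%0d empty=%b", errors, empty);
        $finish;
    end

    initial #100000 begin                       // safety timeout
        $display("FAIL: timeout");
        $finish;
    end
endmodule
```

An RTL simulation like this only checks functional behaviour. It does not model metastability, so CDC correctness still depends on the structural rules described above (registered Gray pointers and 2-FF synchronizers).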

Interview FAQ

Q: What is an asynchronous FIFO, and when would you use one?
An async FIFO is a data buffer that allows safe data transfer between two clock domains with different (and potentially unrelated) frequencies or phases. You use it whenever two IPs must exchange data but run on separate clocks, e.g., PCIe controller to NVMe, CPU to DDR PHY, or USB MAC to application logic. The FIFO absorbs the timing difference and prevents metastability from propagating into the design.

Q: Why are the pointers Gray-coded?
When you synchronize a multi-bit counter across clock domains with 2 flip-flops, if more than 1 bit changes simultaneously, the destination FF could sample a completely invalid intermediate value (e.g., 011→100 could sample as 111, 010, 101...). Gray code guarantees only 1 bit changes per count step. Even if that 1 bit is caught mid-transition (metastability), the synchronized value is either the old count or the new count, and both are safe, conservative values.

Q: How is the FULL flag generated?
FULL is computed in the write domain. The synchronized rptr_gray (rptr_gray_sync, 2 WCLK cycles old) is compared to wptr_gray with the top 2 MSBs inverted: full = (wptr_gray == {~rptr_gray_sync[N:N-1], rptr_gray_sync[N-2:0]}). This pattern appears in Gray code when the write pointer has wrapped around once and is now exactly DEPTH entries ahead of the read pointer. The top-2-MSB inversion distinguishes "same address, same wrap" (empty/equal) from "same address, one wrap apart" (full).

Q: How does synchronizer latency affect the FULL and EMPTY flags?
The 2-FF synchronizer delays each pointer by about two destination clock cycles, so each domain works with a slightly stale copy of the other domain's pointer. The flags therefore assert on time, because each one is driven by its own domain's up-to-date pointer, but they deassert late. The write domain may still see FULL after the reader has freed entries, so the FIFO can be emptier than the write domain thinks (it won't overwrite data). The read domain may still see EMPTY after the writer has added entries, so the FIFO can be fuller than the read domain thinks (it won't read invalid data). This is safe by design. The cost is that the FIFO may appear to have slightly less capacity than its rated depth.

Q: Is a single flip-flop enough to synchronize a signal across clock domains?
No. A single FF synchronizer is not sufficient for reliable CDC. The single FF output has insufficient time to resolve metastability before it is sampled by downstream logic. A 2-FF synchronizer gives the first FF one full destination clock cycle to settle, reducing the probability of metastability propagation to negligibly low levels (typically <10⁻¹⁴ errors/second for modern processes). High-speed designs at 1 GHz+ sometimes use 3-FF synchronizers for extra margin.
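
The failure rate of a synchronizer is commonly estimated with the standard mean-time-between-failures model below. The symbols are the conventional ones for this model and are not values taken from this article; the actual constants depend on the process and the flip-flop cell.

```latex
\mathrm{MTBF} = \frac{e^{\,t_r/\tau}}{T_W \cdot f_{clk} \cdot f_{data}}
```

Here t_r is the time available for the first flip-flop to resolve (roughly one destination clock period minus the setup time of the next stage), τ is the flip-flop's resolution time constant, T_W is its metastability window, f_clk is the destination clock frequency, and f_data is the rate at which the incoming signal toggles. Because t_r sits in the exponent, each extra synchronizer stage adds about one clock period of resolution time and increases MTBF exponentially, which is why a third stage is sometimes added when the clock period becomes short.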