1. Why Async FIFOs Exist
Modern SoCs integrate subsystems running at different frequencies: a CPU at 1 GHz, a DSP at 600 MHz, a memory controller at 400 MHz, and a USB PHY at 480 MHz. When data must flow between two of these domains, it cannot simply be registered and passed along, because the receiving flip-flop's setup and hold times cannot be guaranteed when the sender's clock is asynchronous.
Passing a single-bit signal between domains is solved with a 2-FF synchronizer, which tolerates metastability by giving the first flip-flop roughly one clock period to resolve before the second flip-flop samples it. Passing a multi-bit bus is more dangerous: each bit may resolve independently, creating a brief window where the bus holds an impossible intermediate value. For example, binary 7 transitioning to 8 (0111→1000) could be sampled mid-flip as 1010.
The asynchronous FIFO solves this by never passing a multi-bit address directly across the domain boundary. Instead, it passes Gray-coded pointers — one bit changes at a time — making 2-FF synchronization safe.
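To make the hazard concrete, here is a small Python sketch (illustrative only; `possible_samples` and `to_gray` are hypothetical helpers, not part of the RTL). It assumes the worst case, in which every toggling bit of the bus independently lands on its old or its new value, and enumerates what the receiver could capture for the 7→8 step in binary and in Gray code.

```python
from itertools import product


def to_gray(b: int) -> int:
    """Binary-reflected Gray code: gray = bin ^ (bin >> 1)."""
    return b ^ (b >> 1)


def possible_samples(old: int, new: int, width: int) -> set[int]:
    """All values a receiver could capture if each bit independently
    resolves to either its old or its new value (worst-case skew)."""
    results = set()
    for choice in product((False, True), repeat=width):
        value = 0
        for bit, take_new in enumerate(choice):
            source = new if take_new else old
            value |= source & (1 << bit)
        results.add(value)
    return results


# Binary 7 -> 8 is 0111 -> 1000: all four bits toggle.
binary_samples = possible_samples(7, 8, 4)
# Gray 7 -> 8 is 0100 -> 1100: only bit 3 toggles.
gray_samples = possible_samples(to_gray(7), to_gray(8), 4)

print(len(binary_samples), 0b1010 in binary_samples)  # 16 True
print(sorted(gray_samples))                           # [4, 12]
```

In binary, every 4-bit value is a possible capture; in Gray code the receiver sees either the old pointer or the new one, which is exactly what a 2-FF synchronizer can tolerate.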
2. Architecture Overview
An async FIFO consists of five key components, each residing in a specific clock domain:
| Component | Domain | Function |
|---|---|---|
| Dual-port SRAM | Both (write port on wclk; read port addressed from rclk) | Storage: written in the wclk domain, read in the rclk domain |
| Write pointer (wptr) | Write (wclk) | Binary counter → Gray encoded, indexes next write address |
| Read pointer (rptr) | Read (rclk) | Binary counter → Gray encoded, indexes next read address |
| W2R synchronizer | Read (rclk) | 2-FF synchronizer: wptr_gray → rclk domain for empty generation |
| R2W synchronizer | Write (wclk) | 2-FF synchronizer: rptr_gray → wclk domain for full generation |
3. The Pointer Width Trick
A FIFO with 2^N locations uses (N+1)-bit pointers. The extra MSB resolves the ambiguity between full and empty: in both conditions the N address bits of the two pointers are equal. With the extra bit:
- Empty: the write and read pointers match in all N+1 bits.
- Full (binary): the MSBs differ and the lower N bits match, so the write pointer is exactly 2^N ahead.
- Full (Gray): the top two bits differ and the remaining N-1 bits match.
In Gray code, the full test inverts two bits rather than one because adding 2^N to a binary pointer flips only its MSB, and under gray = bin ^ (bin >> 1) that single flip toggles both the top Gray bit and the bit below it. The remaining N-1 bits are unaffected and must still match.
Why N+1 bits? With only N address bits, pointer 0 after reset and pointer 0 after one full wrap-around look identical. The extra bit records the parity of the wrap count: when the write pointer has wrapped one more time than the read pointer, the MSBs differ, indicating full rather than empty.
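The flag rules can be checked exhaustively in a few lines of Python. This sketch assumes ASIZE = 4 (the 16-deep configuration used in the RTL below); the helper names are hypothetical. It confirms that the Gray-code empty and full tests agree with the binary occupancy for every reachable pointer pair.

```python
ASIZE = 4                        # N address bits -> 2**ASIZE = 16 locations
PTR_BITS = ASIZE + 1             # N+1-bit pointers
PTR_MOD = 1 << PTR_BITS          # pointers count modulo 32
DEPTH = 1 << ASIZE


def to_gray(b: int) -> int:
    return b ^ (b >> 1)


def is_empty(wgray: int, rgray: int) -> bool:
    # Empty: all N+1 pointer bits match.
    return wgray == rgray


def is_full(wgray: int, rgray: int) -> bool:
    # Full: top two Gray bits inverted, remaining ASIZE-1 bits equal.
    return wgray == rgray ^ (0b11 << (ASIZE - 1))


def occupancy(wbin: int, rbin: int) -> int:
    # Words stored, from the binary counts modulo 2**(N+1).
    return (wbin - rbin) % PTR_MOD


for rbin in range(PTR_MOD):
    for wbin in range(PTR_MOD):
        occ = occupancy(wbin, rbin)
        if occ > DEPTH:
            continue  # unreachable: the writer never leads by more than DEPTH
        wg, rg = to_gray(wbin), to_gray(rbin)
        assert is_empty(wg, rg) == (occ == 0)
        assert is_full(wg, rg) == (occ == DEPTH)

# With only the low ASIZE bits, "empty at 0" and "full after one lap" collide:
assert (16 % DEPTH) == (0 % DEPTH)
print("Gray full/empty tests match binary occupancy for all reachable pairs")
```

Because Gray encoding is a bijection, equality of Gray pointers is equivalent to equality of binary pointers, and inverting the top two Gray bits is equivalent to adding 2^N in binary.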
4. Gray Code Synchronization Flow
The synchronization flow for both pointers is identical in structure:
- Binary counter increments on each valid write/read
- Binary value is XOR-shifted to Gray code: gray = bin ^ (bin >> 1)
- Gray code pointer crosses the domain boundary through a 2-FF synchronizer
- Receiving domain uses the synchronized Gray pointer for flag comparison
- Receiving domain never decodes back to binary — comparison stays in Gray
Compare the synchronized Gray pointer directly rather than decoding it back to binary. The synchronized value is one to two cycles stale, so it may lag the true pointer. That lag is safe for flag generation (it only makes empty/full conservative), but the synchronized pointer must never be used as an SRAM address: each domain addresses the memory only with its own local binary count, and addressing with a stale, cross-domain pointer would corrupt data.
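The encoding step and its key property (consecutive pointer values differ in exactly one bit, even across the wrap from 2^(N+1)−1 back to 0) can be checked with a short Python sketch. The function names are hypothetical, and `gray_to_bin` is included only to confirm the encoding round-trips, not to suggest decoding the synchronized pointer for flag comparison.

```python
PTR_BITS = 5                       # N+1 = 5-bit pointers for a 16-deep FIFO
PTR_MOD = 1 << PTR_BITS


def bin_to_gray(b: int) -> int:
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    # Each binary bit is the XOR of the Gray bits at and above it.
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def bits_changed(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


for value in range(PTR_MOD):
    nxt = (value + 1) % PTR_MOD    # includes the 31 -> 0 wrap
    assert bits_changed(bin_to_gray(value), bin_to_gray(nxt)) == 1
    assert gray_to_bin(bin_to_gray(value)) == value

print(format(bin_to_gray(31), "05b"), format(bin_to_gray(0), "05b"))  # 10000 00000
```

The wrap case matters: a pointer width that is not a power of two would break the single-bit-change property at the wrap point, which is one reason async FIFO depths are powers of two.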
5. RTL Implementation
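```systemverilog
// Top-level async FIFO (16-deep, 8-bit data, 5-bit pointers)
// Naming used in this article:
//   rq2_wptr: read pointer after the 2-FF synchronizer in the wclk domain
//   wq2_rptr: write pointer after the 2-FF synchronizer in the rclk domain
// fifomem, sync_r2w and rptr_empty are not shown here.
module async_fifo #(
  parameter DSIZE = 8,
  parameter ASIZE = 4   // depth = 2^ASIZE = 16
)(
  input  logic             wclk, wrst_n, winc,
  input  logic [DSIZE-1:0] wdata,
  output logic             wfull,
  input  logic             rclk, rrst_n, rinc,
  output logic [DSIZE-1:0] rdata,
  output logic             rempty
);
  logic [ASIZE:0] wptr, rptr, wq2_rptr, rq2_wptr;
  logic [ASIZE:0] wbin_dec, rbin_dec;

  // Gray-to-binary decode of a *local* pointer (same clock domain), used
  // only to form SRAM addresses. Addressing with the low Gray bits directly
  // would make two live entries share a location across the pointer wrap.
  function automatic logic [ASIZE:0] gray2bin(input logic [ASIZE:0] g);
    logic [ASIZE:0] b;
    b[ASIZE] = g[ASIZE];
    for (int i = ASIZE - 1; i >= 0; i--) b[i] = b[i+1] ^ g[i];
    return b;
  endfunction

  assign wbin_dec = gray2bin(wptr);   // write domain
  assign rbin_dec = gray2bin(rptr);   // read domain

  // Read-domain pointer -> synchronized into the write domain (full flag)
  sync_r2w #(ASIZE) i_sync_r2w (.rq2_wptr, .rptr, .wclk, .wrst_n);

  // Write-domain pointer -> synchronized into the read domain (empty flag)
  sync_w2r #(ASIZE) i_sync_w2r (.wq2_rptr, .wptr, .rclk, .rrst_n);

  // Dual-port memory: write only when a write is requested and not full
  fifomem #(DSIZE, ASIZE) i_mem (
    .rdata, .wdata,
    .waddr  (wbin_dec[ASIZE-1:0]),
    .raddr  (rbin_dec[ASIZE-1:0]),
    .wclken (winc & !wfull),
    .wclk
  );

  // Write pointer logic + full flag
  wptr_full #(ASIZE) i_wptr (.wfull, .wptr, .rq2_wptr, .winc, .wclk, .wrst_n);

  // Read pointer logic + empty flag
  rptr_empty #(ASIZE) i_rptr (.rempty, .rptr, .wq2_rptr, .rinc, .rclk, .rrst_n);
endmodule
```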
Write Pointer Module
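```systemverilog
module wptr_full #(parameter ASIZE = 4) (
  output logic             wfull,
  output logic [ASIZE:0]   wptr,      // Gray-coded write pointer (to sync_w2r)
  input  logic [ASIZE:0]   rq2_wptr,  // read pointer, synchronized into wclk
  input  logic             winc, wclk, wrst_n
);
  logic [ASIZE:0] wbin;               // binary write count
  logic [ASIZE:0] wgraynext, wbinnext;
  logic           wfull_val;

  // Pointer and full flag are all registered on wclk.
  always_ff @(posedge wclk or negedge wrst_n)
    if (!wrst_n) {wbin, wptr, wfull} <= '0;
    else         {wbin, wptr, wfull} <= {wbinnext, wgraynext, wfull_val};

  // Advance only on a write that is accepted (FIFO not already full).
  assign wbinnext  = wbin + (winc & !wfull);
  assign wgraynext = wbinnext ^ (wbinnext >> 1);

  // Full: top two Gray bits inverted, remaining ASIZE-1 bits equal.
  // Evaluated on the *next* pointer and then registered, so there is no
  // combinational loop from wfull back through wbinnext.
  assign wfull_val = (wgraynext == {~rq2_wptr[ASIZE:ASIZE-1], rq2_wptr[ASIZE-2:0]});
endmodule
```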
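A cycle-level Python reference model of wptr_full can serve as a scoreboard check in verification. This is a sketch under stated assumptions: one call to `clock()` per wclk edge, the synchronized read pointer supplied directly as a Gray value, and a full flag registered alongside the pointer and computed from the next Gray value (the usual way to avoid a combinational loop through wbinnext). The class and method names are hypothetical.

```python
ASIZE = 4
PTR_MOD = 1 << (ASIZE + 1)


def to_gray(b: int) -> int:
    return b ^ (b >> 1)


def full_pattern(rgray_synced: int) -> int:
    # {~r[ASIZE:ASIZE-1], r[ASIZE-2:0]}: invert the top two Gray bits.
    return rgray_synced ^ (0b11 << (ASIZE - 1))


class WritePointerModel:
    """Hypothetical cycle model of wptr_full with a registered full flag."""

    def __init__(self) -> None:
        # Reset state: all registers cleared, as on !wrst_n.
        self.wbin = 0
        self.wptr = 0          # Gray-coded pointer sent toward the read domain
        self.wfull = False

    def clock(self, winc: bool, rgray_synced: int) -> None:
        """Advance one wclk edge given winc and the synchronized read pointer."""
        # Combinational next-state logic.
        step = 1 if (winc and not self.wfull) else 0
        wbinnext = (self.wbin + step) % PTR_MOD
        wgraynext = to_gray(wbinnext)
        wfull_val = wgraynext == full_pattern(rgray_synced)
        # Register update on the clock edge.
        self.wbin, self.wptr, self.wfull = wbinnext, wgraynext, wfull_val


# Reader idle at pointer 0: sixteen writes fill the 16-deep FIFO.
m = WritePointerModel()
for _ in range(16):
    m.clock(winc=True, rgray_synced=to_gray(0))
print(m.wbin, m.wfull)   # 16 True

# A further write request is ignored while full.
m.clock(winc=True, rgray_synced=to_gray(0))
print(m.wbin, m.wfull)   # 16 True

# Once one read becomes visible, full deasserts on the next edge.
m.clock(winc=False, rgray_synced=to_gray(1))
print(m.wbin, m.wfull)   # 16 False
```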
2-FF Synchronizer
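```systemverilog
module sync_w2r #(parameter ASIZE = 4) (
  output logic [ASIZE:0] wq2_rptr,  // wptr after two rclk stages
  input  logic [ASIZE:0] wptr,      // Gray-coded write pointer (wclk domain)
  input  logic           rclk, rrst_n
);
  // Both stages carry ASYNC_REG (Xilinx) so the tools keep them together
  // and treat them as a synchronizer rather than ordinary pipeline flops.
  (* ASYNC_REG = "TRUE" *) logic [ASIZE:0] sync_q1;
  (* ASYNC_REG = "TRUE" *) logic [ASIZE:0] sync_q2;

  always_ff @(posedge rclk or negedge rrst_n)
    if (!rrst_n) {sync_q2, sync_q1} <= '0;
    else         {sync_q2, sync_q1} <= {sync_q1, wptr};

  assign wq2_rptr = sync_q2;
endmodule
```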
6. Full/Empty Flag Conservatism
Both flags are intentionally conservative due to synchronization latency:
Empty Flag (Read Domain)
May assert empty even when data is available — the write pointer is 1–2 rclk cycles stale. The FIFO never loses data, but may under-utilize throughput slightly.
Full Flag (Write Domain)
May assert full even when space is available — the read pointer is 1–2 wclk cycles stale. The FIFO never overflows, but may have slightly less usable depth in practice.
Safety Guarantee
Conservative flags guarantee data integrity. A non-conservative flag that says "not full" when it actually is would cause silent data corruption — the worst class of hardware bug.
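The conservatism argument can be checked exhaustively with a small Python sketch. It assumes a 16-deep FIFO and models synchronizer staleness as a lag of up to two pointer updates; the function names are hypothetical. A flag value is counted as unsafe if it would permit a read from an empty FIFO or a write into a full one, and as pessimistic if it blocks an operation that would actually have been safe.

```python
DEPTH = 16                 # 2**ASIZE locations
PTR_MOD = 2 * DEPTH        # N+1-bit pointers count modulo 2 * DEPTH


def occupancy(wptr: int, rptr: int) -> int:
    return (wptr - rptr) % PTR_MOD


def flag_audit(max_lag: int = 2) -> tuple[int, int]:
    """Count (unsafe, pessimistic) flag values over all reachable states
    when each synchronized pointer trails its source by up to max_lag updates."""
    unsafe = pessimistic = 0
    for rptr in range(PTR_MOD):
        for occ in range(DEPTH + 1):
            wptr = (rptr + occ) % PTR_MOD
            for lag in range(max_lag + 1):
                # Read domain: synced wptr trails the true wptr, but never falls
                # behind rptr (the reader only advances over writes it has seen).
                stale_w = (wptr - min(lag, occ)) % PTR_MOD
                empty = stale_w == rptr
                unsafe += (not empty) and occ == 0        # read from empty FIFO
                pessimistic += empty and occ > 0          # data waiting, flag says empty
                # Write domain: synced rptr trails the true rptr; bounded so the
                # stale view never shows more than DEPTH words.
                stale_r = (rptr - min(lag, DEPTH - occ)) % PTR_MOD
                full = occupancy(wptr, stale_r) == DEPTH
                unsafe += (not full) and occ == DEPTH     # write into full FIFO
                pessimistic += full and occ < DEPTH       # space free, flag says full
    return unsafe, pessimistic


unsafe, pessimistic = flag_audit()
print(unsafe, pessimistic > 0)   # 0 True
```

Staleness can only make a synchronized pointer look further behind, never ahead, so every error it introduces is in the safe direction.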
7. Synthesis and CDC Sign-Off
- Mark all synchronizer flops with the ASYNC_REG synthesis attribute (Xilinx) or an equivalent CDC annotation
- Constrain the path from the wptr Gray FFs to the W2R synchronizer's first FF, and from the rptr Gray FFs to the R2W synchronizer's first FF. A plain set_false_path removes the timing check but leaves bit-to-bit skew unbounded; set_max_delay -datapath_only (Xilinx XDC) of about one source clock period keeps the Gray bits close enough that only one change is in flight at a time
- Run CDC analysis (Questa CDC, Synopsys SpyGlass CDC) to verify no unconstrained paths cross domain boundaries
- Verify that the dual-port SRAM constraints allow simultaneous read and write without timing conflicts
- Confirm the FIFO depth covers the worst-case burst plus the synchronizer latency (see the depth calculation below) to prevent stalls
Async FIFO in Production ASIC Design — Engineering Realities
Bus Protocol Integration: AXI-to-Async-FIFO Patterns
In most real SoCs, the async FIFO sits between two bus-attached subsystems rather than between raw producer/consumer logic. A common pattern is an AXI4 write channel (AW + W channels in the write clock domain) feeding an async FIFO, with the FIFO output read by a DDR memory controller in its own clock domain. This integration requires careful handling of the AXI protocol at the FIFO boundary. AXI's handshake rule is that a source, once it asserts VALID, must hold VALID and its payload stable until READY completes the transfer; the receiving side may hold READY low for as long as it needs. Deasserting WREADY when FULL asserts is therefore legal back-pressure, but stalling a burst midway can hold up the write channel and the traffic queued behind it, so many designs choose to avoid it. One option is to size the FIFO so that FULL never asserts during a maximum-length AXI burst. Another is for the AXI slave interface in the write clock domain to buffer the entire burst before asserting AWREADY, so the burst is accepted as a unit before the FIFO can even approach full. Neither approach is free: the first increases FIFO depth, the second adds burst-buffer area, and the trade-off should be specified explicitly in the subsystem's design document.
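The VALID/READY behavior can be illustrated with a deliberately simplified single-clock Python model. The names are hypothetical, and the model ignores the AW channel and the clock crossing, drives READY directly from the FIFO's fill level, and represents a slower reader by draining one word every `drain_every` cycles. The source holds each beat until the handshake, so back-pressure delays the burst but never drops or reorders beats, and a FIFO deep enough for the whole burst removes the stalls.

```python
from collections import deque


def run_burst(beats: int, depth: int, drain_every: int) -> tuple[list[int], int]:
    """Drive one burst through a FIFO whose READY is simply 'not full'.

    Returns the beats in the order the consumer saw them and the number of
    cycles the source spent holding VALID with READY low.
    """
    fifo: deque[int] = deque()
    received: list[int] = []
    next_beat = 0          # beat currently presented by the source
    stall_cycles = 0
    cycle = 0
    while len(received) < beats:
        valid = next_beat < beats
        ready = len(fifo) < depth              # back-pressure when full
        if valid and ready:                    # handshake: beat accepted
            fifo.append(next_beat)
            next_beat += 1
        elif valid:                            # hold VALID and data stable
            stall_cycles += 1
        if cycle % drain_every == drain_every - 1 and fifo:
            received.append(fifo.popleft())    # consumer reads one word
        cycle += 1
    return received, stall_cycles


shallow = run_burst(beats=16, depth=4, drain_every=2)
deep = run_burst(beats=16, depth=16, drain_every=2)
print(shallow[0] == list(range(16)), shallow[1] > 0)   # True True
print(deep[0] == list(range(16)), deep[1])             # True 0
```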
FIFO Depth Calculation Including Synchronizer Latency
The minimum FIFO depth formula must account for the 2-FF synchronizer's pipeline delay. After the write domain writes a word, the read domain cannot see it until the write pointer has propagated through the two synchronizer flip-flops in the read clock domain, a latency of about 2 read clock cycles. Similarly, after the read domain drains a word, the write domain cannot see the freed space until about 2 write clock cycles have elapsed. For a burst of B words written at F_wr and drained at one word per cycle at F_rd, the minimum depth is D_min = ceil(B × (1 − F_rd / F_wr)) + 2, where the +2 accounts for the synchronizer latency on the full-flag path. For high clock ratios (F_wr >> F_rd), the +2 is negligible. For near-equal clock frequencies, however, the burst accumulation term is small and the synchronizer latency can be a significant fraction of the total depth. Omitting it produces a FIFO that overflows precisely when the clock ratio is closest to 1:1, a failure that is nearly impossible to reproduce in RTL simulation because most simulation environments use exact frequency ratios rather than the realistic spread that occurs in silicon.
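The formula is easy to evaluate for candidate configurations. This Python sketch assumes one write per wclk cycle for the whole burst and one read per rclk cycle, with frequencies given as integer MHz so the ratio stays exact. Clamping the burst term at zero when the reader is at least as fast as the writer is an added assumption, and `min_fifo_depth` is a hypothetical helper.

```python
import math
from fractions import Fraction


def min_fifo_depth(burst: int, f_wr_mhz: int, f_rd_mhz: int, sync_stages: int = 2) -> int:
    """D_min = ceil(B * (1 - F_rd / F_wr)) + sync_stages.

    Assumes one write per wclk cycle for the whole burst and one read per
    rclk cycle. Integer MHz inputs keep the ratio exact via Fraction.
    """
    ratio = Fraction(f_rd_mhz, f_wr_mhz)
    # Added assumption: a reader at least as fast as the writer accumulates nothing.
    accumulation = max(0, math.ceil(burst * (1 - ratio)))
    return accumulation + sync_stages


print(min_fifo_depth(64, 100, 50))    # 34: burst term dominates
print(min_fifo_depth(256, 480, 400))  # 45
print(min_fifo_depth(64, 100, 99))    # 3: synchronizer term dominates
```

For the 480 MHz to 400 MHz pairing from the introduction, a 256-beat burst gives ceil(256/6) + 2 = 45. At 100 MHz versus 99 MHz, a 64-beat burst needs only 1 word for the burst term, so the synchronizer term is two thirds of the total.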
Physical Design Constraints for Async FIFOs
The 2-FF synchronizer cells in an async FIFO have special physical design requirements. First, they must be placed close together. The Gray-pointer launch flip-flop and the first capture flip-flop should be within 5–10 microns, which keeps routing delay and bit-to-bit skew on the crossing small, and the two synchronizer stages should sit next to each other so nearly the full capture clock period is available for metastability resolution (the crossing has no meaningful setup requirement between the domains, but the synchronizer does need a minimum resolution time). Second, the synchronizer flip-flops should use high-drive-strength, low-metastability-window cells from the standard cell library; many foundry libraries offer dedicated "sync" cells with improved MTBF characteristics. Third, the CDC timing constraint must be applied: the path from the write-domain Gray pointer flip-flop to the read-domain 2-FF synchronizer should be constrained with set_max_delay in its datapath-only form (-datapath_only in Xilinx XDC; other STA tools provide equivalents). This replaces the meaningless cross-clock setup check with an explicit delay bound, so the tool still constrains the route and checks max transition without reporting a false violation. If the SDC exception is omitted, the STA tool flags the synchronizer path as a timing violation. Structurally it is not one, but the tool cannot distinguish it from a true CDC violation without the designer's explicit annotation.