1. Overview: Why Dual-Clock FIFO?
Days 1-4 covered individual synchronization techniques:
- Day 1: Two-FF synchronizer (single bit)
- Day 2: Gray code (multi-bit counters)
- Day 3: Pulse synchronizers (control signals)
- Day 4: Timing and metastability analysis
Now we integrate all of these into the most common real-world CDC pattern: the Dual-Clock FIFO.
A dual-clock FIFO solves a practical problem: one clock domain produces data, another consumes it, and the two clocks have no fixed phase or frequency relationship. For example:
- USB host writes USB packets (Clock A) into the chip
- Processor reads packets (Clock B) at its own clock rate
- Clock A and Clock B are completely asynchronous
- Solution: Dual-clock FIFO buffers the data safely
This pattern appears in virtually every chip with multiple clock domains that need to exchange data: SoCs, network interfaces, memory controllers, accelerators.
2. Architecture Overview
A dual-clock FIFO is built from four kinds of blocks:
- Dual-port RAM: written on Clock A, read on Clock B
- Write pointer (Clock A) and read pointer (Clock B): binary counters that address the RAM
- Two pointer-crossing paths: each pointer is converted to Gray code, passed through a two-FF synchronizer in the other domain, and converted back to binary
- Flag logic: full is generated in the write domain, empty in the read domain
3. Write and Read Pointers
Pointer Design
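Both pointers are binary counters: write_ptr increments on each accepted write and read_ptr on each accepted read. Each pointer is one bit wider than the RAM address, so a 256-entry FIFO uses 9-bit pointers. The low 8 bits address the RAM, and the extra MSB toggles each time the address wraps:
- write_ptr (Clock A): Increments on every write; the 8-bit address wraps at 256, the full 9-bit pointer at 512
- read_ptr (Clock B): Increments on every read; the 8-bit address wraps at 256, the full 9-bit pointer at 512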
Empty/Full Flag Detection
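To determine if the FIFO is empty or full, each domain compares its own pointer with a synchronized copy of the other domain's pointer:
- FIFO is empty (Clock B): read_ptr == write_ptr_in_clk_b (synchronized from Clock A), with all bits equal
- FIFO is full (Clock A): write_ptr and read_ptr_in_clk_a (synchronized from Clock B) have equal address bits but opposite MSBs, meaning the write pointer is exactly one lap (DEPTH entries) ahead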
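The two comparisons can be sketched as a small combinational block. This is a minimal illustration rather than a reference design: the module name fifo_flag_compare and its port names are hypothetical. It assumes binary pointers that are one bit wider than the RAM address (the extra bit discussed under Pointer Width), and that the remote pointers have already been synchronized and converted back from Gray code.

```verilog
// Hypothetical helper: flag comparison on binary pointers with a wrap bit.
module fifo_flag_compare #(
    parameter ADDR_WIDTH = 8                      // log2(DEPTH)
) (
    input  [ADDR_WIDTH:0] write_ptr,              // local pointer, Clock A
    input  [ADDR_WIDTH:0] read_ptr_in_clk_a,      // read pointer as seen in Clock A
    input  [ADDR_WIDTH:0] read_ptr,               // local pointer, Clock B
    input  [ADDR_WIDTH:0] write_ptr_in_clk_b,     // write pointer as seen in Clock B
    output                full,                   // valid in Clock A
    output                empty                   // valid in Clock B
);
    // Empty: the reader has caught up with every write it knows about.
    assign empty = (read_ptr == write_ptr_in_clk_b);

    // Full: same RAM address, opposite wrap bit (writer is one lap ahead).
    assign full = (write_ptr[ADDR_WIDTH] != read_ptr_in_clk_a[ADDR_WIDTH]) &&
                  (write_ptr[ADDR_WIDTH-1:0] == read_ptr_in_clk_a[ADDR_WIDTH-1:0]);
endmodule
```

With ADDR_WIDTH = 8, write_ptr = 9'h100 and read_ptr_in_clk_a = 9'h000 give full = 1 (256 entries outstanding), while identical pointers in the read domain give empty = 1.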
4. Full RTL Implementation
module dual_clock_fifo #(
    parameter DATA_WIDTH = 32,
    parameter DEPTH      = 256,
    parameter ADDR_WIDTH = 8   // log2(DEPTH); DEPTH must be a power of two
) (
    // Write domain
    input                   clk_w,
    input                   rst_w,       // asynchronous reset, active low
    input  [DATA_WIDTH-1:0] write_data,
    input                   write_en,
    output                  full,
    // Read domain
    input                   clk_r,
    input                   rst_r,       // asynchronous reset, active low
    output [DATA_WIDTH-1:0] read_data,
    input                   read_en,
    output                  empty
);
    // Pointers carry one extra MSB (wrap bit) to distinguish full from empty
    localparam PTR_WIDTH = ADDR_WIDTH + 1;

    // Dual-port RAM: written on clk_w, read combinationally from the read domain
    reg [DATA_WIDTH-1:0] fifo_mem [0:DEPTH-1];

    // Gray-coded pointers, registered in their source domain before crossing
    reg [PTR_WIDTH-1:0] write_ptr_gray;   // driven on clk_w, sampled on clk_r
    reg [PTR_WIDTH-1:0] read_ptr_gray;    // driven on clk_r, sampled on clk_w

    // ============ Write Domain (Clock W) ============
    reg  [PTR_WIDTH-1:0] write_ptr_w;
    wire [PTR_WIDTH-1:0] write_ptr_w_next = write_ptr_w + (write_en && !full);
    wire [PTR_WIDTH-1:0] write_ptr_gray_next;
    reg  [PTR_WIDTH-1:0] read_ptr_w_gray_ff1, read_ptr_w_gray;  // two-FF synchronizer
    wire [PTR_WIDTH-1:0] read_ptr_w_bin;

    // Convert the next write pointer to Gray; the result is registered below
    binary_to_gray #(.WIDTH(PTR_WIDTH)) b2g_w (
        .binary(write_ptr_w_next),
        .gray(write_ptr_gray_next)
    );

    // Write pointer and its Gray copy update on the same edge
    always @(posedge clk_w or negedge rst_w) begin
        if (!rst_w) begin
            write_ptr_w    <= {PTR_WIDTH{1'b0}};
            write_ptr_gray <= {PTR_WIDTH{1'b0}};
        end else begin
            write_ptr_w    <= write_ptr_w_next;
            write_ptr_gray <= write_ptr_gray_next;
        end
    end

    // RAM write
    always @(posedge clk_w) begin
        if (write_en && !full) begin
            fifo_mem[write_ptr_w[ADDR_WIDTH-1:0]] <= write_data;
        end
    end

    // Synchronize the Gray read pointer into the write domain
    // (two-stage synchronizer)
    always @(posedge clk_w or negedge rst_w) begin
        if (!rst_w) begin
            read_ptr_w_gray_ff1 <= {PTR_WIDTH{1'b0}};
            read_ptr_w_gray     <= {PTR_WIDTH{1'b0}};
        end else begin
            read_ptr_w_gray_ff1 <= read_ptr_gray;        // FF1
            read_ptr_w_gray     <= read_ptr_w_gray_ff1;  // FF2
        end
    end

    // Convert synced Gray read ptr back to binary
    gray_to_binary #(.WIDTH(PTR_WIDTH)) g2b_w (
        .gray(read_ptr_w_gray),
        .binary(read_ptr_w_bin)
    );

    // Full flag: same RAM address, opposite wrap bit
    // (write pointer is exactly DEPTH entries ahead of the read pointer)
    assign full = (write_ptr_w == {~read_ptr_w_bin[ADDR_WIDTH],
                                   read_ptr_w_bin[ADDR_WIDTH-1:0]});

    // ============ Read Domain (Clock R) ============
    reg  [PTR_WIDTH-1:0] read_ptr_r;
    wire [PTR_WIDTH-1:0] read_ptr_r_next = read_ptr_r + (read_en && !empty);
    wire [PTR_WIDTH-1:0] read_ptr_gray_next;
    reg  [PTR_WIDTH-1:0] write_ptr_r_gray_sync_ff1, write_ptr_r_gray_sync;
    wire [PTR_WIDTH-1:0] write_ptr_r_bin;

    // Convert the next read pointer to Gray; the result is registered below
    binary_to_gray #(.WIDTH(PTR_WIDTH)) b2g_r (
        .binary(read_ptr_r_next),
        .gray(read_ptr_gray_next)
    );

    // Read pointer and its Gray copy update on the same edge
    always @(posedge clk_r or negedge rst_r) begin
        if (!rst_r) begin
            read_ptr_r    <= {PTR_WIDTH{1'b0}};
            read_ptr_gray <= {PTR_WIDTH{1'b0}};
        end else begin
            read_ptr_r    <= read_ptr_r_next;
            read_ptr_gray <= read_ptr_gray_next;
        end
    end

    // RAM read (asynchronous read port)
    assign read_data = fifo_mem[read_ptr_r[ADDR_WIDTH-1:0]];

    // Synchronize the Gray write pointer into the read domain
    always @(posedge clk_r or negedge rst_r) begin
        if (!rst_r) begin
            write_ptr_r_gray_sync_ff1 <= {PTR_WIDTH{1'b0}};
            write_ptr_r_gray_sync     <= {PTR_WIDTH{1'b0}};
        end else begin
            write_ptr_r_gray_sync_ff1 <= write_ptr_gray;             // FF1
            write_ptr_r_gray_sync     <= write_ptr_r_gray_sync_ff1;  // FF2
        end
    end

    // Convert synced Gray write ptr back to binary
    gray_to_binary #(.WIDTH(PTR_WIDTH)) g2b_r (
        .gray(write_ptr_r_gray_sync),
        .binary(write_ptr_r_bin)
    );

    // Empty flag: read_ptr == write_ptr (all bits, including the wrap bit)
    assign empty = (read_ptr_r == write_ptr_r_bin);
endmodule
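The listing instantiates binary_to_gray and gray_to_binary without defining them. The sketch below shows one way these helpers could look, assuming the parameter name (WIDTH) and port names (binary, gray) used in the instantiations above. The Day 2 material may define them differently, so treat this as an illustration rather than the course's own code.

```verilog
// Assumed helper: binary to Gray. Each Gray bit is the XOR of two
// adjacent binary bits; the MSB passes through unchanged.
module binary_to_gray #(
    parameter WIDTH = 9
) (
    input  [WIDTH-1:0] binary,
    output [WIDTH-1:0] gray
);
    assign gray = binary ^ (binary >> 1);
endmodule

// Assumed helper: Gray to binary. Binary bit i is the XOR of Gray
// bits i through WIDTH-1.
module gray_to_binary #(
    parameter WIDTH = 9
) (
    input      [WIDTH-1:0] gray,
    output reg [WIDTH-1:0] binary
);
    integer i;
    always @(*) begin
        for (i = 0; i < WIDTH; i = i + 1)
            binary[i] = ^(gray >> i);   // reduction XOR of the upper bits
    end
endmodule
```

For example, binary 4'b0111 maps to Gray 4'b0100 and binary 4'b1000 maps to Gray 4'b1100, so the increment from 7 to 8 changes only one Gray bit.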
5. Timing and Synchronization Latency
Write-to-Read Latency
When data is written to the FIFO:
- Data written to RAM (Clock A): stored on the write clock edge
- Write pointer incremented (Clock A): on the same edge
- Write pointer converted to Gray (Clock A): no extra clock cycles
- Gray write pointer synchronized to Clock B: two Clock B edges (FF1, then FF2)
- Synchronized Gray pointer converted back to binary (Clock B): combinational
- Read logic sees empty=0 and can issue a read on the following Clock B edge
Total latency: 2 to 3 Clock B cycles from the write to the first possible read (~2.5 on average, depending on the phase between the two clocks). This is acceptable for most applications.
Full Flag Latency
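Similar analysis: the write domain sees the full flag clear 2 to 3 Clock A cycles (~2.5 on average) after a read frees an entry.
During this window the write domain works from a stale read pointer, so full can stay asserted even though a slot is already free. The error is always on the safe side: the write side may stall briefly, but it never overwrites unread data. The same holds for empty, which may stay asserted briefly after a write but never lets the read side fetch an entry that has not been written.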
6. Practical Design Considerations
Synchronizer Placement
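Key rule: pointers cross the clock boundary in Gray code and are converted back to binary only after the synchronizer.
- ✅ Correct: write_ptr_gray → (Clock B) → FF1 → FF2 → gray2bin → write_ptr_b
- ❌ Incorrect: write_ptr → (Clock B) → FF1 → FF2 → use binary value directly
The incorrect version loses the benefit of Gray code: a binary increment can change several bits at once (0111 → 1000 flips four), and if the sampling edge falls during the change, each bit may be captured old or new independently, producing a pointer value that never existed. The Gray pointer should also come straight from a register in the source domain, because combinational conversion logic can glitch.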
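The single-bit property can be checked with a short simulation. The testbench below is a sketch written for this section, with a hypothetical name (gray_vs_binary_tb) and an arbitrary 4-bit width. It walks a counter through every increment, including the wrap from 15 to 0, and records the largest number of bits that change in the binary and Gray representations.

```verilog
// Hypothetical testbench: count bit flips per increment, binary vs. Gray.
module gray_vs_binary_tb;
    integer i, b;
    integer n_bin, n_gray, max_bin, max_gray;
    reg [3:0] cur, nxt, diff_bin, diff_gray;

    initial begin
        max_bin  = 0;
        max_gray = 0;
        for (i = 0; i < 16; i = i + 1) begin
            cur = i;
            nxt = i + 1;                        // truncates to 4 bits: 15 wraps to 0
            diff_bin  = cur ^ nxt;              // bits that differ in binary
            diff_gray = (cur ^ (cur >> 1)) ^    // bits that differ after
                        (nxt ^ (nxt >> 1));     // converting both to Gray
            n_bin  = 0;
            n_gray = 0;
            for (b = 0; b < 4; b = b + 1) begin
                n_bin  = n_bin  + diff_bin[b];
                n_gray = n_gray + diff_gray[b];
            end
            if (n_bin  > max_bin)  max_bin  = n_bin;
            if (n_gray > max_gray) max_gray = n_gray;
        end
        // Prints: max bits changed per increment: binary=4 gray=1
        $display("max bits changed per increment: binary=%0d gray=%0d",
                 max_bin, max_gray);
        $finish;
    end
endmodule
```

Because at most one Gray bit changes per increment, a synchronizer that samples mid-transition can only capture the old value or the new value, both of which are valid pointers.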
Pointer Width
Pointers are (log2(DEPTH) + 1) bits wide:
- 8-bit for 128-entry FIFO
- 9-bit for 256-entry FIFO
- 10-bit for 512-entry FIFO
The extra bit is critical for distinguishing full from empty: without it, read_ptr == write_ptr in both cases.
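The width rule can be expressed directly in the parameter list. The following is a minimal sketch, not part of the FIFO listing: the module name fifo_pointer and its ports are hypothetical, DEPTH is assumed to be a power of two, and the $clog2 system function requires a tool that supports Verilog-2005.

```verilog
// Hypothetical pointer counter with address and wrap bit split out.
module fifo_pointer #(
    parameter DEPTH      = 256,               // entries; assumed power of two
    parameter ADDR_WIDTH = $clog2(DEPTH),     // 8 for DEPTH = 256
    parameter PTR_WIDTH  = ADDR_WIDTH + 1     // 9 for DEPTH = 256
) (
    input                       clk,
    input                       rst_n,        // asynchronous reset, active low
    input                       advance,      // 1 = one accepted write (or read)
    output reg [PTR_WIDTH-1:0]  ptr,          // full pointer, compared for flags
    output     [ADDR_WIDTH-1:0] addr,         // RAM address (low bits)
    output                      wrap          // toggles each time addr wraps
);
    assign addr = ptr[ADDR_WIDTH-1:0];
    assign wrap = ptr[PTR_WIDTH-1];

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            ptr <= {PTR_WIDTH{1'b0}};
        else if (advance)
            ptr <= ptr + 1'b1;                // wraps naturally at 2*DEPTH
    end
endmodule
```

Only addr goes to the RAM; the full ptr, wrap bit included, is what gets Gray-coded and compared in the other domain.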
Reset Synchronization
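Each reset (rst_w, rst_r) may assert asynchronously, but its release must be synchronized to its own clock (clk_w or clk_r). Otherwise flip-flops in that domain can leave reset on different cycles or go metastable. A reset synchronizer is often built into the FIFO module; the listing in Section 4 has none, so it expects resets that are already synchronized.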
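A common way to do this is an asynchronous-assert, synchronous-release reset synchronizer, one per clock domain. The sketch below assumes active-low resets, matching the negedge/!rst style of the FIFO listing; the module and port names (reset_sync, async_rst_n, rst_n) are hypothetical.

```verilog
// Hypothetical reset synchronizer: asserts immediately, releases in sync
// with clk after two clock edges.
module reset_sync (
    input  clk,
    input  async_rst_n,   // raw reset, active low, may change at any time
    output rst_n          // reset for this clock domain, active low
);
    reg ff1, ff2;

    always @(posedge clk or negedge async_rst_n) begin
        if (!async_rst_n) begin
            ff1 <= 1'b0;  // assertion propagates asynchronously
            ff2 <= 1'b0;
        end else begin
            ff1 <= 1'b1;  // release ripples through two flip-flops
            ff2 <= ff1;
        end
    end

    assign rst_n = ff2;
endmodule
```

In use, one instance clocked by clk_w would drive rst_w and a second instance clocked by clk_r would drive rst_r.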
7. Formal Verification
Dual-clock FIFOs are complex enough to warrant formal verification:
- Properties to verify: No data loss, no overflow, no underflow, empty/full flags correct
- Pointer arithmetic: Wrap-around behavior correct for arbitrary pointer widths
- Metastability: Not a formal property. Synchronizer MTBF is checked by calculation (Day 4) and by structural CDC checks
Tools like Cadence JasperGold, Siemens Questa Formal (formerly Mentor), or the open-source SymbiYosys can prove these properties once they are written as assertions.
8. Real-World Examples
USB Interface Example
A USB host controller (100 MHz) writes packets into a dual-clock FIFO. The system processor (1 GHz) reads and processes the packets.
Architecture:
- FIFO depth: 256 entries (8 address bits + 1 wrap bit = 9-bit pointers)
- Data width: 32 bits per entry (4 bytes)
- Write clock: 100 MHz (USB domain), 10 ns period
- Read clock: 1 GHz (processor domain), 1 ns period
- Write latency to read domain: ~2.5 ns (2.5 × 1 ns Clock B period)
- Peak throughput: 400 MB/s (100 MHz × 4 bytes), limited by the slower write side
The FIFO buffers burst writes from USB while the processor drains it at its own rate.
9. Summary Checklist
- ✅ Dual-port RAM for data (no CDC needed on data)
- ✅ Gray code pointers synchronized across clock boundary
- ✅ Two-stage FF synchronizer on Gray pointers
- ✅ Binary conversion of synchronized Gray pointers
- ✅ Empty/full flags generated from synchronized pointers
- ✅ Pointer width = log2(DEPTH) + 1 to distinguish full vs. empty
- ✅ Reset both domains to known state
- ✅ Formal verification of FIFO properties, plus MTBF analysis for the synchronizers
- ✅ Test wrap-around: Verify FIFO works correctly when pointers wrap from MAX → 0
- ✅ Timing verification: Setup/hold constraints met for all pointer arithmetic
This completes the core CDC techniques (Days 1-5).
Next (Days 6-15): Advanced topics, testing strategies, industry tools, and production verification.