
Dual-Clock FIFO

The industrial-strength pattern for moving data across clock domains. Architecture, Gray code pointer synchronization, empty/full flag generation, formal verification, complete RTL design, and production verification.

By EcrioniX · Published June 13, 2026 · ~4900 words · 15 min read

1. Overview: Why Dual-Clock FIFO?

Days 1-4 covered individual synchronization techniques.

Now we integrate all of these into the most common real-world CDC pattern: the Dual-Clock FIFO.

A dual-clock FIFO solves this practical problem: One clock domain produces data, another consumes it, and they're not synchronized.

This is used in virtually every chip with multiple clock domains that need to exchange data: SoCs, network interfaces, memory controllers, accelerators.

2. Architecture Overview

Dual-Clock FIFO Architecture:

   Write Domain (Clock A)     |     Read Domain (Clock B)
        (Producer)            |          (Consumer)

  write_data ──→ ┌─────────────┐
  write_en   ──→ │    FIFO     │       ┌───────────┐
  write_clk  ──→ │   Memory    ├─────→ │ read_data │
  write_ptr  ──→ │ (Dual-Port  │       └───────────┘
  full       ←── │    RAM)     │ ←── read_en, read_clk, read_ptr
                 └─────────────┘
                        ↑
                 Synchronization
              (Gray Code + Dual-FF)
                 ┌──────┴──────┐
                 ↓             ↓
            Full Flag      Empty Flag
           in Clock A      in Clock B
            (Pointers Cross via CDC)

Key insight: Data memory is dual-port RAM (no CDC needed). Only POINTERS need CDC (via Gray code synchronizers).

3. Write and Read Pointers

Pointer Design

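Both write and read pointers are counters that increment on each accepted write or read. For a 256-entry FIFO, the address portion of each pointer is an 8-bit counter that wraps from 255 back to 0. Section 6 explains the extra wrap bit that is added on top of the address bits.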
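A minimal sketch of such a counter, assuming a hypothetical module name (`fifo_ptr`), an active-low asynchronous reset, and an `inc` strobe that stands for "a write or read was accepted". None of these names come from the RTL in Section 4.

```verilog
// Hypothetical helper: wrapping address counter for a 256-entry FIFO.
module fifo_ptr #(
  parameter ADDR_WIDTH = 8            // 2**8 = 256 entries
) (
  input                       clk,
  input                       rst_n,  // active-low reset (assumed polarity)
  input                       inc,    // advance on an accepted write or read
  output reg [ADDR_WIDTH-1:0] ptr
);

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
      ptr <= {ADDR_WIDTH{1'b0}};
    else if (inc)
      ptr <= ptr + 1'b1;   // 8'hFF + 1 overflows to 8'h00, so the wrap is free
  end

endmodule
```

Because the depth is a power of two, no explicit compare-and-reset is needed for the wrap; the counter overflow does it.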

Empty/Full Flag Detection

To determine if the FIFO is empty or full, we need to compare pointers across clock domains:

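Empty and Full Detection Logic (pointers are N+1 bits wide, N = log2(DEPTH)):

In Clock B Domain (read side):
  write_ptr_gray   = bin2gray(write_ptr_clk_a)       // computed in Clock A
  write_ptr_sync_b = (synchronized via Gray CDC)
  write_ptr_b      = gray2bin(write_ptr_sync_b)
  empty_b          = (read_ptr_b == write_ptr_b)     // all N+1 bits equal

In Clock A Domain (write side):
  read_ptr_gray    = bin2gray(read_ptr_clk_b)        // computed in Clock B
  read_ptr_sync_a  = (synchronized via Gray CDC)
  read_ptr_a       = gray2bin(read_ptr_sync_a)
  full_a           = (write_ptr_a[N]     != read_ptr_a[N]) &&
                     (write_ptr_a[N-1:0] == read_ptr_a[N-1:0])
  // Note: FULL means the write pointer is exactly DEPTH entries ahead of
  // the read pointer: same address bits, opposite wrap bit
  // (all entries between read and write are occupied)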

4. Full RTL Implementation

module dual_clock_fifo #(
  parameter DATA_WIDTH = 32,
  parameter DEPTH = 256,
  parameter ADDR_WIDTH = 8  // log2(DEPTH); DEPTH must be a power of two
) (
  // Write domain
  input                   clk_w,
  input                   rst_w,      // asynchronous reset, active low
  input  [DATA_WIDTH-1:0] write_data,
  input                   write_en,
  output                  full,

  // Read domain
  input                   clk_r,
  input                   rst_r,      // asynchronous reset, active low
  output [DATA_WIDTH-1:0] read_data,
  input                   read_en,
  output                  empty
);

  // Pointers carry one extra MSB (wrap bit) to tell full from empty
  localparam PTR_WIDTH = ADDR_WIDTH + 1;

  // Dual-port RAM
  reg [DATA_WIDTH-1:0] fifo_mem [0:DEPTH-1];

  // Write-domain signals
  reg  [PTR_WIDTH-1:0] write_ptr_w;           // binary write pointer
  reg  [PTR_WIDTH-1:0] write_ptr_gray;        // registered Gray copy, crosses to clk_r
  reg  [PTR_WIDTH-1:0] read_ptr_w_gray_ff1;   // synchronizer FF1
  reg  [PTR_WIDTH-1:0] read_ptr_w_gray;       // synchronizer FF2
  wire [PTR_WIDTH-1:0] write_ptr_w_next, write_ptr_gray_next, read_ptr_w_bin;

  // Read-domain signals
  reg  [PTR_WIDTH-1:0] read_ptr_r;                 // binary read pointer
  reg  [PTR_WIDTH-1:0] read_ptr_gray;              // registered Gray copy, crosses to clk_w
  reg  [PTR_WIDTH-1:0] write_ptr_r_gray_sync_ff1;  // synchronizer FF1
  reg  [PTR_WIDTH-1:0] write_ptr_r_gray_sync;      // synchronizer FF2
  wire [PTR_WIDTH-1:0] read_ptr_r_next, read_ptr_gray_next, write_ptr_r_bin;

  // NOTE: binary_to_gray / gray_to_binary are not defined in this file.
  // They are assumed to take WIDTH as the full width of both ports.

  // ============ Write Domain (Clock W) ============
  wire do_write = write_en && !full;
  assign write_ptr_w_next = write_ptr_w + {{ADDR_WIDTH{1'b0}}, do_write};

  // Gray code of the NEXT pointer, so the binary and Gray registers
  // update on the same clock edge
  binary_to_gray #(.WIDTH(PTR_WIDTH)) b2g_w (
    .binary(write_ptr_w_next),
    .gray(write_ptr_gray_next)
  );

  // Write pointer logic (the Gray copy is registered so that only
  // glitch-free flop outputs cross the domain boundary)
  always @(posedge clk_w or negedge rst_w) begin
    if (!rst_w) begin
      write_ptr_w    <= {PTR_WIDTH{1'b0}};
      write_ptr_gray <= {PTR_WIDTH{1'b0}};
    end else begin
      write_ptr_w    <= write_ptr_w_next;
      write_ptr_gray <= write_ptr_gray_next;
    end
  end

  // RAM write
  always @(posedge clk_w) begin
    if (do_write) begin
      fifo_mem[write_ptr_w[ADDR_WIDTH-1:0]] <= write_data;
    end
  end

  // Two-stage synchronizer: Gray read pointer into the write domain
  always @(posedge clk_w or negedge rst_w) begin
    if (!rst_w) begin
      read_ptr_w_gray_ff1 <= {PTR_WIDTH{1'b0}};
      read_ptr_w_gray     <= {PTR_WIDTH{1'b0}};
    end else begin
      read_ptr_w_gray_ff1 <= read_ptr_gray;        // FF1
      read_ptr_w_gray     <= read_ptr_w_gray_ff1;  // FF2
    end
  end

  // Convert synced Gray read ptr back to binary
  gray_to_binary #(.WIDTH(PTR_WIDTH)) g2b_w (
    .gray(read_ptr_w_gray),
    .binary(read_ptr_w_bin)
  );

  // Full flag: same address bits, opposite wrap bit
  // (write pointer is exactly DEPTH entries ahead of the read pointer)
  assign full = (write_ptr_w[ADDR_WIDTH]     != read_ptr_w_bin[ADDR_WIDTH]) &&
                (write_ptr_w[ADDR_WIDTH-1:0] == read_ptr_w_bin[ADDR_WIDTH-1:0]);

  // ============ Read Domain (Clock R) ============
  wire do_read = read_en && !empty;
  assign read_ptr_r_next = read_ptr_r + {{ADDR_WIDTH{1'b0}}, do_read};

  binary_to_gray #(.WIDTH(PTR_WIDTH)) b2g_r (
    .binary(read_ptr_r_next),
    .gray(read_ptr_gray_next)
  );

  // Read pointer logic
  always @(posedge clk_r or negedge rst_r) begin
    if (!rst_r) begin
      read_ptr_r    <= {PTR_WIDTH{1'b0}};
      read_ptr_gray <= {PTR_WIDTH{1'b0}};
    end else begin
      read_ptr_r    <= read_ptr_r_next;
      read_ptr_gray <= read_ptr_gray_next;
    end
  end

  // RAM read (asynchronous read port; data is valid whenever empty is low)
  assign read_data = fifo_mem[read_ptr_r[ADDR_WIDTH-1:0]];

  // Two-stage synchronizer: Gray write pointer into the read domain
  always @(posedge clk_r or negedge rst_r) begin
    if (!rst_r) begin
      write_ptr_r_gray_sync_ff1 <= {PTR_WIDTH{1'b0}};
      write_ptr_r_gray_sync     <= {PTR_WIDTH{1'b0}};
    end else begin
      write_ptr_r_gray_sync_ff1 <= write_ptr_gray;             // FF1
      write_ptr_r_gray_sync     <= write_ptr_r_gray_sync_ff1;  // FF2
    end
  end

  // Convert synced Gray write ptr back to binary
  gray_to_binary #(.WIDTH(PTR_WIDTH)) g2b_r (
    .gray(write_ptr_r_gray_sync),
    .binary(write_ptr_r_bin)
  );

  // Empty flag: all pointer bits equal, including the wrap bit
  assign empty = (read_ptr_r == write_ptr_r_bin);

endmodule
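The module above instantiates `binary_to_gray` and `gray_to_binary` but does not define them. The sketch below is one possible pair of helpers, not necessarily the versions used elsewhere in this series. It assumes `WIDTH` is the full width of both ports (so a FIFO with the extra wrap bit would pass `ADDR_WIDTH + 1`) and reuses the port names `binary` and `gray` from the instantiations.

```verilog
// Assumed helper: binary -> Gray. Each Gray bit is the XOR of a binary
// bit and its more significant neighbour.
module binary_to_gray #(
  parameter WIDTH = 9
) (
  input  [WIDTH-1:0] binary,
  output [WIDTH-1:0] gray
);
  assign gray = binary ^ (binary >> 1);
endmodule

// Assumed helper: Gray -> binary. Binary bit i is the XOR of Gray bits
// WIDTH-1 down to i, written here as a reduction XOR of a shifted copy.
module gray_to_binary #(
  parameter WIDTH = 9
) (
  input      [WIDTH-1:0] gray,
  output reg [WIDTH-1:0] binary
);
  integer i;
  always @(*) begin
    for (i = 0; i < WIDTH; i = i + 1)
      binary[i] = ^(gray >> i);
  end
endmodule
```

The Gray-to-binary direction is a chain of XORs, so it adds logic depth in the destination domain. That is one reason some designs compare pointers directly in Gray code instead of converting back.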

5. Timing and Synchronization Latency

Write-to-Read Latency

When data is written to the FIFO:

  1. Data written to RAM (Clock A): visible immediately in RAM
  2. Write pointer incremented (Clock A)
  3. Write pointer converted to Gray (Clock A): combinational
  4. Gray write pointer synchronized to Clock B: 2 Clock B cycles
  5. Synchronized Gray pointer converted back to binary (Clock B): combinational
  6. Read logic can now see empty=0: 2 to 3 Clock B cycles after the write

Total latency: 2 to 3 Clock B cycles (about 2.5 on average) from write to read visibility, depending on where the write lands relative to the next Clock B edge. This is acceptable for most applications.

Full Flag Latency

Similar analysis: Write domain sees full flag updated ~2.5 Clock A cycles after a read occurs.

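During this window the write domain works with a stale read pointer, so it may still see full=1 after a read has already freed an entry. The error is always on the safe side: a lagging read pointer can only make the FIFO look fuller than it is, and a lagging write pointer can only make it look emptier than it is. The cost is a few idle cycles, never data loss, provided the control logic honors full and empty.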

6. Practical Design Considerations

Synchronizer Placement

Key rule: Gray code pointers must be synchronized before converting back to binary.

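Synchronizing the binary pointer directly loses the benefit of Gray code: a single binary increment can change many bits at once (for example 0111 to 1000), so the synchronizer may capture a mix of old and new bits and produce a pointer value that never existed.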
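The ordering can be sketched as a single pointer-crossing module. The module name `ptr_cdc`, its port names, and the active-low resets are assumptions made for illustration. The sketch also assumes `bin_src` changes by at most one count per `clk_src` cycle, which holds for a FIFO pointer.

```verilog
// Hypothetical module: carries a binary pointer from clk_src to clk_dst.
module ptr_cdc #(
  parameter WIDTH = 9
) (
  input                  clk_src,
  input                  rst_src_n,  // active-low reset, source domain
  input                  clk_dst,
  input                  rst_dst_n,  // active-low reset, destination domain
  input      [WIDTH-1:0] bin_src,    // binary pointer in the source domain
  output reg [WIDTH-1:0] bin_dst     // binary pointer in the destination domain
);
  reg [WIDTH-1:0] gray_src;             // registered: no glitches on the crossing
  reg [WIDTH-1:0] gray_meta, gray_sync; // two-flop synchronizer
  integer i;

  // 1. Binary -> Gray, registered in the source domain.
  always @(posedge clk_src or negedge rst_src_n) begin
    if (!rst_src_n) gray_src <= {WIDTH{1'b0}};
    else            gray_src <= bin_src ^ (bin_src >> 1);
  end

  // 2. Synchronize the Gray value. These are the only signals that cross.
  always @(posedge clk_dst or negedge rst_dst_n) begin
    if (!rst_dst_n) begin
      gray_meta <= {WIDTH{1'b0}};
      gray_sync <= {WIDTH{1'b0}};
    end else begin
      gray_meta <= gray_src;
      gray_sync <= gray_meta;
    end
  end

  // 3. Gray -> binary, only AFTER synchronization.
  always @(*) begin
    for (i = 0; i < WIDTH; i = i + 1)
      bin_dst[i] = ^(gray_sync >> i);
  end

  // Incorrect ordering (do not do this): sampling bin_src directly with
  // the clk_dst flops, or converting back to binary before the
  // synchronizer, puts a multi-bit-changing bus on the crossing.
endmodule
```

Registering the Gray value before it crosses matters as much as the ordering: combinational Gray logic can glitch, and a glitch breaks the one-bit-at-a-time guarantee.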

Pointer Width

Pointers are typically (log2(DEPTH) + 1) bits wide. For DEPTH = 256, that is 9 bits: 8 address bits plus one wrap bit.

The extra bit is critical for distinguishing full vs. empty (both have read_ptr == write_ptr without the extra bit).
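A minimal sketch of how the extra bit separates the two cases, assuming both pointers are already binary and in the same clock domain (that is, after synchronization). The module name `fifo_flags` is hypothetical.

```verilog
// Hypothetical helper: full/empty from (ADDR_WIDTH+1)-bit binary pointers.
module fifo_flags #(
  parameter ADDR_WIDTH = 8             // log2(DEPTH)
) (
  input  [ADDR_WIDTH:0] write_ptr,     // MSB is the wrap bit
  input  [ADDR_WIDTH:0] read_ptr,
  output                empty,
  output                full
);
  // Empty: the pointers are identical, wrap bit included.
  assign empty = (write_ptr == read_ptr);

  // Full: same address bits, but the writer has wrapped once more than
  // the reader, so the wrap bits differ.
  assign full  = (write_ptr[ADDR_WIDTH]     != read_ptr[ADDR_WIDTH]) &&
                 (write_ptr[ADDR_WIDTH-1:0] == read_ptr[ADDR_WIDTH-1:0]);
endmodule
```

For a 256-entry FIFO, `write_ptr = 9'h100` with `read_ptr = 9'h000` gives full = 1 and empty = 0, while both pointers at `9'h000` give empty = 1 and full = 0. With 8-bit pointers these two situations would be indistinguishable.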

Reset Synchronization

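Both reset signals (rst_w, rst_r) must be synchronized into their respective clock domains. Assertion can be asynchronous, but deassertion has to be aligned to the local clock, otherwise flops can leave reset on different edges or go metastable. The reset synchronizers are often built into the FIFO module itself.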
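One common way to do this is an "asynchronous assert, synchronous deassert" reset synchronizer, one instance per clock domain. The sketch below assumes an active-low reset; the module and port names are hypothetical.

```verilog
// Hypothetical helper: reset asserts immediately, releases on a clock edge.
module reset_sync (
  input  clk,
  input  arst_n,      // raw asynchronous reset, active low
  output rst_n_sync   // reset for this clock domain, active low
);
  reg [1:0] sync_ff;

  always @(posedge clk or negedge arst_n) begin
    if (!arst_n)
      sync_ff <= 2'b00;               // assert without waiting for a clock
    else
      sync_ff <= {sync_ff[0], 1'b1};  // shift in 1s: release after two edges
  end

  assign rst_n_sync = sync_ff[1];
endmodule
```

In a FIFO, the write side would use one instance clocked by `clk_w` and the read side another clocked by `clk_r`, each driving the reset of its own domain.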

7. Formal Verification

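Dual-clock FIFOs are complex enough to warrant formal verification of properties such as the single-bit-change guarantee on the Gray pointers and the correctness of the full and empty flags.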

Tools like Cadence Incisive, Mentor Questa, or open-source ProveRtl can verify these properties automatically.
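As an illustration of what such a property looks like, here is a plain-Verilog simulation checker for one of them: a Gray pointer must change by at most one bit per clock. It is a sketch with a hypothetical module name and ports, not a formal proof. In a formal flow the `$display` would typically be replaced by an assertion in the tool's property language.

```verilog
// Hypothetical checker: flags any cycle where more than one Gray bit flips.
module gray_step_check #(
  parameter WIDTH = 9
) (
  input             clk,
  input             rst_n,   // active-low reset (assumed polarity)
  input [WIDTH-1:0] gray     // Gray pointer under observation
);
  reg  [WIDTH-1:0] gray_prev;
  reg              prev_valid;

  // Bits that differ from the previous cycle.
  wire [WIDTH-1:0] diff = gray ^ gray_prev;

  // diff & (diff - 1) clears the lowest set bit, so the result is
  // nonzero only when two or more bits are set.
  wire multi_bit = |(diff & (diff - 1'b1));

  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      gray_prev  <= {WIDTH{1'b0}};
      prev_valid <= 1'b0;
    end else begin
      if (prev_valid && multi_bit)
        $display("ERROR: Gray pointer changed more than one bit at time %0t", $time);
      gray_prev  <= gray;
      prev_valid <= 1'b1;
    end
  end
endmodule
```

The checker would be attached to the registered Gray pointer in its source clock domain, where the value is stable between edges.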

8. Real-World Examples

USB Interface Example

USB Host Controller (100 MHz) writes packets into a dual-clock FIFO. System Processor (1 GHz) reads and processes packets.

Architecture:
- FIFO depth: 256 entries (8-bit pointers + 1 bit)
- Data width: 32 bits per entry (4 bytes)
- Write clock: 100 MHz (USB domain, Clock A)
- Read clock: 1 GHz (processor domain, Clock B)
- Write latency to read domain: ~2.5 ns (2.5 × 1 ns Clock B period)
- Throughput: up to 400 MB/s (100 MHz × 4 bytes, one write per Clock A cycle)

The FIFO buffers burst writes from USB while the processor reads at its own rate.

9. Summary Checklist

This completes the core CDC techniques (Days 1-5).

Next (Days 6-15): Advanced topics, testing strategies, industry tools, and production verification.