VLSI — CDC Deep Dive

CDC Pulse Problem: Your Pulse Got Lost. Here's Why — and How to Fix It.

A one-cycle pulse in the source domain looks fine in simulation. In silicon, it vanishes at the clock domain boundary — silently, with no errors. This is one of the most common CDC bugs in real chips. Three proven fixes, complete Verilog code.


Why a Pulse Disappears at a Clock Domain Boundary

A standard 2-FF synchronizer is designed for stable level signals — a signal that stays asserted for many destination clock cycles. It gives the first flip-flop a full clock period to resolve metastability, and the second flip-flop then samples the settled output.
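For reference, a minimal sketch of such a 2-FF level synchronizer is shown below. The module name `sync_2ff` and its port names are assumptions, chosen to line up with the `sync_2ff` instances that appear in the usage comments later in this article; the reset style is likewise an assumption.

```verilog
// Minimal 2-FF level synchronizer (sketch, not a signoff-ready cell).
// Assumption: async_in is driven directly by a flip-flop in the source
// domain and stays stable for several clk_dst cycles.
module sync_2ff (
    input  wire clk_dst,    // destination-domain clock
    input  wire rst_n,      // async active-low reset
    input  wire async_in,   // level arriving from the other clock domain
    output wire sync_out    // level, safe to use in the clk_dst domain
);
    // sync_ff[0] may go metastable; sync_ff[1] samples it one period later
    (* ASYNC_REG = "TRUE" *)
    reg [1:0] sync_ff;

    always @(posedge clk_dst or negedge rst_n)
        if (!rst_n) sync_ff <= 2'b00;
        else        sync_ff <= {sync_ff[0], async_in};

    assign sync_out = sync_ff[1];
endmodule
```

The output is delayed by two to three destination cycles relative to the input, which is harmless for a level but is exactly why a short pulse needs different treatment.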

A short pulse breaks this assumption. If the pulse is narrower than one destination clock period, the destination clock may never have a rising edge inside the pulse window. The flip-flop never sees the signal, and the pulse is gone. In the opposite case, where the destination clock is faster than the source, the pulse is sampled high on several consecutive destination edges, so level-sensitive logic sees multiple events where only one existed.

Pulse Too Short

Source pulse = 1 cycle = 8 ns. Destination period = 25 ns. A destination clock edge lands inside the 8 ns window only about 8/25 ≈ 32% of the time; every other pulse falls between two edges and is never sampled.

Random Alignment

The two clocks are asynchronous — their phase relationship drifts continuously. A pulse that gets captured today may be missed tomorrow under different temperature or voltage.

RTL Simulation Hides It

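Standard RTL simulation uses ideal timing: zero skew, perfect edges, and a fixed, repeatable phase relationship between the two clocks. Metastability is not modeled, and if the testbench happens to align the pulse with a destination edge, it is captured on every run. The bug only surfaces in silicon.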

No Functional Error Message

The destination logic simply never fires. No assertion fails, no X propagates. The design silently does nothing, and the bug can be traced only by observing missing behavior on the bench.

The dangerous assumption: "My pulse is 3 source cycles wide — that's enough for the destination to catch it." Not guaranteed. Whenever the destination period is longer than the pulse (for example a 4:1 clock ratio, where one destination period spans four source cycles), one destination edge can land just before the pulse rises and the next just after it falls. Whether a given pulse is caught depends entirely on the asynchronous phase alignment at runtime.
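The 8 ns / 25 ns case above can be reproduced in a simulator if the testbench deliberately uses unrelated clock periods and sweeps the pulse phase. The testbench below is a simulation-only sketch: the module name and the parameters `PULSE_CYCLES`, `GAP_CYCLES`, and `NUM_PULSES` are illustrative assumptions, and it models only missed samples, not metastability.

```verilog
`timescale 1ns/1ps
// Simulation-only sketch: a short pulse sent through a naive 2-FF
// synchronizer plus rising-edge detect, between unrelated clocks.
module tb_naive_pulse_cdc;
    parameter PULSE_CYCLES = 1;    // pulse width in clk_src cycles
    parameter GAP_CYCLES   = 12;   // idle clk_src cycles between pulses
    parameter NUM_PULSES   = 100;

    reg       clk_src   = 1'b0;
    reg       clk_dst   = 1'b0;
    reg       pulse_src = 1'b0;
    reg [1:0] sync_ff   = 2'b00;   // naive 2-FF synchronizer
    reg       sync_prev = 1'b0;    // previous synced value, for edge detect
    integer   sent = 0, seen = 0, i;

    always #4    clk_src = ~clk_src;   // 8 ns source period
    always #12.5 clk_dst = ~clk_dst;   // 25 ns destination period

    // Destination domain: sample the raw pulse and count rising edges
    always @(posedge clk_dst) begin
        sync_ff   <= {sync_ff[0], pulse_src};
        sync_prev <= sync_ff[1];
        if (sync_ff[1] && !sync_prev) seen = seen + 1;
    end

    // Source domain: the pulse spacing is not a multiple of 25 ns,
    // so each pulse starts at a different phase of clk_dst
    initial begin
        for (i = 0; i < NUM_PULSES; i = i + 1) begin
            @(posedge clk_src) pulse_src <= 1'b1;
            repeat (PULSE_CYCLES) @(posedge clk_src);
            pulse_src <= 1'b0;
            sent = sent + 1;
            repeat (GAP_CYCLES) @(posedge clk_src);
        end
        repeat (4) @(posedge clk_dst);   // let the last pulse drain
        $display("sent=%0d seen=%0d", sent, seen);
        $finish;
    end
endmodule
```

With the default one-cycle pulse, only roughly a third of the pulses should be reported as seen. Setting `PULSE_CYCLES` to 3 makes the pulse 24 ns wide, still narrower than the 25 ns destination period, so an occasional pulse can still slip between edges.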

Fix 1 — The Toggle Synchronizer (Recommended for All Pulse CDC)

The toggle synchronizer is the industry-standard solution for passing pulses across asynchronous clock domains. It works by converting a transient pulse event into a persistent level change that the destination can safely sample.

How It Works

In the source domain, a flip-flop toggles its output every time a pulse arrives. Once toggled, the signal stays at its new level — it doesn't go back. The destination 2-FF synchronizer then captures this stable level. The destination detects the level change (edge detect via XOR) to reconstruct the original one-shot pulse.

Timing Diagram — Toggle Synchronizer
Verilog — Toggle Synchronizer (Complete Module)
module toggle_sync (
    input  wire clk_src,
    input  wire clk_dst,
    input  wire rst_n,        // async active-low reset (or use separate resets)
    input  wire pulse_src,    // 1-cycle pulse in clk_src domain
    output wire pulse_dst     // 1-cycle pulse in clk_dst domain
);

    // ── Source domain: toggle FF ────────────────────────────────────
    reg toggle_s;
    always @(posedge clk_src or negedge rst_n)
        if (!rst_n) toggle_s <= 1'b0;
        else if (pulse_src) toggle_s <= ~toggle_s;

    // ── Destination domain: 3-stage sync + edge detect ─────────────
    // 3 stages: FF1 + FF2 for metastability, FF3 for edge detection
    (* ASYNC_REG = "TRUE" *)
    reg [2:0] sync_d;
    always @(posedge clk_dst or negedge rst_n)
        if (!rst_n) sync_d <= 3'b000;
        else        sync_d <= {sync_d[1:0], toggle_s};

    // Edge detect: XOR of stages 2 and 3 = 1 for exactly one cycle
    assign pulse_dst = sync_d[2] ^ sync_d[1];

endmodule
Why this can't miss a pulse: The toggle stays asserted (high or low — it's now a level, not a pulse) until the next event. The 2-FF sync chain has unlimited time to capture it. Whether the destination clock edge arrives 1 ns or 100 µs after the toggle, it will eventually sample the new level and propagate it through the chain.
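A quick way to check this claim in simulation is a self-checking testbench around the `toggle_sync` module above. The sketch below is an assumption-laden example, not a verification plan: the testbench name, clock periods (8 ns and 25 ns), and pulse spacing are illustrative, and the spacing is kept well above 3 destination cycles.

```verilog
`timescale 1ns/1ps
// Simulation-only sketch: every source pulse should produce exactly
// one pulse_dst, provided pulses are spaced far enough apart.
module tb_toggle_sync;
    reg     clk_src   = 1'b0;
    reg     clk_dst   = 1'b0;
    reg     rst_n     = 1'b0;
    reg     pulse_src = 1'b0;
    wire    pulse_dst;
    integer sent = 0, seen = 0, i;

    always #4    clk_src = ~clk_src;   // 8 ns source period
    always #12.5 clk_dst = ~clk_dst;   // 25 ns destination period

    toggle_sync dut (
        .clk_src  (clk_src),
        .clk_dst  (clk_dst),
        .rst_n    (rst_n),
        .pulse_src(pulse_src),
        .pulse_dst(pulse_dst)
    );

    // Count reconstructed pulses in the destination domain
    always @(posedge clk_dst)
        if (pulse_dst) seen = seen + 1;

    initial begin
        #30 rst_n = 1'b1;                     // release reset after both clocks have ticked
        for (i = 0; i < 100; i = i + 1) begin
            @(posedge clk_src) pulse_src <= 1'b1;
            @(posedge clk_src) pulse_src <= 1'b0;
            sent = sent + 1;
            repeat (12) @(posedge clk_src);   // 96 ns idle, more than 3 dst cycles
        end
        repeat (6) @(posedge clk_dst);        // let the last toggle propagate
        if (seen == sent) $display("PASS: sent=%0d seen=%0d", sent, seen);
        else              $display("FAIL: sent=%0d seen=%0d", sent, seen);
        $finish;
    end
endmodule
```

Shrinking the idle time to one or two source cycles is a simple way to watch the back-to-back failure mode appear.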

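Key Constraints

Pulse spacing: consecutive source pulses must be at least 3 destination clock cycles apart, otherwise two toggles can cancel before the destination sees either one.
Reset: reset the toggle FF and the destination sync chain together, or release the destination reset after the source, to avoid a phantom pulse at startup.
Placement: keep the synchronizer flip-flops adjacent (ASYNC_REG or the ASIC equivalent) so the first stage has as much time as possible to resolve metastability.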

Fix 2 — Pulse Stretcher (Simple, But Only When You Control the Ratio)

A pulse stretcher extends the source pulse to N source clock cycles, wide enough for the destination clock to reliably sample at least once. It is simpler to understand but more fragile — it only works when the source clock is significantly faster and the frequency ratio is known at design time.

Rule of thumb: After stretching, the signal must remain asserted for at least 3 destination clock cycles. This means:

STRETCH (in source cycles) ≥ 3 × ceil(T_dst / T_src) = 3 × ceil(f_src / f_dst)

Example: f_src = 400 MHz, f_dst = 100 MHz → ratio = 4 → STRETCH ≥ 12 source cycles.
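The same arithmetic can be done at elaboration time so the stretch count tracks the clock definitions. This is a small sketch with assumed names (`stretch_calc_example`, `F_SRC_MHZ`, `F_DST_MHZ`, `MARGIN`); the 2× margin is a design choice, not a derived value.

```verilog
// Sketch: derive STRETCH from the two clock frequencies using the
// rule of thumb STRETCH >= 3 * ceil(f_src / f_dst).
module stretch_calc_example;
    localparam integer F_SRC_MHZ = 400;   // assumed source clock
    localparam integer F_DST_MHZ = 100;   // assumed destination clock
    localparam integer MARGIN    = 2;     // safety multiplier (design choice)

    // Integer ceiling of f_src / f_dst: (a + b - 1) / b
    localparam integer RATIO       = (F_SRC_MHZ + F_DST_MHZ - 1) / F_DST_MHZ; // 4
    localparam integer STRETCH_MIN = 3 * RATIO;                               // 12
    localparam integer STRETCH     = MARGIN * STRETCH_MIN;                    // 24

    initial
        $display("ratio=%0d stretch_min=%0d stretch=%0d",
                 RATIO, STRETCH_MIN, STRETCH);
endmodule
```

The resulting `STRETCH` value can then be passed as the parameter of the stretcher module, so a change to either clock frequency updates the count automatically.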
Verilog — Pulse Stretcher
module pulse_stretch #(
    parameter STRETCH = 8     // extend to N source clock cycles
)(
    input  wire clk_src,
    input  wire rst_n,
    input  wire pulse_in,     // original 1-cycle pulse
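    output reg  stretched     // registered, glitch-free level (feed into 2-FF sync)
);
    localparam W = $clog2(STRETCH + 1);
    reg [W-1:0] cnt;          // cycles remaining after the current one

    // The level that crosses domains is driven straight from a flop:
    // a combinational |cnt could glitch while the counter bits change.
    always @(posedge clk_src or negedge rst_n)
        if (!rst_n) begin
            cnt       <= {W{1'b0}};
            stretched <= 1'b0;
        end else if (pulse_in) begin          // (re)start on a new pulse
            cnt       <= STRETCH - 1;
            stretched <= 1'b1;
        end else if (|cnt) begin
            cnt       <= cnt - 1'b1;
        end else begin
            stretched <= 1'b0;                // high for STRETCH cycles in total
        end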

endmodule

// ── Usage: stretched output → standard 2-FF sync → edge detect ─────
// sync_2ff u_sync (.clk_dst(clk_dst), .rst_n(rst_n),
//                  .async_in(stretched), .sync_out(sync_level));
//
// reg sync_prev;
// always @(posedge clk_dst or negedge rst_n)
//     if (!rst_n) sync_prev <= 0;
//     else        sync_prev <= sync_level;
// assign pulse_dst = sync_level && !sync_prev;  // rising edge detect
When pulse stretching fails: If the destination clock frequency changes (PLL reconfiguration, low-power mode), or if your STRETCH count is set at the minimum with no margin, stretched pulses can still be missed. The toggle synchronizer has no such frequency dependency — prefer it when in doubt.

Fix 3 — REQ/ACK Handshake (When You Need Data + the Pulse Together)

The REQ/ACK handshake is the right answer when a pulse represents a write event carrying data — for example, a register write, a command trigger, or a DMA request. It ensures both the event and the associated data are transferred atomically and reliably.

The source holds data stable on a shared bus and asserts req. The destination picks up the data once req is synchronized, asserts ack, and the source releases the bus only after receiving the synchronized ack. This four-phase handshake survives any clock frequency relationship.

Verilog — REQ/ACK Handshake (Source + Destination)
// ── SOURCE DOMAIN ──────────────────────────────────────────────────
module req_ack_src #(parameter DW = 8) (
    input  wire          clk_src, rst_n,
    input  wire          send,        // 1-cycle trigger from local logic
    input  wire [DW-1:0] data_in,
    output reg           req,
    output reg  [DW-1:0] tx_data,
    input  wire          ack_sync     // ACK synchronized back from destination
);
    always @(posedge clk_src or negedge rst_n) begin
        if (!rst_n) begin req <= 1'b0; tx_data <= {DW{1'b0}}; end
        else if (send && !req && !ack_sync) begin  // idle: accept new transfer
            tx_data <= data_in;
            req     <= 1'b1;
        end else if (ack_sync && req) begin          // ACK received: release
            req <= 1'b0;
        end
    end
endmodule

// ── DESTINATION DOMAIN ─────────────────────────────────────────────
module req_ack_dst #(parameter DW = 8) (
    input  wire          clk_dst, rst_n,
    input  wire          req_sync,    // REQ synchronized from source
    input  wire [DW-1:0] tx_data,     // stable data from source domain
    output reg           ack,
    output reg  [DW-1:0] rx_data
);
    reg req_prev;
    wire req_rise = req_sync && !req_prev;  // rising edge of synced REQ

    always @(posedge clk_dst or negedge rst_n) begin
        if (!rst_n) begin ack <= 1'b0; req_prev <= 1'b0; rx_data <= {DW{1'b0}}; end
        else begin
            req_prev <= req_sync;
            if (req_rise) begin     // new transfer detected
                rx_data <= tx_data;
                ack     <= 1'b1;
            end else if (!req_sync) begin  // REQ deasserted: release ACK
                ack <= 1'b0;
            end
        end
    end
endmodule

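// ── TOP LEVEL: wire the two domains together with 2-FF syncs ───────
// REQ crosses src → dst. ACK crosses dst → src, so its synchronizer is clocked by clk_src.
// sync_2ff u_req_sync (.clk_dst(clk_dst), .rst_n(rst_n), .async_in(req), .sync_out(req_sync));
// sync_2ff u_ack_sync (.clk_dst(clk_src), .rst_n(rst_n), .async_in(ack), .sync_out(ack_sync));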
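Throughput: One full four-phase handshake is a round trip through both synchronizers, twice. REQ takes about 2–3 destination cycles to synchronize and be acted on, ACK takes about 2–3 source cycles to return, and the release phase (REQ low, then ACK low) costs the same again before the next transfer can start. Budget roughly 4–6 cycles of each clock per transfer. For configuration registers and control commands, this latency is acceptable. For high-bandwidth streaming, use an async FIFO instead.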
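For completeness, here is one way the two handshake modules above might be wired together. It is a sketch under assumptions: the wrapper name `req_ack_top` and the `busy` output are hypothetical, the two synchronizers are written inline as plain 2-FF chains, and a single shared `rst_n` is used for brevity.

```verilog
// Sketch of a top level for req_ack_src / req_ack_dst.
// req_ack_top and busy are illustrative names, not from the original design.
module req_ack_top #(parameter DW = 8) (
    input  wire          clk_src,
    input  wire          clk_dst,
    input  wire          rst_n,
    input  wire          send,      // 1-cycle trigger, clk_src domain
    input  wire [DW-1:0] data_in,   // clk_src domain
    output wire          busy,      // clk_src domain: a transfer is in flight
    output wire [DW-1:0] rx_data    // clk_dst domain
);
    wire          req, ack;
    wire [DW-1:0] tx_data;          // held stable by the source while req is high

    // REQ: source -> destination, synchronized on clk_dst
    (* ASYNC_REG = "TRUE" *) reg [1:0] req_meta;
    always @(posedge clk_dst or negedge rst_n)
        if (!rst_n) req_meta <= 2'b00;
        else        req_meta <= {req_meta[0], req};

    // ACK: destination -> source, synchronized on clk_src
    (* ASYNC_REG = "TRUE" *) reg [1:0] ack_meta;
    always @(posedge clk_src or negedge rst_n)
        if (!rst_n) ack_meta <= 2'b00;
        else        ack_meta <= {ack_meta[0], ack};

    req_ack_src #(.DW(DW)) u_src (
        .clk_src(clk_src), .rst_n(rst_n),
        .send(send), .data_in(data_in),
        .req(req), .tx_data(tx_data),
        .ack_sync(ack_meta[1])
    );

    req_ack_dst #(.DW(DW)) u_dst (
        .clk_dst(clk_dst), .rst_n(rst_n),
        .req_sync(req_meta[1]), .tx_data(tx_data),
        .ack(ack), .rx_data(rx_data)
    );

    // The source only accepts a new send when both req and ack_sync are low
    assign busy = req | ack_meta[1];
endmodule
```

Local logic should hold off `send` while `busy` is high, since the source module ignores a trigger that arrives mid-handshake. Only `req` and `ack` pass through synchronizers; `tx_data` crosses unsynchronized and relies on being stable by the time the synchronized `req` is seen.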

Which Fix to Use — Side-by-Side Comparison

| Method | Min Source Pulse Width | Carries Data? | Works at Any Clock Ratio? | Burst Safe? | Best For |
|---|---|---|---|---|---|
| Toggle Sync | 1 src cycle | No | Yes | No (needs ≥3 dst cycles between pulses) | Single event signals: interrupts, triggers, enables |
| Pulse Stretcher | 1 src cycle, stretched to ≥3 dst cycles | No | No (ratio must be known) | No | Fast→slow crossing where ratio is fixed at design time |
| REQ/ACK | 1 src cycle | Yes | Yes | No (one transfer at a time) | Register writes, command transfers, control + data pairs |
| Async FIFO | 1 src cycle | Yes | Yes | Yes (up to FIFO depth) | Streaming data, burst events, throughput-critical paths |

5 Common CDC Pulse Mistakes That Slip Into Real Chips

01. Putting a 1-cycle pulse directly through a 2-FF synchronizer

The 2-FF sync is designed for stable levels. A short pulse may be missed entirely, captured once, or captured twice depending on clock alignment. RTL simulation usually looks correct because its clock phases are fixed and deterministic, so the same alignment repeats on every run.

Fix: Use a toggle synchronizer, or stretch the pulse first

02. Setting STRETCH too small in a pulse stretcher

Engineers often calculate the stretch for the nominal clock ratio and add no margin. If the destination clock is at the slow end of its tolerance or the source is at the fast end, the pulse can still be missed. Set STRETCH with at least 2× safety margin over the calculated minimum.

Fix: STRETCH ≥ 3 × ceil(f_src / f_dst), plus margin

03. Sending back-to-back pulses into a toggle synchronizer without a gap

If two pulses arrive before the first toggle propagates through the destination chain, the toggle register returns to its original value. The destination sees no events. This manifests as intermittently missing events under load, which is notoriously hard to reproduce.

Fix: Enforce a minimum inter-pulse gap of 3 dst clock cycles, or use an async FIFO for bursts

04. Forgetting ASYNC_REG on the toggle sync chain

Without the ASYNC_REG attribute (a Xilinx/AMD Vivado attribute; ASIC flows use their own equivalent constraints), the P&R tool may place the synchronizer flip-flops far apart with significant routing delay between them. This reduces the time available for metastability resolution, degrading MTBF by orders of magnitude.

Fix: Annotate (* ASYNC_REG = "TRUE" *) on synchronizer chains in Vivado flows, or apply the equivalent placement constraint in your ASIC flow

05. Resetting the toggle FF in only one domain during reset sequencing

If the source domain toggle FF is reset but the destination sync chain is not (or vice versa), the destination will see a spurious edge on reset release. This causes a phantom pulse event at startup, which may incorrectly trigger downstream logic before the system is ready.

Fix: Reset all FFs in both domains, or sequence resets so destination releases after source
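One common way to give each clock domain its own cleanly released reset is a reset synchronizer (asynchronous assertion, synchronous release). This is a standard technique swapped in for illustration rather than something specified above, and the names `rst_sync`, `arst_n`, and `rst_n_sync` are assumptions.

```verilog
// Sketch: reset synchronizer, one instance per clock domain.
// Reset asserts immediately (asynchronously) and releases on a clock
// edge, two cycles after arst_n goes high.
module rst_sync (
    input  wire clk,          // clock of the domain being reset
    input  wire arst_n,       // raw async active-low reset
    output wire rst_n_sync    // reset for flops in this clock domain
);
    (* ASYNC_REG = "TRUE" *)
    reg [1:0] rst_ff;

    always @(posedge clk or negedge arst_n)
        if (!arst_n) rst_ff <= 2'b00;                 // assert at once
        else         rst_ff <= {rst_ff[0], 1'b1};     // release after 2 edges

    assign rst_n_sync = rst_ff[1];
endmodule

// Usage sketch (instance and net names are illustrative):
// rst_sync u_rst_src (.clk(clk_src), .arst_n(arst_n), .rst_n_sync(rst_src_n));
// rst_sync u_rst_dst (.clk(clk_dst), .arst_n(arst_n), .rst_n_sync(rst_dst_n));
```

Because both outputs assert together, the toggle FF and the destination chain are cleared at the same time, so they agree on the toggle level when reset is released. This sketch does not by itself order the two releases; if the destination must come out of reset after the source, that sequencing has to be added on top.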

CDC Pulse FAQ — Questions Engineers Actually Search For

Q: Why does a pulse get lost when it crosses a clock domain boundary?
A pulse shorter than one destination clock period may never be sampled, because the destination clock edge arrives either before or after the pulse window. Even a 3-cycle source pulse can be invisible if the destination period is longer than the pulse and the clocks happen to be aligned badly. The 2-FF synchronizer, designed for stable levels, offers no guarantee for short pulses. RTL simulation hides this because it models ideal timing with no asynchronous phase drift.

Q: What is a toggle synchronizer and why is it robust?
A toggle synchronizer converts a one-shot pulse into a level change by toggling a flip-flop in the source domain. Because the toggle persists, the destination 2-FF sync chain has unlimited time to capture it regardless of clock frequencies. The destination XORs consecutive sync stages to detect the edge and reconstruct the pulse. It requires no knowledge of clock ratios, works at any frequency relationship, and has no minimum pulse width requirement, which makes it the most robust solution for single events.

Q: What happens if two pulses hit a toggle synchronizer back-to-back?
If two pulses arrive before the first toggle has propagated through the destination chain (less than ~3 destination clock cycles apart), the two toggles cancel each other and the destination sees no event. This is the key limitation of toggle synchronizers. For burst events or back-to-back pulses faster than 3 destination clock cycles, use an asynchronous FIFO, which can store multiple pending events.

Q: Do I need a pulse stretcher when the destination clock is faster than the source?
A pulse stretcher only makes sense when the source clock is faster than the destination. If the destination is much faster, the 1-cycle source pulse already spans several destination clock periods and will be sampled without stretching. In that case a standard 2-FF synchronizer followed by an edge detector is enough; the edge detector collapses the multi-cycle level back into a single pulse. When the clocks are close in frequency, or the ratio is not guaranteed, a toggle synchronizer is still safer because it doesn't depend on any frequency assumption.

Q: How do CDC verification tools catch pulse crossings?
Tools like SpyGlass CDC, Questa CDC, and JasperGold analyze the RTL structurally. They flag any signal that: (1) originates in one clock domain, (2) can change faster than the destination can reliably sample it, and (3) is not passed through a recognized synchronization structure (toggle FF + 2-FF chain, or handshake). The rule name under which a pulse-width violation is reported depends on the tool and its version, so check the tool's rule reference. Plain RTL simulation is unlikely to catch these, so static analysis is mandatory for signoff.

Q: What is the minimum gap between pulses for a toggle synchronizer?
The minimum gap between consecutive source pulses is 3 destination clock cycles. This is the time needed for the first toggle to propagate through FF1 (resolves metastability), FF2 (samples settled value), and FF3 (the edge detect stage), so that pulse_dst fires before the second toggle arrives and corrupts the chain. In practice, add margin: allow 4–5 destination cycles between pulses. If you cannot guarantee this gap, use an async FIFO with sufficient depth to absorb burst events.
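The spacing rule can also be policed in the source domain, where the pulses are generated. The checker below is a sketch with assumed names (`pulse_gap_check`, `MIN_GAP`, `gap_violation`); `MIN_GAP` is the required pulse-to-pulse distance expressed in source cycles, and the default of 12 corresponds to 3 destination cycles at a 4:1 clock ratio with no extra margin.

```verilog
// Sketch: flags any pulse that arrives fewer than MIN_GAP source
// cycles after the previous one. Intended for simulation or as a
// sticky debug flag; it does not delay or drop the offending pulse.
module pulse_gap_check #(
    parameter MIN_GAP = 12    // minimum pulse-to-pulse distance, in clk_src cycles
)(
    input  wire clk_src,
    input  wire rst_n,
    input  wire pulse_src,
    output reg  gap_violation // 1 for one cycle when a pulse came too soon
);
    localparam W = $clog2(MIN_GAP + 1);
    reg [W-1:0] since_last;   // cycles since the last pulse, saturating at MIN_GAP

    always @(posedge clk_src or negedge rst_n)
        if (!rst_n) begin
            since_last    <= MIN_GAP;   // nothing in flight after reset
            gap_violation <= 1'b0;
        end else begin
            gap_violation <= pulse_src && (since_last < MIN_GAP);
            if (pulse_src)
                since_last <= 1;                     // restart the distance count
            else if (since_last < MIN_GAP)
                since_last <= since_last + 1'b1;     // count up until saturated
        end
endmodule
```

In a testbench, `gap_violation` can feed an error counter or a `$display`. The same counter could be reused in hardware to hold off new pulses, although that changes the design rather than just checking it.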
