CDC / Reliability

The Forbidden Zone Inside Your Flip-Flop

Metastability is the ghost in every chip — a signal that is neither 0 nor 1, hovering at mid-rail, silently poisoning downstream logic. Here is what it actually looks like in silicon.

Physics

The Bistable Energy Landscape

A D flip-flop has two stable energy states: State '0' and State '1'. Between them sits an unstable equilibrium — the metastable point. A setup/hold violation kicks the FF to this peak. Thermal noise eventually pushes it to one side, but the time it takes is random.

[Figure: energy (vertical axis) across two stable wells, State '0' and State '1', separated by the metastable peak, an unstable equilibrium that any noise resolves]

Fig 1 — Bistable potential well. A setup/hold violation kicks the ball to the metastable peak. Thermal noise resolves it to State 0 or State 1, but resolution time is unbounded.
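To make the "unbounded resolution time" concrete, here is a minimal sketch using the usual linearised model of the latch near its peak, where a small initial imbalance v0 grows as v(t) = v0 · e^(t/τ). The τ = 30 ps value is the one this article uses elsewhere; the 0.5 V target swing and the function name `resolution_time` are assumptions for illustration only.

```python
import math

TAU_S = 30e-12  # regeneration time constant; 30 ps is the figure used in this article


def resolution_time(v0, v_final=0.5, tau=TAU_S):
    """Time for an initial imbalance v0 (volts) to grow to v_final (volts).

    Assumes the linearised model v(t) = v0 * exp(t / tau) near the
    metastable peak. v_final = 0.5 V is an illustrative assumption.
    """
    if v0 <= 0 or v_final <= 0:
        raise ValueError("voltages must be positive")
    return tau * math.log(v_final / v0)


# The closer the FF starts to the exact peak, the longer it takes to leave it.
for v0 in (1e-1, 1e-3, 1e-6, 1e-9):
    print(f"v0 = {v0:.0e} V -> {resolution_time(v0) * 1e12:.1f} ps")
```

In this model every 1000× reduction in the initial imbalance adds τ · ln(1000), about 207 ps, and v0 can be arbitrarily small. That is why the resolution time has no upper bound, only a rapidly shrinking probability of being long.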

Silicon View

What It Looks Like on an Oscilloscope

On a high-bandwidth probe at a CDC boundary, you see FF1_Q hovering near VDD/2 — the Forbidden Zone — before snapping to a rail. Every other channel stays clean. This is metastability in real silicon.

[Interactive oscilloscope: 5 channels, live simulation with a metastable-event marker]
CLK_A: fast domain clock
CLK_B: slow domain clock
DATA: async, from the CLK_A domain
FF1_Q: CDC register, goes metastable (red glow and forbidden zone)
FF2_Q: synchronizer output, always clean
The red band on FF1_Q is the Forbidden Zone (VIL to VIH, ~0.3–0.7 × VDD). CMOS has no valid logic interpretation here. Both PMOS and NMOS are partially on — causing high current, heat, and unpredictable gate behavior downstream.
Timing

The Setup and Hold Window — Where Metastability Is Born

Data must be stable for t_su before and t_h after the active clock edge. Any transition inside this window violates the FF's timing contract and can trigger metastability.

[Figure: a CLK_B edge with three DATA transitions marked safe (setup met), violating (inside the window), and safe (hold met); the forbidden window spans t_su + t_h around the clock edge, and any data change inside it means metastability]

Fig 2 — Setup and hold timing window. Data must not change within t_su before or t_h after the clock edge.
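The window check in Fig 2 can be sketched in a few lines. The function names and the numbers in the example (t_su = 0.1 ns, t_h = 0.05 ns, a 5 ns clock) are hypothetical and not taken from any datasheet.

```python
def violates_window(data_edge_ns, clk_edge_ns, t_su_ns, t_h_ns):
    """True if a data transition lands inside (clk - t_su, clk + t_h)."""
    return clk_edge_ns - t_su_ns < data_edge_ns < clk_edge_ns + t_h_ns


def violation_probability(t_su_ns, t_h_ns, clk_period_ns):
    """Chance that a single data edge with uniformly random phase
    falls inside the forbidden window of a clock cycle."""
    return (t_su_ns + t_h_ns) / clk_period_ns


# Hypothetical numbers: clock edge at t = 10 ns, t_su = 0.1 ns, t_h = 0.05 ns.
print(violates_window(9.95, 10.0, 0.1, 0.05))   # edge 50 ps before the clock
print(violates_window(9.80, 10.0, 0.1, 0.05))   # edge 200 ps before the clock
print(violation_probability(0.1, 0.05, 5.0))    # fraction of the period at risk
```

With these assumed values about 3% of randomly phased data edges land in the window, which is why an unsynchronised crossing violates timing constantly rather than rarely.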

Live Animation

5 Flip-Flop CDC Chain — Watch It Happen

FF1 sits at the CDC boundary and may go metastable. FF2 and FF3 are synchronizer stages: by the time FF2 samples FF1, FF1 has had nearly one full CLK_B period to resolve. FF4 and FF5 then see clean data with overwhelming probability.

[Interactive animation: a 5-flip-flop chain with a running CLK_B edge counter]
Async src (CLK_A domain) → FF1 (CDC Register) → FF2 (Sync Stage 1) → FF3 (Sync Stage 2) → FF4 (Pipeline Reg) → FF5 (Output Reg)
FF1 through FF5 sit in the CLK_B (destination) domain and are clocked on the rising edge of CLK_B.
Legend: Sampling (CLK_B edge) · Metastable (output undefined) · Resolved (clean logic)
Why FF2 is safe: FF2 samples FF1 one CLK_B period later. At 200 MHz (5 ns period) with τ = 30 ps, the probability of FF1 still being metastable is e^(−5000/30) ≈ 10^(−72). Strictly, FF1's clock-to-Q delay and FF2's setup time shave a little off the 5 ns, but the exponent stays enormous. This is far smaller than the chance of a cosmic ray flipping a bit, so in practice FF2 sees a clean value.
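The arithmetic is easy to check. The sketch below evaluates e^(−t/τ) in log10 form so it never underflows; the function name is illustrative, and the 200 ps overhead in the last line is an assumed figure for clock-to-Q plus setup, not a number from this article.

```python
import math


def log10_p_unresolved(t_resolve_ps, tau_ps=30.0):
    """log10 of exp(-t_resolve / tau): the chance FF1 is still metastable."""
    return -(t_resolve_ps / tau_ps) / math.log(10)


print(round(log10_p_unresolved(5_000)))    # 200 MHz, 5 ns period  -> -72
print(round(log10_p_unresolved(10_000)))   # 100 MHz, 10 ns period -> -145

# Assumed 200 ps lost to clock-to-Q and setup: the exponent barely moves.
print(round(log10_p_unresolved(5_000 - 200)))
```

Each extra τ of settling time cuts the probability by a factor of e, so the clock period matters far more than any other parameter the designer controls.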
Danger

Why Metastability Destroys Chips — Even After It Resolves

The real danger is not the mid-rail voltage itself — it is what happens to downstream logic before the FF resolves.

⚡ Fanout Disagreement

FF1_Q at 0.55 V feeds three gates. Gate A (NAND) sees it as '1'. Gate B (NOR) sees it as '0'. Same wire, opposite interpretations: an inconsistent logic state that no designer intended.
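A toy model shows how the disagreement arises: each gate type has its own input switching point, so a mid-rail voltage can sit above one and below another. The 0.50 V and 0.60 V switching points are hypothetical values chosen to reproduce the 0.55 V example, not characterised cell data.

```python
def gate_sees(v_in, v_switch):
    """Logic value a gate infers from v_in, given its switching point (volts)."""
    return 1 if v_in > v_switch else 0


# Hypothetical switching points for two gates driven by the same net.
SWITCH_POINTS_V = {"gate_a_nand": 0.50, "gate_b_nor": 0.60}


def fanout_readings(v_in):
    """What each fanout gate believes the shared net is carrying."""
    return {name: gate_sees(v_in, vs) for name, vs in SWITCH_POINTS_V.items()}


def gates_disagree(v_in):
    """True when the fanout gates do not all infer the same logic value."""
    return len(set(fanout_readings(v_in).values())) > 1


print(fanout_readings(0.55), gates_disagree(0.55))  # mid-rail: split decision
print(fanout_readings(0.95), gates_disagree(0.95))  # near the rail: consistent
```

Registering the synchronizer output before any fanout avoids this: only one flip-flop ever interprets the uncertain voltage.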

🔥 Crowbar Current

A CMOS gate with a mid-rail input has both PMOS and NMOS partially conducting simultaneously — a direct VDD-to-GND path. In chips with millions of gates this causes localised heat and electromigration failure.

🔒 FSM Lockup

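A metastable value reaching FSM state registers can resolve differently in different state bits, landing the machine in an encoding that is not a legal state (for example, a one-hot register with two bits set). If no transition leads out of that encoding, the FSM deadlocks and only a reset recovers it.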

🛰 Real Disasters

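Two incidents are often cited in this context, and both need care. The 1998 loss of attitude control on the SOHO spacecraft was attributed by the ESA/NASA investigation to errors in ground command sequences, not to unprotected CDC crossings. The Therac-25 radiation overdoses involved software race conditions, a related class of timing hazard rather than flip-flop metastability. Real metastability failures are harder to pin down: they tend to appear as rare, unreproducible faults in production hardware.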

Math

MTBF Calculator — How Safe Is Your Design?

Metastability cannot be eliminated, only made statistically improbable. Adding a second synchronizer stage can change MTBF from milliseconds to longer than the age of the universe.
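The commonly used model behind such calculators is MTBF = e^(t_r/τ) / (T_w · f_clk · f_data), where t_r is the time allowed for resolution and T_w is the width of the vulnerable window. The sketch below works in log10 to avoid overflow. τ = 30 ps and the 200 MHz clock come from this article; T_w = 100 ps, the 10 MHz data rate, and the 200 ps settling time of the unsynchronized case are assumptions for illustration.

```python
import math

AGE_OF_UNIVERSE_S = 4.35e17  # roughly 13.8 billion years


def log10_mtbf_seconds(t_resolve_s, tau_s, t_window_s, f_clk_hz, f_data_hz):
    """log10 of MTBF = exp(t_r / tau) / (T_w * f_clk * f_data), in seconds."""
    return (t_resolve_s / tau_s) / math.log(10) - math.log10(
        t_window_s * f_clk_hz * f_data_hz
    )


TAU = 30e-12    # from the article
T_W = 100e-12   # assumed vulnerable window
F_CLK = 200e6   # from the article (5 ns period)
F_DATA = 10e6   # assumed rate of asynchronous data edges

# Case 1: FF1's output is consumed after only 200 ps (assumed) of settling.
exposed = log10_mtbf_seconds(0.2e-9, TAU, T_W, F_CLK, F_DATA)
# Case 2: one more flop in the chain adds a full 5 ns period of settling.
synced = log10_mtbf_seconds(0.2e-9 + 5e-9, TAU, T_W, F_CLK, F_DATA)

print(f"exposed: MTBF ~ 10^{exposed:.1f} s")
print(f"synced:  MTBF ~ 10^{synced:.1f} s")
print("beyond the age of the universe:", synced > math.log10(AGE_OF_UNIVERSE_S))
```

Under these assumptions the exposed case fails every few milliseconds, while the extra stage pushes MTBF to around 10^70 seconds. The exponent dominates, so small changes in τ or clock period shift the result by many orders of magnitude.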

Solution

The 2-FF Synchronizer — The Industry Standard Fix

Two flip-flops in series, both clocked by the destination clock. FF1 can go metastable; FF2 samples its resolved output one clock period later. For high-reliability designs, such as aerospace parts or automotive parts developed under ISO 26262, a third stage is commonly used.

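// 2-FF synchronizer. With W > 1, each bit is synchronized independently,
// so this is only safe for unrelated single-bit signals or Gray-coded values.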
module sync_2ff #(parameter W = 1) (
  input  logic        clk_dst,
  input  logic        rst_n,
  input  logic [W-1:0] d_async,
  output logic [W-1:0] d_sync
);
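  // dont_touch keeps the tool from merging, retiming, or replicating these FFs.
  // The attribute is tool-specific; Vivado, for example, also provides ASYNC_REG.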
  (* dont_touch = "true" *) logic [W-1:0] ff1_q, ff2_q;

  always_ff @(posedge clk_dst or negedge rst_n)
    if (!rst_n) {ff1_q, ff2_q} <= '0;
    else begin
      ff1_q <= d_async; // may go metastable — intentional
      ff2_q <= ff1_q;   // samples resolved ff1 one period later
    end

  assign d_sync = ff2_q;
endmodule

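# Timing constraint in Vivado XDC syntax (-datapath_only is not standard SDC).
# Bounds the FF1→FF2 data path to 2.0 ns so the two flops are placed close together.
set_max_delay -datapath_only 2.0 \
  -from [get_cells sync_inst/ff1_q_reg*] \
  -to   [get_cells sync_inst/ff2_q_reg*]
The constraint is not optional. FF1→FF2 is an ordinary same-clock path, so by default the tool only has to meet the full clock period and is free to place the two flops far apart on the die. Every nanosecond of routing between them is subtracted from the time FF1 has to resolve, which weakens the synchronizer. The asynchronous path into FF1 needs its own timing exception as well, so that STA does not time it against an unrelated clock.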
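A behavioural sketch of the sync_2ff module above makes its one side effect visible: the output is always a clean 0 or 1, but the cycle on which a change appears can vary by one. This is a simplified model, not a circuit simulation. It assumes FF1 resolves to a random value on a violated edge and is always settled by the next edge; the function name and arguments are hypothetical.

```python
import random


def simulate_sync_2ff(samples, violations, seed=0):
    """Cycle-level model of a 2-FF synchronizer.

    samples:    value of d_async seen at each clk_dst rising edge.
    violations: set of edge indices where d_async changed inside FF1's
                setup/hold window; FF1 then resolves to a random 0 or 1
                (assumed to settle before the next edge).
    Returns the value of d_sync after each edge.
    """
    rng = random.Random(seed)
    ff1 = ff2 = 0
    d_sync = []
    for i, d in enumerate(samples):
        ff2 = ff1  # non-blocking semantics: ff2 takes ff1's pre-edge value
        ff1 = rng.randint(0, 1) if i in violations else d
        d_sync.append(ff2)
    return d_sync


# d_async rises around edge 2, right inside FF1's window.
samples = [0, 0, 1, 1, 1, 1]
out = simulate_sync_2ff(samples, violations={2})
print(out)  # clean bits only; the rise shows up at edge 3 or edge 4
```

Downstream logic therefore has to tolerate a one-cycle uncertainty in arrival time, which is harmless for a single bit and is exactly what breaks multi-bit binary buses.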
FAQ

Frequently Asked Questions

What is metastability?
Metastability occurs when a flip-flop samples its data input within the setup+hold window around the clock edge. The output enters an analog state near VDD/2, neither a valid 0 nor a valid 1. It eventually resolves, but the resolution time is random and theoretically unbounded. If downstream logic samples before resolution, different gates can interpret the mid-rail voltage as opposite values, corrupting design state.

What causes metastability?
Any data transition inside the setup+hold window around the active clock edge. This is most common at clock domain crossings (CDC), where asynchronous signals from one clock domain are sampled by another. Because the two clocks share no fixed phase relationship, the data edge will inevitably land inside the forbidden window given enough time.

How does a 2-FF synchronizer fix it?
FF1 may go metastable. The metastable state decays exponentially with time constant τ (~30 ps in 28 nm CMOS). FF2 samples FF1 one full clock period later. After 10 ns (100 MHz clock), the probability of FF1 still being metastable is e^(−10000/30) ≈ 10^(−145), far smaller than any real concern. FF2 outputs a clean resolved value with overwhelming probability.

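Can static timing analysis (STA) catch CDC problems?
No. STA verifies timing between clocks that have a known phase relationship. Asynchronous crossings are either cut from analysis by constraints (false paths or asynchronous clock groups) or timed against a phase relationship that does not exist. Either way, STA cannot check synchronizer architecture or metastability probability. You need dedicated CDC tools, such as Synopsys SpyGlass CDC, Cadence JasperGold CDC, or Siemens (formerly Mentor) Questa CDC, to verify that every async crossing has a correct synchronizer and that multi-bit signals use Gray coding or handshake protocols.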

Why can't a multi-bit bus just use one 2-FF synchronizer per bit?
A binary N-bit counter can change several bits on a single increment (0111 → 1000 flips four). Each bit is synchronized independently, so the destination domain may capture some bits in the old state and some in the new, creating intermediate values that were never valid counts. The solutions: Gray coding for counters (only 1 bit changes per step, so the sampled value is either the old count or the new one, never garbage), or a req/ack handshake or asynchronous FIFO for arbitrary data buses.
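The one-bit-per-step property of Gray code is easy to verify. The sketch below uses the standard reflected binary Gray code; the function names are illustrative.

```python
def bin_to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_bin(g):
    """Inverse of bin_to_gray: XOR together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


WIDTH = 4
for i in range(2 ** WIDTH):
    nxt = (i + 1) % (2 ** WIDTH)  # includes the wrap-around step
    print(f"{i:2d} -> {nxt:2d}: binary flips {bits_changed(i, nxt)}, "
          f"gray flips {bits_changed(bin_to_gray(i), bin_to_gray(nxt))}")
```

Because only one bit is ever in flight, a late or early capture of that bit yields either the old count or the new one. This is why asynchronous FIFO pointers are passed across domains in Gray code.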