Metastability is the ghost in every chip — a signal that is neither 0 nor 1, hovering at mid-rail, silently poisoning downstream logic. Here is what it actually looks like in silicon.
A D flip-flop has two stable energy states: State '0' and State '1'. Between them sits an unstable equilibrium — the metastable point. A setup/hold violation kicks the FF to this peak. Thermal noise eventually pushes it to one side, but the time it takes is random.
Fig 1 — Bistable potential well. A setup/hold violation kicks the ball to the metastable peak. Thermal noise resolves it to State 0 or State 1, but resolution time is unbounded.
On a high-bandwidth probe at a CDC boundary, you see FF1_Q hovering near VDD/2 — the Forbidden Zone — before snapping to a rail. Every other channel stays clean. This is metastability in real silicon.
Data must be stable for t_su before and t_h after the active clock edge. Any transition inside this window violates the FF's timing contract and can drive the FF metastable.
Fig 2 — Setup and hold timing window. Data must not change within t_su before or t_h after the clock edge.
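The window described above can be expressed as a simulation-time check. The following is a minimal sketch, assuming a simulator that honours specify-block timing checks; the module name dff_checked and the T_SU and T_H values are placeholders for illustration, not figures from any real cell library.

```systemverilog
`timescale 1ns/1ps

// Simulation-only D flip-flop with a setup/hold check.
// T_SU and T_H are assumed placeholder values.
module dff_checked (
  input  logic clk,
  input  logic d,
  output logic q
);
  always_ff @(posedge clk) q <= d;

  specify
    specparam T_SU = 0.10;  // setup time in ns (assumed)
    specparam T_H  = 0.05;  // hold time in ns (assumed)
    // Reports a timing violation if d changes within T_SU before
    // or T_H after the rising edge of clk.
    $setuphold(posedge clk, d, T_SU, T_H);
  endspecify
endmodule
```

A check like this only reports the violation. It does not model the analog mid-rail behaviour, which is why a simulation can look clean where silicon would not.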
FF1 sits at the CDC boundary and may go metastable. FF2 and FF3 are synchronizer stages: by the time FF2 samples FF1, FF1 has had one full CLK_B period to resolve. FF4 and FF5 then see clean data with overwhelming probability.
The real danger is not the mid-rail voltage itself — it is what happens to downstream logic before the FF resolves.
FF1_Q at 0.55 V feeds three gates. Gate A (NAND) sees it as '1'. Gate B (NOR) sees it as '0'. Same wire, opposite interpretations: an impossible logic state that no designer intended.
A CMOS gate with a mid-rail input has both its PMOS and NMOS partially conducting at once, which opens a direct VDD-to-GND path. In chips with millions of gates, this crowbar current causes localised heating and, if sustained or frequent, contributes to electromigration wear.
A metastable value reaching an FSM state register can push the machine into a state code that no valid state uses, for example two bits of a one-hot encoding set at once. If no exit transition is defined for that code, the FSM deadlocks and a reset is the only recovery.
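Race-type failures have real consequences. The Therac-25 radiation overdoses involved software race conditions that created states no designer intended. The 1998 loss of contact with the SOHO spacecraft is sometimes cited alongside it, but the investigation attributed that incident to errors in ground command sequences rather than to CDC crossings. Metastability is not academic: it causes failures in production silicon.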
Metastability cannot be eliminated, only made statistically improbable. Adding a second synchronizer stage can change MTBF from milliseconds to longer than the age of the universe.
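The size of that jump follows from the usual first-order model of synchronizer reliability. The symbols below are assumptions brought in for illustration, since the article does not define them: t_r is the time available for resolution, τ is the resolution time constant of the flip-flop, T_w is the width of the metastability window, and f_clk and f_data are the sampling clock frequency and the data toggle rate.

```latex
\mathrm{MTBF} = \frac{e^{\,t_r/\tau}}{T_w \, f_{clk} \, f_{data}}
\qquad\Longrightarrow\qquad
\frac{\mathrm{MTBF}(t_r + T_{clk})}{\mathrm{MTBF}(t_r)} = e^{\,T_{clk}/\tau}
```

Each added stage gives the signal roughly one more clock period T_clk to resolve, so MTBF is multiplied by e^(T_clk/τ). Assuming τ of about 30 ps and a 10 ns clock period, that factor is e^(333), or roughly 10^145, which is where the "age of the universe" comparison comes from.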
Two flip-flops in series, both clocked by the destination clock. FF1 can go metastable; FF2 samples its output one clock period later, by which time it has almost always resolved. For high-reliability designs, such as aerospace or automotive parts developed under ISO 26262, use 3 stages.
// 2-FF synchronizer: correct implementation
module sync_2ff #(parameter W = 1) (
  input  logic         clk_dst,
  input  logic         rst_n,
  input  logic [W-1:0] d_async,
  output logic [W-1:0] d_sync
);
  // dont_touch prevents the tool from merging or moving FFs
  (* dont_touch = "true" *) logic [W-1:0] ff1_q, ff2_q;

  always_ff @(posedge clk_dst or negedge rst_n)
    if (!rst_n) {ff1_q, ff2_q} <= '0;
    else begin
      ff1_q <= d_async;  // may go metastable (intentional)
      ff2_q <= ff1_q;    // samples resolved ff1 one period later
    end

  assign d_sync = ff2_q;
endmodule

# Required SDC: tells STA this is a sync path, not a logic path
set_max_delay -datapath_only 2.0 \
  -from [get_cells sync_inst/ff1_q_reg*] \
  -to   [get_cells sync_inst/ff2_q_reg*]
Without set_max_delay -datapath_only, STA may flag the FF1→FF2 path as a timing violation and let the tool optimize it, possibly placing the two flip-flops far apart on the die and defeating the synchronizer entirely.
Metastability occurs when a flip-flop samples its data input within the setup+hold window around the clock edge. The output enters an analog state near VDD/2 that is neither a valid 0 nor a valid 1. It eventually resolves, but the resolution time is random and theoretically unbounded. If downstream logic samples before resolution, different gates can interpret the mid-rail voltage as opposite values, corrupting design state.
The cause is any data transition inside the setup+hold window around the active clock edge. This is most common at clock domain crossings (CDC), where asynchronous signals from one clock domain are sampled by another. Because the two clocks share no fixed phase relationship, the data edge will inevitably land inside the forbidden window given enough time.
FF1 may go metastable. The metastable state decays exponentially with time constant τ (~30 ps in 28 nm CMOS). FF2 samples FF1 one full clock period later. After 10 ns (100 MHz clock), the probability of FF1 still being metastable is e^(−10000/30) ≈ 10^(−145) — far smaller than any real concern. FF2 outputs a clean resolved value with overwhelming probability.
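When a design calls for more resolving time than two stages give, the same structure extends to a chain of any length. The following is a sketch under assumptions, not a vetted library cell: the module name sync_nff and the STAGES parameter are hypothetical, and the dont_touch attribute is carried over from the sync_2ff example.

```systemverilog
// N-stage synchronizer sketch. STAGES is a hypothetical parameter:
// 2 behaves like sync_2ff, 3 adds one more clock period of resolving time.
module sync_nff #(
  parameter int W      = 1,
  parameter int STAGES = 3
) (
  input  logic         clk_dst,
  input  logic         rst_n,
  input  logic [W-1:0] d_async,
  output logic [W-1:0] d_sync
);
  // chain[0] is the stage that may go metastable.
  (* dont_touch = "true" *) logic [W-1:0] chain [STAGES];

  always_ff @(posedge clk_dst or negedge rst_n) begin
    if (!rst_n) begin
      for (int i = 0; i < STAGES; i++) chain[i] <= '0;
    end else begin
      chain[0] <= d_async;
      for (int i = 1; i < STAGES; i++) chain[i] <= chain[i-1];
    end
  end

  // Output is taken from the last stage of the chain.
  assign d_sync = chain[STAGES-1];
endmodule
```

Each extra stage costs one destination clock cycle of latency. With W greater than 1, every bit is synchronized independently, so the bus is only safe if at most one bit changes at a time.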
STA alone cannot catch metastability problems. STA verifies timing within a clock domain. CDC paths without proper constraints appear as false paths. STA cannot check synchronizer architecture or metastability probability. You need dedicated CDC tools, such as Synopsys SpyGlass CDC, Cadence JasperGold CDC, or Mentor Questa CDC, to verify that every async crossing has a correct synchronizer and that multi-bit signals use Gray coding or handshake protocols.
A 2-FF synchronizer on each bit is not enough for a multi-bit bus. A binary N-bit counter can change several bits on a single increment: going from 0111 to 1000 flips all four. The destination domain may sample some bits in the old state and some in the new, creating intermediate values that were never valid counts. The solutions are Gray coding for counters (only 1 bit changes per step, so the worst case is off by one, not garbage), or a handshake protocol (req/ack) or an asynchronous FIFO for arbitrary data buses.
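The Gray-coding option can be sketched as a counter that keeps a binary count internally and registers a Gray-coded copy for the crossing. The module and port names here are illustrative assumptions, not taken from a specific design.

```systemverilog
// Binary counter with a registered Gray-coded output for CDC.
// Module and port names are illustrative.
module gray_counter #(parameter int N = 4) (
  input  logic         clk,
  input  logic         rst_n,
  input  logic         inc,
  output logic [N-1:0] bin,
  output logic [N-1:0] gray
);
  logic [N-1:0] bin_next;
  assign bin_next = inc ? bin + 1'b1 : bin;

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      bin  <= '0;
      gray <= '0;
    end else begin
      bin  <= bin_next;
      // Binary-to-Gray: XOR the value with itself shifted right by one.
      // Registering the result keeps combinational glitches off the crossing.
      gray <= bin_next ^ (bin_next >> 1);
    end
  end
endmodule
```

In the destination domain, gray would pass through a per-bit synchronizer such as sync_2ff with W set to N. Because consecutive Gray codes differ in one bit, a sample taken mid-transition yields either the old count or the new one.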