DFT Day 7 — MBIST: Memory Fault Models, March Algorithms & Repair Analysis

Q: What is the difference between SAF and TF in memory testing?

A Stuck-At Fault (SAF) means a memory cell is permanently stuck at logic 0 or logic 1 — it cannot be written to any other value. A Transition Fault (TF) means the cell can hold either value but cannot make the transition from 0 to 1 (or vice versa). SAF is detected by writing the opposite value and reading it back. TF requires forcing the transition — write 0 then write 1 and read 1, then write 1 then write 0 and read 0. March C- detects both fault types.

Q: What is a coupling fault and why is March C- needed to detect it?

A coupling fault (CF) occurs when writing to one memory cell (the aggressor) disturbs the content of another cell (the victim). In CFin (inversion coupling), the victim cell inverts whenever the aggressor transitions. In CFid (idempotent coupling), the victim is forced to a fixed value. Simple write-read tests miss coupling faults because they only stress one cell at a time. March C- detects coupling faults by writing patterns in both ascending and descending address order, which ensures every aggressor-victim pair is exercised in both coupling directions.

Q: How does March C- work — walk through the steps?

March C- has 6 March elements: (1) Ascending: write 0 to all cells. (2) Ascending: read 0, then write 1. (3) Ascending: read 1, then write 0. (4) Descending: read 0, then write 1. (5) Descending: read 1, then write 0. (6) Ascending: read 0. The bidirectional address traversal (up then down) is what exposes coupling faults between pairs of cells. Total complexity is 10N memory operations for an N-cell array.

Q: What is BISR and how does it improve memory yield?

BISR stands for Built-In Self-Repair. Most SRAMs are manufactured with spare rows and columns. MBIST first identifies failing cell addresses. BISR logic then runs a repair analysis algorithm to determine the optimal mapping of spare rows/columns to cover the failing cells. The selected spare elements are activated by programming e-fuses or laser fuses on-chip, redirecting address decoding away from the faulty cells. BISR eliminates the need for an external ATE to compute repair solutions, speeding up test and enabling in-system re-test after deployment.

Q: How many operations does March C- require and what faults does it detect?

March C- requires 10N operations for an N-cell memory array (exactly 5 reads and 5 writes per cell across the 6 March elements). It detects Stuck-At Faults (SAF), Transition Faults (TF), inversion coupling faults (CFin), and idempotent coupling faults (CFid). It does NOT detect state coupling faults (CFst); for those, March LR (14N operations) is required.

Why Memories Need Special Test

By Day 6 you understood ATPG and fault coverage for logic circuits. But every modern SoC contains dozens — sometimes hundreds — of embedded memories: SRAMs for caches, register files, FIFOs, ROM for boot code. These memories are fundamentally different from random logic, and standard scan ATPG fails to test them properly.

The root issue is that ATPG treats embedded memories as black boxes. Scan can observe inputs and outputs of the memory, but the internal bitcell array (often millions of transistors in a dense 6T or 8T bitcell layout) is invisible to scan chains. The bitcells, wordlines, bitlines, sense amplifiers, and address decoders are all untested.

Memory faults also have a completely different physical nature:

Bitcell faults — a single storage transistor fails to hold charge (stuck-at or transition fault)
Coupling faults — capacitive or resistive coupling between adjacent bitcells causes one cell write to disturb a neighbor
Address decoder faults — a wordline driver selects the wrong row, causing the wrong cell to be accessed
Sense amplifier faults — the differential sense amp reads incorrectly near the threshold

The solution is MBIST — Memory Built-In Self-Test. Rather than relying on external scan chains, a dedicated test controller is synthesized adjacent to each memory instance. This MBIST controller takes over the memory's address, data, and control inputs during test mode, applies a sequence of read/write patterns called a March algorithm, and checks all responses internally. The result is a single pass/fail flag (or a fail address list for repair).

Core MBIST Principle

Embed a self-contained test engine next to each memory. During test, the engine drives all possible addresses with structured read/write sequences designed to expose specific physical fault types. No external tester access to bitcells is required.

Memory Fault Models

Just as logic DFT uses stuck-at and path-delay fault models, memory testing uses a set of fault models that map directly to physical failure mechanisms in the bitcell array:

Stuck-At Fault (SAF)

A bitcell is permanently stuck at 0 (SAF-0) or stuck at 1 (SAF-1) regardless of what is written to it. Caused by oxide breakdown, metal short to power/ground, or a failed access transistor. Detection: write the opposite value and read back — a mismatch indicates SAF.

Transition Fault (TF)

The cell can hold either value when left alone, but it cannot perform the 0→1 transition (TF-1) or the 1→0 transition (TF-0). Caused by a weak write driver or marginal bitline precharge. Detection: force the specific transition and verify the final state.
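The difference between the two models is easy to see in a small behavioral sketch. The `Cell` class and the `first_failing_step` helper below are hypothetical names introduced only for illustration; the fault labels follow the SAF-0/SAF-1 and TF-0/TF-1 notation used above.

```python
class Cell:
    """One-bit cell with an optional injected fault (illustrative model only)."""

    def __init__(self, fault=None):
        self.fault = fault  # None, "SAF-0", "SAF-1", "TF-0" or "TF-1"
        self.value = 1 if fault == "SAF-1" else 0

    def write(self, bit):
        if self.fault in ("SAF-0", "SAF-1"):
            return  # a stuck cell ignores every write
        if self.fault == "TF-1" and self.value == 0 and bit == 1:
            return  # the 0->1 transition fails
        if self.fault == "TF-0" and self.value == 1 and bit == 0:
            return  # the 1->0 transition fails
        self.value = bit

    def read(self):
        return self.value


def first_failing_step(fault):
    """Apply w0, r0, w1, r1, w0, r0 to one cell.

    Returns the index of the first read that mismatches, or None.
    """
    sequence = [("w", 0), ("r", 0), ("w", 1), ("r", 1), ("w", 0), ("r", 0)]
    cell = Cell(fault)
    for step, (op, bit) in enumerate(sequence):
        if op == "w":
            cell.write(bit)
        elif cell.read() != bit:
            return step
    return None


for fault in (None, "SAF-0", "SAF-1", "TF-0", "TF-1"):
    print(fault, first_failing_step(fault))
```

In this model SAF-1 fails on the first r0, SAF-0 and TF-1 both fail on the r1, and TF-0 only fails on the final r0, after the cell has been forced through the 1→0 transition.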

Coupling Fault (CF)

Writing to an aggressor cell disturbs the content of a victim cell through capacitive or resistive coupling. Three sub-types exist:

CFin (Inversion Coupling): the victim cell inverts whenever the aggressor cell transitions. If aggressor goes 0→1, victim flips from 0 to 1 (or 1 to 0).
CFid (Idempotent Coupling): the victim is forced to a fixed value (0 or 1) whenever the aggressor transitions, regardless of victim's prior value.
CFst (State Coupling): the victim is disturbed only when the aggressor is in a specific state (e.g., aggressor=1 forces victim=0). Hardest to detect — requires March LR.
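A minimal sketch of the three sub-types, assuming a two-cell model with one aggressor and one victim. The class name `CoupledPair` and its `forced` and `state` parameters are hypothetical, chosen only to mirror the definitions above.

```python
class CoupledPair:
    """One aggressor cell and one victim cell with an injected coupling fault."""

    def __init__(self, kind, forced=0, state=1):
        self.kind = kind        # "CFin", "CFid" or "CFst"
        self.forced = forced    # value forced onto the victim (CFid, CFst)
        self.state = state      # aggressor state that triggers CFst
        self.aggressor = 0
        self.victim = 0

    def write_aggressor(self, bit):
        transition = bit != self.aggressor
        self.aggressor = bit
        if transition and self.kind == "CFin":
            self.victim ^= 1            # victim inverts on any transition
        elif transition and self.kind == "CFid":
            self.victim = self.forced   # victim forced to a fixed value
        self._apply_state_coupling()

    def write_victim(self, bit):
        self.victim = bit
        self._apply_state_coupling()

    def _apply_state_coupling(self):
        # CFst depends on the aggressor's state, not on a transition.
        if self.kind == "CFst" and self.aggressor == self.state:
            self.victim = self.forced


cfin = CoupledPair("CFin")
cfin.write_victim(1)
cfin.write_aggressor(1)
print("CFin victim after aggressor 0->1:", cfin.victim)

cfst = CoupledPair("CFst", forced=0, state=1)
cfst.write_aggressor(1)
cfst.write_victim(1)
print("CFst victim while aggressor=1:", cfst.victim)
```

The CFst case shows why it is treated separately: the victim's write fails for as long as the aggressor sits in the triggering state, with no aggressor transition involved.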

Address Fault (AF)

The address decoder maps address A to the wrong physical row or column. Two cells may share the same address (aliasing) or a cell may be unreachable. Detected by writing unique patterns per address and verifying that reading the same address returns the written value, not a neighbor's value.

Read Destructive Fault (RDF)

A single read operation destroys the cell's content: the sense amplifier read current discharges the storage node below the switching threshold. This is rare in mature processes but does occur at aggressive technology nodes. Detection: read a cell and immediately read it again; the second read must return the same value.
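The doubled-read check can be sketched as follows. This assumes the variant described above, in which the first read still returns the correct value and the cell flips afterwards; `RdfCell` and `double_read_ok` are hypothetical names.

```python
class RdfCell:
    """Cell whose content flips after being read when `destructive` is set."""

    def __init__(self, destructive):
        self.value = 0
        self.destructive = destructive

    def write(self, bit):
        self.value = bit

    def read(self):
        sensed = self.value
        if self.destructive:
            self.value ^= 1  # stored value is lost after sensing
        return sensed


def double_read_ok(cell, bit):
    """Write `bit`, then read twice; both reads must return `bit`."""
    cell.write(bit)
    first = cell.read()
    second = cell.read()
    return first == bit and second == bit


print(double_read_ok(RdfCell(destructive=False), 1))
print(double_read_ok(RdfCell(destructive=True), 1))
```

A single read would pass on the faulty cell in this model, which is why the second back-to-back read is needed.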

Fault Type	Notation	Description	Physical Cause	March Algorithm
Stuck-At Fault	SAF-0 / SAF-1	Cell permanently 0 or 1	Metal short, failed transistor	March C-, March LR
Transition Fault	TF-0 / TF-1	Cell can't transition in one direction	Weak write driver, bitline margin	March C-, March LR
Inversion Coupling	CFin	Aggressor transition inverts victim	Capacitive coupling between adjacent cells	March C-
Idempotent Coupling	CFid	Aggressor transition forces victim to fixed value	Resistive bitline coupling	March C-
State Coupling	CFst	Victim disturbed when aggressor is in state S	Slow decay via coupling, deep subthreshold	March LR only
Address Fault	AF	Wrong physical cell is accessed	Decoder logic fault, wordline routing	March C-
Read Destructive	RDF	Read destroys cell content	Sense amp discharge, node capacitance	Doubled-read test

March Algorithm Basics

A March test is a structured sequence of memory operations that systematically exercises every cell in every address order needed to detect the target fault models. The notation is standardized and compact.

A March test consists of a sequence of March elements. Each March element has the form:

↑(op1, op2, ...) or ↓(op1, op2, ...)

Where:

↑ = ascending address order (address 0, 1, 2, ..., N-1)
↓ = descending address order (address N-1, N-2, ..., 0)
⇕ = either direction (used in some advanced algorithms)

Operations within each March element are applied to every address in the specified order before moving to the next March element:

Operation	Symbol	Meaning
Write 0	w0	Write the value 0 to the current address
Write 1	w1	Write the value 1 to the current address
Read expecting 0	r0	Read the current address; flag error if value ≠ 0
Read expecting 1	r1	Read the current address; flag error if value ≠ 1

The complexity of a March test is measured in the number of memory operations performed, expressed as a multiple of N (the number of cells). Each operation (read or write at one address) counts as 1. A complexity of 10N means 10 operations per cell, so a 1 Mbit memory (1,048,576 cells) requires about 10.5 million operations.
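Because the notation is so regular, the complexity can be computed mechanically. The sketch below parses a March string and counts operations per cell; `parse_march` and `complexity` are hypothetical helper names, and the sample string is an arbitrary three-element test used only to exercise the parser.

```python
import re


def parse_march(notation):
    """Parse '↑(r0,w1) ; ↓(r1,w0)' into [(direction, [ops, ...]), ...]."""
    elements = []
    for direction, ops in re.findall(r"([↑↓⇕])\s*\(([^)]*)\)", notation):
        elements.append((direction, [op.strip() for op in ops.split(",")]))
    return elements


def complexity(notation):
    """Operations per cell, i.e. the k in 'kN'."""
    return sum(len(ops) for _, ops in parse_march(notation))


sample = "↑(w0) ; ↑(r0,w1) ; ↓(r1,w0)"
print(parse_march(sample))
print(f"complexity = {complexity(sample)}N")

# Total operations for a 10N test on a 1 Mbit (2**20 cell) memory.
print(10 * 2**20)
```

The last line evaluates to 10,485,760 operations.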

March C− Algorithm

March C− is one of the most widely used MBIST algorithms in industry. It was derived from the original March C algorithm by removing one read-only March element that was proven redundant, reducing complexity from 11N to 10N while retaining the same fault coverage. The minus sign (−) in the name denotes this removal.

The complete March C− sequence has 6 March elements:

March C− = { ↑(w0) ; ↑(r0,w1) ; ↑(r1,w0) ; ↓(r0,w1) ; ↓(r1,w0) ; ↑(r0) }

#	March Element	Direction	Operations	Purpose
M0	↑(w0)	Ascending	Write 0	Initialize all cells to 0
M1	↑(r0, w1)	Ascending	Read 0, then Write 1	Verify the 0 written in M0 (r0 catches SAF-1); w1 forces the 0→1 transition that M2 checks
M2	↑(r1, w0)	Ascending	Read 1, then Write 0	Verify the 1 written in M1 (r1 catches SAF-0 and TF-1); w0 forces the 1→0 transition that M3 checks
M3	↓(r0, w1)	Descending	Read 0, then Write 1	r0 catches TF-0 and victims disturbed by 0→1 writes to higher-address aggressors (CFin/CFid)
M4	↓(r1, w0)	Descending	Read 1, then Write 0	r1 catches victims disturbed by 1→0 writes to higher-address aggressors (CFin/CFid); w0 returns cells to 0
M5	↑(r0)	Ascending	Read 0	Final verification all cells hold 0 after M4

Complexity: 10N. Count the operations: M0=1N, M1=2N, M2=2N, M3=2N, M4=2N, M5=1N → total 10N.
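The sequence and the operation count can be checked with a short software model. This is a sketch, not a controller implementation: `run_march` is a hypothetical helper that drives any memory through `read(addr)` and `write(addr, bit)` callbacks, and the stuck cell at address 3 is an arbitrary injected fault.

```python
MARCH_C_MINUS = [
    ("up", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("up", ["r0"]),
]


def run_march(read, write, n, elements=MARCH_C_MINUS):
    """Run a March test; return (operation_count, fails).

    Each fail is (element_index, addr, expected, observed).
    """
    ops, fails = 0, []
    for idx, (direction, seq) in enumerate(elements):
        addrs = range(n) if direction == "up" else range(n - 1, -1, -1)
        for addr in addrs:
            for op in seq:
                bit = int(op[1])
                ops += 1
                if op[0] == "w":
                    write(addr, bit)
                else:
                    observed = read(addr)
                    if observed != bit:
                        fails.append((idx, addr, bit, observed))
    return ops, fails


n = 8
good = [0] * n
good_ops, good_fails = run_march(good.__getitem__, good.__setitem__, n)
print(good_ops, good_fails)

# Inject a stuck-at-1 cell at address 3: writes to it are ignored.
stuck = [0] * n
stuck[3] = 1


def stuck_write(addr, bit):
    if addr != 3:
        stuck[addr] = bit


stuck_ops, stuck_fails = run_march(stuck.__getitem__, stuck_write, n)
print(stuck_ops, stuck_fails)
```

For the 8-cell array the run takes 80 operations, matching 10N. The stuck-at-1 cell fails every r0 (elements M1, M3 and M5) and passes every r1.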

Fault coverage of March C−:

SAF (Stuck-At Faults) — Yes
TF (Transition Faults) — Yes
CFin (Inversion Coupling) — Yes
CFid (Idempotent Coupling) — Yes
AF (Address Faults) — Yes (partial)
CFst (State Coupling) — No — need March LR

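Why does direction matter? In a March element, a disturbance is caught most reliably when the aggressor is written before the victim is read in the same pass, so the victim's own write has no chance to mask it. In ascending order this covers aggressors at lower addresses than the victim; in descending order it covers aggressors at higher addresses. Running March elements in both directions ensures every aggressor-victim pair is exercised with the aggressor on either side, giving complete coverage of CFin and CFid.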
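The effect of direction can be demonstrated with one injected fault. The sketch below assumes a single idempotent coupling fault in which a 0→1 write to the aggressor forces the victim to 1; `CFidMemory` and `first_fail` are hypothetical names, and the 4-cell array and addresses are arbitrary.

```python
MARCH_C_MINUS = [
    ("up", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("up", ["r0"]),
]


class CFidMemory:
    """Bit array where a 0->1 write on `aggressor` forces `victim` to 1."""

    def __init__(self, n, aggressor, victim):
        self.bits = [0] * n
        self.aggressor = aggressor
        self.victim = victim

    def read(self, addr):
        return self.bits[addr]

    def write(self, addr, bit):
        rising = self.bits[addr] == 0 and bit == 1
        self.bits[addr] = bit
        if addr == self.aggressor and rising:
            self.bits[self.victim] = 1


def first_fail(mem, n):
    """Return (element_index, addr) of the first read mismatch, or None."""
    for idx, (direction, seq) in enumerate(MARCH_C_MINUS):
        addrs = range(n) if direction == "up" else range(n - 1, -1, -1)
        for addr in addrs:
            for op in seq:
                bit = int(op[1])
                if op[0] == "w":
                    mem.write(addr, bit)
                elif mem.read(addr) != bit:
                    return idx, addr
    return None


aggressor_below = first_fail(CFidMemory(4, aggressor=1, victim=2), 4)
aggressor_above = first_fail(CFidMemory(4, aggressor=2, victim=1), 4)
print(aggressor_below, aggressor_above)
```

With the aggressor below the victim the fault is caught in ascending element M1. With the aggressor above the victim, the ascending elements pass and the first mismatch appears in descending element M3.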

[Figure: March C− address vs. value over time (3-cell example: Addr 0, 1, 2)]
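The same 3-cell walk-through can be produced as a text trace. The sketch below records the array contents (address 0 on the left) after each address visit on a fault-free memory; `ELEMENTS` and `trace` are hypothetical names, and `x` marks a cell that has not been initialized yet.

```python
ELEMENTS = [
    ("M0", "up", ["w0"]),
    ("M1", "up", ["r0", "w1"]),
    ("M2", "up", ["r1", "w0"]),
    ("M3", "down", ["r0", "w1"]),
    ("M4", "down", ["r1", "w0"]),
    ("M5", "up", ["r0"]),
]


def trace(n=3):
    """Return (element, addr, contents) after each address visit."""
    mem = ["x"] * n
    rows = []
    for name, direction, seq in ELEMENTS:
        addrs = range(n) if direction == "up" else range(n - 1, -1, -1)
        for addr in addrs:
            for op in seq:
                if op[0] == "w":
                    mem[addr] = op[1]
                elif mem[addr] != op[1]:
                    raise AssertionError(f"{name}: mismatch at address {addr}")
            rows.append((name, addr, "".join(mem)))
    return rows


rows = trace()
for name, addr, contents in rows:
    print(f"{name}  addr={addr}  cells={contents}")
```

In the trace, the 1s fill in from the left during M1 and from the right during M3, which is the visual signature of the two traversal directions.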

March LR Algorithm

March LR was proposed to detect state coupling faults (CFst) that March C− cannot catch. A CFst fault only manifests when the aggressor cell is in a specific state — for example, victim is disturbed only when aggressor=1. March C− misses this because it always transitions the aggressor before reading the victim. March LR uses a different traversal pattern that keeps the aggressor static while reading the victim.

March LR = { ↑(w0) ; ↓(r0,w1) ; ↑(r1,w0,r0,w1) ; ↑(r1,w0) ; ↑(r0,w1,r1,w0) ; ↑(r0) }

Complexity: 14N operations. The additional 4N overhead vs March C− buys CFst detection.

Algorithm	Complexity	SAF	TF	CFin	CFid	CFst	AF
March C−	10N	Yes	Yes	Yes	Yes	No	Yes (partial)
March LR	14N	Yes	Yes	Yes	Yes	Yes	Yes
March B	17N	Yes	Yes	Yes	Yes	Yes	Yes (full)
MATS+	5N	Yes	No	No	No	No	No

When to choose March LR over March C−: Use March LR when your technology node or memory compiler characterization data shows CFst faults — common in very dense 6T SRAM bitcells at 7 nm and below where inter-cell coupling is high. For most 28 nm and older designs, March C− is sufficient. The test time cost of 14N vs 10N is roughly 40% longer for the MBIST phase.
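The 40% figure follows directly from the operation counts. The sketch below works it out for a 1 Mbit memory, assuming one memory operation per clock cycle and a 500 MHz test clock; both are illustrative assumptions, and `mbist_time_us` is a hypothetical helper.

```python
def mbist_time_us(cells, ops_per_cell, clock_mhz):
    """Test time in microseconds, assuming one operation per clock cycle."""
    return cells * ops_per_cell / clock_mhz


cells = 2**20  # 1 Mbit
t_c_minus = mbist_time_us(cells, 10, 500)
t_lr = mbist_time_us(cells, 14, 500)
overhead = t_lr / t_c_minus - 1

print(f"March C-: {t_c_minus:.2f} us")
print(f"March LR: {t_lr:.2f} us")
print(f"overhead: {overhead:.0%}")
```

The ratio is independent of memory size and clock rate, since both cancel out.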

MBIST Controller Architecture

An MBIST controller is a dedicated RTL block synthesized alongside the memory. In test mode, it takes over the memory's address and data buses and runs the March algorithm autonomously. A typical MBIST controller consists of four sub-modules:

Address Generator: Counts addresses in ascending or descending order. For March C−, it must support both directions and know when to reverse.
Data Generator / Background Pattern Generator: Generates the write data (w0 or w1) for each March element and the expected read data (r0 or r1) for comparison.
March FSM Controller: Sequences through the March elements, controlling address direction, data pattern, read/write mode, and clock count.
Response Analyzer / Comparator: Compares memory read output against expected data. Records fail addresses for repair analysis.

[Figure: MBIST controller block diagram]

MBIST FSM States

The March FSM inside the MBIST controller steps through these states:

State	Description	Next State
IDLE	Normal functional operation; memory driven by system logic	INITIALIZE (when BIST_EN=1)
INITIALIZE	Set March element index to 0; load first direction and data	MARCH_UP
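MARCH_UP	Apply current March element in ascending address order; increment address after the element's operations complete at each address	MARCH_DOWN (if next element is ↓), MARCH_UP (if next element is ↑), or DONE (after the last element)
MARCH_DOWN	Apply current March element in descending address order; decrement address after the element's operations complete at each address	MARCH_UP (if next element is ↑), MARCH_DOWN (if next element is ↓), or DONE (after the last element)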
COMPARE	On each read cycle: compare dout vs expected; log fail address if mismatch	(embedded in MARCH_UP/DOWN)
DONE	All March elements complete; assert pass or fail output; return to IDLE	IDLE
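The state table can be modeled in software to check the transitions. The following is a behavioral sketch, not RTL: `MarchFsm` and `run` are hypothetical names, one call to `step` stands for one clock cycle, and the COMPARE activity is folded into the march states as in the table.

```python
MARCH_C_MINUS = [
    ("up", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("up", ["r0"]),
]


class MarchFsm:
    def __init__(self, elements, n):
        self.elements, self.n = elements, n
        self.state = "IDLE"

    def _load(self):
        """Set start address and march state for the current element."""
        direction, _ = self.elements[self.elem]
        self.addr = 0 if direction == "up" else self.n - 1
        self.state = "MARCH_UP" if direction == "up" else "MARCH_DOWN"

    def step(self, bist_en=False):
        """Advance one clock; return (addr, op) applied this cycle or None."""
        if self.state == "IDLE":
            if bist_en:
                self.state = "INITIALIZE"
            return None
        if self.state == "INITIALIZE":
            self.elem, self.op = 0, 0
            self._load()
            return None
        if self.state == "DONE":
            self.state = "IDLE"
            return None
        direction, seq = self.elements[self.elem]
        applied = (self.addr, seq[self.op])
        self.op += 1
        if self.op == len(seq):  # all ops done at this address
            self.op = 0
            self.addr += 1 if direction == "up" else -1
            if not 0 <= self.addr < self.n:  # element finished
                self.elem += 1
                if self.elem == len(self.elements):
                    self.state = "DONE"
                else:
                    self._load()
        return applied


def run(fsm):
    """Start the FSM and collect visited states and applied operations."""
    visited, ops = [fsm.state], []
    fsm.step(bist_en=True)
    while fsm.state != "IDLE":
        if fsm.state != visited[-1]:
            visited.append(fsm.state)
        applied = fsm.step()
        if applied:
            ops.append(applied)
    visited.append("IDLE")
    return visited, ops


visited, ops = run(MarchFsm(MARCH_C_MINUS, 4))
print(visited)
print(len(ops))
```

For March C− the model enters DONE from MARCH_UP, because the final element ↑(r0) is ascending.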

Redundancy and Repair Analysis

A key fact of modern SRAM design is that large memories are rarely defect-free at advanced nodes. To compensate, memory designers add spare rows and columns to the array. A 256-row SRAM might have 260 physical rows, of which 4 are spares. If up to 4 rows have faults, the memory can be repaired by activating spare rows and deactivating the faulty ones.

The repair flow has two phases:

Fault Detection (MBIST): Run the March algorithm. Record all failing (address, bit) pairs in a Fail Address Register (FAR). This gives a map of defective cells.
Repair Analysis: An algorithm (often run on ATE or by BISR logic on-chip) determines the minimum set of spare rows/columns to activate to cover all failing cells. This is a set-cover optimization problem.
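The repair-analysis step can be illustrated with a brute-force search. This is only a sketch for tiny fail maps: it tries every subset of failing rows up to the spare-row budget and checks whether the remaining fails fit in the spare columns. The function name `repair` and the example fail coordinates are hypothetical; production repair analyzers use far more efficient heuristics.

```python
from itertools import combinations


def repair(fails, spare_rows, spare_cols):
    """Return (rows, cols) covering all fails with the fewest spares, or None.

    `fails` is a collection of (row, col) pairs from the fail log.
    Exhaustive over row subsets, so only suitable for small examples.
    """
    fail_rows = sorted({r for r, _ in fails})
    best = None
    for k in range(min(spare_rows, len(fail_rows)) + 1):
        for rows in combinations(fail_rows, k):
            # Fails not covered by the chosen rows need a spare column each.
            cols = sorted({c for r, c in fails if r not in rows})
            if len(cols) > spare_cols:
                continue
            used = k + len(cols)
            if best is None or used < len(best[0]) + len(best[1]):
                best = (list(rows), cols)
    return best


# Row 3 has three failing bits; column 5 fails in three different rows.
fails = {(3, 0), (3, 5), (3, 9), (7, 5), (12, 5)}
solution = repair(fails, spare_rows=1, spare_cols=1)
print(solution)

# Three isolated fails cannot be covered by one spare row and one spare column.
unrepairable = repair({(0, 0), (1, 1), (2, 2)}, spare_rows=1, spare_cols=1)
print(unrepairable)
```

The first case is repairable only by replacing row 3 and column 5 together. The second returns `None`, meaning the die would be rejected.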

BISR — Built-In Self-Repair

Traditional repair analysis runs off-chip on the ATE, which is slow and requires shipping fail data to ATE software. BISR (Built-In Self-Repair) embeds the repair analysis logic on-chip alongside the MBIST controller. After MBIST completes, the BISR block:

Reads the fail address log from the FAR
Runs the repair algorithm to compute the optimal spare row/column mapping
Programs the e-fuse (electrically programmable fuse) array that controls the redundancy row/column decoders
Verifies the repair by re-running MBIST — pass/fail is checked again

E-fuse programming is a one-time operation using a high-current pulse. After programming, the SRAM's address decoder permanently redirects accesses that target faulty rows to the spare rows. The fuses are read at power-up to reconstruct the repair mapping.
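The power-up reconstruction can be pictured as building a small lookup table from the fuse contents. The fuse word layout below (one valid bit plus a faulty-row address per spare row) is an assumption made for illustration, as are the names `build_remap` and `physical_row`; real fuse formats are vendor-specific.

```python
def build_remap(fuse_words, first_spare_row):
    """Build {faulty_row: spare_row} from per-spare fuse words.

    `fuse_words` holds one (valid, faulty_row) tuple per spare row,
    in spare order. This layout is hypothetical.
    """
    remap = {}
    for index, (valid, faulty_row) in enumerate(fuse_words):
        if valid:
            remap[faulty_row] = first_spare_row + index
    return remap


def physical_row(logical_row, remap):
    """Row actually accessed after repair."""
    return remap.get(logical_row, logical_row)


# 256 addressable rows plus 4 spares at physical rows 256..259.
fuses = [(True, 17), (False, 0), (True, 200), (False, 0)]
remap = build_remap(fuses, first_spare_row=256)

print(remap)
print(physical_row(17, remap), physical_row(18, remap), physical_row(200, remap))
```

Rows without a programmed fuse entry pass through unchanged, so an unrepaired memory behaves exactly as if the spares did not exist.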

BISR Yield Impact

Adding redundancy can increase memory yield from 60% to 95%+ at advanced nodes. BISR automates this recovery on-chip, so the ATE does not have to collect fail data and compute a repair solution for each die, enabling high-volume yield improvement at low marginal cost.

MBIST Integration in SoC

A modern SoC may contain 50 to 500 separate embedded memories. Instantiating one MBIST controller per memory would consume enormous area. Modern MBIST integration uses a hierarchical approach:

IEEE P1500 Wrapper

IEEE 1500 (the Standard Testability Method for Embedded Core-based Integrated Circuits, developed under the project name P1500) defines a standard wrapper for IP blocks, including memories, so they can be accessed via a common test infrastructure. The wrapper provides a serial interface (WSI/WSO), an optional parallel interface for fast data access, and standardized control signals (ShiftWR, CaptureWR, UpdateWR). MBIST controllers connect to the wrapper to receive test enable signals and report results.

Hierarchical MBIST

A top-level MBIST controller fans out test signals to multiple sub-controllers. Memories can be tested in parallel (faster, but more power) or in series (lower power, longer time). A JTAG interface at the top of the hierarchy allows the test engineer to select which memories to test, observe fail addresses, and trigger repair.
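The parallel-versus-serial trade-off can be put into numbers with a small sketch. It assumes one memory operation per clock cycle, a 10N algorithm, and a per-memory test power figure; the function name `schedule_cost` and the example memory sizes and power values are made up for illustration.

```python
def schedule_cost(memories, ops_per_cell=10):
    """Compare fully parallel and fully serial MBIST schedules.

    `memories` is a list of (cells, test_power_mw) tuples.
    Returns {"parallel": (cycles, peak_mw), "serial": (cycles, peak_mw)}.
    """
    cycles = [cells * ops_per_cell for cells, _ in memories]
    powers = [power for _, power in memories]
    return {
        # All memories run at once: time of the largest, power adds up.
        "parallel": (max(cycles), sum(powers)),
        # One memory at a time: times add up, peak power is the largest.
        "serial": (sum(cycles), max(powers)),
    }


memories = [(1024, 5), (4096, 12), (2048, 8)]
costs = schedule_cost(memories)
print(costs)
```

Practical schedules usually sit between these two extremes, grouping memories into sessions that stay under a power budget.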

Metric	MBIST	Scan ATPG (for logic)
Memory fault coverage	SAF, TF, CF, AF — 95%+	None (memory treated as black box)
Logic fault coverage	Not applicable	99%+ with compression
Test time	10N–14N clock cycles per memory	Depends on chain length and patterns
ATE requirements	Low — only pass/fail pin needed	High — requires scan pin bandwidth
Area overhead	3–8% of memory area per MBIST controller	5–15% flip-flop replacement overhead
Repair support	Yes — integrates with BISR	No
IEEE standard	P1500, IEEE 1687 (IJTAG)	IEEE 1149.1 (JTAG access)

Frequently Asked Questions

What is the difference between SAF and TF in memory testing?

A Stuck-At Fault (SAF) means a memory cell is permanently stuck at logic 0 or 1 — no write operation can change it. A Transition Fault (TF) means the cell can hold either value but cannot perform the transition in one specific direction: either 0→1 is impossible (TF-1) or 1→0 is impossible (TF-0). SAF is detected by writing the opposite value and reading back. TF is detected by March C−'s r0,w1 and r1,w0 sequences, which force the cell through both transitions and verify the result.

What is a coupling fault and why is March C- needed to detect it?

A coupling fault (CF) occurs when writing to one cell (aggressor) disturbs the content of an adjacent cell (victim) through capacitive or resistive coupling on the bitline. Simple write-read tests miss CFs because they only stress one cell at a time and don't exercise the inter-cell coupling. March C− detects coupling faults by: (1) running operations in both ascending and descending address order, ensuring that for every aggressor-victim pair, the aggressor transitions while the victim's value is being checked; (2) the specific r0,w1 / r1,w0 sequences create the exact stimulus needed to expose an inversion or idempotent coupling fault.

How does March C- work — walk through the steps?

March C− has 6 steps for an N-cell memory. Step 1 (ascending w0): write 0 to every cell (initialize). Step 2 (ascending r0,w1): starting at address 0, read (expect 0), then write 1; proceed to the next address. The read detects SAF-1. Step 3 (ascending r1,w0): read (expect 1), then write 0. The read detects SAF-0 and TF-1. Step 4 (descending r0,w1): same operations as step 2 but from the highest address to the lowest. The read detects TF-0, and the direction change exposes CFin/CFid faults whose aggressor sits at a higher address than the victim. Step 5 (descending r1,w0): read (expect 1), then write 0, descending. Step 6 (ascending r0): verify all cells are 0. Total: 10N operations. A mismatch in any read step flags a fault at that address.

What is BISR and how does it improve memory yield?

BISR (Built-In Self-Repair) is an on-chip logic block that works alongside MBIST to automate the memory repair process. After MBIST detects failing cell addresses, BISR runs a repair analysis algorithm to determine the optimal mapping of spare rows and columns to cover the defective cells. It then programs on-chip e-fuses or laser fuses to permanently redirect the address decoder away from faulty cells to spare cells. BISR eliminates the need to send fail data to external ATE software, reduces test time, and enables in-system repair after deployment. It can raise memory yield from 60% to 95%+ at advanced nodes by recovering chips that would otherwise be discarded.

How many operations does March C- require and what faults does it detect?

March C− requires 10N operations for an N-cell memory (where one operation = one read or one write at one address). The count: M0=1N, M1=2N, M2=2N, M3=2N, M4=2N, M5=1N — total 10N. It detects: SAF (stuck-at-0 and stuck-at-1), TF (both transition directions), CFin (inversion coupling), CFid (idempotent coupling), and AF (address faults, partially). It does NOT detect CFst (state coupling faults) — for those you need March LR at 14N complexity.
