DFT Course · Day 07 of 12

MBIST — Memory Built-In Self-Test
Fault Models · March Algorithms · Repair

By EcrioniX · Updated June 2026 · ~50 min read
Topics: SAF / TF / CF · Coupling Faults · March C− · March LR · MBIST Controller · Repair Analysis · BISR · IEEE P1500

Why Memories Need Special Test

By Day 6 you understood ATPG and fault coverage for logic circuits. But every modern SoC contains dozens — sometimes hundreds — of embedded memories: SRAMs for caches, register files, FIFOs, ROM for boot code. These memories are fundamentally different from random logic, and standard scan ATPG fails to test them properly.

The root issue is that ATPG treats embedded memories as black boxes. Scan can observe inputs and outputs of the memory, but the internal bitcell array (often millions of transistors in a dense 6T or 8T bitcell layout) is invisible to scan chains. The bitcells, wordlines, bitlines, sense amplifiers, and address decoders are all untested.

Memory faults also have a completely different physical nature:

The solution is MBIST (Memory Built-In Self-Test). Rather than relying on external scan chains, a dedicated test controller is synthesized on-chip alongside the memories it tests. In test mode this MBIST controller takes over the memory's address, data, and control inputs, applies a sequence of read/write patterns called a March algorithm, and checks all responses internally. The result is a single pass/fail flag, or a list of failing addresses when the memory supports repair.

Core MBIST Principle

Embed a self-contained test engine next to each memory. During test, the engine drives all possible addresses with structured read/write sequences designed to expose specific physical fault types. No external tester access to bitcells is required.

Memory Fault Models

Just as logic DFT uses stuck-at and path-delay fault models, memory testing uses a set of fault models that map directly to physical failure mechanisms in the bitcell array:

Stuck-At Fault (SAF)

A bitcell is permanently stuck at 0 (SAF-0) or stuck at 1 (SAF-1) regardless of what is written to it. Caused by oxide breakdown, metal short to power/ground, or a failed access transistor. Detection: write the opposite value and read back — a mismatch indicates SAF.

Transition Fault (TF)

The cell can hold either value when left alone, but it cannot perform the 0→1 transition (TF-1) or the 1→0 transition (TF-0). Caused by a weak write driver or marginal bitline precharge. Detection: force the specific transition and verify the final state.

Coupling Fault (CF)

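Writing to an aggressor cell disturbs the content of a victim cell through capacitive or resistive coupling. Three sub-types exist: inversion coupling (CFin), where an aggressor transition inverts the victim; idempotent coupling (CFid), where an aggressor transition forces the victim to a fixed value; and state coupling (CFst), where the victim is disturbed while the aggressor holds a particular state.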

Address Fault (AF)

The address decoder maps address A to the wrong physical row or column. Two cells may share the same address (aliasing) or a cell may be unreachable. Detected by writing unique patterns per address and verifying that reading the same address returns the written value, not a neighbor's value.

Read Destructive Fault (RDF)

A single read operation destroys the cell's content — the sense amplifier read current discharges the storage node below the switching threshold. Rare in modern designs but exists in aggressive technology nodes. Detection: read a cell and immediately read it again — the second read must return the same value.

| Fault Type | Notation | Description | Physical Cause | March Algorithm |
|---|---|---|---|---|
| Stuck-At Fault | SAF-0 / SAF-1 | Cell permanently 0 or 1 | Metal short, failed transistor | March C−, March LR |
| Transition Fault | TF-0 / TF-1 | Cell can't transition in one direction | Weak write driver, bitline margin | March C−, March LR |
| Inversion Coupling | CFin | Aggressor transition inverts victim | Capacitive coupling between adjacent cells | March C− |
| Idempotent Coupling | CFid | Aggressor transition forces victim to fixed value | Resistive bitline coupling | March C− |
| State Coupling | CFst | Victim disturbed when aggressor is in state S | Slow decay via coupling, deep subthreshold | March LR only |
| Address Fault | AF | Wrong physical cell is accessed | Decoder logic fault, wordline routing | March C− |
| Read Destructive | RDF | Read destroys cell content | Sense amp discharge, node capacitance | Doubled-read test |
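The fault models above can be made concrete with a small behavioral model. The sketch below is illustrative only: the class `FaultyMemory` and its fields are hypothetical names, the memory is bit-oriented, and the read-destructive fault is modelled the way the text describes it (the first read returns the stored value, then the cell flips).

```python
class FaultyMemory:
    """Bit-oriented memory with optional injected faults (hypothetical model)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.stuck = {}                # addr -> value the cell is stuck at (SAF)
        self.no_transition = {}        # addr -> value the cell cannot transition to (TF)
        self.read_destructive = set()  # addrs whose content flips after a read (RDF)

    def write(self, addr, value):
        if addr in self.stuck:
            return  # SAF: writes have no effect
        if self.no_transition.get(addr) == value and self.cells[addr] != value:
            return  # TF: the blocked transition silently fails
        self.cells[addr] = value

    def read(self, addr):
        value = self.stuck.get(addr, self.cells[addr])
        if addr in self.read_destructive:
            self.cells[addr] ^= 1  # RDF: the read disturbs the stored value
        return value


mem = FaultyMemory(4)

# SAF-0 at address 1: write the opposite value and read back.
mem.stuck[1] = 0
mem.write(1, 1)
saf_detected = mem.read(1) != 1

# TF-1 at address 2: the 0 -> 1 transition is impossible.
mem.no_transition[2] = 1
mem.write(2, 0)
mem.write(2, 1)
tf_detected = mem.read(2) != 1

# RDF at address 3: two back-to-back reads disagree.
mem.read_destructive.add(3)
mem.write(3, 1)
first, second = mem.read(3), mem.read(3)
rdf_detected = first != second
```

Each detection flag ends up `True`, which mirrors the "Detection" recipe given for the corresponding fault type.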

March Algorithm Basics

A March test is a structured sequence of memory operations that systematically exercises every cell in every address order needed to detect the target fault models. The notation is standardized and compact.

A March test consists of a sequence of March elements. Each March element has the form:

↑(op1, op2, ...)    or    ↓(op1, op2, ...)

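Where ↑ means the element visits addresses in ascending order (0 to N−1) and ↓ means descending order (N−1 to 0). At each address, all operations of the element are applied in sequence before the address advances, and the element covers the whole address range before the next March element starts. The operations are: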

| Operation | Symbol | Meaning |
|---|---|---|
| Write 0 | w0 | Write the value 0 to the current address |
| Write 1 | w1 | Write the value 1 to the current address |
| Read expecting 0 | r0 | Read the current address; flag error if value ≠ 0 |
| Read expecting 1 | r1 | Read the current address; flag error if value ≠ 1 |

The complexity of a March test is measured in the number of memory operations performed, expressed as a multiple of N (the number of cells). Each operation (read or write at one address) counts as 1. A complexity of 10N means 10 operations per cell — a 1 Mbit memory requires 10 million operations.
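As a rough illustration of the notation and the complexity rule, the following sketch parses a March test written in the ↑/↓ form and counts operations per cell. The function names `parse_march`, `operations_per_cell`, and `total_operations` are hypothetical, and the three-element test string is just a toy input.

```python
import re


def parse_march(text):
    """Parse '{ ↑(w0) ; ↓(r1,w0) }' into a list of (direction, [ops])."""
    elements = []
    for direction, ops in re.findall(r"([↑↓])\(([^)]*)\)", text):
        elements.append((direction, [op.strip() for op in ops.split(",")]))
    return elements


def operations_per_cell(elements):
    """Complexity coefficient: the k in 'kN'."""
    return sum(len(ops) for _, ops in elements)


def total_operations(elements, n_cells):
    """Each read or write at one address counts as one operation."""
    return operations_per_cell(elements) * n_cells


toy = parse_march("{ ↑(w0) ; ↑(r0,w1) ; ↓(r1,w0) }")
k = operations_per_cell(toy)             # 1 + 2 + 2 = 5, i.e. a 5N test
ops = total_operations(toy, 1_000_000)   # 5 million operations for 10^6 cells
```

The same helper applied to a 10N test on one million cells gives the 10 million operations quoted above.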

March C− Algorithm

March C− is one of the most widely used MBIST algorithms in industry. It was derived from the original March C algorithm by removing one March element (a read-only pass) that was proven redundant, reducing complexity from 11N to 10N while retaining the same fault coverage. The minus sign (−) in the name denotes this removal.

The complete March C− sequence has 6 March elements:

March C− = { ↑(w0) ; ↑(r0,w1) ; ↑(r1,w0) ; ↓(r0,w1) ; ↓(r1,w0) ; ↑(r0) }
| # | March Element | Direction | Operations | Purpose |
|---|---|---|---|---|
| M0 | ↑(w0) | Ascending | Write 0 | Initialize all cells to 0 |
| M1 | ↑(r0, w1) | Ascending | Read 0, then Write 1 | Verify 0 was stored; set cells to 1. The r0 detects SAF-1 |
| M2 | ↑(r1, w0) | Ascending | Read 1, then Write 0 | Verify 1 was stored; set cells to 0. The r1 detects SAF-0 and TF-1 |
| M3 | ↓(r0, w1) | Descending | Read 0, then Write 1 | The r0 detects TF-0; descending order exposes CFin/CFid whose aggressor sits at a higher address than the victim |
| M4 | ↓(r1, w0) | Descending | Read 1, then Write 0 | Same higher-address-aggressor check with the opposite data values |
| M5 | ↑(r0) | Ascending | Read 0 | Final verification that all cells hold 0 after M4 |

Complexity: 10N. Count the operations: M0=1N, M1=2N, M2=2N, M3=2N, M4=2N, M5=1N → total 10N.
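To make the sequence and the 10N count tangible, here is a minimal software sketch that runs March C− against a toy memory. It is not an RTL implementation; `make_memory`, `run_march`, and the `stuck_at` fault-injection argument are hypothetical names introduced for illustration.

```python
MARCH_C_MINUS = [
    ("up", ["w0"]),
    ("up", ["r0", "w1"]),
    ("up", ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("up", ["r0"]),
]


def make_memory(n_cells, stuck_at=None):
    """Return (read, write) for a toy memory; stuck_at maps addr -> forced value."""
    cells = [0] * n_cells
    stuck_at = stuck_at or {}

    def write(addr, value):
        cells[addr] = stuck_at.get(addr, value)

    def read(addr):
        return stuck_at.get(addr, cells[addr])

    return read, write


def run_march(read, write, n_cells, elements=MARCH_C_MINUS):
    """Apply the March test; return (failing reads, operation count)."""
    fails, ops = [], 0
    for index, (direction, operations) in enumerate(elements):
        addrs = range(n_cells) if direction == "up" else range(n_cells - 1, -1, -1)
        for addr in addrs:
            for op in operations:
                expected = int(op[1])
                ops += 1
                if op[0] == "w":
                    write(addr, expected)
                else:
                    got = read(addr)
                    if got != expected:
                        fails.append((index, addr, expected, got))
    return fails, ops


good_fails, good_ops = run_march(*make_memory(8), 8)
bad_fails, bad_ops = run_march(*make_memory(8, stuck_at={3: 0}), 8)
```

For the fault-free 8-cell memory the run reports no failures and 80 operations (10N with N = 8). With a stuck-at-0 cell at address 3, every `r1` read of that address fails, so the fail log contains entries for elements M2 and M4 at address 3.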

Fault coverage of March C−: SAF, TF, CFin, CFid, and AF (partially). It does not cover CFst.

Why does direction matter? Coupling faults are directional — an aggressor at address k affects victim at address k+1 differently than address k-1. Running March elements in both ascending and descending order ensures every possible aggressor-victim pair is exercised in both coupling directions, giving complete coverage of CFin and CFid.
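The effect of address direction can be checked with a small experiment. The sketch below injects one idempotent coupling fault (a 0→1 write on the aggressor forces the victim to 1) with the aggressor at a higher address than the victim, then compares an ascending-only test against March C−. The ascending-only test is a made-up strawman for comparison, and `make_cfid_memory` and `failing_reads` are hypothetical helpers.

```python
def make_cfid_memory(n_cells, aggressor, victim):
    """Toy memory where a rising write on `aggressor` forces `victim` to 1."""
    cells = [0] * n_cells

    def write(addr, value):
        rising = addr == aggressor and cells[addr] == 0 and value == 1
        cells[addr] = value
        if rising:
            cells[victim] = 1  # idempotent coupling: victim forced to a fixed value

    def read(addr):
        return cells[addr]

    return read, write


def failing_reads(elements, n_cells, read, write):
    """Run a March test and return (element index, address) for each bad read."""
    fails = []
    for index, (direction, ops) in enumerate(elements):
        order = range(n_cells) if direction == "↑" else reversed(range(n_cells))
        for addr in order:
            for op in ops:
                if op[0] == "w":
                    write(addr, int(op[1]))
                elif read(addr) != int(op[1]):
                    fails.append((index, addr))
    return fails


ASCENDING_ONLY = [("↑", ["w0"]), ("↑", ["r0", "w1"]), ("↑", ["r1", "w0"]), ("↑", ["r0"])]
MARCH_C_MINUS = [("↑", ["w0"]), ("↑", ["r0", "w1"]), ("↑", ["r1", "w0"]),
                 ("↓", ["r0", "w1"]), ("↓", ["r1", "w0"]), ("↑", ["r0"])]

# Aggressor at address 2, victim at address 1 (aggressor above victim).
missed = failing_reads(ASCENDING_ONLY, 4, *make_cfid_memory(4, aggressor=2, victim=1))
caught = failing_reads(MARCH_C_MINUS, 4, *make_cfid_memory(4, aggressor=2, victim=1))
```

In the ascending-only run the victim has already been written to 1 by the time the aggressor rises, so the disturbance is masked and `missed` is empty. In March C− the descending element M3 raises the aggressor while the victim still holds 0, and the following `r0` at address 1 fails, so `caught` contains `(3, 1)`.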
[Figure: March C−, address vs. value over time (3-cell example: Addr 0, 1, 2). Columns are the elements M0 ↑(w0), M1 ↑(r0,w1), M2 ↑(r1,w0), M3 ↓(r0,w1), M4 ↓(r1,w0), M5 ↑(r0); rows are Addr 0, Addr 1, Addr 2.]

March LR Algorithm

March LR was proposed to detect state coupling faults (CFst) that March C− cannot catch. A CFst fault only manifests when the aggressor cell is in a specific state — for example, victim is disturbed only when aggressor=1. March C− misses this because it always transitions the aggressor before reading the victim. March LR uses a different traversal pattern that keeps the aggressor static while reading the victim.

March LR = { ↑(w0) ; ↓(r0,w1) ; ↑(r1,w0,r0,w1) ; ↑(r1,w0) ; ↑(r0,w1,r1,w0) ; ↑(r0) }

Complexity: 14N operations. The additional 4N overhead vs March C− buys CFst detection.

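| Algorithm | Complexity | SAF | TF | CFin | CFid | CFst | AF |
|---|---|---|---|---|---|---|---|
| March C− | 10N | Yes | Yes | Yes | Yes | No | Yes (partial) |
| March LR | 14N | Yes | Yes | Yes | Yes | Yes | Yes |
| March B | 17N | Yes | Yes | Yes | Yes | Yes | Yes (full) |
| MATS+ | 5N | Yes | No | No | No | No | Yes |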

When to choose March LR over March C−: Use March LR when your technology node or memory compiler characterization data shows CFst faults — common in very dense 6T SRAM bitcells at 7 nm and below where inter-cell coupling is high. For most 28 nm and older designs, March C− is sufficient. The test time cost of 14N vs 10N is roughly 40% longer for the MBIST phase.
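The test-time trade-off is simple arithmetic. The sketch below works it through using the complexity coefficients from the comparison table; the 1M-word memory size, the 500 MHz BIST clock, and the one-operation-per-cycle assumption are illustrative values, not figures from any particular design.

```python
# Operations per word, taken from the comparison table.
COMPLEXITY = {"MATS+": 5, "March C-": 10, "March LR": 14, "March B": 17}


def mbist_cycles(algorithm, n_words):
    """Clock cycles for one pass, assuming one memory operation per cycle."""
    return COMPLEXITY[algorithm] * n_words


def extra_cost(candidate, baseline):
    """Relative test-time increase of `candidate` over `baseline`."""
    return (COMPLEXITY[candidate] - COMPLEXITY[baseline]) / COMPLEXITY[baseline]


n_words = 1 << 20                               # assumed memory depth
lr_cycles = mbist_cycles("March LR", n_words)   # 14 * 2**20
c_cycles = mbist_cycles("March C-", n_words)    # 10 * 2**20
overhead = extra_cost("March LR", "March C-")   # (14 - 10) / 10

clock_hz = 500e6                                # assumed BIST clock
lr_time_ms = lr_cycles / clock_hz * 1e3         # roughly 29.4 ms
c_time_ms = c_cycles / clock_hz * 1e3           # roughly 21.0 ms
```

The 40% figure is independent of memory size and clock rate because both algorithms scale linearly in N.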

MBIST Controller Architecture

An MBIST controller is a dedicated RTL block synthesized alongside the memory. In test mode, it takes over the memory's address and data buses and runs the March algorithm autonomously. A typical MBIST controller consists of four sub-modules:

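[Figure: MBIST controller block diagram. The controller contains the March FSM, address generator, data/pattern generator, and response analyzer. A mux selected by BIST_EN chooses between the functional path (0) and the BIST path (1) for the addr/din inputs of the memory under test (SRAM / ROM). A comparator / fail logger checks dout against the expected data and outputs pass/fail plus fail addresses for repair.]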

MBIST FSM States

The March FSM inside the MBIST controller steps through these states:

| State | Description | Next State |
|---|---|---|
| IDLE | Normal functional operation; memory driven by system logic | INITIALIZE (when BIST_EN=1) |
| INITIALIZE | Set March element index to 0; load first direction and data | MARCH_UP |
| MARCH_UP | Apply current March element in ascending address order; increment address each cycle | MARCH_DOWN (if next element is ↓), MARCH_UP (if next element is ↑), or DONE |
| MARCH_DOWN | Apply current March element in descending address order; decrement address each cycle | MARCH_UP or DONE |
| COMPARE | On each read cycle: compare dout vs expected; log fail address if mismatch | (embedded in MARCH_UP/DOWN) |
| DONE | All March elements complete; assert pass or fail output; return to IDLE | IDLE |
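The state table can be prototyped in software before writing RTL. The following is a behavioral sketch only: the class `MarchFsm` and its fields are hypothetical, it assumes one memory operation per clock, and it omits the comparison logic (which the table describes as embedded in the march states).

```python
MARCH_C_MINUS = [("↑", ["w0"]), ("↑", ["r0", "w1"]), ("↑", ["r1", "w0"]),
                 ("↓", ["r0", "w1"]), ("↓", ["r1", "w0"]), ("↑", ["r0"])]


class MarchFsm:
    def __init__(self, elements, n_words):
        self.elements, self.n_words = elements, n_words
        self.state = "IDLE"
        self.elem = self.addr = self.op = 0

    def _enter_element(self, index):
        """Load direction and start address for element `index`, or finish."""
        if index == len(self.elements):
            self.state = "DONE"
            return
        self.elem, self.op = index, 0
        ascending = self.elements[index][0] == "↑"
        self.state = "MARCH_UP" if ascending else "MARCH_DOWN"
        self.addr = 0 if ascending else self.n_words - 1

    def step(self, bist_en=True):
        """Advance one clock; return the (addr, op) issued this cycle, or None."""
        if self.state == "IDLE":
            if bist_en:
                self.state = "INITIALIZE"
            return None
        if self.state == "INITIALIZE":
            self._enter_element(0)
            return None
        if self.state == "DONE":
            self.state = "IDLE"
            return None
        ops = self.elements[self.elem][1]
        issued = (self.addr, ops[self.op])
        self.op += 1
        if self.op == len(ops):  # all operations applied at this address
            self.op = 0
            self.addr += 1 if self.state == "MARCH_UP" else -1
            if not 0 <= self.addr < self.n_words:  # address range exhausted
                self._enter_element(self.elem + 1)
        return issued


fsm = MarchFsm(MARCH_C_MINUS, n_words=2)
trace = []
while fsm.state != "DONE":
    issued = fsm.step()
    if issued:
        trace.append(issued)
```

For a 2-word memory the trace holds 20 operations (10N), starts with `(0, 'w0'), (1, 'w0')`, and the first operation of the descending element M3 is `(1, 'r0')`.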

Redundancy and Repair Analysis

At advanced nodes, large memories rarely come out of manufacturing defect-free. To compensate, SRAM designs commonly include spare rows and columns. For example, a 256-row SRAM might have 260 physical rows, 4 of which are spares. If up to 4 rows have faults, the memory can be repaired by activating spare rows and deactivating the faulty ones.

The repair flow has two phases:

  1. Fault Detection (MBIST): Run the March algorithm. Record all failing (address, bit) pairs in a Fail Address Register (FAR). This gives a map of defective cells.
  2. Repair Analysis: An algorithm (often run on ATE or by BISR logic on-chip) determines the minimum set of spare rows/columns to activate to cover all failing cells. This is a set-cover optimization problem.
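The repair-analysis phase can be sketched as a small search problem. The version below is an exhaustive search that is only practical for a handful of spares; it is not the algorithm any particular BISR product uses, and the function name `repair_analysis` and the (row, column) fail format are assumptions made for illustration.

```python
from itertools import combinations


def repair_analysis(fails, spare_rows, spare_cols):
    """Pick rows and columns to replace so that every failing cell is covered.

    fails: iterable of (row, col) failing cells.
    Returns (rows, cols) as sets using the fewest spares, or None if unrepairable.
    """
    fails = set(fails)
    fail_rows = sorted({row for row, _ in fails})
    best = None
    for k in range(min(spare_rows, len(fail_rows)) + 1):
        for rows in combinations(fail_rows, k):
            # Failing cells not covered by a replaced row need a spare column.
            cols = {col for row, col in fails if row not in rows}
            if len(cols) > spare_cols:
                continue
            if best is None or k + len(cols) < len(best[0]) + len(best[1]):
                best = (set(rows), cols)
    return best


fail_map = [(3, 0), (3, 5), (3, 9), (7, 5), (12, 5)]
plan = repair_analysis(fail_map, spare_rows=1, spare_cols=1)
no_plan = repair_analysis(fail_map, spare_rows=1, spare_cols=0)
```

Here row 3 has three failing cells and column 5 has three, so one spare row plus one spare column covers everything: `plan` is `({3}, {5})`. Without a spare column the memory cannot be repaired and the function returns `None`.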

BISR — Built-In Self-Repair

Traditional repair analysis runs off-chip on the ATE, which is slow and requires shipping fail data to ATE software. BISR (Built-In Self-Repair) embeds the repair analysis logic on-chip alongside the MBIST controller. After MBIST completes, the BISR block:

  1. Reads the fail address log from the FAR
  2. Runs the repair algorithm to compute the optimal spare row/column mapping
  3. Programs the e-fuse (electrically programmable fuse) array that controls the redundancy row/column decoders
  4. Verifies the repair by re-running MBIST — pass/fail is checked again

E-fuse programming is a one-time operation using a high-current pulse. After programming, the redundancy decoder permanently redirects any access that targets a faulty row to its assigned spare row. The fuses are read at power-up to reconstruct the repair mapping.
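The power-up step can be pictured as rebuilding a small lookup table from the fuse contents. The sketch below assumes a hypothetical fuse layout of one (valid, faulty_row) entry per spare row, with spare i located at physical row `n_rows + i`; real fuse encodings are specific to the memory compiler.

```python
def build_row_map(fuse_entries, n_rows):
    """Rebuild the repair mapping from fuse contents read at power-up.

    fuse_entries: one (valid, faulty_row) pair per spare row (assumed layout).
    Returns {faulty logical row: physical spare row}.
    """
    remap = {}
    for spare_index, (valid, faulty_row) in enumerate(fuse_entries):
        if valid:
            remap[faulty_row] = n_rows + spare_index
    return remap


def physical_row(logical_row, remap):
    """Row actually accessed: the spare if the row was repaired, else itself."""
    return remap.get(logical_row, logical_row)


# 256 regular rows plus 4 spares (physical rows 256..259); two spares in use.
fuses = [(True, 17), (False, 0), (True, 200), (False, 0)]
remap = build_row_map(fuses, n_rows=256)
rows = [physical_row(r, remap) for r in (17, 18, 200)]
```

With these fuse values, accesses to rows 17 and 200 land on spares 256 and 258, while row 18 is untouched.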

BISR Yield Impact

Adding redundancy can increase memory yield from 60% to 95%+ at advanced nodes. BISR automates this recovery without requiring individual chip testing on expensive ATE systems, enabling high-volume yield improvement at low marginal cost.

MBIST Integration in SoC

A modern SoC may contain 50 to 500 separate embedded memories. Instantiating one MBIST controller per memory would consume enormous area. Modern MBIST integration uses a hierarchical approach:

IEEE P1500 Wrapper

IEEE 1500 (developed under the project name P1500) defines a standard wrapper for embedded cores, including memories, so that they can be accessed through a common test infrastructure. The wrapper provides a serial interface (WSI/WSO), an optional parallel interface for faster data access, and standardized control signals (ShiftWR, CaptureWR, UpdateWR). MBIST controllers connect to the wrapper to receive test enable signals and report results.

Hierarchical MBIST

A top-level MBIST controller fans out test signals to multiple sub-controllers. Memories can be tested in parallel (faster, but more power) or in series (lower power, longer time). A JTAG interface at the top of the hierarchy allows the test engineer to select which memories to test, observe fail addresses, and trigger repair.
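The parallel-versus-series trade-off can be explored with a simple scheduling sketch. The code below greedily packs memories into groups that run in parallel, subject to a power budget, and runs the groups one after another. The memory names, cycle counts, power units, and the first-fit heuristic are all illustrative assumptions rather than a description of any tool's scheduler.

```python
def schedule_groups(memories, power_budget):
    """Group memories for parallel test without exceeding the power budget.

    memories: list of (name, test_cycles, test_power).
    Longest tests are placed first; a memory whose power alone exceeds the
    budget still gets its own group.
    """
    groups = []
    for mem in sorted(memories, key=lambda m: m[1], reverse=True):
        for group in groups:
            if sum(m[2] for m in group) + mem[2] <= power_budget:
                group.append(mem)
                break
        else:
            groups.append([mem])
    return groups


def total_cycles(groups):
    """Groups run in series; each group lasts as long as its slowest memory."""
    return sum(max(m[1] for m in group) for group in groups)


memories = [("l2", 100_000, 5), ("l1d", 40_000, 3), ("l1i", 40_000, 3), ("fifo", 2_000, 1)]

budgeted = schedule_groups(memories, power_budget=8)
budgeted_names = [[m[0] for m in group] for group in budgeted]
budgeted_time = total_cycles(budgeted)

serial_time = total_cycles(schedule_groups(memories, power_budget=0))
parallel_time = total_cycles(schedule_groups(memories, power_budget=100))
```

With a budget of 8 power units the schedule is `[['l2', 'l1d'], ['l1i', 'fifo']]` at 140,000 cycles, compared with 182,000 cycles fully in series and 100,000 cycles fully in parallel.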

| Metric | MBIST | Scan ATPG (for logic) |
|---|---|---|
| Memory fault coverage | SAF, TF, CF, AF: 95%+ | None (memory treated as black box) |
| Logic fault coverage | Not applicable | 99%+ with compression |
| Test time | 10N–14N clock cycles per memory | Depends on chain length and patterns |
| ATE requirements | Low: only a pass/fail pin needed | High: requires scan pin bandwidth |
| Area overhead | 3–8% of memory area per MBIST controller | 5–15% flip-flop replacement overhead |
| Repair support | Yes (integrates with BISR) | No |
| IEEE standard | P1500, IEEE 1687 (IJTAG) | IEEE 1149.1 (JTAG access) |

Frequently Asked Questions

What is the difference between SAF and TF in memory testing?

A Stuck-At Fault (SAF) means a memory cell is permanently stuck at logic 0 or 1 — no write operation can change it. A Transition Fault (TF) means the cell can hold either value but cannot perform the transition in one specific direction: either 0→1 is impossible (TF-1) or 1→0 is impossible (TF-0). SAF is detected by writing the opposite value and reading back. TF is detected by March C−'s r0,w1 and r1,w0 sequences, which force the cell through both transitions and verify the result.

What is a coupling fault and why is March C- needed to detect it?

A coupling fault (CF) occurs when writing to one cell (aggressor) disturbs the content of another cell (victim), typically a neighbor, through capacitive or resistive coupling. Simple write-read tests miss CFs because they stress only one cell at a time and never exercise the inter-cell coupling. March C− detects coupling faults in two ways: (1) it runs operations in both ascending and descending address order, so that for every aggressor-victim pair the aggressor makes a transition after the victim was last written and before the victim is next read; (2) its r0,w1 / r1,w0 sequences apply both aggressor transitions against both victim values, which is the stimulus needed to expose an inversion or idempotent coupling fault.

How does March C- work — walk through the steps?

March C− has 6 steps for an N-cell memory. Step 1, ascending w0: write 0 to every cell (initialize). Step 2, ascending r0,w1: starting at address 0, read (expect 0), then write 1; proceed to the next address. The r0 detects SAF-1. Step 3, ascending r1,w0: read (expect 1), then write 0. The r1 detects SAF-0 and TF-1. Step 4, descending r0,w1: same as step 2 but from high to low address. The r0 detects TF-0, and the direction change exposes CFin/CFid faults. Step 5, descending r1,w0: read (expect 1), then write 0, in descending order. Step 6, ascending r0: verify all cells are 0. Total: 10N operations. A mismatch in any read step flags a fault at that address.

What is BISR and how does it improve memory yield?

BISR (Built-In Self-Repair) is an on-chip logic block that works alongside MBIST to automate the memory repair process. After MBIST detects failing cell addresses, BISR runs a repair analysis algorithm to determine a mapping of spare rows and columns that covers the defective cells. It then programs on-chip e-fuses so that the redundancy decoder permanently redirects accesses away from faulty cells to spare cells. BISR eliminates the need to send fail data to external ATE software, reduces test time, and can enable in-system repair after deployment. It can raise memory yield from 60% to 95%+ at advanced nodes by recovering chips that would otherwise be discarded.

How many operations does March C- require and what faults does it detect?

March C− requires 10N operations for an N-cell memory (where one operation = one read or one write at one address). The count: M0=1N, M1=2N, M2=2N, M3=2N, M4=2N, M5=1N — total 10N. It detects: SAF (stuck-at-0 and stuck-at-1), TF (both transition directions), CFin (inversion coupling), CFid (idempotent coupling), and AF (address faults, partially). It does NOT detect CFst (state coupling faults) — for those you need March LR at 14N complexity.
