What is ATPG?
In Days 1–3 we established what faults look like (stuck-at, transition, bridging), how scan chains make every flip-flop controllable and observable, and how EDT compression reduces the number of test cycles needed. Now comes the central question: how do we compute the actual test patterns?
Automatic Test Pattern Generation (ATPG) is the process of algorithmically computing a set of binary input patterns such that, when applied to the chip, each pattern detects one or more faults from the fault list. The ATPG tool's goal is to maximize fault coverage (the fraction of modeled faults detected) while minimizing pattern count (because more patterns = more test time = more cost).
ATPG works at the gate level on the synthesized netlist. For each undetected fault in the fault list, the algorithm searches for a primary input assignment (and, via scan chains, a flip-flop state assignment) that:
- Activates the fault — drives the faulty net to the opposite of its stuck value
- Propagates the resulting error — creates a path from the fault site to an observable output (primary output or scan chain output)
- Justifies the assignment — ensures all required gate input values are consistently achievable from primary inputs
A chip with 100,000 gates has millions of possible fault sites. Computing test patterns manually is impossible. ATPG tools (Tessent, TetraMAX, FastScan) run on the gate-level netlist and generate 99%+ coverage in minutes to hours, depending on design complexity. The resulting patterns are loaded onto an ATE (Automatic Test Equipment) which applies them at-speed to every manufactured die.
D-Notation — The Language of ATPG
Roth's 1966 paper introduced the D-calculus, a five-valued logic system that allows ATPG algorithms to reason simultaneously about fault-free and faulty circuit behavior. The five values are:
| Symbol | Fault-Free Value | Faulty Circuit Value | Meaning |
|---|---|---|---|
| 0 | 0 | 0 | Normal logic 0 (same in both circuits) |
| 1 | 1 | 1 | Normal logic 1 (same in both circuits) |
| X | unassigned | unassigned | Don't-care / not yet assigned |
| D | 1 | 0 | Error: fault-free=1, faulty=0 — detectable difference |
| D̄ | 0 | 1 | Error: fault-free=0, faulty=1 — detectable difference |
D and D-bar (D with a bar) represent the error signal — a point in the circuit where the fault-free and faulty circuits produce different values. The ATPG goal is to propagate D or D-bar to a primary output. Once D reaches a primary output, the ATE can observe the difference: the good chip outputs 1 while the bad chip outputs 0 (for D), or vice versa (for D-bar).
D-Calculus Through an AND Gate
Consider a 2-input AND gate. If input A has a stuck-at-0 (SA0) fault, and we want to detect it:
| Input A (fault site) | Input B | AND output (fault-free) | AND output (faulty, A=SA0) | D-notation output |
|---|---|---|---|---|
| D (1→0 due to SA0) | 1 | 1 | 0 | D — fault detected! |
| D (1→0 due to SA0) | 0 | 0 | 0 | 0 — error masked by B=0 |
| 0 | X | 0 | 0 | 0 — fault not activated |
The key rule for propagating D through an AND gate: all non-D inputs must be 1 (the controlling-value complement for AND). Setting B=1 allows the D on input A to pass through to the output as D. This is called sensitizing the gate to the fault.
The D-Algorithm
The D-Algorithm (Roth, 1966) was the first systematic ATPG procedure. It operates on the gate-level circuit graph and proceeds in three phases:
Worked Example — 3-Gate Circuit
Consider this simple circuit: A AND B → G1 → G1 OR C → G2 → output Z. Suppose we want to detect a SA0 fault on the output of G1 (the AND gate).
Step-by-step trace: (1) Activate: set A=1, B=1 so G1's fault-free output = 1, but SA0 forces it to 0. D is placed on G1's output. (2) Propagate: G1.out → G2.input. For the OR gate, we need C=0 (the non-controlling value for OR) so D passes through. G2 output = D. (3) Justify: A=1 (PI), B=1 (PI), C=0 (PI) — all directly set. No conflict. Test vector: {A=1, B=1, C=0}. Good chip output Z=1; faulty chip Z=0. Fault detected.
PODEM — Path-Oriented Decision Making
The D-algorithm was groundbreaking but had a critical scalability problem: it backtracks at internal circuit nodes. When D-propagation fails at a gate deep in the circuit, the algorithm must try all possible gate input assignments — an exponential search in the worst case. For industrial circuits with hundreds of thousands of gates, this backtracking explosion made D-algorithm impractical.
PODEM (Prabhu Goel, 1981) solved this with a key insight: all backtracking decisions should be made at primary inputs, not internal nodes. This transforms the search space from exponential in circuit depth to linear in number of primary inputs.
PODEM Algorithm Structure
PODEM's backtrack function operates only on primary inputs. This means the search tree has depth equal to the number of primary inputs (say 100), not circuit depth (potentially 1000+ gates). The number of nodes in the search tree is at most 2^100 in theory, but PODEM's objective function is effective enough that in practice it finds patterns in very few backtrack steps. Industrial benchmarks show 10–100x speedup over D-algorithm on real circuits.
FAN — Fanout-Oriented ATPG
FAN (Fanout-oriented ATPG, Fujiwara & Shimono, 1983) is the next major improvement after PODEM, adding three key ideas that make it significantly faster for circuits with high fanout:
1. Multiple-Backtrace Procedure
Instead of making one PI assignment at a time (PODEM), FAN propagates requirements through the circuit simultaneously from multiple gates, resolving which PI to set without committing prematurely. This avoids unnecessary backtracks caused by choosing a PI that doesn't help the current goal.
2. Head Lines
FAN identifies head lines — the first line on each unique sensitization path from the fault site to primary outputs. Head lines act as decision points. FAN assigns values at head lines first, then works backward to primary inputs. This focuses the search on the paths most relevant to fault propagation.
3. Unique Sensitization
If there is only one possible path from D to a primary output (unique sensitization), FAN commits to that path immediately without exploration — eliminating backtracking for that branch entirely. This is particularly powerful in deep pipeline designs where most paths are unique.
| Algorithm | Year | Backtrack Point | Key Innovation | Relative Speed |
|---|---|---|---|---|
| D-Algorithm | 1966 | Internal nodes | D-calculus, first systematic ATPG | 1x (baseline) |
| PODEM | 1981 | Primary inputs only | PI-level backtracking, objective function | 10–100x faster |
| FAN | 1983 | Head lines + PIs | Multiple backtrace, unique sensitization | 100–1000x faster |
Modern commercial ATPG tools (Tessent, TetraMAX) use variants of FAN with additional optimizations: dominance-based pruning, redundancy identification during ATPG, and concurrent fault simulation to detect multiple faults per pattern.
Fault Simulation
Fault simulation is the process of simulating the circuit with each fault injected and determining whether a given test pattern detects that fault. It is distinct from ATPG: ATPG generates patterns targeting specific faults; fault simulation evaluates a pattern set against the entire fault list.
Parallel Fault Simulation
Simulating one fault at a time would be impractical — a circuit with 100,000 nets has 200,000 stuck-at fault sites. Parallel fault simulation exploits the bit-parallel nature of computer arithmetic: simulate N faults simultaneously by using N-bit words. Each bit position represents one fault. For each gate operation, the simulator computes the N-bit word representing the output for all N faults simultaneously using one 64-bit (or 256-bit SIMD) operation.
Fault Dropping
Fault dropping is the key optimization that makes fault simulation feasible. Once a fault is detected by any pattern, it is removed from the active fault list. Subsequent patterns only simulate against the remaining undetected faults. Since early patterns tend to detect many faults, the active fault list shrinks rapidly. A typical simulation run detects 80% of faults with the first 100 patterns and spends most of its time on the last 1% of hard-to-detect faults.
These are different concepts. Fault dropping removes a fault from simulation once detected — it's a runtime optimization. Fault collapsing reduces the initial fault list by identifying faults that are equivalent (have identical detection conditions) or dominated (detecting one guarantees detecting the other). Fault collapsing is done once before ATPG; fault dropping happens dynamically during simulation.
Untestable Faults
Not all faults in the fault list can be detected by any test pattern. Untestable faults (also called ATPG-redundant or structurally redundant faults) are those for which no test pattern exists because the circuit's logic structure makes it impossible to simultaneously:
- Activate the fault (drive the faulty net to the opposite value), AND
- Propagate the resulting error to any primary output
This often happens in circuits with intentional redundancy — logic added to prevent hazards or glitches. For example, a glitch-suppression AND gate that duplicates another logic path creates a situation where the fault on the duplicate path can never be observed independently. Intentional redundancy must be documented in the DFT sign-off report.
| Fault Classification | Definition | In Coverage Denominator? | Action |
|---|---|---|---|
| Detected | At least one pattern detects it | Yes (numerator) | Count toward coverage |
| Undetected | No pattern found yet (time limit, tool limit) | Yes (denominator) | Reduce coverage; investigate |
| ATPG-Redundant | Provably untestable — no pattern possible | Excluded (or both) | Document; review for unintentional redundancy |
| Aborted | ATPG gave up within time/backtrack limit | Yes (denominator) | Same as undetected; tune ATPG effort |
The distinction matters for coverage calculation. Some tools include ATPG-redundant faults in the denominator and separately report them; others exclude them. Industry comparisons between coverage numbers must account for this — "99% coverage (tool A)" may not equal "99% coverage (tool B)" if one excludes redundant faults and the other doesn't.
Tessent / TetraMAX ATPG Flow
A production ATPG run in Siemens Tessent (or Synopsys TetraMAX) follows these steps. The output is a set of test patterns in STIL (Standard Test Interface Language) or WGL format, ready to be loaded onto the ATE.
# ── Step 1: Setup Tessent ATPG context ────────────────────── set_context patterns -scan # ── Step 2: Read scan-inserted netlist (from Day 3 flow) ──── read_verilog top_chip_scan.v read_cell_library {tsmc7nm_tt.lib tsmc7nm_scan.lib} read_sdc top_chip_scan.sdc set_current_design top_chip # ── Step 3: Set ATPG constraints ──────────────────────────── add_clock 0 clk -period 2.0 add_scan_enable scan_en set_fault_type stuck ;# or: transition, path_delay # ── Step 4: Build the fault list ──────────────────────────── create_patterns -auto ;# auto generates + simulates set_atpg_limits -backtrack_limit 50 set_atpg_limits -coverage_target 99.0 # ── Step 5: Run ATPG ──────────────────────────────────────── run_atpg -auto # ── Step 6: Report coverage ───────────────────────────────── report_statistics report_fault_coverage -class {detected undetected redundant} # ── Step 7: Write patterns in STIL format for ATE ─────────── write_patterns top_chip_atpg.stil -format stil write_patterns top_chip_atpg.wgl -format wgl # ── Transition fault run (at-speed) ───────────────────────── set_fault_type transition run_atpg -auto write_patterns top_chip_trans.stil -format stil
The report_statistics output will show the number of detected, undetected, and ATPG-redundant faults, along with pattern count and CPU time. A well-tuned ATPG run on a 5M-gate block typically generates 3,000–8,000 stuck-at patterns achieving 99.2–99.8% coverage in 1–4 hours of compute time.
Coverage vs Pattern Count — The Last-Mile Problem
Fault coverage doesn't grow linearly with pattern count. Early patterns detect many faults (easy targets). As coverage increases, the remaining undetected faults are progressively harder — requiring more specific input conditions, longer propagation paths, or more backtracking. This creates the last-mile problem: the final 1% of coverage costs as many patterns as the first 98%.
| Fault Coverage Target | Approx. Pattern Count | Incremental Patterns Needed | Marginal DPPM Reduction |
|---|---|---|---|
| 90% | ~200 patterns | 200 | High |
| 95% | ~500 patterns | 300 | High |
| 99% | ~1,500 patterns | 1,000 | Moderate |
| 99.5% | ~3,000 patterns | 1,500 | Low |
| 99.9% | ~15,000 patterns | 12,000 | Very Low |
| 99.99% | ~80,000+ patterns | 65,000 | Diminishing |
In practice, DFT engineers set a coverage target (e.g., 99.0% stuck-at) and accept the residual DPPM risk associated with the undetected 1%. The ATPG tool's -coverage_target parameter stops generation once the target is met, avoiding unnecessary pattern explosion. Additional techniques for hard-to-detect faults include: increasing the backtrack limit, running a dedicated "top-up" ATPG pass with higher effort, or using dynamic compaction to squeeze multiple fault detections into each pattern.
A single test pattern often detects dozens or hundreds of faults simultaneously. Dynamic compaction fills the "don't care" (X) bits of a generated pattern — bits not needed for the target fault — with assignments that detect additional faults. This is why the pattern count in the table above is far less than the fault count. Without compaction, you would need one pattern per fault.