DFT Course · Day 01 of 12

Introduction to DFT
Why Test? Fault Models & Coverage

By EcrioniX · Updated June 2026 · ~40 min read
Fault vs Defect · Stuck-at-0/1 · Transition Fault · Bridging Fault · Path Delay · Controllability · Observability · Fault Coverage

Why Do Chips Need Testing?

A chip that is correctly designed can still be incorrectly manufactured. Semiconductor fabrication involves hundreds of process steps — photolithography, deposition, etching, chemical-mechanical polishing, implantation — each with its own variability. A particle of contamination landing on a wafer, a lithography focus error, a metal via that didn't fill completely — any of these can create a defect that turns a functional design into a broken chip.

The numbers are sobering. Even at mature process nodes with well-controlled fabs, typical die yields are 60–90%. At leading-edge nodes (3nm, 2nm), yields for large dies start significantly lower and improve over time. Every chip that ships with a manufacturing defect is a potential field return, a safety issue, or a recall.

Testing is the last gate before a chip reaches the customer. The Automatic Test Equipment (ATE) — a machine costing $1–5 million — applies thousands of test vectors to every chip, measures the outputs, and decides: pass or fail. The set of test vectors is generated by ATPG (Automatic Test Pattern Generation) tools, and the probability that a bad chip slips through is determined by the fault coverage of those vectors.

The Core DFT Problem

Modern chips have billions of transistors. You cannot apply enough random test vectors to thoroughly exercise all of them in a reasonable time. DFT solves this by adding special test structures (scan chains, BIST, JTAG) that make the chip's internal state directly accessible — converting an intractable test problem into a manageable one.

Defect vs Fault vs Error vs Failure

DFT engineers use these four terms precisely. Confusing them is a common interview mistake.

| Term | Definition | Layer | Example |
| --- | --- | --- | --- |
| Defect | Physical imperfection from manufacturing | Physical | Metal particle bridging two wires; unfilled via |
| Fault | Logical abstraction of a defect's circuit effect | Logical | Net stuck-at-1 due to short to VDD |
| Error | Incorrect logic value produced at a node | Logical | Output of gate = 1 when it should be 0 |
| Failure | Observable incorrect system behaviour | System | Chip produces wrong output; system crashes |

In the fault model, a defect is represented as a fault. A fault causes an error only when the faulty net is exercised (driven to the opposite of its stuck value). An error causes a failure only when it propagates to an observable output. A defect can therefore be present without causing a field failure, if it sits on a rarely used path or if the error is masked by other logic. This is also why coverage measured against all modeled faults never reaches 100%: some faults are structurally impossible to observe (untestable faults).

Defect → Fault → Error → Failure chain
Defect (physical layer) → models as → Fault (logical abstraction) → activates → Error (wrong logic value) → propagates to → Failure (observable at output)

Fault Models

Because directly modeling billions of physical defect sites is intractable, DFT uses fault models — simplified logical abstractions that correlate well with real manufacturing defects. Different fault models detect different classes of physical defects.

Model 1
Stuck-at Fault (SAF)
A net is permanently fixed at logic 0 (SA0) or logic 1 (SA1), regardless of what the driving gate outputs. SA1 models a short to VDD; SA0 models a short to GND. An open metal line leaves the net floating, and it can behave as either stuck value depending on where the floating voltage settles. This is the most widely used fault model: production test typically targets >99% SA coverage.
net A stuck-at-1: even if driver outputs 0, A reads 1
Model 2
Transition Fault (TF)
A net fails to transition within the required time — slow-to-rise (STR) or slow-to-fall (STF). Models resistive defects: a partially-formed via with high resistance allows a 0→1 transition but too slowly, causing setup-time violations at speed. Caught by at-speed testing. Target: >95% transition fault coverage.
net A STR: 0→1 transition takes 2× longer than spec
Model 3
Bridging Fault (BF)
Two or more nets are unintentionally shorted together. The result depends on the logic driving both nets — could cause wired-AND or wired-OR behaviour (or intermediate voltage in CMOS). Bridging faults model metal shorts between adjacent wires, which become more common at advanced nodes with tighter pitch.
net A and net B shorted → A = B = A AND B (wired-AND)
Model 4
Path Delay Fault (PDF)
The total propagation delay along a specific logic path exceeds the clock period, causing a setup-time violation. Models distributed delay degradation along long paths. Path delay faults are expensive to test (combinatorial explosion of paths) — typically only critical/longest paths are targeted.
8-gate critical path: delay = 1.05× Tclk → setup fail
Stuck-at-1 fault on AND gate input — fault activation and propagation
Good circuit: A = 1, B = 0 → Y = A AND B = 1 AND 0 = 0 ✓
SA1 fault on input A: applied A = 0, B = 1 → A is forced to 1 → Y = 1 AND 1 = 1 ✗ (a fault-free gate would give 0 AND 1 = 0, so the fault is activated and propagated to the output)
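
The activation and propagation conditions in the figure can be checked by brute force. The following is a minimal Python sketch, not a real fault simulator: the helper names `and_gate` and `detecting_vectors` are hypothetical, and the fault is modeled simply by overriding input A.

```python
from itertools import product

def and_gate(a, b, stuck_a=None):
    """2-input AND gate; stuck_a (0 or 1) overrides input A to model a stuck-at fault."""
    if stuck_a is not None:
        a = stuck_a
    return a & b

def detecting_vectors(stuck_a):
    """Return every (a, b) whose faulty output differs from the fault-free output."""
    return [
        (a, b)
        for a, b in product((0, 1), repeat=2)
        if and_gate(a, b) != and_gate(a, b, stuck_a=stuck_a)
    ]

sa1_tests = detecting_vectors(stuck_a=1)
sa0_tests = detecting_vectors(stuck_a=0)
print("SA1 on A detected by (a, b):", sa1_tests)
print("SA0 on A detected by (a, b):", sa0_tests)
```

Only one vector detects each fault: A must be driven to the opposite of the stuck value (activation), and B must be 1 so that the AND gate passes the difference to Y (propagation).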

Controllability and Observability

These are the two fundamental DFT concepts. Together they determine whether a fault can be tested.

Controllability

Controllability is the ability to set a specific logic value on an internal net by applying signals at the chip's primary inputs (or scan inputs). A net buried deep inside sequential logic has low controllability — you may need to clock through many flip-flops to reach a specific state at that net.

To activate a stuck-at-0 fault on net N, you must be able to drive N to logic 1 (so the SA0 fault causes an error: N reads 0 when it should be 1). If no input sequence can set N = 1, the fault cannot be activated and is untestable.

Observability

Observability is the ability to propagate the value on an internal net to a primary output where it can be measured by the tester. Deep internal logic has low observability — an error must propagate through many gate stages to reach an output.

The Scan Solution

Scan chains solve both problems simultaneously. In scan mode, every flip-flop in the design is connected as a shift register. To set a flip-flop's value: shift in the value directly (100% controllability). To read a flip-flop's value: shift it out to the scan output (100% observability). This is why scan insertion is the most fundamental DFT technique — it turns every register into a directly accessible test point.

Fault Coverage Metrics

Fault Coverage (FC) = (Detected Faults / Total Testable Faults) × 100%
Total Testable Faults = Total Faults − Untestable Faults (AU + TI + BU + RE)

ATPG Fault Classifications:
  DT: Detected (a test pattern activates the fault and propagates its effect to an observed point)
  PT: Possibly Detected (X-state propagation may detect; depends on tester)
  AU: ATPG Untestable (ATPG cannot produce a test for it under the current constraints)
  TI: Tied (net is tied to constant 0 or 1, so the stuck-at fault at the tied value cannot be activated)
  BU: Blocked Untestable (fault path blocked by sequential depth or a bus)
  RE: Redundant (fault has no effect on any observable output, so no pattern can detect it)

Class names and grouping vary between ATPG tools; some tools keep AU faults in the denominator of their coverage figure.

Example:
  1,000,000 total SA faults, 15,000 AU/TI/BU/RE → 985,000 testable
  ATPG detects 975,150 → FC = 975,150 / 985,000 = 99.0% ✓ Pass
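
The coverage arithmetic is easy to script. The function below is a minimal sketch of the formula as stated here; the name `fault_coverage` and its parameters are illustrative and do not correspond to any ATPG tool's API or report format.

```python
def fault_coverage(detected, total, untestable):
    """Fault coverage in percent: detected faults over testable faults."""
    testable = total - untestable
    if testable <= 0:
        raise ValueError("no testable faults")
    if not 0 <= detected <= testable:
        raise ValueError("detected must be between 0 and the testable count")
    return 100.0 * detected / testable

# Numbers from the worked example in the text.
fc = fault_coverage(detected=975_150, total=1_000_000, untestable=15_000)
print(f"Fault coverage: {fc:.1f}%")
```

Excluding the untestable faults from the denominator is what lifts the figure from 97.5% (975,150 of 1,000,000) to 99.0%, so it matters which classes a given report treats as untestable.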

Industry Coverage Targets

| Fault Model | Typical Target | Why This Level | Tool |
| --- | --- | --- | --- |
| Stuck-at (SA) | > 99.0% | Maps directly to open/short defects. 99% correlates to ~100–300 DPPM at typical defect densities. | Tessent, TetraMAX |
| Transition (TF) | > 95.0% | At-speed; harder to achieve due to launch constraints. 95% is typical, 98%+ for high-reliability. | Tessent LOC/LOS mode |
| Path Delay (PDF) | Top 1,000–10,000 paths | Combinatorial explosion of paths, so only the critical timing paths identified by STA are targeted. | Tessent TDF mode |
| Bridging | Best effort, layout-aware | Requires layout extraction to identify adjacent nets. Optional but valuable. | Calibre + ATPG |
| MBIST (memory) | 100% (memory cells) | Memory cells are regular, so full coverage is achievable with March algorithms. | Tessent MBIST |

DFT Concepts in Verilog

Before scan is inserted, this is a simple combinational block. The problem: if there's a SA1 fault on net and_out, you can only detect it if you can observe and_out at the output — which requires sel to be set correctly and the mux to pass through. Deep inside a large design, this observability chain may span dozens of logic levels.

Verilog — Circuit with low observability deep node
// This module has a deep internal node 'and_out'
// Detecting a SA1 on and_out requires:
// 1. Controllability: drive and_out to 0 (a=0 or b=0) to activate the fault
// 2. Observability: route the error through mux → y to see it at output
module deep_logic (
  input  a, b, c, d, sel,
  output y
);
  wire and_out;   // internal node — low observability
  wire or_out;

  assign and_out = a & b;   // SA1 fault here: and_out stuck at 1
  assign or_out  = c | d;

  // To observe and_out, sel must = 0 AND
  // and_out must differ from the correct value
  assign y = sel ? or_out : and_out;
endmodule

// Test vector to ACTIVATE SA1 on and_out:
//   Set a=0, b=X (or a=X, b=0) → correct and_out = 0
//   Fault makes and_out = 1 → ERROR
// Test vector to PROPAGATE error to output y:
//   Set sel=0 → y = and_out = 1 (faulty) vs 0 (good) → DETECTED
//   If sel=1 → y = or_out → fault is MASKED (not observable this cycle)
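
The activation and masking conditions described in the comments above can be confirmed exhaustively. This is a rough Python model of `deep_logic`, not a Verilog simulation; the `and_out_stuck` parameter is a hypothetical way to inject the fault.

```python
from itertools import product

def deep_logic(a, b, c, d, sel, and_out_stuck=None):
    """Behavioural model of the deep_logic module; and_out_stuck forces the internal net."""
    and_out = a & b
    if and_out_stuck is not None:
        and_out = and_out_stuck
    or_out = c | d
    return or_out if sel else and_out

# Every (a, b, c, d, sel) vector where the SA1 on and_out changes output y.
detecting = [
    v for v in product((0, 1), repeat=5)
    if deep_logic(*v) != deep_logic(*v, and_out_stuck=1)
]
print(len(detecting), "of 32 vectors detect SA1 on and_out")
print("all have sel=0:", all(v[4] == 0 for v in detecting))
print("all have a AND b = 0:", all((v[0] & v[1]) == 0 for v in detecting))
```

Every detecting vector has sel=0 (observability) and a AND b = 0 (activation); with sel=1 the fault is masked regardless of a and b.
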

Verilog — After scan: scan flip-flops make internal nodes directly controllable and observable
// ATPG views the circuit with scan FFs replacing regular FFs
// Any flip-flop output can be:
//   - Controlled: shift a known value into the FF via scan chain
//   - Observed:   capture logic value into FF, then shift out

module scan_ff (
  input  clk, scan_en,
  input  d,        // functional data
  input  scan_in,  // scan chain input
  output reg q,
  output scan_out  // connects to next FF's scan_in
);
  assign scan_out = q;

  always @(posedge clk)
    q <= scan_en ? scan_in  // SHIFT MODE: load from chain
                 : d;       // CAPTURE MODE: normal operation
endmodule

// SHIFT MODE (scan_en=1):  SI → FF[0].q → FF[1].q → ... → SO
// CAPTURE MODE (scan_en=0): apply 1 functional clock, capture logic values
// Then SHIFT MODE again to read out captured values at SO
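
The shift, capture, shift sequence can be traced with a small behavioural model. This Python sketch assumes a 3-flop chain and a hypothetical block of combinational logic that simply inverts each flop's value; the function names are illustrative only.

```python
def clock_edge(state, scan_en, scan_in=0, d=None):
    """Next chain state after one rising edge. state[i] is FF[i].q; FF[0] is nearest SI."""
    if scan_en:
        return [scan_in] + state[:-1]   # SHIFT MODE: each FF loads its predecessor
    return list(d)                      # CAPTURE MODE: each FF loads its functional d

def shift_in(state, bits):
    """Shift bits in through SI, one clock per bit."""
    for bit in bits:
        state = clock_edge(state, scan_en=1, scan_in=bit)
    return state

def shift_out(state):
    """Shift the whole chain out through SO (the last FF's q), one clock per bit."""
    observed = []
    for _ in range(len(state)):
        observed.append(state[-1])
        state = clock_edge(state, scan_en=1, scan_in=0)
    return observed, state

target = [1, 1, 0]                                # desired values for FF[0], FF[1], FF[2]
loaded = shift_in([0, 0, 0], reversed(target))    # last FF's bit goes in first
d_values = [1 - q for q in loaded]                # hypothetical logic: d = NOT q
captured = clock_edge(loaded, scan_en=0, d=d_values)
observed, _ = shift_out(captured)                 # arrives last-FF-first at SO

print("loaded:  ", loaded)
print("captured:", captured)
print("observed:", observed)
```

The bit for the last flop in the chain is shifted in first and shifted out first, which is why the load sequence is the reverse of the target state and the observed stream is the reverse of the captured state.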

Common Manufacturing Defects & Their Fault Models

| Physical Defect | Fault Model | Process Node Risk | Detection Method |
| --- | --- | --- | --- |
| Metal open (broken wire) | SA0 (line floats low) or SA1 | All nodes; vias most vulnerable | Stuck-at ATPG |
| Via void / unfilled via | SA0 or Transition Fault (resistive) | Advanced nodes (7nm and below) | ATPG + at-speed TF |
| Metal-to-metal short | Bridging Fault | Increases at advanced nodes (tight pitch) | Bridging ATPG |
| Gate oxide defect | SA0 / SA1 (transistor always on/off) | All nodes | Stuck-at ATPG |
| Resistive contact | Transition Fault (slow rise/fall) | All nodes | At-speed TF ATPG |
| Particle contamination | Bridging or SA | All nodes; worse at smaller nodes | ATPG |
| CMP over-polishing | SA0 (metal thinned → open) | 7nm and below | Stuck-at ATPG |

From Fault Coverage to DPPM

DPPM (Defective Parts Per Million) is the number of defective chips, per million shipped, that pass the test and reach customers. It is directly tied to fault coverage. A simple model:

DPPM_out ≈ (1 − Fault Coverage) × DPPM_in

where DPPM_in is the gross defect rate before test, already expressed in defective parts per million.

Example:
  Fault Coverage = 99.0% = 0.990
  DPPM_in = 5,000 DPPM gross (before test)
  DPPM_out = (1 − 0.990) × 5,000 = 0.01 × 5,000 = 50 DPPM

Going from 99.0% to 99.9% coverage:
  DPPM_out = (1 − 0.999) × 5,000 = 0.001 × 5,000 = 5 DPPM
  → 10× improvement in outgoing quality for 0.9 percentage points more coverage
  → This is why every 0.1% of coverage matters at production volume
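
The same arithmetic as a short script, useful for trying other coverage values. This is a sketch of the simple linear model above only, with the incoming defect rate already expressed in parts per million; the function name `outgoing_dppm` is illustrative.

```python
def outgoing_dppm(fault_coverage, incoming_dppm):
    """Escapes per million under the simple linear model: (1 - FC) x incoming DPPM."""
    if not 0.0 <= fault_coverage <= 1.0:
        raise ValueError("fault_coverage must be a fraction between 0 and 1")
    return (1.0 - fault_coverage) * incoming_dppm

incoming = 5_000  # gross defective parts per million before test (value from the example)
for fc in (0.990, 0.999):
    print(f"FC = {fc:.1%} -> about {outgoing_dppm(fc, incoming):.0f} DPPM shipped")
```

Because the model is linear in (1 − FC), each tenfold reduction in the uncovered fraction gives a tenfold reduction in escapes.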

For automotive and safety-critical applications (ISO 26262), DPPM targets are often <10 DPPM — requiring fault coverage above 99.9% with additional diagnostic coverage requirements.

Day 1 — Interview Questions

Q1. What is the difference between a defect and a fault in VLSI testing?
A defect is a physical imperfection introduced during manufacturing — for example, a broken metal wire, an unfilled via, or a particle creating a short between two wires. A fault is the logical abstraction of that defect's effect on circuit behavior — a stuck-at-1 fault models a wire shorted to VDD. DFT engineers work with fault models because directly simulating billions of physical defect sites is computationally intractable. One physical defect may cause multiple faults; one fault may model multiple defects.
Q2. What is a stuck-at fault, and what physical defect does it model?
A stuck-at fault assumes a net in the circuit is permanently fixed at logic 0 (SA0) or logic 1 (SA1), regardless of what the driving gate outputs. SA1 models a wire shorted to VDD (power). SA0 models a wire shorted to GND. An open metal wire has no driver and floats; leakage often pulls it to logic 0, but it can settle at either value. The stuck-at model is the most widely used because it is simple to analyze, correlates well with real open/short defects, and >99% SA fault coverage is the standard production test target.
Q3. What are controllability and observability? Why do they matter for DFT?
Controllability is the ability to set a specific logic value (0 or 1) on an internal net by applying inputs at primary inputs or scan inputs. Observability is the ability to propagate a logic value from an internal net to a primary output where it can be measured. A fault can only be detected if: (1) you can activate it — drive the net to the value opposite to the stuck value (requires controllability), and (2) you can propagate the error to an observable output (requires observability). DFT adds scan chains to make every flip-flop directly controllable (shift in a value) and observable (shift out), solving the fundamental testability problem.
Q4. What is fault coverage, and how is it calculated?
Fault coverage is the percentage of modeled faults detected by the test patterns: FC = (Detected Faults / Total Testable Faults) × 100%. Total testable faults excludes structurally untestable faults (ATPG-untestable, tied nets, redundant faults). Industry targets are typically >99% for stuck-at faults and >95% for transition faults. Higher coverage correlates directly with lower DPPM: going from 99.0% to 99.9% SA coverage typically reduces outgoing DPPM by 10×.
Q5. What is the difference between a transition fault and a stuck-at fault?
A stuck-at fault models a net permanently fixed at 0 or 1: a DC (static) defect. A transition fault models a net that fails to switch within the required time: a dynamic (timing) defect. Transition faults model resistive defects like partially-formed vias: a 0→1 transition may happen, but too slowly, causing a setup-time violation when tested at speed. Stuck-at tests can be applied at slow tester speed. Transition fault tests must be applied at speed (at the actual clock frequency) to detect timing failures, which requires at-speed ATPG using launch-on-capture (LOC) or launch-on-shift (LOS) patterns.
Q6. What is an untestable fault? Give two examples.
An untestable fault is one that cannot be detected by any test vector due to the circuit's structure, not due to insufficient test patterns. (1) Redundant logic: if a circuit has redundant gate paths that always produce the same output regardless of a fault on one path, the fault cannot be detected. Example: Y = A AND A, with each gate input treated as a separate fault site. A SA1 on either input branch leaves Y = A, so it cannot be detected (a SA0 on a branch is still detectable with A = 1). (2) Tied nets: if a net is permanently tied to VDD or GND by design (e.g., a constant-1 enable), the stuck-at fault matching the tied value is untestable because the net never changes. Untestable faults are excluded from the denominator in fault coverage calculations.
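
A quick exhaustive check of the Y = A AND A example, treating each gate input as a separate fault site, shows which stuck-at fault is actually the undetectable one. This is a minimal Python sketch; `y_and_a_a` and `detectable` are hypothetical helper names.

```python
def y_and_a_a(a, branch0_stuck=None):
    """Y = A AND A, with an optional stuck-at fault on the first input branch only."""
    in0 = a if branch0_stuck is None else branch0_stuck
    in1 = a
    return in0 & in1

def detectable(stuck):
    """True if some value of A makes the faulty output differ from the good output."""
    return any(y_and_a_a(a) != y_and_a_a(a, branch0_stuck=stuck) for a in (0, 1))

print("SA1 on one branch detectable:", detectable(1))
print("SA0 on one branch detectable:", detectable(0))
```

With the branch stuck at 1, the gate still computes 1 AND A = A, so no input exposes the fault; that is the redundant, untestable case. With the branch stuck at 0, applying A = 1 gives 0 instead of 1, so that fault is testable.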