What is the stuck-at fault model?

The stuck-at fault model assumes that a manufacturing defect causes a net in the circuit to be permanently fixed (stuck) at logic 0 (stuck-at-0, SA0) or logic 1 (stuck-at-1, SA1), regardless of the value its driving logic tries to put on it. A circuit with N nets has 2N possible single stuck-at faults (one SA0 and one SA1 per net). Stuck-at is the most widely used fault model because it is simple to analyze and maps well to real metal opens and shorts, and >99% stuck-at fault coverage (SA coverage) is the usual industry target for production test.
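A minimal Python sketch of the 2N count, assuming a circuit is described only by a list of net names (the names are hypothetical, borrowed from the deep_logic Verilog example later in this article). It builds the raw single-fault list that a fault simulator would start from.

```python
from itertools import product

def enumerate_stuck_at_faults(nets):
    """Every single stuck-at fault as a (net, stuck_value) pair: SA0 and SA1 per net."""
    return list(product(nets, (0, 1)))

# Hypothetical net list (the nets of the deep_logic example)
nets = ["a", "b", "c", "d", "sel", "and_out", "or_out", "y"]
faults = enumerate_stuck_at_faults(nets)
print(len(faults))   # 16 = 2 * 8 nets
print(faults[:2])    # [('a', 0), ('a', 1)]
```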

What is fault coverage in VLSI testing?

Fault coverage is the percentage of modeled faults that are detected by the test patterns applied to the chip. Formula: Fault Coverage = (Detected Faults / Total Testable Faults) × 100%. Industry targets are typically >99% stuck-at coverage and >95% transition fault coverage. Faults that cannot be tested due to circuit structure are classified as 'untestable' and excluded from the denominator. Higher fault coverage correlates directly with lower DPPM (Defective Parts Per Million) in the field.

What are controllability and observability in DFT?

Controllability is the ability to set a specific logic value (0 or 1) on a net in the circuit by applying values at the primary inputs. Observability is the ability to propagate the logic value on a net to a primary output where it can be measured. DFT adds structures (scan chains, test points) to improve both: scan chains make every scan flip-flop directly controllable (shift in a value) and observable (shift out the value), dramatically increasing fault coverage compared with testing the sequential circuit only through its primary inputs and outputs.

DFT Day 1 — Introduction: Why Test? Fault Models & Fault Coverage

Q: What is the difference between a defect and a fault in DFT?

A defect is a physical imperfection introduced during manufacturing — for example, a metal via that didn't form correctly, a short between two wires, or an extra particle of conductor bridging two nets. A fault is the logical abstraction of that defect's effect on circuit behavior — for example, a stuck-at-1 fault on a net models the behavior of a wire shorted to VDD. DFT engineers work with fault models (abstractions) rather than individual physical defects because there are billions of potential defect sites and modeling each one precisely is impractical.

Why Do Chips Need Testing?

A chip that is correctly designed can still be incorrectly manufactured. Semiconductor fabrication involves hundreds of process steps — photolithography, deposition, etching, chemical-mechanical polishing, implantation — each with its own variability. A particle of contamination landing on a wafer, a lithography focus error, a metal via that didn't fill completely — any of these can create a defect that turns a functional design into a broken chip.

The numbers are sobering. Even at mature process nodes with well-controlled fabs, typical die yields are 60–90%. At leading-edge nodes (3nm, 2nm), yields for large dies start significantly lower and improve over time. Every chip that ships with a manufacturing defect is a potential field return, a safety issue, or a recall.

Testing is the last gate before a chip reaches the customer. The Automatic Test Equipment (ATE), a machine costing $1–5 million, applies thousands of test vectors to every chip, measures the outputs, and decides: pass or fail. The test vectors are generated by ATPG (Automatic Test Pattern Generation) tools, and the probability that a bad chip slips through depends largely on the fault coverage of those vectors.

The Core DFT Problem

Modern chips have billions of transistors. You cannot apply enough random test vectors to thoroughly exercise all of them in a reasonable time. DFT solves this by adding special test structures (scan chains, BIST, JTAG) that make the chip's internal state directly accessible — converting an intractable test problem into a manageable one.
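To put "a reasonable time" in numbers, this Python sketch estimates how long exhaustive testing of a purely combinational block would take. The rate of 10⁹ vectors per second is a hypothetical, generous assumption, and the model ignores flip-flops entirely; sequential state only makes the real problem larger.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_test_years(num_inputs, vectors_per_second=1e9):
    """Years needed to apply all 2**num_inputs input combinations once."""
    return 2 ** num_inputs / vectors_per_second / SECONDS_PER_YEAR

for n in (32, 64, 128):
    # Assumed, generous rate of 1e9 vectors per second (hypothetical)
    print(f"{n:3d} inputs: {exhaustive_test_years(n):.3g} years")
```

At that rate a 64-input block alone needs roughly 585 years, which is why test relies on targeted patterns plus structures that expose internal state, rather than brute-force stimulus.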

Defect vs Fault vs Error vs Failure

DFT engineers use these four terms precisely. Confusing them is a common interview mistake.

Term	Definition	Layer	Example
Defect	Physical imperfection from manufacturing	Physical	Metal particle bridging two wires; unfilled via
Fault	Logical abstraction of a defect's circuit effect	Logical	Net stuck-at-1 due to short to VDD
Error	Incorrect logic value produced at a node	Logical	Output of gate = 1 when it should be 0
Failure	Observable incorrect system behaviour	System	Chip produces wrong output; system crashes

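A defect shows up as a modeled fault only when its effect matches a fault model (a resistive via, for example, may appear as a transition fault rather than a stuck-at fault). A fault causes an error only when it is activated: for a stuck-at fault, the net must be driven to the opposite of its stuck value. An error causes a failure only when it propagates to an observable output. So a defect can be present without causing a field failure, for example if it sits on a rarely used path or its error is masked by other logic. The same conditions explain untestable faults: if no input pattern can both activate a fault and propagate its error to an output, the fault is structurally impossible to observe, which is one reason fault coverage targets are not 100%.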
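The activation and propagation conditions can be checked mechanically. This Python sketch uses a hypothetical two-gate circuit, y = (a AND b) OR c, with a stuck-at-0 fault on the internal net n = a AND b, and places a single input vector on the fault → error → failure chain.

```python
def circuit(a, b, c, n_stuck=None):
    """y = (a AND b) OR c; n_stuck optionally forces internal net n to 0 or 1."""
    n = a & b
    if n_stuck is not None:
        n = n_stuck            # fault injection on the internal net
    return n, n | c            # (internal value, output value)

def classify(a, b, c, stuck_value):
    """Where one input vector lands on the fault -> error -> failure chain."""
    good_n, good_y = circuit(a, b, c)
    bad_n, bad_y = circuit(a, b, c, n_stuck=stuck_value)
    if good_n == bad_n:
        return "not activated"   # net already at the stuck value: no error
    if good_y == bad_y:
        return "masked"          # error on n does not reach y (here c=1 forces y=1)
    return "detected"            # error reaches the output: observable failure

print(classify(0, 1, 0, stuck_value=0))  # not activated
print(classify(1, 1, 1, stuck_value=0))  # masked
print(classify(1, 1, 0, stuck_value=0))  # detected
```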

Defect → Fault → Error → Failure chain

Fault Models

Because directly modeling billions of physical defect sites is intractable, DFT uses fault models — simplified logical abstractions that correlate well with real manufacturing defects. Different fault models detect different classes of physical defects.

Model 1

Stuck-at Fault (SAF)

A net is permanently fixed at logic 0 (SA0) or logic 1 (SA1), regardless of what the driving gate outputs. SA1 models a short to VDD; SA0 models a short to GND. An open metal line leaves the downstream net undriven, and depending on coupling and leakage it may settle at either value, so opens are approximated by SA0 or SA1. It is the most widely used fault model; >99% SA coverage is the usual requirement for production test.

net A stuck-at-1: even if driver outputs 0, A reads 1
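For a single gate, activation and propagation reduce to a truth-table comparison. This sketch uses a hypothetical 2-input AND model (not tool output) and lists the input vectors that distinguish a good gate from one whose input A is stuck: A must be driven opposite to the stuck value, and B must sit at 1, the AND gate's non-controlling value, so the difference reaches the output.

```python
from itertools import product

def and2(a, b, a_stuck=None):
    """2-input AND; a_stuck forces input pin A to 0 or 1 (stuck-at fault)."""
    if a_stuck is not None:
        a = a_stuck
    return a & b

def detecting_vectors(stuck_value):
    """All (a, b) vectors whose output differs between good and faulty gate."""
    return [(a, b) for a, b in product((0, 1), repeat=2)
            if and2(a, b) != and2(a, b, a_stuck=stuck_value)]

print(detecting_vectors(1))  # [(0, 1)]: SA1 on A needs A=0, B=1
print(detecting_vectors(0))  # [(1, 1)]: SA0 on A needs A=1, B=1
```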

Model 2

Transition Fault (TF)

A net fails to transition within the required time — slow-to-rise (STR) or slow-to-fall (STF). Models resistive defects: a partially-formed via with high resistance allows a 0→1 transition but too slowly, causing setup-time violations at speed. Caught by at-speed testing. Target: >95% transition fault coverage.

net A STR: 0→1 transition takes 2× longer than spec
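Because a transition fault is about timing, detecting it takes two patterns: V1 sets the initial value and V2 launches the transition, with the result captured one clock period later. A simplified Python sketch, assuming a single net with fixed rise and fall delays and hypothetical numbers (1.0 ns clock, 0.6 ns good rise delay, 1.2 ns faulty rise delay, the 2× slowdown from the example above):

```python
def captured_value(v1, v2, rise_ns, fall_ns, period_ns):
    """Value seen by the capture flop one clock period after V2 launches.

    Simplified model: if the net's transition takes longer than the
    period, the capture flop still sees the old value v1.
    """
    if v1 == v2:
        return v2                              # no transition launched
    delay = rise_ns if v2 == 1 else fall_ns
    return v2 if delay <= period_ns else v1

def detects_str(v1, v2, period_ns, good_rise_ns=0.6, bad_rise_ns=1.2, fall_ns=0.6):
    """True if the pattern pair (v1, v2) exposes a slow-to-rise fault on the net."""
    good = captured_value(v1, v2, good_rise_ns, fall_ns, period_ns)
    faulty = captured_value(v1, v2, bad_rise_ns, fall_ns, period_ns)
    return good != faulty

print(detects_str(0, 1, period_ns=1.0))   # True: 0->1 launched, captured at speed
print(detects_str(1, 1, period_ns=1.0))   # False: no transition launched
print(detects_str(0, 1, period_ns=10.0))  # False: a slow clock hides the delay
```

The last call is the reason transition tests must run at speed: with a slow capture clock, both the good and the slow net settle before capture.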

Model 3

Bridging Fault (BF)

Two or more nets are unintentionally shorted together. The result depends on the logic driving both nets — could cause wired-AND or wired-OR behaviour (or intermediate voltage in CMOS). Bridging faults model metal shorts between adjacent wires, which become more common at advanced nodes with tighter pitch.

net A and net B shorted → A = B = A AND B (wired-AND)
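A wired-AND or wired-OR bridge only changes anything when the two nets are driven to opposite values. This sketch assumes the simple wired-logic models above (it ignores the intermediate-voltage case) and finds the driver values that excite the bridge; propagating the resulting error to an output is a separate requirement.

```python
from itertools import product

def bridged(a, b, kind):
    """Values of nets A and B after a bridge, under a wired-logic model."""
    if kind == "wired-AND":
        v = a & b
    elif kind == "wired-OR":
        v = a | b
    else:
        raise ValueError(f"unknown bridge model: {kind}")
    return (v, v)                    # both shorted nets carry the same value

def exciting_vectors(kind):
    """Driver values (a, b) for which the bridge changes at least one net."""
    return [(a, b) for a, b in product((0, 1), repeat=2)
            if bridged(a, b, kind) != (a, b)]

print(exciting_vectors("wired-AND"))  # [(0, 1), (1, 0)]
print(exciting_vectors("wired-OR"))   # [(0, 1), (1, 0)]
```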

Model 4

Path Delay Fault (PDF)

The total propagation delay along a specific logic path exceeds the clock period, causing a setup-time violation. Models distributed delay degradation along long paths. Path delay faults are expensive to test (combinatorial explosion of paths) — typically only critical/longest paths are targeted.

8-gate critical path: delay = 1.05× Tclk → setup fail
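Two small calculations behind the path delay fault model, as a Python sketch with hypothetical numbers: the slack of one path (sum of gate delays against the clock period, ignoring setup time and skew), and a path count on a DAG where every stage splits into two parallel gates and reconverges. The ladder-shaped DAG is an illustrative assumption, not a real netlist.

```python
def path_slack_ns(gate_delays_ns, clock_period_ns):
    """Clock period minus total path delay; negative slack means a setup failure.
    (Setup time and clock skew are ignored in this sketch.)"""
    return clock_period_ns - sum(gate_delays_ns)

def count_paths(fanout, node, sink, memo=None):
    """Number of distinct node-to-sink paths in a DAG {node: [successors]}."""
    if memo is None:
        memo = {}
    if node == sink:
        return 1
    if node not in memo:
        memo[node] = sum(count_paths(fanout, nxt, sink, memo)
                         for nxt in fanout.get(node, []))
    return memo[node]

def ladder(stages):
    """Hypothetical DAG: each stage splits into two parallel gates, then reconverges."""
    fanout, prev = {}, "in"
    for i in range(stages):
        fanout[prev] = [f"top{i}", f"bot{i}"]
        fanout[f"top{i}"] = [f"merge{i}"]
        fanout[f"bot{i}"] = [f"merge{i}"]
        prev = f"merge{i}"
    fanout[prev] = ["out"]
    return fanout

# 8-gate path on a 1.0 ns clock: seven 0.125 ns gates plus one slow 0.175 ns gate
print(path_slack_ns([0.125] * 7 + [0.175], clock_period_ns=1.0))  # about -0.05
for k in (10, 20, 30):
    print(k, "stages:", count_paths(ladder(k), "in", "out"), "paths")
```

Node count grows linearly with the number of stages while the path count doubles at every stage, which is why only critical paths are targeted.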

Stuck-at-1 fault on AND gate input — fault activation and propagation

Controllability and Observability

These are the two fundamental DFT concepts. Together they determine whether a fault can be tested.

Controllability

Controllability is the ability to set a specific logic value on an internal net by applying signals at the chip's primary inputs (or scan inputs). A net buried deep inside sequential logic has low controllability — you may need to clock through many flip-flops to reach a specific state at that net.

To activate a stuck-at-0 fault on net N, you must be able to drive N to logic 1 (so the SA0 fault causes an error: N reads 0 when it should be 1). If no input pattern can set N = 1, the fault is untestable.
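For a tiny circuit, controllability can be checked by brute force: enumerate every input vector and look for one that drives the net to the required value. This sketch reuses the and_out net from the deep_logic Verilog example later in this article; tied_low is a hypothetical net tied to 0 by design. Real ATPG tools use structural search rather than enumeration, which only scales to a handful of inputs.

```python
from itertools import product

def and_out(a, b, c, d, sel):
    """Internal net of deep_logic: and_out = a & b (c, d, sel do not affect it)."""
    return a & b

def tied_low(a, b, c, d, sel):
    """Hypothetical net tied to constant 0 by design."""
    return 0

def find_activating_input(net_fn, target, num_inputs=5):
    """First input vector (a, b, c, d, sel) that drives the net to target, or None."""
    for vec in product((0, 1), repeat=num_inputs):
        if net_fn(*vec) == target:
            return vec
    return None

print(find_activating_input(and_out, 1))   # (1, 1, 0, 0, 0): activates SA0
print(find_activating_input(and_out, 0))   # (0, 0, 0, 0, 0): activates SA1
print(find_activating_input(tied_low, 1))  # None: SA0 on this net is untestable
```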

Observability

Observability is the ability to propagate the value on an internal net to a primary output where it can be measured by the tester. Deep internal logic has low observability — an error must propagate through many gate stages to reach an output.
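Observability can be checked the same way: a net is observable under an input vector if flipping the net's value flips the output. This sketch models the output of deep_logic (from the Verilog example later in this article) with and_out passed in as an explicit argument, an assumption made so the net can be forced to either value.

```python
from itertools import product

def deep_logic_y(a, b, c, d, sel, and_out):
    """Output y of deep_logic, with and_out passed in so it can be forced."""
    or_out = c | d
    return or_out if sel else and_out

def observing_vectors():
    """Input vectors (a, b, c, d, sel) under which a change on and_out changes y."""
    return [vec for vec in product((0, 1), repeat=5)
            if deep_logic_y(*vec, and_out=0) != deep_logic_y(*vec, and_out=1)]

vecs = observing_vectors()
print(len(vecs))                      # 16: every vector with sel=0
print(all(v[4] == 0 for v in vecs))   # True
```

A complete test needs both conditions at once: a vector from this list that also sets and_out opposite to the stuck value.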

The Scan Solution

Scan chains solve both problems simultaneously. In scan mode, the flip-flops are stitched together into one or more shift registers (scan chains). To set a flip-flop's value: shift the value in directly (100% controllability of every scan flip-flop). To read a flip-flop's value: capture into it, then shift it out to the scan output (100% observability). ATPG then only has to generate tests for the combinational logic between scan flip-flops. This is why scan insertion is the most fundamental DFT technique: it turns every register into a directly accessible test point.

Fault Coverage Metrics

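Fault Coverage (FC) = (Detected Faults / Total Testable Faults) × 100%
Total Testable Faults = Total Faults − Untestable Faults (AU + TI + BU + RE)

(Terminology note: ATPG tools such as TetraMAX and Tessent report this ratio, with untestable faults removed from the denominator, as "test coverage", and use "fault coverage" for detected faults divided by all faults.)

ATPG Fault Classifications:
DT: Detected (a pattern activates the fault and propagates its effect to an observation point)
PT: Possibly Detected (the good circuit produces a known value but the faulty circuit produces X, so detection depends on the actual silicon)
AU: ATPG Untestable (ATPG cannot generate a test under the current constraints, e.g., non-scan logic or tester constraints)
TI: Tied (net is tied to constant 0 or 1, so the stuck-at fault matching that value can never be activated)
BU: Blocked Untestable (the fault effect cannot propagate to any observation point, e.g., blocked by tied or constrained logic)
RE: Redundant (no input pattern can distinguish the faulty circuit from the good one, typically because of redundant logic)

Example: 1,000,000 total SA faults, 15,000 AU/TI/BU/RE → 985,000 testable
ATPG detects 975,150 → FC = 975,150 / 985,000 = 99.0% ✓ Pass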
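The same arithmetic as a short Python sketch. The function name and dictionary keys are illustrative, not any tool's report format; it also computes the ratio over all faults (untestable ones left in the denominator) for comparison.

```python
def coverage_report(total_faults, untestable_faults, detected_faults):
    """Coverage with untestable faults removed from the denominator, as in the
    formula above, plus the ratio over all faults for comparison."""
    testable = total_faults - untestable_faults
    return {
        "testable": testable,
        "coverage_testable_pct": 100.0 * detected_faults / testable,
        "coverage_all_pct": 100.0 * detected_faults / total_faults,
    }

report = coverage_report(1_000_000, 15_000, 975_150)
print(report)
# {'testable': 985000, 'coverage_testable_pct': 99.0, 'coverage_all_pct': 97.515}
```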

Industry Coverage Targets

Fault Model	Typical Target	Why This Level	Tool
Stuck-at (SA)	> 99.0%	Maps directly to open/short defects. 99% correlates to ~100–300 DPPM at typical defect densities.	Tessent, TetraMAX
Transition (TF)	> 95.0%	At-speed; harder to achieve due to launch constraints. 95% is typical, 98%+ for high-reliability.	Tessent LOC/LOS mode
Path Delay (PDF)	Top 1,000–10,000 paths	Combinatorial explosion: target only critical paths, typically taken from STA critical-path reports.	Tessent path-delay ATPG
Bridging	Best effort, layout-aware	Requires layout extraction to identify adjacent nets. Optional but valuable.	Calibre + ATPG
MBIST (memory)	100% (memory cells)	Memory cells are regular → full coverage is achievable with March algorithms.	Tessent MBIST
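To show why regular memory arrays can reach full coverage, here is a Python sketch of March C-, one widely used March algorithm (the table does not say which March variant a particular tool runs). It assumes a bit-oriented memory model with one injected stuck-at cell; March C- also targets transition and coupling faults, which this model does not inject.

```python
class Memory:
    """Bit-oriented memory model with one optional stuck-at cell (injected fault)."""

    def __init__(self, size, stuck_cell=None, stuck_value=0):
        self.cells = [0] * size
        self.stuck_cell, self.stuck_value = stuck_cell, stuck_value
        if stuck_cell is not None:
            self.cells[stuck_cell] = stuck_value

    def write(self, addr, value):
        # A stuck cell ignores writes and keeps its stuck value
        self.cells[addr] = self.stuck_value if addr == self.stuck_cell else value

    def read(self, addr):
        return self.cells[addr]


def march_c_minus(mem):
    """March C-: {any(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); any(r0)}.
    Returns a list of (element_index, address, expected, read_value) mismatches."""
    n = len(mem.cells)
    up, down = range(n), range(n - 1, -1, -1)
    elements = [
        (up,   [("w", 0)]),              # "any order" elements run upward here
        (up,   [("r", 0), ("w", 1)]),
        (up,   [("r", 1), ("w", 0)]),
        (down, [("r", 0), ("w", 1)]),
        (down, [("r", 1), ("w", 0)]),
        (up,   [("r", 0)]),
    ]
    mismatches = []
    for index, (order, ops) in enumerate(elements):
        for addr in order:
            for op, value in ops:
                if op == "w":
                    mem.write(addr, value)
                else:
                    got = mem.read(addr)
                    if got != value:
                        mismatches.append((index, addr, value, got))
    return mismatches

print(march_c_minus(Memory(8)))                               # []
print(march_c_minus(Memory(8, stuck_cell=3, stuck_value=0)))  # [(2, 3, 1, 0), (4, 3, 1, 0)]
```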

DFT Concepts in Verilog

The module below is a small combinational block. The problem: if there's a SA1 fault on net and_out, you can only detect it if its effect reaches the output y, which requires sel = 0 so the mux passes and_out through. Deep inside a large design, this observability path may span dozens of logic levels.

Verilog — Circuit with low observability deep node

// This module has a deep internal node 'and_out'
// Detecting a SA1 on and_out requires:
// 1. Controllability: set a=1, b=0 (or a=0, b=1) to activate fault
// 2. Observability: route the error through mux → y to see it at output
module deep_logic (
  input  a, b, c, d, sel,
  output y
);
  wire and_out;   // internal node — low observability
  wire or_out;

  assign and_out = a & b;   // SA1 fault here: and_out stuck at 1
  assign or_out  = c | d;

  // To observe and_out, sel must be 0 (the mux selects and_out),
  // and the test must make the good and_out differ from the faulty one
  assign y = sel ? or_out : and_out;
endmodule

// Test vector to ACTIVATE SA1 on and_out:
//   Set a=0, b=X (or a=X, b=0) → correct and_out = 0
//   Fault makes and_out = 1 → ERROR
// Test vector to PROPAGATE error to output y:
//   Set sel=0 → y = and_out = 1 (faulty) vs 0 (good) → DETECTED
//   If sel=1 → y = or_out → fault is MASKED (not observable this cycle)
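The same reasoning can be automated as fault simulation: inject each single stuck-at fault into a model of deep_logic, apply the vectors, and count the faults whose output differs from the good circuit. A Python sketch, assuming faults only on the named nets (inputs a, b, c, d, sel plus and_out, or_out and y, 16 faults in total) and a hypothetical test set.

```python
NETS = ["a", "b", "c", "d", "sel", "and_out", "or_out", "y"]

def deep_logic(a, b, c, d, sel, fault=None):
    """Evaluate y for deep_logic; fault is an optional (net, stuck_value) pair."""
    def net(name, value):
        # Replace a net's value when it is the faulty net
        return fault[1] if fault is not None and fault[0] == name else value
    a, b, c, d, sel = (net(n, v) for n, v in zip(NETS[:5], (a, b, c, d, sel)))
    and_out = net("and_out", a & b)
    or_out = net("or_out", c | d)
    return net("y", or_out if sel else and_out)

def fault_simulate(vectors):
    """Return (detected faults, coverage %) for (a, b, c, d, sel) vectors."""
    faults = [(name, v) for name in NETS for v in (0, 1)]
    detected = {f for f in faults for vec in vectors
                if deep_logic(*vec) != deep_logic(*vec, fault=f)}
    return detected, 100.0 * len(detected) / len(faults)

# a=0, b=1 (the "X" chosen as 1), c=d=0, sel=0
print(fault_simulate([(0, 1, 0, 0, 0)]))
# Hypothetical six-vector set exercising both mux paths
tests = [(0, 1, 0, 0, 0), (1, 0, 0, 0, 0), (1, 1, 0, 0, 0),
         (0, 0, 1, 0, 1), (0, 0, 0, 1, 1), (0, 0, 0, 0, 1)]
print(fault_simulate(tests)[1])  # 100.0
```

The single vector from the comments above detects only 3 of the 16 faults (SA1 on a, and_out and y, i.e. 18.75%); the six-vector set detects all of them.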

After scan: scan flip-flops make the signals feeding and_out directly controllable, and capturing into a scan flip-flop makes the result directly observable

// ATPG views the circuit with scan FFs replacing regular FFs
// Any flip-flop output can be:
//   - Controlled: shift a known value into the FF via scan chain
//   - Observed:   capture logic value into FF, then shift out

module scan_ff (
  input  clk, scan_en,
  input  d,        // functional data
  input  scan_in,  // scan chain input
  output reg q,
  output scan_out  // connects to next FF's scan_in
);
  assign scan_out = q;

  always @(posedge clk)
    q <= scan_en ? scan_in  // SHIFT MODE: load from chain
                 : d;       // CAPTURE MODE: normal operation
endmodule

// SHIFT MODE (scan_en=1):  SI → FF[0].q → FF[1].q → ... → SO
// CAPTURE MODE (scan_en=0): apply 1 functional clock, capture logic values
// Then SHIFT MODE again to read out captured values at SO
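A cycle-level Python model of a chain of the scan_ff cells above makes the shift/capture/shift sequence concrete. The 3-flop chain and its next_state logic are hypothetical; the model assumes scan_out is the last flop's q (as in the Verilog) and that a new pattern is shifted in while the previous response is shifted out.

```python
class ScanChain:
    """Cycle-level model of scan_ff cells chained SI -> FF[0] -> ... -> FF[n-1] -> SO."""

    def __init__(self, length):
        self.q = [0] * length

    def shift(self, pattern):
        """SHIFT MODE (scan_en=1) for n clocks: load pattern so that FF[i] ends
        up holding pattern[i], and return the old contents in FF order."""
        if len(pattern) != len(self.q):
            raise ValueError("pattern length must equal chain length")
        seen_at_so = []
        for bit in reversed(pattern):          # the bit for FF[n-1] enters first
            seen_at_so.append(self.q[-1])      # SO = q of the last FF before the edge
            self.q = [bit] + self.q[:-1]       # each FF loads its predecessor's q
        return seen_at_so[::-1]                # FF[n-1] came out first

    def capture(self, logic):
        """CAPTURE MODE (scan_en=0) for one clock: every FF loads its D input."""
        self.q = logic(self.q)


def next_state(q):
    """Hypothetical combinational logic feeding the three D inputs."""
    return [q[1] & q[2], q[0] | q[2], 1 - q[0]]

chain = ScanChain(3)
chain.shift([1, 1, 1])           # shift in the stimulus
chain.capture(next_state)        # one functional clock
print(chain.shift([0, 0, 0]))    # shift out the response: [1, 1, 0]
```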

Common Manufacturing Defects & Their Fault Models

Physical Defect	Fault Model	Process Node Risk	Detection Method
Metal open (broken wire)	SA0 or SA1 (depends on how the floating line settles)	All nodes; vias most vulnerable	Stuck-at ATPG
Via void / unfilled via	SA0 or Transition Fault (resistive)	Advanced nodes (7nm and below)	ATPG + at-speed TF
Metal-to-metal short	Bridging Fault	Increases at advanced nodes (tight pitch)	Bridging ATPG
Gate oxide defect	SA0 / SA1 (transistor always on/off)	All nodes	Stuck-at ATPG
Resistive contact	Transition Fault (slow rise/fall)	All nodes	At-speed TF ATPG
Particle contamination	Bridging or SA	All nodes; worse at smaller nodes	ATPG
CMP over-polishing	SA0 (metal thinned → open)	7nm and below	Stuck-at ATPG

From Fault Coverage to DPPM

DPPM (Defective Parts Per Million) is the number of defective chips per million shipped that pass the test and reach customers. It's directly tied to fault coverage. A simple model:

DPPM_out ≈ (1 − Fault Coverage) × Defect Rate × 10⁶, where Defect Rate is the fraction of parts that are defective before test

Example:
Fault Coverage = 99.0% = 0.990
Defect Rate = 0.005 (5,000 DPPM gross, before test)
DPPM_out = (1 − 0.990) × 0.005 × 10⁶ = 0.01 × 5,000 = 50 DPPM

Going from 99.0% to 99.9% coverage:
DPPM_out = (1 − 0.999) × 0.005 × 10⁶ = 0.001 × 5,000 = 5 DPPM
→ 10× improvement in outgoing quality for 0.9 percentage points more coverage
→ This is why every 0.1% of coverage matters at production volume
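The same linear model as a Python sketch, with the inverse question added: what coverage does a target DPPM require? Both functions assume the simple (1 − FC) escape model above, with the defect rate given as a fraction of parts; the function names are illustrative.

```python
def outgoing_dppm(fault_coverage, defect_rate):
    """Escapes per million under the linear model: (1 - FC) * defect_rate * 1e6."""
    return (1.0 - fault_coverage) * defect_rate * 1e6

def required_coverage(target_dppm, defect_rate):
    """Invert the same model: coverage needed to reach a target outgoing DPPM."""
    return 1.0 - target_dppm / (defect_rate * 1e6)

print(outgoing_dppm(0.990, 0.005))   # about 50
print(outgoing_dppm(0.999, 0.005))   # about 5
print(required_coverage(10, 0.005))  # about 0.998
```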

For automotive and safety-critical applications (ISO 26262), DPPM targets are often <10 DPPM — requiring fault coverage above 99.9% with additional diagnostic coverage requirements.

Day 1 — Interview Questions

Q1: What is the difference between a defect and a fault in VLSI testing?

A defect is a physical imperfection introduced during manufacturing — for example, a broken metal wire, an unfilled via, or a particle creating a short between two wires. A fault is the logical abstraction of that defect's effect on circuit behavior — a stuck-at-1 fault models a wire shorted to VDD. DFT engineers work with fault models because directly simulating billions of physical defect sites is computationally intractable. One physical defect may cause multiple faults; one fault may model multiple defects.

Q2: What is a stuck-at fault, and what physical defect does it model?

A stuck-at fault assumes a net in the circuit is permanently fixed at logic 0 (SA0) or logic 1 (SA1), regardless of what the driving gate outputs. SA1 models a wire shorted to VDD (power). SA0 models a wire shorted to GND. An open metal wire leaves the downstream net undriven, and it may float toward either logic value depending on coupling and leakage, so opens are approximated by SA0 or SA1. The stuck-at model is the most widely used because it is simple to analyze, correlates well with real open/short defects, and >99% SA fault coverage is the standard production test requirement.

Q3: What are controllability and observability? Why do they matter for DFT?

Controllability is the ability to set a specific logic value (0 or 1) on an internal net by applying inputs at primary inputs or scan inputs. Observability is the ability to propagate a logic value from an internal net to a primary output where it can be measured. A fault can only be detected if: (1) you can activate it — drive the net to the value opposite to the stuck value (requires controllability), and (2) you can propagate the error to an observable output (requires observability). DFT adds scan chains to make every flip-flop directly controllable (shift in a value) and observable (shift out), solving the fundamental testability problem.

Q4: What is fault coverage, and how is it calculated?

Fault coverage is the percentage of modeled faults detected by the test patterns: FC = (Detected Faults / Total Testable Faults) × 100%. Total testable faults excludes structurally untestable faults (ATPG-untestable, tied nets, redundant faults). Industry targets are typically >99% for stuck-at faults and >95% for transition faults. Higher coverage correlates directly with lower DPPM: going from 99.0% to 99.9% SA coverage typically reduces outgoing DPPM by 10×.

Q5: What is the difference between a transition fault and a stuck-at fault?

A stuck-at fault models a net permanently fixed at 0 or 1: a DC (static) defect. A transition fault models a net that fails to switch within the required time: a dynamic (timing) defect. Transition faults model resistive defects like partially-formed vias: a 0→1 transition may happen, but too slowly, causing a setup-time violation when tested at speed. Stuck-at tests can be applied with a slow test clock because they only check static logic values. Transition fault tests must be applied at-speed (at the actual functional clock frequency) to detect timing failures, which requires at-speed ATPG using launch-on-capture (LOC) or launch-on-shift (LOS) patterns.

Q6: What is an untestable fault? Give two examples.

An untestable fault is one that cannot be detected by any test vector because of the circuit's structure, not because of insufficient test patterns. (1) Redundant logic: if a fault never changes the circuit's output for any input, it cannot be detected. Example: Y = A AND A, with both gate inputs driven by net A. A SA1 on either input branch gives Y = 1 AND A = A, identical to the fault-free output, so it is untestable. (A SA0 on one branch is detectable: A = 1 gives 1 in the good circuit and 0 in the faulty one.) (2) Tied nets: if a net is permanently tied to VDD or GND by design (e.g., a constant-1 enable), the stuck-at fault matching the tied value is untestable because the net never changes. Untestable faults are excluded from the denominator in fault coverage calculations.
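A quick exhaustive check of the redundant-logic example in Python, assuming both AND inputs are fanout branches of net A and the fault sits on one branch. Because the circuit has a single input, trying A = 0 and A = 1 proves or disproves detectability outright.

```python
def y_and_aa(a, fault=None):
    """Y = A AND A; both gate inputs are fanout branches of net A.
    fault is an optional (input_index, stuck_value) on one branch."""
    pins = [a, a]
    if fault is not None:
        pins[fault[0]] = fault[1]
    return pins[0] & pins[1]

def detectable(fault):
    """True if some value of A makes the faulty output differ from the good one."""
    return any(y_and_aa(a) != y_and_aa(a, fault) for a in (0, 1))

for pin in (0, 1):
    for value in (0, 1):
        print(f"input {pin} SA{value}: detectable = {detectable((pin, value))}")
# SA1 on either branch: never detectable (1 AND A = A)
# SA0 on either branch: detected by A = 1
```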
