DFT Day 10 — DFT Sign-off: Coverage Targets, DPPM, ATE & Tester Formats

Q: What stuck-at fault coverage is required for production test?

The industry-wide minimum for production test is 99% stuck-at fault coverage. General-purpose consumer chips typically achieve 99.0–99.3%. High-reliability applications such as automotive (ISO 26262 ASIL-D) and medical electronics require 99.5% or higher. Some tier-1 semiconductor companies mandate 99.8% as an internal target. Coverage below 99% is considered insufficient because each missing percentage point translates to a meaningful increase in defective parts reaching customers (DPPM). The coverage figure reported by the tool is: FC = (Detected Faults) / (Total Testable Faults), where untestable faults (due to structural impossibility) are excluded from the denominator.

Q: How do you calculate DPPM from fault coverage?

DPPM (Defective Parts Per Million) depends on both the fault coverage and the defect density of the wafer process. The basic formula is: DPPM = (1 - FC) × Defect_Rate × 1,000,000. For example, with 99% stuck-at coverage (FC = 0.99) and a 0.5% die defect rate: DPPM = (1 - 0.99) × 0.005 × 1,000,000 = 50 DPPM. Improving coverage to 99.5% cuts this to 25 DPPM. Improving to 99.9% cuts it to 5 DPPM. Real DPPM models are more complex and use Williams-Brown yield equations that account for defect clustering, multiple fault models, and test escape probability per defect type. Automotive typically requires DPPM below 1.
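As a rough illustration of the more detailed models mentioned above, the sketch below evaluates the commonly cited Williams-Brown defect-level relation, DL = 1 − Y^(1 − FC), where Y is process yield. The function name and the mapping of a 0.5% defect rate to a 99.5% yield are assumptions made for this example, not part of the original text.

```python
def williams_brown_dppm(process_yield: float, fault_coverage: float) -> float:
    """Defect level DL = 1 - Y**(1 - FC), returned in parts per million.

    process_yield  -- fraction of defect-free dies (assumed input, 0 < Y <= 1)
    fault_coverage -- fault coverage as a fraction (0 <= FC <= 1)
    """
    if not (0.0 < process_yield <= 1.0):
        raise ValueError("process_yield must be in (0, 1]")
    if not (0.0 <= fault_coverage <= 1.0):
        raise ValueError("fault_coverage must be in [0, 1]")
    defect_level = 1.0 - process_yield ** (1.0 - fault_coverage)
    return defect_level * 1_000_000


# Assumption: a 0.5% die defect rate is treated as a 99.5% yield.
for fc in (0.99, 0.995, 0.999):
    print(f"FC={fc:.3f}  Williams-Brown DPPM ~ {williams_brown_dppm(0.995, fc):.1f}")
```

At high yield and high coverage this model gives values very close to the basic linear formula, which is why the simple estimate is often good enough for a first pass.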

Q: What is a test escape and how do you minimize them?

A test escape is a defective die that passes the production test and is shipped to a customer. Test escapes occur when: the defect is not covered by any pattern in the test set (uncovered fault), the fault model does not capture the actual defect mechanism (model mismatch), or the pattern targets the right fault but a test condition (timing, voltage) prevents reliable detection. Minimizing escapes requires: maximizing stuck-at and transition fault coverage, adding multiple fault models (bridging, path delay), using at-speed testing to catch delay defects, applying IDDQ testing where background leakage still allows a usable current threshold, and using adaptive test escape analysis tools that correlate field returns with test data. Physical failure analysis of escaped parts provides feedback to close the gap.

Q: What is STIL and why is it important?

STIL (Standard Test Interface Language, IEEE 1450) is a portable, ASCII-based language for describing test patterns, timing, and waveform definitions in a tester-independent format. ATPG tools output STIL as an intermediate format; tester-specific translators then convert STIL to ATE-native formats (Advantest .avc or .wvf, for example). STIL is important because it decouples the test pattern generation step from the ATE target: a design team can generate STIL patterns without knowing which ATE the fab will use. It also enables pattern reuse: STIL patterns can be applied to different ATE platforms by re-running the conversion. The STIL format includes signal definitions, timing waveforms, scan chain specifications, and vector data.

Q: How does test time affect chip production cost?

ATE time is billed in seconds per die. The loaded cost of a production ATE system (Teradyne Ultraflex, Advantest T2000) when amortized over its lifetime, including facilities and operators, is typically $200–500 per hour, or about $0.056–0.14 per second. A chip requiring 500 ms of test time costs $0.028–0.07 per die in ATE cost alone. For a chip sold at $2, this is a significant fraction of the COGS. Scan compression (EDT) reduces test time roughly in proportion to the compression ratio: in the simplified model used here, a 64× compression ratio on a 10,000-pattern set is equivalent to an effective pattern count of ~156, cutting scan test time by about 64×. Faster ATE clocks, shorter chains, and higher compression all reduce test time and directly improve chip margin.

What is DFT Sign-off?

DFT sign-off is the formal gate that a chip must pass before its test strategy is approved for production. It is analogous to timing sign-off (STA) in the physical design flow: just as no chip ships with unclosed timing violations, no chip ships without meeting its DFT coverage and quality targets. The DFT sign-off review is conducted by the DFT team, reviewed by the design team lead, and approved by the test engineering manager before the design is released to the foundry for tape-out.

Sign-off answers a deceptively simple question: "If we run this test on the ATE, will we reliably find all meaningful defects while not incorrectly rejecting good dies?" Answering it requires demonstrating adequate fault coverage, acceptable pattern count and test time, power compliance during test, ATE pattern readiness, and a clear plan for handling field returns.

Min Stuck-At Coverage (industry): 99%

DPPM Target, Automotive: <1

Typical Total Test Time per Die: 50 ms

Sign-off = Manufacturing Quality Gate

DFT sign-off is not a formality: it is the binding quality commitment before silicon investment. Falling from 99% to 98% coverage doubles the undetected-fault fraction (1 − FC), which doubles the estimated DPPM shipped to customers and can trigger expensive field recalls.

Fault Coverage Metrics

Fault coverage is the primary DFT metric. It quantifies what fraction of all modeled fault sites are detected by the test pattern set. Different fault models capture different physical defect mechanisms, so coverage is reported separately for each model.

Fault Coverage (FC) = Detected Faults / Total Testable Faults × 100%
Detected Faults = faults where at least one pattern produces a wrong output
Testable Faults = Total Faults − Untestable Faults (structural impossibility)
Untestable Faults = faults that cannot be detected regardless of input patterns

The distinction between "untestable" and "undetected" is critical for sign-off. Untestable faults (reported under tool-specific class names, for example unused, tied, blocked, or redundant faults) are excluded from the denominator because no pattern can detect them. Undetected faults (faults the tool could detect with more patterns or time) remain in the denominator and drag down coverage. Sign-off engineers must justify every untestable fault with an analysis showing it cannot correspond to a real defect. Naming also varies between tools: some ATPG reports call the detected/testable ratio "test coverage" and reserve "fault coverage" for detected/total, so confirm which definition a report uses before comparing it to a target.
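To make the effect of fault classification concrete, here is a minimal sketch that computes coverage as detected / testable and shows how the result moves when faults are reclassified from undetected to untestable. The fault counts are hypothetical and chosen only for illustration.

```python
def fault_coverage_pct(total: int, detected: int, untestable: int) -> float:
    """Coverage in percent, with untestable faults removed from the denominator."""
    testable = total - untestable
    if testable <= 0:
        raise ValueError("no testable faults left")
    if detected > testable:
        raise ValueError("detected faults cannot exceed testable faults")
    return 100.0 * detected / testable


# Hypothetical design: 1,000,000 faults, 985,000 detected.
TOTAL, DETECTED = 1_000_000, 985_000

# Case 1: 5,000 faults are justified as untestable, 10,000 remain undetected.
justified = fault_coverage_pct(TOTAL, DETECTED, untestable=5_000)

# Case 2: another 5,000 undetected faults are (optimistically) moved to untestable.
reclassified = fault_coverage_pct(TOTAL, DETECTED, untestable=10_000)

print(f"justified untestables only : {justified:.2f}%")
print(f"after reclassification     : {reclassified:.2f}%")
```

With these numbers the same pattern set sits just below a 99% threshold in the first case and above it in the second, which is why every untestable classification needs its own justification.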

Fault Type	Coverage Formula	Industry Target	Tool Report Name
Stuck-At (SA)	SA_detected / SA_testable	>99% (99.5%+ tier-1)	Stuck-at fault coverage
Transition Fault (TF)	TF_detected / TF_testable	92–96%	Transition fault coverage (at-speed)
Path Delay (PD)	PD_detected / PD_testable	>90% critical paths	Path delay coverage
Bridging	BR_detected / BR_testable	>85% (where modeled)	Bridging fault coverage
Cell Aware (CA)	CA_detected / CA_testable	>95% (advanced nodes)	Cell-aware coverage

For advanced nodes (7 nm and below), cell-aware ATPG is increasingly required. Unlike stuck-at or transition fault models that treat each gate as a black box, cell-aware ATPG uses a detailed transistor-level fault model for each cell in the library. This catches intra-cell defects (broken transistors inside a complex gate) that traditional models miss. The cell-aware fault models are characterized once per library from the cells' transistor-level netlists and are typically delivered alongside the cell library; the ATPG tool then generates the patterns for each design against those models.

DPPM — Defective Parts Per Million

DPPM (Defective Parts Per Million) is the quality metric that customers care about. It measures how many defective chips escape the test and reach the customer's hands, expressed per million chips shipped. DPPM directly correlates with warranty costs, field reliability, and customer trust.

DPPM Calculation

Basic DPPM formula:
DPPM = (1 − FC) × Defect_Rate × 1,000,000

Where:
FC = Fault Coverage (e.g., 0.99 for 99%)
Defect_Rate = fraction of manufactured dies with at least one defect

Example (99% coverage, 0.5% defect rate):
DPPM = (1 − 0.99) × 0.005 × 1,000,000
     = 0.01 × 0.005 × 1,000,000
     = 50 DPPM
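The basic formula can also be inverted to ask what coverage a DPPM target implies. The sketch below does both; the function names are illustrative, and it assumes the simple linear model only, with no clustering or multi-model weighting.

```python
def basic_dppm(fault_coverage: float, defect_rate: float) -> float:
    """Estimated escapes per million parts: (1 - FC) * defect_rate * 1e6."""
    if not (0.0 <= fault_coverage <= 1.0 and 0.0 <= defect_rate <= 1.0):
        raise ValueError("fault_coverage and defect_rate must be fractions in [0, 1]")
    return (1.0 - fault_coverage) * defect_rate * 1_000_000


def required_coverage(target_dppm: float, defect_rate: float) -> float:
    """Smallest FC (as a fraction) for which basic_dppm() meets target_dppm."""
    if defect_rate <= 0.0:
        raise ValueError("defect_rate must be positive")
    return max(0.0, 1.0 - target_dppm / (defect_rate * 1_000_000))


print(basic_dppm(0.99, 0.005))                   # the worked example: about 50 DPPM
print(f"{required_coverage(1.0, 0.005):.4%}")    # coverage implied by a 1 DPPM target
```

Under this simple model and a 0.5% defect rate, a 1 DPPM target implies roughly 99.98% coverage, which suggests why very low DPPM targets are usually met with additional fault models and screens rather than stuck-at coverage alone.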

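Under this basic formula, DPPM is proportional to (1 − FC), so what matters is how much of the remaining uncovered fraction each improvement removes. Going from 99% to 99.5% halves the DPPM (50 to 25 in the example above). Going from 99.5% to 99.9% cuts it by a further factor of five (25 to 5). This is why tier-1 companies chase 99.5%+ coverage aggressively: near 100%, each additional tenth of a percent removes a growing share of the remaining escapes.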

Figure: Fault Coverage vs DPPM (DPPM falls in proportion to 1 − FC as coverage approaches 100%)

Automotive chips (ISO 26262 ASIL-D) demand DPPM below 1, which requires not only 99.9%+ stuck-at coverage but also high transition fault coverage, cell-aware patterns, IDDQ screening, and often burn-in test. Consumer chips can typically accept 10–100 DPPM depending on the application and price point.

Test Escape Analysis

A test escape is a defective die that passes production test and is shipped to a customer. Test escapes are the worst possible DFT outcome: they mean real defects reached the field, often causing system failures long after sale.

Root Causes of Test Escapes

Uncovered faults: The defect corresponds to a fault that is in the "undetected" category — the ATPG tool ran out of time or patterns before covering it.
Fault model mismatch: The physical defect is a resistive open or a partial bridge — it behaves differently from a clean stuck-at-0 or stuck-at-1 model. The stuck-at patterns cannot reliably detect it.
Pattern starvation near 99% boundary: The last 0.5–1% of faults often require many more patterns than the first 98%. If the ATPG run was time-limited, this long tail may be underserved.
ATE translation errors: The STIL-to-ATE format conversion introduced timing or logic errors in the patterns. The patterns work in simulation but fail on silicon.
Marginal defects at parametric limits: A borderline defect does not affect DC behavior but causes failures at speed or at low temperature. Stuck-at patterns, which are applied at slow scan speed rather than at functional speed, do not find it; only at-speed transition or path-delay patterns do.

Escape Analysis Process

When a field return is analyzed (physical failure analysis, or PFA), the defect location and type are fed back to the DFT team. The team asks: "Is this defect covered by our pattern set?" If not, the pattern set is augmented. This feedback loop is called test escape analysis and is a key input to the DFT roadmap for the next tape-out.

ATPG Efficiency Metrics

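Fault coverage alone does not define a good test strategy. A test with 99.5% coverage but 100,000 uncompressed patterns is impractical: with 2,000-FF chains and a 100 MHz shift clock it would take about 2 seconds per die on ATE, far beyond a typical test-time budget, and it could exceed tester pattern memory. DFT sign-off also requires demonstrating that the pattern set is efficient: coverage achieved with minimal patterns in minimal time.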

Metric	Definition	Typical Target	Impact if Missed
Pattern Count (SA)	Number of stuck-at test patterns	<5,000 (compressed)	Test time too long; ATE memory exceeded
Pattern Count (TF)	Number of transition fault patterns	<10,000 (compressed)	At-speed test time balloons
Compression Ratio	Effective patterns / ATE patterns (EDT)	32×–128×	ATE memory / test time limit
ATPG CPU Time	Wall-clock time to generate patterns	<24 hours	Delays tape-out schedule
Fault Simulation Accuracy	Simulation-vs-silicon coverage correlation	>98% match	Coverage claims are unreliable
Untestable % (SA)	Fraction of SA faults classified untestable	<2%	Coverage ceiling too low; must justify each

Typical ATPG Sign-off Report — TetraMAX Summary

# ── Stuck-At Fault Coverage Report ──────────────────────────
Total Faults         : 4,823,412
Detected Faults      : 4,774,178
Untestable Faults    : 43,110  (0.89%)
Undetected Faults    : 6,124   (0.13%)
Fault Coverage       : 99.87%  (detected / testable)
Test Coverage        : 98.98%  (detected / total)

# ── Pattern Metrics ──────────────────────────────────────────
SA Patterns          : 3,847  (EDT compressed, 64x)
TF Patterns (LOC)    : 6,211  (at-speed, 128x)
Scan Chains          : 128    (EDT channels: 2)
Chain Length (max)   : 1,924  FFs

# ── Test Time Estimate ───────────────────────────────────────
SA Test Time         : 74 ms  (at 100 MHz shift, 64x compression)
TF Test Time         : 121 ms (at-speed capture, 128x compression)
Total Estimated      : 195 ms (with overhead)

# ── Sign-off Status ──────────────────────────────────────────
PASS: SA coverage >99% threshold   [99.87% PASS]
PASS: TF coverage >92% threshold   [94.12% PASS]
PASS: Pattern count <10,000        [SA:3847 TF:6211 PASS]
PASS: Test time <500ms budget      [195ms PASS]
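The pass/fail section of a report like this can be cross-checked with a few lines of script. The following is a minimal sketch, not a tool feature: it recomputes stuck-at coverage from the fault counts in the report and evaluates each threshold. The `Check` class and its field names are hypothetical; the numbers are copied from the report above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    value: float
    limit: float
    higher_is_better: bool

    def passed(self) -> bool:
        # Coverage must exceed its limit; counts and times must stay below it.
        if self.higher_is_better:
            return self.value > self.limit
        return self.value < self.limit


# Fault counts from the stuck-at section of the report.
total, detected, untestable = 4_823_412, 4_774_178, 43_110
sa_coverage = 100.0 * detected / (total - untestable)

checks = [
    Check("SA coverage (%)", sa_coverage, 99.0, True),
    Check("TF coverage (%)", 94.12, 92.0, True),
    Check("SA pattern count", 3_847, 10_000, False),
    Check("TF pattern count", 6_211, 10_000, False),
    Check("Test time (ms)", 195, 500, False),
]

for c in checks:
    print(f"{'PASS' if c.passed() else 'FAIL'}: {c.name:<18} {c.value:.2f} (limit {c.limit})")

signoff_ok = all(c.passed() for c in checks)
print("Sign-off gate:", "PASS" if signoff_ok else "FAIL")
```

Recomputing coverage from raw counts rather than trusting the printed percentage is a cheap way to catch a report whose denominator is not the one the sign-off criteria assume.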

ATE — Automatic Test Equipment

Once patterns are generated and sign-off metrics are satisfied, the test must be applied to real silicon on production ATE. Automatic Test Equipment is specialized hardware that sits at the heart of semiconductor manufacturing: virtually every production chip passes through an ATE before shipping.

What ATE Does

ATE performs three fundamental functions:

Apply test stimulus: Drive test vectors onto the chip's input pins with precise timing (picosecond-level edge placement), at speed (from about 100 MHz up to several GHz for high-speed interfaces).
Measure response: Capture the chip's output on every pin and compare it to the expected response (pass/fail decision).
Sort dies: Pass dies continue to packaging. Fail dies are marked (inked) on the wafer map. Bins classify fail modes (SA fail, TF fail, leakage fail, functional fail) for yield analysis.

ATE Platform	Vendor	Target Segment	Pin Count	Speed
Ultraflex / UltraFLEXplus	Teradyne	SoC, mobile, data center	Up to 1024	Up to 3.2 GHz
J750	Teradyne	Mixed-signal, MCU, consumer	Up to 1024	200 MHz digital
T2000	Advantest	Memory, SoC	Up to 2048	Up to 800 MHz
V93000 (SmarTest)	Advantest	High-speed digital, automotive	Up to 1024+	Up to 6.4 GHz

ATE Resources and Constraints

ATE resources directly constrain the test strategy. The key constraints are:

Pin count (channels): The chip's scan-in, scan-out, and control pins must fit within the ATE's available channels. Large SoCs may require multiple ATE channel cards.
Pattern memory: ATE stores test patterns in on-board memory. Uncompressed pattern sets for large SoCs can exceed ATE memory. EDT compression is essential to fit patterns within memory budgets (typically 64 MB to 512 MB per tester).
Timing accuracy: At-speed tests require ps-level edge placement accuracy to reliably apply and measure clock edges at 100 MHz–1 GHz.
Power supply channels: Multi-VDD chips require multiple independent supply channels with current monitoring (for IDDQ testing).

Tester Formats — WGL, STIL, VCD

ATPG tools store patterns in their own internal formats. To apply these patterns on ATE, they must be written out and converted into ATE-compatible formats. This translation step is usually called pattern conversion (or pattern translation) and is one of the final steps in DFT sign-off.

Test Pattern Format Conversion Flow

Format	Standard / Source	Description	Used By
STIL	IEEE 1450	Standard Test Interface Language — portable, ASCII, describes waveforms + vectors + scan chain topology	All ATE flows; intermediate standard
WGL	TSSI (de facto)	Waveform Generation Language: ASCII pattern format that originated at TSSI; written by ATPG tools such as TetraMAX and widely accepted by pattern converters	Teradyne J750, Ultraflex
AVC	Advantest (ASCII Vector)	Advantest-native ASCII format for T2000 and V93000 platforms	Advantest T2000, V93000
VCD	IEEE 1364	Value Change Dump — simulation dump format; sometimes used as pattern source but not a primary ATE format	Simulation, debug
CTL	IEEE 1450.6	Core Test Language: describes core-level patterns in hierarchical DFT (IP reuse)	Hierarchical DFT, IP integration

The conversion from STIL to ATE-native format is not simply a syntax translation. Timing must be re-specified in the ATE's timing resolution (typically in ps). Scan chain waveforms must be mapped to the ATE's available pin timing domains. Any ATE-unsupported features (e.g., complex multi-clock capture) must be re-implemented in the ATE's test program language. This step is validated by running the ATE patterns in an ATE simulator and comparing responses against the ATPG simulation.

Test Time on ATE

ATE time is a direct manufacturing cost. Every second a die sits under the tester probes adds to its cost of goods sold (COGS). For high-volume chips (hundreds of millions per year), even 10 ms of wasted test time translates to hundreds of thousands of dollars annually at typical ATE rates.

Test Time per Die = Patterns × (Chain_Length + 1) / Shift_Frequency

Example:
Patterns = 5,000
Chain Length = 2,000 FFs
Shift Freq = 100 MHz
Raw Test Time = 5,000 × (2,000 + 1) / 100,000,000
              = 5,000 × 2,001 / 100,000,000
              = 100 ms per die

With EDT 64× compression:
Effective ATE patterns = 5,000 / 64 ≈ 78
Compressed Test Time = 78 × 2,001 / 100,000,000 ≈ 1.56 ms

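This example shows why EDT compression is economically critical. A 64× compression ratio reduces scan test time from 100 ms to 1.56 ms, a 64× reduction in ATE cost for that portion of the test. In a real EDT flow the saving comes from shorter internal scan chains (fewer shift cycles per pattern) rather than from fewer patterns, but the total shift-cycle count scales the same way in this simplified model. At an ATE operating cost of $300/hour ($0.083/second), the difference between compressed and uncompressed is: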

Uncompressed: 100 ms × $0.083/s = $0.0083 per die (ATE fraction)
Compressed: 1.56 ms × $0.083/s = $0.00013 per die

At 100 million dies per year, this is an $830,000 versus $13,000 annual ATE cost: an $817,000 saving from compression alone.
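The arithmetic above is easy to reproduce for other pattern counts, chain lengths, or ATE rates. The sketch below follows the same simplified model as the text (patterns × (chain length + 1) shift cycles, with compression modeled as a reduced effective pattern count); the function names are illustrative.

```python
def scan_test_time_s(patterns: int, chain_length: int, shift_hz: float) -> float:
    """Scan test time in seconds: patterns * (chain_length + 1) / shift frequency."""
    if shift_hz <= 0:
        raise ValueError("shift_hz must be positive")
    return patterns * (chain_length + 1) / shift_hz


def ate_cost_per_die(test_time_s: float, ate_cost_per_hour: float) -> float:
    """ATE cost in dollars for one die at a given hourly tester rate."""
    return test_time_s * ate_cost_per_hour / 3600.0


SHIFT_HZ = 100e6          # 100 MHz shift clock
RATE_PER_HOUR = 300.0     # $300/hour, as in the text
DIES_PER_YEAR = 100_000_000

raw_s = scan_test_time_s(5_000, 2_000, SHIFT_HZ)
compressed_s = scan_test_time_s(78, 2_000, SHIFT_HZ)   # 5,000 / 64, rounded as in the text

annual_saving = (
    ate_cost_per_die(raw_s, RATE_PER_HOUR) - ate_cost_per_die(compressed_s, RATE_PER_HOUR)
) * DIES_PER_YEAR

print(f"raw        : {raw_s * 1e3:.2f} ms")
print(f"compressed : {compressed_s * 1e3:.2f} ms")
print(f"annual saving: ${annual_saving:,.0f}")
```

Without the intermediate rounding used in the text, the annual saving comes out at roughly $820,000, in line with the $817,000 figure above.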

DFT Sign-off Checklist

The following is the standard DFT sign-off checklist used at tape-out. Every item must be verified and documented before the design is released to the foundry.

✅ Stuck-at fault coverage > 99.0%: Verified by ATPG fault report. All untestable faults justified with DRC analysis.

✅ Transition fault coverage > 92%: At-speed LOC/LOS patterns generated and fault simulated. Coverage report attached.

✅ All scan chains verified (scan_verify clean): scan_verify simulation passes with no chain connectivity errors, no missing FFs, no extra FFs.

✅ No X-sources untreated: All X-sources in the design are either blocked in test mode, masked via X-masking, or covered by EDT X-tolerance. Zero unexpected X values in scan_out simulation.

✅ Power during test within limits: Toggle rate analysis shows peak shift toggle rate below 20% (or design-specific limit). Power-aware ATPG constraints applied and verified.

✅ ATPG patterns validated on the final netlist: Patterns generated on the sign-off (post-layout, LVS-clean) netlist. Gate-level simulation of the patterns shows no mismatches against the ATPG expected responses.

✅ ATE format generated and validated: STIL patterns converted to target ATE format (WGL / AVC). ATE simulation (pattern simulation on tester model) passes.

✅ Test time within ATE slot budget: Total estimated test time per die is below the agreed ATE slot time (typically 200–500 ms for SoC test). EDT compression ratio verified.

✅ Failure logging format agreed with manufacturing: Wafer map format, bin definitions (SA fail, TF fail, functional fail), and data logging format agreed with manufacturing and test engineering. Fail-log analysis setup ready for first silicon debug.

✅ DPPM estimate meets product target: DPPM calculated from coverage and process defect rate. For automotive: <1 DPPM (with cell-aware and additional screens). For consumer: <100 DPPM.

Sign-off Complete: When all checklist items pass, the DFT sign-off document is signed by the DFT lead, verified by the design manager, and archived with the tape-out package. This document is also required for post-silicon debug — it records exactly what was tested and how.

Interview FAQ — DFT Sign-off

What stuck-at fault coverage is required for production test?

The industry-wide minimum for production test is 99% stuck-at fault coverage. Consumer chips typically achieve 99.0–99.3%. High-reliability applications (automotive ISO 26262 ASIL-D, medical, aerospace) require 99.5% or higher. Some tier-1 companies set an internal target of 99.8%. The formula is: FC = Detected_Faults / Testable_Faults × 100%. Untestable faults (structurally impossible to detect) are excluded from the denominator but must be individually justified through DRC analysis to confirm they cannot correspond to real manufacturing defects. Coverage below 99% fails DFT sign-off and must be resolved before tape-out.

How do you calculate DPPM from fault coverage?

The basic formula is: DPPM = (1 − FC) × Defect_Rate × 1,000,000. For example, with 99% stuck-at coverage (FC = 0.99) and a 0.5% defect rate: DPPM = 0.01 × 0.005 × 1,000,000 = 50 DPPM. At 99.5% coverage: DPPM = 0.005 × 0.005 × 1,000,000 = 25 DPPM. At 99.9%: 5 DPPM. The improvement is linear in (1−FC): each 0.1% of added coverage removes the same absolute amount (5 DPPM in this example), and halving the uncovered fraction halves the DPPM. Real DPPM calculations use more sophisticated models (the Williams-Brown defect-level model, defect clustering factors, multi-model coverage weighting) but the basic intuition holds: every fraction of a percent of coverage matters enormously at high volumes.

What is a test escape and how do you minimize them?

A test escape is a defective die that passes production test and reaches the customer. Causes include: uncovered faults (ATPG did not generate patterns for them), fault model mismatch (defect behaves differently from the model), marginal defects that only manifest at speed or temperature, and ATE translation errors. Minimization strategies: maximize stuck-at and transition fault coverage, add cell-aware ATPG for advanced nodes, use at-speed testing (LOC/LOS) to catch delay defects, apply IDDQ screening for resistive defects, perform burn-in for early-life failures, and implement feedback loops from field returns to update the test set. The most important ongoing action is physical failure analysis of field returns to identify which defect types are escaping and why.

What is STIL and why is it important?

STIL (Standard Test Interface Language, IEEE 1450) is a portable, ASCII-based language for describing test patterns, timing waveforms, and scan chain specifications in a tester-independent format. It is the standard interchange format between ATPG tools and ATE platforms. ATPG tools (TetraMAX, Tessent) output STIL, and often WGL as well; ATE-specific translators then convert these files to the tester's own pattern format (AVC/WVF for Advantest, for example). STIL is important because it decouples pattern generation from the ATE target: the design team can generate and validate patterns without knowing which specific ATE the fab will use. It also enables pattern reuse across multiple ATE platforms and test generations. STIL files include signal definitions, timing domains, waveform shapes, and the complete test vector data.

How does test time affect chip production cost?

ATE time is billed per second, typically at $200–500 per hour ($0.056–0.14 per second) for high-end SoC testers. A chip needing 500 ms of test time costs $0.028–$0.07 per die in ATE cost. For a $3 chip at 100 million annual volume, test cost can be 5–15% of COGS. EDT scan compression reduces test time roughly in proportion to the compression ratio: in the simplified model, 64× compression on a 10,000-pattern set is equivalent to ~156 effective patterns and cuts scan test time by about 64×. Faster ATE shift clocks (200 MHz vs 100 MHz) halve shift time. Multi-site testing, where several dies are tested in parallel in one insertion, is the most powerful lever: testing 4 dies at once divides the per-die ATE cost by close to 4. All of these are DFT sign-off considerations because an excessively long test time can make a product economically unviable, even if technically correct.
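A quick way to compare these levers is to compute per-die ATE cost for different site counts. This is a minimal sketch under stated assumptions: the `parallel_efficiency` parameter is a hypothetical knob for the overhead that keeps N-site testing from being exactly N times cheaper, and it is not taken from the text.

```python
def ate_cost_per_die(
    test_time_s: float,
    ate_cost_per_hour: float,
    sites: int = 1,
    parallel_efficiency: float = 1.0,
) -> float:
    """Per-die ATE cost in dollars when `sites` dies share one insertion.

    parallel_efficiency is an assumed factor in (0, 1]; 1.0 means ideal scaling.
    """
    if sites < 1:
        raise ValueError("sites must be at least 1")
    if not (0.0 < parallel_efficiency <= 1.0):
        raise ValueError("parallel_efficiency must be in (0, 1]")
    cost_per_insertion = test_time_s * ate_cost_per_hour / 3600.0
    return cost_per_insertion / (sites * parallel_efficiency)


# 500 ms test at the low and high ends of the quoted hourly range.
single_low = ate_cost_per_die(0.5, 200.0)
single_high = ate_cost_per_die(0.5, 500.0)
quad_high = ate_cost_per_die(0.5, 500.0, sites=4)

print(f"single-site: ${single_low:.3f} to ${single_high:.3f} per die")
print(f"4-site at $500/h (ideal scaling): ${quad_high:.4f} per die")
```

Setting `parallel_efficiency` below 1.0 gives a more conservative estimate for cases where shared tester resources or serial steps limit the benefit of adding sites.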
