Why Stuck-at Fault Testing Is Not Enough
For decades, the stuck-at fault model was the workhorse of digital test. It assumes a net is permanently stuck at logic 0 (stuck-at-0, SA0) or logic 1 (stuck-at-1, SA1), regardless of the circuit's intended drive. Stuck-at patterns can be generated and applied at slow scan frequency — the ATE shifts patterns in slowly and captures responses. Simple, effective, and cheap to apply.
But as process nodes shrink below 90 nm, a new class of manufacturing defect becomes dominant: delay defects. These are faults where the logic value is correct in steady state, but the signal fails to switch within the required clock period. The gate eventually reaches the right logic level — just not fast enough. At slow scan frequency, such a net looks perfectly healthy. At functional (at-speed) frequency, it causes a timing failure.
Common physical root causes of delay defects include:
- Resistive vias — partial opens in a contact or via increase RC delay of interconnects
- Weak shorts / bridging — partial shorts between nearby nets increase drive resistance
- Thin-oxide gate leakage — defective gate oxide alters transistor drive strength
- Fin variation — in FinFET processes, fin width variation changes drive current
- Local interconnect resistance — at N3/N2, BEOL RC variation between adjacent cells is significant
Stuck-at testing detects permanent faults (wrong logic value). At-speed testing detects timing faults (correct value, wrong timing). Both are required for comprehensive manufacturing test coverage at modern process nodes.
A chip that passes all stuck-at tests but has a resistive via on a critical path may work perfectly at room temperature and nominal voltage — but fail at the operating corner (low-voltage, high-temperature). This is the classic test escape that causes field returns. At-speed testing with the transition fault model closes this gap.
The Transition Fault Model
The transition fault model is the industry-standard dynamic fault model. It extends stuck-at by adding timing awareness. Two fault types exist per net:
| Fault Type | Abbreviation | Transition Failed | Physical Interpretation |
|---|---|---|---|
| Slow-to-Rise | STR | 0 → 1 too slow | Net fails to rise from low to high within the clock period; drive strength too weak or load too large |
| Slow-to-Fall | STF | 1 → 0 too slow | Net fails to fall from high to low within the clock period; PMOS leakage, resistive NMOS, or capacitive loading |
For a design with N nets, the transition fault universe contains 2N faults (one STR + one STF per net) — the same count as stuck-at faults. However, detection conditions differ significantly: to detect a transition fault on a net, you must cause the net to switch in the failing direction at functional frequency and observe the result downstream.
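As a small illustration of these detection conditions, the sketch below enumerates the two-vector tests that detect a slow-to-rise fault on one internal net. The toy circuit `y = (a AND b) OR c`, the net name `n`, and the function names are assumptions made for this example; they do not come from any ATPG tool.

```python
from itertools import product

def evaluate(a, b, c, n_forced=None):
    """Toy circuit: n = a AND b, y = n OR c. n_forced overrides net n."""
    n = (a & b) if n_forced is None else n_forced
    return n, n | c

def detects_slow_to_rise(v1, v2):
    """True if the vector pair (v1, v2) detects a slow-to-rise fault on n."""
    n_before, _ = evaluate(*v1)
    n_after, y_good = evaluate(*v2)
    if (n_before, n_after) != (0, 1):
        return False  # no 0 -> 1 transition is launched on n
    # With the fault present, n is still 0 when the capture edge arrives.
    _, y_faulty = evaluate(*v2, n_forced=0)
    return y_good != y_faulty  # the late value must be visible at the output

vectors = list(product((0, 1), repeat=3))
tests = [(v1, v2) for v1 in vectors for v2 in vectors
         if detects_slow_to_rise(v1, v2)]
print(len(tests), "detecting pairs; capture vectors:", {v2 for _, v2 in tests})
```

Every detecting pair ends in the same second vector (a=1, b=1, c=0), which is exactly a stuck-at-0 test for `n`; the first vector only has to initialise `n` to 0. In this sense a transition test is an initialisation vector followed by a stuck-at test applied at speed.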
How Transition Faults Differ from Stuck-at
| Property | Stuck-at Fault | Transition Fault |
|---|---|---|
| Nature | Static — logic value is wrong | Dynamic — logic value eventually correct, timing wrong |
| Visible at scan frequency? | Yes, once a detecting pattern is applied | No, only at or near functional frequency |
| Test application speed | Slow (shift frequency) | Must be at or near functional clock rate |
| Fault count (N nets) | 2N | 2N |
| Detection requirement | Drive net to the opposite of the stuck value, propagate, observe | Force net to switch in failing direction at speed, propagate, observe |
| Physical defects targeted | Shorts, opens (complete) | Partial opens, resistive vias, weak drives, delay path defects |
At-Speed Test Requirements
Transition fault testing imposes requirements that stuck-at testing does not. The fundamental constraint: the capture clock must run at functional frequency. A resistive via that adds 200 ps of extra delay on a 1 GHz path (1 ns period) will not be caught if the test clock runs at 100 MHz (10 ns period) — the slow clock gives the defective net plenty of time to settle.
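The arithmetic can be checked with a short sketch. The 900 ps defect-free path delay is an assumed figure chosen for illustration; the 200 ps defect and the two clock rates come from the example above. Setup time and clock skew are ignored.

```python
def capture_is_correct(path_delay_ps, defect_delay_ps, clock_mhz):
    """True if the transition settles before the capture edge."""
    period_ps = 1e6 / clock_mhz  # a 1 MHz clock has a 1,000,000 ps period
    return path_delay_ps + defect_delay_ps <= period_ps

NOMINAL_PATH_PS = 900  # assumed defect-free delay of the 1 GHz path
DEFECT_PS = 200        # extra delay added by the resistive via

for clock_mhz in (100, 1000):
    ok = capture_is_correct(NOMINAL_PATH_PS, DEFECT_PS, clock_mhz)
    verdict = "passes (defect escapes)" if ok else "fails (defect detected)"
    print(f"{clock_mhz} MHz capture: {verdict}")
```

With these numbers the defective path needs 1100 ps, which fits easily in a 10 ns period but not in a 1 ns period, so only the at-speed capture exposes the defect.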
The ATE (Automatic Test Equipment) must therefore support at-speed operation: the ability to deliver precise clock edges at the design's functional frequency (e.g., 1 GHz, 2 GHz, or higher) while still shifting data in and out at slower scan rates. This requires high-speed driver circuitry, precise timing calibration, and low-jitter clock sources on the tester. Many designs relax this requirement by generating the at-speed pulses on chip, with a clock controller driven by the functional PLL.
The Two-Cycle At-Speed Test Framework
At-speed transition fault testing always follows a two-cycle protocol:
- Launch cycle: a clock edge (or shift operation) forces the target net to begin its transition (0→1 or 1→0). This initiates the switching event that the transition fault would prevent from completing in time.
- Capture cycle: one clock period later (at functional frequency), a second clock edge captures the logic value at downstream flip-flops. If the net completed its transition in time, the correct value is captured. If not (the defect slowed the transition), an incorrect value is captured, and the tester flags a mismatch when it compares the shifted-out response with the expected response computed by ATPG.
Between the two at-speed edges, the combinational cloud must propagate the transition from the launch flip-flop all the way to the capture flip-flop before the capture edge arrives. The time budget is one functional clock period, less the launch flip-flop's clock-to-Q delay and the capture flip-flop's setup time.
LOC: Launch on Capture
Launch-on-Capture (LOC), also called broadside testing, is the simpler and more widely used at-speed scheme. The scan chains are loaded at slow scan frequency, scan enable is de-asserted, and two functional-frequency clock pulses are then applied in capture mode. The first pulse launches the transition and the second captures the result.
LOC Operation Sequence
- Slow-speed scan shift: load initialisation values into all scan flip-flops, one shift cycle per flip-flop in the longest chain.
- Scan enable is de-asserted. This switch does not have to happen at speed; a slow dead cycle before the launch is acceptable.
- Launch cycle at functional frequency: the first high-speed clock edge captures the combinational logic's response to the loaded state. Wherever the new flip-flop value differs from the loaded one, a transition is launched into the combinational logic.
- Capture cycle at functional frequency: exactly one functional clock period later, a second high-speed clock edge fires. Downstream flip-flops capture the combinational logic output.
- Slow-speed scan shift-out: the captured values are shifted out and compared against the expected response.
In LOC, scan enable is the only test signal that changes around the at-speed cycles, and it can change slowly. The two at-speed pulses are commonly produced by an on-chip clock controller driven by the functional PLL, so the ATE itself only has to supply slow scan signals. This makes LOC compatible with most ATE platforms.
The drawback of LOC is limited launch-state controllability. The launch state is not loaded directly: it is whatever the functional logic computes from the loaded state. Some transition faults require a launch state that the logic cannot produce from any loadable state, and these faults are untestable under LOC. This reduces achievable coverage compared to LOS.
LOS: Launch on Shift
Launch-on-Shift (LOS), also called skewed-load testing, makes the scan load perform double duty: its last shift cycle also serves as the launch cycle. Scan enable then switches to capture mode, and a single capture pulse follows one functional clock period after the last shift. This gives greater flexibility in the launch state, because the ATPG tool sets it directly through the scan chain instead of depending on what the functional logic computes.
LOS Operation Sequence
- Slow-speed scan shift: load the pre-launch state into all scan flip-flops, stopping one shift short of the final state.
- Launch cycle (last shift): the final shift moves every scan flip-flop's value one position along the chain. Flip-flops whose value changes launch transitions into the combinational logic.
- Scan enable switches from shift to capture within one functional clock period.
- Capture cycle at functional frequency: exactly one functional clock period after the last shift edge, a high-speed clock edge captures the result.
- Slow-speed scan shift-out: compare captured values against expected.
LOS requires scan enable to switch between the last shift edge and the capture edge, that is, within a single functional clock period. Scan enable fans out to every scan flip-flop, so it has to be timed like a clock or pipelined locally, which is a more demanding requirement on both the design and the tester. The last-shift and capture edges must also be spaced at functional timing. Not all legacy testers support LOS, but modern high-speed testers (Advantest V93000, Teradyne UltraFLEX) handle it.
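The difference in launch-state control can be made concrete with a toy model. The sketch below uses the usual definitions (LOC, or broadside: the launch state is the logic's response to the loaded state; LOS, or skewed-load: the launch state is the loaded state shifted by one position). The three-flop design, its `functional_next_state` logic, and the scan order f0 → f1 → f2 are hypothetical and chosen only to show the effect.

```python
from itertools import product

def functional_next_state(state):
    """Hypothetical functional logic for a 3-flop design (f0, f1, f2)."""
    f0, f1, f2 = state
    return (f0 & f1, f1 | f2, f2)  # f2 simply holds its value

def loc_launch_pairs():
    """LOC: V2 is the logic's response to the loaded state V1."""
    return {(v1, functional_next_state(v1))
            for v1 in product((0, 1), repeat=3)}

def los_launch_pairs():
    """LOS: V2 is V1 shifted one position, with a new scan-in bit at f0."""
    return {(v1, (scan_in, v1[0], v1[1]))
            for v1 in product((0, 1), repeat=3) for scan_in in (0, 1)}

def can_launch(pairs, flop, before, after):
    """True if some (V1, V2) pair switches `flop` from `before` to `after`."""
    return any(v1[flop] == before and v2[flop] == after for v1, v2 in pairs)

for name, pairs in (("LOC", loc_launch_pairs()), ("LOS", los_launch_pairs())):
    print(name, "pairs:", len(pairs),
          "| f2 rise launchable:", can_launch(pairs, 2, 0, 1))
```

In this model LOC can never launch a rising transition at f2, because the functional logic holds f2 constant, while LOS can launch it by shifting a 1 in from f1. The same freedom means LOS may launch transitions that never occur in functional operation, which is one reason some flows treat LOS coverage gains with caution.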
LOC vs LOS Comparison
| Property | LOC (Launch-on-Capture) | LOS (Launch-on-Shift) |
|---|---|---|
| ATE requirement | Slow scan signals; two at-speed pulses, commonly from an on-chip clock controller | Last-shift and capture edges at functional spacing, plus at-speed scan enable |
| Launch state control | Limited: launch state is the logic's functional response to the loaded state | High: ATPG sets the launch state directly through the scan chain |
| Transition fault coverage | Good (typically 88–93%) | Better (typically 92–97%) |
| Pattern generation complexity | Higher: two-time-frame (sequential) ATPG | Lower: close to combinational ATPG |
| Scan-enable timing | Can switch slowly (dead cycle allowed) | Must switch within one functional clock period |
| Industry adoption | Very common (default in most flows) | Used when coverage targets demand it |
| Tool support | Tessent, TetraMAX, Encounter Test | Tessent, TetraMAX (requires LOS mode enable) |
At-Speed Test Waveform: LOC and LOS
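In text form, the two waveforms can be sketched as a list of clock edges with the scan-enable level at each edge. This is a simplified model using the usual definitions (LOC: scan enable low, then two at-speed pulses; LOS: the last shift is the launch). The function name, the chain length of 4, and the 10 ns and 1 ns periods are assumptions for illustration.

```python
def at_speed_timeline(scheme, chain_length, shift_period_ns, functional_period_ns):
    """Return (time_ns, scan_enable, event) tuples for one LOC or LOS pattern."""
    if scheme not in ("LOC", "LOS"):
        raise ValueError("scheme must be 'LOC' or 'LOS'")
    events = []
    t = 0.0
    # LOS keeps the final shift back so that it can act as the launch edge.
    slow_shifts = chain_length if scheme == "LOC" else chain_length - 1
    for _ in range(slow_shifts):
        events.append((t, 1, "shift"))
        t += shift_period_ns
    if scheme == "LOC":
        t += shift_period_ns  # slow dead cycle while scan enable drops
        events.append((t, 0, "launch"))
    else:
        events.append((t, 1, "launch (last shift)"))
    t += functional_period_ns  # launch-to-capture spacing is one fast period
    events.append((t, 0, "capture"))
    return events

for scheme in ("LOC", "LOS"):
    print(scheme)
    for time_ns, scan_enable, event in at_speed_timeline(scheme, 4, 10.0, 1.0):
        print(f"  t={time_ns:6.1f} ns  SE={scan_enable}  {event}")
```

In both schemes the launch and capture edges are one functional period apart. The difference is the scan-enable level at the launch edge: low for LOC, still high for LOS, which is why LOS needs scan enable to fall within that single fast period.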
Hold-Time Violations at At-Speed
Hold-time violations are a critical concern in at-speed testing. In functional operation, the physical design team inserts hold buffers to prevent fast paths from violating hold time at downstream flip-flops. But in test mode, the scan enable and test clock create path configurations that don't exist in functional mode, and these configurations can expose hold-risky paths that were never buffered.
Why At-Speed Capture Creates Hold Risk
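During the capture cycle, the functional-frequency clock edge launches data from flip-flops into the combinational logic cloud. Fast combinational paths, those with very short propagation delays, can push new data through to downstream flip-flops before their hold window closes. If the clock-to-Q delay of FF-A plus the propagation delay from FF-A to FF-B is shorter than the hold time of FF-B plus the clock skew between the two flip-flops, FF-B captures the new data instead of the intended old data: a classic hold violation. Unlike a setup check, this condition does not depend on the clock period; the risk in test mode comes from the paths and clock skews that only test mode activates.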
ATPG Hold DRC and Pattern Constraints
Modern ATPG tools (Siemens Tessent, Synopsys TetraMAX) perform Hold DRC (Design Rule Checking) during transition fault pattern generation. The tool models all paths in the netlist and identifies hold-critical path pairs: cases where the minimum launch-to-capture delay through the combinational logic is too short to satisfy the downstream FF's hold time requirement once clock skew is taken into account.
The tool then constrains pattern generation to avoid sensitising these hold-critical paths. Specifically:
- Don't set the launch FF value such that data propagates through the hold-critical path
- Alternatively, mask those capture FFs (treat their captured values as X in the expected response) so a hold violation doesn't cause a false failure
- In some flows, add on-chip hold buffers in the scan path to increase minimum path delay
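The hold check and the masking option can be sketched as follows. This is a simplified model, not the algorithm of any particular ATPG tool: the `Path` fields, the delay values, and the 100 ps guard band are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Path:
    launch_ff: str
    capture_ff: str
    clk_to_q_ps: float   # launch flip-flop clock-to-Q delay
    min_logic_ps: float  # shortest combinational delay between the flops
    hold_ps: float       # capture flip-flop hold requirement
    skew_ps: float       # capture clock arrival minus launch clock arrival

def hold_slack_ps(path, margin_ps=0.0):
    """Positive slack means the old data is held long enough."""
    arrival = path.clk_to_q_ps + path.min_logic_ps
    required = path.hold_ps + path.skew_ps + margin_ps
    return arrival - required

def capture_flops_to_mask(paths, margin_ps=0.0):
    """Capture flops whose value should be treated as X in the expected response."""
    return sorted({p.capture_ff for p in paths
                   if hold_slack_ps(p, margin_ps) < 0})

paths = [
    Path("ff_a", "ff_b", clk_to_q_ps=60, min_logic_ps=20, hold_ps=30, skew_ps=80),
    Path("ff_a", "ff_c", clk_to_q_ps=60, min_logic_ps=150, hold_ps=30, skew_ps=10),
]
print(capture_flops_to_mask(paths, margin_ps=100))
```

Here only the short, heavily skewed path into `ff_b` has negative slack, so only `ff_b` is masked. Masking keeps the pattern usable but gives up any fault detection that relied on that flip-flop, which is why constraining the launch value is usually tried first.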
At-Speed ATPG Constraints
Running transition fault ATPG requires configuring the tool with constraints that differ significantly from stuck-at ATPG. The key constraint categories:
Clock Domain Constraints
The ATPG tool must know which flip-flops belong to which clock domain, which clock is the at-speed clock (launch/capture), and what the functional clock period is. For multi-domain designs, only one clock domain is exercised at speed per pattern set — other domains use slow or gated clocks to avoid cross-domain hold issues.
```tcl
# NOTE: illustrative pseudo-commands. The lines below mix SDC syntax with
# DFT-tool-style options; exact command and option names differ between
# tools, so check the reference manual of the tool you use.

# Set functional clock period for at-speed capture (SDC; period in ns)
create_clock -period 1.0 -name clk [get_ports CLK]

# Define the at-speed capture clock for DFT
set_dft_signal -view existing_dft \
    -type Capture_Clock \
    -port CLK \
    -capture_procedure single_clock

# Select LOC scheme
set_transition_fault_options \
    -launch launch_on_capture

# Hold margin: add 100 ps guard band for hold DRC
set_transition_fault_options \
    -hold_margin 0.1

# Coverage target for sign-off
set_fault_coverage_goal transition 95
```
False Path Handling
Timing false paths in the design (paths that exist structurally but are never exercised in functional operation, and are therefore excluded from STA) must also be communicated to ATPG. These paths were never timed, so if ATPG generates a pattern that sensitises one, the result captured in silicon may differ from the simulation model, causing a false failure. Most flows use the same SDC file for both STA and ATPG to ensure consistency.
Coverage Targets
Industry sign-off requirements vary by application:
| Application | Transition Fault Coverage Target | Notes |
|---|---|---|
| Consumer / Mobile SoC | > 92% | Cost-sensitive; some coverage traded for test time |
| Server / HPC Processor | > 95% | Field reliability critical; longer test time acceptable |
| Automotive (ISO 26262) | > 95% to > 97% | Safety-critical; DPPM targets < 1 |
| Space / Mil-Aero | > 98% | N-detect strategies, SDD testing mandatory |
Advanced: Small Delay Defects (SDD)
Standard transition fault testing uses a single-detect strategy: each fault needs only one pattern that launches the transition and propagates it to a capture flip-flop. ATPG is free to choose any sensitisable path through the fault site, and it often picks a short, easy one. If that path has large timing slack, a small extra delay at the fault site is absorbed by the slack and the test passes, even though the same defect would cause a failure on a critical path through the same net.
Small Delay Defects (SDDs) are manufacturing defects that add a small amount of extra propagation delay — perhaps 50–200 ps — to a path. Under nominal conditions (room temperature, nominal VDD), the path still meets timing. But under worst-case conditions (low VDD, high temperature, process corner) the added delay causes a timing violation. SDDs cause field failures under stress — a nightmare for high-reliability applications.
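A minimal sketch of why path choice matters: a delay defect is caught only when it exceeds the slack of the path the pattern actually exercises. The two slack values and the 100 ps defect size are assumed numbers for illustration.

```python
def defect_detected(path_slack_ps, defect_delay_ps):
    """A delay defect is caught only if it exceeds the tested path's slack."""
    return defect_delay_ps > path_slack_ps

# Assumed slacks (ps) of two paths that run through the same defective net.
paths_through_net = {"short_path": 400, "critical_path": 50}
DEFECT_PS = 100  # a small delay defect within the 50-200 ps range

for name, slack_ps in paths_through_net.items():
    print(name, "detects defect:", defect_detected(slack_ps, DEFECT_PS))
```

A pattern that sensitises `short_path` reports a pass, while one that sensitises `critical_path` fails, although both count as a detection of the same transition fault in the coverage figure.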
Why SDDs Are Worse at Advanced Nodes
At TSMC N3 and Samsung SF3:
- Gate lengths shrink to ~10 nm effective, making drive strength extremely sensitive to fin width variation
- BEOL interconnect pitches at M1–M3 are < 30 nm, making local RC variation large relative to total path delay
- Contact-over-active-gate (COAG) structures make partial via opens more likely
- Multi-Vt cell mixing increases sensitivity to local process variation in critical paths
N-Detect Strategy for SDD
The solution is N-detect testing (also called multiple-detect testing). Instead of finding one pattern per fault, the ATPG generates N ≥ 3 patterns that each detect the fault, ideally through different paths. Exercising more paths through each fault site raises the probability that at least one pattern uses a tight path, one with little slack, so that a small delay defect is exposed at least once.
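The bookkeeping behind an N-detect target can be sketched as a per-fault detection count over a fault-simulation result. The pattern names, fault names, and the `n_detect_status` helper are hypothetical; a production tool would also consider whether the detections use different paths, which this sketch does not model.

```python
from collections import Counter

def n_detect_status(pattern_detections, n):
    """Count detections per fault and split faults by whether they reach n."""
    counts = Counter(fault
                     for faults in pattern_detections.values()
                     for fault in set(faults))
    met = sorted(f for f, c in counts.items() if c >= n)
    short = sorted(f for f, c in counts.items() if c < n)
    return counts, met, short

# Hypothetical fault-simulation result: pattern -> transition faults it detects.
detections = {
    "p1": ["n1/STR", "n2/STF"],
    "p2": ["n1/STR"],
    "p3": ["n1/STR", "n3/STR"],
    "p4": ["n2/STF", "n3/STR"],
}
counts, met, short = n_detect_status(detections, n=3)
print("meets N=3:", met, "| needs more patterns:", short)
```

Faults in the second list are the ones for which ATPG would be asked to generate additional patterns.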
At nodes below 16 nm, use N-detect with N ≥ 3 for transition faults, plus tightened capture timing (e.g., launch-to-capture spacing at 90% of the cycle period instead of 100%) to stress marginal paths. Leading foundries such as TSMC provide recommended DFT guidelines for each node.
SDD testing increases pattern count by 3–9× compared to single-detect transition fault testing, but the DPPM improvement at advanced nodes justifies the added test time cost. For a 1 GHz design with 100,000 patterns at single-detect, N=5 detect may produce 400,000–500,000 patterns — still feasible on modern high-pin-count ATE with scan compression.
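To get a feel for the test-time impact, the sketch below estimates scan shift time from pattern count. The chain length of 200 cells and the 100 MHz shift clock are assumptions, and 450,000 is simply the midpoint of the N=5 range quoted above; capture cycles and tester overhead are ignored.

```python
def scan_test_time_s(pattern_count, chain_length, shift_mhz):
    """Approximate scan test time: shift cycles dominate, capture is ignored."""
    # Each pattern costs one chain load; it overlaps the previous unload.
    shift_cycles = pattern_count * chain_length
    return shift_cycles / (shift_mhz * 1e6)

CHAIN_LENGTH = 200  # assumed longest internal chain after compression
SHIFT_MHZ = 100     # assumed scan shift frequency

for label, patterns in (("single-detect", 100_000), ("N=5 detect", 450_000)):
    seconds = scan_test_time_s(patterns, CHAIN_LENGTH, SHIFT_MHZ)
    print(f"{label}: {patterns} patterns, about {seconds:.2f} s of shift time")
```

Under these assumptions the shift time grows in direct proportion to the pattern count, so the chain length achieved through compression matters as much as the choice of N.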
Interview FAQ: At-Speed Testing
What is the difference between stuck-at and transition fault models?
The stuck-at model assumes a net is permanently stuck at logic 0 or 1 — a static, timing-independent fault. It can be tested at slow scan frequency. The transition fault model targets nets that fail to switch fast enough within the clock period (slow-to-rise or slow-to-fall). It is a dynamic, timing-dependent model that must be tested at or near functional frequency. Stuck-at targets permanent opens/shorts; transition fault targets delay defects like resistive vias and weak drives that pass stuck-at tests but fail at speed.
What is the difference between LOC and LOS at-speed testing?
In LOC (Launch-on-Capture), the scan chains are loaded at slow speed, scan enable is de-asserted, and two functional-speed clock pulses follow: the first launches the transition and the second captures the result. Scan enable can switch slowly. In LOS (Launch-on-Shift), the last scan shift cycle launches the transition and a single capture pulse follows one functional clock period later, so scan enable must switch at speed. LOS achieves higher fault coverage because the launch state is more controllable, but it demands at-speed scan-enable timing in the design and more capable test hardware. LOC is the default in most production flows; LOS is used when coverage targets require it.
Why can at-speed testing cause hold violations?
During at-speed capture, fast combinational paths (short propagation delay) can propagate new data from the launch flip-flop to a downstream flip-flop before that FF's hold window has closed, which is a hold violation. In functional mode, hold violations are fixed by buffer insertion. But in test mode, scan infrastructure creates path configurations that don't exist functionally, exposing short paths that were never buffered. ATPG tools run hold DRC analysis and constrain patterns to avoid sensitising hold-critical paths, or mask the affected capture flip-flops.
What is the relationship between transition fault coverage and DPPM?
DPPM (Defective Parts Per Million) measures field-escape rate. Higher transition fault coverage directly reduces DPPM from timing-related defects. At nodes below 28 nm, delay defects (resistive vias, weak drives, fin variation) dominate field failures. Going from 90% to 95% transition coverage can reduce timing-related field escapes by 50% or more, depending on defect density and yield. Automotive applications require < 1 DPPM, which drives transition coverage targets above 97% with N-detect strategies.
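One common way to reason about this relationship is the Williams-Brown defect-level model, DL = 1 - Y^(1 - T), where Y is yield and T is fault coverage. Applying it to transition coverage alone, and the 90% yield used below, are assumptions made for illustration; the absolute DPPM values depend heavily on the assumed yield, so the ratio is the point of the example.

```python
def defect_level_dppm(process_yield, fault_coverage):
    """Williams-Brown estimate: DL = 1 - Y**(1 - T), returned in DPPM."""
    return (1.0 - process_yield ** (1.0 - fault_coverage)) * 1e6

YIELD = 0.90  # assumed yield with respect to the defects being modelled

before = defect_level_dppm(YIELD, 0.90)
after = defect_level_dppm(YIELD, 0.95)
print(f"90% coverage: {before:.0f} DPPM, 95% coverage: {after:.0f} DPPM")
print(f"reduction: {100 * (1 - after / before):.0f}%")
```

Because halving the uncovered fraction (from 10% to 5%) roughly halves the exponent, the estimated escape rate drops by about half, in line with the figure quoted above.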
What are small delay defects and why are they important at advanced nodes?
Small delay defects (SDDs) add a small amount of extra propagation delay (50–200 ps) that doesn't cause immediate failure under nominal conditions but causes timing failures under voltage/temperature stress. At nodes like TSMC N3 and Samsung SF3, SDDs are more common due to fin width variation, local interconnect resistance, and COAG via sensitivity. Standard single-detect transition fault testing may miss SDDs because it uses non-critical paths. N-detect (multiple-detect) strategies with N ≥ 3 force ATPG to use tighter paths, improving SDD coverage and reducing field failures under stress.