DFT Day 5 — At-Speed Testing: Transition Faults, LOC vs LOS & Hold Violations

Why Stuck-at Fault Testing Is Not Enough

For decades, the stuck-at fault model was the workhorse of digital test. It assumes a net is permanently stuck at logic 0 (stuck-at-0, SA0) or logic 1 (stuck-at-1, SA1), regardless of the circuit's intended drive. Stuck-at patterns can be generated and applied at slow scan frequency — the ATE shifts patterns in slowly and captures responses. Simple, effective, and cheap to apply.

But as process nodes shrink below 90 nm, a new class of manufacturing defect becomes dominant: delay defects. These are faults where the logic value is correct in steady state, but the signal fails to switch within the required clock period. The net eventually reaches the right logic level, just not fast enough. At slow scan frequency, such a net looks perfectly healthy. At functional (at-speed) frequency, it causes a timing failure.

Common physical root causes of delay defects include:

Resistive vias: partial opens in a contact or via increase the RC delay of interconnects
Weak shorts / bridging: resistive bridges between nearby nets cause drive contention that slows transitions
Thin-oxide gate leakage: defective gate oxide alters transistor drive strength
Fin variation: in FinFET processes, fin width variation changes drive current
Local interconnect resistance: at N3/N2, BEOL RC variation between adjacent cells is significant

The Core Problem

Stuck-at testing detects permanent faults (wrong logic value). At-speed testing detects timing faults (correct value, wrong timing). Both are required for comprehensive manufacturing test coverage at modern process nodes.

A chip that passes all stuck-at tests but has a resistive via on a critical path may work perfectly at room temperature and nominal voltage — but fail at the operating corner (low-voltage, high-temperature). This is the classic test escape that causes field returns. At-speed testing with the transition fault model closes this gap.

The Transition Fault Model

The transition fault model is the industry-standard dynamic fault model. It extends stuck-at by adding timing awareness. Two fault types exist per net:

Fault Type	Abbreviation	Transition Failed	Physical Interpretation
Slow-to-Rise	STR	0 → 1 too slow	Net fails to rise from low to high within the clock period; drive strength too weak or load too large
Slow-to-Fall	STF	1 → 0 too slow	Net fails to fall from high to low within the clock period; PMOS leakage, resistive NMOS, or capacitive loading

For a design with N nets, the transition fault universe contains 2N faults (one STR + one STF per net) — the same count as stuck-at faults. However, detection conditions differ significantly: to detect a transition fault on a net, you must cause the net to switch in the failing direction at functional frequency and observe the result downstream.
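The detection condition can be made concrete with a toy netlist. The sketch below assumes a hypothetical two-gate circuit (n = a AND b, out = n OR c) and enumerates every two-vector test (V1, V2) that detects slow-to-rise on internal net n. It uses the standard transition-fault abstraction: when the capture edge arrives under V2, the faulty net still holds its V1 value. Equivalently, V1 must set n = 0 and V2 must detect n stuck-at-0.

```python
from itertools import product

def good_circuit(a, b, c, n_override=None):
    """Hypothetical toy netlist: n = a AND b, out = n OR c. Optionally force net n."""
    n = (a & b) if n_override is None else n_override
    return n, n | c

def detects_str_on_n(v1, v2):
    """True if the two-vector test (v1, v2) detects slow-to-rise on net n.

    Transition-fault abstraction: with an STR fault, n still holds its
    V1 value when the at-speed capture edge arrives under V2.
    """
    n1, _ = good_circuit(*v1)
    n2, out_good = good_circuit(*v2)
    if not (n1 == 0 and n2 == 1):  # V1/V2 must launch a 0 -> 1 transition on n
        return False
    _, out_faulty = good_circuit(*v2, n_override=n1)  # n is late: still at old value
    return out_good != out_faulty  # difference must propagate to the output

detecting_pairs = [(v1, v2)
                   for v1 in product((0, 1), repeat=3)
                   for v2 in product((0, 1), repeat=3)
                   if detects_str_on_n(v1, v2)]
print(len(detecting_pairs), detecting_pairs[0])  # 6 ((0, 0, 0), (1, 1, 0))
```

Only V2 = (a, b, c) = (1, 1, 0) works: it both drives n high and sensitises the OR gate (c = 0). Any V1 that leaves n low (six of the eight input combinations) can serve as the initialisation vector.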

How Transition Faults Differ from Stuck-at

Property	Stuck-at Fault	Transition Fault
Nature	Static — logic value is wrong	Dynamic — logic value eventually correct, timing wrong
Visible at scan frequency?	Yes — always visible	No — only at functional frequency
Test application speed	Slow (shift frequency)	Must be at or near functional clock rate
Fault count (N nets)	2N	2N
Detection requirement	Set net to wrong value, propagate, observe	Force net to switch in failing direction at speed, observe
Physical defects targeted	Shorts, opens (complete)	Partial opens, resistive vias, weak drives, delay path defects

At-Speed Test Requirements

Transition fault testing imposes requirements that stuck-at testing does not. The fundamental constraint: the capture clock must run at functional frequency. A resistive via that adds 200 ps of extra delay to a path with less than 200 ps of slack at 1 GHz (1 ns period) will fail in the field, but it will not be caught if the test clock runs at 100 MHz (10 ns period): the slow clock gives the defective net plenty of time to settle.
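The arithmetic behind this example fits in a few lines. This is a minimal single-path sketch, assuming a hypothetical path with 900 ps of nominal delay (clock-to-Q included), no clock skew, and setup time lumped into one optional term; the function name is illustrative, not from any tool.

```python
def delay_defect_detected(path_delay_ps, extra_delay_ps, test_period_ps, setup_ps=0.0):
    """True if a delay defect makes the path miss the capture edge.

    Simplified single-path model: path_delay_ps includes clock-to-Q,
    clock skew is ignored, and setup time is one lumped term.
    """
    slack_ps = test_period_ps - setup_ps - path_delay_ps
    return extra_delay_ps > slack_ps

# Hypothetical path: 900 ps nominal delay; a resistive via adds 200 ps.
for period_ps in (1_000, 10_000):  # 1 GHz at-speed capture vs 100 MHz slow capture
    print(period_ps, delay_defect_detected(900, 200, period_ps))
# 1000 True
# 10000 False
```

The defect is only visible when its extra delay exceeds the slack at the test clock period: 100 ps of slack at 1 GHz, but 9,100 ps at 100 MHz.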

The at-speed clock edges must therefore be delivered precisely at the design's functional frequency (e.g., 1 GHz, 2 GHz, or higher) while data still shifts in and out at slower scan rates. The ATE (Automatic Test Equipment) can supply these edges directly, which requires high-speed driver circuitry, precise timing calibration, and low-jitter clock sources on the tester. In most modern SoCs, however, the at-speed pulses are generated on chip by the functional PLL and an on-chip clock controller (OCC), and the tester supplies only the slow shift clocks.

The Two-Cycle At-Speed Test Framework

At-speed transition fault testing always follows a two-cycle protocol:

Launch cycle: a clock edge (or shift operation) forces the target net to begin its transition (0→1 or 1→0). This initiates the switching event that the transition fault would prevent from completing in time.
Capture cycle: one functional clock period later, a second clock edge captures the logic value at downstream flip-flops. If the net completed its transition in time, the correct value is captured. If the defect slowed the transition, an incorrect value is captured, and the tester flags a mismatch against the ATPG-predicted response.

Between the two at-speed cycles, the combinational cloud must propagate the transition from the launch flip-flop all the way to the capture flip-flop before the capture edge arrives. The time budget is exactly one functional clock period.

LOC — Launch on Capture

Launch-on-Capture (LOC), also called broadside, is the simpler and more widely used at-speed scheme. After the scan load, scan enable (SE) switches to functional mode while no clock is running, and then two functional-frequency clock pulses are applied one functional period apart. The first pulse (launch) captures the combinational logic's response to the scanned-in state; wherever that response differs from the scanned-in value, a transition is launched. The second pulse (capture) samples the result.

LOC Operation Sequence

Slow-speed scan shift: load the initialisation vector (V1) into all scan flip-flops over N shift cycles, with SE asserted.
Scan enable deasserted: SE switches to functional mode during a clock-free gap, so it has no at-speed timing requirement.
Launch pulse at functional frequency: the flip-flops capture the combinational response to V1. This response is the launch vector (V2), and every flip-flop whose value changes launches a transition into the logic.
Capture pulse at functional frequency: exactly one functional clock period later, downstream flip-flops capture the combinational logic output.
Slow-speed scan shift-out: the captured values are shifted out and compared against the expected response.

LOC Key Insight

In LOC, scan enable never has to switch at speed. The two at-speed pulses can be supplied by the tester or, as in most modern SoCs, generated on chip by the PLL and an on-chip clock controller (OCC), in which case the tester only provides slow shift clocks. This makes LOC easy to integrate into most designs and test flows.

The drawback of LOC is limited launch-state controllability. The launch vector is not loaded directly: it is the functional response of the combinational logic to V1, so ATPG can only use launch states that the logic actually produces. Some transitions cannot be launched from any loadable V1, which makes the corresponding faults untestable under LOC. ATPG must also reason across two time frames, so LOC typically needs more patterns and reaches lower coverage than LOS.

LOS — Launch on Shift

Launch-on-Shift (LOS), also called skewed-load, uses the last scan shift pulse itself as the launch. Scan enable stays asserted through that final shift and must then be deasserted within one functional clock period, so that the next pulse captures the functional response. Because the launch vector (V2) is simply V1 shifted by one position along each scan chain, ATPG controls it almost directly, which yields higher coverage with fewer patterns.

LOS Operation Sequence

Slow-speed scan shift: perform all but the last shift of the scan load with SE asserted; the scan flip-flops now hold the initialisation vector (V1).
Last shift pulse (launch): still in shift mode, the final shift moves each chain by one position. Every flip-flop whose value changes launches a transition into the combinational logic.
Scan enable deasserted at speed: SE must reach every scan flip-flop before the capture pulse, within a single functional clock period.
Capture pulse at functional frequency: exactly one functional clock period after the last shift pulse, downstream flip-flops capture the result.
Slow-speed scan shift-out: compare captured values against expected.

LOS moves the hard timing requirement from the clock to scan enable. The last shift pulse and the capture pulse must be one functional period apart, and the high-fanout SE network (often pipelined to close timing) must switch within that window. LOS can also launch from states the functional logic never reaches, which risks over-testing: rejecting parts that would work in the field. For these reasons, LOS support is a design-time decision, not just a tester feature.
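The controllability difference is easy to see on a tiny example. In the standard definitions, LOS forms the launch vector by one more shift of V1, while LOC forms it by capturing the functional response to V1. The sketch below assumes a hypothetical three-flop design on a single scan chain (scan-in → FF0 → FF1 → FF2) whose next-state logic makes FF2 "sticky", and lists which single-flop transitions each scheme can launch.

```python
from itertools import product

# Hypothetical 3-flop design on one scan chain: scan-in -> FF0 -> FF1 -> FF2.
def next_state(s):
    """Functional D inputs of FF0..FF2 as a function of the current state."""
    q0, q1, q2 = s
    return [q1 & q2, q0 ^ q2, q1 | q2]  # FF2 can never fall: once 1, it stays 1

def los_launch(v1, scan_in_bit):
    """LOS: the last shift moves every bit one position down the chain."""
    return [scan_in_bit] + v1[:-1]

def loc_launch(v1):
    """LOC: the launch pulse captures the functional response to V1."""
    return next_state(v1)

def launchable(scheme):
    """All (flop, direction) transitions that some V1 can launch under a scheme."""
    found = set()
    for bits in product((0, 1), repeat=3):
        v1 = list(bits)
        if scheme == "LOS":
            candidates = [los_launch(v1, b) for b in (0, 1)]
        else:
            candidates = [loc_launch(v1)]
        for v2 in candidates:
            for i, (old, new) in enumerate(zip(v1, v2)):
                if old != new:
                    found.add((i, "rise" if new else "fall"))
    return found

print(sorted(launchable("LOS") - launchable("LOC")))  # [(2, 'fall')]
```

LOS can launch all six transitions, while LOC can never launch a falling transition from FF2, so any fault that needs it is LOC-untestable. The same toy also shows the flip side: LOS happily launches from states such as (0, 0, 1) → (x, 0, 0) regardless of whether the functional logic could ever produce them.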

LOC vs LOS Comparison

Property	LOC (Launch-on-Capture)	LOS (Launch-on-Shift)
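Clock and SE requirement	Two at-speed pulses (launch + capture); SE switches slowly	Last shift pulse and capture pulse one functional period apart; SE must switch at speed
Launch state control	Limited: launch state is the functional response to V1	High: launch state is V1 shifted by one bit
Transition fault coverage	Good (typically 88–93%)	Better (typically 92–97%)
Pattern generation complexity	Higher: two time frames must be justified; more patterns	Lower: close to combinational ATPG; fewer patterns
Over-testing risk	Lower: launch state comes from the functional logic	Higher: launch states may be functionally unreachable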
Industry adoption	Very common (default in most flows)	Used when coverage targets demand it
Tool support	Tessent, TetraMax, Encounter Test	Tessent, TetraMax (requires LOS mode enable)

At-Speed Test Waveform: LOC and LOS

[Figure: at-speed test timing, showing scan shift, the LOC launch and capture pulses, and the LOS last-shift launch followed by capture]
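In place of the waveform, the event order of the two schemes can be sketched as a timeline. This is a simplified, hypothetical model: one scan chain, ideal clock edges, scan unload omitted, a 100 ns shift period, a 1 ns functional period, and an arbitrary 50 ns idle gap in which LOC lowers SE. The LOS "SE falls" event is placed mid-period purely for illustration.

```python
def at_speed_events(scheme, chain_len, t_shift_ns, t_func_ns, dead_ns=50.0):
    """List of (time_ns, event) for one scan load plus at-speed launch/capture.

    Hypothetical simplified model: one scan chain, ideal clock edges,
    scan unload omitted. dead_ns is the clock-free gap in which LOC lowers SE.
    """
    if scheme not in ("LOC", "LOS"):
        raise ValueError("scheme must be 'LOC' or 'LOS'")
    events, t = [], 0.0
    # LOS holds back the last shift pulse: it becomes the launch.
    slow_shifts = chain_len if scheme == "LOC" else chain_len - 1
    for _ in range(slow_shifts):
        t += t_shift_ns
        events.append((t, "shift pulse (SE=1)"))
    if scheme == "LOC":
        # No clock runs while SE falls, so SE timing is relaxed.
        events.append((t + dead_ns / 2, "SE falls (relaxed)"))
        t += dead_ns
        events.append((t, "launch pulse (SE=0)"))
    else:
        # The final shift pulse doubles as launch; SE must fall within t_func_ns.
        t += t_shift_ns
        events.append((t, "launch = last shift pulse (SE=1)"))
        events.append((t + t_func_ns / 2, "SE falls (at-speed)"))
    t += t_func_ns
    events.append((t, "capture pulse (SE=0)"))
    return events

for scheme in ("LOC", "LOS"):
    for time_ns, what in at_speed_events(scheme, chain_len=3, t_shift_ns=100.0, t_func_ns=1.0):
        print(f"{scheme} {time_ns:7.1f} ns  {what}")
```

In both schemes the launch-to-capture interval is exactly one functional period; what differs is which pulse launches and how much time SE has to switch.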

Hold-Time Violations at At-Speed

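Hold-time violations are a critical concern in at-speed test mode. In functional operation, the physical design team inserts hold buffers to prevent fast paths from violating hold time at downstream flip-flops. A hold check does not depend on the clock period, so the faster test clock is not itself the problem. The risk comes from test mode using clock and path configurations that don't exist in functional mode (OCC clock muxes, test-clock distribution, several domains pulsed together, scan-enable-dependent logic), and these configurations can expose hold-risky paths that were never analysed or buffered for that mode.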

Why At-Speed Capture Creates Hold Risk

During the capture cycle, a clock edge launches data from flip-flops into the combinational logic cloud while the same edge (or a slightly skewed copy) samples the downstream flip-flops. On a very short combinational path, the new data can reach a downstream flip-flop before its hold window closes. If FF-A's clock-to-Q delay plus the path delay is less than FF-B's hold time plus the amount by which FF-B's clock arrives later than FF-A's, FF-B captures the new data instead of the intended old data: a classic hold violation.
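The hold condition can be written as a one-line slack calculation. The sketch below is a simplified model with hypothetical numbers: the same short path is evaluated with 10 ps of launch-to-capture skew (functional clock tree) and 60 ps (an assumed test-mode configuration, for example a capture clock routed through an extra OCC mux). Note that the clock period does not appear anywhere.

```python
def hold_slack_ps(t_cq_ps, t_path_min_ps, t_hold_ps, skew_ps):
    """Hold slack from launch flop A to capture flop B (negative = violation).

    skew_ps = (clock arrival at B) - (clock arrival at A).
    The clock period does not appear: hold is frequency-independent.
    """
    return (t_cq_ps + t_path_min_ps) - (t_hold_ps + skew_ps)

# Hypothetical numbers: the same short path under two clock-tree configurations.
functional = hold_slack_ps(t_cq_ps=40, t_path_min_ps=15, t_hold_ps=20, skew_ps=10)
test_mode = hold_slack_ps(t_cq_ps=40, t_path_min_ps=15, t_hold_ps=20, skew_ps=60)
print(functional, test_mode)  # 25 -25
```

The path is safe functionally but violates hold in the assumed test-mode configuration, which is exactly the kind of path that was never buffered.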

Hold Violation in Test Mode is NOT Functional: hold violations during at-speed ATPG patterns do not mean the design has a hold violation in functional mode. They are test-mode artifacts caused by test clocking and scan infrastructure exposing short paths that are otherwise masked. However, if not handled correctly, they cause false failures (good parts rejected) and can corrupt captured values, masking real defects.

ATPG Hold DRC and Pattern Constraints

Modern ATPG tools (Siemens Tessent, Synopsys TetraMax) perform Hold DRC (Design Rule Checking) during transition fault pattern generation. The tool models all paths in the netlist and identifies hold-critical path pairs — cases where the launch-to-capture propagation delay through the combinational logic is shorter than the downstream FF's hold time requirement.

The tool then constrains pattern generation to avoid sensitising these hold-critical paths. Specifically:

Don't set the launch FF value such that data propagates through the hold-critical path
Alternatively, mask (X-fill) those capture FF outputs so a hold violation doesn't cause a false failure
In physical design, close hold timing for the test-mode configuration as well (test-mode STA) and insert hold buffers on the offending paths to increase minimum path delay

At-Speed ATPG Constraints

Running transition fault ATPG requires configuring the tool with constraints that differ significantly from stuck-at ATPG. The key constraint categories:

Clock Domain Constraints

The ATPG tool must know which flip-flops belong to which clock domain, which clock pulses launch and capture at speed, and what the functional clock period is. In multi-domain designs, a pattern typically pulses only one clock domain (or a group of domains with no paths between them) at speed, while the other domains are held off or clocked slowly, to avoid cross-domain timing and hold issues.

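Tessent ATPG: Transition Fault Setup (illustrative Tcl/SDC-style snippet)

# Illustrative only: create_clock is standard SDC (period in ns, so 1.0 = 1 GHz).
# The remaining command and option names are placeholders that show the kinds of
# settings needed; they are not verified Tessent syntax. Check the Tessent Shell
# reference manual for the actual commands.

# Set functional clock period for at-speed capture
create_clock -period 1.0 -name clk [get_ports CLK]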

# Define at-speed test mode in Tessent
set_dft_signal -view existing_dft \
    -type Capture_Clock \
    -port CLK \
    -capture_procedure single_clock

# Select LOC scheme (default in Tessent)
set_transition_fault_options \
    -launch launch_on_capture

# Hold margin: add 100 ps guard band for hold DRC
set_transition_fault_options \
    -hold_margin 0.1

# Coverage target for sign-off
set_fault_coverage_goal transition 95

False Path Handling

Timing exceptions in the design must also be communicated to ATPG: false paths (declared untimed with set_false_path) and multicycle paths (allowed more than one clock period). STA never verified that these paths settle within one functional period, so if ATPG sensitises one at speed, silicon may capture a value different from the ATPG simulation model, causing a false failure. ATPG therefore avoids sensitising such paths or masks the affected capture flip-flops. Most flows read the same SDC file into both STA and ATPG to ensure consistency.
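Real tools derive this from the SDC they read; the sketch below is only a toy model of the masking decision, using a hypothetical TimedPath record and made-up instance names. For one pattern, any capture flop reached by a sensitised path that carries a timing exception has its expected value set to X.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TimedPath:
    """Hypothetical record of one path sensitised by a pattern."""
    start_ff: str
    end_ff: str
    exception: Optional[str] = None  # e.g. "false_path", "multicycle"; None = single-cycle

def capture_mask(sensitised_paths):
    """Capture flops whose at-speed value cannot be trusted for this pattern.

    A flop is masked (expected value set to X) if ANY sensitised path into it
    carries a timing exception, because STA never guaranteed single-cycle timing
    there. This is conservative: one exception path masks the whole flop.
    """
    return {p.end_ff for p in sensitised_paths if p.exception is not None}

paths = [
    TimedPath("u_ctrl/state_reg[0]", "u_dp/acc_reg[3]"),
    TimedPath("u_cfg/mode_reg", "u_dp/acc_reg[7]", "false_path"),
    TimedPath("u_dp/a_reg[1]", "u_dp/mul_reg[2]", "multicycle"),
]
print(sorted(capture_mask(paths)))  # ['u_dp/acc_reg[7]', 'u_dp/mul_reg[2]']
```

The same masking idea applies to capture flops on hold-critical paths; the cost in both cases is that faults observable only at a masked flop lose that observation point.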

Coverage Targets

Industry sign-off requirements vary by application:

Application	Transition Fault Coverage Target	Notes
Consumer / Mobile SoC	> 92%	Cost-sensitive; some coverage traded for test time
Server / HPC Processor	> 95%	Field reliability critical; longer test time acceptable
Automotive (ISO 26262)	> 95–97%	Safety-critical; DPPM targets < 1
Space / Mil-Aero	> 98%	N-detect strategies, SDD testing mandatory

Advanced: Small Delay Defects (SDD)

Standard transition fault ATPG uses a single-detect strategy and is timing-unaware: it accepts the first pattern that launches the transition at the fault site and propagates it to an observation point, which is usually an easy, often short, path. If that path has a lot of slack, a small extra delay at the fault site is absorbed by the slack and the defect goes undetected, even though the same delay on the longest path through the site would cause a failure.

Small Delay Defects (SDDs) are manufacturing defects that add a small amount of extra propagation delay — perhaps 50–200 ps — to a path. Under nominal conditions (room temperature, nominal VDD), the path still meets timing. But under worst-case conditions (low VDD, high temperature, process corner) the added delay causes a timing violation. SDDs cause field failures under stress — a nightmare for high-reliability applications.

Why SDDs Are Worse at Advanced Nodes

At TSMC N3 and Samsung SF3:

Gate lengths shrink to ~10 nm effective, making drive strength extremely sensitive to fin width variation
BEOL interconnect pitches at M1–M3 are < 30 nm, making local RC variation large relative to total path delay
Contact-over-active-gate (COAG) structures make partial via opens more likely
Multi-Vt cell mixing increases sensitivity to local process variation in critical paths

N-Detect Strategy for SDD

The usual response is N-detect testing (also called multiple-detect). Instead of stopping at one pattern per fault, the ATPG generates N ≥ 3 patterns that each detect the fault, preferably through different sensitisation paths. This raises the probability that at least one pattern exercises a long, low-slack path through the fault site, where a small delay defect actually causes a failure. Timing-aware ATPG, which reads path delays (for example from SDF) and deliberately targets the longest paths, addresses the same problem more directly.
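A toy model makes the effect visible. The sketch below assumes a hypothetical fault site with five sensitisable paths in a 1 GHz design and a 200 ps small delay defect. It further assumes that timing-unaware ATPG picks the shortest paths first (a simplification of "easiest") while timing-aware ATPG picks the longest; a defect is caught if it exceeds the slack of any path actually used.

```python
def n_detect_catches(path_delays_ps, period_ps, defect_ps, n_detect, longest_first=False):
    """Toy model of which paths ATPG uses to detect one transition fault.

    Assumption: timing-unaware ATPG uses the n_detect shortest paths (treated as
    the 'easiest'); timing-aware ATPG (longest_first=True) uses the longest ones.
    The defect is caught if it exceeds the slack of any path actually used.
    """
    chosen = sorted(path_delays_ps, reverse=longest_first)[:n_detect]
    return any(defect_ps > period_ps - d for d in chosen)

paths_ps = [350, 610, 820, 880, 930]  # hypothetical paths through one fault site
for n in (1, 3, 5):
    print(n, n_detect_catches(paths_ps, 1000, defect_ps=200, n_detect=n))
print("timing-aware, N=1:", n_detect_catches(paths_ps, 1000, 200, 1, longest_first=True))
# 1 False
# 3 True
# 5 True
# timing-aware, N=1: True
```

Single-detect uses the 350 ps path (650 ps slack) and misses the defect; N = 3 reaches the 820 ps path (180 ps slack) and catches it. In real designs, N-detect only raises the odds, which is why timing-aware ATPG is often combined with it.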

SDD Rule of Thumb

At nodes below 16 nm: require N-detect with N ≥ 3 for transition faults, plus tightened timing constraints (e.g., 90% of cycle period instead of 100%) to stress marginal paths. Leading foundries such as TSMC provide recommended DFT guidelines for each node.

SDD testing increases pattern count by 3–9× compared to single-detect transition fault testing, but the DPPM improvement at advanced nodes justifies the added test time cost. For a 1 GHz design with 100,000 patterns at single-detect, N=5 detect may produce 400,000–500,000 patterns — still feasible on modern high-pin-count ATE with scan compression.
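To first order, scan test time scales linearly with pattern count, so a 4–5× pattern increase costs roughly 4–5× scan time. The sketch below estimates it under stated assumptions: balanced internal chains behind a hypothetical compression ratio (2 million scan flops in 4,000 internal chains), load and unload overlapped, a 100 MHz shift clock, and capture cycles counted at shift speed. All numbers are illustrative.

```python
def scan_test_time_s(patterns, scan_flops, internal_chains, shift_mhz, capture_cycles=2):
    """Approximate scan test time in seconds.

    Assumes balanced internal chains, overlapped load/unload (one extra unload
    at the end), and capture cycles counted at the shift clock rate.
    """
    chain_len = -(-scan_flops // internal_chains)  # ceiling division
    cycles = patterns * (chain_len + capture_cycles) + chain_len
    return cycles / (shift_mhz * 1e6)

# Hypothetical design: 2M scan flops, 4,000 internal chains, 100 MHz shift.
for pats in (100_000, 500_000):
    print(pats, f"{scan_test_time_s(pats, 2_000_000, 4_000, 100):.2f} s")
# 100000 0.50 s
# 500000 2.51 s
```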

Interview FAQ: At-Speed Testing

What is the difference between stuck-at and transition fault models?

The stuck-at model assumes a net is permanently stuck at logic 0 or 1 — a static, timing-independent fault. It can be tested at slow scan frequency. The transition fault model targets nets that fail to switch fast enough within the clock period (slow-to-rise or slow-to-fall). It is a dynamic, timing-dependent model that must be tested at or near functional frequency. Stuck-at targets permanent opens/shorts; transition fault targets delay defects like resistive vias and weak drives that pass stuck-at tests but fail at speed.

What is the difference between LOC and LOS at-speed testing?

In LOC (Launch-on-Capture, or broadside), the scan load completes, scan enable switches to functional mode with no clock running, and two functional-speed clock pulses follow: the first launches the transition by capturing the logic's response to the loaded state, and the second captures the result. In LOS (Launch-on-Shift, or skewed-load), the last scan shift pulse itself launches the transition and a functional-speed capture pulse follows one period later, so scan enable must switch at speed. LOS achieves higher fault coverage with fewer patterns because the launch state is a simple shift of the loaded state, but it requires an at-speed scan-enable network and risks over-testing. LOC is the default in most production flows; LOS is used when coverage targets require it.

Why can at-speed testing cause hold violations?

Hold checks do not depend on clock frequency, but test mode changes the clocking: OCC muxes, test-clock distribution, and multiple domains pulsed together can add skew on paths that functional STA never checked in that configuration. On a very short combinational path, data launched by one flip-flop can then reach a downstream flip-flop before that FF's hold window has closed, which is a hold violation. In functional mode, hold violations are fixed by buffer insertion, but test-mode configurations can expose short paths that were never buffered. ATPG tools constrain patterns to avoid sensitising hold-critical paths, or X-fill (mask) the affected capture flip-flops.

What is the relationship between transition fault coverage and DPPM?

DPPM (Defective Parts Per Million) measures field-escape rate. Higher transition fault coverage directly reduces DPPM from timing-related defects. At nodes below 28 nm, delay defects (resistive vias, weak drives, fin variation) dominate field failures. Going from 90% to 95% transition coverage can reduce timing-related field escapes by 50% or more, depending on defect density and yield. Automotive applications require < 1 DPPM, which drives transition coverage targets above 97% with N-detect strategies.
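The 50% figure is consistent with the classic Williams-Brown defect-level model, DL = 1 − Y^(1−T), where Y is process yield and T is fault coverage. The sketch below applies it with an assumed yield of 90%; treating T as transition coverage alone is a simplification, since the model assumes coverage applies uniformly to all defects, so absolute values are only indicative.

```python
def defect_level_ppm(yield_fraction, coverage_fraction):
    """Williams-Brown model: DL = 1 - Y**(1 - T), returned in parts per million."""
    return (1.0 - yield_fraction ** (1.0 - coverage_fraction)) * 1e6

y = 0.90  # hypothetical process yield
for t in (0.90, 0.95, 0.99):
    print(f"T={t:.2f}: {defect_level_ppm(y, t):,.0f} DPPM")
```

Going from T = 0.90 to T = 0.95 roughly halves the predicted defect level, matching the claim above.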

What are small delay defects and why are they important at advanced nodes?

Small delay defects (SDDs) add a small amount of extra propagation delay (50–200 ps) that doesn't cause immediate failure under nominal conditions but causes timing failures under voltage/temperature stress. At nodes like TSMC N3 and Samsung SF3, SDDs are more common due to fin width variation, local interconnect resistance, and COAG via sensitivity. Standard single-detect transition fault testing may miss SDDs because it often sensitises short, high-slack paths through the fault site. N-detect (multiple-detect) strategies with N ≥ 3 raise the chance that at least one pattern uses a long, low-slack path, and timing-aware ATPG targets those paths directly; both improve SDD coverage and reduce field failures under stress.
