DFT Day 9 — DFT for Low Power: X-Masking, Power-Aware ATPG & Scan Segmentation

Q: Why does scan shift cause higher switching activity than functional operation?

During functional operation, the state transitions of flip-flops are governed by the actual workload, and many FFs may hold their value between clock cycles. During scan shift, every scan clock causes the entire chain to shift — each FF takes the value of its predecessor. Because the shift data is essentially pseudo-random (test patterns target specific fault detection, not low-toggle-rate shifts), a large fraction of FFs will transition on every shift clock. Studies show scan shift achieves 40–50% toggle rate per cycle on average, versus 5–15% toggle rate during functional operation. This 2–5x increase in switching activity drives proportionally higher dynamic power, causing IR drop, electromigration stress, and thermal stress during test.

Q: What is X-masking and when is it needed?

X-masking is a DFT technique that selectively blocks ('masks') scan outputs that contain X (unknown) values before those outputs reach a compactor or MISR. X-states arise from uninitialized memories, power-gated domains, tri-state buses, clock-domain crossing elements, or any circuit whose value is genuinely unknown at the time of capture. If X values enter an MISR in LBIST or a compactor in EDT, they corrupt the accumulated signature, causing false fails. X-masking can be implemented in hardware (an AND-mask register per chain output) or handled in software by the ATPG tool, which identifies expected X locations and generates mask bits as part of the test pattern. X-masking is needed whenever the design contains X sources that cannot be eliminated by other means such as test-mode control logic.

Q: How does power-aware ATPG limit switching during scan?

Power-aware ATPG adds a constraint to the pattern generation engine that limits the toggle rate during scan shift. The tool monitors, per shift clock, how many FFs transition from 0 to 1 or 1 to 0. When the toggle count would exceed the specified threshold (commonly 15–20% of chain length), the tool selects alternative test cube assignments or inserts low-switching-activity fill to bring the toggle count below the limit. Some tools also support low-capture-power constraints: during the functional capture cycle, simultaneous transitions in the combinational logic are limited by constraining the number of FF outputs that change between the pre-capture state and the capture state. EDT decompressors can be configured to produce correlated output vectors that naturally limit switching.

Q: What is scan segmentation and why is it used in multi-power-domain designs?

Scan segmentation means dividing the full scan chain into independent segments, each confined to a single power domain. In a multi-VDD or UPF design, different blocks may be power-gated (supply completely cut off) during test. If a scan chain crosses a power domain boundary and the downstream domain is off, the chain is broken: shift data is lost and the scan response is garbage. By segmenting the chain per domain and gating the scan enable (SE) signal into dormant segments, each segment can be independently shifted when its domain is powered. Isolation cells at domain crossings clamp the outputs of a gated-off domain to a known value, so that floating signals do not propagate as spurious X states into a powered domain. Scan enable gating logic or a dedicated test power sequencing protocol manages which segment is active at any given time.

Q: How do retention registers affect DFT?

Retention registers are special flip-flops with a shadow latch that preserves state through a power-off event. They are used in power-gated domains to retain critical state across power-down cycles. For DFT, retention registers must be included in the scan chain just like ordinary FFs — otherwise they are uncontrollable and unobservable. The challenge is that a retention register has an extra pin (SAVE, RESTORE, or RETAINN) that controls the shadow-latch transfer. During scan test, the DFT tool must ensure: the retention mode is not accidentally triggered during shift (which would corrupt scan data), and the retention pin is driven correctly by the test mode control logic. ATPG models the retention register as a multi-port cell; test patterns that target faults inside the retention register's combinational shadow logic require specific SAVE/RESTORE sequencing.

Why Power Matters in DFT

Every VLSI engineer learns early that scan shift consumes far more power than normal functional operation. The gap is not marginal — measurements on production SoCs consistently show that scan shift causes 2–5× the switching activity of the chip's worst-case functional workload. For a chip designed with a 1 W thermal envelope, running the scan chains at full speed can briefly draw 3–5 W. The consequences are severe:

IR drop: The surge in dynamic current causes the supply voltage to sag at distant corners of the die. If IR drop exceeds 10% of VDD, flip-flops near the affected region may miscapture, turning good logic into a test fail — even though the silicon has no defect.
Electromigration (EM): Unusually high current densities in power-grid metal lines during scan can exceed EM reliability limits, shortening the chip's long-term reliability even if it does not fail at test time.
Thermal stress: Concentrated switching in a specific region heats that region faster than the package can dissipate. Scan patterns that repeatedly toggle the same logic cluster can produce localized hot spots.
Test-induced yield loss (TIYL): The most insidious effect. A chip with no real defects fails the test purely because scan-induced stress (IR drop, thermal) temporarily pushes it outside its operating window. TIYL reduces the apparent yield without any wafer-processing problem.
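
The link between toggle rate and the effects listed above follows from the standard first-order relation for CMOS dynamic power, P = α·C·V²·f, where α is the switching activity. The sketch below is only a rough illustration: the capacitance, voltage, and frequency values are invented so that the functional case lands near 1 W, and it assumes shift runs at the same clock frequency as functional mode, which is often not true in practice.

```python
def dynamic_power_watts(activity, switched_cap_farads, vdd_volts, freq_hz):
    """First-order CMOS dynamic power: P = alpha * C * V^2 * f."""
    return activity * switched_cap_farads * vdd_volts ** 2 * freq_hz

# Hypothetical chip parameters (assumptions, not measurements).
CAP = 1.0e-7   # total switchable capacitance in farads
VDD = 1.0      # supply voltage in volts
FREQ = 100e6   # clock frequency in hertz

functional = dynamic_power_watts(0.10, CAP, VDD, FREQ)  # 10% toggle rate
scan_shift = dynamic_power_watts(0.45, CAP, VDD, FREQ)  # 45% toggle rate

print(f"functional: {functional:.2f} W")
print(f"scan shift: {scan_shift:.2f} W")
print(f"ratio:      {scan_shift / functional:.1f}x")
```

With everything else held constant, power scales linearly with α, which is why the toggle rate is the quantity that power-aware DFT tries to bound.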

Modern SoC design rules at 7 nm and below mandate power-aware DFT as a sign-off requirement — not an optional refinement. DFT engineers must demonstrate that switching activity during scan stays within the power grid's safe operating window before the design is taped out.

Key Insight

Scan shift is essentially random data streaming through every flip-flop on every clock edge. Without deliberate constraints, the toggle rate approaches 50% — the statistical average for uncorrelated binary data. Functional logic rarely reaches even 20% toggle rate because data values are correlated and many paths are idle.
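
The 50% figure can be checked with a small experiment. The sketch below is an illustration under simple assumptions (a single chain, independent uniformly random bits, a fixed seed); the "correlated" stream that holds each bit for 8 cells is a made-up stand-in for low-toggle data, not a model of any tool's fill algorithm.

```python
import random

def adjacent_difference_rate(bits):
    """Fraction of neighbouring bit pairs that differ. In a shift register,
    a cell toggles on the next clock exactly when it differs from the
    neighbour whose value it is about to take."""
    differing = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return differing / (len(bits) - 1)

rng = random.Random(2024)  # fixed seed so the run is repeatable
random_stream = [rng.getrandbits(1) for _ in range(10_000)]

# Correlated stream (illustrative): each random bit is held for 8 cells.
correlated_stream = [bit for bit in random_stream[:1250] for _ in range(8)]

print(f"uncorrelated: {adjacent_difference_rate(random_stream):.3f}")
print(f"correlated:   {adjacent_difference_rate(correlated_stream):.3f}")
```

The uncorrelated stream lands close to 0.5, while the correlated stream can change value at most once every 8 cells, so its rate stays below 0.125.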

[Figure: Switching Activity: Functional vs Scan Shift]

X-States and X-Masking

In digital simulation, logic values are not just 0 or 1. They can also be X — an unknown or don't-care value. X-states arise from multiple sources in real designs:

Uninitialized memories: SRAMs and register files have indeterminate content at power-on.
Tri-state buses: Bus outputs floating when no driver is enabled.
Power-gated domains: Flip-flops inside a shut-off power domain output X after the domain is powered down.
Clock domain crossings (CDC): Metastability in synchronizers can produce X at simulation time.
Functional don't-cares: Some outputs are genuinely not important in certain operating modes.

X-states are toxic to DFT in two distinct ways. First, in LBIST with a MISR (Multiple Input Signature Register): the MISR accumulates a compressed signature of all scan outputs over many capture cycles. If a single X propagates into the MISR, it corrupts the signature. The chip will show a mismatching signature even if all real faults are absent, which is a false fail. Second, in compressed ATPG: EDT and other output compactors combine (typically XOR) the outputs of several scan chains into each ATE channel. If an X appears on one of the chains being combined, the compacted bit is also unknown, so any fault effect observed on the other chains in that shift cycle is lost.

Rule of Thumb: Never allow an X source to propagate to a scan output (scan_out) or a compactor input without explicitly masking or blocking it. Unmasked X values guarantee false failures and wasted ATE time.

What X-Masking Does

X-masking selectively blocks scan chain outputs that are known to contain X values before they enter the signature accumulator or compactor. The mask acts as a qualifier: a '0' mask bit silences that scan output (the value is ignored), while a '1' mask bit passes the output through for comparison.

[Figure: X-Masking Architecture]

X-Masking Implementation

There are two primary implementation approaches, each with different tradeoffs in hardware cost versus ATE memory.

Hardware X-Masking

A dedicated AND-mask register is added — one flip-flop per scan output. During the mask-shift phase, the ATE loads the mask values into this register. Each scan output is then ANDed with its mask bit before reaching the compactor or MISR. A '0' mask bit completely suppresses that chain output. Hardware X-masking is the most robust approach because it is pattern-independent — the mask register is loaded once per test segment, not once per pattern.
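
The effect of the AND mask can be seen in a small three-valued simulation. This is a minimal sketch, not a model of any real MISR: the 4-bit width, the feedback taps, and the response data are arbitrary choices made for illustration, and X is represented by Python's `None`.

```python
X = None  # marker for an unknown logic value

def xor3(a, b):
    """XOR in three-valued logic: any X input makes the result X."""
    return X if a is X or b is X else a ^ b

def apply_mask(value, mask_bit):
    """AND with a known mask bit: 0 forces a known 0, 1 passes the value."""
    return 0 if mask_bit == 0 else value

def misr_signature(slices, mask, width=4, taps=(0, 3)):
    """Compact scan-out slices (one value per chain per shift clock)."""
    state = [0] * width
    for scan_outs in slices:
        feedback = 0
        for tap in taps:
            feedback = xor3(feedback, state[tap])
        shifted = [feedback] + state[:-1]
        masked = [apply_mask(v, m) for v, m in zip(scan_outs, mask)]
        state = [xor3(s, v) for s, v in zip(shifted, masked)]
    return state

# Chain 2 unloads X on every shift clock (for example, an uninitialized RAM output).
slices = [
    [1, 0, X, 1],
    [0, 1, X, 0],
    [1, 1, X, 1],
]
unmasked = misr_signature(slices, mask=[1, 1, 1, 1])
masked = misr_signature(slices, mask=[1, 1, 0, 1])
print("unmasked signature:", unmasked)
print("masked signature:  ", masked)
```

Without the mask, the X spreads through the feedback path into several signature bits. With chain 2 masked, every bit stays known, at the cost of no longer observing that chain.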

Software X-Masking (ATPG-based)

The ATPG tool's fault simulation engine identifies, for each test pattern, which scan output bits will contain X values when the pattern is applied to the design. The mask bits are encoded as part of the test pattern itself and are included in the ATE stimulus file. The ATE applies the mask bits to the mask register just before unloading each scan chain.
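
Deriving mask bits from simulated responses can be sketched as follows. The data layout (patterns containing chains containing unload bits) and the per-chain mask granularity are assumptions made for illustration; real tools may mask at a finer granularity, such as per shift cycle.

```python
X = None  # marker for an unknown logic value

def derive_chain_masks(unload_per_pattern):
    """Return one mask bit per chain for each pattern: 0 if the chain's
    simulated unload contains any X, otherwise 1."""
    return [
        [0 if any(v is X for v in chain_bits) else 1 for chain_bits in pattern]
        for pattern in unload_per_pattern
    ]

# Two patterns, two chains each, three unload bits per chain (made-up values).
simulated_unloads = [
    [[1, 0, 1], [0, X, 1]],  # pattern 0: chain 1 captures an X
    [[0, 0, 1], [1, 1, 0]],  # pattern 1: no X anywhere
]
masks = derive_chain_masks(simulated_unloads)
print(masks)
```

Each pattern carries its own mask word, which is why the ATE memory cost of this approach grows with both pattern count and chain count.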

Method	How It Works	Hardware Cost	ATE Memory Impact	Best For
Hardware X-masking	AND-mask register per scan output; loaded by ATE before test	1 FF per scan output	Low (mask is static per segment)	LBIST, EDT compaction
Software X-masking	ATPG encodes mask bits into pattern; ATE shifts them per pattern	None	High (mask bits per pattern per chain)	Stuck-at ATPG with few X sources
X-tolerance (EDT)	EDT compactor tolerates a limited number of X values at its inputs without corrupting the observed response	EDT hardware already present	None extra	EDT-compressed designs

Power-Aware ATPG

Standard ATPG has one goal: detect as many faults as possible with as few patterns as possible. It has no awareness of how much power those patterns consume when shifted through the scan chains. Power-aware ATPG adds a second objective: ensure that the switching activity during every shift clock stays below a specified toggle rate threshold.

Toggle Rate Constraint

A typical constraint is: "No more than 15% of all scan cells may transition (0→1 or 1→0) on any single shift clock." The ATPG tool tracks, for each tentative don't-care fill assignment, the resulting toggle count increment. When the cumulative toggle count for a shift clock would exceed the threshold, the tool backtracks and selects a different fill value — one that doesn't cause an additional transition.
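
One common low-transition heuristic is adjacent fill, where each don't-care bit copies the nearest care bit before it. The sketch below shows that heuristic on a single test cube; it is an illustration of the idea and not necessarily the algorithm a given ATPG tool uses. The cube string and the helper names are hypothetical.

```python
def count_transitions(bits):
    """Number of neighbouring positions whose values differ."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)

def adjacent_fill(cube):
    """Replace each 'X' with the nearest care bit to its left. Leading X
    positions copy the first care bit (or '0' if the cube has none)."""
    care_bits = [c for c in cube if c != "X"]
    last = care_bits[0] if care_bits else "0"
    filled = []
    for c in cube:
        if c != "X":
            last = c
        filled.append(last)
    return "".join(filled)

cube = "1XXX0XX1XX"           # care bits at positions 0, 4, and 7
filled = adjacent_fill(cube)
rate = count_transitions(filled) / (len(filled) - 1)
print(filled, count_transitions(filled), f"{rate:.2f}")
```

The filled vector has only the transitions that the care bits themselves force, which is the lower bound for this cube. Care bits are never changed, so fault detection for this cube is unaffected.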

Low-Capture-Power Constraints

The capture cycle — the single functional clock applied after loading the test pattern — is often the highest-power moment in a scan test. All the loaded pattern values propagate through the combinational logic simultaneously. Power-aware ATPG can apply low-capture-power constraints that limit the number of simultaneously switching primary inputs and internal flip-flop outputs during capture. This reduces the instantaneous current spike at the capture edge.

ATPG Constraint — Power-Aware Settings (Synopsys TetraMAX / Tessent)

## TetraMAX power-aware ATPG constraints
## Scan clock definition (set_dft_signal is a DFT Compiler command)
set_dft_signal -type ScanClock -view existing_dft \
    -port clk -timing {45 55}

## Low-power constraints: limit shift switching to 15% of scan cells
set_atpg_constraint -shift_power_limit 15   ; # 15% max toggle rate
set_atpg_constraint -capture_power_limit 20  ; # 20% max capture toggle

## Tessent equivalent
set_pattern_filtering -low_power_shift on
set_pattern_filtering -shift_power_factor 0.15
set_pattern_filtering -capture_power_factor 0.20

## Run ATPG with power constraint
run_atpg -effort high

The tradeoff: applying power constraints increases pattern count. When the tool cannot use its preferred don't-care fill (because it would cause too many toggles), it may need a separate pattern to cover faults that could have been combined. Industry experience shows a 15–20% power limit increases pattern count by 10–40% compared to unconstrained ATPG. This is an acceptable tradeoff given the alternative is TIYL.

Toggle Rate Analysis

Before applying power-aware ATPG, DFT engineers run toggle rate analysis to characterize the baseline switching activity of the scan chains under the nominal (unconstrained) pattern set. This analysis reports:

Per-pattern average toggle rate across all chains
Per-scan-cell toggle count (identifies "hot" cells that transition frequently)
Maximum single-shift-clock toggle count (the worst-case IR drop moment)
Total energy estimate per pattern (useful for thermal budgeting)

Toggle Rate (per shift clock) = (Transitions on that clock) ÷ (Total scan cells)
Average Toggle Rate = (Sum of all transitions across all shift clocks) ÷ (Total cells × Total shift clocks)
Target: Average Toggle Rate < 20%
Target: Peak Toggle Rate (any single clock) < 30%
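
These definitions translate directly into a small analysis routine. The sketch below simulates a single 8-cell chain that starts at all zeros; the chain length, the scan-in data, and the function names are illustrative assumptions, while the 20% and 30% targets are the ones stated above.

```python
def shift_toggle_counts(chain, scan_in_bits):
    """Shift bits into one chain and return the toggle count per shift
    clock. chain[0] is the cell nearest scan-in."""
    state = list(chain)
    counts = []
    for bit in scan_in_bits:
        next_state = [bit] + state[:-1]
        counts.append(sum(1 for old, new in zip(state, next_state) if old != new))
        state = next_state
    return counts

def toggle_report(counts, total_cells, avg_target=0.20, peak_target=0.30):
    """Average and peak toggle rates, checked against the targets."""
    average = sum(counts) / (total_cells * len(counts))
    peak = max(counts) / total_cells
    return {
        "average": average,
        "peak": peak,
        "average_ok": average < avg_target,
        "peak_ok": peak < peak_target,
    }

chain = [0] * 8
smooth = shift_toggle_counts(chain, [1, 1, 1, 1, 0, 0, 0, 0])
busy = shift_toggle_counts(chain, [1, 0, 1, 0, 1, 0, 1, 0])
print(smooth, toggle_report(smooth, len(chain)))
print(busy, toggle_report(busy, len(chain)))
```

The smooth load stays inside both targets, while the alternating load ends with every cell toggling on the final shift clock.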

Design Node	Typical Unconstrained Toggle Rate	Industry Target	Impact if Exceeded
28 nm and above	35–45%	<25%	IR drop, minor TIYL risk
16/14 nm FinFET	40–50%	<20%	Significant IR drop, EM risk
7/5 nm	45–55%	<15%	High TIYL risk, thermal excursion
3 nm and below	50%+	<10–12%	Mandatory constraint; test without it fails sign-off

Scan Segmentation for Power Domains

Multi-VDD SoC designs partition the chip into several power domains: regions that can be powered up or down independently via power switches (PSW). This is the foundational technique for reducing leakage in idle blocks. The challenge for DFT: a scan chain that crosses a power domain boundary is physically broken when the downstream domain is off.

Critical Rule: Scan chains must never cross a power domain boundary without explicit isolation and segmentation logic. An unmanaged cross-domain chain will produce all-X scan responses when the downstream domain is power-gated, causing 100% false-fail on every test.

The Segmentation Solution

The DFT tool (Tessent Shell, Synopsys DFT Compiler) automatically segments scan chains per power domain when given a UPF/CPF power intent file. Each segment is a self-contained mini scan chain confined to one domain. The segments connect through the ATE's scan-in/scan-out pins — not through on-chip daisy-chaining across domain boundaries.

[Figure: Scan Segmentation, Two Power Domains]

When Domain B is powered off, the DFT controller deasserts its scan enable (SE_B) and bypasses or skips that segment. Only Domain A's scan segment is shifted. Test coverage for Domain B is achieved in a separate power-on test mode where Domain B is brought up.
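
The grouping and enable logic can be sketched in a few lines. The cell names, the domain names (`PD_A`, `PD_B`), and the dictionary-based representation are hypothetical; a real flow would take the domain membership from the UPF file.

```python
from collections import defaultdict

def segment_by_domain(cells):
    """Group (cell_name, power_domain) pairs into one ordered segment
    per power domain, so no segment crosses a domain boundary."""
    segments = defaultdict(list)
    for name, domain in cells:
        segments[domain].append(name)
    return dict(segments)

def scan_enables(segments, powered_domains):
    """Scan enable per segment: asserted only if the domain is powered."""
    return {domain: domain in powered_domains for domain in segments}

cells = [
    ("u_cpu/r0", "PD_A"),
    ("u_dsp/r0", "PD_B"),
    ("u_cpu/r1", "PD_A"),
    ("u_dsp/r1", "PD_B"),
]
segments = segment_by_domain(cells)
enables = scan_enables(segments, powered_domains={"PD_A"})
print(segments)
print(enables)
```

With only `PD_A` powered, the segment for `PD_B` keeps its scan enable deasserted and is skipped until a test mode that powers it up.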

UPF-Aware DFT

The Unified Power Format (UPF, IEEE 1801) is the industry-standard way to express power intent — power domains, supply nets, power switches, isolation cells, level shifters, and retention registers. DFT tools that read UPF files are called UPF-aware, and they adjust their scan insertion and ATPG behavior accordingly.

Isolation Cells

At the boundary of a power-gated domain, isolation cells clamp outputs to a safe value (0 or 1) when the domain is off. During ATPG, the tool must model isolation cell behavior: when Domain B is off, its outputs to Domain A are clamped to the isolation value (not X). This allows ATPG to generate correct patterns for logic in Domain A that receives signals from Domain B through isolation cells.
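
The difference between an isolated and a non-isolated crossing can be shown with a three-valued model. This is a minimal sketch: X is represented by `None`, and the receiving logic is reduced to a single hypothetical AND gate in Domain A.

```python
X = None  # marker for an unknown logic value

def and3(a, b):
    """Three-valued AND: a known 0 dominates, otherwise X propagates."""
    if a == 0 or b == 0:
        return 0
    if a is X or b is X:
        return X
    return 1

def crossing_value(driver_value, domain_on, clamp_value=None):
    """Value seen by the receiving domain. clamp_value=None models a
    crossing with no isolation cell."""
    if domain_on:
        return driver_value
    return X if clamp_value is None else clamp_value

# Domain B is off; a Domain A gate ANDs a local 1 with the crossing signal.
no_isolation = and3(1, crossing_value(1, domain_on=False))
clamped_high = and3(1, crossing_value(0, domain_on=False, clamp_value=1))
print(no_isolation, clamped_high)
```

With the clamp in place the gate output is a known value, so ATPG can compute expected responses for Domain A while Domain B is off.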

Level Shifters

Signals crossing between VDD1 and VDD2 supply domains pass through level shifters that translate voltage levels. Level shifters are modeled as transparent buffers (functionally) during ATPG, but their internal structure must be tested too — typically via ATPG patterns that set their inputs to 0 and 1 and verify propagation through the shifted output.

Retention Registers

Retention registers are flip-flops with a shadow latch (balloon latch) that preserves data through power-down events. When the domain powers down, a SAVE signal transfers the FF's state to the shadow latch. When the domain powers back up, a RESTORE signal copies it back. For DFT:

Retention FFs must be included in the scan chain (they need to be tested like any other FF).
The SAVE/RESTORE control must be inactive during scan shift — otherwise, asserting SAVE would transfer scan data to the shadow latch, corrupting both.
ATPG generates specific patterns to test the retention path itself: SAVE, power-down, RESTORE, and verify the retained value.
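
The retention test sequence in the last item can be expressed as a behavioural model. This is a simplified sketch: the class, its method names, and the use of `None` for X are illustrative assumptions, and real retention cells differ in pin polarity and in whether save and restore share one pin.

```python
X = None  # marker for an unknown logic value

class RetentionFlop:
    """Flip-flop with an always-on shadow latch (behavioural model)."""

    def __init__(self):
        self.q = 0           # main state, lost when the domain powers down
        self.shadow = 0      # shadow latch on the always-on supply
        self.powered = True

    def load(self, d):
        if self.powered:
            self.q = d

    def save(self):
        if self.powered:
            self.shadow = self.q

    def restore(self):
        if self.powered:
            self.q = self.shadow

    def power_down(self):
        self.powered = False
        self.q = X

    def power_up(self):
        self.powered = True  # q stays X until RESTORE or a new load

def retention_test(value):
    """SAVE, power-down, power-up, RESTORE; return (state_lost, final_q)."""
    ff = RetentionFlop()
    ff.load(value)
    ff.save()
    ff.power_down()
    state_lost = ff.q is X
    ff.power_up()
    ff.restore()
    return state_lost, ff.q

print(retention_test(1), retention_test(0))
```

Running the sequence for both 0 and 1 checks that the shadow path can hold either value. The same model also shows why SAVE must stay inactive during shift: a stray `save()` would overwrite the shadow latch with whatever scan data happens to be in `q`.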

Clock Gating in DFT

Clock gating is the primary dynamic power reduction technique in synchronous designs. An ICG (Integrated Clock Gate) cell — a latch-based gate that suppresses the clock when the enable is deasserted — sits between the clock distribution network and the flip-flops in a given cluster. For test, clock gating creates a serious problem: if a clock gate is closed during scan shift, the FFs it feeds will not receive any shift clock pulses. Those FFs become invisible to the scan chain — they cannot be loaded or unloaded.

Test Mode Override

The solution is simple in concept but must be implemented carefully: during test mode, the scan enable (SE) signal is connected into the clock gate's enable path so that SE=1 forces the clock gate open. The standard approach is to OR the ICG enable with scan_enable:

Clock Gate Test Override — Verilog

// Integrated Clock Gate (ICG) with test override
module icg_cell (
  input  EN,          // functional enable
  input  SE,          // scan enable (test mode)
  input  CLK,         // clock in
  output GCLK         // gated clock out
);
  reg en_latch;
  // Level-sensitive latch: transparent while CLK=0, holds while CLK=1,
  // so enable changes cannot glitch GCLK during the high phase.
  always @(CLK or EN or SE) begin
    if (!CLK) en_latch <= EN | SE;  // SE forces gate open during scan shift
  end
  assign GCLK = en_latch & CLK;
endmodule

// In the design hierarchy, ICG is instantiated as:
// icg_cell u_icg (.EN(func_en), .SE(scan_enable), .CLK(clk), .GCLK(gated_clk));
// During scan shift: scan_enable=1 → GCLK follows CLK → all FFs receive shift clocks
// During functional: scan_enable=0 → GCLK gated by func_en as normal

Missing clock gate test override is a common DFT bug. The symptom is a chain of FFs that appear stuck — they shift in a constant value regardless of scan_in. The DRC (Design Rule Check) step in scan insertion tools catches this and flags every ICG that lacks SE override as an untestable clock gate violation.
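
The stuck-chain symptom can be reproduced with a behavioural model of the gate. The sketch below samples the clock as a list of 0/1 values and uses a hypothetical `test_override` flag to switch the SE connection on or off; it mirrors the latch-and-AND structure described above but is not a timing-accurate model.

```python
def gated_clock(clk_wave, func_en, scan_en, test_override=True):
    """Return the GCLK samples of a latch-based clock gate.
    The enable latch is transparent while CLK is low."""
    en_latch = 0
    gclk = []
    for clk in clk_wave:
        if clk == 0:
            en_latch = (func_en | scan_en) if test_override else func_en
        gclk.append(en_latch & clk)
    return gclk

clk_wave = [0, 1, 0, 1, 0, 1]  # three clock pulses

# Scan shift with the functional enable low.
with_override = gated_clock(clk_wave, func_en=0, scan_en=1, test_override=True)
without_override = gated_clock(clk_wave, func_en=0, scan_en=1, test_override=False)
print(with_override)
print(without_override)
```

Without the override, no shift clock reaches the flip-flops behind the gate, so they hold their value and the chain appears stuck.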

EDT Low-Power Features

Siemens Tessent's EDT (Embedded Deterministic Test) architecture, originally developed at Mentor Graphics, includes built-in low-power features beyond what standard ATPG provides. Understanding these features is important for DFT engineers using EDT-compressed scan in low-power SoCs.

Correlated Pattern Generation

EDT's decompressor generates multiple scan chain inputs from a small number of ATE channels through a linear decompression network. The decompressor introduces correlations between what different chains receive. Low-power EDT exploits this: by choosing decompressor seeds that produce correlated bit streams, adjacent scan cells (which are often in the same clock domain) receive similar values. Since similar adjacent values mean fewer transitions, the correlated seeds naturally reduce switching activity without sacrificing fault coverage.

X-Tolerance in EDT

X values arrive at the output side of the EDT logic, so X-tolerance is a property of the compactor rather than the decompressor. EDT's compactor is X-tolerant up to a design-time parameter called the X-tolerance level. If the design has, say, 5% of scan bits containing X values, and the EDT X-tolerance is set to 8%, the compactor can accommodate all X sources without requiring explicit per-pattern X-masking. This simplifies the test flow significantly.

EDT Feature	What It Does	Power Benefit	Coverage Impact
Correlated seeds	Adjacent chains get similar values from decompressor	15–30% toggle rate reduction	Slight increase in pattern count
X-tolerance	Compactor absorbs X values; no extra masking needed	Avoids mask register power overhead	None (coverage maintained)
Low-power shift (LPSHIFT)	Insert shift-pauses; shift slowly in high-power zones	Thermal peak reduction	None (coverage same, test time longer)
Segment gating	Power-gate inactive EDT segments during shift	30–50% power reduction for gated segment	None

The combination of power-aware ATPG constraints, EDT correlated decompression, and scan segmentation can reduce test-mode power consumption to within 1.2–1.5× of functional power — a dramatic improvement over unconstrained scan's 3–5× overhead.

Interview FAQ — DFT for Low Power

Why does scan shift cause higher switching activity than functional operation?

During functional operation, flip-flop transitions are driven by the actual workload. Many FFs hold their value between cycles (data correlations, idle paths), so the average toggle rate is 5–15%. During scan shift, the scan chain is essentially a long shift register carrying pseudo-random test pattern data. Because ATPG patterns are designed to exercise all fault sites — not to minimize transitions — adjacent bits in the shifted data are uncorrelated. Statistically, uncorrelated binary data causes a 50% toggle rate per shift clock (each bit has a 50% chance of being different from its predecessor). Without constraints, scan shift reaches 40–50% toggle rate, which is 2–5× higher than functional. This drives proportionally higher dynamic power, causing IR drop and thermal stress.

What is X-masking and when is it needed?

X-masking selectively blocks scan chain outputs that contain X (unknown) values before those outputs reach a compactor or MISR. X-states come from uninitialized memories, power-gated domains, tri-state buses, or clock-domain crossing elements. If X values reach an MISR in LBIST, they corrupt the signature and cause false fails. If X values reach an EDT compactor, they invalidate combined patterns. X-masking is needed whenever X sources propagate to scan output pins and cannot be eliminated by other means (test-mode initialization, power sequencing, or X-bounding). Hardware X-masking uses an AND-mask register; software X-masking has the ATPG tool encode mask bits into each pattern.

How does power-aware ATPG limit switching during scan?

Power-aware ATPG adds a toggle rate constraint to the ATPG engine's don't-care fill algorithm. When filling unspecified (don't-care) scan bit positions, the tool tracks how many transitions the fill would cause on each shift clock. If a fill assignment would push the toggle count above the threshold (e.g., 15% of chain length), the tool selects a different fill value — one that does not add a transition. This is possible because don't-care bits can be assigned 0 or 1 without affecting fault detection; the tool simply assigns whichever value produces fewer transitions. Power-aware fill increases pattern count by 10–40% because some faults that could have been combined in one pattern (using free don't-care bits) now require separate patterns with constrained fills.

What is scan segmentation and why is it used in multi-power-domain designs?

Scan segmentation divides the full scan chain into independent sub-chains (segments), each confined to a single power domain. In multi-VDD / UPF designs, a power domain can be fully shut off during test (power switch open, domain supply at 0 V). If a scan chain crosses into a shut-off domain, the chain is physically broken — shift data is lost and the output is all-X. Scan segmentation solves this by never connecting chains across domain boundaries on-chip. Each segment has its own scan-in, scan-out, and scan-enable path. The DFT controller (or ATE) activates each segment only when its power domain is on. Isolation cells at domain boundaries prevent powered domains from being corrupted by floating outputs from off domains.

How do retention registers affect DFT?

Retention registers are flip-flops with a shadow latch that preserves state through power-off events. For DFT, they must be included in the scan chain — they are testable FFs like any other. The key constraint is that the SAVE and RESTORE control signals must be held inactive during scan shift and capture; asserting SAVE during shift would transfer the shifting data into the shadow latch, and asserting RESTORE would overwrite the shifted data — both corrupt the scan test. The DFT tool must tie these controls appropriately in test mode. ATPG also needs to model the retention register's internal structure to generate patterns that test the shadow path: specifically, patterns that SAVE a value, simulate power-off, RESTORE, and then verify the retained value propagated correctly through the retention latch.
