DFT Day 3 — Scan Insertion & Compression: EDT, X-Tolerance & Compression Ratio

What is Scan Insertion?

In Day 2 we saw how a scan flip-flop (scan FF) differs from a standard FF — it adds a 2:1 mux before the D input, controlled by the scan enable (SE) signal. Scan insertion is the automated EDA flow step where every standard flip-flop in the synthesized netlist is replaced by its scan-capable equivalent, and the resulting scan cells are stitched together into one or more contiguous shift-register chains.

Two widely used scan insertion tools are Siemens Tessent (formerly Mentor Graphics Tessent) and Synopsys DFT Compiler/DFTMAX (now part of the TestMAX family); TetraMAX, now TestMAX ATPG, is Synopsys's ATPG tool rather than its scan inserter. Both insertion tools accept a gate-level netlist, a scan library (specifying which scan cell replaces which standard cell), and a set of DFT constraints, and produce an augmented netlist ready for ATPG.

At the gate level, a mux-scan flip-flop looks like a standard D flip-flop with an additional input. The scan data input SI (or scan_in) feeds the mux when SE=1. The functional data input D feeds the mux when SE=0. The output Q is shared for both modes.

Verilog — Mux-Scan Flip-Flop (behavioral model)

// Mux-scan flip-flop — industry-standard DFT cell
module scan_dff (
  input  clk,      // clock
  input  d,        // functional data
  input  si,       // scan-in (from previous FF in chain)
  input  se,       // scan enable: 1=shift, 0=capture
  output reg q,    // functional output
  output     so    // scan-out = q (feeds next FF's si)
);
  wire d_mux = se ? si : d;  // mux: select scan or functional

  always @(posedge clk)
    q <= d_mux;

  assign so = q;  // scan-out daisy-chains to next FF's si
endmodule

The critical observation: during normal functional operation (SE=0), the scan infrastructure is functionally transparent, because the mux simply passes D through. The overhead is one mux of logic per FF (approximately 2–5% area penalty across the full design), a small added mux delay on every functional D path, and the routing of the SI/SE signals.

Key Concept — Scan Enable Timing

The SE signal must be held stable before the rising clock edge and must not glitch during the capture cycle. A glitch on SE during capture would cause some FFs to capture scan data instead of functional data, corrupting the test response. DFT sign-off tools perform SE glitch analysis as part of the scan verification flow.

Scan Chain Architecture

Once every FF has been converted to a scan FF, the EDA tool stitches them into scan chains: long shift registers that snake through the design. The first FF in a chain connects its SI port to a scan-in primary port (driven by the ATE). The last FF's SO port connects to a scan-out primary port (sampled by the ATE). Every intermediate FF feeds the next one: FF_k.SO → FF_(k+1).SI.
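The shift behavior described above can be sketched in a few lines of Python. This is a behavioral model only, not tool output: `shift_chain` is a hypothetical helper, index 0 is assumed to be the FF nearest scan-in, and the last index is assumed to drive scan-out.

```python
def shift_chain(state, scan_in_bits):
    """Shift bits through a scan chain with SE=1, one bit per clock.

    state[0] is the FF nearest scan-in; state[-1] drives scan-out.
    Returns (new_state, scan_out_bits).
    """
    state = list(state)
    scan_out = []
    for bit in scan_in_bits:
        scan_out.append(state[-1])   # the ATE samples the last FF's SO
        state = [bit] + state[:-1]   # every FF takes its predecessor's value
    return state, scan_out


chain = [0, 0, 0, 0]                      # 4-FF chain, initially all zero
loaded, unloaded = shift_chain(chain, [1, 0, 1, 1])
print(loaded)     # [1, 1, 0, 1]: the first bit shifted in ends up nearest scan-out
print(unloaded)   # [0, 0, 0, 0]: the old contents leave while the new ones enter
```

Loading a new pattern and unloading the previous response happen in the same shift cycles, which is why shift time per pattern is governed by the chain length.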

A design with 10,000 flip-flops could be organized as 10 chains of 1,000 FFs each, or 100 chains of 100 FFs each. The tradeoff is fundamental: fewer, longer chains mean fewer ATE I/O pins but longer shift time; more, shorter chains reduce shift time but require more ATE channels.

[Figure: 4-FF scan chain in shift mode (SE=1)]

Chain Length	Shift Cycles / Pattern	ATE Pins Required	Best For
100 FFs / chain	100 cycles	Many (one scan-in and one scan-out per chain)	Fast test, high pin-count ATE
1,000 FFs / chain	1,000 cycles	Moderate	Balanced — typical pre-compression
10,000 FFs / chain	10,000 cycles	Few	Low-pin ATE; slow without compression

EDT — Embedded Deterministic Test

Without compression, testing a chip with 50,000 FFs organized in 50 chains of 1,000 FFs requires 50 ATE output pins and 50 ATE input pins. Each test pattern takes 1,000 shift cycles. For 5,000 patterns, that is 5 million clock cycles of shift time — even at 100 MHz, that's 50 ms per device. At high-volume manufacturing with millions of units, test cost becomes a significant fraction of chip cost.
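The chain-length tradeoff and the shift-time arithmetic above can be checked with a small calculation. This is a rough sketch under stated assumptions: chains are perfectly balanced, each chain needs one scan-in and one scan-out pin, and capture cycles and tester overhead are ignored. The function name is hypothetical.

```python
import math


def uncompressed_scan_budget(num_ffs, num_chains, num_patterns, shift_hz):
    """Return (shift cycles per pattern, ATE scan pins, total shift seconds)."""
    chain_len = math.ceil(num_ffs / num_chains)   # assumes balanced chains
    ate_pins = 2 * num_chains                     # one scan-in plus one scan-out each
    shift_seconds = num_patterns * chain_len / shift_hz
    return chain_len, ate_pins, shift_seconds


# The example from the text: 50,000 FFs, 50 chains, 5,000 patterns, 100 MHz shift clock.
cycles, pins, seconds = uncompressed_scan_budget(50_000, 50, 5_000, 100e6)
print(cycles, pins, seconds)
```

Calling the same function with 10,000 FFs and 10 or 100 chains reproduces the tradeoff in the table: ten times more chains means ten times fewer shift cycles per pattern, at the price of ten times more scan pins.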

EDT (Embedded Deterministic Test), commercialized by Mentor Graphics (now Siemens EDA) as the Tessent EDT product, solves this by placing two small logic blocks on-chip:

Decompressor — sits between the ATE's few external channels and the chip's many internal scan chains. It expands N external bits into M internal bits deterministically.
Compactor (Compressor) — sits between the chip's many internal scan chains and the ATE's few output channels. It reduces M response bits to N output bits.

The key insight is that most bits of a deterministic ATPG pattern are "don't care" (X) — only a small number of positions must be set to specific values to sensitize and propagate a fault. The decompressor uses this to reconstruct the full pattern from a compressed seed. This is why the technique is called deterministic — the decompressor is not a random generator; it computes an exact expansion of each seed.

EDT vs LFSR-Based BIST

LFSR-based BIST (Built-In Self-Test) also uses hardware to generate patterns on-chip, but the patterns are pseudo-random — not targeted at specific faults. EDT's decompressor generates deterministic patterns that map exactly to ATPG-computed test vectors, achieving 99%+ coverage vs ~85–92% for LFSR BIST.

EDT Architecture Deep Dive

Decompressor — XOR Network Expanding External to Internal

The decompressor receives N external channel bits (say 16 bits from ATE) every shift clock cycle and expands them to drive M internal scan chain heads (say 512 chains). It is structurally similar to a PRPG (Pseudo-Random Pattern Generator) but is programmable. The expansion is achieved through a sparse XOR network: each internal chain head is driven by the XOR of a small subset of external channels.

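For example, internal chain 0 might be driven by XOR(ext[0], ext[3], ext[11]), and internal chain 1 by XOR(ext[2], ext[7]). The EDT tool designs this network so that the resulting linear equations are solvable for the patterns ATPG actually needs: only the care bits of a pattern constrain the equations, so a compressed seed (the external channel values) exists whenever the care bits are few enough and do not conflict. Don't-care positions simply take whatever values the expansion produces. A fully specified pattern generally cannot be encoded, since 16 input bits per cycle can reach only a small fraction of the possible 512-bit slices.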
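Finding a seed for a given set of care bits is a linear-algebra problem over GF(2). The sketch below solves it for a single shift slice with Gaussian elimination. It models only the combinational XOR network described here (production decompressors also contain sequential elements, so their equations span many cycles), and the tap lists, `solve_seed`, and `expand` are made-up names and values for illustration.

```python
def expand(taps, seed):
    """Decompress: each chain input is the XOR of its tapped external channels."""
    out = []
    for chain_taps in taps:
        bit = 0
        for ch in chain_taps:
            bit ^= seed[ch]
        out.append(bit)
    return out


def solve_seed(taps, care_bits, num_ext):
    """Return external channel values satisfying care_bits, or None on conflict.

    taps[c]   -> external channel indices XORed to drive chain c
    care_bits -> {chain index: required 0/1}; other chains are don't-care
    """
    # One GF(2) equation per care bit: [bitmask of channels, required value].
    rows = []
    for chain, value in care_bits.items():
        mask = 0
        for ch in taps[chain]:
            mask ^= 1 << ch
        rows.append([mask, value])

    pivots = []   # (channel column, row index)
    r = 0
    for col in range(num_ext):
        pivot = next((i for i in range(r, len(rows)) if (rows[i][0] >> col) & 1), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and (rows[i][0] >> col) & 1:
                rows[i][0] ^= rows[r][0]
                rows[i][1] ^= rows[r][1]
        pivots.append((col, r))
        r += 1

    # A row reduced to "0 = 1" means the care bits cannot be encoded.
    if any(mask == 0 and value == 1 for mask, value in rows):
        return None

    seed = [0] * num_ext          # free channels default to 0
    for col, row in pivots:
        seed[col] = rows[row][1]
    return seed


# Hypothetical network: 4 external channels driving 8 internal chains.
taps = [[0, 3], [2], [0, 1], [1, 3], [0, 2, 3], [1, 2], [0, 1, 2], [3]]
seed = solve_seed(taps, {0: 1, 4: 0, 5: 1}, num_ext=4)
print(seed, expand(taps, seed))
print(solve_seed(taps, {0: 1, 1: 1, 4: 1}, num_ext=4))   # conflicting care bits
```

The second call returns None because chain 4 is the XOR of chains 0 and 1 in this made-up network, so the three care bits contradict each other. In a real flow ATPG would split such a pattern or target the faults separately.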

Compactor — XOR Network Reducing Internal to External

The compactor (sometimes called the compressor) XORs together the outputs of many internal scan chains and delivers the result to a few external output pins. In EDT this is a spatial XOR tree whose outputs the ATE compares on every shift cycle. A MISR (Multiple-Input Signature Register) is the classic time-compaction alternative: an LFSR that accumulates the XOR of all scan chain outputs, producing a final signature. If any scan response bit differs from expected, the MISR signature almost always changes (an n-bit MISR aliases with a probability of about 2^-n), and the ATE detects the mismatch.
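To make the MISR idea concrete, here is a minimal behavioral sketch. The register width, the feedback tap mask (chosen to correspond to x^4 + x + 1), and the response data are all assumptions picked for illustration, not values from any real design.

```python
def misr_signature(slices, width, taps):
    """Accumulate scan-out slices into a MISR signature.

    slices -> one int per shift cycle; bit i is chain i's scan-out value
    width  -> register width in bits
    taps   -> feedback tap mask (hypothetical polynomial for illustration)
    """
    mask = (1 << width) - 1
    state = 0
    for s in slices:
        msb = (state >> (width - 1)) & 1
        state = (state << 1) & mask      # LFSR shift
        if msb:
            state ^= taps                # polynomial feedback
        state ^= s & mask                # fold in this cycle's chain outputs
    return state


good = [0b1010, 0b0110, 0b0001, 0b1111]
bad = [0b1010, 0b0100, 0b0001, 0b1111]   # one flipped response bit in cycle 1

print(misr_signature(good, 4, 0b0011) != misr_signature(bad, 4, 0b0011))   # True
```

A single flipped bit always changes the signature in this model because the feedback is invertible. Multiple errors can cancel each other, which is the aliasing case mentioned above.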

X-Tolerance — Why Unknowns Are Dangerous

The XOR compaction has a fundamental vulnerability: X states (unknown values). If a flip-flop in a scan chain responds with X (because it feeds from an uninitialized memory, a floating tri-state bus, or an analog IP output), that X propagates through the XOR network and corrupts the accumulated signature. This is called X-aliasing — a bad chip may produce the same corrupted signature as a good chip with the same X sources, hiding real defects.

EDT handles X-tolerance through two mechanisms:

X-aware ATPG — The ATPG tool models X sources during pattern generation and avoids assigning values to internal chains known to produce X responses.
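XOR network design and masking — An X XORed with a known value is still X, so it cannot be cancelled. Instead, the compactor can route each chain to more than one output so that a single X does not block every observation path, and mask logic can gate off a chain known to carry an X on a given pattern so that the X never reaches the XOR tree.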
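The effect of an X on XOR compaction can be shown with three-valued logic. The sketch below is a toy model: it compacts one shift slice into a single output, and the optional `mask` argument stands in for compactor mask logic that forces selected chains to 0. The names and data are hypothetical.

```python
X = "X"   # unknown logic value


def xor3(a, b):
    """Three-valued XOR: any unknown input makes the output unknown."""
    if a == X or b == X:
        return X
    return a ^ b


def compact(slice_bits, mask=None):
    """XOR all chain outputs of one shift slice into one compactor output.

    mask -> optional set of chain indices forced to 0 before compaction
    """
    out = 0
    for i, bit in enumerate(slice_bits):
        if mask and i in mask:
            bit = 0
        out = xor3(out, bit)
    return out


good = [1, 0, X, 1]      # chain 2 captures an unknown value
faulty = [0, 0, X, 1]    # a defect flips chain 0's response

print(compact(good), compact(faulty))                        # X X: defect is hidden
print(compact(good, mask={2}), compact(faulty, mask={2}))    # 0 1: defect is visible
```

Masking chain 2 restores observability of the other chains, at the cost of not observing anything chain 2 itself captured in that slice.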

[Figure: EDT architecture, 16 external ATE channels → 512 internal scan chains → 16 ATE outputs]

Compression Ratio Calculation

The compression ratio is the single most important figure of merit for scan compression. It tells you how much faster the compressed test runs compared to an uncompressed full-scan test with the same fault coverage.

Compression Ratio = Internal scan chains / External ATE channels

Example: 512 internal scan chains / 16 external ATE pins = 32:1 compression

Test time reduction ≈ Compression Ratio
Uncompressed: 512 patterns × 1,000 shift cycles = 512,000 cycles
Compressed: 512 patterns × (1,000 / 32) shift cycles = 16,000 cycles

External Channels	Internal Chains	Compression Ratio	Approx. Test Time Savings	Area Overhead
32 pins	512 chains	16:1	~94%	~0.5%
16 pins	512 chains	32:1	~97%	~1.0%
8 pins	512 chains	64:1	~98.4%	~1.5%
4 pins	512 chains	128:1	~99.2%	~2.5%

In practice, 32:1 is a commonly quoted compression ratio for production designs, because it balances test time reduction against X-tolerance limits and decompressor area. Higher ratios are achievable but require more sophisticated X-masking and larger decompressor/compactor networks. In the table above, area overhead grows far more slowly than the ratio itself: each doubling of the ratio adds roughly 0.5–1% of area. The XOR network itself grows roughly as O(M log M) in the number of chains M rather than O(M²).
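The ratio and savings columns of the table follow directly from the formula. The short calculation below reproduces them under the idealized assumption that compression does not change the pattern count; real compressed pattern sets are often somewhat larger, so actual savings tend to fall a little short of these figures. The function name is hypothetical.

```python
def compression_summary(internal_chains, external_channels):
    """Return (compression ratio, idealized shift-time savings in percent)."""
    ratio = internal_chains / external_channels
    savings_pct = (1 - 1 / ratio) * 100
    return ratio, savings_pct


for channels in (32, 16, 8, 4):
    ratio, savings = compression_summary(512, channels)
    print(f"{channels:>2} channels: {ratio:.0f}:1, ~{savings:.1f}% less shift time")
```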

Test Time is Test Cost

ATE time is billed at $0.50–$5.00 per second depending on the tester platform. For a chip with 50 million FFs, reducing test time from 10 seconds to about 0.3 seconds (32:1 compression) saves roughly $5 per unit even at the low end of that range. At 10M units/year, that's tens of millions of dollars annually.
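The cost claim is simple arithmetic, sketched below with the figures quoted in the text. The rates and volume come from the paragraph above; the function name is hypothetical, and the model ignores handler time, multi-site testing, and retest.

```python
def annual_test_savings(sec_before, sec_after, usd_per_sec, units_per_year):
    """Return (savings per unit in USD, savings per year in USD)."""
    per_unit = (sec_before - sec_after) * usd_per_sec
    return per_unit, per_unit * units_per_year


low = annual_test_savings(10.0, 0.3, 0.50, 10_000_000)
high = annual_test_savings(10.0, 0.3, 5.00, 10_000_000)
print(f"low rate:  ${low[0]:.2f}/unit, ${low[1] / 1e6:.1f}M/year")
print(f"high rate: ${high[0]:.2f}/unit, ${high[1] / 1e6:.1f}M/year")
```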

Scan Chain Reordering

After all scan FFs are identified and organized into chains, the placement tool may reorder which FF is connected to which within a chain. The default stitching order from synthesis is typically topological (based on netlist hierarchy), which often means FF1 is in one corner of the chip and FF2 is physically across the chip — requiring long routing wires for the SI/SO connections.

Scan chain reordering reorganizes the stitching so that physically adjacent flip-flops are adjacent in the scan chain. The same set of FFs form the same chains, but the internal order changes. This has two significant benefits:

Routing reduction — SI/SO connections become short local wires instead of long cross-chip routes. This reduces routing congestion and can improve timing closure.
Scan power reduction — During shift mode, a large fraction of FFs can toggle on every clock cycle. When adjacent FFs in the chain have no functional relationship, the values shifting through them are essentially random, causing high switching activity in the combinational logic they drive. Physically close FFs tend to be functionally related, so proximity-based reordering groups related FFs together and can reduce switching activity during scan shift.

Scan power during shift can reach 3–5× the chip's normal power consumption — a genuine reliability and test fixture concern. Techniques beyond reordering include scan segmentation (inserting enable gates to clock only active chain segments) and low-power scan cell types that reduce toggle rates.
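The routing benefit of reordering can be illustrated with a greedy nearest-neighbor heuristic. This is only a sketch of the idea, not how any particular place-and-route tool does it: the placement coordinates and FF names are invented, and wirelength is approximated by Manhattan distance between consecutive FFs.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def chain_wirelength(order, placement):
    """Sum of Manhattan distances between consecutive FFs in the chain."""
    return sum(manhattan(placement[a], placement[b]) for a, b in zip(order, order[1:]))


def reorder_nearest_neighbor(order, placement):
    """Greedy reorder: keep the first FF, then always hop to the closest unused FF."""
    remaining = list(order[1:])
    new_order = [order[0]]
    while remaining:
        last = placement[new_order[-1]]
        nxt = min(remaining, key=lambda ff: manhattan(last, placement[ff]))
        remaining.remove(nxt)
        new_order.append(nxt)
    return new_order


# Hypothetical placement (x, y) in microns.
placement = {"ff_a": (0, 0), "ff_b": (90, 80), "ff_c": (5, 0),
             "ff_d": (95, 85), "ff_e": (10, 5)}
netlist_order = ["ff_a", "ff_b", "ff_c", "ff_d", "ff_e"]

placed_order = reorder_nearest_neighbor(netlist_order, placement)
print(placed_order)
print(chain_wirelength(netlist_order, placement), "->",
      chain_wirelength(placed_order, placement))   # 675 -> 180
```

The same five FFs stay in the same chain. Only the stitching order changes, and the estimated SI/SO wirelength drops from 675 to 180 units in this toy case.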

Scan Insertion Flow — Step by Step

A production scan insertion flow in Tessent (Siemens) involves these stages. The tool reads the synthesized netlist and produces a scan-inserted netlist ready for place-and-route and subsequent ATPG.

Tessent TCL — Scan Insertion Flow

# ── Step 1: Set up Tessent context ──────────────────────────
set_context dft -design_id top_chip

# ── Step 2: Read synthesized netlist and libraries ──────────
read_verilog     top_chip_synth.v
read_cell_library {tsmc7nm_tt.lib tsmc7nm_scan.lib}
set_current_design top_chip

# ── Step 3: Define DFT constraints ──────────────────────────
add_clock  0 clk -period 2.0
add_input_constraint  rst_n 1    ;# active-low reset held inactive (1) during test
add_scan_enable scan_en         ;# SE port name

# ── Step 4: Define EDT compression parameters ───────────────
set_edt_options \
  -num_channels 16 \
  -num_scan_chains 512 \
  -x_tolerance 0.15  ;# allow 15% X bits in compactor

# ── Step 5: Insert scan cells and EDT logic ─────────────────
insert_test_logic

# ── Step 6: Verify scan connectivity ────────────────────────
verify_test_logic
report_scan_chains -summary

# ── Step 7: Write out the scan-inserted netlist ─────────────
write_design -verilog top_chip_scan.v
write_design -sdc    top_chip_scan.sdc

The verify_test_logic step is critical: it simulates the scan chain by shifting a known pattern and checking connectivity, flagging any broken chains before the design goes to P&R. Fixing broken chains post-layout is far more expensive than fixing them at this netlist stage.

Flow Step	Tool Action	Output
1. Set DFT constraints	Define clocks, resets, SE port, black boxes	Constraint database
2. Cell replacement	Replace std FFs with scan FFs from library	Augmented netlist
3. Chain stitching	Connect SI/SO ports into chains	Scan chains defined
4. EDT insertion	Add decompressor + compactor logic blocks	EDT-compressed netlist
5. Scan verify	Simulate scan shift, check chain continuity	Chain integrity report
6. Write output	Export scan-inserted Verilog + SDC	Scan netlist for P&R

Interview Questions & Answers

What is EDT and why is it used instead of full scan without compression?

EDT (Embedded Deterministic Test) places a hardware decompressor between the ATE's external channels and the chip's internal scan chains, and a compactor on the output side. A small number of external ATE pins (e.g., 16) can drive hundreds of internal scan chains simultaneously, because most pattern bits are "don't care" and can be derived from a compressed seed. This reduces test time in proportion to the compression ratio (32x, 64x) and allows complex chips to be tested on lower pin-count (cheaper) ATE equipment. Without compression, every scan chain needs its own scan-in and scan-out channel: a chip with 512 chains would require 1,024 ATE scan pins, which is impractical and expensive.

What causes X-tolerance issues in EDT compactors?

X states (unknown logic values) arise from scan cells that are reachable from uninitialized memory arrays, tri-state buses in high-Z state, analog/mixed-signal IP blocks, or gated clock domains that aren't active during test. When these X values enter the XOR compactor, they corrupt the accumulated signature, a process called X-aliasing. The compactor cannot distinguish between a real fault response and an X-corrupted response, so faults may be masked. EDT addresses this by masking X-carrying chains inside the compactor so they never reach the XOR tree, by using X-aware ATPG that avoids sensitizing X-producing paths, and by setting an X-tolerance threshold that limits how many X bits the compactor can safely handle per pattern.

How is compression ratio calculated?

Compression Ratio = (Number of internal scan chains) / (Number of external ATE channels). For example: 512 internal scan chains / 16 external ATE pins = 32:1 compression. The test time reduction approximately equals the compression ratio because each external channel now feeds 32 short internal chains in parallel, so every pattern needs roughly 1/32 of the shift cycles. A 32:1 compression ratio reduces a 50-second test to under 2 seconds. The area overhead for the decompressor and compactor logic is typically 0.5–2% of total chip area.

What is the difference between deterministic and pseudo-random test patterns?

Pseudo-random patterns are generated algorithmically (LFSR) without targeting specific faults. They cover faults quickly up to ~85–92% and then plateau — random-pattern-resistant faults (those requiring very specific input combinations) are left undetected. Deterministic patterns are computed by ATPG algorithms (D-algorithm, PODEM, FAN) that explicitly target each undetected fault and compute the exact input assignments needed to detect it. Deterministic patterns achieve 99%+ stuck-at coverage. EDT is deterministic: the decompressor computes an exact expansion of each compressed seed, reproducing the full ATPG-computed pattern inside the chip even though only a few external bits are transferred per clock cycle.

What happens if a scan chain has a break?

A scan chain break disconnects the SI→Q→SO shift path at some point. Symptoms during test: (1) the scan continuity test fails — a walking-1 pattern shifted in does not appear correctly at scan-out; (2) fault coverage drops dramatically for faults on flip-flops beyond the break point, because they are no longer loadable or readable through scan. To locate the break, a binary-search walking-1 test pinpoints which FF position the break is at. Common causes: missing net connection after an ECO (Engineering Change Order), incorrect cell replacement where the scan port was not re-connected, or a place-and-route open on the SI routing wire. Breaks must be fixed at netlist level before tapeout.
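The binary-search idea can be sketched as follows. The model is deliberately simple and rests on assumptions: the break is a stuck-at-0 on the SI link into one FF, position 0 is nearest scan-in, and a FF "reads back" if a 1 captured in it survives the unload. `make_probe` stands in for whatever the tester actually measures and is purely hypothetical.

```python
def unload_with_break(captured, break_at):
    """Values seen at scan-out when the SI link into FF `break_at` is stuck at 0.

    FFs at or after the break still shift out intact; FFs before it must pass
    through the broken link, so their values are lost and read as 0.
    """
    return [bit if pos >= break_at else 0 for pos, bit in enumerate(captured)]


def make_probe(chain_len, break_at):
    """Hypothetical tester hook: does a 1 captured at `pos` reach scan-out?"""
    def reads_back_ok(pos):
        captured = [0] * chain_len
        captured[pos] = 1
        return unload_with_break(captured, break_at)[pos] == 1
    return reads_back_ok


def locate_break(chain_len, reads_back_ok):
    """Binary search for the first FF position that still reads back correctly."""
    lo, hi = 0, chain_len
    while lo < hi:
        mid = (lo + hi) // 2
        if reads_back_ok(mid):
            hi = mid         # break is at or before mid
        else:
            lo = mid + 1     # break is after mid
    return lo


probe = make_probe(chain_len=1000, break_at=637)
print(locate_break(1000, probe))   # 637
```

For a 1,000-FF chain this needs about 10 probes instead of inspecting every position. A result of 0 means every position read back correctly in this model.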
