What is Scan Insertion?
In Day 2 we saw how a scan flip-flop (scan FF) differs from a standard FF — it adds a 2:1 mux before the D input, controlled by the scan enable (SE) signal. Scan insertion is the automated EDA flow step where every standard flip-flop in the synthesized netlist is replaced by its scan-capable equivalent, and the resulting scan cells are stitched together into one or more contiguous shift-register chains.
The two dominant tool families for scan insertion are Siemens Tessent (formerly Mentor Graphics Tessent) and Synopsys DFT Compiler/DFTMAX (now part of the TestMAX family, alongside TestMAX ATPG, the successor to TetraMAX). Both accept a gate-level netlist, a scan library (specifying which scan cell replaces which standard cell), and a set of DFT constraints, and both produce an augmented netlist ready for ATPG.
At the gate level, a mux-scan flip-flop looks like a standard D flip-flop with two additional inputs: the scan data input SI (or scan_in) and the scan enable SE. SI feeds the mux when SE=1; the functional data input D feeds the mux when SE=0. The output Q serves both modes.
```verilog
// Mux-scan flip-flop: industry-standard DFT cell
module scan_dff (
    input      clk,  // clock
    input      d,    // functional data
    input      si,   // scan-in (from previous FF in chain)
    input      se,   // scan enable: 1=shift, 0=capture
    output reg q,    // functional output
    output     so    // scan-out = q (feeds next FF's si)
);
    wire d_mux = se ? si : d;  // mux: select scan or functional

    always @(posedge clk)
        q <= d_mux;

    assign so = q;  // scan-out daisy-chains to next FF's si
endmodule
```
The critical observation: during normal functional operation (SE=0), the scan infrastructure is functionally transparent because the mux simply passes D through. The overhead is one mux of logic per FF (approximately 2–5% area penalty across the full design), one mux delay added to every functional D path, and the routing of the SI/SE signals.
The SE signal must be held stable before the rising clock edge and must not glitch during the capture cycle. A glitch on SE during capture would cause some FFs to capture scan data instead of functional data, corrupting the test response. DFT sign-off tools perform SE glitch analysis as part of the scan verification flow.
Scan Chain Architecture
Once every FF has been converted to a scan FF, the EDA tool stitches them into scan chains — long shift registers that snake through the design. The first FF in a chain connects its SI port to a scan-in primary port (driven by the ATE). The last FF's SO port connects to a scan-out primary port (sampled by the ATE). Every intermediate FF feeds the next one: FFk.SO → FFk+1.SI.
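The stitching rule above can be sketched as a small behavioural model. This is an illustrative Python sketch, not tool output: the function name `shift_chain` and the list-based chain representation are assumptions made for the example.

```python
def shift_chain(state, scan_in_bits):
    """Shift bits through a scan chain with SE=1.

    state[0] is the FF nearest the scan-in port; state[-1] drives scan-out.
    Returns (new_state, bits observed at scan-out).
    """
    state = list(state)
    scan_out = []
    for bit in scan_in_bits:
        scan_out.append(state[-1])   # the ATE samples SO of the last FF
        state = [bit] + state[:-1]   # every FF loads its predecessor's value
    return state, scan_out


pattern = [1, 0, 1, 1]
loaded, _ = shift_chain([0, 0, 0, 0], pattern)   # load phase
_, unloaded = shift_chain(loaded, [0, 0, 0, 0])  # unload phase
```

After four shift cycles the first bit shifted in sits in the FF nearest scan-out, so the chain holds the pattern in reverse order and unloading returns the bits in the order they were loaded.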
A design with 10,000 flip-flops could be organized as 10 chains of 1,000 FFs each, or 100 chains of 100 FFs each. The tradeoff is fundamental: fewer, longer chains mean fewer ATE I/O pins but longer shift time; more, shorter chains reduce shift time but require more ATE channels.
| Chain Length | Shift Cycles / Pattern | ATE Pins Required | Best For |
|---|---|---|---|
| 100 FFs / chain | 100 cycles | Many (one scan-in and one scan-out per chain) | Fast test, high pin-count ATE |
| 1,000 FFs / chain | 1,000 cycles | Moderate | Balanced — typical pre-compression |
| 10,000 FFs / chain | 10,000 cycles | Few | Low-pin ATE; slow without compression |
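The tradeoff in the table can be reproduced with a few lines of arithmetic. The sketch below assumes balanced chains and one scan-in plus one scan-out pin per chain; `chain_tradeoff` is a hypothetical helper name.

```python
import math


def chain_tradeoff(total_ffs, num_chains):
    """Return shift cycles per pattern and scan pin count for balanced chains."""
    chain_length = math.ceil(total_ffs / num_chains)
    return {
        "chains": num_chains,
        "shift_cycles_per_pattern": chain_length,  # one cycle per FF in the longest chain
        "scan_pins": 2 * num_chains,               # assumed: one SI and one SO pin per chain
    }


options = [chain_tradeoff(10_000, n) for n in (100, 10, 1)]
```

For the 10,000-FF design, going from 100 chains to a single chain cuts the pin count from 200 to 2 and multiplies the shift cycles per pattern by 100.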
EDT — Embedded Deterministic Test
Without compression, testing a chip with 50,000 FFs organized in 50 chains of 1,000 FFs requires 50 ATE output pins and 50 ATE input pins. Each test pattern takes 1,000 shift cycles. For 5,000 patterns, that is 5 million clock cycles of shift time — even at 100 MHz, that's 50 ms per device. At high-volume manufacturing with millions of units, test cost becomes a significant fraction of chip cost.
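The shift-time figure above follows from a one-line calculation. This sketch counts shift cycles only and ignores capture cycles and the final unload, which is an assumption that slightly understates the real test time.

```python
def shift_time_seconds(chain_length, num_patterns, shift_clock_hz):
    """Total scan shift time, counting one shift cycle per FF per pattern."""
    cycles = chain_length * num_patterns
    return cycles / shift_clock_hz


total_cycles = 1_000 * 5_000                               # 5,000,000 shift cycles
shift_time = shift_time_seconds(1_000, 5_000, 100e6)       # 0.05 s, i.e. 50 ms
```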
EDT (Embedded Deterministic Test), commercialized by Mentor Graphics (now Siemens EDA) as the Tessent EDT product, solves this by placing two small logic blocks on-chip:
- Decompressor — sits between the ATE's few external channels and the chip's many internal scan chains. It expands N external bits into M internal bits deterministically.
- Compactor (Compressor) — sits between the chip's many internal scan chains and the ATE's few output channels. It reduces M response bits to N output bits.
The key insight is that most bits of a deterministic ATPG pattern are "don't care" (X) — only a small number of positions must be set to specific values to sensitize and propagate a fault. The decompressor uses this to reconstruct the full pattern from a compressed seed. This is why the technique is called deterministic — the decompressor is not a random generator; it computes an exact expansion of each seed.
LFSR-based BIST (Built-In Self-Test) also uses hardware to generate patterns on-chip, but the patterns are pseudo-random — not targeted at specific faults. EDT's decompressor generates deterministic patterns that map exactly to ATPG-computed test vectors, achieving 99%+ coverage vs ~85–92% for LFSR BIST.
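To make the contrast concrete, here is a minimal sketch of a 4-bit LFSR of the kind a BIST pattern generator is built from. The tap choice (polynomial x^4 + x^3 + 1) and the name `lfsr4` are assumptions for illustration. The sequence is fully fixed by the seed and taps, so it cannot be steered toward the care bits of a specific fault.

```python
def lfsr4(seed, steps):
    """Return the first `steps` states of a 4-bit Fibonacci LFSR (x^4 + x^3 + 1)."""
    state = seed & 0xF
    states = []
    for _ in range(steps):
        states.append(state)
        feedback = ((state >> 3) ^ (state >> 2)) & 1  # XOR of the two tapped bits
        state = ((state << 1) | feedback) & 0xF       # shift left, insert feedback
    return states


sequence = lfsr4(0b0001, 16)
```

Starting from 0001, the register walks through all 15 non-zero states before repeating, which is the maximal-length behaviour that makes the output look random.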
EDT Architecture Deep Dive
Decompressor — XOR Network Expanding External to Internal
The decompressor receives N external channel bits (say 16 bits from ATE) every shift clock cycle and expands them to drive M internal scan chain heads (say 512 chains). It is structurally similar to a PRPG (Pseudo-Random Pattern Generator) but is programmable. The expansion is achieved through a sparse XOR network: each internal chain head is driven by the XOR of a small subset of external channels.
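For example, internal chain 0 might be driven by XOR(ext[0], ext[3], ext[11]), and internal chain 1 by XOR(ext[2], ext[7]). Because every internal bit is a linear (XOR) function of the external bits, the ATPG tool treats the care bits of each pattern as a system of linear equations over GF(2) and solves for the external channel values. The EDT tool designs the network so that this system is solvable for almost every pattern ATPG needs. Not every fully specified pattern is reachable: N external bits cannot produce all 2^M combinations of M internal bits, so the scheme works only because most bits are don't-care.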
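The expansion and the seed search can be sketched on a toy scale. Everything here is an assumption made for illustration: the tap table `TAPS`, the 3-channel to 6-chain sizing, and the single-cycle combinational model (a production decompressor also carries state from cycle to cycle). Brute-force search stands in for the GF(2) linear solve a real tool would use.

```python
from itertools import product

# Hypothetical tap table: internal chain index -> external channels XORed together.
TAPS = {0: (0, 1), 1: (1, 2), 2: (0, 2), 3: (0,), 4: (1,), 5: (0, 1, 2)}
NUM_CHANNELS = 3


def decompress(ext_bits):
    """Expand one shift cycle of external bits into one bit per internal chain."""
    expanded = []
    for chain in sorted(TAPS):
        bit = 0
        for channel in TAPS[chain]:
            bit ^= ext_bits[channel]
        expanded.append(bit)
    return expanded


def find_seed(care_bits):
    """Return external bits satisfying every care bit, or None if none exist.

    care_bits maps chain index -> required value; unlisted chains are don't-care.
    """
    for ext in product((0, 1), repeat=NUM_CHANNELS):
        expanded = decompress(ext)
        if all(expanded[chain] == value for chain, value in care_bits.items()):
            return list(ext)
    return None


seed = find_seed({0: 1, 4: 0, 5: 1})        # three care bits, three don't-cares
unsolvable = find_seed({0: 1, 1: 1, 2: 1})  # contradictory: the three XORs sum to 0
```

The first request has a solution because its care bits are independent; the second fails because chains 0, 1 and 2 are linearly dependent, which is the situation a well-designed network makes rare.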
Compactor — XOR Network Reducing Internal to External
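The compactor (also called the compressor) XORs together the outputs of many internal scan chains and delivers the result to a few external output pins. In EDT this is a combinational (spatial) XOR network whose outputs the ATE compares against expected values on every shift cycle. A MISR (Multiple-Input Signature Register) is the classic time-compaction alternative used in BIST: an LFSR that accumulates the XOR of all scan chain outputs into a final signature. If a scan response bit differs from expected, the compacted output or signature almost always changes, and the ATE detects the mismatch. The rare exception is aliasing, where multiple errors cancel each other.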
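A combinational XOR compactor is easy to model. In this sketch the grouping `GROUPS` (8 chains onto 2 outputs) and the response values are made up for illustration; it shows both a detected single-bit error and a case where two errors in the same group cancel.

```python
def compact(chain_outputs, groups):
    """XOR each group of chain outputs into one external output bit."""
    result = []
    for group in groups:
        bit = 0
        for chain in group:
            bit ^= chain_outputs[chain]
        result.append(bit)
    return result


GROUPS = [(0, 1, 2, 3), (4, 5, 6, 7)]  # hypothetical: 8 chains onto 2 outputs
good = [1, 0, 1, 1, 0, 0, 1, 0]

single_error = list(good)
single_error[2] ^= 1                   # one flipped response bit

double_error = list(good)
double_error[4] ^= 1                   # two flipped bits in the same XOR group
double_error[6] ^= 1

expected = compact(good, GROUPS)
```

The single error flips one compacted output and is caught; the double error leaves the outputs unchanged, which is the aliasing case.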
X-Tolerance — Why Unknowns Are Dangerous
XOR compaction has a fundamental vulnerability: X states (unknown values). If a flip-flop in a scan chain captures an X (because it is fed by an uninitialized memory, a floating tri-state bus, or an analog IP output), that X propagates through the XOR network, since X XOR anything is X. In a combinational compactor, every chain XORed with the X in that shift cycle becomes unobservable; in a MISR, a single X corrupts the entire accumulated signature. This effect is usually called X-masking: a fault effect that reaches the compactor alongside an X is hidden, so a defective chip can pass.
EDT handles X-tolerance through two mechanisms:
- X-aware ATPG: the ATPG tool models X sources during pattern generation and avoids relying on observation points whose compacted output would be corrupted by an X in that cycle.
- Chain masking in the compactor: an X cannot be cancelled by XORing it with a known value, so the compactor includes mask gates ahead of the XOR network. Per-pattern mask bits block the chains that carry X in that pattern, keeping the remaining chains observable.
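The effect of an X on XOR compaction, and of gating the offending chain off, can be shown with three-valued logic. This is a simplified sketch: the string `"X"` as the unknown value, the helper names, and the per-chain mask list are assumptions for illustration, with gating a chain off modelled as forcing it to 0 before the XOR tree.

```python
X = "X"  # unknown logic value


def xor3(a, b):
    """Three-valued XOR: any unknown input makes the output unknown."""
    if a == X or b == X:
        return X
    return a ^ b


def compact_with_mask(chain_outputs, mask):
    """XOR-compact one shift cycle; mask[i] == 0 forces chain i to 0 first."""
    acc = 0
    for value, enable in zip(chain_outputs, mask):
        gated = value if enable else 0
        acc = xor3(acc, gated)
    return acc


outputs = [1, X, 0, 1]                               # chain 1 captured an unknown
unmasked = compact_with_mask(outputs, [1, 1, 1, 1])  # X reaches the output
masked = compact_with_mask(outputs, [1, 0, 1, 1])    # chain 1 blocked
```

With the mask applied, the output is a known value again, so an error on chains 0, 2 or 3 would be observable in this cycle.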
Compression Ratio Calculation
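The compression ratio, defined here as the number of internal chains divided by the number of external channels, is the single most important figure of merit for scan compression. It tells you how much faster the compressed test runs compared to an uncompressed full-scan test with the same pin count and the same fault coverage.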
| External Channels | Internal Chains | Compression Ratio | Approx. Test Time Savings | Area Overhead |
|---|---|---|---|---|
| 32 pins | 512 chains | 16:1 | ~94% | ~0.5% |
| 16 pins | 512 chains | 32:1 | ~97% | ~1.0% |
| 8 pins | 512 chains | 64:1 | ~98.4% | ~1.5% |
| 4 pins | 512 chains | 128:1 | ~99.2% | ~2.5% |
In practice, 32:1 is the most common compression ratio in production designs because it balances test time reduction against X-tolerance limits and decompressor area. Higher ratios are achievable but require more sophisticated X-masking and larger decompressor/compactor networks. In the table, the area overhead grows roughly with the logarithm of the compression ratio rather than in proportion to it, which is consistent with an XOR network that grows as O(M log M) rather than O(M²) in the number of internal chains M.
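The ratio and savings columns of the table can be recomputed directly. The sketch assumes the idealized case where test time scales as 1/ratio, meaning the pattern count does not grow under compression; `compression_stats` is a hypothetical helper name.

```python
def compression_stats(internal_chains, external_channels):
    """Return (compression ratio, idealized fractional test time savings)."""
    ratio = internal_chains / external_channels
    savings = 1 - 1 / ratio  # assumes the pattern count is unchanged by compression
    return ratio, savings


rows = [(channels, *compression_stats(512, channels)) for channels in (32, 16, 8, 4)]
```

For 32 channels this gives a ratio of 16 and savings of 93.75%, which the table rounds to ~94%.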
ATE time is billed at $0.50–$5.00 per second depending on the tester platform. For a chip with 50 million FFs, reducing test time from 10 seconds to 0.3 seconds (32:1 compression) saves 9.7 seconds of tester time, or roughly $5 per unit even at the low end of that range. At 10M units/year, that is tens of millions of dollars annually.
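The cost estimate can be checked with the figures quoted above. The function name `annual_savings` is hypothetical, and the model assumes tester time is the only cost that changes.

```python
def annual_savings(seconds_before, seconds_after, dollars_per_second, units_per_year):
    """Return (savings per unit, savings per year) in dollars."""
    per_unit = (seconds_before - seconds_after) * dollars_per_second
    return per_unit, per_unit * units_per_year


low = annual_savings(10.0, 0.3, 0.50, 10_000_000)   # cheapest tester rate quoted
high = annual_savings(10.0, 0.3, 5.00, 10_000_000)  # most expensive rate quoted
```

At $0.50 per second the saving is about $4.85 per unit, or $48.5M per year; at $5.00 per second both figures are ten times larger.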
Scan Chain Reordering
After all scan FFs are identified and organized into chains, the placement tool may reorder which FF is connected to which within a chain. The default stitching order from synthesis is typically topological (based on netlist hierarchy), which often means FF1 is in one corner of the chip and FF2 is physically across the chip — requiring long routing wires for the SI/SO connections.
Scan chain reordering reorganizes the stitching so that physically adjacent flip-flops are adjacent in the scan chain. The same set of FFs forms the same chains, but the order within each chain changes. This has two significant benefits:
- Routing reduction — SI/SO connections become short local wires instead of long cross-chip routes. This reduces routing congestion and can improve timing closure.
- Scan power reduction: during shift mode, any FF can toggle on every clock cycle, and each toggle also switches the combinational logic that FF drives. Shorter SI/SO wires lower the capacitance switched on every shift cycle. Where reordering also places functionally related FFs, which tend to hold correlated values, next to each other, fewer transitions travel down the chain and switching activity during scan shift drops further.
Scan power during shift can reach 3–5× the chip's normal power consumption — a genuine reliability and test fixture concern. Techniques beyond reordering include scan segmentation (inserting enable gates to clock only active chain segments) and low-power scan cell types that reduce toggle rates.
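The routing benefit of reordering can be illustrated with a greedy nearest-neighbour pass over placed FF coordinates. This is a sketch under stated assumptions: the coordinates in `POSITIONS` are invented, wire length is approximated as Manhattan distance, and production tools use their own ordering algorithms rather than this heuristic.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def wire_length(order, positions):
    """Total estimated SI/SO wire length for a given stitching order."""
    return sum(manhattan(positions[a], positions[b]) for a, b in zip(order, order[1:]))


def nearest_neighbor_order(start, positions):
    """Greedy reorder: always stitch to the closest FF not yet in the chain."""
    order = [start]
    remaining = set(positions) - {start}
    while remaining:
        here = positions[order[-1]]
        nxt = min(remaining, key=lambda ff: (manhattan(here, positions[ff]), ff))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical placement in arbitrary layout units.
POSITIONS = {"ff_a": (0, 0), "ff_b": (90, 80), "ff_c": (5, 0), "ff_d": (85, 80), "ff_e": (10, 5)}
netlist_order = ["ff_a", "ff_b", "ff_c", "ff_d", "ff_e"]
reordered = nearest_neighbor_order("ff_a", POSITIONS)
```

In this toy placement the netlist order zigzags across the die for a total length of 645 units, while the reordered chain needs 170.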
Scan Insertion Flow — Step by Step
A production scan insertion flow in Tessent (Siemens) involves these stages. The tool reads the synthesized netlist and produces a scan-inserted netlist ready for place-and-route and subsequent ATPG.
```tcl
# ── Step 1: Set up Tessent context ──────────────────────────
set_context dft -design_id top_chip

# ── Step 2: Read synthesized netlist and libraries ──────────
read_verilog top_chip_synth.v
read_cell_library {tsmc7nm_tt.lib tsmc7nm_scan.lib}
set_current_design top_chip

# ── Step 3: Define DFT constraints ──────────────────────────
add_clock 0 clk -period 2.0
add_input_constraint rst_n 1   ;# hold active-low reset deasserted during test
add_scan_enable scan_en        ;# SE port name

# ── Step 4: Define EDT compression parameters ───────────────
set_edt_options \
    -num_channels 16 \
    -num_scan_chains 512 \
    -x_tolerance 0.15          ;# allow 15% X bits in compactor

# ── Step 5: Insert scan cells and EDT logic ─────────────────
insert_test_logic

# ── Step 6: Verify scan connectivity ────────────────────────
verify_test_logic
report_scan_chains -summary

# ── Step 7: Write out the scan-inserted netlist ─────────────
write_design -verilog top_chip_scan.v
write_design -sdc top_chip_scan.sdc
```
The verify_test_logic step is critical: it simulates the scan chain by shifting a known pattern and checking connectivity, flagging any broken chains before the design goes to P&R. Fixing broken chains post-layout is far more expensive than fixing them at this netlist stage.
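The idea behind a chain continuity check can be sketched as a flush test: shift a known pattern in and confirm it appears at scan-out exactly one chain length later. This is an illustrative model, not the tool's algorithm. The `si_source` dictionary format, the `"SCAN_IN"` marker, and the 0011 flush pattern are assumptions for the example, and all listed FFs are assumed to belong to one chain.

```python
X = "X"  # unknown value held by an FF before anything is shifted in


def flush(si_source, scan_out_ff, pattern):
    """Shift `pattern` through the chain and return the bits seen at scan-out.

    si_source maps each FF to the FF driving its SI pin, "SCAN_IN" for the
    chain head, or None for an unconnected SI pin.
    """
    state = {ff: X for ff in si_source}
    stream = list(pattern) + [0] * len(si_source)  # padding lets the pattern drain
    observed = []
    for bit in stream:
        nxt = {}
        for ff, src in si_source.items():
            if src == "SCAN_IN":
                nxt[ff] = bit
            elif src is None:
                nxt[ff] = X             # broken link: nothing known arrives
            else:
                nxt[ff] = state[src]    # normal hop from the previous FF
        state = nxt
        observed.append(state[scan_out_ff])
    return observed


def chain_is_intact(si_source, scan_out_ff, pattern=(0, 0, 1, 1)):
    """True if the pattern reaches scan-out after exactly len(chain) cycles."""
    n = len(si_source)
    observed = flush(si_source, scan_out_ff, pattern)
    return observed[n - 1 : n - 1 + len(pattern)] == list(pattern)


good_chain = {"ff0": "SCAN_IN", "ff1": "ff0", "ff2": "ff1"}
broken_chain = {"ff0": "SCAN_IN", "ff1": None, "ff2": "ff1"}
```

The intact three-FF chain returns the pattern after three cycles, while the chain with an unconnected SI pin only ever shows unknowns at scan-out.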
| Flow Step | Tool Action | Output |
|---|---|---|
| 1. Set DFT constraints | Define clocks, resets, SE port, black boxes | Constraint database |
| 2. Cell replacement | Replace std FFs with scan FFs from library | Augmented netlist |
| 3. Chain stitching | Connect SI/SO ports into chains | Scan chains defined |
| 4. EDT insertion | Add decompressor + compactor logic blocks | EDT-compressed netlist |
| 5. Scan verify | Simulate scan shift, check chain continuity | Chain integrity report |
| 6. Write output | Export scan-inserted Verilog + SDC | Scan netlist for P&R |