What is Scan Design?
In Day 1 we established the core DFT problem: internal flip-flops have poor controllability and observability, making it hard to test the logic between them. Scan design is the solution — the single most important DFT technique used in virtually every digital chip manufactured today.
The idea is elegant: replace every regular flip-flop with a scan flip-flop that has an extra input multiplexer. When in test mode, all scan FFs connect together to form a long shift register — a scan chain. The tester can now shift in any bit pattern (setting every FF to a known state) and shift out the captured result (reading every FF's value). Every register in the chip is now directly controllable and observable.
Key idea: in test mode, connect all flip-flops into a shift register, so that any value can be loaded into any FF (controllability) and any FF's value can be read out (observability), one bit at a time through the serial chain.
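To make the load/unload idea concrete, here is a minimal behavioural sketch in Python. It is a software model rather than synthesizable hardware, and the `ScanChain` class and `load` helper are hypothetical names introduced only for illustration.

```python
class ScanChain:
    """Behavioural model of N mux-scan flip-flops wired in series (illustrative only)."""

    def __init__(self, length):
        self.q = [0] * length  # q[0] is the FF nearest scan_in, q[-1] drives scan_out

    def clock(self, se, scan_in=0, d=None):
        """Apply one clock edge and return the bit that was on scan_out before it."""
        scan_out = self.q[-1]
        if se:
            # Shift mode: every FF takes the value of its upstream neighbour.
            self.q = [scan_in] + self.q[:-1]
        else:
            # Capture mode: every FF takes its functional D input.
            self.q = list(d)
        return scan_out


def load(chain, bits):
    """Shift `bits` in one per clock; return the bits that left through scan_out."""
    return [chain.clock(se=1, scan_in=b) for b in bits]


chain = ScanChain(4)
load(chain, [1, 0, 1, 1])             # controllability: set any state
state_after_load = list(chain.q)      # first bit sent sits nearest scan_out
unloaded = load(chain, [0, 0, 0, 0])  # observability: read the state back serially
print(state_after_load, unloaded)
```

The first bit shifted in travels furthest, so it ends up in the last flip-flop and is also the first bit to come back out.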
Scan Flip-Flop Types
Three scan-cell architectures are used in industry: mux-scan (muxed-D), clocked-scan, and level-sensitive scan design (LSSD). The mux-scan FF dominates modern ASIC and SoC design.
Mux-Scan FF in Verilog
    // Mux-scan flip-flop: the standard scan cell in ASIC/SoC design
    // Ports: D (functional), SI (scan_in), SE (scan_enable), CLK, Q, SO (= Q)
    module mux_scan_ff #(parameter W = 1) (
        input              clk,
        input              se,   // scan enable: 1 = shift mode
        input      [W-1:0] d,    // functional data
        input      [W-1:0] si,   // scan input (from previous FF in chain)
        output reg [W-1:0] q,    // data output
        output     [W-1:0] so    // scan output (feeds next FF's SI)
    );
        assign so = q;           // scan-out is always Q

        always @(posedge clk)
            q <= se ? si         // SE=1: shift mode, capture from scan chain
                    : d;         // SE=0: capture mode, capture functional data
    endmodule
    // Scan insertion tools (e.g. Tessent Scan, Synopsys DFT Compiler) stitch FFs like this.
    // Chain order: tester scan_in → FF0 → FF1 → FF2 → FF3 → tester scan_out
    module scan_chain_4 (
        input        clk, se,
        input        scan_in,    // tester → first FF in chain
        input  [3:0] d,          // functional inputs
        output       scan_out,   // last FF → tester
        output [3:0] q
    );
        wire [3:0] so;           // inter-FF connections

        // FF0: scan_in feeds SI; each FF's SO feeds the next FF's SI
        mux_scan_ff ff0 (.clk(clk), .se(se), .d(d[0]), .si(scan_in), .q(q[0]), .so(so[0]));
        mux_scan_ff ff1 (.clk(clk), .se(se), .d(d[1]), .si(so[0]),   .q(q[1]), .so(so[1]));
        mux_scan_ff ff2 (.clk(clk), .se(se), .d(d[2]), .si(so[1]),   .q(q[2]), .so(so[2]));
        mux_scan_ff ff3 (.clk(clk), .se(se), .d(d[3]), .si(so[2]),   .q(q[3]), .so(so[3]));

        assign scan_out = so[3]; // last FF → tester
    endmodule
Shift Mode vs Capture Mode
The complete test cycle for one ATPG pattern has three phases: Shift → Capture → Shift. Understanding this sequence precisely is critical for DFT interviews and for understanding hold-time issues in scan designs.
| Phase | SE | Clocks | What Happens | Purpose |
|---|---|---|---|---|
| 1. Shift IN | 1 | N cycles | Test pattern bits shift in from scan_in, one FF per clock, until all N FFs hold the pattern | Load test stimulus into all FFs |
| 2. Capture | 0 | 1 cycle | SE drops to 0; one functional clock applied; combinational logic evaluates; results captured into FFs | Exercise logic and capture response |
| 3. Shift OUT | 1 | N cycles | SE returns to 1; captured response shifts out through scan_out to tester | Read response; compare to expected |
Note: In practice, phases 1 and 3 overlap: while the current pattern's response shifts out through scan_out, the next pattern's bits shift in through scan_in. This is called scan interleaving, and it roughly halves the shift cycles per pattern (from about 2N to about N).
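The saving from overlapping shift-out with the next shift-in can be checked with a little arithmetic. The sketch below assumes a single chain of N flip-flops and exactly one capture cycle per pattern; the function names and the example numbers are illustrative, not taken from any tool.

```python
def cycles_without_overlap(n, patterns):
    # Each pattern pays N shift-in + 1 capture + N shift-out cycles.
    return patterns * (2 * n + 1)


def cycles_with_overlap(n, patterns):
    # The shift-out of pattern k shares its N cycles with the shift-in of
    # pattern k+1, so only the final unload is paid for separately.
    return patterns * (n + 1) + n


n, patterns = 1000, 500  # assumed chain length and pattern count
serial = cycles_without_overlap(n, patterns)
overlapped = cycles_with_overlap(n, patterns)
print(serial, overlapped, round(overlapped / serial, 3))
```

For long chains and many patterns the ratio approaches one half, which is where the "halves the overhead" rule of thumb comes from.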
Scan Chain Architecture
A real design can have millions of flip-flops. They are split into multiple scan chains that shift in parallel, which shortens each chain (and therefore the test time) and eases routing. Each chain has its own scan_in and scan_out port connected to the tester.
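As a rough illustration of chain partitioning, the sketch below deals flip-flops round-robin into a fixed number of chains so that the lengths stay balanced. This is a simplification: real scan insertion tools also take clock domains and physical placement into account, and `split_into_chains` is a hypothetical helper, not a tool command.

```python
def split_into_chains(flops, num_chains):
    """Deal flip-flops round-robin so chain lengths differ by at most one."""
    chains = [[] for _ in range(num_chains)]
    for i, ff in enumerate(flops):
        chains[i % num_chains].append(ff)
    return chains


flops = [f"ff{i}" for i in range(10)]  # made-up flip-flop names
chains = split_into_chains(flops, 3)
lengths = [len(c) for c in chains]
print(lengths)
```

Balanced lengths matter because all chains shift on the same clock, so the longest chain sets the number of shift cycles for every pattern.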
Scan Chain Length Trade-offs
The number of scan chains and their length directly determine test time and routing complexity. To first order, test time scales with the number of patterns multiplied by the length of the longest chain.
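A commonly used first-order estimate is: shift cycles ≈ (P + 1) × L + P, where P is the pattern count and L is the length of the longest chain. It assumes overlapped shift-in/shift-out, one capture cycle per pattern, and perfectly balanced chains. The sketch below applies that estimate; the function name and all numbers are assumptions chosen for illustration.

```python
import math


def scan_test_time(num_ffs, num_chains, patterns, shift_hz):
    """Estimate (chain length, total cycles, seconds) under the assumptions above."""
    chain_len = math.ceil(num_ffs / num_chains)       # longest chain, if balanced
    cycles = (patterns + 1) * chain_len + patterns    # P+1 shifts of L, plus P captures
    return chain_len, cycles, cycles / shift_hz


# Assumed design: 1M flip-flops, 2000 patterns, 50 MHz shift clock.
for num_chains in (10, 100):
    length, cycles, seconds = scan_test_time(1_000_000, num_chains, 2000, 50e6)
    print(f"{num_chains:4d} chains: L={length}, cycles={cycles}, time={seconds:.3f} s")
```

Ten times as many chains gives roughly one tenth of the test time, at the cost of ten times as many scan pins.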
| Parameter | More Chains | Fewer Chains |
|---|---|---|
| Test time | ✅ Shorter (fewer shift cycles per pattern) | ❌ Longer |
| Tester pin count | ❌ More SI/SO pins needed | ✅ Fewer pins |
| Routing overhead | ❌ More scan routing wires | ✅ Less routing |
| Power during test | ❌ More simultaneous switching | ✅ Lower peak power |
| Fault diagnosis | ✅ Easier to isolate (shorter chains) | ❌ Harder to locate fault |
In practice, scan compression (Day 3) solves the test-time problem without requiring hundreds of tester-facing chains: a compressed design with 64 external scan channels can drive 2,000+ short internal chains.
Full Scan vs Partial Scan
| Aspect | Full Scan | Partial Scan |
|---|---|---|
| Definition | Every FF replaced by scan FF | Only selected FFs are scan FFs |
| Fault Coverage | >99% stuck-at achievable | Typically 80–90% (limited by non-scan FFs) |
| ATPG complexity | Simple — all state is controllable | Complex — non-scan FFs require sequential ATPG |
| Area overhead | 5–15% (mux + scan routing) | Lower area overhead |
| Test time | Determined by chain length | Shorter shift but more patterns needed |
| Industry usage | Standard — >95% of designs | Rare; used only when area is critical |
Key Scan Design Rules (DFT Rule Checks)
DFT tools run a set of design rule checks (DRC) before scan insertion. Common violations that must be fixed in RTL or synthesis:
| DFT Rule | Violation | Fix |
|---|---|---|
| No combinational loops | Combinational feedback path — ATPG can't determine stable state | Break loop with a register |
| No gated clocks on FF | Clock gating blocks scan shift — FF can't shift if CLK is gated off | Bypass clock gate during test (TE signal) |
| No async resets from internal logic | Internal signal resets FF during test — corrupts scan data | Tie async reset inactive during test mode |
| No multi-driven nets | Multiple drivers — X-state propagates, ATPG can't control | Fix bus contention, use tri-state properly |
| SE must be tree-balanced | Long SE routing makes SE arrive at different FFs at different times, so some FFs may still be in shift mode at the capture edge | Buffer SE through a balanced tree (like CTS) |
| No latches (or convert to FFs) | Latches are transparent — ATPG treats them as ambiguous sequential elements | Convert latches to FFs or isolate with scan FFs around them |
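To give a feel for how a rule such as "no combinational loops" can be checked, here is a minimal sketch that runs a depth-first search over a netlist graph and treats flip-flops as points where a path is cut. It is not how any particular DFT tool implements its DRC; the `find_comb_loop` function, the dictionary netlist format, and the cell names are all hypothetical.

```python
def find_comb_loop(fanout, sequential):
    """Return one combinational cycle as a list of cell names, or None.

    fanout:     dict mapping each cell to the list of cells it drives.
    sequential: set of cells (flip-flops) that break combinational paths.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, fully explored
    color = {}

    def visit(cell, path):
        color[cell] = GREY
        path.append(cell)
        for nxt in fanout.get(cell, ()):
            if nxt in sequential:
                continue                      # a register cuts the path
            state = color.get(nxt, WHITE)
            if state == GREY:
                return path[path.index(nxt):]  # back edge: loop found
            if state == WHITE:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        color[cell] = BLACK
        return None

    for cell in fanout:
        if cell not in sequential and color.get(cell, WHITE) == WHITE:
            loop = visit(cell, [])
            if loop:
                return loop
    return None


# g3 feeds back to g1 directly (a violation) and also through ff0 (allowed).
bad = {"g1": ["g2"], "g2": ["g3"], "g3": ["g1", "ff0"], "ff0": ["g1"]}
ok = {"g1": ["g2"], "g2": ["ff0"], "ff0": ["g1"]}
print(find_comb_loop(bad, {"ff0"}), find_comb_loop(ok, {"ff0"}))
```

The recursive search is fine for a toy netlist; a production checker on millions of cells would use an iterative traversal instead.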
Scan Chain Testbench in Verilog
    `timescale 1ns/1ps

    module tb_scan;
        reg        clk, se, scan_in;
        reg  [3:0] d;
        wire       scan_out;
        wire [3:0] q;
        reg  [3:0] response;   // bits observed on scan_out during a shift

        scan_chain_4 dut (.clk(clk), .se(se), .scan_in(scan_in),
                          .d(d), .scan_out(scan_out), .q(q));

        always #5 clk = ~clk;  // 10 ns period = 100 MHz

        // Shift 4 bits in (MSB first) while sampling the 4 bits leaving scan_out
        task scan_shift(input [3:0] pattern, output [3:0] observed);
            integer i;
            begin
                se = 1;                         // enter shift mode
                for (i = 3; i >= 0; i = i - 1) begin
                    scan_in     = pattern[i];   // MSB first
                    observed[i] = scan_out;     // sample before the edge; q[3] exits first
                    @(posedge clk); #1;
                end
            end
        endtask

        task capture;
            begin
                se = 0;                 // enter capture mode
                @(posedge clk); #1;     // one functional clock
            end
        endtask

        initial begin
            clk = 0; se = 0; scan_in = 0; d = 4'hA;

            // Step 1: shift in test pattern 4'b1010 (bits shifted out here are unknown)
            scan_shift(4'b1010, response);
            $display("After shift-in: q = %b (expect 1010)", q);

            // Step 2: capture, one functional clock with se=0 loads d into q
            capture;
            $display("After capture:  q = %b (expect 1010, the value of d)", q);

            // Step 3: shift out the captured response while shifting in zeros
            scan_shift(4'b0000, response);
            $display("Shifted-out response = %b (expect 1010)", response);
            $finish;
        end
    endmodule