DFT Course · Day 02 of 12

Scan Design Fundamentals
Scan FF · Scan Chain · Shift & Capture

By EcrioniX · Updated June 2026 · ~45 min read
Topics: Mux-Scan FF · Scan Enable (SE) · Shift Mode · Capture Mode · Scan Chain · Scan-In / Scan-Out · Full Scan · Chain Length

What is Scan Design?

In Day 1 we established the core DFT problem: internal flip-flops have poor controllability and observability, making it hard to test the logic between them. Scan design is the solution — the single most important DFT technique used in virtually every digital chip manufactured today.

The idea is elegant: replace every regular flip-flop with a scan flip-flop that has an extra input multiplexer. When in test mode, all scan FFs connect together to form a long shift register — a scan chain. The tester can now shift in any bit pattern (setting every FF to a known state) and shift out the captured result (reading every FF's value). Every register in the chip is now directly controllable and observable.

The Scan Idea in One Sentence

Connect all flip-flops into a shift register during test mode — so you can load any value into any FF (controllability) and read out any FF's value (observability) one bit at a time through a serial chain.

Scan Flip-Flop Types

Three architectures are used in industry. The mux-scan FF dominates modern ASIC and SoC design.

Type 1: Mux-Scan FF
Standard D-FF with a 2:1 mux at the D input. SE=1 selects scan_in; SE=0 selects functional D. This is the most common type and the default scan style in Synopsys, Cadence, and Tessent scan insertion flows.
✓ Simple, minimal overhead, 1 extra mux
✗ Mux delay sits in the functional data path; SE must be routed to every FF

Type 2: Clocked-Scan FF
Uses two separate clocks: one for functional capture, one for scan shift. No mux needed; the scan path is selected by which clock is active. Avoids mux delay but requires two clock nets.
✓ No mux in timing path
✗ Dual-clock routing, harder CTS

Type 3: LSSD (IBM)
Level-Sensitive Scan Design. Uses two latches (master and slave) driven by separate non-overlapping clocks: a system clock (C) for functional data and dedicated shift clocks (A, B) for scan. Race-free by construction; used in IBM mainframe/server chips.
✓ Race-free, glitch-immune
✗ 2× latch area, complex clocking
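[Figure: mux-scan flip-flop internal structure. A 2:1 mux controlled by SE selects between D (functional, input 0) and SI (scan_in, input 1) and drives D_in of a positive-edge D-FF clocked by CLK. Q also serves as SO, feeding the next FF's SI.]

SE truth table:
SE = 0 (functional): D_in ← D
SE = 1 (scan/shift): D_in ← SI (scan_in)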

Mux-Scan FF in Verilog

Verilog — Mux-Scan Flip-Flop (industry standard)
// Mux-Scan Flip-Flop — the standard scan cell in ASIC/SoC design
// Ports: D (functional), SI (scan_in), SE (scan_enable), CLK, Q, SO (= Q)
module mux_scan_ff #(parameter W = 1) (
  input              clk,
  input              se,        // scan enable: 1 = shift mode
  input  [W-1:0]    d,         // functional data
  input  [W-1:0]    si,        // scan input (from prev FF in chain)
  output reg [W-1:0] q,         // data output
  output [W-1:0]    so         // scan output (feeds next FF's SI)
);
  assign so = q;   // scan-out is always Q

  always @(posedge clk)
    q <= se ? si   // SE=1: SHIFT MODE — capture from scan chain
             : d;   // SE=0: CAPTURE MODE — capture functional data
endmodule
Verilog — Scan chain of 4 FFs (how tool connects them)
// Scan insertion tools (e.g. Tessent Scan, Synopsys DFT Compiler) stitch FFs like this
// Chain order: tester scan_in → FF0.si, FF0.so → FF1.si, FF1.so → FF2.si, FF2.so → FF3.si, FF3.so → tester scan_out
module scan_chain_4 (
  input       clk, se,
  input       scan_in,       // tester → first FF in chain
  input [3:0] d,             // functional inputs
  output      scan_out,      // last FF → tester
  output[3:0] q
);
  wire [3:0] so;             // inter-FF connections

  // FF0: scan_in feeds SI; FF0.Q feeds FF1.SI
  mux_scan_ff ff0 (.clk(clk), .se(se), .d(d[0]), .si(scan_in), .q(q[0]), .so(so[0]));
  mux_scan_ff ff1 (.clk(clk), .se(se), .d(d[1]), .si(so[0]),   .q(q[1]), .so(so[1]));
  mux_scan_ff ff2 (.clk(clk), .se(se), .d(d[2]), .si(so[1]),   .q(q[2]), .so(so[2]));
  mux_scan_ff ff3 (.clk(clk), .se(se), .d(d[3]), .si(so[2]),   .q(q[3]), .so(so[3]));

  assign scan_out = so[3];   // last FF → tester
endmodule
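
As a rough mental model of the Verilog above, the following Python sketch mimics the same chain behaviorally. `ScanChain` and `shift_in` are illustrative names, not part of any tool or of the course code, and the model ignores timing entirely (every FF updates on the same ideal clock edge).

```python
# Behavioral model of a mux-scan chain (illustrative sketch, no timing).
class ScanChain:
    def __init__(self, length):
        self.q = [0] * length          # q[0] is the FF nearest scan_in

    @property
    def scan_out(self):
        return self.q[-1]              # the last FF drives scan_out

    def clock(self, se, scan_in=0, d=None):
        """Apply one rising clock edge to every FF at once."""
        if se:                         # shift mode: each FF takes its neighbor's Q
            self.q = [scan_in] + self.q[:-1]
        else:                          # capture mode: each FF takes functional D
            self.q = list(d)


def shift_in(chain, bits):
    """Shift bits in; bits[0] enters first and ends up in the last FF."""
    for b in bits:
        chain.clock(se=1, scan_in=b)


chain = ScanChain(4)
shift_in(chain, [1, 0, 1, 1])          # the first bit in travels farthest
print(chain.q)                         # [1, 1, 0, 1]
```

The printed state is the input sequence reversed, which is why a tester has to order its pattern bits so that the bit meant for the last FF is shifted first.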

Shift Mode vs Capture Mode

The complete test cycle for one ATPG pattern has three phases: Shift → Capture → Shift. Understanding this sequence precisely is critical for DFT interviews and for understanding hold-time issues in scan designs.

Phase | SE | Clocks | What Happens | Purpose
1. Shift IN | 1 | N cycles | Test pattern bits shift from scan_in through all N FFs to scan_out | Load test stimulus into all FFs
2. Capture | 0 | 1 cycle | SE drops to 0; one functional clock applied; combinational logic evaluates; results captured into FFs | Exercise logic and capture response
3. Shift OUT | 1 | N cycles | SE returns to 1; captured response shifts out through scan_out to tester | Read response; compare to expected

Note: Phases 1 and 3 overlap across consecutive patterns. While the current pattern's response shifts out, the next pattern's bits shift in on the same clock cycles. This is called scan interleaving, and it roughly halves the shift overhead per pattern.
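
The three phases and the overlap described in the note can be sketched in Python as follows. This is a minimal model under stated assumptions: `scan_test` and `toy_logic` are hypothetical names, `toy_logic` is an arbitrary stand-in for the combinational logic between FFs, and a trailing all-zero pattern is used only to flush out the last response.

```python
def scan_test(patterns, logic, n):
    """Run shift -> capture -> shift with overlapped unload/load.

    patterns: list of n-bit lists indexed by FF position (q[0] nearest scan_in)
    logic:    maps the loaded FF state to the values on the FFs' D inputs
    Returns (responses, total_shift_cycles).
    """
    q = [0] * n
    responses = []
    shift_cycles = 0
    for idx, pattern in enumerate(patterns + [[0] * n]):
        unloaded = []
        for bit in reversed(pattern):        # bit for the last FF goes in first
            unloaded.append(q[-1])           # tester samples scan_out before the edge
            q = [bit] + q[:-1]               # SE=1: one shift clock
            shift_cycles += 1
        if idx > 0:                          # first unload holds no response yet
            responses.append(unloaded[::-1]) # reorder to FF position
        if idx < len(patterns):
            q = logic(q)                     # SE=0: one capture clock
    return responses, shift_cycles


def toy_logic(s):
    # Hypothetical combinational logic feeding the four D inputs.
    return [s[0] ^ s[1], s[1] & s[2], s[2] | s[3], 1 - s[3]]


responses, cycles = scan_test([[1, 0, 1, 0], [1, 1, 0, 0]], toy_logic, 4)
print(responses)   # [[1, 0, 1, 1], [0, 0, 0, 1]]
print(cycles)      # 12
```

With P patterns and chain length N, the overlapped sequence needs (P + 1) × N shift cycles (12 here) instead of 2 × P × N (16 here), which approaches a 2× saving as P grows.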

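[Figure: scan timing diagram for the shift → capture → shift sequence. Signals: CLK, SE, SI, SO. Cycles T1 to T4 are shift-in (SE=1) with SI carrying b0, b1, b2, b3; CAP is the single capture cycle (SE=0); T6 to T8 are shift-out (SE=1) with SO carrying the captured bits r0, r1, r2.]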

Scan Chain Architecture

A real design has millions of flip-flops. They are split into multiple scan chains to reduce test time and improve routing. Each chain runs independently — it has its own scan_in and scan_out port, connected to the tester.

[Figure: multi-chain scan architecture with 3 chains running in parallel. The ATE tester drives SI[0..2] and observes SO[0..2]. Chain 0 has N₀ FFs, chain 1 has N₁ FFs, chain 2 has N₂ FFs. All chains shift in parallel, so test time is divided by the number of chains.]
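
Because all chains shift in parallel, the tester has to keep clocking until the longest chain is full, so chains are normally kept close to equal length. The sketch below shows only that balancing arithmetic; `chain_lengths` is a hypothetical helper, and real scan stitching also has to respect clock domains, power domains, and physical placement.

```python
def chain_lengths(n_ffs, n_chains):
    """Split n_ffs scan cells into n_chains chains whose lengths differ by at most 1."""
    base, extra = divmod(n_ffs, n_chains)
    return [base + 1 if i < extra else base for i in range(n_chains)]


lengths = chain_lengths(1_000_003, 3)
print(lengths)        # [333335, 333334, 333334]
print(max(lengths))   # 333335 shift cycles per pattern, set by the longest chain
```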

Scan Chain Length Trade-offs

The number of scan chains and their length directly determine test time and routing complexity. The key formula:

Test time per pattern = (Chain_length × shift_period) + (1 × capture_period)
                      ≈ Chain_length × (1 / scan_clock_freq)
Total test time       = N_patterns × Chain_length / scan_freq

Example: 1M FFs, 1 chain, 100 MHz scan clock, 10,000 patterns
  → 1,000,000 × 10,000 / 100 MHz = 100 seconds per chip (unacceptable)

Same design, 100 chains (10,000 FFs each):
  → 10,000 × 10,000 / 100 MHz = 1 second per chip (acceptable)

→ Adding more chains reduces test time but increases SI/SO pin count
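
The formula is easy to check numerically. The Python sketch below reproduces both examples; `scan_test_time` is an illustrative helper that assumes overlapped shift-in/shift-out and a capture cycle at the scan clock rate, so its results differ from the rounded figures above only by the one capture cycle per pattern.

```python
def scan_test_time(n_ffs, n_chains, n_patterns, scan_freq_hz):
    """Approximate scan test time in seconds for balanced chains."""
    chain_length = -(-n_ffs // n_chains)     # ceiling division: longest chain
    cycles_per_pattern = chain_length + 1    # shift cycles plus one capture cycle
    return n_patterns * cycles_per_pattern / scan_freq_hz


one_chain = scan_test_time(1_000_000, 1, 10_000, 100e6)
hundred_chains = scan_test_time(1_000_000, 100, 10_000, 100e6)
print(f"{one_chain:.1f} s")        # 100.0 s
print(f"{hundred_chains:.1f} s")   # 1.0 s
```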
Parameter | More Chains | Fewer Chains
Test time | ✅ Shorter (shorter chains shift faster) | ❌ Longer
Tester pin count | ❌ More SI/SO pins needed | ✅ Fewer pins
Routing overhead | ❌ More scan routing wires | ✅ Less routing
Power during test | ❌ More simultaneous switching | ✅ Lower peak power
Fault diagnosis | ✅ Easier to isolate (shorter chains) | ❌ Harder to locate fault

In practice, scan compression (Day 3) solves the test-time problem without requiring hundreds of tester-visible chains: a compressed design with 64 external scan channels can reach the test time of 2,000+ short internal chains.

Full Scan vs Partial Scan

Aspect | Full Scan | Partial Scan
Definition | Every FF replaced by scan FF | Only selected FFs are scan FFs
Fault Coverage | >99% stuck-at achievable | Typically 80–90% (limited by non-scan FFs)
ATPG complexity | Simple: all state is controllable | Complex: non-scan FFs require sequential ATPG
Area overhead | 5–15% (mux + scan routing) | Lower area overhead
Test time | Determined by chain length | Shorter shift but more patterns needed
Industry usage | Standard: >95% of designs | Rare; used only when area is critical
Industry standard is full scan. Scan compression (EDT) solves the test time concern without sacrificing coverage. Partial scan is now rarely justified — the coverage loss and ATPG complexity outweigh the small area saving.

Key Scan Design Rules (DFT Rule Checks)

DFT tools run a set of design rule checks (DRC) before scan insertion. Common violations that must be fixed in RTL or synthesis:

DFT Rule | Violation | Fix
No combinational loops | Combinational feedback path; ATPG can't determine a stable state | Break loop with a register
No gated clocks on FF | Clock gating blocks scan shift; an FF can't shift if its CLK is gated off | Bypass clock gate during test (TE signal)
No async resets from internal logic | Internal signal resets FF during test and corrupts scan data | Tie async reset inactive during test mode
No multi-driven nets | Multiple drivers; X-state propagates and ATPG can't control the net | Fix bus contention, use tri-state properly
SE must be tree-balanced | Long SE routing → SE arrives at different FFs at different times → some FFs are still in the wrong mode at the shift/capture boundary | Buffer SE through balanced tree (like CTS)
No latches (or convert to FFs) | Latches are transparent; ATPG treats them as ambiguous sequential elements | Convert latches to FFs or isolate with scan FFs around them
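
To see why the gated-clock rule matters, the Python sketch below shifts a pattern through a 4-FF chain in which one FF sits behind a closed clock gate. It uses the simplified gate condition CLK AND (enable OR TE); `gated_clock` and `shift` are hypothetical names, and the model leaves out the enable latch that a real ICG cell contains.

```python
def gated_clock(clk, enable, te):
    """Simplified clock-gate condition: CLK AND (enable OR TE)."""
    return clk & (enable | te)


def shift(bits, enables, te):
    """Shift bits through a chain where FF i only updates if its clock gate is open."""
    q = [0] * len(enables)
    for b in bits:
        nxt = [b] + q[:-1]                  # value each FF would take from its neighbor
        q = [nxt[i] if gated_clock(1, enables[i], te) else q[i]
             for i in range(len(q))]
    return q


enables = [1, 0, 1, 1]                       # FF1's functional enable is low
print(shift([1, 0, 1, 1], enables, te=1))    # [1, 1, 0, 1]  gate bypassed, chain intact
print(shift([1, 0, 1, 1], enables, te=0))    # [1, 0, 0, 0]  FF1 never clocks, chain broken
```

With TE=0 the gated FF holds its old value, so everything downstream of it receives the wrong data. This is the failure the DRC check is meant to catch before scan insertion.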

Scan Chain Testbench in Verilog

Verilog — Testbench: shift in pattern, capture, shift out
`timescale 1ns/1ps
module tb_scan;
  reg  clk, se, scan_in;
  reg  [3:0] d;
  wire       scan_out;
  wire [3:0] q;
  reg  [3:0] response;                 // bits sampled on scan_out

  scan_chain_4 dut (.clk(clk), .se(se), .scan_in(scan_in),
                    .d(d), .scan_out(scan_out), .q(q));

  always #5 clk = ~clk;  // 10 ns period = 100 MHz

  // Shift 4 bits in (MSB first) and sample scan_out before each clock edge.
  // Verilog-2001 task bodies need begin/end around multiple statements.
  task scan_shift(input [3:0] pattern);
    integer i;
    begin
      se = 1;                          // enter shift mode
      for (i = 3; i >= 0; i = i - 1) begin
        scan_in     = pattern[i];      // MSB first
        response[i] = scan_out;        // q[3] leaves the chain first
        @(posedge clk); #1;
      end
    end
  endtask

  task capture;
    begin
      se = 0;                          // enter capture mode
      @(posedge clk); #1;              // one functional clock
    end
  endtask

  initial begin
    // d stands in for the combinational logic outputs; it differs from the
    // shifted pattern so that the capture is visible in q
    clk=0; se=0; scan_in=0; d=4'h5;

    // Step 1: Shift in test pattern 4'b1010
    scan_shift(4'b1010);
    $display("After shift-in:  q = %b (expect 1010)", q);

    // Step 2: Capture with one functional clock (se=0)
    capture;
    $display("After capture:   q = %b (expect 0101)", q);

    // Step 3: Shift out the response while shifting in zeros
    scan_shift(4'b0000);
    $display("Shifted-out response = %b (expect 0101)", response);

    $finish;
  end
endmodule

Day 2 — Interview Questions

Q1. What is a scan flip-flop, and how does it differ from a regular D flip-flop?
A scan flip-flop is a D flip-flop with an additional 2:1 multiplexer at the data input, controlled by a scan enable (SE) signal. When SE=0 (functional mode), the mux selects the normal D input — the FF behaves identically to a regular DFF. When SE=1 (shift/test mode), the mux selects the scan input (SI) from the previous FF in the scan chain, forming a shift register. The only overhead vs a regular DFF is the mux (1 gate) and one additional routing net (scan_in). This small overhead gives complete controllability and observability to every register in the design.
Q2. What is the purpose of scan enable (SE), and what happens if SE has timing violations?
SE is the global control signal that switches all scan FFs between functional mode (SE=0) and scan shift mode (SE=1). SE must be routed to every scan FF in the design and must be stable around the clock edge that performs the shift or capture. If SE violates timing at some FFs (arriving too late before the edge is a setup violation; changing too soon after it is a hold violation), those FFs may capture from D instead of SI during shift, or from SI instead of D during capture, corrupting the test pattern or response. SE is typically buffered through a balanced tree (similar to a clock tree) to keep its skew within the timing margins.
Q3. Describe the complete scan test cycle for one ATPG pattern.
The scan test cycle has three phases: (1) Shift-in: SE=1, N clock pulses shift the N-bit test pattern from the tester's scan_in port through all N FFs in the chain. Each clock shifts bits by one position. (2) Capture: SE=0, exactly one functional clock is applied. The combinational logic between FFs evaluates based on the loaded pattern, and the results are captured into the FFs. (3) Shift-out: SE=1 again, N clock pulses shift the captured response out through scan_out to the tester. The tester compares the shifted-out response against the expected (fault-free) response to determine pass/fail. While shifting out, the next pattern is simultaneously shifted in (scan interleaving).
Q4. Why are multiple scan chains used instead of one long chain?
Test time scales linearly with chain length: T = N_FFs / N_chains × N_patterns / scan_freq. A chip with 1M FFs in a single chain at 100 MHz and 10K patterns takes 100 seconds per chip — unacceptable for production. Using 100 chains of 10K FFs each reduces test time to 1 second. Multiple chains also improve: (1) fault isolation (easier to narrow down which chain has the defect), (2) routing (shorter scan connections between FFs), and (3) flexibility (chains can be grouped by power domain, clock domain, or hierarchical block). The trade-off is more SI/SO tester pins — typically managed by I/O muxing or scan compression.
Q5. What DFT issue does clock gating cause, and how is it fixed?
Clock gating (ICG cells) disables the clock to a group of FFs when they are idle, saving power. In scan shift mode, if a scan FF's clock is gated off, the shift register is broken: bits can't propagate through that FF. The fix is to bypass the clock gate during test mode using a Test Enable (TE) signal. Conceptually, the ICG cell implements `CLK_gated = CLK AND (enable OR TE)`, with the OR result latched while CLK is low so the gated clock cannot glitch. When TE=1 (test mode), all clock gates are forced open, ensuring every scan FF receives the scan shift clock. TE is typically driven by the same signal as SE or controlled independently by the DFT controller.
Q6. What is the difference between full scan and partial scan? Which is preferred in industry?
Full scan replaces every flip-flop in the design with a scan FF and connects them all into scan chains — giving complete controllability and observability of all sequential state. Stuck-at fault coverage >99% is achievable. Partial scan only applies scan FFs to a subset of registers, leaving some as regular DFFs. Non-scan FFs have poor controllability/observability, limiting fault coverage to 80–90% and requiring slower sequential ATPG for patterns. Full scan is overwhelmingly preferred in industry today. Scan compression (EDT) handles the test time concern (Day 3), removing the original motivation for partial scan. Partial scan is only considered in extreme area-constrained designs like some analog-mixed-signal blocks.