Testbench Architecture
A complete verification environment for the accelerator consists of four parts: a driver that generates stimulus on the AXI4-Lite control port, a monitor that observes the AXI4 data bus, a reference model (typically software, e.g. written in C) that computes expected results, and a scoreboard that compares actual against expected results.
SystemVerilog — Accelerator testbench with scoreboard
module accel_tb;
  parameter N = 4;

  // Clock (period of 10 time units) and active-high reset
  logic clk = 0;
  always #5 clk = ~clk;
  logic rst = 1;
  initial #20 rst = 0;

  // DUT signals
  logic [31:0] ctrl_addr, ctrl_wdata, ctrl_rdata;
  logic        ctrl_wen = 0, ctrl_ren = 0;
  logic        ctrl_wready, ctrl_rvalid;
  logic        accel_irq;

  // DUT
  systolic_soc_top #(.N(N)) dut (
    .clk(clk), .rst(rst),
    .s_axilite_awaddr (ctrl_addr),
    .s_axilite_wdata  (ctrl_wdata),
    .s_axilite_wvalid (ctrl_wen),
    .s_axilite_wready (ctrl_wready),
    .s_axilite_araddr (ctrl_addr),
    .s_axilite_rdata  (ctrl_rdata),
    .s_axilite_arvalid(ctrl_ren),
    .s_axilite_rvalid (ctrl_rvalid),
    .irq              (accel_irq)
  );

  // Shared memory model (4KB).
  // The connection between mem and the DUT's data port is not shown in this listing.
  logic [7:0] mem[0:4095];

  // Reference model: C = A * B on signed 8-bit operands
  function automatic void ref_matmul(
    input  logic [7:0]  A[N][N], B[N][N],
    output logic [31:0] C[N][N]
  );
    int acc;  // signed accumulator, so the int8 operands are sign-extended
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++) begin
        acc = 0;
        for (int k = 0; k < N; k++)
          acc += signed'(A[i][k]) * signed'(B[k][j]);
        C[i][j] = acc;
      end
  endfunction

  // Single AXI4-Lite register write.
  // Assumes the DUT accepts the write in the cycle where wvalid and wready are both high.
  task automatic axi_write(input logic [31:0] addr, data);
    @(posedge clk);
    ctrl_addr  <= addr;
    ctrl_wdata <= data;
    ctrl_wen   <= 1'b1;
    do @(posedge clk); while (!ctrl_wready);
    ctrl_wen   <= 1'b0;
  endtask

  task automatic run_test(input logic [7:0] A[N][N], B[N][N]);
    logic [31:0] C_ref[N][N], C_dut[N][N];
    int errors = 0;

    // Load A and B into memory model at 0x000 and 0x100
    foreach (A[i,j]) mem[i*N+j]     = A[i][j];
    foreach (B[i,j]) mem[256+i*N+j] = B[i][j];

    // Configure accelerator via AXI4-Lite writes
    axi_write(32'h08, N);             // REG_N
    axi_write(32'h10, 32'h00000000);  // REG_A_ADDR
    axi_write(32'h18, 32'h00000100);  // REG_B_ADDR
    axi_write(32'h20, 32'h00000200);  // REG_C_ADDR
    axi_write(32'h00, 32'h1);         // CTRL_START

    // Wait for IRQ (polling the status register is the alternative)
    @(posedge accel_irq);
    ref_matmul(A, B, C_ref);

    // Compare: C is stored at 0x200 as little-endian 32-bit words
    foreach (C_ref[i,j]) begin
      int base = 512 + (i*N + j)*4;
      C_dut[i][j] = {mem[base+3], mem[base+2], mem[base+1], mem[base]};
      assert (C_dut[i][j] === C_ref[i][j])
      else begin
        errors++;
        $error("MISMATCH [%0d][%0d]: got %0d, exp %0d",
               i, j, C_dut[i][j], C_ref[i][j]);
      end
    end

    // Report PASS only when every element matched
    if (errors == 0) $display("PASS: N=%0d matmul verified", N);
    else             $display("FAIL: N=%0d, %0d mismatches", N, errors);
  endtask

  initial begin
    @(negedge rst);
    repeat (5) @(posedge clk);
    run_test('{'{1,2,0,0},'{3,4,0,0},'{0,0,0,0},'{0,0,0,0}},
             '{'{5,6,0,0},'{7,8,0,0},'{0,0,0,0},'{0,0,0,0}});
    $finish;
  end
endmodule
SVA Checks for the AXI4-Lite Protocol
SystemVerilog Assertions — AXI4-Lite protocol checks
// AXI4-Lite protocol properties
// (intended for a checker module or interface where clk, rst and the
//  AXI signals are visible)

// Once AWVALID is asserted, hold it until AWREADY
property axi_awvalid_stable;
  @(posedge clk) disable iff (rst)
    $rose(awvalid) |-> awvalid throughout (awready [->1]);
endproperty
AWV_STABLE: assert property (axi_awvalid_stable);

// If WVALID is asserted without AWVALID, an address must eventually follow.
// AXI itself permits write data before the address, so this is a liveness
// check rather than an ordering rule. The unbounded ##[0:$] window is weak,
// which makes it mainly useful in formal tools.
property axi_no_data_before_addr;
  @(posedge clk) disable iff (rst)
    wvalid && !awvalid |-> ##[0:$] awvalid;
endproperty
NO_DATA_FIRST: assert property (axi_no_data_before_addr);

// BVALID must de-assert in the cycle after the BVALID/BREADY handshake
// (assumes no back-to-back write responses)
property axi_bvalid_clears;
  @(posedge clk) disable iff (rst)
    bvalid && bready |=> !bvalid;
endproperty
BVAL_CLR: assert property (axi_bvalid_clears);

// RVALID must assert within 256 cycles of ARVALID+ARREADY
property axi_rvalid_deadline;
  @(posedge clk) disable iff (rst)
    (arvalid && arready) |-> ##[1:256] rvalid;
endproperty
RVAL_DL: assert property (axi_rvalid_deadline);

// STATUS_DONE must be set within 10000 cycles of START
property accel_completes;
  @(posedge clk) disable iff (rst)
    $rose(reg_start) |-> ##[1:10000] (STATUS_DONE == 1);
endproperty
ACCEL_DONE: assert property (accel_completes);
Day 12 — Interview Questions
Q1. What is the difference between a directed test and a constrained-random test?
A directed test has a fixed, manually written stimulus targeting a specific scenario (e.g., matrix size N=4, all-ones inputs). It is deterministic and easy to debug but covers only the scenarios the engineer thought to test. A constrained-random test uses a randomisation engine (SystemVerilog's randomize()) to generate stimulus within specified constraints (e.g., N between 1 and 64, weights in range −128 to 127). This explores the input space automatically and finds corner cases the engineer didn't anticipate. The combination is best practice: constrained-random for broad coverage + directed tests for known corner cases (N=1, N=max, zero matrices, max-value overflow). Coverage-driven verification uses functional coverage metrics to guide which random scenarios to generate more of.
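The contrast between the two styles can be sketched with a small transaction class. This is a minimal illustration, not the course's testbench: the class name matmul_txn, its fields, and the distribution weights are assumptions chosen for the example.

```systemverilog
// Hypothetical stimulus class; names, ranges and weights are illustrative.
class matmul_txn;
  rand int unsigned     n;         // matrix dimension
  rand bit signed [7:0] a[], b[];  // flattened n*n operands, full int8 range

  constraint c_n      { n inside {[1:64]}; }
  constraint c_size   { a.size() == n*n; b.size() == n*n; }
  // Bias the solver towards the boundary sizes N=1 and N=64
  constraint c_corner { n dist { 1 := 10, 64 := 10, [2:63] :/ 80 }; }
endclass

module crv_demo;
  matmul_txn t;

  initial begin
    t = new();

    // Constrained-random: the solver picks n and the matrix contents
    repeat (5) begin
      if (!t.randomize()) $fatal(1, "randomize failed");
      $display("random:   n=%0d a[0]=%0d b[0]=%0d", t.n, t.a[0], t.b[0]);
    end

    // Directed corner case reusing the same class via an inline constraint
    if (!t.randomize() with { n == 1; }) $fatal(1, "randomize failed");
    $display("directed: n=%0d", t.n);
    $finish;
  end
endmodule
```

The inline `with` constraint lets one class serve both the random regression and the directed corner cases, so the two do not need separate stimulus code.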
Q2. What is an SVA (SystemVerilog Assertion) and how does it differ from a testbench check?
An SVA is a temporal-logic property embedded in the RTL or testbench that specifies a timing relationship that must hold throughout simulation. Unlike a testbench check (a procedural if or immediate assert statement evaluated at one point in time), a concurrent SVA uses the property and assert property constructs to check a condition across clock cycles: e.g., "if AWVALID is asserted, it must remain asserted until AWREADY." Concurrent assertions are evaluated on every clock edge by the simulator, so violations are caught the moment they occur, not just when the testbench happens to check. SVAs can also be passed to formal verification tools, which prove the property holds for all possible inputs without simulation. This makes SVAs both a simulation monitor and a formal specification language.
Q3. What is functional coverage and how do you use it to know when verification is complete?
Functional coverage measures how much of the design's intended behaviour space has been exercised by the testbench. It is defined using covergroups and coverpoints in SystemVerilog: e.g., a coverpoint for matrix_N with bins [1:4], [5:16], [17:64], and a cross-coverage between matrix_N and whether DMA used interrupt mode vs polling. The simulator tracks which bins have been hit during simulation. Verification is considered complete when: (1) All functional coverage bins reach their target hit count, (2) Code coverage (line, branch, toggle) is above threshold (typically 95%+), (3) All SVA assertions pass, (4) Zero simulation failures in regression. The key insight: code coverage alone is not enough — you can have 100% line coverage but never test N=1 (degenerate case) if the testbench only generates N=4.
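A covergroup along the lines described above might look like the following sketch. The variable names (matrix_n, irq_mode) and the three sampled sizes are assumptions made for the example; a real testbench would sample from its monitor or driver.

```systemverilog
module cov_demo;
  int unsigned matrix_n;
  bit          irq_mode;  // hypothetical flag: 1 = interrupt, 0 = polling

  covergroup accel_cg;
    cp_n: coverpoint matrix_n {
      bins small  = {[1:4]};
      bins medium = {[5:16]};
      bins large  = {[17:64]};
    }
    cp_mode: coverpoint irq_mode {
      bins polling   = {0};
      bins interrupt = {1};
    }
    // Every size range must be seen in both completion modes
    n_x_mode: cross cp_n, cp_mode;
  endgroup

  accel_cg     cg = new();
  int unsigned sizes[3] = '{1, 8, 32};

  initial begin
    // Hit all three size bins, but only ever in polling mode
    foreach (sizes[i]) begin
      matrix_n = sizes[i];
      irq_mode = 0;
      cg.sample();
    end
    $display("coverage = %0.1f%%", cg.get_inst_coverage());
    $finish;
  end
endmodule
```

Because interrupt mode is never sampled, the reported coverage stays below 100% even though every size bin was hit, which is the kind of hole that code coverage alone would not reveal.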
Q4. What is formal verification and when should you use it for an accelerator?
Formal verification uses mathematical proof (model checking, SAT/SMT solvers) to exhaustively check whether a property holds for all possible inputs and state sequences — without simulation. It is particularly valuable for: (1) Protocol compliance — prove the AXI4 handshake is always correct, not just for the tested cases, (2) Safety properties — prove the STATUS register is never simultaneously BUSY and DONE, (3) Liveness — prove the accelerator always eventually asserts DONE after START (no deadlock), (4) Reset correctness — prove all registers reach their reset values. Formal has a state-space explosion problem for large designs but is very effective for small, isolated modules (the DMA FSM, the AXI4-Lite slave register block) where it can achieve 100% proof closure that simulation never could.
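The safety, liveness and reset properties listed above could be written as a small checker module for a formal tool. This is a sketch under assumptions: the module name accel_status_props and the port names start, busy and done are hypothetical and would need mapping to the real status register bits.

```systemverilog
// Hypothetical checker; port names are assumptions, not the DUT's real names.
module accel_status_props (
  input logic clk, rst,
  input logic start, busy, done
);
  // Safety: BUSY and DONE are never set at the same time
  A_MUTEX: assert property (@(posedge clk) disable iff (rst)
    !(busy && done));

  // Liveness: every START is eventually followed by DONE.
  // s_eventually is a strong operator, so an unbounded wait is a failure.
  A_LIVE: assert property (@(posedge clk) disable iff (rst)
    $rose(start) |-> s_eventually done);

  // Reset: status flags are clear in the cycle after reset is sampled high
  A_RESET: assert property (@(posedge clk)
    rst |=> (!busy && !done));
endmodule
```

A bind statement can attach such a checker to the register block without editing its RTL, which keeps the proof scope small enough to avoid state-space explosion.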
Q5. How do you verify the DMA engine's AXI4 burst correctness?
DMA burst verification requires checking: (1) Burst length matches the configured transfer size (ARLEN = N_bytes / bytes_per_beat − 1, where bytes_per_beat is the data-bus width in bytes), (2) Address increments correctly for INCR burst type (ARBURST=01), (3) No single burst crosses a 4 KB address boundary (AXI4 rule), (4) ARVALID remains asserted until ARREADY, and RREADY is asserted to accept every beat. In simulation: use an AXI4 VIP (Verification IP) as the memory model; it has built-in protocol checking and responds to bursts. Add SVA to check ARLEN consistency and the 4 KB boundary rule. For formal: bind a small property set to the DMA module and prove the burst protocol properties. Also verify data integrity: the data written to the output scratchpad must exactly match the DRAM source bytes, using a scoreboard that compares byte-by-byte.
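The read-address checks above can be sketched as a bindable checker. The module name axi_ar_burst_props and the helper crosses_4k are hypothetical; the ports use the standard AXI4 AR-channel signal names, and the sketch assumes the DMA issues INCR bursts only.

```systemverilog
// Hypothetical AR-channel checker for a DMA that issues INCR bursts only.
module axi_ar_burst_props #(parameter int ADDR_W = 32) (
  input logic              clk, rst,
  input logic              arvalid, arready,
  input logic [ADDR_W-1:0] araddr,
  input logic [7:0]        arlen,    // beats - 1
  input logic [2:0]        arsize,   // log2(bytes per beat)
  input logic [1:0]        arburst
);
  // True when the first and last byte of the burst lie in different 4 KB pages
  function automatic logic crosses_4k(input logic [ADDR_W-1:0] addr,
                                      input logic [7:0]        len,
                                      input logic [2:0]        size);
    logic [ADDR_W-1:0] aligned, bytes, last;
    aligned = (addr >> size) << size;  // start address aligned to beat size
    bytes   = (len + 1) << size;       // total bytes = beats * bytes per beat
    last    = aligned + bytes - 1;     // address of the last byte transferred
    return addr[ADDR_W-1:12] != last[ADDR_W-1:12];
  endfunction

  // (2) Only INCR bursts are issued
  A_INCR: assert property (@(posedge clk) disable iff (rst)
    arvalid |-> arburst == 2'b01);

  // (3) No burst crosses a 4 KB boundary
  A_4K: assert property (@(posedge clk) disable iff (rst)
    arvalid |-> !crosses_4k(araddr, arlen, arsize));

  // (4) ARVALID and its payload stay stable until ARREADY
  A_STABLE: assert property (@(posedge clk) disable iff (rst)
    arvalid && !arready |=>
      arvalid && $stable({araddr, arlen, arsize, arburst}));
endmodule
```

As a worked case, a 4-beat burst of 8-byte beats (ARLEN=3, ARSIZE=3) starting at 0x0FF0 ends at byte 0x100F, so it spans two 4 KB pages and A_4K would fire.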
Q6. What is the role of a UVM monitor and how does it connect to the scoreboard?
A UVM monitor is a passive component that observes the signal-level activity on an interface (e.g., AXI4 bus) without driving any signals. It reconstructs higher-level transactions from the bus activity: when it sees AWVALID+AWREADY, it captures the address; when it sees WVALID+WREADY for all beats, it captures the data; it packages these into a transaction object and sends it via a TLM (Transaction Level Modeling) port (analysis port) to the scoreboard. The scoreboard receives transactions from multiple monitors (one on the AXI input side, one on the AXI output side), feeds input transactions to the reference model, and compares reference output transactions to actual output transactions. This decoupled architecture means the scoreboard is bus-protocol-independent — it only sees transactions, not signal toggling.
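The monitor-to-scoreboard path can be sketched for a single AXI4-Lite write channel. Everything named here (axil_if, axil_write_txn, axil_monitor, axil_scoreboard, the "vif" config key) is hypothetical, and the scoreboard only logs transactions instead of calling a reference model.

```systemverilog
import uvm_pkg::*;
`include "uvm_macros.svh"

// Hypothetical minimal AXI4-Lite write-channel interface
interface axil_if (input logic clk);
  logic        awvalid, awready, wvalid, wready;
  logic [31:0] awaddr, wdata;
endinterface

class axil_write_txn extends uvm_sequence_item;
  bit [31:0] addr, data;
  `uvm_object_utils(axil_write_txn)
  function new(string name = "axil_write_txn");
    super.new(name);
  endfunction
endclass

// Passive monitor: samples the bus, never drives it
class axil_monitor extends uvm_monitor;
  `uvm_component_utils(axil_monitor)
  virtual axil_if vif;
  uvm_analysis_port #(axil_write_txn) ap;

  function new(string name, uvm_component parent);
    super.new(name, parent);
    ap = new("ap", this);
  endfunction

  function void build_phase(uvm_phase phase);
    super.build_phase(phase);
    if (!uvm_config_db#(virtual axil_if)::get(this, "", "vif", vif))
      `uvm_fatal("NOVIF", "virtual interface 'vif' not set")
  endfunction

  task run_phase(uvm_phase phase);
    forever begin
      axil_write_txn t = axil_write_txn::type_id::create("t");
      // Address and data handshakes may complete in either order
      fork
        begin
          do @(posedge vif.clk); while (!(vif.awvalid && vif.awready));
          t.addr = vif.awaddr;
        end
        begin
          do @(posedge vif.clk); while (!(vif.wvalid && vif.wready));
          t.data = vif.wdata;
        end
      join
      ap.write(t);  // broadcast to every connected subscriber
    end
  endtask
endclass

// Scoreboard side: sees transactions only, never signals
class axil_scoreboard extends uvm_scoreboard;
  `uvm_component_utils(axil_scoreboard)
  uvm_analysis_imp #(axil_write_txn, axil_scoreboard) imp;

  function new(string name, uvm_component parent);
    super.new(name, parent);
    imp = new("imp", this);
  endfunction

  function void write(axil_write_txn t);
    `uvm_info("SB", $sformatf("write addr=0x%08h data=0x%08h",
                              t.addr, t.data), UVM_LOW)
  endfunction
endclass
```

In an environment's connect_phase the two would be joined with a call such as mon.ap.connect(sb.imp), where mon and sb are the environment's handles to these components.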