🧪 Phase 5 · Module 17 of 18

HBM3 Controller SV Testbench & Verification

A complete SystemVerilog testbench for the HBM3 controller: AXI4 master BFM, scoreboard with associative-array expected-data tracking, functional coverage groups for bank access patterns and refresh timing, 8 SVA protocol assertions, and 10 runnable test scenarios from single write/read to bandwidth benchmarking.


Testbench Architecture Overview

The testbench is structured as a self-checking simulation environment. Each test prints a PASS or FAIL verdict via $display when it finishes, based on the scoreboard mismatch count, while protocol and response errors are reported through $error as they occur and are counted by the simulator throughout the run. The block structure is:

[Block diagram: testbench architecture]
AXI4 Master BFM: drives the AW/W/AR channels and reads the B/R responses.
HBM3 Controller DUT: hbm3_pc_ctrl.v (all 16 modules), connected to the BFM over AXI4.
DRAM Model: hbm3_dram_model.sv (Module 16), receives CA+DQ from the DUT and returns DQ read data.
Scoreboard: compares expected_data[addr] against the actual AXI4 R read response.
Coverage Model: bank/BG/page/refresh covergroups.
SVA Assertions: 8 protocol properties, bound to the DUT.
Test Control: 10 scenarios called from the initial block in tb_hbm3_top.

The testbench top module (tb_hbm3_top) instantiates the DUT and DRAM model, generates the clock, applies reset, and calls test tasks sequentially from a single initial block. All checking is automatic — no manual waveform inspection required for pass/fail determination.

AXI4 Master BFM

The BFM drives AXI4 write and read transactions to the controller's host interface. It implements the full AXI4 handshake: AWVALID/AWREADY, WVALID/WREADY, BVALID/BREADY for writes; ARVALID/ARREADY, RVALID/RREADY for reads. It also logs each transaction to the scoreboard.

systemverilog — AXI4 BFM tasks
// AXI4 interface signals (connected to DUT)
logic [31:0] m_awaddr, m_araddr;
logic [7:0]  m_awlen,  m_arlen;
logic [2:0]  m_awsize, m_arsize;
logic [1:0]  m_awburst,m_arburst;
logic        m_awvalid,m_arvalid;
logic        m_awready,m_arready;
logic [31:0] m_wdata;
logic [3:0]  m_wstrb;
logic        m_wvalid, m_wready, m_wlast;
logic [1:0]  m_bresp;
logic        m_bvalid, m_bready;
logic [31:0] m_rdata;
logic [1:0]  m_rresp;
logic        m_rvalid, m_rready, m_rlast;

// Expected data scoreboard
logic [31:0] sb_exp [logic [31:0]];
int          sb_mismatches = 0;
int          sb_checks     = 0;

// AXI4 Write Transaction
task automatic axi4_write(
  input logic [31:0] addr,
  input logic [31:0] data,
  input logic [3:0]  strb = 4'hF
);
  // Address channel
  @(posedge clk);
  m_awaddr  <= addr;
  m_awlen   <= 8'd0;       // Single beat
  m_awsize  <= 3'd2;       // 4 bytes
  m_awburst <= 2'd1;       // INCR
  m_awvalid <= 1'b1;
  wait(m_awready); @(posedge clk);
  m_awvalid <= 1'b0;
  // Data channel
  m_wdata  <= data;
  m_wstrb  <= strb;
  m_wlast  <= 1'b1;
  m_wvalid <= 1'b1;
  wait(m_wready); @(posedge clk);
  m_wvalid <= 1'b0;
  m_wlast  <= 1'b0;
  // Response channel
  m_bready <= 1'b1;
  wait(m_bvalid);
  if (m_bresp != 2'b00)
    $error("[TB] Write error: BRESP=%0b addr=0x%08h", m_bresp, addr);
  @(posedge clk);
  m_bready <= 1'b0;
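  // Record expected data in scoreboard, honouring the byte strobes
  if (strb == 4'hF)
    sb_exp[addr] = data;
  else if (sb_exp.exists(addr)) begin
    // Partial write: merge only the strobed byte lanes into the tracked word
    for (int b = 0; b < 4; b++)
      if (strb[b]) sb_exp[addr][8*b +: 8] = data[8*b +: 8];
  end
  // A partial write to an untracked address leaves the expected word unknown,
  // so nothing is recorded and later reads of that address are not checked.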
endtask

// AXI4 Burst Write (INCR, len+1 beats)
task automatic axi4_burst_write(
  input logic [31:0] base_addr,
  input int          num_beats,
  input logic [31:0] base_data
);
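  // AXI4 limits an INCR burst to 256 beats (AWLEN is 8 bits wide)
  if (num_beats < 1 || num_beats > 256)
    $error("[TB] Burst write: num_beats=%0d outside 1..256", num_beats);
  // Address channel
  @(posedge clk);
  m_awaddr  <= base_addr;
  m_awlen   <= 8'(num_beats - 1);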
  m_awsize  <= 3'd2;
  m_awburst <= 2'd1;  // INCR
  m_awvalid <= 1'b1;
  wait(m_awready); @(posedge clk);
  m_awvalid <= 1'b0;
  // Data beats
  for (int i = 0; i < num_beats; i++) begin
    m_wdata  <= base_data + i;
    m_wstrb  <= 4'hF;
    m_wlast  <= (i == num_beats-1);
    m_wvalid <= 1'b1;
    wait(m_wready); @(posedge clk);
    sb_exp[base_addr + i*4] = base_data + i;
  end
  m_wvalid <= 1'b0;
  m_wlast  <= 1'b0;
  m_bready <= 1'b1;
  wait(m_bvalid);
  // Sample BRESP while BVALID is still asserted, before the handshake completes
  if (m_bresp != 2'b00)
    $error("[TB] Burst write error: BRESP=%0b", m_bresp);
  @(posedge clk);
  m_bready <= 1'b0;
endtask

// AXI4 Read Transaction
task automatic axi4_read(
  input  logic [31:0] addr,
  output logic [31:0] data
);
  @(posedge clk);
  m_araddr  <= addr;
  m_arlen   <= 8'd0;
  m_arsize  <= 3'd2;
  m_arburst <= 2'd1;
  m_arvalid <= 1'b1;
  wait(m_arready); @(posedge clk);
  m_arvalid <= 1'b0;
  m_rready  <= 1'b1;
  wait(m_rvalid && m_rlast);
  data = m_rdata;
  if (m_rresp != 2'b00)
    $error("[TB] Read error: RRESP=%0b addr=0x%08h", m_rresp, addr);
  @(posedge clk);
  m_rready <= 1'b0;
  // Scoreboard check
  if (sb_exp.exists(addr)) begin
    sb_checks++;
    if (data !== sb_exp[addr]) begin
      sb_mismatches++;
      $error("[TB][SB] MISMATCH addr=0x%08h exp=0x%08h got=0x%08h",
             addr, sb_exp[addr], data);
    end
  end
endtask
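
Test 2 reads back a burst-written region, but only a single-beat read task is shown above. A burst read could be sketched along the same lines. The task name axi4_burst_read is hypothetical and not part of the original BFM; the sketch assumes the clk, m_ar*/m_r* signals and the sb_exp scoreboard declared earlier, and it samples READY/VALID on clock edges rather than with level-sensitive waits.

```systemverilog
// Hypothetical companion to axi4_burst_write (assumed, not from the original BFM).
// Relies on clk, the m_ar*/m_r* signals, and the sb_exp scoreboard declared above.
task automatic axi4_burst_read(
  input logic [31:0] base_addr,
  input int          num_beats          // 1..256 for an AXI4 INCR burst
);
  logic [31:0] beat_addr;
  // Address channel: hold ARVALID until a clock edge where ARREADY is sampled high
  @(posedge clk);
  m_araddr  <= base_addr;
  m_arlen   <= 8'(num_beats - 1);
  m_arsize  <= 3'd2;                    // 4 bytes per beat
  m_arburst <= 2'd1;                    // INCR
  m_arvalid <= 1'b1;
  do @(posedge clk); while (!m_arready);
  m_arvalid <= 1'b0;
  // Data channel: accept one beat per clock edge where RVALID is sampled high
  m_rready  <= 1'b1;
  for (int i = 0; i < num_beats; i++) begin
    do @(posedge clk); while (!m_rvalid);
    beat_addr = base_addr + i*4;
    if (m_rresp != 2'b00)
      $error("[TB] Burst read error: RRESP=%0b addr=0x%08h", m_rresp, beat_addr);
    // RLAST must be high on the final beat and low on all earlier beats
    if (m_rlast !== (i == num_beats-1))
      $error("[TB] RLAST wrong on beat %0d of %0d", i, num_beats);
    // Scoreboard check, same rule as axi4_read
    if (sb_exp.exists(beat_addr)) begin
      sb_checks++;
      if (m_rdata !== sb_exp[beat_addr]) begin
        sb_mismatches++;
        $error("[TB][SB] MISMATCH addr=0x%08h exp=0x%08h got=0x%08h",
               beat_addr, sb_exp[beat_addr], m_rdata);
      end
    end
  end
  m_rready <= 1'b0;
endtask
```

Checking RLAST per beat is a deliberate choice here: a controller that returns the right data but misplaces RLAST would otherwise pass the scoreboard unnoticed.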

Scoreboard Design

The scoreboard is embedded directly into the BFM tasks via a SystemVerilog associative array, sb_exp, indexed by a 32-bit address. Every axi4_write and axi4_burst_write call stores the written data keyed by address. Every axi4_read call looks up the expected value and compares it with the received data; addresses that were never written are not checked.

systemverilog — scoreboard summary at end of simulation
// Called at end of each test
task automatic scoreboard_report(input string test_name);
  $display("=== SCOREBOARD REPORT: %s ===", test_name);
  $display("  Checks    : %0d", sb_checks);
  $display("  Mismatches: %0d", sb_mismatches);
  if (sb_mismatches == 0)
    $display("  RESULT    : *** PASS ***");
  else
    $display("  RESULT    : *** FAIL *** (%0d errors)", sb_mismatches);
  sb_checks     = 0;
  sb_mismatches = 0;
  sb_exp.delete();  // Clear for next test
endtask
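
Because scoreboard_report clears its counters after every test, nothing above keeps a run-wide verdict. One way to add it is a thin wrapper that tallies failed tests before delegating to scoreboard_report. The names tests_run, tests_failed and report_and_tally are assumptions for illustration, and the tally covers scoreboard mismatches only, not SVA or $error failures.

```systemverilog
// Hypothetical run-wide tally; place alongside scoreboard_report in tb_hbm3_top.
int tests_run    = 0;
int tests_failed = 0;

task automatic report_and_tally(input string test_name);
  tests_run++;
  // Read sb_mismatches before scoreboard_report resets it to zero
  if (sb_mismatches != 0) tests_failed++;
  scoreboard_report(test_name);
endtask

// A final block runs once when the simulation ends (for example via $finish)
final begin
  $display("=== OVERALL: %0d of %0d tests passed ===",
           tests_run - tests_failed, tests_run);
  if (tests_failed == 0)
    $display("  RESULT    : *** ALL PASS ***");
  else
    $display("  RESULT    : *** FAIL *** (%0d tests)", tests_failed);
end
```

Each test would then call report_and_tally("Test1_SingleWrRd") in place of scoreboard_report.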

Functional Coverage Groups

Functional coverage tracks what has been exercised, not whether data was correct. The coverage model is defined in a separate module bound to the DUT, so it can observe internal signals without modifying the DUT RTL.

systemverilog — coverage groups
// Bind-module: hbm3_coverage.sv
// bind hbm3_pc_ctrl hbm3_coverage u_cov (.*);

module hbm3_coverage (
  input logic        i_clk,
  input logic [4:0]  bank_accessed,    // from DUT internal
  input logic [1:0]  page_policy_hit,  // 00=miss, 01=hit, 10=conflict
  input logic        refresh_active,
  input logic        wr_burst_active,
  input logic        power_down_entry,
  input logic        ecc_sb_corrected,
  input logic [1:0]  burst_len_code    // 00=BL4, 01=BL8, 10=BL16, 11=BL32
);

  // 1. Bank distribution coverage
  covergroup cg_bank_access @(posedge i_clk);
    cp_bank: coverpoint bank_accessed {
      bins bank_low[]  = {[0:7]};
      bins bank_mid[]  = {[8:23]};
      bins bank_high[] = {[24:31]};
    }
  endgroup

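  // 2. Page policy coverage
  covergroup cg_page_policy @(posedge i_clk);
    // A cross may only reference coverpoints declared in the same covergroup,
    // so the bank range is sampled again here with one bin per range (3 x 3 cross).
    cp_bank_range: coverpoint bank_accessed {
      bins low  = {[0:7]};
      bins mid  = {[8:23]};
      bins high = {[24:31]};
    }
    cp_policy: coverpoint page_policy_hit {
      bins page_miss     = {2'b00};
      bins page_hit      = {2'b01};
      bins page_conflict = {2'b10};
    }
    cp_bank_x_policy: cross cp_bank_range, cp_policy;
  endgroup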

  // 3. Refresh during active burst
  covergroup cg_refresh_timing @(posedge i_clk);
    cp_ref_vs_burst: coverpoint {refresh_active, wr_burst_active} {
      bins ref_idle    = {2'b10};   // Refresh during idle
      bins ref_burst   = {2'b11};   // Refresh collision with write burst (critical)
      bins no_ref      = {2'b00};   // No refresh
    }
  endgroup

  // 4. Power state transitions
  covergroup cg_power @(posedge i_clk);
    cp_pd_entry: coverpoint power_down_entry {
      bins pd_entered  = {1'b1};
      bins pd_not      = {1'b0};
    }
  endgroup

  // 5. Burst length
  covergroup cg_burst_len @(posedge i_clk);
    cp_bl: coverpoint burst_len_code {
      bins bl4  = {2'b00};
      bins bl8  = {2'b01};
      bins bl16 = {2'b10};
      bins bl32 = {2'b11};
    }
  endgroup

  // 6. ECC correction
  covergroup cg_ecc @(posedge i_clk);
    cp_ecc_sb: coverpoint ecc_sb_corrected {
      bins sb_error_corrected = {1'b1};
    }
  endgroup

  cg_bank_access   cg_ba  = new();
  cg_page_policy   cg_pp  = new();
  cg_refresh_timing cg_rt = new();
  cg_power         cg_pw  = new();
  cg_burst_len     cg_bl  = new();
  cg_ecc           cg_ec  = new();

  final begin
    $display("=== COVERAGE SUMMARY ===");
    $display("  Bank Access   : %.1f%%", cg_ba.get_coverage());
    $display("  Page Policy   : %.1f%%", cg_pp.get_coverage());
    $display("  Refresh Timing: %.1f%%", cg_rt.get_coverage());
    $display("  Power States  : %.1f%%", cg_pw.get_coverage());
    $display("  Burst Length  : %.1f%%", cg_bl.get_coverage());
    $display("  ECC Correction: %.1f%%", cg_ec.get_coverage());
  end

endmodule

Coverage Group | Bins | Closure Requirement | Priority
cg_bank_access | 32 bank bins | All 32 banks hit ≥1× | HIGH
cg_page_policy (hit×bank) | 3 × 3 cross | Hit/miss/conflict in each BG | HIGH
cg_refresh_timing | 3 bins | ref_burst bin hit ≥5× | HIGH
cg_power | 2 bins | pd_entered hit ≥1× | MED
cg_burst_len | 4 bins | All 4 burst lengths tested | MED
cg_ecc | 1 bin | sb_error_corrected hit ≥3× | LOW
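
The hit counts in the closure requirements are not enforced by the covergroups as written, because a bin counts as covered after a single hit unless option.at_least says otherwise. A minimal sketch of how the ref_burst requirement could be expressed follows; the module and covergroup names are assumptions, and only the two refresh-related inputs from hbm3_coverage are reused.

```systemverilog
// Sketch only: module and covergroup names are assumptions, not original code.
module hbm3_cov_at_least_sketch (
  input logic i_clk,
  input logic refresh_active,
  input logic wr_burst_active
);

  covergroup cg_refresh_collision @(posedge i_clk);
    cp_ref_burst: coverpoint {refresh_active, wr_burst_active} {
      option.at_least = 5;        // bin is covered only after 5 samples land in it
      bins ref_burst = {2'b11};   // refresh active while a write burst is active
    }
  endgroup

  cg_refresh_collision cg_rc = new();

endmodule
```

Since the covergroup samples on every clock, one collision that lasts five cycles already satisfies the threshold. Counting distinct collisions would need a sampling event tied to the start of each overlap rather than to the clock.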

SVA Protocol Assertions

Eight SystemVerilog Assertion properties are bound to the DUT and check AXI4 and DRAM protocol invariants on every simulation clock. They fire instantly on violation without requiring any test task to check explicitly.

systemverilog — 8 SVA properties
// hbm3_sva.sv — Protocol assertions, bound to hbm3_pc_ctrl
module hbm3_sva (
  input logic        clk, rst_n,
  input logic        m_awvalid, m_awready,
  input logic        m_wvalid,  m_wready, m_wlast,
  input logic        m_bvalid,  m_bready,
  input logic        m_arvalid, m_arready,
  input logic        m_rvalid,  m_rready, m_rlast,
  input logic        o_refresh_req, i_refresh_ack,
  input logic [1:0]  bank_state_0,  // Bank 0 state for demo
  input logic        wq_full,
  input logic        o_cke
);

  // P1: AWVALID must not deassert before AWREADY (AXI4 rule)
  property p_awvalid_stable;
    @(posedge clk) disable iff (!rst_n)
    (m_awvalid && !m_awready) |=> m_awvalid;
  endproperty
  assert property (p_awvalid_stable)
    else $error("[SVA] P1 FAIL: AWVALID deasserted before AWREADY");

  // P2: WVALID must not deassert before WREADY
  property p_wvalid_stable;
    @(posedge clk) disable iff (!rst_n)
    (m_wvalid && !m_wready) |=> m_wvalid;
  endproperty
  assert property (p_wvalid_stable)
    else $error("[SVA] P2 FAIL: WVALID deasserted before WREADY");

  // P3: ARVALID must not deassert before ARREADY
  property p_arvalid_stable;
    @(posedge clk) disable iff (!rst_n)
    (m_arvalid && !m_arready) |=> m_arvalid;
  endproperty
  assert property (p_arvalid_stable)
    else $error("[SVA] P3 FAIL: ARVALID deasserted before ARREADY");

  // P4: BVALID must be accepted by BREADY within 1000 cycles (same-cycle handshake allowed)
  property p_bvalid_timeout;
    @(posedge clk) disable iff (!rst_n)
    m_bvalid |-> ##[0:1000] m_bready;
  endproperty
  assert property (p_bvalid_timeout)
    else $error("[SVA] P4 FAIL: BVALID asserted >1000 cycles without BREADY");

  // P5: RVALID must be accepted by RREADY within 1000 cycles (same-cycle handshake allowed)
  property p_rvalid_timeout;
    @(posedge clk) disable iff (!rst_n)
    m_rvalid |-> ##[0:1000] m_rready;
  endproperty
  assert property (p_rvalid_timeout)
    else $error("[SVA] P5 FAIL: RVALID asserted >1000 cycles without RREADY");

  // P6: Refresh request must be acknowledged within 100 cycles
  property p_refresh_ack;
    @(posedge clk) disable iff (!rst_n)
    o_refresh_req |-> ##[1:100] i_refresh_ack;
  endproperty
  assert property (p_refresh_ack)
    else $error("[SVA] P6 FAIL: Refresh request not acknowledged within 100 cycles");

  // P7: Controller must not drop CKE while the write queue is full
  property p_cke_wq;
    @(posedge clk) disable iff (!rst_n)
    (!o_cke) |-> !wq_full;
  endproperty
  assert property (p_cke_wq)
    else $error("[SVA] P7 FAIL: CKE deasserted while write queue is full");

  // P8: After reset deassertion, AWREADY must be high within 32 cycles
  property p_post_reset_ready;
    @(posedge clk)
    $rose(rst_n) |-> ##[1:32] m_awready;
  endproperty
  assert property (p_post_reset_ready)
    else $error("[SVA] P8 FAIL: AWREADY not ready within 32 cycles of reset release");

endmodule

// Bind to DUT
bind hbm3_pc_ctrl hbm3_sva u_sva (
  .clk          (i_clk),
  .rst_n        (i_rst_n),
  .m_awvalid    (i_axi_awvalid),
  .m_awready    (o_axi_awready),
  .m_wvalid     (i_axi_wvalid),
  .m_wready     (o_axi_wready),
  .m_wlast      (i_axi_wlast),
  .m_bvalid     (o_axi_bvalid),
  .m_bready     (i_axi_bready),
  .m_arvalid    (i_axi_arvalid),
  .m_arready    (o_axi_arready),
  .m_rvalid     (o_axi_rvalid),
  .m_rready     (i_axi_rready),
  .m_rlast      (o_axi_rlast),
  .o_refresh_req(o_ref_req),
  .i_refresh_ack(i_ref_ack),
  .bank_state_0 (bank_fsm[0]),
  .wq_full      (wq_full_int),
  .o_cke        (o_dram_cke)
);
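
P1 to P3 check that VALID is held until READY. AXI4 also requires the payload to stay unchanged while a channel is waiting, which the eight properties do not cover. A sketch for the write address is below; the module name axi_aw_payload_sva and its port names are assumptions, while the DUT-side names in the bind come from the instantiation in tb_hbm3_top.

```systemverilog
// Hypothetical add-on checker; not one of the original eight properties.
module axi_aw_payload_sva (
  input logic        clk, rst_n,
  input logic        awvalid, awready,
  input logic [31:0] awaddr
);

  // While AWVALID is high and AWREADY is low, AWADDR must not change
  property p_awaddr_stable;
    @(posedge clk) disable iff (!rst_n)
    (awvalid && !awready) |=> $stable(awaddr);
  endproperty
  assert property (p_awaddr_stable)
    else $error("[SVA] AWADDR changed while AWVALID was waiting for AWREADY");

endmodule

// Bind to the DUT using the port names from the DUT instantiation
bind hbm3_pc_ctrl axi_aw_payload_sva u_aw_payload_sva (
  .clk     (i_clk),
  .rst_n   (i_rst_n),
  .awvalid (i_axi_awvalid),
  .awready (o_axi_awready),
  .awaddr  (i_axi_awaddr)
);
```

The same pattern extends to AWLEN, AWSIZE, AWBURST and to the W and AR payloads by adding ports and one $stable term per signal.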

10 Test Scenarios

Each test is a self-contained task called from the top-level initial block. The scoreboard is cleared between tests.

Test 1 — Single Write + Read Verify

Writes one 32-bit word to address 0x0000_1000 and reads it back. Simplest sanity check: controller accepts one AXI4 write, issues ACT+WR+PRE to DRAM model, then accepts AXI4 read and returns correct data.

Test 2 — Sequential Burst (Address Sweep)

Writes 256 sequential words starting at 0x0000_0000 with addresses incrementing by 4. Then reads all 256 back in order. Exercises the full AXI4 burst engine and verifies that multiple rows in the same bank can be written and read without data aliasing.

Test 3 — Random Address (Bank Conflict Stress)

Issues 512 writes to pseudo-random addresses chosen to hit different banks. Then reads back all 512 in random order. Stresses the scheduler's ability to handle bank conflicts, issue precharge+activate sequences, and maintain correct read-after-write ordering.

Test 4 — Write-Then-Read Ordering

Issues a write to address A, immediately followed by a read to the same address A before the write has been drained to DRAM. Verifies that the controller returns correct data (from the write queue bypass) and does not return stale DRAM content.

Test 5 — Refresh During Active Transaction

Starts a long AXI4 burst write, then forces the controller's refresh request high mid-burst. Verifies that the controller pauses burst traffic, issues REF to the DRAM model, waits tRFC, then resumes without data corruption. Closes the ref_burst coverage bin.

Test 6 — Write Drain (Fill WQ Above Watermark)

Floods the AXI4 write channel with 64 back-to-back writes faster than the DRAM can accept them. Verifies that the write queue fills, backpressure is applied on WREADY, the controller drains the queue in bank-efficient order, and no write is lost.

Test 7 — Page Hit Rate Measurement

Issues 1024 writes, all to the same bank and row (different columns). Counts how many result in page hits vs misses using the coverage model. Expected: first access is a miss, all 1023 subsequent accesses are hits (open-page policy). Useful for verifying page policy logic from Module 5.

Test 8 — Power-Down Entry and Exit

After flushing the write queue, deasserts CKE (via a direct control signal in the testbench, simulating the power management module). Waits 200 cycles in power-down. Reasserts CKE and issues a write. Verifies that the controller re-initialises correctly and data integrity is maintained.

Test 9 — ECC Single-Bit Error Injection + Correction Verify

Writes known data to a location. Directly flips one bit in the DRAM model's associative array (simulating a single-bit soft error). Issues a read and verifies that the ECC engine from Module 8 returns the original (corrected) data, not the corrupted value. Checks that the ECC correction interrupt is asserted.

Test 10 — Bandwidth Benchmark

Issues 4096 consecutive 32-bit AXI4 writes (16 KB) to sequential addresses. Records the start cycle at the first AWVALID and the end cycle when the last BVALID is received. Computes effective bandwidth = 16,384 bytes / (elapsed_cycles × 0.5 ns) and prints result in GB/s.
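
As a worked check of that formula, with illustrative cycle counts rather than measured results:

```latex
\mathrm{BW} = \frac{N_{\text{bytes}}}{N_{\text{cycles}} \cdot T_{\text{clk}}},
\qquad N_{\text{bytes}} = 4096 \times 4 = 16384,\quad T_{\text{clk}} = 0.5\ \text{ns}

% One beat accepted every cycle (best case):
N_{\text{cycles}} = 4096:\quad
\mathrm{BW} = \frac{16384\ \text{B}}{4096 \times 0.5\ \text{ns}} = \frac{16384\ \text{B}}{2048\ \text{ns}} = 8\ \text{GB/s}

% Twice as many cycles (stalls on half of them):
N_{\text{cycles}} = 8192:\quad
\mathrm{BW} = \frac{16384\ \text{B}}{4096\ \text{ns}} = 4\ \text{GB/s}
```

The first case reproduces the 8 GB/s peak of a 32-bit bus at 2 GHz, so any measured value should fall at or below it.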

systemverilog — Tests 1, 5, 9, 10 implementations
// TEST 1: Single write + read
task automatic test1_single_wr_rd;
  logic [31:0] rdata;
  $display("\n--- TEST 1: Single Write+Read ---");
  axi4_write(32'h0000_1000, 32'hA5A5_1234);
  repeat(100) @(posedge clk);  // Allow DRAM pipeline
  axi4_read(32'h0000_1000, rdata);
  scoreboard_report("Test1_SingleWrRd");
endtask

// TEST 5: Refresh collision
task automatic test5_refresh_collision;
  logic [31:0] rdata;
  $display("\n--- TEST 5: Refresh During Burst ---");
  // Run a 64-beat burst write and inject a refresh request 20 cycles in.
  // fork...join returns only after both branches finish, so the burst
  // (including its B response) is complete before the read-back starts.
  fork
    axi4_burst_write(32'h0010_0000, 64, 32'hBEEF_0000);
    begin
      repeat(20) @(posedge clk);
      force tb_hbm3_top.dut.i_ref_req = 1'b1;
      repeat(5)  @(posedge clk);
      release tb_hbm3_top.dut.i_ref_req;
    end
  join
  // Verify all data
  for (int i = 0; i < 64; i++) begin
    axi4_read(32'h0010_0000 + i*4, rdata);
  end
  scoreboard_report("Test5_RefreshCollision");
endtask

// TEST 9: ECC single-bit error injection
task automatic test9_ecc_inject;
  logic [31:0] rdata;
  logic [24:0] flip_addr;
  $display("\n--- TEST 9: ECC SBE Injection ---");
  // Write known data
  axi4_write(32'h0020_0000, 32'hDEAD_CAFE);
  repeat(200) @(posedge clk);
  // Corrupt bit 7 in the DRAM model's memory array directly.
  // A plain hierarchical assignment is used because force/release cannot
  // target a bit-select of a variable.
  flip_addr = 25'h000_4000;  // Bank 0, row 1, col 0 (mapped)
  tb_hbm3_top.u_dram.mem[flip_addr][7] = ~tb_hbm3_top.u_dram.mem[flip_addr][7];
  // Read back — ECC should correct it
  axi4_read(32'h0020_0000, rdata);
  if (rdata !== 32'hDEAD_CAFE)
    $error("[T9] ECC failed: got 0x%08h expected 0xDEAD_CAFE", rdata);
  else
    $display("[T9] ECC SBE correction verified OK");
  scoreboard_report("Test9_ECCInject");
endtask

// TEST 10: Bandwidth benchmark
task automatic test10_bandwidth;
  logic [63:0] t_start, t_end;
  real bw_gbps;
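  localparam int BEATS       = 4096;
  localparam int BURST_BEATS = 256;   // AXI4 INCR maximum (AWLEN is 8 bits)
  $display("\n--- TEST 10: Bandwidth Benchmark ---");
  t_start = cycle_cnt_tb;
  // Split the 16 KB transfer into 16 bursts of 256 beats (1 KB each), so no
  // burst exceeds the AXI4 length limit or crosses a 4 KB boundary.
  // axi4_burst_write returns after the B handshake, so no extra wait is needed.
  for (int b = 0; b < BEATS / BURST_BEATS; b++)
    axi4_burst_write(32'h0040_0000 + b*BURST_BEATS*4, BURST_BEATS,
                     32'h1234_0000 + b*BURST_BEATS);
  t_end = cycle_cnt_tb;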
  // BW = bytes / (cycles * 0.5ns)
  bw_gbps = (BEATS * 4.0) / ((t_end - t_start) * 0.5e-9) / 1.0e9;
  $display("[T10] Wrote %0d bytes in %0d cycles", BEATS*4, t_end-t_start);
  $display("[T10] Effective BW = %.2f GB/s (per pseudo-channel)", bw_gbps);
  $display("[T10] Theoretical peak = 8.0 GB/s @ 2 GHz x 32-bit");
  scoreboard_report("Test10_Bandwidth");
endtask
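
Tests 2 and 4 are described above but not listed. Under the assumption that they use only the BFM tasks already shown, they might look like the sketch below. The data patterns and the Test 4 address are made up for illustration, and whether the Test 4 read really arrives before the write has drained depends on the controller's write-queue behaviour.

```systemverilog
// Sketches of Tests 2 and 4; data patterns and the Test 4 address are assumptions.

// TEST 2: 256 sequential words written as one burst, then read back in order
task automatic test2_seq_burst;
  logic [31:0] rdata;
  $display("\n--- TEST 2: Sequential Burst ---");
  // 256 beats x 4 bytes = 1 KB from address 0, so no 4 KB boundary is crossed
  axi4_burst_write(32'h0000_0000, 256, 32'hC0DE_0000);
  repeat(100) @(posedge clk);  // Allow DRAM pipeline
  for (int i = 0; i < 256; i++)
    axi4_read(32'h0000_0000 + i*4, rdata);  // axi4_read checks the scoreboard
  scoreboard_report("Test2_SeqBurst");
endtask

// TEST 4: read the same address right after the write, with no settling delay
task automatic test4_wr_rd_order;
  logic [31:0] rdata;
  $display("\n--- TEST 4: Write-Then-Read Ordering ---");
  axi4_write(32'h0000_2000, 32'h0BAD_F00D);
  axi4_read(32'h0000_2000, rdata);          // no repeat() gap, unlike Test 1
  scoreboard_report("Test4_WrRdOrder");
endtask
```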

Running the Simulation

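The testbench targets Synopsys VCS and Mentor ModelSim/Questa. Icarus Verilog implements only a subset of SystemVerilog and may reject the covergroups, bound SVA modules, and associative arrays used here, so treat the Icarus command as a starting point for a reduced build. Choose the command for your tool: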

shell — Icarus Verilog (open-source)
# Compile all source files
iverilog -g2012 -o sim.out \
  hbm3_dram_model.sv \
  hbm3_pc_ctrl.v \
  hbm3_sva.sv \
  hbm3_coverage.sv \
  tb_hbm3_top.sv

# Run simulation
vvp sim.out

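# With VCD waveform dump (takes effect only if the testbench reads +dumpfile
# with $value$plusargs and calls $dumpfile/$dumpvars)
vvp sim.out +dumpfile=waves.vcd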
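
The +dumpfile plusarg does nothing on its own; the testbench has to read it. A fragment along these lines, placed inside tb_hbm3_top, would honour it. This is an assumed addition, not part of the skeleton shown later, and dump_name is a hypothetical variable name.

```systemverilog
// Assumed addition inside tb_hbm3_top; not part of the original skeleton.
string dump_name;

initial begin
  // $value$plusargs returns nonzero when +dumpfile=<name> is on the command line
  if ($value$plusargs("dumpfile=%s", dump_name)) begin
    $dumpfile(dump_name);
    $dumpvars(0, tb_hbm3_top);  // level 0: dump the whole testbench hierarchy
  end
end
```

Without the plusarg the block does nothing, so regression runs stay free of waveform overhead.
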
shell — Synopsys VCS
vcs -sverilog -full64 -timescale=1ns/1ps \
  +lint=TFIPC \
  -assert enable_diag \
  -cm line+cond+fsm+tgl+branch \
  hbm3_dram_model.sv hbm3_pc_ctrl.v \
  hbm3_sva.sv hbm3_coverage.sv tb_hbm3_top.sv \
  -o simv

./simv +vcs+lic+wait -cm line+cond+fsm+tgl+branch
shell — Questa / ModelSim
vlib work
vmap work work

vlog -sv -timescale "1ns/1ps" \
  hbm3_dram_model.sv hbm3_pc_ctrl.v \
  hbm3_sva.sv hbm3_coverage.sv tb_hbm3_top.sv

vsim -c tb_hbm3_top \
  -do "coverage save -onexit cov.ucdb; run -all; quit -f"

# View coverage
vcover report cov.ucdb -details

Coverage Closure Strategy

Coverage closure is the systematic process of reaching 100% on all defined coverage bins. The recommended closure order:

  1. Run Tests 1–4 first — closes basic bank access bins and most page policy bins (hit/miss). Expect ~55% overall.
  2. Run Test 5 — closes the critical ref_burst bin in cg_refresh_timing. Expect ~68%.
  3. Run Test 7 — drives page hit rate; confirms cg_page_policy hit bin is saturated. ~75%.
  4. Run Test 8 — closes power-down entry bin in cg_power. ~82%.
  5. Run Test 9 — closes ECC SBE correction bin. ~90%.
  6. Run Test 3 with more seeds — random address test with varied random seeds to close remaining bank bins. ~98%.
  7. Targeted directed tests for any remaining unclosed bins (e.g. BL4 burst length if not hit). ~100%.

Note: 100% functional coverage does not mean the design is bug-free. It means you have exercised every feature you defined coverage for. Bugs in features not captured by coverage bins will not be caught. Review unclosed bins analytically before declaring closure.

Full Testbench Top Module

systemverilog — tb_hbm3_top.sv skeleton
`timescale 1ns/1ps

module tb_hbm3_top;

  // Clock + Reset
  logic clk = 0;
  logic rst_n = 0;
  always #0.25 clk = ~clk;  // 2 GHz (0.5 ns period)

  // Cycle counter (for bandwidth measurement)
  logic [63:0] cycle_cnt_tb = 0;
  always_ff @(posedge clk) cycle_cnt_tb <= cycle_cnt_tb + 1;

  // AXI4 interface wires (declared above in BFM section)
  // ... [omitted for brevity — see BFM section] ...

  // DUT + DRAM model wires
  logic [7:0]  dram_ca;
  logic        dram_cke;
  logic [31:0] dram_dq_in, dram_dq_out;
  logic [3:0]  dram_dm;
  logic        dram_dqs_wr, dram_dqs_rd;

  // DUT instantiation
  hbm3_pc_ctrl dut (
    .i_clk          (clk),
    .i_rst_n        (rst_n),
    .i_axi_awvalid  (m_awvalid),
    .o_axi_awready  (m_awready),
    .i_axi_awaddr   (m_awaddr),
    .i_axi_awlen    (m_awlen),
    .i_axi_awsize   (m_awsize),
    .i_axi_awburst  (m_awburst),
    .i_axi_wvalid   (m_wvalid),
    .o_axi_wready   (m_wready),
    .i_axi_wdata    (m_wdata),
    .i_axi_wstrb    (m_wstrb),
    .i_axi_wlast    (m_wlast),
    .o_axi_bvalid   (m_bvalid),
    .i_axi_bready   (m_bready),
    .o_axi_bresp    (m_bresp),
    .i_axi_arvalid  (m_arvalid),
    .o_axi_arready  (m_arready),
    .i_axi_araddr   (m_araddr),
    .i_axi_arlen    (m_arlen),
    .i_axi_arsize   (m_arsize),
    .i_axi_arburst  (m_arburst),
    .o_axi_rvalid   (m_rvalid),
    .i_axi_rready   (m_rready),
    .o_axi_rdata    (m_rdata),
    .o_axi_rresp    (m_rresp),
    .o_axi_rlast    (m_rlast),
    .o_dram_cke     (dram_cke),
    .o_dram_ca      (dram_ca),
    .o_dram_dq      (dram_dq_in),
    .o_dram_dm      (dram_dm),
    .o_dram_dqs_wr  (dram_dqs_wr),
    .i_dram_dq      (dram_dq_out),
    .i_dram_dqs_rd  (dram_dqs_rd)
  );

  // DRAM model
  hbm3_dram_model #(.CL_DEFAULT(70)) u_dram (
    .i_clk    (clk),
    .i_cke    (dram_cke),
    .i_ca     (dram_ca),
    .i_dq_in  (dram_dq_in),
    .i_dm     (dram_dm),
    .i_dqs_wr (dram_dqs_wr),
    .o_dq_out (dram_dq_out),
    .o_dqs_rd (dram_dqs_rd)
  );

  // Test sequence
  initial begin
    $display("=== HBM3 Controller SV Testbench Start ===");
    // Reset
    rst_n = 0;
    m_awvalid = 0; m_wvalid = 0; m_wlast = 0; m_bready = 1;
    m_arvalid = 0; m_rready = 1;
    repeat(16) @(posedge clk);
    rst_n <= 1;  // Nonblocking release avoids a race with logic sampling rst_n on this edge
    repeat(4) @(posedge clk);

    // Run all 10 tests
    test1_single_wr_rd;
    test2_seq_burst;
    test3_random_addr;
    test4_wr_rd_order;
    test5_refresh_collision;
    test6_wq_drain;
    test7_page_hit_rate;
    test8_power_down;
    test9_ecc_inject;
    test10_bandwidth;

    $display("\n=== ALL TESTS COMPLETE. Check scoreboard + SVA for failures. ===");
    $finish;
  end

  // Timeout watchdog
  initial begin
    #5_000_000;  // 5 ms simulation limit
    $error("[TB] TIMEOUT: Simulation exceeded 5 ms");
    $finish;
  end

endmodule

Frequently Asked Questions

What is the difference between a BFM and a full UVM agent?

A Bus Functional Model (BFM) is a task-based module that drives protocol-compliant stimulus onto a bus interface. It understands the AXI4 handshake and generates correct AWVALID/WVALID/ARVALID signals. A full UVM agent adds a sequencer, driver, monitor, and factory pattern, which enables reuse across projects and layered scoreboards. For this course, a BFM delivers most of the verification value with far less boilerplate than UVM. UVM is appropriate when the same testbench must scale to a chip-level environment.

What does the scoreboard check?

The scoreboard maintains an associative array keyed by AXI address, storing expected data from every write. When a read response arrives, it looks up the expected value and compares against the actual data. A mismatch triggers $error with the address, expected value, and actual value. It also counts total checks and mismatches, printing a summary at the end of each test scenario.

How do SVA assertions complement functional simulation?

SVA properties check protocol invariants continuously throughout simulation, not just when a test task is running. For example, the AWVALID-stability property is evaluated on every clock edge, so it catches a VALID that drops for even a single cycle, which a task-based check would miss between task calls. SVA is particularly powerful for catching corner cases that occur in gaps between test scenarios.

What is coverage closure and when is it complete?

Coverage closure is the process of running tests until every defined coverage bin has been hit the required number of times. In our model, closure means all 32 banks accessed, page hit/miss/conflict for each bank group, refresh-during-burst hit at least 5 times, power-down entry verified, all burst lengths tested, and ECC correction exercised. Closure does not mean the design is bug-free — only that the defined feature space has been exercised.

How do I measure effective bandwidth from simulation?

Record the simulation cycle at the first AWVALID of the benchmark run and the cycle when the last BVALID is received. Bandwidth = (total bytes transferred) / (elapsed cycles × clock period). At 2 GHz with a 32-bit AXI4 data bus, the theoretical peak is 8 GB/s per pseudo-channel (2 × 10^9 beats/s × 4 bytes); CL=70 adds read latency but does not change the peak. Effective bandwidth is lower due to ACT+PRE overhead (56 cycles per row open), refresh stalls (440 cycles per tREFI), and AXI4 handshake gaps.