🔬 Phase 5 · Module 16 of 18

HBM3 DRAM Behavioral Memory Model

A simulation-only SystemVerilog model of HBM3 DRAM. Decodes CA packets, enforces bank state machines, checks timing violations with $error assertions, and returns read data after CL cycles through a pipeline. Drop-in replacement for real HBM3 silicon in your testbench.

📁 hbm3_dram_model.sv · Simulation Only, Not Synthesizable · CL = 70 cy · 32 Banks · 64 MB Array · JEDEC JESD238

Why You Need a DRAM Behavioral Model

You cannot simulate a complete HBM3 controller without something on the other end of the CA bus. Real HBM3 silicon obviously cannot be connected in RTL simulation. A simple dual-port SRAM is insufficient — it accepts any access at any time, ignoring the entire DRAM protocol: bank activation, tRCD, CAS latency, precharge, refresh blackout windows, and mode register state. A controller that violates tRCD on a simple SRAM just works; on real silicon, it corrupts data silently.

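A behavioral DRAM model fills this gap. It is a simulation-only SystemVerilog module that decodes CA packets, tracks the state of each bank, flags timing violations with $error, stores write data, and returns read data after CL cycles.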

This model is used exclusively in simulation. It connects directly to the HBM3 controller's DFI/CA outputs in the Module 17 testbench. At the protocol level, the controller cannot tell it apart from real HBM3 DRAM.

Where This Fits in the Verification Flow

The complete simulation stack from Module 17 is: AXI4 Master BFM → HBM3 Controller DUT → hbm3_dram_model (this module) → Scoreboard. The model is the "oracle" — it stores what was written and returns it on read. The scoreboard compares model output to expected data from the BFM transaction log. Any mismatch is a controller bug.

Model Architecture

The model has four cooperating subsystems:

CA Decoder

Samples i_ca[7:0] on every rising edge of i_clk when i_cke is high. Accumulates two consecutive cycles into a command packet. Bits [7:6] of cycle 0 select the command class; remaining bits and cycle 1 carry addresses. Output: decoded command type + bank + row/column address.

Bank State Machine

32 independent 3-state FSMs: IDLE → ACTIVE (on ACT, stores open row address) → IDLE (on PRE) with REFRESHING side-state during REF. Timestamps each state transition for timing violation checks.

Memory Array

SystemVerilog associative array logic [31:0] mem [logic [24:0]]. Indexed by 25-bit {bank[4:0], row[13:0], col[5:0]}. Sparse — only written locations consume simulator memory. Default read value is 32'hDEAD_BEEF to catch uninitialized reads.

Timing Checker + CL Pipeline

Stores the cycle stamp of the last ACT/PRE/WR per bank. On each new command, computes the elapsed cycles and raises $error if a checked JEDEC minimum is violated. A shift register (MAX_CL = 128 entries, tapped at stage CL) pipelines read data from storage to o_dq_out after CL cycles.

Block Diagram

[Block diagram. Inputs: i_ca[7:0], i_cke, i_dq_in[31:0], i_dm[3:0], i_dqs_wr. Blocks: CA Decoder (2-cycle packet → CMD+ADDR); 32-Bank FSM (IDLE/ACTIVE/REFRESH) + Timing Checker ($error assertions: tRCD/tRAS/tRP/tRC); Mem Array (assoc array, [24:0] → [31:0] data); Mode Reg Shadow (MR0–MR7 copies); CL Pipeline (70 stages). Outputs: o_dq_out[31:0], o_dqs_rd.]

Port Table

Port directions are given from the model's perspective: inputs are signals the controller drives, outputs are signals the model drives back to the controller.

| Port | Dir | Width | Meaning |
|---|---|---|---|
| i_clk | input | 1 | 2 GHz HBM3 clock. All state transitions on rising edge. |
| i_cke | input | 1 | Clock Enable. Model ignores CA bus when CKE=0 (power-down mode). |
| i_ca[7:0] | input | 8 | Command/Address bus. Packed 2-cycle packets per JEDEC JESD238 protocol. |
| i_dq_in[31:0] | input | 32 | Write data bus from controller. Captured on i_dqs_wr rising edge. |
| i_dm[3:0] | input | 4 | Data Mask, one bit per byte. Masked bytes are not written to the array. |
| i_dqs_wr | input | 1 | Write DQS strobe. Write data is captured on rising edge of this signal. |
| o_dq_out[31:0] | output | 32 | Read data to controller. Valid CL cycles after the RD command. |
| o_dqs_rd | output | 1 | Read DQS strobe driven by model to indicate valid read data on o_dq_out. |

Memory Array Implementation

Modeling 64 MB of DRAM as a flat Verilog register array would allocate 512 Mb (about 16.8 million 32-bit words) of simulator memory even if the test writes only a handful of locations. That is impractical. Instead, the model uses a SystemVerilog associative array, which allocates entries only for addresses that have been explicitly written.

systemverilog — memory array declaration
// 25-bit address = {bank[4:0], row[13:0], col[5:0]}
// 32-bit data word (controller uses 32-bit wide pseudo-channel)
logic [31:0] mem [logic [24:0]];

// Default uninitialized-read sentinel
localparam logic [31:0] UNINIT_SENTINEL = 32'hDEAD_BEEF;

// Write (masked)
task automatic do_write(
  input logic [4:0]  bank,
  input logic [13:0] row,
  input logic [5:0]  col,
  input logic [31:0] wdata,
  input logic [3:0]  dm
);
  logic [24:0] addr;
  logic [31:0] existing;
  addr = {bank, row, col};
  existing = mem.exists(addr) ? mem[addr] : UNINIT_SENTINEL;
  // Apply data mask — dm[n]=1 means mask byte n (do NOT write)
  if (!dm[0]) existing[ 7: 0] = wdata[ 7: 0];
  if (!dm[1]) existing[15: 8] = wdata[15: 8];
  if (!dm[2]) existing[23:16] = wdata[23:16];
  if (!dm[3]) existing[31:24] = wdata[31:24];
  mem[addr] = existing;
endtask

// Read
function automatic logic [31:0] do_read(
  input logic [4:0]  bank,
  input logic [13:0] row,
  input logic [5:0]  col
);
  logic [24:0] addr;
  addr = {bank, row, col};
  return mem.exists(addr) ? mem[addr] : UNINIT_SENTINEL;
endfunction

The 25-bit address is composed as {bank[4:0], row[13:0], col[5:0]}. With 32 banks (5 bits), 16,384 rows (14 bits), and 64 columns (6 bits), this covers the full HBM3 pseudo-channel address space of 32 × 16,384 × 64 × 4 bytes = 128 MB per pseudo-channel (though the default parameter limits actual modeled capacity to 64 MB).

The UNINIT_SENTINEL value 0xDEAD_BEEF is intentional. If the controller issues a RD to an address that was never written, the scoreboard in Module 17 will see 0xDEAD_BEEF and flag an unexpected data mismatch — catching the bug immediately rather than silently passing zeroes.
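
The sparse behaviour described above can be tried in isolation. The following is a minimal sketch, assuming only the array declaration and sentinel shown earlier; the module name sparse_mem_demo, the helper rd, and the chosen addresses are illustrative and not part of the model. It writes three words, then checks that exactly three entries are allocated and that an unwritten address reads back the sentinel.

systemverilog — sparse array sketch (illustrative)
module sparse_mem_demo;
  logic [31:0] mem [logic [24:0]];
  localparam logic [31:0] UNINIT_SENTINEL = 32'hDEAD_BEEF;

  // Same read policy as do_read(): sentinel for never-written addresses
  function automatic logic [31:0] rd(input logic [24:0] addr);
    return mem.exists(addr) ? mem[addr] : UNINIT_SENTINEL;
  endfunction

  initial begin
    logic [24:0] a0, a1, a2;
    a0 = {5'd0,  14'd0,    6'd0};   // first word of bank 0
    a1 = {5'd3,  14'd100,  6'd7};   // bank 3, row 100, column 7
    a2 = {5'd31, 14'h3FFF, 6'd63};  // last word of the 25-bit address space
    mem[a0] = 32'h1111_1111;
    mem[a1] = 32'h2222_2222;
    mem[a2] = 32'h3333_3333;

    // Only the written addresses occupy simulator memory
    assert (mem.num() == 3)          else $error("expected 3 allocated entries");
    // {bank, row, col} with every field at its maximum is the all-ones address
    assert (a2 == 25'h1FF_FFFF)      else $error("address packing mismatch");
    assert (rd(a1) == 32'h2222_2222) else $error("read-back mismatch");
    // Reading an address that was never written returns the sentinel
    assert (rd({5'd1, 14'd0, 6'd0}) == UNINIT_SENTINEL)
                                     else $error("sentinel expected");
    $display("entries=%0d of %0d addressable words", mem.num(), 1 << 25);
  end
endmodule

Note that the read helper tests mem.exists() first: indexing an associative array at a missing key would otherwise return the element type's default (X for logic) rather than the sentinel.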

CA Command Decoder

HBM3's CA bus uses two-cycle command packets. The model accumulates two consecutive rising-edge samples when CKE is asserted, then dispatches to the appropriate bank handler. Bits [7:6] of the first cycle are the command class discriminator:

| CA[7:6] Cy0 | CA[5] Cy0 | Command | Address Payload |
|---|---|---|---|
| 2'b00 | n/a | ACT (Activate Row) | bank[4:0] in cy0[4:0]; row[13:0] across cy0+cy1 |
| 2'b01 | 1'b0 | RD (Read) | bank[4:0], col[5:0] in cy1 |
| 2'b01 | 1'b1 | WR (Write) | bank[4:0], col[5:0] in cy1 |
| 2'b10 | 1'b0 | PRE (Precharge) | bank[4:0] or all-banks flag |
| 2'b10 | 1'b1 | REF (Refresh) | ABR or PBR + bank in cy1 |
| 2'b11 | n/a | MRS (Mode Reg Write) | MR address [2:0], data [7:0] in cy1 |

systemverilog — CA decoder
// Two-phase CA packet decoder
logic [7:0] ca_cy0, ca_cy1;
logic       packet_phase = 1'b0;  // 0 = waiting for cycle-0, 1 = waiting for cycle-1 (starts at 0 so the first packet is aligned)

typedef enum logic [2:0] {
  CMD_ACT = 3'd0,
  CMD_RD  = 3'd1,
  CMD_WR  = 3'd2,
  CMD_PRE = 3'd3,
  CMD_REF = 3'd4,
  CMD_MRS = 3'd5,
  CMD_NOP = 3'd7
} cmd_t;

cmd_t         decoded_cmd;
logic [4:0]   decoded_bank;
logic [13:0]  decoded_row;
logic [5:0]   decoded_col;
logic         cmd_valid;

always_ff @(posedge i_clk) begin
  cmd_valid <= 1'b0;
  if (i_cke) begin
    if (!packet_phase) begin
      ca_cy0        <= i_ca;
      packet_phase  <= 1'b1;
    end else begin
      ca_cy1        <= i_ca;
      packet_phase  <= 1'b0;
      cmd_valid     <= 1'b1;
      // Decode on cycle-1 arrival
      case (ca_cy0[7:6])
        2'b00: begin  // ACT
          decoded_cmd  <= CMD_ACT;
          decoded_bank <= ca_cy0[4:0];
          decoded_row  <= {i_ca[5:0], ca_cy0[7:0]};  // simplified packing: row[7:0] reuses cy0, so it overlaps the opcode and bank bits
        end
        2'b01: begin  // RD or WR
          decoded_cmd  <= (ca_cy0[5]) ? CMD_WR : CMD_RD;
          decoded_bank <= ca_cy0[4:0];
          decoded_col  <= i_ca[5:0];
        end
        2'b10: begin  // PRE or REF
          if (ca_cy0[5]) begin
            decoded_cmd  <= CMD_REF;
          end else begin
            decoded_cmd  <= CMD_PRE;
            decoded_bank <= ca_cy0[4:0];
          end
        end
        2'b11: begin  // MRS
          decoded_cmd <= CMD_MRS;
        end
        default: decoded_cmd <= CMD_NOP;
      endcase
    end
  end else begin
    packet_phase <= 1'b0;  // Reset on CKE low
    cmd_valid    <= 1'b0;
  end
end
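
For directed stimulus it helps to have the inverse of this decode. The sketch below is an assumption-laden illustration: the helper name pack_col_cmd and the module ca_pack_demo are hypothetical, and the field layout is inferred solely from the decoder listing above (class bits in cy0[7:6], cy0[5] selecting WR, bank in cy0[4:0], column in cy1[5:0]). It packs a column command into two CA bytes and checks them with the same bit slices the decoder uses.

systemverilog — column-command packing sketch (illustrative)
module ca_pack_demo;
  // Build the two CA bytes for a RD/WR packet, mirroring the decoder's slices:
  //   cy0 = {2'b01, is_wr, bank[4:0]},  cy1 = {2'b00, col[5:0]}
  // Returned as {cy1, cy0}; cy0 is driven on the bus first.
  function automatic logic [15:0] pack_col_cmd(
    input logic       is_wr,
    input logic [4:0] bank,
    input logic [5:0] col
  );
    logic [7:0] cy0, cy1;
    cy0 = {2'b01, is_wr, bank};
    cy1 = {2'b00, col};
    return {cy1, cy0};
  endfunction

  initial begin
    logic [15:0] pkt;
    logic [7:0]  cy0, cy1;
    pkt = pack_col_cmd(1'b1, 5'd9, 6'd42);  // WR, bank 9, column 42
    cy0 = pkt[7:0];
    cy1 = pkt[15:8];
    // Apply the same field extraction as the decoder
    assert (cy0[7:6] == 2'b01) else $error("class bits");
    assert (cy0[5]   == 1'b1)  else $error("WR select");
    assert (cy0[4:0] == 5'd9)  else $error("bank");
    assert (cy1[5:0] == 6'd42) else $error("column");
    $display("cy0=%02h cy1=%02h", cy0, cy1);
  end
endmodule

A testbench would drive cy0 and then cy1 on i_ca over two consecutive rising edges with i_cke high, starting on a cycle where the decoder is waiting for cycle 0.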

Bank State Machine

Each of the 32 banks has an independent FSM. The FSM tracks whether the bank is idle (precharged), active (row open), or refreshing. State transitions are triggered by decoded commands:

| State | Trigger | Next State | Timing Check |
|---|---|---|---|
| IDLE | ACT | ACTIVE | tRP since last PRE must have elapsed |
| IDLE | REF | REFRESHING | No constraint (refresh always allowed from idle) |
| ACTIVE | RD | ACTIVE | tRCD since ACT must have elapsed |
| ACTIVE | WR | ACTIVE | tRCD since ACT must have elapsed |
| ACTIVE | PRE | IDLE | tRAS since ACT must have elapsed; tWR since last WR |
| REFRESHING | (timer) | IDLE | After tRFC/tRFCpb cycles automatically |
| ACTIVE | ACT | none (error) | $error: ACT to open bank (missing precharge) |
| IDLE | RD/WR | none (error) | $error: column command to closed bank |

systemverilog — bank FSM (state arrays indexed by bank number)
// Timing parameters (HBM3 @ 2 GHz)
localparam int tRCD    = 28;   // ACT-to-RD/WR
localparam int tRAS    = 76;   // ACT-to-PRE minimum
localparam int tRP     = 28;   // PRE-to-ACT
localparam int tRC     = 112;  // ACT-to-ACT same bank
localparam int tWR     = 40;   // WR-to-PRE
localparam int tCCD    = 8;    // CCD between columns
localparam int tRFC    = 440;  // ABR refresh cycle
localparam int tRFCpb  = 140;  // PBR refresh cycle

typedef enum logic [1:0] {
  BST_IDLE       = 2'd0,
  BST_ACTIVE     = 2'd1,
  BST_REFRESHING = 2'd2
} bank_state_t;

bank_state_t bst [32];
logic [63:0] t_act  [32];  // cycle of last ACT
logic [63:0] t_pre  [32];  // cycle of last PRE
logic [63:0] t_wr   [32];  // cycle of last WR
logic [63:0] t_ref  [32];  // cycle refresh started
logic [13:0] open_row[32]; // currently open row per bank
logic [63:0] cycle_cnt = '0;  // starts at 0; without an initial value it would stay X

always_ff @(posedge i_clk) cycle_cnt <= cycle_cnt + 1;

// Generic bank command handler
task automatic bank_command(
  input cmd_t         cmd,
  input logic [4:0]   bk,
  input logic [13:0]  row,
  input logic [5:0]   col
);
  case (cmd)
    CMD_ACT: begin
      if (bst[bk] != BST_IDLE)
        $error("[DRAM MDL] Bank %0d: ACT while not idle (state=%0d) @ cycle %0d",
               bk, bst[bk], cycle_cnt);
      if ((cycle_cnt - t_pre[bk]) < tRP)
        $error("[DRAM MDL] Bank %0d: tRP violation — only %0d cycles since PRE (need %0d)",
               bk, cycle_cnt - t_pre[bk], tRP);
      if ((cycle_cnt - t_act[bk]) < tRC)
        $error("[DRAM MDL] Bank %0d: tRC violation — only %0d cycles since last ACT (need %0d)",
               bk, cycle_cnt - t_act[bk], tRC);
      bst[bk]      <= BST_ACTIVE;
      open_row[bk] <= row;
      t_act[bk]    <= cycle_cnt;
    end
    CMD_RD: begin
      if (bst[bk] != BST_ACTIVE)
        $error("[DRAM MDL] Bank %0d: RD to closed bank @ cycle %0d", bk, cycle_cnt);
      if ((cycle_cnt - t_act[bk]) < tRCD)
        $error("[DRAM MDL] Bank %0d: tRCD violation — only %0d cycles after ACT (need %0d)",
               bk, cycle_cnt - t_act[bk], tRCD);
      // Schedule read data return through CL pipeline
      schedule_read(bk, open_row[bk], col);
    end
    CMD_WR: begin
      if (bst[bk] != BST_ACTIVE)
        $error("[DRAM MDL] Bank %0d: WR to closed bank @ cycle %0d", bk, cycle_cnt);
      if ((cycle_cnt - t_act[bk]) < tRCD)
        $error("[DRAM MDL] Bank %0d: tRCD violation on WR — only %0d cycles (need %0d)",
               bk, cycle_cnt - t_act[bk], tRCD);
      t_wr[bk] <= cycle_cnt;
      // Data captured from i_dq_in on next i_dqs_wr edge (handled in write capture block)
    end
    CMD_PRE: begin
      if (bst[bk] != BST_ACTIVE)
        $error("[DRAM MDL] Bank %0d: PRE to already-idle bank @ cycle %0d", bk, cycle_cnt);
      if ((cycle_cnt - t_act[bk]) < tRAS)
        $error("[DRAM MDL] Bank %0d: tRAS violation — only %0d cycles active (need %0d)",
               bk, cycle_cnt - t_act[bk], tRAS);
      if ((cycle_cnt - t_wr[bk]) < tWR)
        $error("[DRAM MDL] Bank %0d: tWR violation — only %0d cycles since WR (need %0d)",
               bk, cycle_cnt - t_wr[bk], tWR);
      bst[bk]  <= BST_IDLE;
      t_pre[bk] <= cycle_cnt;
    end
    CMD_REF: begin
      bst[bk]  <= BST_REFRESHING;
      t_ref[bk] <= cycle_cnt;
      // Auto-return to IDLE after tRFCpb (handled in refresh timer block)
    end
  endcase
endtask
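
Every check above reduces to one unsigned comparison of elapsed cycles against a minimum, which can be exercised on its own. The sketch below is illustrative only: too_soon, T_NEVER, and elapsed_check_demo are hypothetical names, not part of the model. It also shows why the stamp chosen for an event that has not happened yet matters. With 64-bit unsigned subtraction, an all-ones stamp behaves like cycle -1, whereas a stamp placed well before cycle 0 keeps a bank's first command clean.

systemverilog — elapsed-cycle check sketch (illustrative)
module elapsed_check_demo;
  localparam int tRP = 28;
  // Hypothetical "never happened" stamp: 1024 cycles before cycle 0 (mod 2^64)
  localparam logic [63:0] T_NEVER = 64'd0 - 64'd1024;

  // True when fewer than min_cycles have elapsed since 'stamp'
  function automatic bit too_soon(
    input logic [63:0] now,
    input logic [63:0] stamp,
    input int          min_cycles
  );
    return (now - stamp) < 64'(min_cycles);
  endfunction

  initial begin
    // PRE at cycle 100: ACT at 127 is one cycle early, ACT at 128 is legal
    assert ( too_soon(64'd127, 64'd100, tRP)) else $error("expected a violation");
    assert (!too_soon(64'd128, 64'd100, tRP)) else $error("unexpected violation");
    // A stamp far in the past never trips the check, even at cycle 5
    assert (!too_soon(64'd5, T_NEVER, tRP))   else $error("false start-up violation");
    // An all-ones stamp wraps to an elapsed count of 6 at cycle 5, which is < tRP
    assert ( too_soon(64'd5, 64'hFFFF_FFFF_FFFF_FFFF, tRP))
                                              else $error("all-ones acts as cycle -1");
    $display("elapsed-cycle demo done");
  end
endmodule

The offset of 1024 is an arbitrary choice for this sketch; any value larger than the longest checked interval (tRFC = 440 here) would serve.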

Mode Register Shadow

HBM3 mode registers (MR0–MR7) control operating parameters: CAS latency, write recovery time, burst length, refresh mode, and power-down behaviour. The behavioral model keeps a shadow copy of all 8 mode registers, but only MR0 is decoded: its CL field updates CL_param (and, in the snippet below, CWL_param) at run time. MR1–MR7 are stored without further decoding.

systemverilog — mode register shadow
logic [7:0] mode_reg [8];  // MR0–MR7 shadow copies
int         CL_param;      // CAS latency extracted from MR0
int         CWL_param;     // Write latency from MR0

// MRS command handler
task automatic handle_mrs(input logic [7:0] ca_cy0_p, ca_cy1_p);
  logic [2:0] mr_addr;
  logic [7:0] mr_data;
  mr_addr = ca_cy1_p[2:0];
  mr_data = {ca_cy0_p[3:0], ca_cy1_p[7:4]};
  mode_reg[mr_addr] <= mr_data;
  $display("[DRAM MDL] MRS: MR%0d <= 8'h%02h @ cycle %0d", mr_addr, mr_data, cycle_cnt);
  // Update derived parameters
  case (mr_addr)
    3'd0: begin
      // MR0[3:0] = CAS latency code: 4'd0 = CL14, 4'd7 = CL70 (HBM3 typical)
      case (mr_data[3:0])
        4'd0: CL_param = 14;
        4'd3: CL_param = 36;
        4'd7: CL_param = 70;
        default: CL_param = 70;
      endcase
      CWL_param = CL_param / 2;  // Simplified: CWL ~ CL/2
    end
  endcase
endtask

Read Data Return Pipeline

CAS latency (CL = 70 cycles) is the number of clock cycles between the RD command and the first valid read data on o_dq_out. The model implements this as a valid+data shift register of MAX_CL (128) entries, tapped at stage CL_param. When a RD is decoded, stage 0 is loaded with {valid=1, data=mem[addr]}. After 70 further clock edges the entry reaches stage 70 and drives the output, so stages 0 through 70 (71 in total) are in use at CL = 70.

systemverilog — CL=70 read pipeline
localparam int MAX_CL = 128;

logic [31:0] rd_pipe_data  [MAX_CL];
logic        rd_pipe_valid [MAX_CL];

// Shift stages 1..MAX_CL-1 on every clock.
// Plain 'always' (not always_ff): stage 0 is written by a different process,
// the one that dispatches commands and calls schedule_read().
always @(posedge i_clk) begin : cl_pipeline
  for (int i = MAX_CL-1; i > 0; i--) begin
    rd_pipe_data [i] <= rd_pipe_data [i-1];
    rd_pipe_valid[i] <= rd_pipe_valid[i-1];
  end
end

// Schedule a read: load stage 0 with nonblocking assignments.
// The calling process must default stage 0 to invalid at the top of every
// clock (rd_pipe_valid[0] <= 1'b0;) before dispatching, so that the later
// assignment below wins only on cycles that carry a RD. Keeping both writes
// in one process avoids a race between loading and clearing stage 0.
task automatic schedule_read(
  input logic [4:0]  bk,
  input logic [13:0] row,
  input logic [5:0]  col
);
  logic [31:0] rdata;
  rdata = do_read(bk, row, col);
  // Stage 0 is loaded on this edge; CL_param edges later it reaches the output tap
  rd_pipe_data [0] <= rdata;
  rd_pipe_valid[0] <= 1'b1;
endtask

// Output assignment: tap the pipeline CL_param stages down
assign o_dq_out  = rd_pipe_valid[CL_param] ? rd_pipe_data[CL_param] : 32'hzzzz_zzzz;
assign o_dqs_rd  = rd_pipe_valid[CL_param];

// Pending-write bookkeeping, set when CMD_WR is dispatched
logic        wr_pending = 1'b0;
logic [4:0]  wr_bank;
logic [13:0] wr_row;
logic [5:0]  wr_col;

// Write data capture on the DQS strobe.
// Plain 'always' because wr_pending is also set by the command dispatcher.
always @(posedge i_dqs_wr) begin : write_capture
  if (wr_pending) begin
    do_write(wr_bank, wr_row, wr_col, i_dq_in, i_dm);
    wr_pending <= 1'b0;
  end
end
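
The latency behaviour can be checked on a scaled-down pipeline. This is a minimal sketch rather than the model's code: DEPTH and CL stand in for MAX_CL and CL_param, the module and signal names are hypothetical, and, as a simplifying assumption, a single process both loads stage 0 and shifts, so the result does not depend on simulator process ordering.

systemverilog — scaled-down latency check (illustrative)
module cl_pipe_demo;
  localparam int DEPTH = 8;  // stands in for MAX_CL
  localparam int CL    = 3;  // stands in for CL_param

  logic        clk       = 1'b0;
  logic        push      = 1'b0;
  logic [31:0] push_data = '0;
  logic [31:0] pipe_data  [DEPTH];
  logic        pipe_valid [DEPTH];

  always #1 clk = ~clk;

  // One process loads stage 0 and shifts every other stage
  always @(posedge clk) begin
    for (int i = DEPTH-1; i > 0; i--) begin
      pipe_data [i] <= pipe_data [i-1];
      pipe_valid[i] <= pipe_valid[i-1];
    end
    pipe_data [0] <= push_data;
    pipe_valid[0] <= push;
  end

  initial begin
    for (int s = 0; s < DEPTH; s++) begin
      pipe_data [s] = '0;
      pipe_valid[s] = 1'b0;
    end
    @(negedge clk); push = 1'b1; push_data = 32'hCAFE_F00D;
    @(negedge clk); push = 1'b0;      // stage 0 was loaded on the edge in between
    repeat (CL-1) @(negedge clk);     // CL-1 further edges: entry sits in stage CL-1
    assert (!pipe_valid[CL]) else $error("data arrived early");
    @(negedge clk);                   // CL-th edge after the load
    assert (pipe_valid[CL] && pipe_data[CL] == 32'hCAFE_F00D)
      else $error("data missing at stage CL");
    @(negedge clk);                   // one edge later the entry has moved on
    assert (!pipe_valid[CL]) else $error("valid should last one cycle");
    $display("entry reached stage %0d after %0d shift edges", CL, CL);
    $finish;
  end
endmodule

Sampling on the falling edge keeps the checks away from the rising edge where the stages update, so each assertion sees settled values.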

Full Behavioral Model

The complete hbm3_dram_model.sv integrates all subsystems into a single simulation module. This is the file you drop into your testbench alongside the controller DUT.

systemverilog — hbm3_dram_model.sv (complete)
// =============================================================
// hbm3_dram_model.sv — HBM3 Behavioral DRAM Model
// EcrioniX HBM3 Controller Build · Module 16
// SIMULATION ONLY — NOT SYNTHESIZABLE
// =============================================================
`timescale 1ns/1ps

module hbm3_dram_model #(
  parameter int CL_DEFAULT  = 70,
  parameter int CWL_DEFAULT = 36,
  parameter int NUM_BANKS   = 32
)(
  input  logic        i_clk,
  input  logic        i_cke,
  input  logic [7:0]  i_ca,
  input  logic [31:0] i_dq_in,
  input  logic [3:0]  i_dm,
  input  logic        i_dqs_wr,
  output logic [31:0] o_dq_out,
  output logic        o_dqs_rd
);

  // ----- Timing Parameters (HBM3 @ 2 GHz) -----
  localparam int tRCD   = 28;
  localparam int tRAS   = 76;
  localparam int tRP    = 28;
  localparam int tRC    = 112;
  localparam int tWR    = 40;
  localparam int tCCD   = 8;
  localparam int tRFC   = 440;
  localparam int tRFCpb = 140;

  // ----- Memory Array -----
  logic [31:0] mem [logic [24:0]];
  localparam logic [31:0] UNINIT = 32'hDEAD_BEEF;

  // ----- Bank State -----
  typedef enum logic [1:0] {BST_IDLE=2'd0, BST_ACTIVE=2'd1, BST_REFRESH=2'd2} bank_state_t;
  bank_state_t  bst      [NUM_BANKS];
  logic [63:0]  t_act    [NUM_BANKS];
  logic [63:0]  t_pre    [NUM_BANKS];
  logic [63:0]  t_wr     [NUM_BANKS];
  logic [13:0]  open_row [NUM_BANKS];
  logic [63:0]  cycle_cnt;

  // ----- Mode Registers -----
  logic [7:0] mode_reg [8];
  int         CL_param  = CL_DEFAULT;
  int         CWL_param = CWL_DEFAULT;

  // ----- CA Decode State -----
  logic [7:0] ca_cy0;
  logic       packet_phase;
  logic       cmd_valid;

  typedef enum logic [2:0] {
    CMD_NOP=3'd7, CMD_ACT=3'd0, CMD_RD=3'd1,
    CMD_WR=3'd2,  CMD_PRE=3'd3, CMD_REF=3'd4, CMD_MRS=3'd5
  } cmd_t;

  cmd_t        cur_cmd;
  logic [4:0]  cur_bank;
  logic [13:0] cur_row;
  logic [5:0]  cur_col;

  // ----- Write Pending State -----
  logic        wr_pending;
  logic [4:0]  wr_bank;
  logic [13:0] wr_row;
  logic [5:0]  wr_col;

  // ----- CL Read Pipeline -----
  localparam int MAX_CL = 128;
  logic [31:0] rd_pipe_data  [MAX_CL];
  logic        rd_pipe_valid [MAX_CL];

  // "Long ago" timestamp: 1024 cycles before cycle 0 (mod 2^64), larger than
  // any checked interval, so a bank's first command never trips a timing check.
  // (An all-ones stamp would behave like cycle -1 and raise false violations.)
  localparam logic [63:0] T_NEVER = 64'd0 - 64'd1024;

  // Simulation model: plain 'always' is used instead of always_ff, because
  // always_ff requires exclusive write access to its variables and forbids
  // blocking timing controls. Here the initial block and several processes
  // share variables, and the REF branch forks a timed process.

  // Cycle counter
  always @(posedge i_clk) cycle_cnt <= cycle_cnt + 64'd1;

  // Read pipeline shift (stages 1..MAX_CL-1; stage 0 is owned by ca_decode)
  always @(posedge i_clk) begin : cl_pipe
    for (int j = MAX_CL-1; j > 0; j--) begin
      rd_pipe_data [j] <= rd_pipe_data [j-1];
      rd_pipe_valid[j] <= rd_pipe_valid[j-1];
    end
  end

  // Output from CL stage
  assign o_dq_out = rd_pipe_valid[CL_param] ? rd_pipe_data[CL_param] : 32'hzzzz_zzzz;
  assign o_dqs_rd = rd_pipe_valid[CL_param];

  // Write capture on DQS
  always @(posedge i_dqs_wr) begin : wr_cap
    if (wr_pending) begin
      automatic logic [24:0] waddr = {wr_bank, wr_row, wr_col};
      automatic logic [31:0] existing = mem.exists(waddr) ? mem[waddr] : UNINIT;
      // i_dm[n]=1 masks byte n (byte is left unchanged)
      if (!i_dm[0]) existing[ 7: 0] = i_dq_in[ 7: 0];
      if (!i_dm[1]) existing[15: 8] = i_dq_in[15: 8];
      if (!i_dm[2]) existing[23:16] = i_dq_in[23:16];
      if (!i_dm[3]) existing[31:24] = i_dq_in[31:24];
      mem[waddr]  = existing;
      wr_pending <= 1'b0;
    end
  end

  // Main CA decoder + command dispatcher
  always @(posedge i_clk) begin : ca_decode
    cmd_valid        <= 1'b0;
    // Default: nothing enters the read pipeline this cycle. The RD branch
    // below overrides this in the same process, so load and clear cannot race.
    rd_pipe_data [0] <= '0;
    rd_pipe_valid[0] <= 1'b0;
    if (!i_cke) begin
      packet_phase <= 1'b0;
    end else if (!packet_phase) begin
      ca_cy0       <= i_ca;
      packet_phase <= 1'b1;
    end else begin
      packet_phase <= 1'b0;
      cmd_valid    <= 1'b1;
      // Decode and dispatch
      case (ca_cy0[7:6])
        2'b00: begin  // ACT
          automatic logic [4:0] bk = ca_cy0[4:0];
          if (bst[bk] != BST_IDLE)
            $error("[MDL] Bank%0d: ACT to non-idle bank (state=%0d) @cy%0d",bk,bst[bk],cycle_cnt);
          if ((cycle_cnt - t_pre[bk]) < tRP)
            $error("[MDL] Bank%0d: tRP viol %0d<%0d @cy%0d",bk,cycle_cnt-t_pre[bk],tRP,cycle_cnt);
          if ((cycle_cnt - t_act[bk]) < tRC)
            $error("[MDL] Bank%0d: tRC viol %0d<%0d @cy%0d",bk,cycle_cnt-t_act[bk],tRC,cycle_cnt);
          bst[bk]      <= BST_ACTIVE;
          open_row[bk] <= {i_ca[5:0], ca_cy0[7:0]};  // simplified packing
          t_act[bk]    <= cycle_cnt;
        end
        2'b01: begin  // RD or WR (cy0[5] selects WR)
          automatic logic [4:0] bk = ca_cy0[4:0];
          automatic logic [5:0] col = i_ca[5:0];
          if (bst[bk] != BST_ACTIVE)
            $error("[MDL] Bank%0d: column cmd to closed bank @cy%0d",bk,cycle_cnt);
          if ((cycle_cnt - t_act[bk]) < tRCD)
            $error("[MDL] Bank%0d: tRCD viol %0d<%0d @cy%0d",bk,cycle_cnt-t_act[bk],tRCD,cycle_cnt);
          if (!ca_cy0[5]) begin  // RD
            automatic logic [24:0] raddr = {bk, open_row[bk], col};
            automatic logic [31:0] rdata = mem.exists(raddr) ? mem[raddr] : UNINIT;
            rd_pipe_data [0] <= rdata;   // overrides the default above
            rd_pipe_valid[0] <= 1'b1;
          end else begin  // WR
            wr_bank    <= bk;
            wr_row     <= open_row[bk];
            wr_col     <= col;
            wr_pending <= 1'b1;
            t_wr[bk]   <= cycle_cnt;
          end
        end
        2'b10: begin  // PRE or REF (cy0[5] selects REF)
          if (!ca_cy0[5]) begin  // PRE
            automatic logic [4:0] bk = ca_cy0[4:0];
            if (bst[bk] != BST_ACTIVE)
              $error("[MDL] Bank%0d: PRE to idle bank @cy%0d",bk,cycle_cnt);
            if ((cycle_cnt - t_act[bk]) < tRAS)
              $error("[MDL] Bank%0d: tRAS viol %0d<%0d @cy%0d",bk,cycle_cnt-t_act[bk],tRAS,cycle_cnt);
            if ((cycle_cnt - t_wr[bk]) < tWR)
              $error("[MDL] Bank%0d: tWR viol %0d<%0d @cy%0d",bk,cycle_cnt-t_wr[bk],tWR,cycle_cnt);
            bst[bk]   <= BST_IDLE;
            t_pre[bk] <= cycle_cnt;
          end else begin  // REF: mark all banks refreshing (ABR simplified)
            for (int b = 0; b < NUM_BANKS; b++) bst[b] <= BST_REFRESH;
            // Background timer: return every bank to IDLE after tRFC cycles
            fork
              begin
                repeat(tRFC) @(posedge i_clk);
                for (int b = 0; b < NUM_BANKS; b++) bst[b] <= BST_IDLE;
              end
            join_none
          end
        end
        2'b11: begin  // MRS
          automatic logic [2:0] mr_addr = i_ca[2:0];
          automatic logic [7:0] mr_data = {ca_cy0[3:0], i_ca[7:4]};
          mode_reg[mr_addr] <= mr_data;
          $display("[MDL] MRS MR%0d=8'h%02h @cy%0d", mr_addr, mr_data, cycle_cnt);
          if (mr_addr == 3'd0) begin
            case (mr_data[3:0])
              4'd7:    CL_param = 70;
              4'd3:    CL_param = 36;
              default: CL_param = 70;
            endcase
          end
        end
      endcase
    end
  end

  // Initialize
  initial begin
    cycle_cnt    = 0;
    packet_phase = 0;
    wr_pending   = 0;
    for (int b = 0; b < NUM_BANKS; b++) begin
      bst[b]      = BST_IDLE;
      t_act[b]    = T_NEVER;
      t_pre[b]    = T_NEVER;
      t_wr[b]     = T_NEVER;
      open_row[b] = '0;
    end
    for (int m = 0; m < 8; m++) mode_reg[m] = '0;
    for (int s = 0; s < MAX_CL; s++) begin
      rd_pipe_data[s]  = '0;
      rd_pipe_valid[s] = '0;
    end
    $display("[DRAM MDL] HBM3 Behavioral Model initialized. CL=%0d CWL=%0d",
             CL_param, CWL_param);
  end

endmodule

Using the Model with Your Testbench

Instantiate hbm3_dram_model in your testbench top alongside the controller DUT. Connect the controller's CA output directly to the model's CA input. The Module 17 testbench does this for you, but the standalone connection pattern is:

systemverilog — instantiation snippet
// In tb_hbm3_top.sv:
logic        dram_clk;
logic        dram_cke;
logic [7:0]  dram_ca;
logic [31:0] dram_dq_in, dram_dq_out;
logic [3:0]  dram_dm;
logic        dram_dqs_wr, dram_dqs_rd;

// Controller DUT drives CA bus and DQ
hbm3_pc_ctrl dut (
  .i_clk          (clk),
  // ... AXI4 inputs ...
  .o_dram_cke     (dram_cke),
  .o_dram_ca      (dram_ca),
  .o_dram_dq      (dram_dq_in),
  .o_dram_dm      (dram_dm),
  .o_dram_dqs_wr  (dram_dqs_wr),
  .i_dram_dq      (dram_dq_out),
  .i_dram_dqs_rd  (dram_dqs_rd)
);

// Behavioral DRAM model — responds to controller
hbm3_dram_model #(.CL_DEFAULT(70)) u_dram (
  .i_clk    (dram_clk),
  .i_cke    (dram_cke),
  .i_ca     (dram_ca),
  .i_dq_in  (dram_dq_in),
  .i_dm     (dram_dm),
  .i_dqs_wr (dram_dqs_wr),
  .o_dq_out (dram_dq_out),
  .o_dqs_rd (dram_dqs_rd)
);

assign dram_clk = clk;  // Same clock domain in single-channel TB

Model Limitations

The behavioral model is a functional model for protocol compliance verification. It intentionally omits several physical and architectural details:

| Feature | Modeled | Not Modeled |
|---|---|---|
| Command/Address bus | 2-cycle CA packet decode, all 6 command types | CA bus parity, per-bit DBI |
| Bank state | IDLE/ACTIVE/REFRESHING per bank | Bank Group timing (tCCDS vs tCCDL) |
| Timing parameters | tRCD, tRAS, tRP, tRC, tWR, tRFC | tCCD (parameter declared, but no check uses it), tFAW, tRRD, write-to-read turn-around |
| Data path | 32-bit write/read with byte masks | ECC lanes, DBI-DQ, data scrambling |
| Refresh | ABR blocking window (tRFC) | PBR per-bank refresh tracking, tREFI watchdog |
| Mode registers | MR0 CL field decode | MR1–MR7 full field decode |
| Power states | CKE=0 freezes model | Self-refresh entry/exit, power-down timings |
| PHY layer | Not modeled (direct signal connection) | DFI protocol, FIFO, equalisation, jitter |
| Signal integrity | Not modeled | Crosstalk, SSO, impedance mismatch |
| Temperature | Not modeled | Retention degradation, derating factors |

Frequently Asked Questions

Why can't we just use a simple memory array for HBM3 simulation?

A simple memory array ignores DRAM protocol timing — it accepts any read or write at any time with no concept of bank activation, tRCD, CAS latency, or refresh. A behavioral DRAM model enforces timing constraints so the controller is exercised exactly as it would be with real silicon. Timing violations that would cause data corruption on real hardware get caught in simulation as assertion failures, making the model essential for verifying the controller's protocol compliance.

What does the CA bus decode to in HBM3?

The HBM3 CA bus is an 8-bit packet-based command/address bus. Each command occupies two consecutive cycles (a two-cycle packet). Bits [7:6] of the first cycle encode the command type: 00 = ACT (activate row), 01 = RD/WR, 10 = PRE/REF, 11 = MRS. The row address, column address, and bank address are packed into the remaining bits across the two cycles.

How does the CAS latency pipeline work in the behavioral model?

When the model decodes a RD command, it does not return data immediately. Instead, it pushes the read data into stage 0 of a shift-register pipeline of depth CL (70 cycles). After CL clock edges, the data emerges from stage 70 and drives o_dq_out along with o_dqs_rd. This faithfully models the latency the controller will observe in real hardware.

What timing parameters does the model check?

The model checks tRCD (28 cy, ACT to RD/WR), tRAS (76 cy, minimum row active time before PRE), tRP (28 cy, precharge recovery before next ACT), tRC (112 cy, ACT-to-ACT same bank), and tWR (40 cy, write recovery before PRE). tCCD (8 cy, column command spacing) is declared as a parameter, but the listings shown here contain no check for it. Any violation triggers a $error with the bank number, measured cycles, and required cycles.

What does the model NOT simulate?

The behavioral model does not simulate signal integrity (crosstalk, jitter), PHY-level serialization, DFI protocol timing, power consumption, temperature effects on data retention, tFAW (four-activate window), bank-group timing (tCCDS vs tCCDL), or ECC encoding. It is a functional protocol-compliance model only. For PHY-level validation, a SPICE or transistor-level model is required.