A simulation-only SystemVerilog model of HBM3 DRAM. Decodes CA packets, enforces bank state machines, checks timing violations with $error assertions, and returns read data after CL cycles through a pipeline. Drop-in replacement for real HBM3 silicon in your testbench.
You cannot simulate a complete HBM3 controller without something on the other end of the CA bus. Real HBM3 silicon obviously cannot be connected in RTL simulation. A simple dual-port SRAM is insufficient — it accepts any access at any time, ignoring the entire DRAM protocol: bank activation, tRCD, CAS latency, precharge, refresh blackout windows, and mode register state. A controller that violates tRCD on a simple SRAM just works; on real silicon, it corrupts data silently.
A behavioral DRAM model fills this gap. It is a simulation-only SystemVerilog module that decodes CA packets, tracks the state of every bank, returns read data after CL cycles, and raises $error assertions on violations.
The complete simulation stack from Module 17 is: AXI4 Master BFM → HBM3 Controller DUT → hbm3_dram_model (this module) → Scoreboard. The model is the "oracle": it stores what was written and returns it on read. The scoreboard compares model output to expected data from the BFM transaction log. Any mismatch is a controller bug.
The model has four cooperating subsystems:
Samples i_ca[7:0] on every rising edge of i_clk when i_cke is high. Accumulates two consecutive cycles into a command packet. Bits [7:6] of cycle 0 select the command class; remaining bits and cycle 1 carry addresses. Output: decoded command type + bank + row/column address.
32 independent 3-state FSMs: IDLE → ACTIVE (on ACT, stores open row address) → IDLE (on PRE) with REFRESHING side-state during REF. Timestamps each state transition for timing violation checks.
SystemVerilog associative array logic [31:0] mem [logic [24:0]]. Indexed by 25-bit {bank[4:0], row[13:0], col[5:0]}. Sparse — only written locations consume simulator memory. Default read value is 32'hDEAD_BEEF to catch uninitialized reads.
Stores the cycle stamp of the last ACT, PRE, and WR per bank. On each new command, computes elapsed cycles and raises $error if any JEDEC minimum is violated. Read data travels from storage to o_dq_out through a shift register tapped at stage CL: stages 0 through 70 (71 entries) are in use at CL = 70, and the listing allocates MAX_CL = 128 stages.
Port names and directions are given from the model's side of the interface: inputs (i_) are signals the controller drives, and outputs (o_) are signals the model drives back to the controller.
| Port | Dir | Width | Meaning |
|---|---|---|---|
| i_clk | input | 1 | 2 GHz HBM3 clock. All state transitions on rising edge. |
| i_cke | input | 1 | Clock Enable. Model ignores CA bus when CKE=0 (power-down mode). |
| i_ca[7:0] | input | 8 | Command/Address bus. Packed 2-cycle packets per JEDEC JESD238 protocol. |
| i_dq_in[31:0] | input | 32 | Write data bus from controller. Captured on i_dqs_wr rising edge. |
| i_dm[3:0] | input | 4 | Data Mask — one bit per byte. Masked bytes are not written to the array. |
| i_dqs_wr | input | 1 | Write DQS strobe. Write data is captured on rising edge of this signal. |
| o_dq_out[31:0] | output | 32 | Read data to controller. Valid CL cycles after the RD command. |
| o_dqs_rd | output | 1 | Read DQS strobe driven by model to indicate valid read data on o_dq_out. |
Modeling 64 MB of DRAM as a flat Verilog register array would allocate 512 Mb (about 16.8 million 32-bit words) of simulator memory even if the test writes only a handful of locations. That is impractical. Instead, the model uses a SystemVerilog associative array, which only allocates entries for addresses that have been explicitly written.
// 25-bit address = {bank[4:0], row[13:0], col[5:0]}
// 32-bit data word (controller uses 32-bit wide pseudo-channel)
logic [31:0] mem [logic [24:0]];
// Default uninitialized-read sentinel
localparam logic [31:0] UNINIT_SENTINEL = 32'hDEAD_BEEF;
// Write (masked)
task automatic do_write(
input logic [4:0] bank,
input logic [13:0] row,
input logic [5:0] col,
input logic [31:0] wdata,
input logic [3:0] dm
);
logic [24:0] addr;
logic [31:0] existing;
addr = {bank, row, col};
existing = mem.exists(addr) ? mem[addr] : UNINIT_SENTINEL;
// Apply data mask — dm[n]=1 means mask byte n (do NOT write)
if (!dm[0]) existing[ 7: 0] = wdata[ 7: 0];
if (!dm[1]) existing[15: 8] = wdata[15: 8];
if (!dm[2]) existing[23:16] = wdata[23:16];
if (!dm[3]) existing[31:24] = wdata[31:24];
mem[addr] = existing;
endtask
// Read
function automatic logic [31:0] do_read(
input logic [4:0] bank,
input logic [13:0] row,
input logic [5:0] col
);
logic [24:0] addr;
addr = {bank, row, col};
return mem.exists(addr) ? mem[addr] : UNINIT_SENTINEL;
endfunction
The 25-bit address is composed as {bank[4:0], row[13:0], col[5:0]}. With 32 banks (5 bits), 16,384 rows (14 bits), and 64 columns (6 bits), this covers the full HBM3 pseudo-channel address space of 32 × 16,384 × 64 × 4 bytes = 128 MB per pseudo-channel (though the default parameter limits actual modeled capacity to 64 MB).
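To see the sparse storage and the byte masking in isolation, here is a minimal standalone sketch. The module name `tb_sparse_mem_sketch` and the helper `masked_merge` are hypothetical and not part of hbm3_dram_model.sv; only the array declaration, the sentinel value, the address packing, and the mask polarity are taken from the listing above.

```systemverilog
// Hypothetical standalone sketch; not part of hbm3_dram_model.sv.
module tb_sparse_mem_sketch;
  localparam logic [31:0] UNINIT_SENTINEL = 32'hDEAD_BEEF;

  // Sparse storage: an entry exists only for an address that was written.
  logic [31:0] mem [logic [24:0]];

  // Merge write data into the stored word; dm[n] = 1 leaves byte n unchanged.
  function automatic logic [31:0] masked_merge(
    input logic [31:0] old_word,
    input logic [31:0] wdata,
    input logic [3:0]  dm
  );
    logic [31:0] merged;
    merged = old_word;
    for (int n = 0; n < 4; n++)
      if (!dm[n]) merged[8*n +: 8] = wdata[8*n +: 8];
    return merged;
  endfunction

  initial begin
    logic [24:0] addr;
    logic [31:0] old_word;

    // Same packing as the model: {bank[4:0], row[13:0], col[5:0]}
    addr = {5'd3, 14'h0123, 6'd5};

    // The first write masks the two upper bytes, so they keep the sentinel.
    old_word  = mem.exists(addr) ? mem[addr] : UNINIT_SENTINEL;
    mem[addr] = masked_merge(old_word, 32'h1122_3344, 4'b1100);

    assert (mem[addr] == 32'hDEAD_3344) else $error("unexpected merge result");
    assert (mem.num() == 1) else $error("expected exactly one allocated entry");
    $display("mem[%h] = %h, entries = %0d", addr, mem[addr], mem.num());
    $finish;
  end
endmodule
```

A partially masked write to a never-written location leaves sentinel bytes in the stored word, which is why a scoreboard mismatch containing DEAD or BEEF often points at a masked first write rather than a read of the wrong address.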
HBM3's CA bus uses two-cycle command packets. The model accumulates two consecutive rising-edge samples when CKE is asserted, then dispatches to the appropriate bank handler. Bits [7:6] of the first cycle are the command class discriminator:
| CA[7:6] Cy0 | CA[5] Cy0 | Command | Address Payload |
|---|---|---|---|
| 2'b00 | don't care | ACT (Activate Row) | bank[4:0] in cy0[4:0]; row[13:0] across cy0+cy1 |
| 2'b01 | 1'b0 | RD (Read) | bank[4:0] in cy0[4:0]; col[5:0] in cy1[5:0] |
| 2'b01 | 1'b1 | WR (Write) | bank[4:0] in cy0[4:0]; col[5:0] in cy1[5:0] |
| 2'b10 | 1'b0 | PRE (Precharge) | bank[4:0] in cy0[4:0], or all-banks flag |
| 2'b10 | 1'b1 | REF (Refresh) | ABR or PBR + bank in cy1 |
| 2'b11 | don't care | MRS (Mode Reg Write) | MR address [2:0] in cy1[2:0]; data [7:0] split across cy0[3:0] and cy1[7:4] |
// Two-phase CA packet decoder
logic [7:0] ca_cy0, ca_cy1;
logic packet_phase; // 0 = waiting for cycle-0, 1 = waiting for cycle-1
typedef enum logic [2:0] {
CMD_ACT = 3'd0,
CMD_RD = 3'd1,
CMD_WR = 3'd2,
CMD_PRE = 3'd3,
CMD_REF = 3'd4,
CMD_MRS = 3'd5,
CMD_NOP = 3'd7
} cmd_t;
cmd_t decoded_cmd;
logic [4:0] decoded_bank;
logic [13:0] decoded_row;
logic [5:0] decoded_col;
logic cmd_valid;
always_ff @(posedge i_clk) begin
cmd_valid <= 1'b0;
if (i_cke) begin
if (!packet_phase) begin
ca_cy0 <= i_ca;
packet_phase <= 1'b1;
end else begin
ca_cy1 <= i_ca;
packet_phase <= 1'b0;
cmd_valid <= 1'b1;
// Decode on cycle-1 arrival
case (ca_cy0[7:6])
2'b00: begin // ACT
decoded_cmd <= CMD_ACT;
decoded_bank <= ca_cy0[4:0];
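// Simplified packing: row[13:8] comes from cy1[5:0]; row[7:0] reuses all of
// cy0, so the low row byte repeats the class and bank bits.
decoded_row <= {i_ca[5:0], ca_cy0[7:0]};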
end
2'b01: begin // RD or WR
decoded_cmd <= (ca_cy0[5]) ? CMD_WR : CMD_RD;
decoded_bank <= ca_cy0[4:0];
decoded_col <= i_ca[5:0];
end
2'b10: begin // PRE or REF
if (ca_cy0[5]) begin
decoded_cmd <= CMD_REF;
end else begin
decoded_cmd <= CMD_PRE;
decoded_bank <= ca_cy0[4:0];
end
end
2'b11: begin // MRS
decoded_cmd <= CMD_MRS;
end
default: decoded_cmd <= CMD_NOP;
endcase
end
end else begin
packet_phase <= 1'b0; // Reset on CKE low
cmd_valid <= 1'b0;
end
end
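A testbench that talks to this decoder directly needs the inverse operation. The following is a minimal sketch of packet builders that mirror the field positions the decoder above reads; the package name `ca_packet_sketch_pkg`, the type `ca_packet_t`, and the functions `pack_col_cmd` and `pack_act` are hypothetical names introduced for illustration.

```systemverilog
// Hypothetical testbench helpers; names are illustrative only.
package ca_packet_sketch_pkg;
  typedef struct packed {
    logic [7:0] cy0;
    logic [7:0] cy1;
  } ca_packet_t;

  // RD/WR: class 2'b01, cy0[5] selects write, cy0[4:0] = bank, cy1[5:0] = column.
  function automatic ca_packet_t pack_col_cmd(
    input bit         is_write,
    input logic [4:0] bank,
    input logic [5:0] col
  );
    ca_packet_t p;
    p.cy0 = {2'b01, is_write, bank};
    p.cy1 = {2'b00, col};
    return p;
  endfunction

  // ACT: class 2'b00, cy0[4:0] = bank, cy1[5:0] = upper six row bits.
  function automatic ca_packet_t pack_act(
    input logic [4:0] bank,
    input logic [5:0] row_hi
  );
    ca_packet_t p;
    p.cy0 = {2'b00, 1'b0, bank};
    p.cy1 = {2'b00, row_hi};
    return p;
  endfunction
endpackage

module tb_ca_packet_sketch;
  import ca_packet_sketch_pkg::*;

  initial begin
    ca_packet_t  p;
    logic [13:0] row;

    // WR to bank 9, column 42
    p = pack_col_cmd(1'b1, 5'd9, 6'd42);
    assert (p.cy0[7:6] == 2'b01 && p.cy0[5] && p.cy0[4:0] == 5'd9)
      else $error("cy0 fields wrong");
    assert (p.cy1[5:0] == 6'd42) else $error("column field wrong");

    // ACT to bank 9: apply the decoder's own row expression to the packet
    p   = pack_act(5'd9, 6'h2A);
    row = {p.cy1[5:0], p.cy0[7:0]};
    assert (row == {6'h2A, 8'h09}) else $error("row packing wrong");
    $finish;
  end
endmodule
```

The last assertion makes the consequence of the simplified packing visible: with this decoder, the low byte of the row the model sees is the cy0 byte itself, so a scoreboard that predicts model addresses has to apply the same expression.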
Each of the 32 banks has an independent FSM. The FSM tracks whether the bank is idle (precharged), active (row open), or refreshing. State transitions are triggered by decoded commands:
| State | Trigger | Next State | Timing Check |
|---|---|---|---|
| IDLE | ACT | ACTIVE | tRP since last PRE must have elapsed |
| IDLE | REF | REFRESHING | No constraint (refresh always allowed from idle) |
| ACTIVE | RD | ACTIVE | tRCD since ACT must have elapsed |
| ACTIVE | WR | ACTIVE | tRCD since ACT must have elapsed |
| ACTIVE | PRE | IDLE | tRAS since ACT must have elapsed; tWR since last WR |
| REFRESHING | (timer) | IDLE | After tRFC/tRFCpb cycles automatically |
| ACTIVE | ACT | — (error) | $error: ACT to open bank (missing precharge) |
| IDLE | RD/WR | — (error) | $error: column command to closed bank |
// Timing parameters (HBM3 @ 2 GHz)
localparam int tRCD = 28; // ACT-to-RD/WR
localparam int tRAS = 76; // ACT-to-PRE minimum
localparam int tRP = 28; // PRE-to-ACT
localparam int tRC = 112; // ACT-to-ACT same bank
localparam int tWR = 40; // WR-to-PRE
localparam int tCCD = 8; // CCD between columns
localparam int tRFC = 440; // ABR refresh cycle
localparam int tRFCpb = 140; // PBR refresh cycle
typedef enum logic [1:0] {
BST_IDLE = 2'd0,
BST_ACTIVE = 2'd1,
BST_REFRESHING = 2'd2
} bank_state_t;
bank_state_t bst [32];
logic [63:0] t_act [32]; // cycle of last ACT
logic [63:0] t_pre [32]; // cycle of last PRE
logic [63:0] t_wr [32]; // cycle of last WR
logic [63:0] t_ref [32]; // cycle refresh started
logic [13:0] open_row[32]; // currently open row per bank
logic [63:0] cycle_cnt;
always_ff @(posedge i_clk) cycle_cnt <= cycle_cnt + 1;
// Generic bank command handler
task automatic bank_command(
input cmd_t cmd,
input logic [4:0] bk,
input logic [13:0] row,
input logic [5:0] col
);
case (cmd)
CMD_ACT: begin
if (bst[bk] != BST_IDLE)
$error("[DRAM MDL] Bank %0d: ACT while not idle (state=%0d) @ cycle %0d",
bk, bst[bk], cycle_cnt);
if ((cycle_cnt - t_pre[bk]) < tRP)
$error("[DRAM MDL] Bank %0d: tRP violation — only %0d cycles since PRE (need %0d)",
bk, cycle_cnt - t_pre[bk], tRP);
if ((cycle_cnt - t_act[bk]) < tRC)
$error("[DRAM MDL] Bank %0d: tRC violation — only %0d cycles since last ACT (need %0d)",
bk, cycle_cnt - t_act[bk], tRC);
bst[bk] <= BST_ACTIVE;
open_row[bk] <= row;
t_act[bk] <= cycle_cnt;
end
CMD_RD: begin
if (bst[bk] != BST_ACTIVE)
$error("[DRAM MDL] Bank %0d: RD to closed bank @ cycle %0d", bk, cycle_cnt);
if ((cycle_cnt - t_act[bk]) < tRCD)
$error("[DRAM MDL] Bank %0d: tRCD violation — only %0d cycles after ACT (need %0d)",
bk, cycle_cnt - t_act[bk], tRCD);
// Schedule read data return through CL pipeline
schedule_read(bk, open_row[bk], col);
end
CMD_WR: begin
if (bst[bk] != BST_ACTIVE)
$error("[DRAM MDL] Bank %0d: WR to closed bank @ cycle %0d", bk, cycle_cnt);
if ((cycle_cnt - t_act[bk]) < tRCD)
$error("[DRAM MDL] Bank %0d: tRCD violation on WR — only %0d cycles (need %0d)",
bk, cycle_cnt - t_act[bk], tRCD);
t_wr[bk] <= cycle_cnt;
// Data captured from i_dq_in on next i_dqs_wr edge (handled in write capture block)
end
CMD_PRE: begin
if (bst[bk] != BST_ACTIVE)
$error("[DRAM MDL] Bank %0d: PRE to already-idle bank @ cycle %0d", bk, cycle_cnt);
if ((cycle_cnt - t_act[bk]) < tRAS)
$error("[DRAM MDL] Bank %0d: tRAS violation — only %0d cycles active (need %0d)",
bk, cycle_cnt - t_act[bk], tRAS);
if ((cycle_cnt - t_wr[bk]) < tWR)
$error("[DRAM MDL] Bank %0d: tWR violation — only %0d cycles since WR (need %0d)",
bk, cycle_cnt - t_wr[bk], tWR);
bst[bk] <= BST_IDLE;
t_pre[bk] <= cycle_cnt;
end
CMD_REF: begin
bst[bk] <= BST_REFRESHING;
t_ref[bk] <= cycle_cnt;
// Auto-return to IDLE after tRFCpb (handled in refresh timer block)
end
endcase
endtask
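Every timing check in the handler has the same shape: subtract a stored timestamp from the current cycle and compare against a minimum. The sketch below isolates that comparison in a hypothetical helper, `too_soon`, inside a hypothetical module `tb_elapsed_check_sketch`. It also shows one thing to watch with unsigned 64-bit timestamps: an all-ones "never happened" value wraps around and reads as one cycle ago.

```systemverilog
// Hypothetical standalone sketch of the elapsed-cycle comparison.
module tb_elapsed_check_sketch;
  localparam int tRCD = 28;  // ACT-to-RD/WR, value taken from the listing above

  // Returns 1 when fewer than min_cycles have elapsed since `last`.
  function automatic bit too_soon(
    input logic [63:0] now,
    input logic [63:0] last,
    input int unsigned min_cycles
  );
    return (now - last) < min_cycles;
  endfunction

  initial begin
    // ACT at cycle 100, RD at cycle 120: 20 cycles elapsed, a tRCD violation.
    assert (too_soon(64'd120, 64'd100, tRCD)) else $error("expected violation");

    // RD at cycle 128: exactly 28 cycles elapsed, legal.
    assert (!too_soon(64'd128, 64'd100, tRCD)) else $error("unexpected violation");

    // Unsigned wraparound: 10 - (2**64 - 1) evaluates to 11, so an all-ones
    // timestamp looks recent during the first cycles of simulation.
    assert (too_soon(64'd10, '1, tRCD)) else $error("wraparound not observed");

    // A timestamp placed well in the past avoids the false positive.
    assert (!too_soon(64'd10, 64'd0 - 64'd1024, tRCD)) else $error("false positive");
    $finish;
  end
endmodule
```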
HBM3 mode registers (MR0–MR7) control operating parameters: CAS latency, write recovery time, burst length, refresh mode, and power-down behaviour. The behavioral model maintains a shadow copy of all 8 mode registers and uses them to configure timing parameters dynamically.
logic [7:0] mode_reg [8]; // MR0–MR7 shadow copies
int CL_param; // CAS latency extracted from MR0
int CWL_param; // Write latency from MR0
// MRS command handler
task automatic handle_mrs(input logic [7:0] ca_cy0_p, ca_cy1_p);
logic [2:0] mr_addr;
logic [7:0] mr_data;
mr_addr = ca_cy1_p[2:0];
mr_data = {ca_cy0_p[3:0], ca_cy1_p[7:4]};
mode_reg[mr_addr] <= mr_data;
$display("[DRAM MDL] MRS: MR%0d <= 8'h%02h @ cycle %0d", mr_addr, mr_data, cycle_cnt);
// Update derived parameters
case (mr_addr)
3'd0: begin
// MR0[3:0] = CAS latency code: 4'd0 = CL14, 4'd7 = CL70 (HBM3 typical)
case (mr_data[3:0])
4'd0: CL_param = 14;
4'd3: CL_param = 36;
4'd7: CL_param = 70;
default: CL_param = 70;
endcase
CWL_param = CL_param / 2; // Simplified: CWL ~ CL/2
end
endcase
endtask
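The field positions that handle_mrs reads can be checked with a small round-trip. This is a sketch only: `pack_mrs` and `tb_mrs_pack_sketch` are hypothetical names, and the unused bits (cy0[5:4] and cy1[3]) are driven to zero here as an assumption, since the handler above ignores them.

```systemverilog
// Hypothetical sketch: build an MRS packet and read it back the way
// handle_mrs does.
module tb_mrs_pack_sketch;
  // Returns {cy0, cy1} for an MRS write of mr_data to register mr_addr.
  function automatic logic [15:0] pack_mrs(
    input logic [2:0] mr_addr,
    input logic [7:0] mr_data
  );
    logic [7:0] cy0, cy1;
    cy0 = {2'b11, 2'b00, mr_data[7:4]};   // class 2'b11, upper data nibble
    cy1 = {mr_data[3:0], 1'b0, mr_addr};  // lower data nibble, register address
    return {cy0, cy1};
  endfunction

  initial begin
    logic [7:0] cy0, cy1;

    // MR0 = 8'h07 selects CL70 in the decode table above.
    {cy0, cy1} = pack_mrs(3'd0, 8'h07);

    // Same extraction as handle_mrs
    assert (cy1[2:0] == 3'd0) else $error("mr_addr mismatch");
    assert ({cy0[3:0], cy1[7:4]} == 8'h07) else $error("mr_data mismatch");
    assert (cy0 == 8'hC0 && cy1 == 8'h70) else $error("unexpected encoding");
    $finish;
  end
endmodule
```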
CAS latency (CL = 70 cycles) is the number of clock cycles between the RD command and the first valid read data on o_dq_out. The model implements this as a valid+data shift register: stages 0 through 70 (71 stages) are in use at CL = 70, and the array is sized for MAX_CL = 128 so that other CL settings fit, provided CL_param stays below MAX_CL. When a RD is decoded, stage 0 is loaded with {valid=1, data=mem[addr]}. After 70 shifts, the entry reaches stage 70 and drives the output.
localparam int MAX_CL = 128;
logic [31:0] rd_pipe_data [MAX_CL];
logic rd_pipe_valid [MAX_CL];
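// Shift pipeline once per clock.
// Plain always on the falling edge: schedule_read loads stage 0 with blocking
// assignments on the rising edge, so shifting half a cycle later removes the
// same-edge race between that load and the stage-0 clear below.
always @(negedge i_clk) begin : cl_pipeline
// Shift from stage 0 toward MAX_CL-1
for (int i = MAX_CL-1; i > 0; i--) begin
rd_pipe_data [i] <= rd_pipe_data [i-1];
rd_pipe_valid[i] <= rd_pipe_valid[i-1];
end
// Clear stage 0 once its contents have moved to stage 1
rd_pipe_data [0] <= '0;
rd_pipe_valid[0] <= 1'b0;
end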
// Schedule a read: push data into pipeline at CL offset
task automatic schedule_read(
input logic [4:0] bk,
input logic [13:0] row,
input logic [5:0] col
);
logic [31:0] rdata;
rdata = do_read(bk, row, col);
// Stage 0 gets loaded; after CL_param clocks it appears at output
rd_pipe_data [0] = rdata;
rd_pipe_valid[0] = 1'b1;
endtask
// Output assignment — CL_param stages down the pipeline
assign o_dq_out = rd_pipe_valid[CL_param] ? rd_pipe_data[CL_param] : 32'hzzzz_zzzz;
assign o_dqs_rd = rd_pipe_valid[CL_param];
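// Also handle write data capture on DQS strobe.
// Plain always: wr_pending is also set by the CMD_WR dispatch process,
// and always_ff does not permit a second writer.
always @(posedge i_dqs_wr) begin : write_capture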
// Pending write address must be tracked from CMD_WR dispatch
if (wr_pending) begin
do_write(wr_bank, wr_row, wr_col, i_dq_in, i_dm);
wr_pending <= 1'b0;
end
end
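The latency behaviour of the delay line is easiest to verify on a reduced version. The sketch below uses a 4-cycle latency purely for illustration and lets a single process own every stage, which is one way to avoid ordering questions between processes. The module `tb_cl_pipe_sketch` and the signals `push_valid` and `push_data` are hypothetical and do not appear in the model.

```systemverilog
// Hypothetical standalone sketch of a valid+data delay line.
module tb_cl_pipe_sketch;
  localparam int CL = 4;  // short latency for illustration; the model uses 70

  logic        clk        = 1'b0;
  logic        push_valid = 1'b0;
  logic [31:0] push_data  = '0;
  logic [31:0] pipe_data  [CL+1];
  logic        pipe_valid [CL+1];

  always #5 clk = ~clk;

  // One process owns all stages: load stage 0, shift the rest.
  always @(posedge clk) begin
    for (int i = CL; i > 0; i--) begin
      pipe_data [i] <= pipe_data [i-1];
      pipe_valid[i] <= pipe_valid[i-1];
    end
    pipe_data [0] <= push_data;
    pipe_valid[0] <= push_valid;
  end

  initial begin
    for (int i = 0; i <= CL; i++) begin
      pipe_data [i] = '0;
      pipe_valid[i] = 1'b0;
    end

    // Present one entry for exactly one rising edge.
    @(negedge clk);
    push_valid = 1'b1;
    push_data  = 32'hCAFE_F00D;
    @(negedge clk);
    push_valid = 1'b0;

    // Stage 0 was loaded on the rising edge in between; CL further rising
    // edges move the entry to stage CL.
    repeat (CL) @(negedge clk);
    assert (pipe_valid[CL] && pipe_data[CL] == 32'hCAFE_F00D)
      else $error("entry did not reach stage CL on time");

    // One cycle later the entry has left the tap again.
    @(negedge clk);
    assert (!pipe_valid[CL]) else $error("entry stayed valid too long");
    $finish;
  end
endmodule
```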
The complete hbm3_dram_model.sv integrates all subsystems into a single simulation module. This is the file you drop into your testbench alongside the controller DUT.
// =============================================================
// hbm3_dram_model.sv — HBM3 Behavioral DRAM Model
// EcrioniX HBM3 Controller Build · Module 16
// SIMULATION ONLY — NOT SYNTHESIZABLE
// =============================================================
`timescale 1ns/1ps
module hbm3_dram_model #(
parameter int CL_DEFAULT = 70,
parameter int CWL_DEFAULT = 36,
parameter int NUM_BANKS = 32
)(
input logic i_clk,
input logic i_cke,
input logic [7:0] i_ca,
input logic [31:0] i_dq_in,
input logic [3:0] i_dm,
input logic i_dqs_wr,
output logic [31:0] o_dq_out,
output logic o_dqs_rd
);
// ----- Timing Parameters (HBM3 @ 2 GHz) -----
localparam int tRCD = 28;
localparam int tRAS = 76;
localparam int tRP = 28;
localparam int tRC = 112;
localparam int tWR = 40;
localparam int tCCD = 8;
localparam int tRFC = 440;
localparam int tRFCpb = 140;
// ----- Memory Array -----
logic [31:0] mem [logic [24:0]];
localparam logic [31:0] UNINIT = 32'hDEAD_BEEF;
// ----- Bank State -----
typedef enum logic [1:0] {BST_IDLE=2'd0, BST_ACTIVE=2'd1, BST_REFRESH=2'd2} bank_state_t;
bank_state_t bst [NUM_BANKS];
logic [63:0] t_act [NUM_BANKS];
logic [63:0] t_pre [NUM_BANKS];
logic [63:0] t_wr [NUM_BANKS];
logic [13:0] open_row [NUM_BANKS];
logic [63:0] cycle_cnt;
// ----- Mode Registers -----
logic [7:0] mode_reg [8];
int CL_param = CL_DEFAULT;
int CWL_param = CWL_DEFAULT;
// ----- CA Decode State -----
logic [7:0] ca_cy0;
logic packet_phase;
logic cmd_valid;
typedef enum logic [2:0] {
CMD_NOP=3'd7, CMD_ACT=3'd0, CMD_RD=3'd1,
CMD_WR=3'd2, CMD_PRE=3'd3, CMD_REF=3'd4, CMD_MRS=3'd5
} cmd_t;
cmd_t cur_cmd;
logic [4:0] cur_bank;
logic [13:0] cur_row;
logic [5:0] cur_col;
// ----- Write Pending State -----
logic wr_pending;
logic [4:0] wr_bank;
logic [13:0] wr_row;
logic [5:0] wr_col;
// ----- CL Read Pipeline -----
localparam int MAX_CL = 128;
logic [31:0] rd_pipe_data [MAX_CL];
logic rd_pipe_valid [MAX_CL];
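// Cycle counter
// This block and the two procedural blocks that follow use plain always
// rather than always_ff: the initial block at the end of the module also
// writes these variables, and some are written from more than one block,
// which always_ff does not permit.
always @(posedge i_clk) cycle_cnt <= cycle_cnt + 64'd1;
// Read pipeline shift. Runs on the falling edge so that the stage-0 load,
// done with blocking assignments on the rising edge in ca_decode, cannot
// race the shift and the stage-0 clear.
always @(negedge i_clk) begin : cl_pipe
for (int j = MAX_CL-1; j > 0; j--) begin
rd_pipe_data [j] <= rd_pipe_data [j-1];
rd_pipe_valid[j] <= rd_pipe_valid[j-1];
end
rd_pipe_data [0] <= '0;
rd_pipe_valid[0] <= 1'b0;
end
// Output from CL stage
assign o_dq_out = rd_pipe_valid[CL_param] ? rd_pipe_data[CL_param] : 32'hzzzz_zzzz;
assign o_dqs_rd = rd_pipe_valid[CL_param];
// Write capture on DQS
always @(posedge i_dqs_wr) begin : wr_cap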
if (wr_pending) begin
automatic logic [24:0] waddr = {wr_bank, wr_row, wr_col};
automatic logic [31:0] existing = mem.exists(waddr) ? mem[waddr] : UNINIT;
if (!i_dm[0]) existing[ 7: 0] = i_dq_in[ 7: 0];
if (!i_dm[1]) existing[15: 8] = i_dq_in[15: 8];
if (!i_dm[2]) existing[23:16] = i_dq_in[23:16];
if (!i_dm[3]) existing[31:24] = i_dq_in[31:24];
mem[waddr] = existing;
wr_pending <= 1'b0;
end
end
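// Main CA decoder + command dispatcher
// Plain always: this block forks a refresh timer and shares variables with
// the initial block, neither of which always_ff permits.
always @(posedge i_clk) begin : ca_decode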
cmd_valid <= 1'b0;
if (!i_cke) begin
packet_phase <= 1'b0;
end else if (!packet_phase) begin
ca_cy0 <= i_ca;
packet_phase <= 1'b1;
end else begin
packet_phase <= 1'b0;
cmd_valid <= 1'b1;
// Decode and dispatch
case (ca_cy0[7:6])
2'b00: begin // ACT
automatic logic [4:0] bk = ca_cy0[4:0];
if (bst[bk] != BST_IDLE)
$error("[MDL] Bank%0d: ACT to non-idle bank (state=%0d) @cy%0d",bk,bst[bk],cycle_cnt);
if ((cycle_cnt - t_pre[bk]) < tRP)
$error("[MDL] Bank%0d: tRP viol %0d<%0d @cy%0d",bk,cycle_cnt-t_pre[bk],tRP,cycle_cnt);
bst[bk] <= BST_ACTIVE;
open_row[bk] <= {i_ca[5:0], ca_cy0[7:0]};
t_act[bk] <= cycle_cnt;
end
2'b01: begin // RD or WR
automatic logic [4:0] bk = ca_cy0[4:0];
automatic logic [5:0] col = i_ca[5:0];
if (bst[bk] != BST_ACTIVE)
$error("[MDL] Bank%0d: column cmd to closed bank @cy%0d",bk,cycle_cnt);
if ((cycle_cnt - t_act[bk]) < tRCD)
$error("[MDL] Bank%0d: tRCD viol %0d<%0d @cy%0d",bk,cycle_cnt-t_act[bk],tRCD,cycle_cnt);
if (!ca_cy0[5]) begin // RD
automatic logic [24:0] raddr = {bk, open_row[bk], col};
automatic logic [31:0] rdata = mem.exists(raddr) ? mem[raddr] : UNINIT;
rd_pipe_data [0] = rdata;
rd_pipe_valid[0] = 1'b1;
end else begin // WR
wr_bank <= bk;
wr_row <= open_row[bk];
wr_col <= col;
wr_pending <= 1'b1;
t_wr[bk] <= cycle_cnt;
end
end
2'b10: begin // PRE or REF
if (!ca_cy0[5]) begin // PRE
automatic logic [4:0] bk = ca_cy0[4:0];
if (bst[bk] != BST_ACTIVE)
$error("[MDL] Bank%0d: PRE to idle bank @cy%0d",bk,cycle_cnt);
if ((cycle_cnt - t_act[bk]) < tRAS)
$error("[MDL] Bank%0d: tRAS viol %0d<%0d @cy%0d",bk,cycle_cnt-t_act[bk],tRAS,cycle_cnt);
if ((cycle_cnt - t_wr[bk]) < tWR)
$error("[MDL] Bank%0d: tWR viol %0d<%0d @cy%0d",bk,cycle_cnt-t_wr[bk],tWR,cycle_cnt);
bst[bk] <= BST_IDLE;
t_pre[bk] <= cycle_cnt;
end else begin // REF — mark all banks refreshing (ABR simplified)
for (int b = 0; b < NUM_BANKS; b++) bst[b] <= BST_REFRESH;
fork
begin
repeat(tRFC) @(posedge i_clk);
for (int b = 0; b < NUM_BANKS; b++) bst[b] <= BST_IDLE;
end
join_none
end
end
2'b11: begin // MRS
automatic logic [2:0] mr_addr = i_ca[2:0];
automatic logic [7:0] mr_data = {ca_cy0[3:0], i_ca[7:4]};
mode_reg[mr_addr] <= mr_data;
$display("[MDL] MRS MR%0d=8'h%02h @cy%0d", mr_addr, mr_data, cycle_cnt);
if (mr_addr == 3'd0) begin
case (mr_data[3:0])
4'd7: CL_param = 70;
4'd3: CL_param = 36;
default: CL_param = 70;
endcase
end
end
endcase
end
end
// Initialize
initial begin
cycle_cnt = 0;
packet_phase = 0;
wr_pending = 0;
for (int b = 0; b < NUM_BANKS; b++) begin
bst[b] = BST_IDLE;
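// Start the timestamps well in the past. With unsigned arithmetic an
// all-ones value would wrap and look like one cycle ago, which raises
// false violations early in simulation.
t_act[b] = 64'd0 - 64'd1024;
t_pre[b] = 64'd0 - 64'd1024;
t_wr[b] = 64'd0 - 64'd1024;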
open_row[b] = '0;
end
for (int m = 0; m < 8; m++) mode_reg[m] = '0;
for (int s = 0; s < MAX_CL; s++) begin
rd_pipe_data[s] = '0;
rd_pipe_valid[s] = '0;
end
$display("[DRAM MDL] HBM3 Behavioral Model initialized. CL=%0d CWL=%0d",
CL_param, CWL_param);
end
endmodule
Instantiate hbm3_dram_model in your testbench top alongside the controller DUT. Connect the controller's CA output directly to the model's CA input. The Module 17 testbench does this for you, but the standalone connection pattern is:
// In tb_hbm3_top.sv:
logic        dram_clk;
logic        dram_cke;
logic [7:0]  dram_ca;
logic [31:0] dram_dq_in, dram_dq_out;
logic [3:0]  dram_dm;
logic        dram_dqs_wr, dram_dqs_rd;
// Controller DUT drives CA bus and DQ
hbm3_pc_ctrl dut (
    .i_clk         (clk),
    // ... AXI4 inputs ...
    .o_dram_cke    (dram_cke),
    .o_dram_ca     (dram_ca),
    .o_dram_dq     (dram_dq_in),
    .o_dram_dm     (dram_dm),
    .o_dram_dqs_wr (dram_dqs_wr),
    .i_dram_dq     (dram_dq_out),
    .i_dram_dqs_rd (dram_dqs_rd)
);
// Behavioral DRAM model: responds to controller
hbm3_dram_model #(.CL_DEFAULT(70)) u_dram (
    .i_clk    (dram_clk),
    .i_cke    (dram_cke),
    .i_ca     (dram_ca),
    .i_dq_in  (dram_dq_in),
    .i_dm     (dram_dm),
    .i_dqs_wr (dram_dqs_wr),
    .o_dq_out (dram_dq_out),
    .o_dqs_rd (dram_dqs_rd)
);
assign dram_clk = clk; // Same clock domain in single-channel TB
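When bringing the model up without the controller, a small driver task is enough to put packets on the CA bus. The sketch below is hypothetical throughout (`tb_ca_driver_sketch`, `send_packet`, and the `seen` monitor queue are not part of the course files) and contains no DUT; it only checks that each packet occupies exactly two rising edges with CKE high. It lowers CKE between packets because the decoder shown earlier treats every pair of cycles with i_cke high as a packet; whether the real controller idles the bus this way is an assumption.

```systemverilog
// Hypothetical standalone sketch of a two-cycle CA packet driver.
`timescale 1ns/1ps
module tb_ca_driver_sketch;
  logic       clk = 1'b0;
  logic       cke = 1'b0;
  logic [7:0] ca  = 8'h00;
  logic [7:0] seen [$];  // bytes observed on rising edges while cke is high

  always #0.25 clk = ~clk;  // 0.5 ns period = 2 GHz

  // Drive one packet. Values change on falling edges so they are stable
  // around the rising edges on which the model samples.
  task automatic send_packet(input logic [7:0] cy0, input logic [7:0] cy1);
    @(negedge clk); cke = 1'b1; ca = cy0;
    @(negedge clk);             ca = cy1;
    @(negedge clk); cke = 1'b0; ca = 8'h00;
  endtask

  // Monitor: sample the bus the same way the model does.
  always @(posedge clk) if (cke) seen.push_back(ca);

  initial begin
    send_packet(8'h09, 8'h2A);  // ACT, bank 9, cy1[5:0] = 6'h2A
    send_packet(8'h69, 8'h05);  // WR, bank 9, column 5

    assert (seen.size() == 4) else $error("expected four sampled bytes");
    assert (seen[0] == 8'h09 && seen[1] == 8'h2A &&
            seen[2] == 8'h69 && seen[3] == 8'h05)
      else $error("sampled bytes do not match the driven packets");
    $finish;
  end
endmodule
```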
The behavioral model is a functional model for protocol compliance verification. It intentionally omits several physical and architectural details:
| Feature | Modeled | Not Modeled |
|---|---|---|
| Command/Address bus | 2-cycle CA packet decode, all 6 command types | CA bus parity, per-bit DBI |
| Bank state | IDLE/ACTIVE/REFRESHING per bank | Bank Group timing (tCCDS vs tCCDL) |
| Timing parameters | tRCD, tRAS, tRP, tWR, tRFC; tRC in the bank_command handler; tCCD declared but not checked in the listings | tFAW, tRRD, write-to-read turn-around |
| Data path | 32-bit write/read with byte masks | ECC lanes, DBI-DQ, data scrambling |
| Refresh | ABR blocking window (tRFC) | PBR per-bank refresh tracking, tREFI watchdog |
| Mode registers | MR0 CL field decode | MR1–MR7 full field decode |
| Power states | CKE=0 freezes model | Self-refresh entry/exit, power-down timings |
| PHY layer | Not modeled — direct signal connection | DFI protocol, FIFO, equalisation, jitter |
| Signal integrity | Not modeled | Crosstalk, SSO, impedance mismatch |
| Temperature | Not modeled | Retention degradation, derating factors |
A simple memory array ignores DRAM protocol timing — it accepts any read or write at any time with no concept of bank activation, tRCD, CAS latency, or refresh. A behavioral DRAM model enforces timing constraints so the controller is exercised exactly as it would be with real silicon. Timing violations that would cause data corruption on real hardware get caught in simulation as assertion failures, making the model essential for verifying the controller's protocol compliance.
The HBM3 CA bus is an 8-bit packet-based command/address bus. Each command occupies two consecutive cycles (a two-cycle packet). Bits [7:6] of the first cycle encode the command type: 00 = ACT (activate row), 01 = RD/WR, 10 = PRE/REF, 11 = MRS. The row address, column address, and bank address are packed into the remaining bits across the two cycles.
When the model decodes a RD command, it does not return data immediately. Instead, it pushes the read data into stage 0 of a shift-register pipeline of depth CL (70 cycles). After CL clock edges, the data emerges from stage 70 and drives o_dq_out along with o_dqs_rd. This faithfully models the latency the controller will observe in real hardware.
The model checks tRCD (28 cy, ACT to RD/WR), tRAS (76 cy, minimum row active time before PRE), tRP (28 cy, precharge recovery before next ACT), and tWR (40 cy, write recovery before PRE). The bank_command handler also checks tRC (112 cy, ACT-to-ACT same bank). tCCD (8 cy, column command spacing) is declared as a parameter, but the listings on this page contain no check for it. Any violation triggers a $error with the bank number, measured cycles, and required cycles.
The behavioral model does not simulate signal integrity (crosstalk, jitter), PHY-level serialization, DFI protocol timing, power consumption, temperature effects on data retention, tFAW (four-activate window), bank-group timing (tCCDS vs tCCDL), or ECC encoding. It is a functional protocol-compliance model only. For PHY-level validation, a SPICE or transistor-level model is required.