The HBM3 PHY is built from analog and mixed-signal circuits (DLL, PLL, ZQ calibration, SerDes), so we model it behaviorally for simulation. This module builds the behavioral PHY model and the synthesizable digital control-plane interface between controller and PHY.
Production DRAM controllers, from open-source cores such as LiteDRAM to commercial memory IP, typically ship a behavioral PHY model alongside their synthesizable RTL. The reason is fundamental: the HBM3 PHY is an analog mixed-signal circuit that operates with sub-nanosecond timing precision.
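The PHY contains circuits that cannot be described in synthesizable Verilog: the PLL that generates the interface clock, the DLL that places strobes at precise phase offsets, the analog comparator used for ZQ calibration, and the SerDes front-ends that drive and capture the pads.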
The behavioral model captures the timing effects of these circuits — write latency pipeline depth, read capture alignment, initialization sequence duration — without describing their analog implementation. This lets the full HBM3 controller RTL simulate correctly with a cycle-accurate PHY model in a standard RTL simulator.
The HBM3 PHY is organized into four functional paths: command/address, write data, read data, and initialization/training.
The controller drives CA[7:0] and CKE to the PHY. The PHY serializes these onto the DRAM's command bus at the interface data rate. The CA path includes a propagation delay of approximately 1 clock cycle through the PHY's output buffer.
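The behavioral model later in this module accepts i_ca and i_cke but does not forward them anywhere, so this ~1-cycle output-buffer delay is not modeled. A minimal sketch of it as a single register stage, assuming hypothetical output ports o_ca_out and o_cke_out that are not part of the original interface:

```systemverilog
// Hypothetical CA-path stage: one register models the ~1-cycle
// propagation delay through the PHY output buffer.
// o_ca_out / o_cke_out are illustrative names only.
module hbm3_ca_stage (
  input  logic       i_clk,
  input  logic       i_rst_n,
  input  logic [7:0] i_ca,
  input  logic       i_cke,
  output logic [7:0] o_ca_out,
  output logic       o_cke_out
);
  always_ff @(posedge i_clk or negedge i_rst_n) begin
    if (!i_rst_n) begin
      o_ca_out  <= '0;
      o_cke_out <= 1'b0;   // CKE held low while in reset
    end else begin
      o_ca_out  <= i_ca;   // command reaches the pads one cycle later
      o_cke_out <= i_cke;
    end
  end
endmodule
```

Serialization onto the DRAM command bus at the interface data rate is deliberately left out; at cycle level only the added latency matters to the controller.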
Write data flows: Controller drives DQ_out[31:0] and DM[3:0] → PHY applies write leveling delay (wdqs_delay) → PHY serializes and drives DQ pads. The DQS strobe is enabled via DQS_en. After Write Latency (WL) cycles, data appears at the DRAM pads.
Read data flows in reverse: DRAM drives DQ pads → PHY's DLL-centered capture captures DQ on DQS edges → PHY drives DQ_in[31:0] to controller after Read Latency (RL) cycles. The DQS_valid signal qualifies when the captured data is valid.
At power-up the PHY runs a multi-step training sequence: ZQ calibration → Write leveling → Read DQS centering → Normal operation. The phy_init_done signal goes high when all calibration steps pass.
HBM3 uses a silicon interposer where DQ/DQS traces have slightly different lengths than the CK trace. This means the DQS strobe launched by the controller arrives at the DRAM at a different phase than CK. Write leveling compensates for this.
During write leveling the controller drives DQS while the DRAM samples DQS with its internal CK. The DRAM reports, via MR readback, whether DQS led or lagged CK. The controller steps wdqs_delay[4:0], the 5-bit tap code it drives to the PHY, until the DRAM reports that the DQS rising edge is aligned with the CK rising edge.
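The search itself can be written as a pure function so it can be unit-tested in isolation. This sketch assumes the per-tap DRAM reports have already been collected into a 32-bit vector fb, using a hypothetical encoding (0 means DQS is still ahead of CK at that tap, 1 means it is at or past the CK rising edge); the MR readback mechanism is not modeled.

```systemverilog
// Hypothetical write-leveling search over the 32 wdqs_delay taps.
// fb[k] stands in for the DRAM's leveling report at tap k:
//   0 = DQS rising edge still ahead of CK, 1 = at or past CK rising edge.
package wrlvl_pkg;
  function automatic int find_wdqs_tap(input logic [31:0] fb);
    // Sweep taps upward and stop at the first 0 -> 1 transition
    for (int k = 1; k < 32; k++)
      if (fb[k-1] == 1'b0 && fb[k] == 1'b1)
        return k;    // first tap where DQS lines up with CK
    return -1;       // no transition found: leveling failed
  endfunction
endpackage
```

For example, find_wdqs_tap(32'hFFFF_FFC0) returns 6: taps 0 to 5 report 0 and tap 6 is the first to report 1. A hardware training engine would normally stop at the first transition rather than collect all 32 reports first.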
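In the behavioral model, each wdqs_delay tap adds one full clock cycle, so write data reaches the pads WL + wdqs_delay cycles after the controller presents it. Real silicon uses a DLL tap chain with sub-cycle steps; the behavioral model uses a shift register of parameterizable depth and selects its output tap with wdqs_delay.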
For reads the DRAM drives DQ edge-aligned with the DQS strobe it generates, and that strobe reaches the PHY with some delay relative to the PHY's internal clock. The PHY must therefore delay its capture strobe so that DQ is sampled at the center of the valid data window, nominally a quarter clock period after each DQS edge.
rdqs_delay[4:0] shifts the PHY's internal DQS capture edge. Centering sweeps rdqs_delay across all 32 taps, runs a read test at each tap, and records pass or fail. The operating point is the center of the widest contiguous run of passing taps, which approximates the center of the data eye.
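A sketch of the centering decision, assuming the controller has already run a read/compare test at every rdqs_delay tap and packed the results into a hypothetical 32-bit mask pass_mask (bit k = 1 means tap k produced no read errors):

```systemverilog
// Hypothetical read-DQS centering helper: returns the center of the
// longest contiguous run of passing taps, or -1 if no tap passed.
package rdcen_pkg;
  function automatic int center_rdqs_tap(input logic [31:0] pass_mask);
    int best_start, best_len, run_start, run_len;
    best_start = -1;
    best_len   = 0;
    run_start  = 0;
    run_len    = 0;
    for (int k = 0; k < 32; k++) begin
      if (pass_mask[k]) begin
        if (run_len == 0) run_start = k;     // a new passing run begins here
        run_len++;
        if (run_len > best_len) begin        // remember the widest run so far
          best_len   = run_len;
          best_start = run_start;
        end
      end else begin
        run_len = 0;                         // a failing tap ends the run
      end
    end
    if (best_len == 0) return -1;            // no tap passed
    return best_start + (best_len - 1) / 2;  // middle of the run (rounds down)
  endfunction
endpackage
```

With taps 4 to 12 passing and a shorter run at taps 20 to 22, the function returns 8. Choosing the middle of the longest run, rather than the first passing tap, leaves roughly equal setup and hold margin.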
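In the behavioral model, i_dq_pad is sampled every cycle into a shift register and presented on o_rd_data RL + rdqs_delay cycles later. o_dqs_valid follows the same delay from i_dqs_en, which the model reuses as its read-burst qualifier, so it pulses when the captured data is valid.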
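Reusing the write-strobe enable as a read qualifier is a modeling shortcut. A sketch of a dedicated qualifier instead, assuming a hypothetical one-cycle i_rd_cmd pulse from the controller (not part of the original interface), delayed with the same shift-register-and-tap technique as the data pipelines:

```systemverilog
// Hypothetical read-valid generator: i_rd_cmd sampled on edge N
// produces o_rd_valid after edge N + RL + rdqs_delay (RL >= 1).
module hbm3_rd_valid #(parameter int RL = 16) (
  input  logic       i_clk,
  input  logic       i_rst_n,
  input  logic       i_rd_cmd,      // hypothetical: read burst issued
  input  logic [4:0] i_rdqs_delay,  // read DQS capture tap (0-31)
  output logic       o_rd_valid
);
  localparam int DEPTH = RL + 31;   // enough stages for rdqs_delay = 31
  logic [DEPTH-1:0] vpipe;          // one bit per cycle of delay
  always_ff @(posedge i_clk or negedge i_rst_n) begin
    if (!i_rst_n) begin
      vpipe      <= '0;
      o_rd_valid <= 1'b0;
    end else begin
      vpipe      <= {vpipe[DEPTH-2:0], i_rd_cmd};
      // Tap RL + rdqs_delay - 1; the output register adds the last cycle
      o_rd_valid <= vpipe[RL + i_rdqs_delay - 1];
    end
  end
endmodule
```

With RL = 4 and rdqs_delay = 3, a command sampled on one rising edge produces a one-cycle o_rd_valid pulse after the seventh rising edge that follows.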
HBM3 requires periodic ZQ calibration to keep the output driver impedance matched to the interposer channel as voltage and temperature drift. The DRAM package has an external ZQ resistor (240 Ω nominal) connected to a VDDQ/2 reference. An analog comparator inside the PHY compares the driver impedance against this reference and adjusts a 6-bit code (ZQCAL code) using successive approximation.
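Successive approximation resolves the 6-bit ZQCAL code in six comparisons, one per bit, MSB first. In this sketch the analog comparator is replaced by a hypothetical ideal model that reports whether a trial code is at or below a known target_code; a real PHY compares the driver against the 240 Ω reference and never sees the target directly.

```systemverilog
// Successive-approximation sketch of the 6-bit ZQCAL search.
// The comparator is an idealized stand-in: "trial <= target_code".
package zq_pkg;
  function automatic logic [5:0] zq_sar(input logic [5:0] target_code);
    logic [5:0] code, trial;
    code = '0;
    for (int b = 5; b >= 0; b--) begin
      trial = code | (6'b1 << b);   // tentatively set the next bit
      if (trial <= target_code)     // stand-in for the comparator decision
        code = trial;               // keep the bit
    end
    return code;                    // settles after 6 comparisons
  endfunction
endpackage
```

With the ideal comparator the search returns target_code exactly for every value from 0 to 63, which is the property a behavioral check of SAR logic would verify.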
Two calibration commands exist: ZQCL (Long, ~512 cycles — at power-up) and ZQCS (Short, ~64 cycles — periodic). The controller must issue ZQCL during initialization and schedule ZQCS commands periodically during idle windows.
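On the controller side, scheduling ZQCS comes down to a timer plus an idle check. A minimal sketch, assuming a hypothetical i_idle input (no DRAM traffic pending this cycle) and an illustrative ZQCS_INTERVAL parameter that is not taken from the HBM3 specification. The request is a one-cycle pulse, which avoids re-triggering a PHY that samples the request level, as the behavioral model below does.

```systemverilog
// Hypothetical ZQCS scheduler: after ZQCS_INTERVAL cycles, pulse
// o_zq_req in the first idle cycle, then wait for the done pulse.
module zqcs_scheduler #(parameter int ZQCS_INTERVAL = 1000) (
  input  logic i_clk,
  input  logic i_rst_n,
  input  logic i_init_done,  // e.g. o_phy_init_done from the PHY model
  input  logic i_idle,       // hypothetical: bus idle this cycle
  input  logic i_zq_done,    // e.g. o_zq_cal_done from the PHY model
  output logic o_zq_req      // one-cycle pulse, e.g. to i_zq_cal_req
);
  int unsigned since_cal;    // cycles since the last completed calibration
  logic        waiting;      // request sent, waiting for the done pulse
  always_ff @(posedge i_clk or negedge i_rst_n) begin
    if (!i_rst_n) begin
      since_cal <= 0;
      waiting   <= 1'b0;
      o_zq_req  <= 1'b0;
    end else begin
      o_zq_req <= 1'b0;                 // default: request is a single-cycle pulse
      if (!i_init_done) begin
        since_cal <= 0;                 // power-up ZQCL belongs to init
      end else if (waiting) begin
        if (i_zq_done) begin
          waiting   <= 1'b0;
          since_cal <= 0;               // restart the interval
        end
      end else if (since_cal < ZQCS_INTERVAL) begin
        since_cal <= since_cal + 1;     // interval still running
      end else if (i_idle) begin
        o_zq_req <= 1'b1;               // interval expired and bus idle
        waiting  <= 1'b1;
      end
    end
  end
endmodule
```

If the interval expires while traffic is pending, the request simply waits for the next idle cycle, so calibration never preempts a burst in this sketch.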
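The behavioral model implements ZQ calibration as an FSM: when i_zq_cal_req asserts while the PHY is idle and o_phy_init_done is high, the model waits ZQ_CAL_CYC cycles and then pulses o_zq_cal_done for one cycle. The same duration is used for every request, so ZQCL and ZQCS are not distinguished; the power-up ZQCL is modeled separately as the ZQ_LONG stage of the initialization FSM.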
After power-up the PHY must complete a fixed initialization sequence before the controller can issue DRAM commands. The behavioral model runs through these stages:
| Stage | Duration (cycles) | Description | Exit Signal |
|---|---|---|---|
| RESET | 200 | CKE low, RESET_n low, all outputs tri-stated | — |
| POWER_STABLE | 100 | VDD/VDDQ stable, CKE still low | — |
| ZQ_LONG | 512 | ZQCL command issued, impedance calibration | — |
| WR_LEVELING | 64 | Write leveling sweep, wdqs_delay converges | — |
| RD_CENTERING | 64 | Read DQS centering sweep, rdqs_delay converges | — |
| NORMAL | — | phy_init_done=1, controller may proceed | phy_init_done |
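
The stage durations in the table sum to 940 cycles, the INIT_CYCLES value used in the testbench. A small package like the one below (hypothetical; the model's FSM hard-codes each count) keeps that sum derived rather than hand-maintained. In the model, the FSM enters RESET on the clock edge that samples i_phy_init_req, so o_phy_init_done rises 940 rising edges after that edge.

```systemverilog
// Hypothetical helper: per-stage init durations and their total.
package phy_init_pkg;
  localparam int RESET_CYC  = 200;  // RESET
  localparam int POWER_CYC  = 100;  // POWER_STABLE
  localparam int ZQL_CYC    = 512;  // ZQ_LONG (ZQCL)
  localparam int WRL_CYC    = 64;   // WR_LEVELING
  localparam int RDC_CYC    = 64;   // RD_CENTERING
  localparam int INIT_TOTAL = RESET_CYC + POWER_CYC + ZQL_CYC + WRL_CYC + RDC_CYC;
endpackage
```

Passing phy_init_pkg::INIT_TOTAL to both the model and the testbench would keep them in step if a stage length changes.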
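In the model below, the stage durations are fixed counts inside the initialization FSM. INIT_CYCLES only records their sum (940 cycles) and does not change the sequence by itself, so to match a specific PHY datasheet, edit the stage counts and keep INIT_CYCLES in step.

The digital interface between controller and PHY consists of the following signals:

| Signal | Direction | Width | Description |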
|---|---|---|---|
| i_ca | Ctrl→PHY | [7:0] | Command/Address bus to DRAM (accepted but not forwarded by the behavioral model) |
| i_cke | Ctrl→PHY | 1 | Clock Enable, gates the DRAM clock (not modeled) |
| i_wr_data | Ctrl→PHY | [31:0] | Write data to serialize onto DQ |
| i_dm | Ctrl→PHY | [3:0] | Data mask (per-byte write enable) |
| i_dqs_en | Ctrl→PHY | 1 | Enable DQS output for writes; the model also uses it as the read-valid qualifier |
| i_wr_en | Ctrl→PHY | 1 | Write enable qualifier |
| i_wdqs_delay | Ctrl→PHY | [4:0] | Write DQS delay tap (0–31) |
| i_rdqs_delay | Ctrl→PHY | [4:0] | Read DQS capture delay tap (0–31) |
| i_zq_cal_req | Ctrl→PHY | 1 | Request ZQ calibration (ZQCS/ZQCL) |
| i_phy_init_req | Ctrl→PHY | 1 | Start PHY initialization sequence |
| i_dq_pad | DRAM→PHY | [31:0] | Incoming DQ from DRAM pads (read) |
| o_rd_data | PHY→Ctrl | [31:0] | Captured read data after RL pipeline |
| o_dqs_valid | PHY→Ctrl | 1 | Read data valid strobe |
| o_phy_init_done | PHY→Ctrl | 1 | All PHY calibration complete |
| o_zq_cal_done | PHY→Ctrl | 1 | ZQ calibration complete |
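| o_dq_out | PHY→DRAM | [31:0] | Write data output to DRAM pads (after WL + wdqs_delay) |
| o_dm_out | PHY→DRAM | [3:0] | Data mask output to DRAM pads (same delay as o_dq_out) |
| o_dqs_out | PHY→DRAM | 1 | DQS enable delayed with the write data (models strobe gating, not toggling) |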
// ============================================================
// hbm3_phy_model.sv — HBM3 PHY Behavioral Model
// EcrioniX · HBM3 Controller Build · Module 13
// Phase 4: PHY Interface
// ============================================================
// Behavioral model — NOT synthesizable.
// The digital controller-to-PHY interface signals ARE the
// synthesizable boundary; everything inside this module is
// behavioral modeling of the analog PHY core.
// ============================================================
`timescale 1ns/1ps
module hbm3_phy_model #(
parameter WL = 8, // Write Latency (cycles)
parameter RL = 16, // Read Latency (cycles)
parameter INIT_CYCLES = 940, // Informational: sum of init FSM stage counts (200+100+512+64+64)
parameter ZQ_CAL_CYC = 512 // Duration of each post-init ZQ request (cycles)
)(
// Clock and reset
input logic i_clk,
input logic i_rst_n,
// Controller -> PHY: Command/Address
input logic [7:0] i_ca,
input logic i_cke,
// Controller -> PHY: Write data path
input logic [31:0] i_wr_data,
input logic [3:0] i_dm,
input logic i_dqs_en,
input logic i_wr_en,
// Controller -> PHY: Delay calibration codes
input logic [4:0] i_wdqs_delay, // Write DQS delay taps
input logic [4:0] i_rdqs_delay, // Read DQS delay taps
// Controller -> PHY: Initialization and calibration
input logic i_phy_init_req,
input logic i_zq_cal_req,
// PHY -> Controller: Read data
output logic [31:0] o_rd_data,
output logic o_dqs_valid,
// PHY -> Controller: Status
output logic o_phy_init_done,
output logic o_zq_cal_done,
// PHY -> DRAM pads: Write data (post write-leveling delay)
output logic [31:0] o_dq_out,
output logic [3:0] o_dm_out,
output logic o_dqs_out,
// DRAM -> PHY pads: Read data (driven by DRAM model)
input logic [31:0] i_dq_pad
);
// ============================================================
// Initialization FSM
// ============================================================
typedef enum logic [2:0] {
S_IDLE = 3'd0,
S_RESET = 3'd1,
S_POWER_WAIT = 3'd2,
S_ZQ_LONG = 3'd3,
S_WR_LEVEL = 3'd4,
S_RD_CENTER = 3'd5,
S_DONE = 3'd6
} init_state_t;
init_state_t init_state;
logic [9:0] init_cnt;
always_ff @(posedge i_clk or negedge i_rst_n) begin
if (!i_rst_n) begin
init_state <= S_IDLE;
init_cnt <= '0;
o_phy_init_done <= 1'b0;
end else begin
case (init_state)
S_IDLE: begin
o_phy_init_done <= 1'b0;
if (i_phy_init_req) begin
init_state <= S_RESET;
init_cnt <= '0;
end
end
S_RESET: begin
if (init_cnt == 10'd199) begin
init_state <= S_POWER_WAIT;
init_cnt <= '0;
end else init_cnt <= init_cnt + 1;
end
S_POWER_WAIT: begin
if (init_cnt == 10'd99) begin
init_state <= S_ZQ_LONG;
init_cnt <= '0;
end else init_cnt <= init_cnt + 1;
end
S_ZQ_LONG: begin
if (init_cnt == 10'd511) begin
init_state <= S_WR_LEVEL;
init_cnt <= '0;
end else init_cnt <= init_cnt + 1;
end
S_WR_LEVEL: begin
if (init_cnt == 10'd63) begin
init_state <= S_RD_CENTER;
init_cnt <= '0;
end else init_cnt <= init_cnt + 1;
end
S_RD_CENTER: begin
if (init_cnt == 10'd63) begin
init_state <= S_DONE;
o_phy_init_done <= 1'b1;
end else init_cnt <= init_cnt + 1;
end
S_DONE: begin
o_phy_init_done <= 1'b1;
end
default: init_state <= S_IDLE;
endcase
end
end
// ============================================================
// ZQ Calibration: post-init ZQCS/ZQCL requests, each ZQ_CAL_CYC cycles
// ============================================================
logic [9:0] zq_cnt;
logic zq_busy;
always_ff @(posedge i_clk or negedge i_rst_n) begin
if (!i_rst_n) begin
zq_cnt <= '0;
zq_busy <= 1'b0;
o_zq_cal_done <= 1'b0;
end else begin
o_zq_cal_done <= 1'b0; // default low: done is a one-cycle pulse
// Accept a request only when idle and after initialization
if (!zq_busy && i_zq_cal_req && o_phy_init_done) begin
zq_busy <= 1'b1;
zq_cnt <= '0;
end else if (zq_busy) begin
if (zq_cnt == ZQ_CAL_CYC - 1) begin
zq_busy <= 1'b0;
o_zq_cal_done <= 1'b1;
end else zq_cnt <= zq_cnt + 1;
end
end
end
// ============================================================
// Write Data Path: Write Leveling Delay Pipeline
// Behavioral: a beat sampled on clock edge N is driven on
// o_dq_out after edge N + WL + wdqs_delay (requires WL >= 1).
// Each wdqs_delay tap is one full cycle here; real DLL taps
// are a fraction of a cycle.
// ============================================================
localparam int WR_DEPTH = WL + 31; // stages needed for wdqs_delay = 31
logic [31:0] wr_pipe [0:WR_DEPTH-1];
logic [3:0] dm_pipe [0:WR_DEPTH-1];
logic dqs_pipe [0:WR_DEPTH-1];
always_ff @(posedge i_clk or negedge i_rst_n) begin
if (!i_rst_n) begin
for (int i = 0; i < WR_DEPTH; i++)
{wr_pipe[i], dm_pipe[i], dqs_pipe[i]} <= '0;
o_dq_out <= '0;
o_dm_out <= '0;
o_dqs_out <= 1'b0;
end else begin
// Stage 0 captures the write beat (zero when i_wr_en is low)
wr_pipe[0] <= i_wr_en ? i_wr_data : '0;
dm_pipe[0] <= i_wr_en ? i_dm : '0;
dqs_pipe[0] <= i_dqs_en;
for (int i = 1; i < WR_DEPTH; i++) begin
wr_pipe[i] <= wr_pipe[i-1];
dm_pipe[i] <= dm_pipe[i-1];
dqs_pipe[i] <= dqs_pipe[i-1];
end
// Tap stage WL + wdqs_delay - 1; the output register adds the last cycle
o_dq_out <= wr_pipe [WL + i_wdqs_delay - 1];
o_dm_out <= dm_pipe [WL + i_wdqs_delay - 1];
o_dqs_out <= dqs_pipe[WL + i_wdqs_delay - 1];
end
end
// ============================================================
// Read Data Path: DQS Centering Delay Pipeline
// Behavioral: i_dq_pad is sampled every cycle; a word sampled on
// edge N is driven on o_rd_data after edge N + RL + rdqs_delay.
// ============================================================
localparam int RD_DEPTH = RL + 31; // stages needed for rdqs_delay = 31
logic [31:0] rd_pipe [0:RD_DEPTH-1];
logic rdv_pipe [0:RD_DEPTH-1];
// o_dqs_valid should mark cycles in which the DRAM drives a read
// burst. The model has no read-command input, so it reuses i_dqs_en
// as the read-burst qualifier and delays it exactly like the data.
always_ff @(posedge i_clk or negedge i_rst_n) begin
if (!i_rst_n) begin
for (int i = 0; i < RD_DEPTH; i++)
{rd_pipe[i], rdv_pipe[i]} <= '0;
o_rd_data <= '0;
o_dqs_valid <= 1'b0;
end else begin
rd_pipe[0] <= i_dq_pad;
rdv_pipe[0] <= i_dqs_en;
for (int i = 1; i < RD_DEPTH; i++) begin
rd_pipe[i] <= rd_pipe[i-1];
rdv_pipe[i] <= rdv_pipe[i-1];
end
// Tap stage RL + rdqs_delay - 1; the output register adds the last cycle
o_rd_data <= rd_pipe [RL + i_rdqs_delay - 1];
o_dqs_valid <= rdv_pipe[RL + i_rdqs_delay - 1];
end
end
endmodule
// ============================================================
// tb_hbm3_phy_model.sv — Testbench for HBM3 PHY Behavioral Model
// EcrioniX · HBM3 Controller Build · Module 13
// ============================================================
`timescale 1ns/1ps
module tb_hbm3_phy_model;
// Parameters match DUT
localparam WL = 8;
localparam RL = 16;
localparam INIT_CYC = 940; // 200 + 100 + 512 + 64 + 64 (init FSM stage counts)
localparam ZQ_CAL_CYC = 512;
// DUT ports
logic clk, rst_n;
logic [7:0] ca;
logic cke;
logic [31:0] wr_data;
logic [3:0] dm;
logic dqs_en, wr_en;
logic [4:0] wdqs_dly, rdqs_dly;
logic init_req, zq_req;
logic [31:0] dq_pad;
logic [31:0] rd_data;
logic dqs_valid;
logic init_done, zq_done;
logic [31:0] dq_out;
logic [3:0] dm_out;
logic dqs_out;
// Instantiate DUT
hbm3_phy_model #(.WL(WL), .RL(RL), .INIT_CYCLES(INIT_CYC), .ZQ_CAL_CYC(ZQ_CAL_CYC)) dut (
.i_clk(clk), .i_rst_n(rst_n),
.i_ca(ca), .i_cke(cke),
.i_wr_data(wr_data), .i_dm(dm),
.i_dqs_en(dqs_en), .i_wr_en(wr_en),
.i_wdqs_delay(wdqs_dly), .i_rdqs_delay(rdqs_dly),
.i_phy_init_req(init_req), .i_zq_cal_req(zq_req),
.i_dq_pad(dq_pad),
.o_rd_data(rd_data), .o_dqs_valid(dqs_valid),
.o_phy_init_done(init_done), .o_zq_cal_done(zq_done),
.o_dq_out(dq_out), .o_dm_out(dm_out), .o_dqs_out(dqs_out)
);
// 500MHz clock (2ns period)
initial clk = 0;
always #1 clk = ~clk;
// Track test errors
integer errors = 0;
task wait_cycles(input integer n);
repeat(n) @(posedge clk);
endtask
// ---- Test sequence ----
// Inputs are driven and outputs checked on the falling clock edge,
// so each input is stable at the rising edge that samples it and
// registered outputs have settled before they are compared.
initial begin
$dumpfile("tb_hbm3_phy.vcd");
$dumpvars(0, tb_hbm3_phy_model);
// Reset
rst_n = 0; ca = '0; cke = 0;
wr_data = '0; dm = '0; dqs_en = 0; wr_en = 0;
wdqs_dly = 5'd4; rdqs_dly = 5'd8;
init_req = 0; zq_req = 0; dq_pad = '0;
wait_cycles(10);
@(negedge clk) rst_n = 1;
wait_cycles(5);
// --- TEST 1: PHY Initialization ---
$display("[%0t] TEST1: PHY init sequence", $time);
@(negedge clk) init_req = 1;
@(negedge clk) init_req = 0; // request sampled on the rising edge just passed
// The FSM enters RESET on that edge and sets phy_init_done INIT_CYC edges later
wait_cycles(INIT_CYC);
@(negedge clk);
if (!init_done) begin
$error("FAIL: phy_init_done did not assert after %0d cycles", INIT_CYC);
errors++;
end else $display("[%0t] PASS: phy_init_done asserted", $time);
// --- TEST 2: Write Latency Pipeline ---
$display("[%0t] TEST2: Write latency pipeline (WL=%0d wdqs_dly=%0d)", $time, WL, wdqs_dly);
wr_data = 32'hDEAD_BEEF;
dqs_en = 1; wr_en = 1;
@(negedge clk);
wr_en = 0; dqs_en = 0; // exactly one beat was sampled
// Data should appear at o_dq_out WL + wdqs_dly cycles after the sampling edge
wait_cycles(WL + wdqs_dly);
@(negedge clk);
if (dq_out !== 32'hDEAD_BEEF) begin
$error("FAIL: Write latency incorrect. Expected DEADBEEF got %0h", dq_out);
errors++;
end else $display("[%0t] PASS: Write data appeared at correct latency", $time);
// --- TEST 3: Read Latency Pipeline ---
$display("[%0t] TEST3: Read latency pipeline (RL=%0d rdqs_dly=%0d)", $time, RL, rdqs_dly);
dq_pad = 32'hCAFE_1234;
dqs_en = 1; // used as read-valid qualifier in model
@(negedge clk);
dq_pad = '0; dqs_en = 0; // single-cycle word, so a wrong latency cannot pass
wait_cycles(RL + rdqs_dly);
@(negedge clk);
if (rd_data !== 32'hCAFE_1234 || !dqs_valid) begin
$error("FAIL: Read latency incorrect. Expected CAFE1234 with dqs_valid, got %0h (valid=%0b)", rd_data, dqs_valid);
errors++;
end else $display("[%0t] PASS: Read data captured at correct latency", $time);
// --- TEST 4: ZQ Calibration ---
$display("[%0t] TEST4: ZQ calibration", $time);
zq_req = 1;
@(negedge clk);
zq_req = 0;
// o_zq_cal_done pulses for one cycle, ZQ_CAL_CYC edges after the request edge
wait_cycles(ZQ_CAL_CYC);
@(negedge clk);
if (!zq_done) begin
$error("FAIL: zq_cal_done did not assert");
errors++;
end else $display("[%0t] PASS: ZQ calibration complete", $time);
// --- Summary ---
wait_cycles(20);
if (errors == 0)
$display("[%0t] ALL TESTS PASSED", $time);
else
$display("[%0t] %0d TEST(S) FAILED", $time, errors);
$finish;
end
// Timeout watchdog
initial begin
#200000;
$error("TIMEOUT: simulation exceeded 200us");
$finish;
end
endmodule
The HBM3 PHY contains analog circuits — DLL, PLL, ZQ calibration, SerDes front-ends — that cannot be described in synthesizable Verilog. These operate at transistor level. The behavioral model captures timing effects (latency, DQS alignment, ZQ state) without the analog implementation, enabling full RTL simulation with a cycle-accurate PHY representation.
Write leveling calibrates the delay between the controller's DQS strobe and the arrival of the DRAM's clock. Silicon interposer traces have different physical lengths, so DQS can arrive at the DRAM at a different phase than CK. Write leveling adjusts wdqs_delay so the DQS rising edge aligns with CK inside the DRAM, ensuring reliable write data capture. Real PHYs train this delay per byte lane; the behavioral model uses a single wdqs_delay code for all 32 DQ bits.
Read DQS centering positions the PHY's capture window so DQ is sampled at the center of the valid data eye. The DRAM drives DQS edge-aligned with DQ, so the PHY applies rdqs_delay to shift its capture strobe as close to the middle of the DQ valid window as the tap resolution allows, maximizing setup and hold margins.
ZQ calibration sets the output driver impedance of both DRAM and PHY against an external reference (240 Ω in HBM3) so the drivers stay matched to the interposer channel. Mismatched impedance causes reflections on DQ/DQS that degrade signal integrity and increase the bit-error rate. ZQ calibration runs at power-up (ZQCL, ~512 cycles) and periodically during idle windows (ZQCS, ~64 cycles).
The digital interface carries CA[7:0] and CKE (command/address), DQ_out[31:0] and DM[3:0] (write data), DQS_en (write DQS enable), WR_en (write qualifier), wdqs_delay[4:0] and rdqs_delay[4:0] (delay codes), and phy_init_req and zq_cal_req (initialization and calibration requests). The PHY returns DQ_in[31:0] (captured read data), DQS_valid, phy_init_done, and zq_cal_done.