Phase 4 · Module 13

HBM3 PHY Interface Behavioral Model

The HBM3 PHY is analog — DLL, PLL, ZQ calibration, SerDes. We model it behaviorally for simulation. This module builds the behavioral PHY model and the synthesizable digital control-plane interface between controller and PHY.

Files: hbm3_phy_model.sv, tb_hbm3_phy_model.sv · Behavioral + Synthesizable RTL · JEDEC JESD238 · Phase 4

1. Why the PHY Must Be Modeled Behaviorally

Production DRAM controllers, from open-source projects such as LiteDRAM to commercial memory IP, are normally simulated against a behavioral PHY model that accompanies the synthesizable RTL. The reason is fundamental: the HBM3 PHY is an analog mixed-signal circuit that operates at sub-nanosecond timing precision.

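The PHY contains circuits that cannot be described in synthesizable Verilog: the DLL that generates the DQS delays, the PLL, the analog ZQ calibration circuit, and the SerDes and I/O buffers.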

The behavioral model captures the timing effects of these circuits — write latency pipeline depth, read capture alignment, initialization sequence duration — without describing their analog implementation. This lets the full HBM3 controller RTL simulate correctly with a cycle-accurate PHY model in a standard RTL simulator.

The digital signals crossing the controller-to-PHY boundary (CA, DQ, DQS_en, delay codes) ARE synthesizable. Module 13 models both sides: the behavioral analog PHY core and the synthesizable digital interface.

2. PHY Layer Architecture

The HBM3 PHY is organized into four functional paths:

Command/Address (CA) Path

The controller drives CA[7:0] and CKE to the PHY. The PHY serializes these onto the DRAM's command bus at the interface data rate. The CA path includes a propagation delay of approximately 1 clock cycle through the PHY's output buffer. Note that the listing in Section 8 accepts i_ca and i_cke as ports but does not model this delay.

DQ/DQS Write Path

Write data flows: Controller drives DQ_out[31:0] and DM[3:0] → PHY applies write leveling delay (wdqs_delay) → PHY serializes and drives DQ pads. The DQS strobe is enabled via DQS_en. After Write Latency (WL) cycles, data appears at the DRAM pads.

DQ/DQS Read Path

Read data flows in reverse: DRAM drives DQ pads → the PHY's DLL-centered capture circuit samples DQ on DQS edges → PHY drives DQ_in[31:0] to the controller after Read Latency (RL) cycles. The DQS_valid signal indicates when the captured data is valid.

Training and Calibration

At power-up the PHY runs a multi-step training sequence: ZQ calibration → Write leveling → Read DQS centering → Normal operation. The phy_init_done signal goes high when all calibration steps pass.

Figure: PHY layer block diagram.
HBM3 Controller (hbm3_ctrl_top.v): Scheduler / Timing FSM, Bank FSM (32 banks), AXI4 Interface, PHY Interface (digital).
HBM3 PHY, behavioral (hbm3_phy_model.sv): DLL (DQS delay generation), Write Leveling (wdqs_delay), Read DQS Centering (rdqs_delay), ZQ Calibration (analog), Init FSM (training sequencer), SerDes / I/O Buffers (analog).
HBM3 DRAM: 8 Hi × 8 Gb stacks.
Labeled signals: CA[7:0] / CK, DQ[1023:0] / DQS[127:0], ZQ (240 Ω external), VREF, RESET_n digital pads.
Legend: synthesizable = controller + PHY digital interface; behavioral = PHY analog core.

3. Write Leveling — Calibrating DQS-to-CK Alignment

HBM3 uses a silicon interposer where DQ/DQS traces have slightly different lengths than the CK trace. This means the DQS strobe launched by the controller arrives at the DRAM at a different phase than CK. Write leveling compensates for this.

During write leveling the controller drives DQS while the DRAM samples DQS with its internal CK. The DRAM reports, via MR readback, whether DQS led or lagged CK. The PHY increments wdqs_delay[4:0] (a 5-bit tap delay code) until the DRAM reports DQS rising edge aligned to CK rising edge.
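The sweep described above can be sketched as a small search loop. This is a simulation-only illustration under stated assumptions, not the PHY's actual training logic: the module name wr_level_sweep_sketch, the dqs_misaligned feedback function, and the SKEW_TAPS constant that stands in for the interposer skew are all hypothetical.

```systemverilog
`timescale 1ns/1ps
// Simulation-only sketch of a write-leveling sweep.
// All names here are hypothetical and not part of hbm3_phy_model.sv.
module wr_level_sweep_sketch;
    // Assumed stand-in for the trace skew: DQS is reported as aligned
    // once the delay code reaches this many taps.
    localparam int SKEW_TAPS = 11;

    // Stand-in for the DRAM feedback: returns 1 while DQS is still
    // misaligned with CK for the given delay code.
    function automatic logic dqs_misaligned(input logic [4:0] code);
        return (code < SKEW_TAPS);
    endfunction

    logic [4:0] wdqs_delay;
    bit         aligned;

    initial begin
        wdqs_delay = '0;
        aligned    = 0;
        // Step the 5-bit code upward until the feedback flips
        // or all 32 taps have been tried.
        for (int step = 0; step < 32; step++) begin
            wdqs_delay = 5'(step);
            if (!dqs_misaligned(wdqs_delay)) begin
                aligned = 1;
                break;
            end
        end
        if (aligned)
            $display("write leveling converged: wdqs_delay = %0d", wdqs_delay);
        else
            $display("write leveling failed: no aligned tap in 0..31");
        $finish;
    end
endmodule
```

With the assumed skew of 11 taps the loop stops at wdqs_delay = 11. A real sweep would replace dqs_misaligned with the DRAM's reported lead/lag status and would normally treat a sweep that exhausts all 32 codes as a training failure.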

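In the behavioral model, wdqs_delay adds whole clock cycles on top of WL: write data moves through a shift register and is tapped at index WL + wdqs_delay, and the output register adds one more cycle before the data reaches o_dq_out. Real silicon uses a DLL tap chain with sub-cycle resolution; the behavioral model uses a shift register whose depth follows the WL parameter.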

wdqs_delay range: 0–31 taps. Each tap represents half a UI (unit interval). At 8 Gbps (HBM3 Gen2) one UI = 125 ps, so 31 taps provides up to ~1.9 ns of DQS alignment range — sufficient for typical silicon interposer trace skews.
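The range figure quoted above follows directly from the tap size. As a worked calculation, using only the numbers stated in the text:

```latex
\begin{aligned}
\text{UI} &= \frac{1}{8\ \text{Gb/s}} = 125\ \text{ps} \\
t_{\text{tap}} &= \frac{\text{UI}}{2} = 62.5\ \text{ps} \\
t_{\text{range}} &= 31 \times t_{\text{tap}} = 1937.5\ \text{ps} \approx 1.9\ \text{ns}
\end{aligned}
```

Keep in mind that the behavioral listing in Section 8 steps its shift register once per i_clk cycle, so in simulation each tap is a full clock period rather than 62.5 ps.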

4. Read DQS Centering

For reads the DRAM drives DQ together with a DQS strobe that it generates. The strobe arrives at the PHY with some delay relative to the PHY's internal clock, so the PHY must shift its capture point to the center of the valid data window.

rdqs_delay[4:0] shifts the PHY's internal DQS capture edge. Centering is done by sweeping rdqs_delay across its codes and checking each one for read data errors. The sweep yields a contiguous range of passing codes; the midpoint of the widest such range is selected as the operating point, which corresponds to the center of the eye diagram.
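The selection step can be sketched as a function that scans a 32-bit pass/fail mask and returns the midpoint of the widest passing run. This is a simulation-only sketch: the module name, the function name center_of_widest_window, and the example mask (codes 6 through 18 passing) are assumptions made for illustration, not part of the module's source.

```systemverilog
`timescale 1ns/1ps
// Simulation-only sketch: choose the rdqs_delay operating point from sweep results.
// All names and the example mask are hypothetical.
module rd_center_sketch;
    // Bit k of pass_mask is 1 when rdqs_delay = k read back without errors.
    // Returns the middle code of the longest run of passing codes
    // (rounded down), or 0 when no code passes.
    function automatic logic [4:0] center_of_widest_window(input logic [31:0] pass_mask);
        int best_start = 0, best_len = 0;
        int run_start  = 0, run_len  = 0;
        for (int k = 0; k < 32; k++) begin
            if (pass_mask[k]) begin
                if (run_len == 0) run_start = k;   // a new passing run begins
                run_len++;
                if (run_len > best_len) begin
                    best_len   = run_len;
                    best_start = run_start;
                end
            end else begin
                run_len = 0;                       // run broken by a failing code
            end
        end
        if (best_len == 0) return 5'd0;
        return 5'(best_start + (best_len - 1) / 2);
    endfunction

    logic [31:0] pass_mask;

    initial begin
        // Assumed sweep result: codes 6..18 pass, everything else fails.
        pass_mask = '0;
        for (int k = 6; k <= 18; k++) pass_mask[k] = 1'b1;
        $display("rdqs_delay operating point = %0d", center_of_widest_window(pass_mask));
        $finish;
    end
endmodule
```

For the assumed mask the passing window is 13 codes wide, so the function returns code 12, leaving six passing codes of margin on each side.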

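In the behavioral model, i_dq_pad is sampled into a shift register and tapped at index RL + rdqs_delay; the output register adds one more cycle before the data reaches o_rd_data. The o_dqs_valid signal travels through a matching pipeline and is high while o_rd_data holds valid captured data. The model derives it from i_dqs_en as a stand-in for the DRAM's read strobe.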

5. ZQ Calibration

HBM3 requires periodic ZQ calibration to keep output driver impedance matched to the PCB transmission line. The DRAM package has an external ZQ resistor (240 Ω nominal) connected to a VDDQ/2 reference. An analog comparator inside the PHY compares the driver impedance against this reference and adjusts a 6-bit code (ZQCAL code) using successive approximation.

Two calibration commands exist: ZQCL (Long, ~512 cycles — at power-up) and ZQCS (Short, ~64 cycles — periodic). The controller must issue ZQCL during initialization and schedule ZQCS commands periodically during idle windows.
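One way the periodic scheduling could look on the controller side is sketched below. It is a minimal illustration under stated assumptions: the module name zqcs_scheduler_sketch, the ZQCS_INTERVAL parameter and its default value, and the i_idle input are hypothetical, and only the request/done handshake is taken from this module's interface.

```systemverilog
// Sketch of a periodic ZQCS requester (hypothetical module, parameter, and
// i_idle input; only the req/done handshake comes from the PHY interface).
module zqcs_scheduler_sketch #(
    parameter int ZQCS_INTERVAL = 100_000  // assumed cycles between calibrations
)(
    input  logic i_clk,
    input  logic i_rst_n,
    input  logic i_idle,           // assumed: high when no DRAM traffic is pending
    input  logic i_phy_init_done,  // PHY only accepts requests after init
    input  logic i_zq_cal_done,    // one-cycle done pulse from the PHY
    output logic o_zq_cal_req      // one-cycle request pulse to the PHY
);
    logic [$clog2(ZQCS_INTERVAL + 1) - 1:0] timer;
    logic due;        // interval elapsed, calibration owed
    logic in_flight;  // request issued, waiting for done

    always_ff @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) begin
            timer        <= '0;
            due          <= 1'b0;
            in_flight    <= 1'b0;
            o_zq_cal_req <= 1'b0;
        end else begin
            o_zq_cal_req <= 1'b0;  // default: no request this cycle
            if (in_flight) begin
                if (i_zq_cal_done) in_flight <= 1'b0;
            end else if (due) begin
                // Hold the pending calibration until an idle window opens.
                if (i_idle && i_phy_init_done) begin
                    o_zq_cal_req <= 1'b1;
                    in_flight    <= 1'b1;
                    due          <= 1'b0;
                    timer        <= '0;
                end
            end else if (timer == ZQCS_INTERVAL - 1) begin
                due <= 1'b1;
            end else begin
                timer <= timer + 1'b1;
            end
        end
    end
endmodule
```

The request is a single-cycle pulse because the behavioral PHY would start a second calibration if the request were still high on the cycle after its done pulse. The interval timer pauses while a calibration is pending or in flight, which is a simplification; a real controller would pick the interval from the DRAM datasheet.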

The behavioral model implements ZQ calibration as a small FSM: when i_zq_cal_req asserts after initialization has completed (o_phy_init_done high), the model waits ZQ_CAL_CYC cycles and then pulses o_zq_cal_done for one clock cycle.

6. PHY Initialization Sequence

After power-up the PHY must complete a fixed initialization sequence before the controller can issue DRAM commands. The behavioral model runs through these stages:

| Stage | Duration (cycles) | Description | Exit Signal |
|---|---|---|---|
| RESET | 200 | CKE low, RESET_n low, all outputs tri-stated | |
| POWER_STABLE | 100 | VDD/VDDQ stable, CKE still low | |
| ZQ_LONG | 512 | ZQCL command issued, impedance calibration | zq_cal_done |
| WR_LEVELING | 64 | Write leveling sweep, wdqs_delay converges | |
| RD_CENTERING | 64 | Read DQS centering sweep, rdqs_delay converges | |
| NORMAL | | phy_init_done=1, controller may proceed | phy_init_done |

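The stage durations sum to 940 cycles (200 + 100 + 512 + 64 + 64), which is the default value of INIT_CYCLES. In the listing in Section 8 the per-stage counts are hard-coded in the init FSM and INIT_CYCLES itself is not referenced, so matching a specific PHY datasheet means changing the per-stage compare values. The counts are nominal: real silicon timings vary by vendor and should be checked against JEDEC JESD238 and the PHY datasheet.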
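The stage durations from the table can be collected as named constants so that the total is derived rather than typed by hand. The sketch below is an assumption about how that might be organized; the package name, module name, and constant names are hypothetical and do not appear in the module's source, which compares against literal values such as 10'd199.

```systemverilog
`timescale 1ns/1ps
// Sketch: per-stage init durations as named constants (hypothetical names).
package phy_init_timing_sketch;
    localparam int T_RESET       = 200;  // RESET stage
    localparam int T_POWER_WAIT  = 100;  // POWER_STABLE stage
    localparam int T_ZQ_LONG     = 512;  // ZQ_LONG stage
    localparam int T_WR_LEVEL    = 64;   // WR_LEVELING stage
    localparam int T_RD_CENTER   = 64;   // RD_CENTERING stage
    // Total derived from the stages instead of a separate literal.
    localparam int T_INIT_TOTAL  = T_RESET + T_POWER_WAIT + T_ZQ_LONG
                                 + T_WR_LEVEL + T_RD_CENTER;
endpackage

module phy_init_timing_check;
    import phy_init_timing_sketch::*;
    initial begin
        // Cross-check against the 940-cycle total used by the testbench.
        if (T_INIT_TOTAL != 940)
            $error("stage durations sum to %0d, expected 940", T_INIT_TOTAL);
        else
            $display("total init = %0d cycles", T_INIT_TOTAL);
        $finish;
    end
endmodule
```

An FSM written against these constants would compare its counter with, for example, T_RESET - 1, so retuning a stage for a vendor datasheet becomes a one-line change.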

7. Controller-to-PHY Interface Signal Table

| Signal | Direction | Width | Description |
|---|---|---|---|
| i_ca | Ctrl→PHY | [7:0] | Command/Address bus to DRAM |
| i_cke | Ctrl→PHY | 1 | Clock Enable, gates DRAM clock |
| i_wr_data | Ctrl→PHY | [31:0] | Write data to serialize onto DQ |
| i_dm | Ctrl→PHY | [3:0] | Data mask (per-byte write enable) |
| i_dqs_en | Ctrl→PHY | 1 | Enable DQS output toggling for write |
| i_wr_en | Ctrl→PHY | 1 | Write enable qualifier |
| i_wdqs_delay | Ctrl→PHY | [4:0] | Write DQS delay tap (0–31) |
| i_rdqs_delay | Ctrl→PHY | [4:0] | Read DQS capture delay tap (0–31) |
| i_zq_cal_req | Ctrl→PHY | 1 | Request ZQ calibration (ZQCS/ZQCL) |
| i_phy_init_req | Ctrl→PHY | 1 | Start PHY initialization sequence |
| i_dq_pad | DRAM→PHY | [31:0] | Incoming DQ from DRAM pads (read) |
| o_rd_data | PHY→Ctrl | [31:0] | Captured read data after RL pipeline |
| o_dqs_valid | PHY→Ctrl | 1 | Read data valid strobe |
| o_phy_init_done | PHY→Ctrl | 1 | All PHY calibration complete |
| o_zq_cal_done | PHY→Ctrl | 1 | ZQ calibration complete |
| o_dq_out | PHY→DRAM | [31:0] | Write data output to DRAM pads (after WL delay) |
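The controller-facing signals in this table could also be bundled into a SystemVerilog interface with one modport per side. This is only a sketch of that option: the interface name hbm3_phy_if_sketch and the modport names are hypothetical, and the module in Section 8 uses plain ports instead. The pad-side signals (i_dq_pad, o_dq_out) are left out because they face the DRAM rather than the controller.

```systemverilog
// Sketch: controller-to-PHY signals bundled in an interface (hypothetical names).
interface hbm3_phy_if_sketch (input logic clk, input logic rst_n);
    // Controller -> PHY
    logic [7:0]  ca;
    logic        cke;
    logic [31:0] wr_data;
    logic [3:0]  dm;
    logic        dqs_en;
    logic        wr_en;
    logic [4:0]  wdqs_delay;
    logic [4:0]  rdqs_delay;
    logic        zq_cal_req;
    logic        phy_init_req;
    // PHY -> Controller
    logic [31:0] rd_data;
    logic        dqs_valid;
    logic        phy_init_done;
    logic        zq_cal_done;

    // Controller side: drives commands, data, and delay codes.
    modport ctrl (
        input  clk, rst_n, rd_data, dqs_valid, phy_init_done, zq_cal_done,
        output ca, cke, wr_data, dm, dqs_en, wr_en,
               wdqs_delay, rdqs_delay, zq_cal_req, phy_init_req
    );

    // PHY side: mirror image of the controller modport.
    modport phy (
        input  clk, rst_n, ca, cke, wr_data, dm, dqs_en, wr_en,
               wdqs_delay, rdqs_delay, zq_cal_req, phy_init_req,
        output rd_data, dqs_valid, phy_init_done, zq_cal_done
    );
endinterface
```

The modports make the direction column of the table checkable by the compiler: a block connected through the ctrl modport cannot drive rd_data by mistake.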

8. Full Behavioral SystemVerilog Model

SystemVerilog — hbm3_phy_model.sv
// ============================================================
// hbm3_phy_model.sv — HBM3 PHY Behavioral Model
// EcrioniX · HBM3 Controller Build · Module 13
// Phase 4: PHY Interface
// ============================================================
// Behavioral model — NOT synthesizable.
// The digital controller-to-PHY interface signals ARE the
// synthesizable boundary; everything inside this module is
// behavioral modeling of the analog PHY core.
// ============================================================

`timescale 1ns/1ps

module hbm3_phy_model #(
    parameter WL          = 8,   // Write Latency (cycles)
    parameter RL          = 16,  // Read  Latency (cycles)
    parameter INIT_CYCLES = 940, // Total init cycles (informational; FSM stage counts are hard-coded)
    parameter ZQ_CAL_CYC  = 512  // ZQCL duration (cycles)
)(
    // Clock and reset
    input  logic        i_clk,
    input  logic        i_rst_n,

    // Controller -> PHY: Command/Address
    input  logic [7:0]  i_ca,
    input  logic        i_cke,

    // Controller -> PHY: Write data path
    input  logic [31:0] i_wr_data,
    input  logic [3:0]  i_dm,
    input  logic        i_dqs_en,
    input  logic        i_wr_en,

    // Controller -> PHY: Delay calibration codes
    input  logic [4:0]  i_wdqs_delay, // Write DQS delay taps
    input  logic [4:0]  i_rdqs_delay, // Read  DQS delay taps

    // Controller -> PHY: Initialization and calibration
    input  logic        i_phy_init_req,
    input  logic        i_zq_cal_req,

    // PHY -> Controller: Read data
    output logic [31:0] o_rd_data,
    output logic        o_dqs_valid,

    // PHY -> Controller: Status
    output logic        o_phy_init_done,
    output logic        o_zq_cal_done,

    // PHY -> DRAM pads: Write data (post write-leveling delay)
    output logic [31:0] o_dq_out,
    output logic [3:0]  o_dm_out,
    output logic        o_dqs_out,

    // DRAM -> PHY pads: Read data (driven by DRAM model)
    input  logic [31:0] i_dq_pad
);

// ============================================================
// Initialization FSM
// ============================================================
typedef enum logic [2:0] {
    S_IDLE        = 3'd0,
    S_RESET       = 3'd1,
    S_POWER_WAIT  = 3'd2,
    S_ZQ_LONG     = 3'd3,
    S_WR_LEVEL    = 3'd4,
    S_RD_CENTER   = 3'd5,
    S_DONE        = 3'd6
} init_state_t;

init_state_t init_state;
logic [9:0] init_cnt;

always_ff @(posedge i_clk or negedge i_rst_n) begin
    if (!i_rst_n) begin
        init_state      <= S_IDLE;
        init_cnt        <= '0;
        o_phy_init_done <= 1'b0;
    end else begin
        case (init_state)
            S_IDLE: begin
                o_phy_init_done <= 1'b0;
                if (i_phy_init_req) begin
                    init_state <= S_RESET;
                    init_cnt   <= '0;
                end
            end
            S_RESET: begin
                if (init_cnt == 10'd199) begin
                    init_state <= S_POWER_WAIT;
                    init_cnt   <= '0;
                end else init_cnt <= init_cnt + 1;
            end
            S_POWER_WAIT: begin
                if (init_cnt == 10'd99) begin
                    init_state <= S_ZQ_LONG;
                    init_cnt   <= '0;
                end else init_cnt <= init_cnt + 1;
            end
            S_ZQ_LONG: begin
                if (init_cnt == 10'd511) begin
                    init_state <= S_WR_LEVEL;
                    init_cnt   <= '0;
                end else init_cnt <= init_cnt + 1;
            end
            S_WR_LEVEL: begin
                if (init_cnt == 10'd63) begin
                    init_state <= S_RD_CENTER;
                    init_cnt   <= '0;
                end else init_cnt <= init_cnt + 1;
            end
            S_RD_CENTER: begin
                if (init_cnt == 10'd63) begin
                    init_state      <= S_DONE;
                    o_phy_init_done <= 1'b1;
                end else init_cnt <= init_cnt + 1;
            end
            S_DONE: begin
                o_phy_init_done <= 1'b1;
            end
            default: init_state <= S_IDLE;
        endcase
    end
end

// ============================================================
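// ZQ Calibration on request (accepted only after init completes)
// Every request takes ZQ_CAL_CYC cycles; the model does not
// distinguish ZQCS from ZQCL. o_zq_cal_done is a one-cycle pulse.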
// ============================================================
logic [9:0] zq_cnt;
logic       zq_busy;

always_ff @(posedge i_clk or negedge i_rst_n) begin
    if (!i_rst_n) begin
        zq_cnt       <= '0;
        zq_busy      <= 1'b0;
        o_zq_cal_done <= 1'b0;
    end else begin
        o_zq_cal_done <= 1'b0;
        if (!zq_busy && i_zq_cal_req && o_phy_init_done) begin
            zq_busy <= 1'b1;
            zq_cnt  <= '0;
        end else if (zq_busy) begin
            if (zq_cnt == ZQ_CAL_CYC[9:0] - 1) begin
                zq_busy       <= 1'b0;
                o_zq_cal_done <= 1'b1;
            end else zq_cnt <= zq_cnt + 1;
        end
    end
end

// ============================================================
// Write Data Path: Write Leveling Delay Pipeline
// Behavioral: shift register tapped at index WL + wdqs_delay.
// The output register adds one cycle, so data sampled on clock
// edge N reaches o_dq_out on edge N + WL + wdqs_delay + 1.
// ============================================================
localparam int WL_MAX = WL + 31; // deepest tap index (wdqs_delay = 31)

logic [31:0] wr_pipe  [0:WL_MAX];
logic [3:0]  dm_pipe  [0:WL_MAX];
logic        dqs_pipe [0:WL_MAX];

always_ff @(posedge i_clk or negedge i_rst_n) begin
    if (!i_rst_n) begin
        // Loop index is local so the two always_ff blocks share no variable
        for (int i = 0; i <= WL_MAX; i++) begin
            wr_pipe[i]  <= '0;
            dm_pipe[i]  <= '0;
            dqs_pipe[i] <= 1'b0;
        end
        o_dq_out  <= '0;
        o_dm_out  <= '0;
        o_dqs_out <= 1'b0;
    end else begin
        // Load stage 0; data and mask are zeroed when i_wr_en is low
        wr_pipe[0]  <= i_wr_en  ? i_wr_data : '0;
        dm_pipe[0]  <= i_wr_en  ? i_dm      : '0;
        dqs_pipe[0] <= i_dqs_en;
        // Shift pipeline
        for (int i = 1; i <= WL_MAX; i++) begin
            wr_pipe[i]  <= wr_pipe[i-1];
            dm_pipe[i]  <= dm_pipe[i-1];
            dqs_pipe[i] <= dqs_pipe[i-1];
        end
        // Tap output at WL + wdqs_delay
        o_dq_out  <= wr_pipe [WL + i_wdqs_delay];
        o_dm_out  <= dm_pipe [WL + i_wdqs_delay];
        o_dqs_out <= dqs_pipe[WL + i_wdqs_delay];
    end
end

// ============================================================
// Read Data Path: DQS Centering Delay Pipeline
// Behavioral: sample i_dq_pad into shift register,
// tap output at RL + rdqs_delay (plus one output register cycle)
// ============================================================
localparam int RL_MAX = RL + 31; // deepest tap index (rdqs_delay = 31)

logic [31:0] rd_pipe  [0:RL_MAX];
logic        rdv_pipe [0:RL_MAX];

// DQS_valid is high when DRAM is driving DQ pads (read burst).
// Behavioral: i_dqs_en, delayed through the same pipeline as the
// read data, is used as a proxy for the DRAM's read strobe.
always_ff @(posedge i_clk or negedge i_rst_n) begin
    if (!i_rst_n) begin
        for (int i = 0; i <= RL_MAX; i++) begin
            rd_pipe[i]  <= '0;
            rdv_pipe[i] <= 1'b0;
        end
        o_rd_data   <= '0;
        o_dqs_valid <= 1'b0;
    end else begin
        rd_pipe[0]  <= i_dq_pad;
        rdv_pipe[0] <= i_dqs_en; // dqs_en qualifies read burst too in model
        for (int i = 1; i <= RL_MAX; i++) begin
            rd_pipe[i]  <= rd_pipe[i-1];
            rdv_pipe[i] <= rdv_pipe[i-1];
        end
        o_rd_data   <= rd_pipe [RL + i_rdqs_delay];
        o_dqs_valid <= rdv_pipe[RL + i_rdqs_delay];
    end
end

endmodule

9. SystemVerilog Testbench

SystemVerilog — tb_hbm3_phy_model.sv
// ============================================================
// tb_hbm3_phy_model.sv — Testbench for HBM3 PHY Behavioral Model
// EcrioniX · HBM3 Controller Build · Module 13
// ============================================================
`timescale 1ns/1ps

module tb_hbm3_phy_model;

// Parameters match DUT
localparam WL = 8;
localparam RL = 16;

// DUT ports
logic        clk, rst_n;
logic [7:0]  ca;
logic        cke;
logic [31:0] wr_data;
logic [3:0]  dm;
logic        dqs_en, wr_en;
logic [4:0]  wdqs_dly, rdqs_dly;
logic        init_req, zq_req;
logic [31:0] dq_pad;

logic [31:0] rd_data;
logic        dqs_valid;
logic        init_done, zq_done;
logic [31:0] dq_out;
logic [3:0]  dm_out;
logic        dqs_out;

// Instantiate DUT
hbm3_phy_model #(.WL(WL), .RL(RL), .INIT_CYCLES(940), .ZQ_CAL_CYC(512)) dut (
    .i_clk(clk),         .i_rst_n(rst_n),
    .i_ca(ca),           .i_cke(cke),
    .i_wr_data(wr_data), .i_dm(dm),
    .i_dqs_en(dqs_en),   .i_wr_en(wr_en),
    .i_wdqs_delay(wdqs_dly), .i_rdqs_delay(rdqs_dly),
    .i_phy_init_req(init_req), .i_zq_cal_req(zq_req),
    .i_dq_pad(dq_pad),
    .o_rd_data(rd_data), .o_dqs_valid(dqs_valid),
    .o_phy_init_done(init_done), .o_zq_cal_done(zq_done),
    .o_dq_out(dq_out),  .o_dm_out(dm_out), .o_dqs_out(dqs_out)
);

// 500MHz clock (2ns period)
initial clk = 0;
always #1 clk = ~clk;

// Track test errors
integer errors = 0;

task wait_cycles(input integer n);
    repeat(n) @(posedge clk);
endtask

// ---- Test sequence ----
initial begin
    $dumpfile("tb_hbm3_phy.vcd");
    $dumpvars(0, tb_hbm3_phy_model);

    // Reset
    rst_n = 0; ca = '0; cke = 0;
    wr_data = '0; dm = '0; dqs_en = 0; wr_en = 0;
    wdqs_dly = 5'd4; rdqs_dly = 5'd8;
    init_req = 0; zq_req = 0; dq_pad = '0;
    wait_cycles(10);
    rst_n = 1;
    wait_cycles(5);

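    // --- TEST 1: PHY Initialization ---
    $display("[%0t] TEST1: PHY init sequence", $time);
    // Drive stimulus on the falling edge so it cannot race the DUT,
    // which samples its inputs on the rising edge.
    @(negedge clk); init_req = 1;
    @(negedge clk); init_req = 0;
    // Stage counts sum to 940 cycles; allow a few cycles of margin
    // for the registered o_phy_init_done output.
    wait_cycles(945);
    if (!init_done) begin
        $error("FAIL: phy_init_done did not assert within 945 cycles");
        errors++;
    end else $display("[%0t] PASS: phy_init_done asserted", $time);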

    // --- TEST 2: Write Latency Pipeline ---
    $display("[%0t] TEST2: Write latency pipeline (WL=%0d wdqs_dly=%0d)", $time, WL, dut.i_wdqs_delay);
    // Present one beat of write data around a single rising edge
    @(negedge clk);
    wr_data = 32'hDEAD_BEEF;
    dqs_en  = 1; wr_en = 1;
    @(negedge clk);
    wr_en = 0; dqs_en = 0;
    // The pipeline tap is at WL+wdqs_dly and the output register adds
    // one cycle, so the beat is on o_dq_out WL+wdqs_dly+1 cycles later.
    repeat (WL + wdqs_dly + 1) @(negedge clk);
    if (dq_out !== 32'hDEAD_BEEF) begin
        $error("FAIL: Write latency incorrect. Expected DEADBEEF got %0h", dq_out);
        errors++;
    end else $display("[%0t] PASS: Write data appeared at correct latency", $time);

    // --- TEST 3: Read Latency Pipeline ---
    $display("[%0t] TEST3: Read latency pipeline (RL=%0d rdqs_dly=%0d)", $time, RL, dut.i_rdqs_delay);
    // Present one beat on the pads, then clear it so the check below
    // only passes at the expected latency.
    @(negedge clk);
    dq_pad  = 32'hCAFE_1234;
    dqs_en  = 1; // used as read-valid qualifier in model
    @(negedge clk);
    dq_pad = '0; dqs_en = 0;
    // Tap at RL+rdqs_dly plus one output register cycle
    repeat (RL + rdqs_dly + 1) @(negedge clk);
    if (rd_data !== 32'hCAFE_1234 || dqs_valid !== 1'b1) begin
        $error("FAIL: Read latency incorrect. Expected CAFE1234 got %0h (dqs_valid=%0b)", rd_data, dqs_valid);
        errors++;
    end else $display("[%0t] PASS: Read data captured at correct latency", $time);

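    // --- TEST 4: ZQ Calibration ---
    $display("[%0t] TEST4: ZQ calibration", $time);
    @(negedge clk); zq_req = 1;
    @(negedge clk); zq_req = 0;
    // o_zq_cal_done is a one-cycle pulse, so poll for it every cycle
    // instead of sampling once at a fixed time. 600 cycles bounds the
    // wait (the calibration itself takes 512).
    repeat (600) begin
        @(negedge clk);
        if (zq_done) break;
    end
    if (!zq_done) begin
        $error("FAIL: zq_cal_done did not assert");
        errors++;
    end else $display("[%0t] PASS: ZQ calibration complete", $time);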

    // --- Summary ---
    wait_cycles(20);
    if (errors == 0)
        $display("[%0t] ALL TESTS PASSED", $time);
    else
        $display("[%0t] %0d TEST(S) FAILED", $time, errors);
    $finish;
end

// Timeout watchdog
initial begin
    #200000;
    $error("TIMEOUT: simulation exceeded 200us");
    $finish;
end

endmodule

Frequently Asked Questions

Why is the HBM3 PHY modeled behaviorally instead of synthesized?

The HBM3 PHY contains analog circuits — DLL, PLL, ZQ calibration, SerDes front-ends — that cannot be described in synthesizable Verilog. These operate at transistor level. The behavioral model captures timing effects (latency, DQS alignment, ZQ state) without the analog implementation, enabling full RTL simulation with a cycle-accurate PHY representation.

What is write leveling in HBM3?

Write leveling calibrates the delay between the controller's DQS strobe and the DRAM's clock arrival. Silicon interposer traces have different physical lengths, so DQS can arrive at the DRAM at a different phase than CK. Write leveling adjusts wdqs_delay per-byte so the DQS rising edge aligns with CK inside the DRAM, ensuring reliable write data capture.

What is read DQS centering?

Read DQS centering positions the PHY's capture window so DQ is sampled at the center of the valid data eye. The DRAM launches DQS together with the DQ bits, so the strobe as received is not guaranteed to sit in the middle of the data window. The controller's PHY applies rdqs_delay to shift its capture strobe into the middle of the DQ valid window, maximizing setup and hold margins.

What is ZQ calibration and why does it matter?

ZQ calibration sets the output driver impedance of both DRAM and PHY to match the PCB transmission line (240 Ω external reference in HBM3). Mismatched impedance causes reflections on DQ/DQS that degrade signal integrity and increase bit-error rate. ZQ calibration runs at power-up (ZQCL, ~512 cycles) and periodically during idle windows (ZQCS, ~64 cycles).

What signals cross the synthesizable controller-to-PHY digital interface?

The digital interface carries: CA[7:0] and CKE (command/address; ports i_ca, i_cke), DQ_out[31:0] and DM[3:0] (write data; ports i_wr_data, i_dm), DQS_en (write DQS enable), WR_en (write qualifier), wdqs_delay[4:0] / rdqs_delay[4:0] (delay codes), and the i_phy_init_req and i_zq_cal_req requests. The PHY returns DQ_in[31:0] (captured read data; port o_rd_data), DQS_valid, phy_init_done, and zq_cal_done.