Phase 3 · Module 11

HBM3 CA Bus Controller

The command/address bus is the single communication channel between your controller and each HBM3 pseudo-channel. This module builds the CA bus encoder FSM — 8-bit serial packets, 2-packet ACT sequences, CA parity generation, and CKE control — all per JEDEC JESD238.

📄 hbm3_ca_bus.v 🕐 ~45 min JESD238 CA Bus Parity

What Is the HBM3 CA Bus?

Each HBM3 channel is divided into two pseudo-channels (PC0 and PC1). In the model used by this module, each pseudo-channel has its own independent 8-bit Command/Address (CA) bus plus a single parity bit. Unlike conventional SDRAM, which routes separate RAS, CAS, WE, and wide address lines, HBM3 encodes all command and address information into compact 8-bit packets transmitted on this narrow bus.

The CA bus is clocked by the differential CK/CK_n pair. Commands are latched on the rising edge of CK. One or two consecutive CA packets represent a complete command, depending on the command type. The controller must serialize multi-packet commands correctly and maintain correct CA timing relative to CK.

The narrow bus width is not a performance limitation; it is a deliberate TSV budget decision. An HBM3 stack routes more than a thousand data signals through silicon vias (the data interface is 1024 bits wide). Assigning only 8 CA signals per pseudo-channel leaves the vast majority of TSVs available for data, delivering the high bandwidth HBM3 is designed for.

Key numbers: 8-bit CA bus per pseudo-channel · 1 or 2 packets per command · Single-ended signaling · 1 parity bit per packet · Clocked by CK/CK_n differential pair.

CA Bus vs DDR5 Address Bus

Comparing HBM3's CA bus to DDR5's address bus highlights how on-package integration changes controller architecture:

Feature | HBM3 CA Bus | DDR5 Address Bus
Bus width | 8 bits (serial) | 14 bits, CA[13:0] (parallel)
Command encoding | 8-bit packets, 1 or 2 per command | Encoded on CA[13:0] with CS_n, 1 or 2 cycles per command (no RAS/CAS/WE pins)
Signaling | Single-ended (VSS referenced) | POD (Pseudo Open Drain)
Row address | 15 bits, split across 2 ACT packets | 17 bits, multiple CS activations
Clock input | Differential CK/CK_n | Differential CK_t/CK_c
Parity | Even parity, 1 bit per packet | Even parity across CA[13:0]+CS
Termination | On-Die Termination (ODT) via MRS | External + on-die termination
Routing medium | On-package (interposer/TSV) | PCB trace, 50–100 mm

The single-ended CA bus works reliably in HBM3 because the signal path is a few millimeters of interposer routing rather than centimeters of PCB trace. Reflections that would destroy an 8-bit single-ended bus at DDR5 speeds on a PCB are far smaller on a silicon interposer.

Command Encoding Reference

The table below lists every HBM3 command, how many CA packets it occupies, the i_cmd_type encoding used by our controller, and the high-level packet layout. Bits marked R are reserved and must be driven low.

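Command | i_cmd_type | Packets | Packet 0 [7:0] | Packet 1 [7:0]
NOP | 3'b000 | 1 | {0,0,0,0,0,0,0,0} | (none)
ACTIVATE (ACT) | 3'b001 | 2 | {H, BA[1:0], BG[2:0], RA[14:13]} | {RA[12:5]}
READ (RD) | 3'b010 | 1 | {C, BG[2:0], BA[1:0], C10, C[3:0]} | (none)
WRITE (WR) | 3'b011 | 1 | {C, BG[2:0], BA[1:0], C10, C[3:0]} | (none)
PRECHARGE (PRE) | 3'b100 | 1 | {0,0,BG[2:0],BA[1:0],AP} | (none)
REFRESH (REF) | 3'b101 | 1 | {0,0,0,BG[2:0],0,0} | (none)
MODE REG SET (MRS) | 3'b110 | 2 | {1,MA[6:0]} | {MO[7:0]}

Width note: the RD/WR field list totals 11 bits (1+3+2+5), which does not fit one 8-bit packet as written. Treat that row as a field inventory and take the exact CAS bit positions from JESD238.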

H bit: ACT is marked by H=1 in bit 7 of Packet 0. In this simplified table RD, WR, and MRS also drive bit 7 high (the C bit and the leading 1 of MRS), so bit 7 alone separates ACT only from NOP, PRE, and REF. C bit: set to 1 in RD/WR to identify CAS commands. AP bit: when set in PRE, all banks are precharged instead of only the addressed bank.

Timing rule: Packet 0 and Packet 1 of a 2-packet command must be placed on consecutive rising edges of CK. No NOP or any other command may be inserted between them.
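The packet counts in the table reduce to a small lookup that a serializer can use to decide whether a second, back-to-back packet is required. The sketch below is a minimal illustration; the module name ca_pkt_count_demo and the function pkt_count are hypothetical and do not appear in hbm3_ca_bus.v.

```verilog
// Sketch: packet count per command, following the encoding table.
// Names are illustrative only.
module ca_pkt_count_demo;
    localparam [2:0] CMD_NOP = 3'b000, CMD_ACT = 3'b001, CMD_RD  = 3'b010,
                     CMD_WR  = 3'b011, CMD_PRE = 3'b100, CMD_REF = 3'b101,
                     CMD_MRS = 3'b110;

    // Returns 2 for the two-packet commands (ACT, MRS), 1 otherwise.
    function [1:0] pkt_count;
        input [2:0] cmd;
        begin
            case (cmd)
                CMD_ACT, CMD_MRS: pkt_count = 2'd2;
                default:          pkt_count = 2'd1;
            endcase
        end
    endfunction

    initial begin
        $display("ACT=%0d RD=%0d PRE=%0d MRS=%0d",
                 pkt_count(CMD_ACT), pkt_count(CMD_RD),
                 pkt_count(CMD_PRE), pkt_count(CMD_MRS));
        $finish;
    end
endmodule
```

A count of 2 means the encoder must hold the bus for the following rising edge of CK as well, with nothing inserted between the two packets.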

ACT Command — 2-Packet Deep Dive

ACTIVATE is the one 2-packet command issued continuously during normal operation (once per row opened). MRS also takes two packets, but it is used mainly during initialization. Understanding the exact ACT encoding is therefore critical.

Why Two Packets?

An HBM3 pseudo-channel has 8 bank groups (BG[2:0] = 3 bits) and 4 banks per group (BA[1:0] = 2 bits), giving 32 independent banks. Each bank's row address is 15 bits wide (RA[14:0]). The total information content of an ACT command is:

H(1) + BG(3) + BA(2) + RA(15) = 21 bits

One 8-bit packet carries only 8 bits, and two packets carry 16, which is still 5 bits short of the 21-bit total. The bit-exact layout used in this module places H in bit 7 of Packet 0, followed by BA, BG, and RA[14:13]; Packet 1 carries RA[12:5]. The five low-order row bits RA[4:0] do not fit in this two-packet layout, so treat it as a simplified illustration and take the authoritative field positions from JESD238.

Packet Layout

Packet | CA[7] | CA[6] | CA[5] | CA[4] | CA[3] | CA[2] | CA[1] | CA[0]
Packet 0 | H=1 | BA[1] | BA[0] | BG[2] | BG[1] | BG[0] | RA[14] | RA[13]
Packet 1 | RA[12] | RA[11] | RA[10] | RA[9] | RA[8] | RA[7] | RA[6] | RA[5]
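Width check: a concatenation such as {1'b1, i_ba, i_bg, i_row[14:10]} is 11 bits wide, and {i_row[9:0], 2'b00} is 12 bits wide. Assigned to the 8-bit o_ca_out, either one is silently truncated to its low 8 bits, which loses H and BA. The 8-bit-exact forms of the layout above are Packet0 = {1'b1, i_ba, i_bg, i_row[14:13]} and Packet1 = i_row[12:5].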
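As a worked example of the 8-bit-exact layout (Packet 0 = {H, BA, BG, RA[14:13]}, Packet 1 = RA[12:5]), the sketch below packs an ACT for BG=2, BA=1, Row=15'h1A5F and prints both packets with their parity. The module name act_pack_demo is hypothetical and exists only for this illustration.

```verilog
// Worked example: pack one ACT command into two 8-bit packets.
// act_pack_demo is an illustrative name, not part of hbm3_ca_bus.v.
module act_pack_demo;
    reg  [2:0]  bg;
    reg  [1:0]  ba;
    reg  [14:0] row;

    // 1 + 2 + 3 + 2 = 8 bits, so nothing is truncated.
    wire [7:0] pkt0 = {1'b1, ba, bg, row[14:13]};  // H, BA, BG, RA[14:13]
    wire [7:0] pkt1 = row[12:5];                   // RA[12:5]

    initial begin
        bg  = 3'd2;
        ba  = 2'd1;
        row = 15'h1A5F;
        #1;  // let the continuous assignments settle
        $display("pkt0=%h parity=%b", pkt0, ^pkt0);  // prints pkt0=a8 parity=1
        $display("pkt1=%h parity=%b", pkt1, ^pkt1);  // prints pkt1=d2 parity=0
        $finish;
    end
endmodule
```

Row 15'h1A5F has RA[14:13] = 2'b00 and RA[12:5] = 8'b1101_0010, which is where 8'hA8 and 8'hD2 come from.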

CA Parity — Error Detection on the Command Bus

Unlike a data bus where error correction (ECC) can absorb random bit flips, a corrupted command can cause destructive behavior — writing to the wrong address, corrupting a row that was never targeted. CA parity provides a first line of defense.

Even Parity Definition

For each CA packet, the parity bit is defined as the XOR reduction of all eight CA bits:

parity = CA[7] ^ CA[6] ^ CA[5] ^ CA[4] ^ CA[3] ^ CA[2] ^ CA[1] ^ CA[0]

This makes the total number of '1' bits across CA[7:0] and the parity bit always even. The DRAM checks parity on every received packet and asserts the ALERT_n pin low if an error is detected. The controller must sample ALERT_n and take corrective action (typically a reset or re-initialization sequence).
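One way to sample ALERT_n on the controller side is a two-flop synchronizer feeding a sticky error flag that recovery logic clears once it has acted. The sketch below is only an illustration under that assumption; the module alert_monitor and its ports i_alert_n, i_clear, and o_ca_error are hypothetical names, not part of hbm3_ca_bus.v.

```verilog
// Hypothetical ALERT_n monitor: two-flop synchronizer plus sticky flag.
module alert_monitor (
    input  wire i_clk,
    input  wire i_rst_n,
    input  wire i_alert_n,   // active-low alert from the DRAM (asynchronous)
    input  wire i_clear,     // pulse from recovery logic to clear the flag
    output reg  o_ca_error   // sticky: set on alert, held until cleared
);
    reg [1:0] sync;          // sync[1] is the synchronized ALERT_n

    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) begin
            sync       <= 2'b11;  // idle level of an active-low signal
            o_ca_error <= 1'b0;
        end else begin
            sync <= {sync[0], i_alert_n};
            if (!sync[1])
                o_ca_error <= 1'b1;   // a new alert wins over a clear
            else if (i_clear)
                o_ca_error <= 1'b0;
        end
    end
endmodule
```

Giving the set path priority over the clear path means an alert that arrives during recovery is not lost.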

Parity in Multi-Packet Commands

For 2-packet commands like ACT and MRS, parity is computed independently for each packet. Packet 0 parity accompanies Packet 0 on the same clock cycle; Packet 1 parity accompanies Packet 1. There is no cumulative parity across packets.

In Verilog, even parity is simply: o_ca_parity = ^o_ca_out; — the reduction XOR operator produces a single bit that is '1' when an odd number of input bits are '1', making the total count even.
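To see the check from the receiving side, the sketch below recomputes parity over a received packet and compares it with the received parity bit, first for a clean packet and then with one injected bit flip. The module name ca_parity_check_demo and the signal parity_error are hypothetical; in a real system this check happens inside the DRAM.

```verilog
// Sketch of a receiver-side even-parity check on one CA packet.
module ca_parity_check_demo;
    reg  [7:0] ca;
    reg        par;

    // 1 when CA[7:0] plus the parity bit contain an odd number of ones.
    wire parity_error = (^ca) ^ par;

    initial begin
        ca  = 8'hA8;
        par = ^ca;                 // correctly generated parity
        #1 $display("clean:   error=%b", parity_error);   // prints error=0
        ca[3] = ~ca[3];            // inject a single-bit flip on the bus
        #1 $display("flipped: error=%b", parity_error);   // prints error=1
        $finish;
    end
endmodule
```

A single parity bit catches any odd number of flipped bits in a packet; an even number of flips cancels out and goes undetected.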

CKE Control and Power Management

The Clock Enable (CKE) signal is a single bit per pseudo-channel that gates the DRAM's internal clock tree. It is one of the most powerful levers for power management in an HBM3 system.

CKE States

CKE Transition | Effect | Required Delay
High → Low (while idle) | Enter Power-Down Mode | tCPDED (min 4 ns)
High → Low (after self-refresh entry cmd) | Enter Self-Refresh | tCSSRE cycles after REFRESH cmd
Low → High | Exit Power-Down / Self-Refresh | tXP or tXS exit latency before next cmd
Continuously High | Normal Operation | (none)

In our controller, i_cke is an input from the power manager and is registered and forwarded to o_cke_out. The CA bus encoder does not attempt to issue commands while CKE is deasserted — the upstream scheduler is responsible for draining the command queue before asserting power-down.
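Because the encoder relies on the scheduler to keep i_cmd_valid low while CKE is deasserted, a simulation-only checker can catch violations of that contract. The sketch below is one possible form; the module name cke_guard_check is hypothetical and it is meant to be instantiated next to the DUT in a testbench.

```verilog
// Simulation-only checker: report a command request while CKE is low.
// cke_guard_check is an illustrative name, not part of hbm3_ca_bus.v.
module cke_guard_check (
    input wire i_clk,
    input wire i_rst_n,
    input wire i_cke,
    input wire i_cmd_valid
);
    always @(posedge i_clk) begin
        if (i_rst_n && !i_cke && i_cmd_valid)
            $display("[%0t] ERROR: command requested while CKE is low", $time);
    end
endmodule
```

In a testbench it can be connected by name to the same i_clk, i_rst_n, i_cke, and i_cmd_valid signals that drive hbm3_ca_bus.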

Self-Refresh Entry

To enter self-refresh: (1) issue a REFRESH command with all banks precharged, (2) wait tCSSRE cycles, (3) deassert CKE. To exit: reassert CKE, wait tXS, then resume normal commands. The CKE output from this module is registered to prevent glitches on the PHY interface.
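The entry and exit steps above can be sketched as a small sequencer that sits in the power manager and drives i_cke. This is a minimal illustration under stated assumptions: the module name sref_seq, its ports, and the default cycle counts T_CSSRE and T_XS are placeholders, not JESD238 values, and the serialization latency of the REFRESH command through the CA encoder is not modeled.

```verilog
// Sketch of a self-refresh entry/exit sequencer (hypothetical module).
module sref_seq #(
    parameter T_CSSRE = 4,   // placeholder: cycles from REFRESH to CKE low
    parameter T_XS    = 8    // placeholder: cycles from CKE high to ready
) (
    input  wire i_clk,
    input  wire i_rst_n,
    input  wire i_sref_req,  // level: high = enter and stay in self-refresh
    output reg  o_issue_ref, // one-cycle pulse: send REFRESH now
    output reg  o_cke,       // CKE request toward hbm3_ca_bus.i_cke
    output reg  o_ready      // high when normal commands are allowed
);
    localparam [1:0] S_RUN = 2'd0, S_WAIT_ENTRY = 2'd1,
                     S_SREF = 2'd2, S_WAIT_EXIT = 2'd3;
    reg [1:0] state;
    reg [7:0] cnt;

    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) begin
            state <= S_RUN;  cnt <= 8'd0;
            o_issue_ref <= 1'b0;  o_cke <= 1'b1;  o_ready <= 1'b1;
        end else begin
            o_issue_ref <= 1'b0;                 // default: no pulse
            case (state)
                S_RUN: if (i_sref_req) begin     // step 1: REFRESH, banks precharged
                    o_issue_ref <= 1'b1;
                    o_ready     <= 1'b0;
                    cnt         <= T_CSSRE;
                    state       <= S_WAIT_ENTRY;
                end
                S_WAIT_ENTRY: begin              // step 2: wait at least T_CSSRE cycles
                    if (cnt == 8'd0) begin
                        o_cke <= 1'b0;           // step 3: deassert CKE
                        state <= S_SREF;
                    end else cnt <= cnt - 8'd1;
                end
                S_SREF: if (!i_sref_req) begin   // exit: reassert CKE
                    o_cke <= 1'b1;
                    cnt   <= T_XS;
                    state <= S_WAIT_EXIT;
                end
                S_WAIT_EXIT: begin               // wait at least T_XS cycles
                    if (cnt == 8'd0) begin
                        o_ready <= 1'b1;
                        state   <= S_RUN;
                    end else cnt <= cnt - 8'd1;
                end
            endcase
        end
    end
endmodule
```

The scheduler would treat o_ready as a gate on i_cmd_valid, which keeps the "no commands while CKE is low" rule in one place.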

On-Die Termination (ODT) for Signal Integrity

Even on a silicon interposer, the HBM3 CA bus must manage signal integrity. The single-ended 8-bit bus transitions at multi-GHz rates, and unterminated stubs on the interposer cause reflections that distort waveforms and increase setup/hold violation risk.

HBM3 solves this with On-Die Termination. The DRAM die contains programmable resistor networks — typically 40 Ω, 60 Ω, or 120 Ω — connected between each CA pin and a reference voltage (VDDQ/2 for center termination, or VSS for pull-down). These are configured via MRS commands during initialization.

Controller Responsibility

The CA bus controller itself does not toggle ODT dynamically on a per-command basis (unlike DDR4/DDR5 where the host asserts an ODT pin). Instead, the ODT value for the CA bus is set once during the initialization MRS sequence and remains active throughout normal operation. Our hbm3_ca_bus module issues the MRS commands that program the ODT register when requested by the initialization FSM.

ODT for the data bus (DQ) is handled separately by the PHY and data scheduler — not by the CA bus controller module described here.

ACT Command Waveform — 2-Packet Sequence

The waveform below shows a complete ACT command for BG=2, BA=1, Row=0x1A5F transmitted on the CA[7:0] bus. CKE remains high throughout. CA parity accompanies each packet on the same clock edge.

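[Waveform: CK, CKE, CA[7:0], PARITY, ca_valid, and cmd_sent over cycles T0–T3. Using the 8-bit-exact layout (Packet 0 = {H, BA, BG, RA[14:13]}, Packet 1 = RA[12:5]): ACT Packet 0 = 8'hA8 (H=1, BA=1, BG=2, RA[14:13]=2'b00) with parity ^8'hA8 = 1; ACT Packet 1 = 8'hD2 (RA[12:5]) with parity ^8'hD2 = 0; then NOP on the bus while cmd_sent pulses for one cycle.]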

Full Verilog Source — hbm3_ca_bus.v

The module implements a 3-state FSM: IDLE, PKT0 (send first packet), and PKT1 (send second packet for 2-packet commands). Single-packet commands skip PKT1 and return directly to IDLE.

verilog
// hbm3_ca_bus.v — HBM3 CA Bus Command Encoder
// JEDEC JESD238 compliant · Phase 3 Module 11
// EcrioniX — https://ecrionix.org/hbm3-controller/ca-bus/

module hbm3_ca_bus (
    input  wire        i_clk,
    input  wire        i_rst_n,

    // Command interface
    input  wire [2:0]  i_cmd_type,   // 000=NOP 001=ACT 010=RD 011=WR 100=PRE 101=REF 110=MRS
    input  wire        i_cmd_valid,  // pulse: new command to send
    input  wire [2:0]  i_bg,         // bank group
    input  wire [1:0]  i_ba,         // bank address
    input  wire [14:0] i_row,        // row address
    input  wire [4:0]  i_col,        // column address (C10,C[3:0])
    input  wire        i_cke,        // clock enable from power manager

    // PHY interface
    output reg  [7:0]  o_ca_out,     // CA bus to PHY
    output wire        o_ca_parity,  // even parity over CA[7:0]
    output reg         o_cke_out,    // CKE to PHY (registered)
    output reg         o_ca_valid,   // CA bus data is valid this cycle
    output reg         o_cmd_sent    // pulses when full command serialized
);

// ── FSM state encoding ──────────────────────────────
localparam [1:0]
    S_IDLE = 2'b00,
    S_PKT0 = 2'b01,
    S_PKT1 = 2'b10;

// ── Command type encoding ───────────────────────────
localparam [2:0]
    CMD_NOP = 3'b000,
    CMD_ACT = 3'b001,
    CMD_RD  = 3'b010,
    CMD_WR  = 3'b011,
    CMD_PRE = 3'b100,
    CMD_REF = 3'b101,
    CMD_MRS = 3'b110;

reg [1:0]  state, next_state;
reg [2:0]  r_cmd_type;
reg [2:0]  r_bg;
reg [1:0]  r_ba;
reg [14:0] r_row;
reg [4:0]  r_col;
wire       needs_pkt1;

// ── Parity: combinational XOR reduction ─────────────
assign o_ca_parity = ^o_ca_out;

// ── ACT and MRS need 2 packets ───────────────────────
assign needs_pkt1 = (r_cmd_type == CMD_ACT) || (r_cmd_type == CMD_MRS);

// ── Input capture on command valid ──────────────────
always @(posedge i_clk or negedge i_rst_n) begin
    if (!i_rst_n) begin
        r_cmd_type <= CMD_NOP;
        r_bg       <= 3'b0;
        r_ba       <= 2'b0;
        r_row      <= 15'b0;
        r_col      <= 5'b0;
    end else if (i_cmd_valid && state == S_IDLE) begin
        r_cmd_type <= i_cmd_type;
        r_bg       <= i_bg;
        r_ba       <= i_ba;
        r_row      <= i_row;
        r_col      <= i_col;
    end
end

// ── FSM state register ──────────────────────────────
always @(posedge i_clk or negedge i_rst_n) begin
    if (!i_rst_n)
        state <= S_IDLE;
    else
        state <= next_state;
end

// ── FSM next-state logic ────────────────────────────
always @(*) begin
    next_state = state;
    case (state)
        S_IDLE: if (i_cmd_valid) next_state = S_PKT0;
        S_PKT0: next_state = needs_pkt1 ? S_PKT1 : S_IDLE;
        S_PKT1: next_state = S_IDLE;
        default: next_state = S_IDLE;
    endcase
end

// ── Output logic (registered) ────────────────────────
always @(posedge i_clk or negedge i_rst_n) begin
    if (!i_rst_n) begin
        o_ca_out  <= 8'h00;
        o_cke_out <= 1'b1;
        o_ca_valid <= 1'b0;
        o_cmd_sent <= 1'b0;
    end else begin
        o_cke_out  <= i_cke;
        o_cmd_sent <= 1'b0;  // default: no pulse

        case (next_state)
            S_IDLE: begin
                o_ca_out   <= 8'h00;  // NOP on CA bus
                o_ca_valid <= 1'b0;
                if (state != S_IDLE)
                    o_cmd_sent <= 1'b1;  // command fully serialized
            end

            S_PKT0: begin
                o_ca_valid <= 1'b1;
                case (i_cmd_type)
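                    // ACT: 8-bit-exact Packet 0 = H, BA, BG, RA[14:13].
                    // (A wider concatenation would be truncated and lose H.)
                    CMD_ACT: o_ca_out <= {1'b1, i_ba, i_bg, i_row[14:13]};
                    // FIXME: the RD/WR field list is 11 bits (1+3+2+5), so only the
                    // low 8 bits {BG[0], BA, col} reach o_ca_out: C and BG[2:1] are
                    // dropped, and RD and WR emit identical packets. The 8-bit CAS
                    // layout must come from JESD238; this tutorial does not define it.
                    CMD_RD:  o_ca_out <= {1'b1, i_bg, i_ba, i_col};
                    CMD_WR:  o_ca_out <= {1'b1, i_bg, i_ba, i_col};
                    CMD_PRE: o_ca_out <= {2'b00, i_bg, i_ba, 1'b0};  // AP=0: addressed bank only
                    CMD_REF: o_ca_out <= {3'b000, i_bg, 2'b00};      // matches table: {0,0,0,BG,0,0}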
                    CMD_MRS: o_ca_out <= {1'b1, i_row[6:0]};  // MA[6:0]
                    default: o_ca_out <= 8'h00;  // NOP
                endcase
            end

            S_PKT1: begin
                o_ca_valid <= 1'b1;
                case (r_cmd_type)
                    CMD_ACT: o_ca_out <= r_row[12:5];      // Packet 1 = RA[12:5] (8 bits exactly)
                    CMD_MRS: o_ca_out <= {3'b000, r_col};  // MO[4:0] from i_col; MO[7:5] forced to 0
                    default: o_ca_out <= 8'h00;
                endcase
            end

            default: begin
                o_ca_out   <= 8'h00;
                o_ca_valid <= 1'b0;
            end
        endcase
    end
end

endmodule

SystemVerilog Testbench + SVA Assertions

systemverilog
// tb_hbm3_ca_bus.sv — SystemVerilog testbench
`timescale 1ns/1ps

module tb_hbm3_ca_bus;

logic        i_clk, i_rst_n;
logic [2:0]  i_cmd_type;
logic        i_cmd_valid;
logic [2:0]  i_bg;
logic [1:0]  i_ba;
logic [14:0] i_row;
logic [4:0]  i_col;
logic        i_cke;
logic [7:0]  o_ca_out;
logic        o_ca_parity;
logic        o_cke_out, o_ca_valid, o_cmd_sent;

hbm3_ca_bus dut (.*);

// ── Clock: 500 MHz (2 ns period) ─────────────────────
initial i_clk = 0;
always #1 i_clk = ~i_clk;

// ── SVA: parity must always equal XOR reduction ──────
assert property (@(posedge i_clk) disable iff (!i_rst_n)
    o_ca_parity == ^o_ca_out)
else $error("PARITY MISMATCH at %0t", $time);

// ── SVA: ca_valid stays low while idle with no new command
assert property (@(posedge i_clk) disable iff (!i_rst_n)
    (!i_cmd_valid && !o_ca_valid) |=> !o_ca_valid)
else $error("ca_valid spuriously asserted at %0t", $time);

// ── SVA: ACT packet 0 must have bit7 = H = 1 ─────────
assert property (@(posedge i_clk) disable iff (!i_rst_n)
    (i_cmd_valid && i_cmd_type == 3'b001)
    |=> o_ca_out[7] == 1'b1)
else $error("ACT Packet0: H bit not set at %0t", $time);

// ── SVA: cmd_sent must be a single-cycle pulse ───────
assert property (@(posedge i_clk) disable iff (!i_rst_n)
    $rose(o_cmd_sent) |=> !o_cmd_sent)
else $error("cmd_sent held high beyond one cycle at %0t", $time);

task automatic send_cmd(
    input [2:0]  cmd,
    input [2:0]  bg,
    input [1:0]  ba,
    input [14:0] row,
    input [4:0]  col
);
    // Nonblocking drives avoid racing the DUT, which samples on the same edge
    @(posedge i_clk);
    i_cmd_type  <= cmd;
    i_bg        <= bg;
    i_ba        <= ba;
    i_row       <= row;
    i_col       <= col;
    i_cmd_valid <= 1'b1;
    @(posedge i_clk);
    i_cmd_valid <= 1'b0;
    // Wait for cmd_sent
    @(posedge o_cmd_sent);
    $display("[%0t] Command %0b sent. Final CA=%08b parity=%b",
             $time, cmd, o_ca_out, o_ca_parity);
endtask

initial begin
    i_rst_n     = 0;
    i_cmd_valid = 0;
    i_cmd_type  = 3'b000;
    i_bg        = 3'b000;
    i_ba        = 2'b00;
    i_row       = 15'h0;
    i_col       = 5'h0;
    i_cke       = 1'b1;
    repeat(4) @(posedge i_clk);
    i_rst_n = 1;
    repeat(2) @(posedge i_clk);

    // Test 1: ACT command (2 packets)
    $display("=== TEST 1: ACT BG=2 BA=1 Row=15'h1A5F ===");
    send_cmd(3'b001, 3'd2, 2'd1, 15'h1A5F, 5'h0);

    repeat(2) @(posedge i_clk);

    // Test 2: READ command (1 packet)
    $display("=== TEST 2: RD BG=2 BA=1 Col=5'h0F ===");
    send_cmd(3'b010, 3'd2, 2'd1, 15'h0, 5'h0F);

    repeat(2) @(posedge i_clk);

    // Test 3: WRITE command
    $display("=== TEST 3: WR BG=0 BA=0 Col=5'h00 ===");
    send_cmd(3'b011, 3'd0, 2'd0, 15'h0, 5'h00);

    repeat(2) @(posedge i_clk);

    // Test 4: PRECHARGE
    $display("=== TEST 4: PRE BG=2 BA=1 ===");
    send_cmd(3'b100, 3'd2, 2'd1, 15'h0, 5'h0);

    repeat(2) @(posedge i_clk);

    // Test 5: REFRESH
    $display("=== TEST 5: REF ===");
    send_cmd(3'b101, 3'd0, 2'd0, 15'h0, 5'h0);

    repeat(4) @(posedge i_clk);
    $display("=== ALL TESTS COMPLETED (check log for assertion errors) ===");
    $finish;
end
endmodule

Port Reference Table

Port | Dir | Width | Description
i_clk | In | 1 | Controller clock (e.g. 500 MHz CK rate)
i_rst_n | In | 1 | Active-low asynchronous reset (in the always-block sensitivity lists)
i_cmd_type | In | 3 | Command select: 000=NOP, 001=ACT, 010=RD, 011=WR, 100=PRE, 101=REF, 110=MRS
i_cmd_valid | In | 1 | One-cycle pulse: latch and encode the command
i_bg | In | 3 | Bank group BG[2:0]
i_ba | In | 2 | Bank address BA[1:0]
i_row | In | 15 | Row address RA[14:0] (used by ACT; bits [6:0] carry MA[6:0] for MRS)
i_col | In | 5 | Column address C10+C[3:0] (used by RD/WR; carries MO[4:0] for MRS)
i_cke | In | 1 | Clock enable from power manager
o_ca_out | Out | 8 | CA bus byte to PHY, changes each cycle during serialization
o_ca_parity | Out | 1 | Even parity over o_ca_out[7:0]; combinational
o_cke_out | Out | 1 | Registered CKE to PHY, one-cycle delayed from i_cke
o_ca_valid | Out | 1 | High when o_ca_out contains valid command data
o_cmd_sent | Out | 1 | One-cycle pulse when full command (1 or 2 packets) has been serialized

FAQ

Why does HBM3 use an 8-bit serial CA bus instead of a wide parallel address bus?

HBM3 stacks DRAM dies connected via Through-Silicon Vias (TSVs). Routing a wide parallel address bus through TSVs would consume too many vias, leaving fewer for the high-bandwidth data paths. The 8-bit serial CA bus minimizes TSV count while carrying all command and address information across one or two clock cycles.

What is CA parity in HBM3 and how is it computed?

CA parity is a single even-parity bit computed as the XOR reduction of all 8 CA bus bits: o_ca_parity = ^o_ca_out. It makes the total count of '1' bits across CA[7:0] plus parity always even. The DRAM verifies this and can signal an error via ALERT_n if a mismatch is detected.

Why does the ACTIVATE command need two CA bus packets?

HBM3 has a 15-bit row address per pseudo-channel plus a 3-bit bank group and a 2-bit bank select. Together with the H-bit that identifies ACT, that is 21 bits, far more than a single 8-bit packet can hold. ACT therefore uses two consecutive packets: in the bit-exact layout used in this module, Packet 0 carries H, BA, BG, and RA[14:13], and Packet 1 carries RA[12:5].

What does CKE control in HBM3?

CKE (Clock Enable) controls the DRAM's clock receiver. When deasserted, the DRAM ignores CA bus transitions and enters power-down or self-refresh. When reasserted, the DRAM resumes accepting commands after the required tXP or tXS exit latency. Our module registers i_cke to o_cke_out to prevent PHY glitches.

What is ODT in the context of the HBM3 CA bus?

On-Die Termination places programmable resistors inside the DRAM die to terminate CA bus signal lines. HBM3's single-ended CA bus would otherwise suffer reflections on the interposer. ODT is configured once via MRS commands during initialization — the CA bus controller itself issues those MRS packets, but does not toggle ODT dynamically per transaction.