The command/address bus is the single communication channel between your controller and each HBM3 pseudo-channel. This module builds the CA bus encoder FSM — 8-bit serial packets, 2-packet ACT sequences, CA parity generation, and CKE control — all per JEDEC JESD238.
Every HBM3 channel is divided into two pseudo-channels (PC0 and PC1). Each pseudo-channel has its own independent 8-bit Command/Address (CA) bus plus a single parity bit. Unlike conventional SDRAM, which routes separate RAS, CAS, WE, and wide address lines, HBM3 encodes all command and address information into compact 8-bit packets transmitted on this narrow bus.
The CA bus is clocked by the differential CK/CK_n pair. Commands are latched on the rising edge of CK. One or two consecutive CA packets represent a complete command, depending on the command type. The controller must serialize multi-packet commands correctly and maintain correct CA timing relative to CK.
The narrow bus width is not a performance limitation — it is a deliberate TSV budget decision. An HBM3 stack routes hundreds of data signals through silicon vias. Assigning only 8 CA signals per pseudo-channel leaves the vast majority of TSVs available for the 128-bit data bus, delivering the high bandwidth HBM3 is designed for.
Comparing HBM3's CA bus to DDR5's address bus highlights how on-package integration changes controller architecture:
| Feature | HBM3 CA Bus | DDR5 Address Bus |
|---|---|---|
| Bus width | 8 bits (serial) | 14 bits A[13:0] (parallel) |
| Command encoding | 8-bit packets, 1 or 2 per command | RAS/CAS/WE + CA[5:0] encoding |
| Signaling | Single-ended (VSS referenced) | POD (Pseudo Open Drain) |
| Row address | 15 bits, split across 2 ACT packets | 17 bits, multiple CS activations |
| Clock input | Differential CK/CK_n | Differential CK_t/CK_c |
| Parity | Even parity, 1 bit per packet | Even parity across CA[13:0]+CS |
| Termination | On-Die Termination (ODT) via MRS | External + on-die termination |
| Routing medium | On-package (interposer/TSV) | PCB trace, 50–100 mm |
The single-ended CA bus works reliably in HBM3 because the signal path is a few millimeters of interposer routing, not centimeters of PCB trace. Reflections that would destroy an 8-bit single-ended bus at DDR5 speeds on a PCB are much smaller on a silicon interposer.
The table below lists every HBM3 command, how many CA packets it occupies, the i_cmd_type encoding used by our controller, and the high-level packet layout. Bits marked R are reserved and must be driven low.
| Command | i_cmd_type | Packets | Packet 0 [7:0] | Packet 1 [7:0] |
|---|---|---|---|---|
| NOP | 3'b000 | 1 | {0,0,0,0,0,0,0,0} | — |
| ACTIVATE (ACT) | 3'b001 | 2 | {H, BA[1:0], BG[2:0], RA[14:10]} | {RA[9:0], R, R} |
| READ (RD) | 3'b010 | 1 | {C, BG[2:0], BA[1:0], C10, C[3:0]} | — |
| WRITE (WR) | 3'b011 | 1 | {C, BG[2:0], BA[1:0], C10, C[3:0]} | — |
| PRECHARGE (PRE) | 3'b100 | 1 | {0,0,BG[2:0],BA[1:0],AP} | — |
| REFRESH (REF) | 3'b101 | 1 | {0,0,0,BG[2:0],0,0} | — |
| MODE REG SET (MRS) | 3'b110 | 2 | {1,MA[6:0]} | {MO[7:0]} |
H bit: The ACT command sets H=1 in bit 7 of Packet 0. NOP, PRE, and REF drive bit 7 low, so the DRAM can separate them from ACT in the first cycle. RD and WR also drive bit 7 high (the C bit, C=1, marks CAS commands) and MRS starts with a 1, so in the layouts above bit 7 alone does not distinguish ACT from those three commands. AP in PRE selects all-bank precharge: set AP=1 to precharge all banks.
The ACTIVATE command is the only mandatory 2-packet command that every application must issue repeatedly (once per row access). Understanding its exact encoding is critical.
An HBM3 pseudo-channel has 8 bank groups (BG[2:0] = 3 bits) and 4 banks per group (BA[1:0] = 2 bits), giving 32 independent banks. Each bank's row address is 15 bits wide (RA[14:0]). The total information content of an ACT command is:
H(1) + BG(3) + BA(2) + RA(15) = 21 bits
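One 8-bit packet carries only 8 bits, and two packets carry 16, which is still short of 21. The H bit doubles as the command identifier in bit 7 of packet 0, and BA and BG take the next five bits. That leaves two bits of packet 0 and the eight bits of packet 1 for the row address, so only 10 row bits fit in a 2-packet ACT. The RA[14:10] and RA[9:0] fields listed in the command table cannot both fit as written; verify the exact field placement against JESD238.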
| Packet | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
|---|---|---|---|---|---|---|---|---|
| Packet 0 | H=1 | BA[1] | BA[0] | BG[2] | BG[1] | BG[0] | RA[14] | RA[13] |
| Packet 1 | RA[12] | RA[11] | RA[10] | RA[9] | RA[8] | RA[7] | RA[6] | RA[5] |
Row bits are packed MSB-first. After H, BA, and BG, only bits [1:0] of Packet 0 remain, so this 16-bit assignment places RA[14:5]. It has no room for RA[4:0] or for the two reserved bits listed for Packet 1 in the command table.
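As a minimal sketch of the Packet 0 row above, the module below packs H, BA, BG, and the top two row bits into one 8-bit word and derives the parity bit for it. The module name act_pkt0_pack and its port names are hypothetical, and the layout is taken from this article's table rather than checked against JESD238.

```verilog
// Hypothetical helper: packs ACT Packet 0 per the bit table in this article.
// CA[7]=H(1), CA[6:5]=BA[1:0], CA[4:2]=BG[2:0], CA[1:0]=RA[14:13]
module act_pkt0_pack (
    input  wire [1:0]  i_ba,    // bank address
    input  wire [2:0]  i_bg,    // bank group
    input  wire [14:0] i_row,   // full row address; only RA[14:13] is used here
    output wire [7:0]  o_pkt0,  // packed CA[7:0]
    output wire        o_par    // even parity bit for this packet
);
    // 1 + 2 + 3 + 2 = 8 bits, so the concatenation matches the bus width exactly
    assign o_pkt0 = {1'b1, i_ba, i_bg, i_row[14:13]};

    // Reduction XOR: 1 when o_pkt0 holds an odd number of ones
    assign o_par = ^o_pkt0;
endmodule
```

For BG=2, BA=1, Row=0x1A5F, RA[14:13] is 2'b00, so o_pkt0 is 8'b1_01_010_00 (8'hA8). That word has three bits set, so o_par is 1.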
Unlike a data bus where error correction (ECC) can absorb random bit flips, a corrupted command can cause destructive behavior — writing to the wrong address, corrupting a row that was never targeted. CA parity provides a first line of defense.
For each CA packet, the parity bit is defined as the XOR reduction of all eight CA bits:
parity = CA[7] ^ CA[6] ^ CA[5] ^ CA[4] ^ CA[3] ^ CA[2] ^ CA[1] ^ CA[0]
This makes the total number of '1' bits across CA[7:0] and the parity bit always even. The DRAM checks parity on every received packet and asserts the ALERT_n pin low if an error is detected. The controller must sample ALERT_n and take corrective action (typically a reset or re-initialization sequence).
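One way to meet the sampling requirement above is to synchronize ALERT_n into the controller clock domain and latch a sticky error flag until the initialization FSM acknowledges it. This is only a sketch: the module name alert_monitor, the i_clear port, and the two-flop synchronizer are assumptions, not part of hbm3_ca_bus or of JESD238.

```verilog
// Hypothetical ALERT_n monitor: synchronizes the pin and latches a sticky flag.
module alert_monitor (
    input  wire i_clk,
    input  wire i_rst_n,     // active-low asynchronous reset
    input  wire i_alert_n,   // ALERT_n from the PHY, active low, treated as asynchronous
    input  wire i_clear,     // one-cycle pulse: acknowledge and clear the flag
    output reg  o_ca_error   // sticky: set once synchronized ALERT_n is seen low
);
    reg [1:0] sync;  // two-flop synchronizer, resets to the inactive (high) level

    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) begin
            sync       <= 2'b11;
            o_ca_error <= 1'b0;
        end else begin
            sync <= {sync[0], i_alert_n};
            if (!sync[1])           // synchronized ALERT_n low: set wins over clear
                o_ca_error <= 1'b1;
            else if (i_clear)
                o_ca_error <= 1'b0;
        end
    end
endmodule
```

The scheduler can stop issuing commands when o_ca_error rises and hand control to whatever reset or re-initialization sequence the system uses, then pulse i_clear once recovery is complete.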
For 2-packet commands like ACT and MRS, parity is computed independently for each packet. Packet 0 parity accompanies Packet 0 on the same clock cycle; Packet 1 parity accompanies Packet 1. There is no cumulative parity across packets.
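For testbench use, a receiver-side check can mirror what the text says the DRAM does: recompute the XOR over each packet and compare it with the parity bit that arrived on the same cycle. The module below is a hypothetical checker (ca_parity_check and its ports are illustrative names), and it treats every packet independently, matching the per-packet rule above.

```verilog
// Hypothetical receiver-side parity check for one CA packet per cycle.
module ca_parity_check (
    input  wire [7:0] i_ca,      // CA[7:0] as received
    input  wire       i_parity,  // parity bit received with this packet
    input  wire       i_valid,   // high when i_ca carries a packet
    output wire       o_error    // high when the nine bits hold an odd number of ones
);
    // Even parity: the XOR of all eight CA bits and the parity bit must be 0
    assign o_error = i_valid & (^{i_ca, i_parity});
endmodule
```

Wired to o_ca_out, o_ca_parity, and o_ca_valid of the encoder, o_error should stay low. Flipping any single bit of the packet or of the parity line in the testbench should raise it for that cycle only.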
In RTL this is o_ca_parity = ^o_ca_out; the reduction XOR operator produces a single bit that is '1' when an odd number of input bits are '1', making the total count even.
The Clock Enable (CKE) signal is a single bit per pseudo-channel that gates the DRAM's internal clock tree. It is one of the most powerful levers for power management in an HBM3 system.
| CKE Transition | Effect | Required Delay |
|---|---|---|
| High → Low (while idle) | Enter Power-Down Mode | tCPDED (min 4 ns) |
| High → Low (after self-refresh entry cmd) | Enter Self-Refresh | tCSSRE cycles after REFRESH cmd |
| Low → High | Exit Power-Down / Self-Refresh | tXP or tXS exit latency before next cmd |
| Continuously High | Normal Operation | — |
In our controller, i_cke is an input from the power manager and is registered and forwarded to o_cke_out. The CA bus encoder does not attempt to issue commands while CKE is deasserted — the upstream scheduler is responsible for draining the command queue before asserting power-down.
To enter self-refresh: (1) issue a REFRESH command with all banks precharged, (2) wait tCSSRE cycles, (3) deassert CKE. To exit: reassert CKE, wait tXS, then resume normal commands. The CKE output from this module is registered to prevent glitches on the PHY interface.
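The entry sequence above can be sketched as a small counter-based sequencer sitting between the power manager and i_cke. Everything named here is hypothetical: sref_entry_seq, its ports, and the T_CSSRE parameter, whose default of 8 cycles is a placeholder and not a datasheet value. The sketch covers entry only and assumes all banks are already precharged; the tXS wait on exit is left to the scheduler.

```verilog
// Hypothetical self-refresh entry sequencer: REFRESH, wait, then drop CKE.
module sref_entry_seq #(
    parameter integer T_CSSRE = 8   // placeholder cycle count, NOT a datasheet value
) (
    input  wire i_clk,
    input  wire i_rst_n,      // active-low asynchronous reset
    input  wire i_sref_req,   // level: hold high to enter and stay in self-refresh
    input  wire i_cmd_sent,   // o_cmd_sent pulse from hbm3_ca_bus
    output reg  o_issue_ref,  // one-cycle pulse: ask the scheduler to send REFRESH
    output reg  o_cke         // drives i_cke of hbm3_ca_bus
);
    localparam [1:0] S_RUN = 2'd0, S_WAIT_SENT = 2'd1, S_COUNT = 2'd2, S_SREF = 2'd3;

    reg [1:0]  state;
    reg [31:0] cnt;

    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) begin
            state       <= S_RUN;
            cnt         <= 32'd0;
            o_issue_ref <= 1'b0;
            o_cke       <= 1'b1;
        end else begin
            o_issue_ref <= 1'b0;                  // default: no pulse
            case (state)
                S_RUN: if (i_sref_req) begin      // step 1: request the REFRESH command
                    o_issue_ref <= 1'b1;
                    state       <= S_WAIT_SENT;
                end
                S_WAIT_SENT: if (i_cmd_sent) begin
                    cnt   <= T_CSSRE;             // step 2: start the tCSSRE wait
                    state <= S_COUNT;
                end
                S_COUNT: if (cnt == 32'd0) begin
                    o_cke <= 1'b0;                // step 3: deassert CKE
                    state <= S_SREF;
                end else begin
                    cnt <= cnt - 32'd1;
                end
                S_SREF: if (!i_sref_req) begin    // exit: reassert CKE; tXS wait is external
                    o_cke <= 1'b1;
                    state <= S_RUN;
                end
            endcase
        end
    end
endmodule
```

With this structure o_cke stays high for at least T_CSSRE full cycles after the REFRESH command is reported as sent. Because hbm3_ca_bus registers i_cke, the PHY sees the falling edge one cycle later still.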
Even on a silicon interposer, the HBM3 CA bus must manage signal integrity. The single-ended 8-bit bus transitions at multi-GHz rates, and unterminated stubs on the interposer cause reflections that distort waveforms and increase setup/hold violation risk.
HBM3 solves this with On-Die Termination. The DRAM die contains programmable resistor networks — typically 40 Ω, 60 Ω, or 120 Ω — connected between each CA pin and a reference voltage (VDDQ/2 for center termination, or VSS for pull-down). These are configured via MRS commands during initialization.
The CA bus controller itself does not toggle ODT dynamically on a per-command basis (unlike DDR4/DDR5 where the host asserts an ODT pin). Instead, the ODT value for the CA bus is set once during the initialization MRS sequence and remains active throughout normal operation. Our hbm3_ca_bus module issues the MRS commands that program the ODT register when requested by the initialization FSM.
The waveform below shows a complete ACT command for BG=2, BA=1, Row=0x1A5F transmitted on the CA[7:0] bus. CKE remains high throughout. CA parity accompanies each packet on the same clock edge.
The module implements a 3-state FSM: IDLE, PKT0 (send first packet), and PKT1 (send second packet for 2-packet commands). Single-packet commands skip PKT1 and return directly to IDLE.
```verilog
// hbm3_ca_bus.v - HBM3 CA Bus Command Encoder
// JEDEC JESD238 compliant · Phase 3 Module 11
// EcrioniX - https://ecrionix.org/hbm3-controller/ca-bus/
module hbm3_ca_bus (
    input  wire        i_clk,
    input  wire        i_rst_n,
    // Command interface
    input  wire [2:0]  i_cmd_type,   // 000=NOP 001=ACT 010=RD 011=WR 100=PRE 101=REF 110=MRS
    input  wire        i_cmd_valid,  // pulse: new command to send
    input  wire [2:0]  i_bg,         // bank group
    input  wire [1:0]  i_ba,         // bank address
    input  wire [14:0] i_row,        // row address
    input  wire [4:0]  i_col,        // column address (C10,C[3:0])
    input  wire        i_cke,        // clock enable from power manager
    // PHY interface
    output reg  [7:0]  o_ca_out,     // CA bus to PHY
    output wire        o_ca_parity,  // even parity over CA[7:0]
    output reg         o_cke_out,    // CKE to PHY (registered)
    output reg         o_ca_valid,   // CA bus data is valid this cycle
    output reg         o_cmd_sent    // pulses when full command serialized
);

    // FSM state encoding
    localparam [1:0] S_IDLE = 2'b00,
                     S_PKT0 = 2'b01,
                     S_PKT1 = 2'b10;

    // Command type encoding
    localparam [2:0] CMD_NOP = 3'b000,
                     CMD_ACT = 3'b001,
                     CMD_RD  = 3'b010,
                     CMD_WR  = 3'b011,
                     CMD_PRE = 3'b100,
                     CMD_REF = 3'b101,
                     CMD_MRS = 3'b110;

    reg [1:0]  state, next_state;
    reg [2:0]  r_cmd_type;
    reg [2:0]  r_bg;
    reg [1:0]  r_ba;
    reg [14:0] r_row;
    reg [4:0]  r_col;
    wire       needs_pkt1;

    // Parity: combinational XOR reduction
    assign o_ca_parity = ^o_ca_out;

    // ACT and MRS need 2 packets
    assign needs_pkt1 = (r_cmd_type == CMD_ACT) || (r_cmd_type == CMD_MRS);

    // Input capture on command valid (only accepted while idle)
    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) begin
            r_cmd_type <= CMD_NOP;
            r_bg       <= 3'b0;
            r_ba       <= 2'b0;
            r_row      <= 15'b0;
            r_col      <= 5'b0;
        end else if (i_cmd_valid && state == S_IDLE) begin
            r_cmd_type <= i_cmd_type;
            r_bg       <= i_bg;
            r_ba       <= i_ba;
            r_row      <= i_row;
            r_col      <= i_col;
        end
    end

    // FSM state register (asynchronous active-low reset)
    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) state <= S_IDLE;
        else          state <= next_state;
    end

    // FSM next-state logic
    always @(*) begin
        next_state = state;
        case (state)
            S_IDLE:  if (i_cmd_valid) next_state = S_PKT0;
            S_PKT0:  next_state = needs_pkt1 ? S_PKT1 : S_IDLE;
            S_PKT1:  next_state = S_IDLE;
            default: next_state = S_IDLE;
        endcase
    end

    // Output logic (registered). Packet 0 is built from the live inputs
    // because r_* is captured on the same edge; Packet 1 uses r_*.
    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) begin
            o_ca_out   <= 8'h00;
            o_cke_out  <= 1'b1;
            o_ca_valid <= 1'b0;
            o_cmd_sent <= 1'b0;
        end else begin
            o_cke_out  <= i_cke;
            o_cmd_sent <= 1'b0;  // default: no pulse
            case (next_state)
                S_IDLE: begin
                    o_ca_out   <= 8'h00;  // NOP on CA bus
                    o_ca_valid <= 1'b0;
                    if (state != S_IDLE)
                        o_cmd_sent <= 1'b1;  // command fully serialized
                end
                S_PKT0: begin
                    o_ca_valid <= 1'b1;
                    // NOTE: the ACT, RD, and WR concatenations are 11 bits wide
                    // and o_ca_out is 8 bits, so Verilog keeps only the low
                    // 8 bits and the leading 1'b1 (H or C) is dropped. The field
                    // widths must be reconciled with the 8-bit packet.
                    case (i_cmd_type)
                        CMD_ACT: o_ca_out <= {1'b1, i_ba, i_bg, i_row[14:10]};
                        CMD_RD:  o_ca_out <= {1'b1, i_bg, i_ba, i_col};  // C=1
                        CMD_WR:  o_ca_out <= {1'b1, i_bg, i_ba, i_col};  // C=1
                        CMD_PRE: o_ca_out <= {2'b00, i_bg, i_ba, 1'b0};  // AP=0
                        CMD_REF: o_ca_out <= {2'b00, i_bg, 3'b000};
                        CMD_MRS: o_ca_out <= {1'b1, i_row[6:0]};         // MA[6:0]
                        default: o_ca_out <= 8'h00;                      // NOP
                    endcase
                end
                S_PKT1: begin
                    o_ca_valid <= 1'b1;
                    case (r_cmd_type)
                        // 12 bits wide: only {r_row[5:0], 2'b00} reaches the bus
                        CMD_ACT: o_ca_out <= {r_row[9:0], 2'b00};  // RA[9:0], R, R
                        // MO[4:0] reuses the column field; MO[7:5] are driven 0
                        CMD_MRS: o_ca_out <= {3'b000, r_col};
                        default: o_ca_out <= 8'h00;
                    endcase
                end
                default: begin
                    o_ca_out   <= 8'h00;
                    o_ca_valid <= 1'b0;
                end
            endcase
        end
    end

endmodule
```
```systemverilog
// tb_hbm3_ca_bus.sv - SystemVerilog testbench
`timescale 1ns/1ps
module tb_hbm3_ca_bus;
    logic        i_clk, i_rst_n;
    logic [2:0]  i_cmd_type;
    logic        i_cmd_valid;
    logic [2:0]  i_bg;
    logic [1:0]  i_ba;
    logic [14:0] i_row;
    logic [4:0]  i_col;
    logic        i_cke;
    logic [7:0]  o_ca_out;
    logic        o_ca_parity;
    logic        o_cke_out, o_ca_valid, o_cmd_sent;

    hbm3_ca_bus dut (.*);

    // Clock: 500 MHz (2 ns period)
    initial i_clk = 0;
    always #1 i_clk = ~i_clk;

    // SVA: parity must always equal XOR reduction
    assert property (@(posedge i_clk) disable iff (!i_rst_n)
        o_ca_parity == ^o_ca_out)
        else $error("PARITY MISMATCH at %0t", $time);

    // SVA: ca_valid must be low after an idle cycle with no new command.
    // The antecedent looks at the FSM state so that a command issued on
    // the following cycle does not trigger a false failure.
    assert property (@(posedge i_clk) disable iff (!i_rst_n)
        (dut.state == 2'b00 && !i_cmd_valid) |=> !o_ca_valid)
        else $error("ca_valid spuriously asserted at %0t", $time);

    // SVA: ACT packet 0 must have bit7 = H = 1.
    // This fires whenever the encoder's packet 0 expression is wider than
    // 8 bits and the H bit is truncated away.
    assert property (@(posedge i_clk) disable iff (!i_rst_n)
        (i_cmd_valid && i_cmd_type == 3'b001) |=> o_ca_out[7] == 1'b1)
        else $error("ACT Packet0: H bit not set at %0t", $time);

    // SVA: cmd_sent must pulse for exactly one cycle per command
    assert property (@(posedge i_clk) disable iff (!i_rst_n)
        $rose(o_cmd_sent) |=> !o_cmd_sent)
        else $error("cmd_sent held high beyond one cycle at %0t", $time);

    // Drives a one-cycle command pulse. Nonblocking assignments keep the
    // stimulus from racing the DUT's posedge-clocked logic.
    task automatic send_cmd(
        input [2:0]  cmd, bg, ba,
        input [14:0] row,
        input [4:0]  col
    );
        @(posedge i_clk);
        i_cmd_type  <= cmd;
        i_bg        <= bg;
        i_ba        <= ba;
        i_row       <= row;
        i_col       <= col;
        i_cmd_valid <= 1'b1;
        @(posedge i_clk);
        i_cmd_valid <= 1'b0;
        // Wait for cmd_sent
        @(posedge o_cmd_sent);
        $display("[%0t] Command %0b sent. Final CA=%08b parity=%b",
                 $time, cmd, o_ca_out, o_ca_parity);
    endtask

    initial begin
        i_rst_n = 0; i_cmd_valid = 0; i_cmd_type = 3'b000;
        i_bg = 3'b000; i_ba = 2'b00; i_row = 15'h0; i_col = 5'h0;
        i_cke = 1'b1;
        repeat (4) @(posedge i_clk);
        i_rst_n = 1;
        repeat (2) @(posedge i_clk);

        // Test 1: ACT command (2 packets)
        $display("=== TEST 1: ACT BG=2 BA=1 Row=15'h1A5F ===");
        send_cmd(3'b001, 3'd2, 2'd1, 15'h1A5F, 5'h0);
        repeat (2) @(posedge i_clk);

        // Test 2: READ command (1 packet)
        $display("=== TEST 2: RD BG=2 BA=1 Col=5'h0F ===");
        send_cmd(3'b010, 3'd2, 2'd1, 15'h0, 5'h0F);
        repeat (2) @(posedge i_clk);

        // Test 3: WRITE command
        $display("=== TEST 3: WR BG=0 BA=0 Col=5'h00 ===");
        send_cmd(3'b011, 3'd0, 2'd0, 15'h0, 5'h00);
        repeat (2) @(posedge i_clk);

        // Test 4: PRECHARGE
        $display("=== TEST 4: PRE BG=2 BA=1 ===");
        send_cmd(3'b100, 3'd2, 2'd1, 15'h0, 5'h0);
        repeat (2) @(posedge i_clk);

        // Test 5: REFRESH
        $display("=== TEST 5: REF ===");
        send_cmd(3'b101, 3'd0, 2'd0, 15'h0, 5'h0);
        repeat (4) @(posedge i_clk);

        // Pass/fail is decided by the assertions above, not by this message
        $display("=== ALL TESTS COMPLETE: check the log for assertion errors ===");
        $finish;
    end
endmodule
```
| Port | Dir | Width | Description |
|---|---|---|---|
| i_clk | In | 1 | Controller clock (e.g. 500 MHz CK rate) |
| i_rst_n | In | 1 | Active-low asynchronous reset (negedge i_rst_n is in every sensitivity list) |
| i_cmd_type | In | 3 | Command select: 000=NOP, 001=ACT, 010=RD, 011=WR, 100=PRE, 101=REF, 110=MRS |
| i_cmd_valid | In | 1 | One-cycle pulse: latch and encode the command |
| i_bg | In | 3 | Bank group BG[2:0] |
| i_ba | In | 2 | Bank address BA[1:0] |
| i_row | In | 15 | Row address RA[14:0] (used by ACT and MRS MA[6:0]) |
| i_col | In | 5 | Column address C10+C[3:0] (used by RD/WR; also supplies MRS MO[4:0], with MO[7:5] driven 0) |
| i_cke | In | 1 | Clock enable from power manager |
| o_ca_out | Out | 8 | CA bus byte to PHY, changes each cycle during serialization |
| o_ca_parity | Out | 1 | Even parity over o_ca_out[7:0]; combinational |
| o_cke_out | Out | 1 | Registered CKE to PHY, one-cycle delayed from i_cke |
| o_ca_valid | Out | 1 | High when o_ca_out contains valid command data |
| o_cmd_sent | Out | 1 | One-cycle pulse when full command (1 or 2 packets) has been serialized |
HBM3 stacks DRAM dies connected via Through-Silicon Vias (TSVs). Routing a wide parallel address bus through TSVs would consume too many vias, leaving fewer for the high-bandwidth data paths. The 8-bit serial CA bus minimizes TSV count while carrying all command and address information across one or two clock cycles.
CA parity is a single even-parity bit computed as the XOR reduction of all 8 CA bus bits: o_ca_parity = ^o_ca_out. It makes the total count of '1' bits across CA[7:0] plus parity always even. The DRAM verifies this and can signal an error via ALERT_n if a mismatch is detected.
HBM3 has a 15-bit row address per pseudo-channel plus 3-bit bank group and 2-bit bank select. Together with the H-bit that identifies ACT, that is 21 bits — far more than a single 8-bit packet. Two consecutive packets are needed: Packet 0 carries H, BA, BG, and RA[14:10]; Packet 1 carries RA[9:0] plus two reserved bits.
CKE (Clock Enable) controls the DRAM's clock receiver. When deasserted, the DRAM ignores CA bus transitions and enters power-down or self-refresh. When reasserted, the DRAM resumes accepting commands after the required tXP or tXS exit latency. Our module registers i_cke to o_cke_out to prevent PHY glitches.
On-Die Termination places programmable resistors inside the DRAM die to terminate CA bus signal lines. HBM3's single-ended CA bus would otherwise suffer reflections on the interposer. ODT is configured once via MRS commands during initialization — the CA bus controller itself issues those MRS packets, but does not toggle ODT dynamically per transaction.