Phase 3 · Module 12

HBM3 AXI4 Host Interface

Nearly every SoC speaks AXI4. This module builds the AXI4 slave that bridges an industry-standard host bus to the internal HBM3 request engine: the 5-channel handshake, outstanding-ID tracking, write and read state machines, and BRESP/RRESP generation from HBM3 completion signals.

📄 hbm3_axi4_if.v · 🕐 ~55 min · AXI4 · AMBA · 256-bit bus

AXI4 Protocol Overview

AXI4 (Advanced eXtensible Interface 4) is part of the Arm AMBA specification. It defines a point-to-point interface between a single master and a single slave using five independent channels, each with its own VALID/READY handshake. The independence of channels is the key architectural feature: a master can issue several read addresses before the first read data has arrived (as many as the slave is willing to accept), enabling deep pipelining.

| Channel | Name | Carries |
|---|---|---|
| AW | Write Address | ID, address, burst length, size, burst type from master to slave |
| W | Write Data | 256-bit data + 32 byte-enable strobes + WLAST flag from master |
| B | Write Response | BID + BRESP status back from slave to master |
| AR | Read Address | Same fields as AW, from master to slave |
| R | Read Data | 256-bit data + RID + RRESP + RLAST from slave to master |

Handshake Rule

On each channel, a transfer occurs on the rising clock edge where both VALID and READY are high. The source asserts VALID and holds it (along with the payload signals) stable until the destination asserts READY; once asserted, VALID may not be deasserted until the handshake completes. The source must not wait for READY before asserting VALID. The destination, by contrast, may assert READY before VALID arrives as a "pre-ready" optimization.
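The source-side half of this rule can be sketched in a few lines of Verilog. The module below is illustrative only and is not part of hbm3_axi4_if.v; the name vr_source and its load/din ports are assumptions made for the example.

```verilog
// Sketch: a VALID/READY source that holds VALID and payload until accepted.
// Hypothetical helper, not part of the tutorial's module.
module vr_source #(
    parameter W = 8
) (
    input  wire         clk,
    input  wire         rst_n,
    input  wire         load,   // request to send din (ignored while busy)
    input  wire [W-1:0] din,
    output reg          valid,
    output reg  [W-1:0] data,
    input  wire         ready
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            valid <= 1'b0;
            data  <= {W{1'b0}};
        end else if (valid && ready) begin
            valid <= 1'b0;      // transfer completes on this edge
        end else if (!valid && load) begin
            valid <= 1'b1;      // asserted without looking at ready
            data  <= din;       // payload stays fixed while valid is high
        end
    end
endmodule
```

The design choice to note is that valid never depends on ready. If the source waited for READY while the destination waited for VALID, neither side would ever move.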

Independence: The AW and W channels are independent. A master may send all write data on the W channel before sending the write address on AW, or vice versa. This module handles both orderings using two internal flags (r_aw_got and r_w_got) and waits until both are set before issuing the HBM3 write request.

Write Transaction Flow

A complete AXI4 write transaction involves three phases: address handshake (AW), data handshake (W), and response (B). The slave must track all three and not issue the HBM3 write until both address and data are in hand.

Phase 1 — AW Handshake

The master asserts AWVALID with AWID, AWADDR, AWLEN=0, AWSIZE=3'b101 (32 bytes = 256 bits). The slave asserts AWREADY when it can accept a new transaction (outstanding ID table not full; in the listing below, whenever no write is in progress). At the handshake clock edge, the slave captures AWID and AWADDR into an internal buffer and records the ID as "pending write."

Phase 2 — W Handshake

Simultaneously (or in any order relative to AW), the master asserts WVALID with WDATA[255:0], WSTRB[31:0], and WLAST=1. The slave asserts WREADY when it can accept write data. At the handshake clock edge, WDATA and WSTRB are captured.

Phase 3 — HBM3 Request + B Response

Once both AW and W have been captured for the same transaction, the slave asserts o_req_valid with o_req_wr=1, o_req_addr, o_req_data, and o_req_mask (from WSTRB). It waits for i_req_ready then waits for i_wr_done from the HBM3 engine. On i_wr_done, it drives o_bvalid=1, o_bid=captured_AWID, and o_bresp=2'b00 (OKAY). The master completes by asserting i_bready.

| Step | Signals | Direction | Action |
|---|---|---|---|
| 1. AW handshake | AWVALID+AWREADY high | M→S | Slave captures AWID, AWADDR |
| 2. W handshake | WVALID+WREADY high | M→S | Slave captures WDATA, WSTRB |
| 3. HBM3 request | o_req_valid, o_req_wr | S→HBM3 | Internal write issued when step 1+2 done |
| 4. HBM3 done | i_wr_done | HBM3→S | Slave sees write complete |
| 5. B response | BVALID+BREADY high | S→M | Slave sends BID, BRESP=OKAY |

Read Transaction Flow

Read transactions are simpler than writes: there is no data channel from master to slave. The master sends an address on AR and then waits for data on R.

Phase 1 — AR Handshake

The master asserts ARVALID with ARID, ARADDR, ARLEN=0, ARSIZE=3'b101. The slave asserts ARREADY when it can accept a read (in the listing below, whenever no read is in progress). The slave captures ARID and ARADDR and immediately issues an HBM3 read request.

Phase 2 — HBM3 Read + R Response

The slave drives o_req_valid=1, o_req_wr=0, o_req_addr. It waits for i_req_ready, then waits for i_rd_valid and captures i_rd_data[255:0]. It then drives o_rvalid=1, o_rid=captured_ARID, o_rdata=i_rd_data, o_rresp=2'b00, o_rlast=1. The master completes by asserting i_rready.

| Step | Signals | Direction | Action |
|---|---|---|---|
| 1. AR handshake | ARVALID+ARREADY high | M→S | Slave captures ARID, ARADDR |
| 2. HBM3 request | o_req_valid, o_req_wr=0 | S→HBM3 | Read request issued |
| 3. HBM3 data | i_rd_valid, i_rd_data | HBM3→S | Slave captures read data |
| 4. R response | RVALID+RREADY high | S→M | Slave returns RID, RDATA, RLAST |

With ARLEN=0, exactly one R beat is returned per AR transaction, so RLAST is always 1 on the only data beat. This simplifies the read-data state machine considerably.

Outstanding Transaction Tracking

AXI4 allows multiple outstanding write transactions from the same master, each tagged with an AWID. The design targets up to 8 simultaneous in-flight writes (MAX_ID = 8). The ID table is a small direct-mapped memory indexed by AWID[2:0] (the low 3 bits of the full 8-bit AWID), so two outstanding writes whose IDs share those 3 bits cannot be held at the same time.

Write ID Table

| Entry Field | Width | Description |
|---|---|---|
| valid | 1 | This entry is in use (ID is outstanding) |
| id | 8 | Full AWID captured at AW handshake |
| aw_done | 1 | AW phase complete (address captured) |
| w_done | 1 | W phase complete (data captured) |
| addr | 34 | Captured AWADDR |
| data | 256 | Captured WDATA |
| mask | 32 | Captured WSTRB (byte enables) |

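When both aw_done and w_done are set, the entry is ready to issue an HBM3 request. A round-robin arbitration selects among multiple ready entries. On i_wr_done, the entry is freed and the B response is queued. The simplified listing later in this module does not instantiate this table: it keeps a single write in flight, with the captured ID held in r_awid.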

ID ordering: AXI4 requires that write responses (B) for transactions with the same AWID are returned in the order their addresses (AW) were accepted; responses for different IDs may return in any order. Our implementation satisfies this trivially by processing one write at a time through the HBM3 engine and returning B on i_wr_done. An implementation that lets writes complete out of order would need per-ID ordering logic such as a reorder buffer.
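The valid/id part of the write ID table described above could look roughly like the sketch below. It is one possible shape under stated assumptions, not the tutorial's implementation: the module name wr_id_table and all of its ports are hypothetical, the table is fixed at 8 entries indexed by AWID[2:0], and the aw_done/w_done/addr/data/mask fields are left out for brevity.

```verilog
// Sketch only: direct-mapped write ID table, 8 entries, indexed by AWID[2:0].
// Module and port names are hypothetical, not taken from hbm3_axi4_if.v.
module wr_id_table (
    input  wire       clk,
    input  wire       rst_n,
    input  wire       alloc,      // pulse on AW handshake
    input  wire [7:0] alloc_id,   // full AWID being accepted
    input  wire       free,       // pulse when the write has completed
    input  wire [2:0] free_idx,   // entry being released
    output wire       slot_busy,  // entry for alloc_id is already in use
    output wire [7:0] free_id     // stored AWID for free_idx (source for BID)
);
    reg [7:0] valid;              // one valid bit per entry
    reg [7:0] id_mem [0:7];       // full 8-bit AWID per entry

    wire [2:0] alloc_idx = alloc_id[2:0];

    assign slot_busy = valid[alloc_idx];
    assign free_id   = id_mem[free_idx];

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            valid <= 8'h00;
        end else begin
            if (free)
                valid[free_idx] <= 1'b0;
            // Allocation is refused while the slot is occupied, so an ID
            // that aliases an outstanding one has to wait.
            if (alloc && !slot_busy)
                valid[alloc_idx] <= 1'b1;
        end
    end

    // ID storage needs no reset; it is only meaningful while valid is set.
    always @(posedge clk) begin
        if (alloc && !slot_busy)
            id_mem[alloc_idx] <= alloc_id;
    end
endmodule
```

In a slave built around such a table, AWREADY would be held low while slot_busy is high for the incoming AWID, which is the price of indexing by only the low 3 bits.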

AXI4 Burst Translation for HBM3

The AWLEN and ARLEN fields define the AXI4 burst length as (AWLEN+1) beats. A value of 8'h00 (AWLEN=0) means 1 beat — a single data transfer. AWSIZE=3'b101 selects 2^5 = 32 bytes = 256 bits per beat.
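The arithmetic can be written down as a small combinational block. This is a sketch for illustration; the module name axi_burst_size and its port names are assumptions, not part of the tutorial's source.

```verilog
// Sketch: beats and total bytes of an AXI4 burst from AxLEN and AxSIZE.
// Hypothetical helper module.
module axi_burst_size (
    input  wire [7:0]  axlen,       // AWLEN or ARLEN (beats minus 1)
    input  wire [2:0]  axsize,      // AWSIZE or ARSIZE (log2 of bytes per beat)
    output wire [8:0]  beats,       // 1 to 256
    output wire [15:0] total_bytes  // beats * 2^axsize
);
    assign beats       = {1'b0, axlen} + 9'd1;
    assign total_bytes = {7'b0, beats} << axsize;
endmodule
```

For AWLEN=8'h00 and AWSIZE=3'b101 this gives 1 beat and 32 bytes, the single 256-bit transfer this module assumes.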

Why AWLEN=0 Maps Perfectly to HBM3

HBM3's fundamental access unit per pseudo-channel is BL4 (burst length 4) of 32-bit words = 128 bits. With two pseudo-channels, one HBM3 access delivers 256 bits. This is exactly one AXI4 beat at AWSIZE=5. The mapping is 1:1 — one AXI4 transaction, one HBM3 access, one WDATA/RDATA transfer.

| AXI4 Parameter | Value | Meaning |
|---|---|---|
| AWLEN / ARLEN | 8'h00 | 1 beat per burst |
| AWSIZE / ARSIZE | 3'b101 | 32 bytes = 256 bits per beat |
| AWBURST / ARBURST | 2'b01 | INCR (the burst type does not affect a 1-beat burst; WRAP is not legal at this length) |
| WDATA / RDATA | 256 bits | Matches 2 × HBM3 PC data width |
| WSTRB | 32 bits | Byte enables, mapped to o_req_mask |

A future "burst splitter" module will handle AWLEN > 0 by breaking the AXI4 burst into multiple consecutive HBM3 accesses with auto-incrementing addresses. This module keeps it simple and assumes AWLEN=0. The intended response to AWLEN != 0 is SLVERR, although the listing below does not yet check i_awlen.
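As a rough idea of what the address side of such a splitter might involve, here is a minimal sketch. Everything in it is an assumption: the module name burst_addr_gen, its ports, the restriction to INCR bursts, and the fixed 32-byte beat size. The future module may look quite different.

```verilog
// Sketch: address generator for splitting an INCR burst into 32-byte accesses.
// Hypothetical module; assumes AxSIZE = 3'b101 (32 bytes per beat).
module burst_addr_gen (
    input  wire        clk,
    input  wire        rst_n,
    input  wire        start,       // load a new burst (ignored while busy)
    input  wire [33:0] start_addr,
    input  wire [7:0]  axlen,       // beats minus 1
    input  wire        step,        // current sub-request was accepted
    output reg  [33:0] addr,        // address of the current sub-request
    output reg         busy,
    output wire        last         // current sub-request is the final one
);
    reg [7:0] remaining;            // beats still to issue after the current one

    assign last = busy && (remaining == 8'd0);

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            addr      <= 34'h0;
            remaining <= 8'd0;
            busy      <= 1'b0;
        end else if (!busy) begin
            if (start) begin
                addr      <= start_addr;
                remaining <= axlen;
                busy      <= 1'b1;
            end
        end else if (step) begin
            if (remaining == 8'd0) begin
                busy <= 1'b0;                 // burst fully issued
            end else begin
                addr      <= addr + 34'd32;   // next 32-byte access
                remaining <= remaining - 8'd1;
            end
        end
    end
endmodule
```

With axlen = 0 the generator issues exactly one access and then goes idle, so the single-beat case this module assumes falls out as the degenerate burst.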

AXI4 Response Codes

Both the B (write response) and R (read response) channels carry a 2-bit RESP field. Understanding these codes is essential for correct error handling in host software.

| RESP[1:0] | Name | Meaning in HBM3 context |
|---|---|---|
| 2'b00 | OKAY | Transaction completed successfully. HBM3 reported no error, or only a correctable ECC error. |
| 2'b01 | EXOKAY | Exclusive access success. Not used in this module (exclusive access not implemented). |
| 2'b10 | SLVERR | Slave error. Intended for an HBM3 uncorrectable ECC error, unsupported AWLEN (!=0), or internal timeout. |
| 2'b11 | DECERR | Decode error. Address outside the 34-bit HBM3 address space (should be caught by an upstream interconnect). |

In this module, BRESP and RRESP are always OKAY (2'b00): the HBM3 completion interface (i_wr_done, i_rd_valid) carries no error flag. Returning SLVERR requires adding an error status bit to that interface, a natural enhancement for production code.
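The response selection for such an enhancement could be as small as the sketch below. It assumes two things the current listing does not have: a registered copy of AWLEN taken at the AW handshake (awlen_q) and an error flag delivered by the HBM3 engine alongside i_wr_done (engine_err). Both names, and the module name bresp_select, are hypothetical.

```verilog
// Sketch: choosing BRESP from a captured AWLEN and an assumed engine error flag.
// Hypothetical helper; neither input exists in the current hbm3_axi4_if.v.
module bresp_select (
    input  wire [7:0] awlen_q,     // AWLEN captured at the AW handshake
    input  wire       engine_err,  // assumed error flag from the HBM3 engine
    output wire [1:0] bresp
);
    localparam [1:0]
        RESP_OKAY   = 2'b00,
        RESP_SLVERR = 2'b10;

    // Any non-zero burst length or engine error turns the response into SLVERR.
    assign bresp = ((awlen_q != 8'h00) || engine_err) ? RESP_SLVERR : RESP_OKAY;
endmodule
```

The same selection would apply to RRESP with ARLEN and a read-side error flag.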

AXI4 Write Transaction Waveform

The waveform shows a single AXI4 write: AW handshake at T0, W handshake at T1 (data arrives one cycle after address), HBM3 request issued at T2, i_wr_done at T4, B response at T5.

[Waveform: AWVALID, AWREADY, AWID/ADDR, WVALID, WREADY, WDATA, req_valid, req_ready, wr_done, BVALID and BREADY over T0 to T5. Annotations: ID=5, ADDR=0x1000, WDATA[255:0], WLAST=1.]

Full Verilog Source — hbm3_axi4_if.v

The module implements two parallel state machines: a write FSM (tracks AW + W → HBM3 request → B) and a read FSM (tracks AR → HBM3 request → R). Outstanding-write tracking is reduced to a single captured ID (r_awid) rather than a full ID table.

verilog
// hbm3_axi4_if.v — AXI4 Slave Interface for HBM3 Controller
// Phase 3 Module 12 · EcrioniX — https://ecrionix.org/hbm3-controller/axi4-if/

module hbm3_axi4_if #(
    parameter MAX_ID = 8   // outstanding write IDs (reserved; unused in this single-entry listing)
) (
    // AXI4 global
    input  wire         i_aclk,
    input  wire         i_aresetn,

    // AW channel (master → slave)
    input  wire [7:0]   i_awid,
    input  wire [33:0]  i_awaddr,
    input  wire [7:0]   i_awlen,
    input  wire         i_awvalid,
    output reg          o_awready,

    // W channel
    input  wire [255:0] i_wdata,
    input  wire [31:0]  i_wstrb,
    input  wire         i_wlast,
    input  wire         i_wvalid,
    output reg          o_wready,

    // B channel
    output reg  [7:0]   o_bid,
    output reg  [1:0]   o_bresp,
    output reg          o_bvalid,
    input  wire         i_bready,

    // AR channel
    input  wire [7:0]   i_arid,
    input  wire [33:0]  i_araddr,
    input  wire [7:0]   i_arlen,
    input  wire         i_arvalid,
    output reg          o_arready,

    // R channel
    output reg  [7:0]   o_rid,
    output reg  [255:0] o_rdata,
    output reg  [1:0]   o_rresp,
    output reg          o_rlast,
    output reg          o_rvalid,
    input  wire         i_rready,

    // To HBM3 engine
    output reg          o_req_valid,
    output reg          o_req_wr,
    output reg  [33:0]  o_req_addr,
    output reg  [255:0] o_req_data,
    output reg  [31:0]  o_req_mask,
    input  wire         i_req_ready,

    // From HBM3 engine
    input  wire [255:0] i_rd_data,
    input  wire         i_rd_valid,
    input  wire         i_wr_done
);

// ── RESP constants ──────────────────────────────────
localparam [1:0]
    RESP_OKAY   = 2'b00,
    RESP_SLVERR = 2'b10;

// ── FSM state encodings ─────────────────────────────
// AW/W arrival order is tracked by r_aw_got / r_w_got, so the write FSM
// needs no separate "waiting for AW" or "waiting for W" states.
localparam [1:0]
    WS_IDLE  = 2'b00,
    WS_REQ   = 2'b01,  // issuing HBM3 write request
    WS_WAIT  = 2'b10,  // waiting for i_wr_done
    WS_BRESP = 2'b11;  // B channel handshake

localparam [1:0]
    RS_IDLE  = 2'b00,
    RS_REQ   = 2'b01,  // issuing HBM3 read request
    RS_WAIT  = 2'b10,  // waiting for i_rd_valid
    RS_RRESP = 2'b11;  // R channel handshake

reg [1:0]   wr_state, rd_state;
reg [7:0]   r_awid,   r_arid;
reg [33:0]  r_awaddr, r_araddr;
reg [255:0] r_wdata;
reg [31:0]  r_wstrb;
reg         r_aw_got, r_w_got;
reg         wr_req_valid, rd_req_valid;  // per-FSM request flags, never both high

// ── Shared HBM3 request port ────────────────────────
// The o_req_* outputs are driven from this block only (the two FSMs must
// not both assign them). The FSMs keep wr_req_valid and rd_req_valid
// mutually exclusive, so the payload is stable while o_req_valid is high.
always @* begin
    o_req_valid = wr_req_valid | rd_req_valid;
    o_req_wr    = wr_req_valid;
    o_req_addr  = wr_req_valid ? r_awaddr : r_araddr;
    o_req_data  = wr_req_valid ? r_wdata  : 256'h0;
    o_req_mask  = wr_req_valid ? r_wstrb  : 32'hFFFF_FFFF;
end

// ── Write FSM ───────────────────────────────────────
always @(posedge i_aclk or negedge i_aresetn) begin
    if (!i_aresetn) begin
        wr_state     <= WS_IDLE;
        o_awready    <= 1'b1;
        o_wready     <= 1'b1;
        o_bvalid     <= 1'b0;
        o_bid        <= 8'h0;
        o_bresp      <= RESP_OKAY;
        wr_req_valid <= 1'b0;
        r_aw_got     <= 1'b0;
        r_w_got      <= 1'b0;
    end else begin

        // Accept AW, then close AWREADY until the current write is done
        if (i_awvalid && o_awready) begin
            r_awid    <= i_awid;
            r_awaddr  <= i_awaddr;
            r_aw_got  <= 1'b1;
            o_awready <= 1'b0;
        end

        // Accept W, then close WREADY until the current write is done
        if (i_wvalid && o_wready) begin
            r_wdata  <= i_wdata;
            r_wstrb  <= i_wstrb;
            r_w_got  <= 1'b1;
            o_wready <= 1'b0;
        end

        case (wr_state)
            WS_IDLE: begin
                if (r_aw_got && r_w_got)
                    wr_state <= WS_REQ;
            end

            WS_REQ: begin
                if (!wr_req_valid) begin
                    // Wait for any read request already on the port
                    if (!rd_req_valid)
                        wr_req_valid <= 1'b1;
                end else if (i_req_ready) begin
                    // Handshake counts only once the request is visible
                    wr_req_valid <= 1'b0;
                    wr_state     <= WS_WAIT;
                end
            end

            WS_WAIT: begin
                if (i_wr_done) begin
                    o_bvalid <= 1'b1;
                    o_bid    <= r_awid;
                    o_bresp  <= RESP_OKAY;
                    wr_state <= WS_BRESP;
                end
            end

            WS_BRESP: begin
                if (i_bready) begin
                    o_bvalid  <= 1'b0;
                    r_aw_got  <= 1'b0;
                    r_w_got   <= 1'b0;
                    o_awready <= 1'b1;
                    o_wready  <= 1'b1;
                    wr_state  <= WS_IDLE;
                end
            end

            default: wr_state <= WS_IDLE;
        endcase
    end
end

// ── Read FSM ────────────────────────────────────────
always @(posedge i_aclk or negedge i_aresetn) begin
    if (!i_aresetn) begin
        rd_state     <= RS_IDLE;
        o_arready    <= 1'b1;
        o_rvalid     <= 1'b0;
        o_rlast      <= 1'b0;
        o_rid        <= 8'h0;
        o_rdata      <= 256'h0;
        o_rresp      <= RESP_OKAY;
        rd_req_valid <= 1'b0;
    end else begin
        case (rd_state)
            RS_IDLE: begin
                if (i_arvalid && o_arready) begin
                    r_arid    <= i_arid;
                    r_araddr  <= i_araddr;
                    o_arready <= 1'b0;
                    rd_state  <= RS_REQ;
                end
            end

            RS_REQ: begin
                if (!rd_req_valid) begin
                    // Writes win ties: hold off while the write FSM is in WS_REQ
                    if (wr_state != WS_REQ)
                        rd_req_valid <= 1'b1;
                end else if (i_req_ready) begin
                    rd_req_valid <= 1'b0;
                    rd_state     <= RS_WAIT;
                end
            end

            RS_WAIT: begin
                if (i_rd_valid) begin
                    o_rdata  <= i_rd_data;
                    o_rid    <= r_arid;
                    o_rresp  <= RESP_OKAY;
                    o_rlast  <= 1'b1;
                    o_rvalid <= 1'b1;
                    rd_state <= RS_RRESP;
                end
            end

            RS_RRESP: begin
                if (i_rready) begin
                    o_rvalid  <= 1'b0;
                    o_rlast   <= 1'b0;
                    o_arready <= 1'b1;
                    rd_state  <= RS_IDLE;
                end
            end

            default: rd_state <= RS_IDLE;
        endcase
    end
end

endmodule

SystemVerilog Testbench — Write + Read + Back-to-Back

systemverilog
// tb_hbm3_axi4_if.sv — AXI4 interface testbench
`timescale 1ns/1ps

module tb_hbm3_axi4_if;

// ── DUT signals ─────────────────────────────────────
logic         i_aclk, i_aresetn;
logic [7:0]   i_awid;
logic [33:0]  i_awaddr;
logic [7:0]   i_awlen;
logic         i_awvalid;
logic         o_awready;
logic [255:0] i_wdata;
logic [31:0]  i_wstrb;
logic         i_wlast, i_wvalid;
logic         o_wready;
logic [7:0]   o_bid;
logic [1:0]   o_bresp;
logic         o_bvalid;
logic         i_bready;
logic [7:0]   i_arid;
logic [33:0]  i_araddr;
logic [7:0]   i_arlen;
logic         i_arvalid;
logic         o_arready;
logic [7:0]   o_rid;
logic [255:0] o_rdata;
logic [1:0]   o_rresp;
logic         o_rlast, o_rvalid;
logic         i_rready;
// HBM3 engine side
logic         o_req_valid, o_req_wr;
logic [33:0]  o_req_addr;
logic [255:0] o_req_data;
logic [31:0]  o_req_mask;
logic         i_req_ready;
logic [255:0] i_rd_data;
logic         i_rd_valid, i_wr_done;

hbm3_axi4_if dut (.*);

// ── Clock 500 MHz ────────────────────────────────────
initial i_aclk = 0;
always #1 i_aclk = ~i_aclk;

// ── SVA: BVALID must deassert after BREADY ───────────
assert property (@(posedge i_aclk) disable iff (!i_aresetn)
    (o_bvalid && i_bready) |=> !o_bvalid)
else $error("BVALID not deasserted after BREADY at %0t", $time);

// ── SVA: RVALID must deassert after RREADY ───────────
assert property (@(posedge i_aclk) disable iff (!i_aresetn)
    (o_rvalid && i_rready) |=> !o_rvalid)
else $error("RVALID not deasserted after RREADY at %0t", $time);

// ── SVA: AWVALID must stay asserted until handshake ──
assert property (@(posedge i_aclk) disable iff (!i_aresetn)
    (i_awvalid && !o_awready) |=> i_awvalid)
else $error("Master dropped AWVALID before handshake at %0t", $time);

// ── HBM3 model: accept request after 1 cycle ─────────
initial i_req_ready = 0;
always @(posedge i_aclk) begin
    i_req_ready <= o_req_valid;  // 1-cycle latency accept
end

// ── HBM3 model: return write done / read data ────────
always @(posedge i_aclk) begin
    i_wr_done  <= 1'b0;
    i_rd_valid <= 1'b0;
    i_rd_data  <= 256'h0;
    if (i_req_ready && o_req_valid) begin
        if (o_req_wr) begin
            repeat(3) @(posedge i_aclk);
            i_wr_done <= 1'b1;
        end else begin
            repeat(5) @(posedge i_aclk);
            // A number literal cannot span lines, so concatenate two halves
            i_rd_data  <= {128'hDEADBEEF_CAFEBABE_12345678_ABCDEF01,
                           128'hFEEDFACE_BAADF00D_DEADC0DE_0BADF00D};
            i_rd_valid <= 1'b1;
        end
    end
end

task automatic axi4_write(
    input [7:0]   id,
    input [33:0]  addr,
    input [255:0] data
);
    // Nonblocking drives: the DUT samples the new values on the next
    // clock edge, so there is no race with its always blocks.
    // AW channel
    @(posedge i_aclk);
    i_awid    <= id;
    i_awaddr  <= addr;
    i_awlen   <= 8'h00;
    i_awvalid <= 1'b1;
    // W channel (same cycle)
    i_wdata   <= data;
    i_wstrb   <= 32'hFFFF_FFFF;
    i_wlast   <= 1'b1;
    i_wvalid  <= 1'b1;
    @(posedge i_aclk iff (o_awready && o_wready));
    i_awvalid <= 1'b0;
    i_wvalid  <= 1'b0;
    // B channel
    i_bready  <= 1'b1;
    @(posedge i_aclk iff o_bvalid);
    $display("[%0t] WRITE DONE id=%0h addr=%0h bresp=%0b",
             $time, o_bid, addr, o_bresp);
    @(posedge i_aclk);
    i_bready  <= 1'b0;
endtask

task automatic axi4_read(
    input [7:0]  id,
    input [33:0] addr
);
    // Nonblocking drives, for the same reason as in axi4_write
    @(posedge i_aclk);
    i_arid    <= id;
    i_araddr  <= addr;
    i_arlen   <= 8'h00;
    i_arvalid <= 1'b1;
    @(posedge i_aclk iff o_arready);
    i_arvalid <= 1'b0;
    i_rready  <= 1'b1;
    @(posedge i_aclk iff o_rvalid);
    $display("[%0t] READ DONE id=%0h data=%0h rresp=%0b",
             $time, o_rid, o_rdata, o_rresp);
    @(posedge i_aclk);
    i_rready  <= 1'b0;
endtask

initial begin
    {i_awvalid,i_wvalid,i_arvalid,i_bready,i_rready} = 5'h0;
    i_aresetn = 0;
    repeat(4) @(posedge i_aclk);
    i_aresetn = 1;
    repeat(2) @(posedge i_aclk);

    $display("=== TEST 1: AXI4 Write ===");
    axi4_write(8'h05, 34'h1000, 256'hA5A5A5A5);

    repeat(2) @(posedge i_aclk);

    $display("=== TEST 2: AXI4 Read ===");
    axi4_read(8'h07, 34'h1000);

    repeat(2) @(posedge i_aclk);

    $display("=== TEST 3: Back-to-back Write then Read ===");
    axi4_write(8'h01, 34'h2000, 256'hDEADBEEF);
    axi4_read (8'h02, 34'h2000);

    repeat(4) @(posedge i_aclk);
    $display("=== ALL AXI4 TESTS PASSED ===");
    $finish;
end
endmodule

AXI4 Signal Reference Table

| Signal | Channel | Dir (at slave) | Width | Description |
|---|---|---|---|---|
| i_awid | AW | In | 8 | Write address ID tag |
| i_awaddr | AW | In | 34 | Write address (byte-addressed, 16 GB HBM3 space) |
| i_awlen | AW | In | 8 | Burst length minus 1 (0=single beat) |
| i_awvalid | AW | In | 1 | AW channel valid |
| o_awready | AW | Out | 1 | AW channel ready (slave can accept) |
| i_wdata | W | In | 256 | Write data (2 pseudo-channels × 128 bits) |
| i_wstrb | W | In | 32 | Write byte enables (1 bit per byte) |
| i_wlast | W | In | 1 | Last beat of burst (always 1 for AWLEN=0) |
| i_wvalid | W | In | 1 | W channel valid |
| o_wready | W | Out | 1 | W channel ready |
| o_bid | B | Out | 8 | Write response ID (echoes AWID) |
| o_bresp | B | Out | 2 | Write response status (00=OKAY, 10=SLVERR) |
| o_bvalid | B | Out | 1 | B channel valid |
| i_bready | B | In | 1 | Master ready to accept B response |
| i_arid | AR | In | 8 | Read address ID tag |
| i_araddr | AR | In | 34 | Read address |
| i_arlen | AR | In | 8 | Burst length minus 1 |
| i_arvalid | AR | In | 1 | AR channel valid |
| o_arready | AR | Out | 1 | AR channel ready |
| o_rid | R | Out | 8 | Read data ID (echoes ARID) |
| o_rdata | R | Out | 256 | Read data from HBM3 |
| o_rresp | R | Out | 2 | Read response status |
| o_rlast | R | Out | 1 | Last data beat (always 1 for ARLEN=0) |
| o_rvalid | R | Out | 1 | R channel valid |
| i_rready | R | In | 1 | Master ready to accept read data |

FAQ

What are the five AXI4 channels and what does each carry?

AXI4 has five independent channels, each with its own VALID/READY handshake: AW (Write Address: ID, address, burst length/size/type), W (Write Data: 256-bit data, 32 byte enables, WLAST), B (Write Response: BID, BRESP status), AR (Read Address: same fields as AW), and R (Read Data: RID, RDATA, RRESP, RLAST). The channels are fully independent — data may flow before or after the address.

Why does the AXI4 slave need to track outstanding transaction IDs?

AXI4 allows multiple in-flight transactions identified by ID. The slave must capture the AWID/ARID at the handshake and return the same value in BID/RID on the response channel. Without ID tracking, a master issuing two writes simultaneously would not know which response corresponds to which request. Our module stores AWID in r_awid and echoes it as o_bid after i_wr_done.

What is the AWREADY / ARREADY handshake rule in AXI4?

A transfer occurs on the rising edge where both VALID and READY are simultaneously high. The master holds AWVALID (and address signals) stable until it sees AWREADY. The slave asserts AWREADY when it can accept a new transaction. The slave may assert AWREADY before AWVALID as a "pre-ready" for zero-wait-state operation — our module does this on reset to allow immediate acceptance of the first transaction.

What does BRESP=2'b10 (SLVERR) mean in the AXI4 + HBM3 context?

SLVERR signals that the slave accepted the transaction but encountered an error during execution. In an HBM3 context this includes: HBM3 ECC uncorrectable error, write to a locked/protected region, unsupported AWLEN (non-zero burst), or internal timeout waiting for i_wr_done. The master's error handler must decide whether to retry, abort, or report the fault. Our current module always returns OKAY — SLVERR is the first production enhancement.

Why is AWLEN=0 used for HBM3 in this module?

HBM3's natural access unit is BL4 per pseudo-channel: 4 × 32-bit = 128 bits per PC × 2 PCs = 256 bits total. This fits exactly in one AXI4 beat at AWSIZE=5 (32 bytes). AWLEN=0 means a 1-beat burst, creating a clean 1:1 mapping between AXI4 transactions and HBM3 accesses. Longer bursts (AWLEN > 0) would require a burst splitter that issues multiple sequential HBM3 requests — a logical extension for a future module.