AXI4 is the de facto on-chip bus on most SoCs. This module builds the AXI4 slave that bridges an industry-standard host bus to the internal HBM3 request engine. It implements the five-channel handshake, outstanding-ID tracking, write and read state machines, and BRESP/RRESP generation from HBM3 completion signals.
AXI4 (Advanced eXtensible Interface 4) is part of the ARM AMBA specification. It defines a point-to-point interface between one master and one slave, built from five independent channels, each with its own VALID/READY handshake. This independence is the key architectural feature: a master may, for example, issue several read addresses before the first read data returns, which enables deep pipelining when the slave can accept them.
On each channel, a transfer occurs on the rising clock edge where both VALID and READY are high. The source asserts VALID and must hold it, along with the payload signals, stable until the destination asserts READY; once asserted, VALID may not be withdrawn before the handshake completes. The source must also not wait for READY before asserting VALID. READY carries no such obligation: the destination may assert it before VALID arrives (a "pre-ready" optimization) and may deassert it again while VALID is low.
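The source-side rules above can be written down as assertions. The following is a minimal sketch and not part of the module: the checker name `vr_handshake_chk`, its ports, and the `PAYLOAD_W` parameter are assumptions made for illustration.

```systemverilog
// Hypothetical checker for one VALID/READY channel (illustrative names only).
module vr_handshake_chk #(
  parameter int PAYLOAD_W = 8
) (
  input logic                 clk,
  input logic                 rst_n,
  input logic                 valid,
  input logic                 ready,
  input logic [PAYLOAD_W-1:0] payload
);
  // If VALID is high without READY, VALID must still be high on the next edge.
  a_valid_held: assert property (@(posedge clk) disable iff (!rst_n)
    (valid && !ready) |=> valid)
    else $error("VALID dropped before handshake at %0t", $time);

  // While the source waits for READY, the payload must not change.
  a_payload_stable: assert property (@(posedge clk) disable iff (!rst_n)
    (valid && !ready) |=> $stable(payload))
    else $error("payload changed before handshake at %0t", $time);
endmodule
```

One instance per channel would be enough. For the AW channel, for example, `valid` could be tied to `i_awvalid`, `ready` to `o_awready`, and `payload` to `{i_awid, i_awaddr}` with `PAYLOAD_W` set to 42.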
A complete AXI4 write transaction involves three phases: address handshake (AW), data handshake (W), and response (B). The slave must track all three and not issue the HBM3 write until both address and data are in hand.
The master asserts AWVALID with AWID, AWADDR, AWLEN=0, AWSIZE=3'b101 (32 bytes = 256 bits). The slave asserts AWREADY when it can accept a new transaction (outstanding ID table not full). At the handshake clock edge, the slave captures AWID and AWADDR into an internal buffer and records the ID as "pending write."
Simultaneously (or in any order relative to AW), the master asserts WVALID with WDATA[255:0], WSTRB[31:0], and WLAST=1. The slave asserts WREADY when it can accept write data. At the handshake clock edge, WDATA and WSTRB are captured.
Once both AW and W have been captured for the same transaction, the slave asserts o_req_valid with o_req_wr=1, o_req_addr, o_req_data, and o_req_mask (from WSTRB). It waits for i_req_ready then waits for i_wr_done from the HBM3 engine. On i_wr_done, it drives o_bvalid=1, o_bid=captured_AWID, and o_bresp=2'b00 (OKAY). The master completes by asserting i_bready.
| Step | Signals | Direction | Action |
|---|---|---|---|
| 1. AW handshake | AWVALID+AWREADY high | M→S | Slave captures AWID, AWADDR |
| 2. W handshake | WVALID+WREADY high | M→S | Slave captures WDATA, WSTRB |
| 3. HBM3 request | o_req_valid, o_req_wr | S→HBM3 | Internal write issued when step 1+2 done |
| 4. HBM3 done | i_wr_done | HBM3→S | Slave sees write complete |
| 5. B response | BVALID+BREADY high | S→M | Slave sends BID, BRESP=OKAY |
Read transactions are simpler than writes: there is no data channel from master to slave. The master sends an address on AR and then waits for data on R.
The master asserts ARVALID with ARID, ARADDR, ARLEN=0, ARSIZE=3'b101. The slave asserts ARREADY when it can accept a read (internal read queue not full). The slave captures ARID and ARADDR and immediately issues an HBM3 read request.
The slave drives o_req_valid=1, o_req_wr=0, o_req_addr. It waits for i_req_ready, then waits for i_rd_valid and captures i_rd_data[255:0]. It then drives o_rvalid=1, o_rid=captured_ARID, o_rdata=i_rd_data, o_rresp=2'b00, o_rlast=1. The master completes by asserting i_rready.
| Step | Signals | Direction | Action |
|---|---|---|---|
| 1. AR handshake | ARVALID+ARREADY high | M→S | Slave captures ARID, ARADDR |
| 2. HBM3 request | o_req_valid, o_req_wr=0 | S→HBM3 | Read request issued |
| 3. HBM3 data | i_rd_valid, i_rd_data | HBM3→S | Slave captures read data |
| 4. R response | RVALID+RREADY high | S→M | Slave returns RID, RDATA, RLAST |
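AXI4 allows a master to have multiple write transactions outstanding, tagged by AWID so that responses can be matched to requests. The design target is up to 8 in-flight writes (parameter MAX_ID), tracked in a small direct-mapped table indexed by AWID[2:0] (the low 3 bits of the full 8-bit AWID). Two outstanding writes whose IDs share the same low 3 bits map to the same entry, so the second has to wait until the first entry is freed. The listing later in this section is simpler: it handles one write at a time, holds the ID in r_awid, and leaves MAX_ID unused.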
| Entry Field | Width | Description |
|---|---|---|
| valid | 1 | This entry is in use (ID is outstanding) |
| id | 8 | Full AWID captured at AW handshake |
| aw_done | 1 | AW phase complete (address captured) |
| w_done | 1 | W phase complete (data captured) |
| addr | 34 | Captured AWADDR |
| data | 256 | Captured WDATA |
| mask | 32 | Captured WSTRB (byte enables) |
When both aw_done and w_done are set, the entry is ready to issue an HBM3 request. A round-robin arbitration selects among multiple ready entries. On i_wr_done, the entry is freed and the B response is queued.
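The round-robin selection described above could look roughly like the sketch below. It is not taken from the module: `wr_id_table_pick`, its ports, and the `rr_ptr` scheme are hypothetical, and the table depth `N` is assumed to be a power of two so that the index wraps naturally.

```systemverilog
// Sketch only: picks one ready table entry per issue, round-robin.
module wr_id_table_pick #(
  parameter int N = 8                        // table depth (assumed power of two)
) (
  input  logic                 clk,
  input  logic                 rst_n,
  input  logic [N-1:0]         entry_ready,  // per entry: valid && aw_done && w_done
  input  logic                 issue,        // picked entry was sent to the HBM3 engine
  output logic                 pick_valid,
  output logic [$clog2(N)-1:0] pick_idx
);
  localparam int IDX_W = $clog2(N);

  logic [IDX_W-1:0] rr_ptr;  // entry that gets first look in the current scan
  logic [IDX_W-1:0] idx;     // scratch index used by the scan

  // Scan all N entries starting at rr_ptr; the first ready entry wins.
  always_comb begin
    pick_valid = 1'b0;
    pick_idx   = '0;
    idx        = '0;
    for (int i = 0; i < N; i++) begin
      idx = rr_ptr + IDX_W'(i);              // wraps modulo N
      if (!pick_valid && entry_ready[idx]) begin
        pick_valid = 1'b1;
        pick_idx   = idx;
      end
    end
  end

  // After an issue, the next scan starts just past the winner.
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n)
      rr_ptr <= '0;
    else if (issue && pick_valid)
      rr_ptr <= pick_idx + 1'b1;             // wraps modulo N
  end
endmodule
```

Moving the pointer past the winner means an entry that has just been served is looked at last on the next scan, so no ready entry can be starved by its neighbours.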
The AWLEN and ARLEN fields define the AXI4 burst length as (AWLEN+1) beats. A value of 8'h00 (AWLEN=0) means 1 beat — a single data transfer. AWSIZE=3'b101 selects 2^5 = 32 bytes = 256 bits per beat.
HBM3's fundamental access unit per pseudo-channel is BL4 (burst length 4) of 32-bit words = 128 bits. With two pseudo-channels, one HBM3 access delivers 256 bits. This is exactly one AXI4 beat at AWSIZE=5. The mapping is 1:1 — one AXI4 transaction, one HBM3 access, one WDATA/RDATA transfer.
| AXI4 Parameter | Value | Meaning |
|---|---|---|
| AWLEN / ARLEN | 8'h00 | 1 beat per burst |
| AWSIZE / ARSIZE | 3'b101 | 32 bytes = 256 bits per beat |
| AWBURST / ARBURST | 2'b01 | INCR (assumed; for a 1-beat burst FIXED and INCR address identically, and the listing has no AWBURST/ARBURST port) |
| WDATA / RDATA | 256 bits | Matches 2 × HBM3 PC data width |
| WSTRB | 32 bits | Byte enables, mapped to o_req_mask |
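As a worked check of the numbers in the table, the size of one transaction follows directly from the burst fields, and the HBM3 side uses the per-pseudo-channel figures stated above:

```latex
\begin{aligned}
\text{bytes per beat} &= 2^{\text{AWSIZE}} = 2^{5} = 32 \\
\text{beats per burst} &= \text{AWLEN} + 1 = 0 + 1 = 1 \\
\text{bits per AXI4 transaction} &= 1 \times 32 \times 8 = 256 \\
\text{bits per HBM3 access} &= 2\ \text{PCs} \times 4 \times 32 = 256
\end{aligned}
```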
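A future "burst splitter" module will handle AWLEN > 0 by breaking the AXI4 burst into multiple consecutive HBM3 accesses with auto-incrementing addresses. This module keeps it simple and assumes AWLEN=0. The intended behavior for AWLEN != 0 is an SLVERR response, but the listing in this section does not yet check i_awlen or i_arlen and always responds OKAY.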
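As a rough illustration of the address stepping such a splitter would need, here is a minimal sketch. It is not the future module: `burst_addr_gen` and all of its ports are hypothetical, and only INCR bursts of 32-byte beats are assumed.

```systemverilog
// Sketch only: walks the beat addresses of one INCR burst, 32 bytes per beat.
module burst_addr_gen (
  input  logic        clk,
  input  logic        rst_n,
  input  logic        start,      // load a new burst (ignored while busy)
  input  logic [33:0] base_addr,  // address of the first beat
  input  logic [7:0]  len,        // AxLEN, i.e. number of beats minus 1
  input  logic        advance,    // current beat was accepted by the HBM3 engine
  output logic        busy,
  output logic [33:0] beat_addr,
  output logic        last        // current beat is the final one of the burst
);
  logic [7:0] beats_left;         // beats remaining after the current one

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      busy       <= 1'b0;
      beat_addr  <= 34'h0;
      beats_left <= 8'h00;
    end else if (start && !busy) begin
      busy       <= 1'b1;
      beat_addr  <= base_addr;
      beats_left <= len;
    end else if (busy && advance) begin
      if (beats_left == 8'h00) begin
        busy <= 1'b0;                          // final beat done
      end else begin
        beat_addr  <= beat_addr + 34'd32;      // next 256-bit access
        beats_left <= beats_left - 1'b1;
      end
    end
  end

  assign last = busy && (beats_left == 8'h00);
endmodule
```

With `len` = 3, the generator would present `base_addr`, `base_addr + 32`, `base_addr + 64`, and `base_addr + 96`, with `last` high on the fourth beat.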
Both the B (write response) and R (read response) channels carry a 2-bit RESP field. Understanding these codes is essential for correct error handling in host software.
| RESP[1:0] | Name | Meaning in HBM3 context |
|---|---|---|
| 2'b00 | OKAY | Transaction completed successfully: HBM3 reported either no error or a correctable ECC error. |
| 2'b01 | EXOKAY | Exclusive access success. Not used in this module (exclusive access not implemented). |
| 2'b10 | SLVERR | Slave error. Intended for an HBM3 uncorrectable ECC error, unsupported AWLEN (!=0), or internal timeout. |
| 2'b11 | DECERR | Decode error. Address outside the 34-bit HBM3 address space (should be caught by an upstream interconnect). |
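The OKAY/SLVERR choice in the table can be sketched as a small piece of combinational logic. This is an assumption-laden illustration rather than the module's code: `resp_select`, `err_ecc_uncorr`, and `err_timeout` are hypothetical names, and the source does not say how the HBM3 engine reports these errors.

```systemverilog
// Sketch only: selects the 2-bit RESP value for a B or R response.
module resp_select (
  input  logic [7:0] axlen,           // AWLEN or ARLEN captured at the address handshake
  input  logic       err_ecc_uncorr,  // assumed flag: uncorrectable ECC error
  input  logic       err_timeout,     // assumed flag: internal watchdog expired
  output logic [1:0] resp
);
  localparam logic [1:0] RESP_OKAY   = 2'b00;
  localparam logic [1:0] RESP_SLVERR = 2'b10;

  always_comb begin
    // Any unsupported burst length or reported error turns the response into SLVERR.
    if (axlen != 8'h00 || err_ecc_uncorr || err_timeout)
      resp = RESP_SLVERR;
    else
      resp = RESP_OKAY;
  end
endmodule
```

DECERR is left out on purpose, since the table assigns out-of-range addresses to the upstream interconnect.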
The waveform shows a single AXI4 write: AW handshake at T0, W handshake at T1 (data arrives one cycle after address), HBM3 request issued at T2, i_wr_done at T4, B response at T5.
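The listing implements two state machines that run in parallel: a write FSM (AW + W → HBM3 request → B) and a read FSM (AR → HBM3 request → R), which share the o_req_* request port. Each handles one transaction at a time; the ID is held in a single register (r_awid, r_arid) rather than in the ID table described earlier.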
```verilog
// hbm3_axi4_if.v - AXI4 Slave Interface for HBM3 Controller
// Phase 3 Module 12 · EcrioniX - https://ecrionix.org/hbm3-controller/axi4-if/
module hbm3_axi4_if #(
  parameter MAX_ID = 8  // outstanding write IDs (reserved; unused in this version)
) (
  // AXI4 global
  input  wire         i_aclk,
  input  wire         i_aresetn,
  // AW channel (master → slave)
  input  wire [7:0]   i_awid,
  input  wire [33:0]  i_awaddr,
  input  wire [7:0]   i_awlen,     // not checked here; AWLEN=0 is assumed
  input  wire         i_awvalid,
  output reg          o_awready,
  // W channel
  input  wire [255:0] i_wdata,
  input  wire [31:0]  i_wstrb,
  input  wire         i_wlast,     // not checked here; always 1 for AWLEN=0
  input  wire         i_wvalid,
  output reg          o_wready,
  // B channel
  output reg  [7:0]   o_bid,
  output reg  [1:0]   o_bresp,
  output reg          o_bvalid,
  input  wire         i_bready,
  // AR channel
  input  wire [7:0]   i_arid,
  input  wire [33:0]  i_araddr,
  input  wire [7:0]   i_arlen,     // not checked here; ARLEN=0 is assumed
  input  wire         i_arvalid,
  output reg          o_arready,
  // R channel
  output reg  [7:0]   o_rid,
  output reg  [255:0] o_rdata,
  output reg  [1:0]   o_rresp,
  output reg          o_rlast,
  output reg          o_rvalid,
  input  wire         i_rready,
  // To HBM3 engine
  output reg          o_req_valid,
  output reg          o_req_wr,
  output reg  [33:0]  o_req_addr,
  output reg  [255:0] o_req_data,
  output reg  [31:0]  o_req_mask,
  input  wire         i_req_ready,
  // From HBM3 engine
  input  wire [255:0] i_rd_data,
  input  wire         i_rd_valid,
  input  wire         i_wr_done
);

  // ── RESP constants ──────────────────────────────────
  localparam [1:0]
    RESP_OKAY   = 2'b00,
    RESP_SLVERR = 2'b10;  // declared for later use; never driven in this version

  // ── Write FSM states ────────────────────────────────
  localparam [2:0]
    WS_IDLE  = 3'b000,
    WS_AW    = 3'b001,  // waiting for AW (W already arrived); unused, see r_aw_got
    WS_W     = 3'b010,  // waiting for W (AW already arrived); unused, see r_w_got
    WS_REQ   = 3'b011,  // issuing HBM3 write request
    WS_WAIT  = 3'b100,  // waiting for i_wr_done
    WS_BRESP = 3'b101;  // B channel handshake

  // ── Read FSM states ─────────────────────────────────
  localparam [1:0]
    RS_IDLE  = 2'b00,
    RS_REQ   = 2'b01,   // issuing HBM3 read request
    RS_WAIT  = 2'b10,   // waiting for i_rd_valid
    RS_RRESP = 2'b11;   // R channel handshake

  reg [2:0]   wr_state;
  reg [1:0]   rd_state;
  reg [7:0]   r_awid;
  reg [33:0]  r_awaddr;
  reg [255:0] r_wdata;
  reg [31:0]  r_wstrb;
  reg         r_aw_got, r_w_got;
  reg [7:0]   r_arid;
  reg [33:0]  r_araddr;

  // A request is accepted on the edge where o_req_valid and i_req_ready are both high.
  wire wr_req_acc = o_req_valid &&  o_req_wr && i_req_ready;
  wire rd_req_acc = o_req_valid && !o_req_wr && i_req_ready;

  // ── Write FSM ───────────────────────────────────────
  always @(posedge i_aclk or negedge i_aresetn) begin
    if (!i_aresetn) begin
      wr_state  <= WS_IDLE;
      o_awready <= 1'b1;
      o_wready  <= 1'b1;
      o_bvalid  <= 1'b0;
      o_bid     <= 8'h0;
      o_bresp   <= RESP_OKAY;
      r_aw_got  <= 1'b0;
      r_w_got   <= 1'b0;
    end else begin
      // Accept AW whenever AWREADY is open (AW and W may arrive in either order)
      if (i_awvalid && o_awready) begin
        r_awid    <= i_awid;
        r_awaddr  <= i_awaddr;
        r_aw_got  <= 1'b1;
        o_awready <= 1'b0;  // close until current txn done
      end
      // Accept W whenever WREADY is open
      if (i_wvalid && o_wready) begin
        r_wdata  <= i_wdata;
        r_wstrb  <= i_wstrb;
        r_w_got  <= 1'b1;
        o_wready <= 1'b0;
      end
      case (wr_state)
        WS_IDLE: begin
          if (r_aw_got && r_w_got) wr_state <= WS_REQ;
        end
        WS_REQ: begin
          // The request port block below drives o_req_*; move on once accepted
          if (wr_req_acc) wr_state <= WS_WAIT;
        end
        WS_WAIT: begin
          if (i_wr_done) begin
            o_bvalid <= 1'b1;
            o_bid    <= r_awid;
            o_bresp  <= RESP_OKAY;
            wr_state <= WS_BRESP;
          end
        end
        WS_BRESP: begin
          if (i_bready) begin
            o_bvalid  <= 1'b0;
            r_aw_got  <= 1'b0;
            r_w_got   <= 1'b0;
            o_awready <= 1'b1;
            o_wready  <= 1'b1;
            wr_state  <= WS_IDLE;
          end
        end
        default: wr_state <= WS_IDLE;
      endcase
    end
  end

  // ── Read FSM ────────────────────────────────────────
  always @(posedge i_aclk or negedge i_aresetn) begin
    if (!i_aresetn) begin
      rd_state  <= RS_IDLE;
      o_arready <= 1'b1;
      o_rvalid  <= 1'b0;
      o_rlast   <= 1'b0;
      o_rid     <= 8'h0;
      o_rdata   <= 256'h0;
      o_rresp   <= RESP_OKAY;
    end else begin
      case (rd_state)
        RS_IDLE: begin
          if (i_arvalid && o_arready) begin
            r_arid    <= i_arid;
            r_araddr  <= i_araddr;
            o_arready <= 1'b0;
            rd_state  <= RS_REQ;
          end
        end
        RS_REQ: begin
          if (rd_req_acc) rd_state <= RS_WAIT;
        end
        RS_WAIT: begin
          if (i_rd_valid) begin
            o_rdata  <= i_rd_data;
            o_rid    <= r_arid;
            o_rresp  <= RESP_OKAY;
            o_rlast  <= 1'b1;
            o_rvalid <= 1'b1;
            rd_state <= RS_RRESP;
          end
        end
        RS_RRESP: begin
          if (i_rready) begin
            o_rvalid  <= 1'b0;
            o_rlast   <= 1'b0;
            o_arready <= 1'b1;
            rd_state  <= RS_IDLE;
          end
        end
        default: rd_state <= RS_IDLE;
      endcase
    end
  end

  // ── HBM3 request port ───────────────────────────────
  // Single driver for o_req_*: both FSMs used to assign these registers from
  // separate always blocks, which is a multiple-driver error in synthesis.
  // A pending write wins over a pending read; the payload is held until accepted.
  always @(posedge i_aclk or negedge i_aresetn) begin
    if (!i_aresetn) begin
      o_req_valid <= 1'b0;
      o_req_wr    <= 1'b0;
      o_req_addr  <= 34'h0;
      o_req_data  <= 256'h0;
      o_req_mask  <= 32'h0;
    end else if (o_req_valid) begin
      if (i_req_ready) o_req_valid <= 1'b0;  // handshake done
    end else if (wr_state == WS_REQ) begin
      o_req_valid <= 1'b1;
      o_req_wr    <= 1'b1;
      o_req_addr  <= r_awaddr;
      o_req_data  <= r_wdata;
      o_req_mask  <= r_wstrb;
    end else if (rd_state == RS_REQ) begin
      o_req_valid <= 1'b1;
      o_req_wr    <= 1'b0;
      o_req_addr  <= r_araddr;
      o_req_data  <= 256'h0;
      o_req_mask  <= 32'hFFFF_FFFF;
    end
  end

endmodule
```
```systemverilog
// tb_hbm3_axi4_if.sv - AXI4 interface testbench
`timescale 1ns/1ps
module tb_hbm3_axi4_if;

  // ── DUT signals ─────────────────────────────────────
  logic         i_aclk, i_aresetn;
  logic [7:0]   i_awid;
  logic [33:0]  i_awaddr;
  logic [7:0]   i_awlen;
  logic         i_awvalid;
  logic         o_awready;
  logic [255:0] i_wdata;
  logic [31:0]  i_wstrb;
  logic         i_wlast, i_wvalid;
  logic         o_wready;
  logic [7:0]   o_bid;
  logic [1:0]   o_bresp;
  logic         o_bvalid;
  logic         i_bready;
  logic [7:0]   i_arid;
  logic [33:0]  i_araddr;
  logic [7:0]   i_arlen;
  logic         i_arvalid;
  logic         o_arready;
  logic [7:0]   o_rid;
  logic [255:0] o_rdata;
  logic [1:0]   o_rresp;
  logic         o_rlast, o_rvalid;
  logic         i_rready;
  // HBM3 engine side
  logic         o_req_valid, o_req_wr;
  logic [33:0]  o_req_addr;
  logic [255:0] o_req_data;
  logic [31:0]  o_req_mask;
  logic         i_req_ready;
  logic [255:0] i_rd_data;
  logic         i_rd_valid, i_wr_done;

  hbm3_axi4_if dut (.*);

  // Fixed pattern returned by the HBM3 read model. Built by concatenation
  // because a sized literal cannot be split across lines.
  localparam logic [255:0] RD_PATTERN =
    {128'hDEADBEEF_CAFEBABE_12345678_ABCDEF01,
     128'hFEEDFACE_BAADF00D_DEADC0DE_0BADF00D};

  int errors = 0;  // failed response checks

  // ── Clock 500 MHz ───────────────────────────────────
  initial i_aclk = 0;
  always #1 i_aclk = ~i_aclk;

  // ── SVA: BVALID must deassert after BREADY ──────────
  assert property (@(posedge i_aclk) disable iff (!i_aresetn)
    (o_bvalid && i_bready) |=> !o_bvalid)
    else $error("BVALID not deasserted after BREADY at %0t", $time);

  // ── SVA: RVALID must deassert after RREADY ──────────
  assert property (@(posedge i_aclk) disable iff (!i_aresetn)
    (o_rvalid && i_rready) |=> !o_rvalid)
    else $error("RVALID not deasserted after RREADY at %0t", $time);

  // ── SVA: AWVALID must stay asserted until the handshake ──
  assert property (@(posedge i_aclk) disable iff (!i_aresetn)
    (i_awvalid && !o_awready) |=> i_awvalid)
    else $error("Master dropped AWVALID before handshake at %0t", $time);

  // ── HBM3 model: accept request after 1 cycle ────────
  initial i_req_ready = 0;
  always @(posedge i_aclk) begin
    i_req_ready <= o_req_valid;  // 1-cycle latency accept
  end

  // ── HBM3 model: return write done / read data ───────
  // Handles one request at a time: a request accepted while this block is
  // still counting cycles for the previous one is missed.
  always @(posedge i_aclk) begin
    i_wr_done  <= 1'b0;
    i_rd_valid <= 1'b0;
    i_rd_data  <= 256'h0;
    if (i_req_ready && o_req_valid) begin
      if (o_req_wr) begin
        repeat (3) @(posedge i_aclk);
        i_wr_done <= 1'b1;
      end else begin
        repeat (5) @(posedge i_aclk);
        i_rd_data  <= RD_PATTERN;
        i_rd_valid <= 1'b1;
      end
    end
  end

  // Stimulus uses nonblocking assignments so that values change after the
  // DUT has sampled the clock edge, avoiding a race at the handshake edge.
  task automatic axi4_write(
    input [7:0]   id,
    input [33:0]  addr,
    input [255:0] data
  );
    // AW channel
    @(posedge i_aclk);
    i_awid    <= id;
    i_awaddr  <= addr;
    i_awlen   <= 8'h00;
    i_awvalid <= 1'b1;
    // W channel (same cycle)
    i_wdata   <= data;
    i_wstrb   <= 32'hFFFF_FFFF;
    i_wlast   <= 1'b1;
    i_wvalid  <= 1'b1;
    @(posedge i_aclk iff (o_awready && o_wready));
    i_awvalid <= 1'b0;
    i_wvalid  <= 1'b0;
    // B channel
    i_bready  <= 1'b1;
    @(posedge i_aclk iff o_bvalid);
    $display("[%0t] WRITE DONE id=%0h addr=%0h bresp=%0b",
             $time, o_bid, addr, o_bresp);
    if (o_bid !== id || o_bresp !== 2'b00) begin
      errors++;
      $error("Unexpected B response: bid=%0h bresp=%0b", o_bid, o_bresp);
    end
    @(posedge i_aclk);
    i_bready  <= 1'b0;
  endtask

  task automatic axi4_read(
    input [7:0]  id,
    input [33:0] addr
  );
    @(posedge i_aclk);
    i_arid    <= id;
    i_araddr  <= addr;
    i_arlen   <= 8'h00;
    i_arvalid <= 1'b1;
    @(posedge i_aclk iff o_arready);
    i_arvalid <= 1'b0;
    i_rready  <= 1'b1;
    @(posedge i_aclk iff o_rvalid);
    $display("[%0t] READ DONE id=%0h data=%0h rresp=%0b",
             $time, o_rid, o_rdata, o_rresp);
    if (o_rid !== id || o_rresp !== 2'b00 || o_rlast !== 1'b1 ||
        o_rdata !== RD_PATTERN) begin
      errors++;
      $error("Unexpected R response: rid=%0h rresp=%0b rlast=%0b",
             o_rid, o_rresp, o_rlast);
    end
    @(posedge i_aclk);
    i_rready  <= 1'b0;
  endtask

  initial begin
    {i_awvalid, i_wvalid, i_arvalid, i_bready, i_rready} = 5'h0;
    i_aresetn = 0;
    repeat (4) @(posedge i_aclk);
    i_aresetn = 1;
    repeat (2) @(posedge i_aclk);

    $display("=== TEST 1: AXI4 Write ===");
    axi4_write(8'h05, 34'h1000, 256'hA5A5A5A5);
    repeat (2) @(posedge i_aclk);

    $display("=== TEST 2: AXI4 Read ===");
    axi4_read(8'h07, 34'h1000);
    repeat (2) @(posedge i_aclk);

    $display("=== TEST 3: Back-to-back Write then Read ===");
    axi4_write(8'h01, 34'h2000, 256'hDEADBEEF);
    axi4_read (8'h02, 34'h2000);
    repeat (4) @(posedge i_aclk);

    if (errors == 0) $display("=== ALL AXI4 TESTS PASSED ===");
    else             $display("=== %0d AXI4 CHECK(S) FAILED ===", errors);
    $finish;
  end

endmodule
```
| Signal | Channel | Dir (at slave) | Width (bits) | Description |
|---|---|---|---|---|
| i_awid | AW | In | 8 | Write address ID tag |
| i_awaddr | AW | In | 34 | Write address (byte-addressed, 16 GB HBM3 space) |
| i_awlen | AW | In | 8 | Burst length minus 1 (0=single beat) |
| i_awvalid | AW | In | 1 | AW channel valid |
| o_awready | AW | Out | 1 | AW channel ready (slave can accept) |
| i_wdata | W | In | 256 | Write data (2 pseudo-channels × 128 bits) |
| i_wstrb | W | In | 32 | Write byte enables (1 bit per byte) |
| i_wlast | W | In | 1 | Last beat of burst (always 1 for AWLEN=0) |
| i_wvalid | W | In | 1 | W channel valid |
| o_wready | W | Out | 1 | W channel ready |
| o_bid | B | Out | 8 | Write response ID (echoes AWID) |
| o_bresp | B | Out | 2 | Write response status (00=OKAY, 10=SLVERR) |
| o_bvalid | B | Out | 1 | B channel valid |
| i_bready | B | In | 1 | Master ready to accept B response |
| i_arid | AR | In | 8 | Read address ID tag |
| i_araddr | AR | In | 34 | Read address |
| i_arlen | AR | In | 8 | Burst length minus 1 |
| i_arvalid | AR | In | 1 | AR channel valid |
| o_arready | AR | Out | 1 | AR channel ready |
| o_rid | R | Out | 8 | Read data ID (echoes ARID) |
| o_rdata | R | Out | 256 | Read data from HBM3 |
| o_rresp | R | Out | 2 | Read response status |
| o_rlast | R | Out | 1 | Last data beat (always 1 for ARLEN=0) |
| o_rvalid | R | Out | 1 | R channel valid |
| i_rready | R | In | 1 | Master ready to accept read data |
AXI4 has five independent channels, each with its own VALID/READY handshake: AW (Write Address: ID, address, burst length/size/type), W (Write Data: 256-bit data, 32 byte enables, WLAST), B (Write Response: BID, BRESP status), AR (Read Address: same fields as AW), and R (Read Data: RID, RDATA, RRESP, RLAST). The channels are fully independent — data may flow before or after the address.
AXI4 allows multiple in-flight transactions identified by ID. The slave must capture the AWID/ARID at the handshake and return the same value in BID/RID on the response channel. Without ID tracking, a master issuing two writes simultaneously would not know which response corresponds to which request. Our module stores AWID in r_awid and echoes it as o_bid after i_wr_done.
A transfer occurs on the rising edge where both VALID and READY are simultaneously high. The master holds AWVALID (and address signals) stable until it sees AWREADY. The slave asserts AWREADY when it can accept a new transaction. The slave may assert AWREADY before AWVALID as a "pre-ready" for zero-wait-state operation — our module does this on reset to allow immediate acceptance of the first transaction.
SLVERR signals that the slave accepted the transaction but encountered an error during execution. In an HBM3 context this includes: HBM3 ECC uncorrectable error, write to a locked/protected region, unsupported AWLEN (non-zero burst), or internal timeout waiting for i_wr_done. The master's error handler must decide whether to retry, abort, or report the fault. Our current module always returns OKAY — SLVERR is the first production enhancement.
HBM3's natural access unit is BL4 per pseudo-channel: 4 × 32-bit = 128 bits per PC × 2 PCs = 256 bits total. This fits exactly in one AXI4 beat at AWSIZE=5 (32 bytes). AWLEN=0 means a 1-beat burst, creating a clean 1:1 mapping between AXI4 transactions and HBM3 accesses. Longer bursts (AWLEN > 0) would require a burst splitter that issues multiple sequential HBM3 requests — a logical extension for a future module.