AMBA Protocol

AXI4 – Advanced eXtensible Interface

AXI4 is the highest-performance bus in the AMBA 4 family. Its five independent channels — write address, write data, write response, read address, and read data — allow simultaneous read and write operations, out-of-order transaction completion, and burst transfers up to 256 beats, making it the standard backbone interconnect in ARM Cortex SoC designs.

Standard — AMBA 4 AXI (Arm)
Channels — 5 independent
Max Burst — 256 beats
Flow Control — VALID / READY handshake
Use Case — High-bandwidth SoC interconnect

Overview

AXI4 (Advanced eXtensible Interface 4) is part of the AMBA 4 specification released by Arm. It is designed for high-bandwidth, low-latency on-chip communication — typically connecting high-performance masters such as CPUs, DMA engines, and GPUs to memory controllers, peripherals, and other slaves in an SoC.

The defining feature of AXI4 is its channel-based architecture. Rather than a single multiplexed bus, it uses five independent channels, each with its own VALID/READY handshake. This allows the address of the next transaction to be issued before the previous one completes, enabling pipelined and out-of-order operation.

AXI4 variants: AXI4 (full) supports bursts up to 256 beats and is used for memory-mapped transfers. AXI4-Lite is a simplified subset with no bursts, used for low-bandwidth control registers. AXI4-Stream removes address channels entirely for unidirectional data pipelines.

| Protocol | Bandwidth | Burst | Channels | Typical Use |
| --- | --- | --- | --- | --- |
| APB | Low | None | 1 shared | Peripheral registers (UART, SPI, GPIO) |
| AHB | Medium | Up to 16 | Pipelined | On-chip bus for medium-speed masters |
| AXI4-Lite | Low–Medium | None | 5 (no burst) | Control/status register access |
| AXI4 | High | Up to 256 | 5 independent | DDR, DMA, CPU interconnect |
| AXI4-Stream | Very High | Unlimited | 1 (no addr) | DSP, video, DMA streaming |

Five Independent Channels

AXI4 separates traffic into five channels. Each operates independently with its own VALID/READY pair, so a slow response on the write response channel cannot block progress on the read address channel.

| Channel | Name | Direction |
| --- | --- | --- |
| AW | Write Address | Master → Slave |
| W | Write Data | Master → Slave |
| B | Write Response | Slave → Master |
| AR | Read Address | Master → Slave |
| R | Read Data | Slave → Master |
[Figure: block diagram of a master and a slave connected by the five channels (AW, W, B, AR, R). ACLK and ARESETn are shared by all channels.]

Signal Reference

Global

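| Signal | Dir | Description |
| --- | --- | --- |
| ACLK | Input | Global clock. All channel signals are sampled on the rising edge. |
| ARESETn | Input | Active-low reset. May be asserted asynchronously, but must be de-asserted synchronously to a rising edge of ACLK. |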

Write Address Channel (AW) — Master → Slave

| Signal | Width | Description |
| --- | --- | --- |
| AWID | variable | Transaction ID. Allows out-of-order responses. |
| AWADDR | 32/64 | Start address of the burst. |
| AWLEN | 8 | Burst length minus one. 0x00=1 beat … 0xFF=256 beats. |
| AWSIZE | 3 | Bytes per beat: 0=1B, 1=2B, 2=4B, 3=8B, 4=16B … |
| AWBURST | 2 | Burst type: 00=FIXED, 01=INCR, 10=WRAP. |
| AWLOCK | 1 | Lock type for exclusive access sequences. |
| AWCACHE | 4 | Cache attributes: bufferable, cacheable, allocate hints. |
| AWPROT | 3 | Protection: privileged, secure, instruction/data. |
| AWQOS | 4 | Quality of Service priority (0=lowest, 15=highest). |
| AWVALID | 1 | Master asserts: address channel info is valid. |
| AWREADY | 1 | Slave asserts: ready to accept address. |

Write Data Channel (W) — Master → Slave

| Signal | Width | Description |
| --- | --- | --- |
| WDATA | 32/64/128… | Write data for this beat. |
| WSTRB | WDATA width / 8 | Byte enable strobe. Bit n=1 means byte lane n is valid. |
| WLAST | 1 | Asserted on the final beat of a burst. |
| WVALID | 1 | Master: write data is valid. |
| WREADY | 1 | Slave: ready to accept write data. |

Write Response Channel (B) — Slave → Master

| Signal | Width | Description |
| --- | --- | --- |
| BID | variable | Matches the AWID of the completed write transaction. |
| BRESP | 2 | Write response status: OKAY, EXOKAY, SLVERR, DECERR. |
| BVALID | 1 | Slave: response is valid. |
| BREADY | 1 | Master: ready to accept response. |

Read Address Channel (AR) — Master → Slave

| Signal | Width | Description |
| --- | --- | --- |
| ARID | variable | Transaction ID for the read. |
| ARADDR | 32/64 | Start address of the read burst. |
| ARLEN | 8 | Burst length minus one. |
| ARSIZE | 3 | Bytes per beat. |
| ARBURST | 2 | Burst type: FIXED, INCR, WRAP. |
| ARLOCK | 1 | Exclusive access lock. |
| ARCACHE | 4 | Cache attributes. |
| ARPROT | 3 | Protection attributes. |
| ARQOS | 4 | QoS priority. |
| ARVALID | 1 | Master: read address is valid. |
| ARREADY | 1 | Slave: ready to accept read address. |

Read Data Channel (R) — Slave → Master

| Signal | Width | Description |
| --- | --- | --- |
| RID | variable | Matches ARID of the request. |
| RDATA | 32/64/128… | Read data for this beat. |
| RRESP | 2 | Read response status per beat. |
| RLAST | 1 | Asserted on the final beat of the read burst. |
| RVALID | 1 | Slave: read data is valid. |
| RREADY | 1 | Master: ready to accept read data. |

VALID / READY Handshake

Every AXI4 channel uses the same two-signal handshake. A transfer completes on the clock edge where both VALID and READY are HIGH simultaneously.

[Timing diagram: ACLK, VALID, READY. The master holds VALID high while the slave is not ready; the transfer completes on the rising edge where VALID and READY are both HIGH.]

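Key rule: once the source (e.g. the master on the AW channel) asserts VALID, it must keep VALID asserted until the transfer completes, and it must not wait for READY before asserting VALID. The destination (e.g. the slave) may assert READY at any time, even before VALID, and is allowed to wait for VALID before asserting READY.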
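The handshake rule can be illustrated with a small cycle-based model. This is a behavioural sketch in Python rather than RTL; the function name `handshake_cycles` and the list-of-samples trace format are assumptions made for illustration only.

```python
def handshake_cycles(valid, ready):
    """Return the cycle indices at which a transfer completes.

    valid, ready: per-cycle samples (0/1) of VALID and READY as seen at
    each rising ACLK edge. Raises ValueError if VALID is dropped before
    the handshake, which the protocol forbids.
    """
    transfers = []
    pending = False  # VALID was high on the previous edge without a handshake
    for cycle, (v, r) in enumerate(zip(valid, ready)):
        if pending and not v:
            raise ValueError(f"VALID de-asserted before handshake at cycle {cycle}")
        if v and r:
            transfers.append(cycle)  # both HIGH on this edge: transfer done
            pending = False
        else:
            pending = bool(v)
    return transfers


# READY arrives two cycles after VALID: the transfer completes at cycle 3.
print(handshake_cycles([0, 1, 1, 1, 0], [0, 0, 0, 1, 0]))  # [3]
# READY may already be HIGH: the transfer completes as soon as VALID rises.
print(handshake_cycles([0, 0, 1, 0], [1, 1, 1, 1]))        # [2]
```

A trace in which VALID falls while READY is still LOW raises an error, mirroring the kind of check a protocol assertion would perform in simulation.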

Write Transaction

A write transaction uses all three write channels: AW → W → B. The address and data channels can operate concurrently — the master does not need to wait for AW to complete before issuing data on W.

[Timing diagram: ACLK, AWVALID, AWREADY, WVALID, WLAST, WREADY, BVALID, BREADY.]

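Steps:
(1) Master drives the write address on the AW channel; it is accepted when AWVALID and AWREADY are both HIGH.
(2) Master drives write data beats on the W channel, asserting WLAST on the final beat; each beat is accepted when WVALID and WREADY are both HIGH.
(3) Slave issues the write response on the B channel, only after the address and the final data beat have both been accepted; the response is accepted when BVALID and BREADY are both HIGH.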
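A testbench or monitor typically checks step (2) by comparing the accepted W beats against the AWLEN value from step (1). The following Python sketch shows that check; the function name `check_write_burst` and the list representation of WLAST samples are hypothetical, chosen only for this example.

```python
def check_write_burst(awlen, wlast_per_beat):
    """Check that accepted W beats are consistent with the AW request.

    awlen: AWLEN value (burst length minus one).
    wlast_per_beat: WLAST sampled on each accepted W beat, in order.
    Returns True when the beat count is AWLEN + 1 and WLAST is asserted
    on the final beat only.
    """
    expected = awlen + 1
    if len(wlast_per_beat) != expected:
        return False
    return all(bool(last) == (i == expected - 1)
               for i, last in enumerate(wlast_per_beat))


print(check_write_burst(3, [0, 0, 0, 1]))  # True: 4 beats, WLAST on the last
print(check_write_burst(3, [0, 0, 1, 1]))  # False: WLAST asserted too early
```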

Read Transaction

A read uses two channels: AR → R. The master sends the read address; the slave returns data beats with RLAST on the final beat. No separate response channel is needed — the per-beat RRESP carries status.
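The R-channel behaviour can be sketched as a generator that produces one (RDATA, RRESP, RLAST) tuple per beat. This is a simplified Python model under stated assumptions: the slave is a plain list of words, the burst is INCR with full-width beats, and the name `read_beats` is made up for illustration. It also shows that a burst still delivers every beat when some beats report an error.

```python
def read_beats(memory, start_word, arlen):
    """Yield (rdata, rresp, rlast) for an INCR read of ARLEN + 1 words.

    memory: list of words indexed by word address (stand-in for a slave).
    Words outside the list return SLVERR (0b10) on that beat only.
    """
    OKAY, SLVERR = 0b00, 0b10
    beats = arlen + 1
    for i in range(beats):
        word = start_word + i
        rlast = (i == beats - 1)  # RLAST marks the final beat
        if 0 <= word < len(memory):
            yield memory[word], OKAY, rlast
        else:
            yield 0, SLVERR, rlast


mem = [0xA0, 0xA1, 0xA2, 0xA3]
for rdata, rresp, rlast in read_beats(mem, start_word=2, arlen=3):
    print(hex(rdata), bin(rresp), rlast)
```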

[Timing diagram: ACLK, ARVALID, ARREADY, RVALID, RLAST, RREADY. The read address handshake is followed by data beats D0 to D3, with RLAST asserted on D3.]

Burst Types & Attributes

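| AxBURST | Name | Address Behaviour | Use Case |
| --- | --- | --- | --- |
| 2'b00 | FIXED | Same address for every beat | Repeated access to one location, e.g. a FIFO data port |
| 2'b01 | INCR | Increments by transfer size each beat | Sequential memory reads/writes (most common) |
| 2'b10 | WRAP | Like INCR, but wraps back to a lower address at a boundary of (bytes per beat × burst length) | Cache-line fills (cache line boundary wrap) |
| 2'b11 | Reserved | | |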
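The three address behaviours can be made concrete with a short address generator. This is a Python sketch of the per-beat address calculation, not taken from any particular tool; the function name `burst_addresses` is hypothetical, and the WRAP branch assumes a start address aligned to the transfer size and a length of 2, 4, 8, or 16 beats.

```python
def burst_addresses(start, axlen, axsize, axburst):
    """Return the address of every beat in a burst.

    start: start address, axlen: burst length minus one,
    axsize: log2(bytes per beat), axburst: 0b00 FIXED, 0b01 INCR, 0b10 WRAP.
    """
    nbytes = 1 << axsize
    beats = axlen + 1
    if axburst == 0b00:  # FIXED: every beat targets the same address
        return [start] * beats
    aligned = (start // nbytes) * nbytes
    if axburst == 0b01:  # INCR: beats after the first are size-aligned
        return [start] + [aligned + n * nbytes for n in range(1, beats)]
    if axburst == 0b10:  # WRAP: wrap within a block of nbytes * beats
        if beats not in (2, 4, 8, 16):
            raise ValueError("WRAP burst length must be 2, 4, 8, or 16 beats")
        if start != aligned:
            raise ValueError("WRAP start address must be aligned to the transfer size")
        total = nbytes * beats
        boundary = (start // total) * total
        return [boundary + (start - boundary + n * nbytes) % total
                for n in range(beats)]
    raise ValueError("AxBURST 0b11 is reserved")


# 4-beat WRAP of 4-byte transfers starting at 0x38 wraps at the 16-byte boundary.
print([hex(a) for a in burst_addresses(0x38, 3, 2, 0b10)])
# ['0x38', '0x3c', '0x30', '0x34']
```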

The transfer size (AWSIZE / ARSIZE) encodes bytes per beat as a power of two, and must not exceed the data bus width:

| AxSIZE[2:0] | Bytes/beat |
| --- | --- |
| 3'b000 | 1 |
| 3'b001 | 2 |
| 3'b010 | 4 |
| 3'b011 | 8 |
| 3'b100 | 16 |
| 3'b101 | 32 |
| 3'b110 | 64 |
| 3'b111 | 128 |
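The size and length encodings combine into the number of bytes a burst moves. A minimal Python sketch follows, with hypothetical helper names `bytes_per_beat` and `burst_bytes`; the result is exact for a start address aligned to the transfer size and an upper bound otherwise.

```python
def bytes_per_beat(axsize):
    """Decode AxSIZE[2:0] into bytes per beat (a power of two)."""
    if not 0 <= axsize <= 7:
        raise ValueError("AxSIZE is a 3-bit field")
    return 1 << axsize


def burst_bytes(axlen, axsize, data_bus_bits):
    """Bytes moved by a burst of AxLEN + 1 beats of size AxSIZE.

    Raises ValueError when the transfer size exceeds the data bus width.
    """
    nbytes = bytes_per_beat(axsize)
    if nbytes * 8 > data_bus_bits:
        raise ValueError("transfer size exceeds the data bus width")
    return (axlen + 1) * nbytes


# Longest burst on a 64-bit bus: 256 beats of 8 bytes.
print(burst_bytes(axlen=255, axsize=3, data_bus_bits=64))  # 2048
```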

Response Codes (BRESP / RRESP)

| Code | Name | Meaning |
| --- | --- | --- |
| 2'b00 | OKAY | Normal successful completion. |
| 2'b01 | EXOKAY | Exclusive access succeeded (for LL/SC operations). |
| 2'b10 | SLVERR | Slave error: the transaction reached the slave but was rejected (e.g., write to read-only register). |
| 2'b11 | DECERR | Decode error: no slave exists at that address (issued by interconnect). |

BRESP covers the entire write burst — one response after WLAST. RRESP is per-beat and travels with each R-channel transfer, allowing fine-grained error signalling within a burst.
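Because RRESP is reported per beat, software or a testbench usually needs to scan a burst for the beat that failed. The Python sketch below decodes the 2-bit codes from the table above; the `Resp` enum and the `first_error` helper are illustrative names, and reporting only the first failing beat is an assumed policy, not something the protocol mandates.

```python
from enum import IntEnum


class Resp(IntEnum):
    """2-bit BRESP / RRESP encodings."""
    OKAY = 0b00
    EXOKAY = 0b01
    SLVERR = 0b10
    DECERR = 0b11


def first_error(rresp_per_beat):
    """Return (beat_index, Resp) for the first SLVERR/DECERR beat, or None."""
    for beat, code in enumerate(rresp_per_beat):
        resp = Resp(code)
        if resp in (Resp.SLVERR, Resp.DECERR):
            return beat, resp
    return None


result = first_error([0b00, 0b00, 0b10, 0b00])
print(result[0], result[1].name)  # 2 SLVERR
```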

Verilog RTL — AXI4-Lite Slave (Register File)

A minimal AXI4-Lite slave with a 4×32-bit register file. AXI4-Lite is the simplified subset (no bursts), widely used for control register access.

Verilog axi4lite_slave.v
module axi4lite_slave #(
  parameter AW = 32,
  parameter DW = 32
) (
  input  wire          ACLK, ARESETn,

  // Write address channel
  input  wire [AW-1:0] AWADDR,
  input  wire          AWVALID,
  output reg           AWREADY,

  // Write data channel
  input  wire [DW-1:0] WDATA,
  input  wire [DW/8-1:0] WSTRB,
  input  wire          WVALID,
  output reg           WREADY,

  // Write response channel
  output reg  [1:0]    BRESP,
  output reg           BVALID,
  input  wire          BREADY,

  // Read address channel
  input  wire [AW-1:0] ARADDR,
  input  wire          ARVALID,
  output reg           ARREADY,

  // Read data channel
  output reg  [DW-1:0] RDATA,
  output reg  [1:0]    RRESP,
  output reg           RVALID,
  input  wire          RREADY
);

  // 4-register file, word-addressed by address bits [3:2]
  reg [DW-1:0] regs [0:3];
  integer i, b;

  // ── Write path ──────────────────────────────
  // AW and W are accepted together in the same cycle, so AWADDR is used
  // directly and cannot be stale when data arrives with (or before) the
  // address. A new write is accepted only when no earlier response is
  // still waiting on the B channel.
  always @(posedge ACLK) begin
    if (!ARESETn) begin
      AWREADY <= 1'b0; WREADY <= 1'b0;
      BVALID  <= 1'b0; BRESP  <= 2'b00;
      for (i=0; i<4; i=i+1) regs[i] <= {DW{1'b0}};
    end else begin
      // Pulse both READY signals for one cycle once address and data are valid
      AWREADY <= !AWREADY && AWVALID && WVALID && (!BVALID || BREADY);
      WREADY  <= !WREADY  && AWVALID && WVALID && (!BVALID || BREADY);

      if (AWVALID && AWREADY && WVALID && WREADY) begin
        // Byte-lane write: only lanes with WSTRB set are updated
        for (b=0; b<DW/8; b=b+1)
          if (WSTRB[b]) regs[AWADDR[3:2]][b*8 +: 8] <= WDATA[b*8 +: 8];
        // The 2-bit index AWADDR[3:2] is always in range, so respond OKAY
        BRESP  <= 2'b00;
        BVALID <= 1'b1;
      end else if (BVALID && BREADY) begin
        // Clear response once master accepts it
        BVALID <= 1'b0;
      end
    end
  end

  // ── Read path ───────────────────────────────
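  always @(posedge ACLK) begin
    if (!ARESETn) begin
      ARREADY <= 1'b0; RVALID <= 1'b0;
      RDATA   <= {DW{1'b0}}; RRESP  <= 2'b00;
    end else begin
      // Accept a new address only when no read data is still pending,
      // so RDATA is never overwritten before the master has taken it
      ARREADY <= !ARREADY && ARVALID && (!RVALID || RREADY);
      if (ARVALID && ARREADY) begin
        // The 2-bit index ARADDR[3:2] is always in range, so respond OKAY
        RDATA  <= regs[ARADDR[3:2]];
        RRESP  <= 2'b00;
        RVALID <= 1'b1;
      end else if (RVALID && RREADY) begin
        // Clear read data valid once master accepts it
        RVALID <= 1'b0;
      end
    end
  end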

endmodule

Interview Q&A

Why does AXI4 have five separate channels instead of one shared bus?
Separate channels remove ordering dependencies between unrelated events. A slow write response on channel B does not block a new read address on channel AR. This decoupling enables pipelining — the master can issue the address of transaction N+1 before transaction N completes, hiding latency. A single bus would serialize all these events and stall when any one of them waits.
Can WVALID be asserted before AWVALID is accepted?
Yes. AXI4 allows write data to appear on the W channel before the write address is accepted on AW, at the same time, or after. The slave must buffer the data until it has the address. This flexibility lets the master pipeline data production independently of address arbitration.
What is the difference between SLVERR and DECERR?
SLVERR (Slave Error) means the transaction reached a slave but the slave rejected it — e.g., writing to a read-only register or accessing an unsupported feature. DECERR (Decode Error) means no slave was found for that address — the interconnect (AXI crossbar or bridge) could not route the transaction to any mapped slave. DECERR is issued by the interconnect, not by any endpoint slave.
What is the maximum AXI4 burst length and how is it encoded?
AXI4 supports up to 256 beats per burst, but only for INCR bursts; FIXED and WRAP bursts remain limited to 16 beats. The burst length is encoded in the 8-bit AWLEN/ARLEN field as length − 1: AxLEN=0 means 1 beat, AxLEN=255 means 256 beats. AXI3 was limited to 16 beats (AxLEN was 4 bits). WRAP bursts are additionally constrained to power-of-2 lengths (2, 4, 8, or 16 beats).
What happens if a master de-asserts VALID mid-transaction?
This is a protocol violation. AXI4 requires that once a source asserts VALID, it must hold VALID HIGH until the handshake completes (both VALID and READY HIGH on the same rising clock edge). De-asserting VALID early could leave the destination in an undefined state waiting for a transfer that never arrives, causing a deadlock or data corruption.
How does AXI4 support out-of-order transactions?
Each transaction carries an ID tag (AWID / ARID). A slave or interconnect may complete transactions with different IDs in any order. When responses return (BRESP with BID, RDATA with RID), the master matches them to the original request using the ID. Transactions with the same ID must complete in order; transactions with different IDs can interleave. This is what allows multiple outstanding transactions to a multi-bank memory without stalling on each response.
What is WSTRB and why is it needed?
WSTRB (Write Strobe) is a byte-enable signal with one bit per byte lane of WDATA. When bit n is HIGH, byte lane n of WDATA is valid and should be written; when LOW, the slave ignores that byte. This allows partial-word writes — for example, updating only the upper byte of a 32-bit register without affecting the lower three bytes — without needing a read-modify-write cycle.
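The byte-lane merge that WSTRB implies can be written out in a few lines. This is a Python sketch of the slave-side update for a single word; the function name `apply_wstrb` and the default 4-byte width are assumptions made for the example.

```python
def apply_wstrb(old_word, wdata, wstrb, data_bytes=4):
    """Merge WDATA into old_word, one byte lane per WSTRB bit.

    Lane n (bits 8n+7 .. 8n) is taken from wdata when bit n of wstrb is 1,
    and kept from old_word otherwise.
    """
    result = old_word
    for lane in range(data_bytes):
        if (wstrb >> lane) & 1:
            mask = 0xFF << (8 * lane)
            result = (result & ~mask) | (wdata & mask)
    return result


# Update only the upper byte of a 32-bit register (WSTRB = 4'b1000).
print(hex(apply_wstrb(0x11223344, 0xAABBCCDD, 0b1000)))  # 0xaa223344
```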