AMBA Protocol

AXI4-Stream – AMBA Streaming Interface

AXI4-Stream is the streaming subset of the AMBA 4 family. It strips away the address channel entirely and focuses on moving a continuous, ordered flow of data beats from a single master to a single slave — making it the standard interface for DSP pipelines, video pixel streams, network packet engines, and DMA data paths.

Standard — AMBA 4 AXI-Stream (Arm)
Direction — Unidirectional
Address Channel — None
Flow Control — TVALID / TREADY (backpressure)
Use Case — DSP, video, DMA, networking

Overview & Comparison

AXI4 (full) is designed for memory-mapped transactions — each transfer has an address, allowing random access to any location. AXI4-Stream removes the address dimension completely. Data simply flows from producer to consumer in the order it is presented, with no concept of memory location.

This simplicity is its strength: there is only one channel with a handful of signals, yet it can sustain 100% bus utilisation (one data beat every clock cycle) as long as the master keeps TVALID asserted and the slave keeps TREADY asserted. It is widely used in Xilinx/AMD and Intel/Altera FPGA IP cores, DMA engines, and video/image signal processing pipelines.

Feature | AXI4 (full) | AXI4-Stream
Address channel | Yes (AW / AR) | None
Response channel | Yes (B / R includes RRESP) | None
Data direction | Bidirectional | Unidirectional
Burst limit | 256 beats | Unlimited
Packet framing | WLAST / RLAST | TLAST
Byte lanes | WSTRB | TKEEP / TSTRB
Typical use | Memory, peripherals | DSP, video, networking
[Figure: the MASTER drives TDATA, TVALID, TLAST, TKEEP … to the SLAVE; the SLAVE drives TREADY (backpressure) back to the MASTER.]

Signal Reference

Apart from the clock and reset, only TDATA and TVALID are required. TREADY is strongly recommended, and all other signals are optional and included only when the application needs them.

Signal | Dir | Width | Required | Description
ACLK | - | 1 | Yes | Global clock. All signals sampled on rising edge.
ARESETn | - | 1 | Yes | Active-low reset. TVALID must be driven LOW while reset is asserted.
TDATA | M→S | 8×N | Yes | Data payload. Width must be a multiple of 8 bits (8, 16, 32, 64, 128…).
TVALID | M→S | 1 | Yes | Master: TDATA and sideband signals are valid this cycle.
TREADY | S→M | 1 | Recommended | Slave: ready to accept data. When omitted, slave is always ready.
TLAST | M→S | 1 | Optional | Marks the last beat of a packet. Resets framing state in the slave.
TKEEP | M→S | TDATA width/8 | Optional | Byte qualifier: 1 = this lane carries a byte of the stream, 0 = null byte (must be ignored). The protocol allows null bytes in any beat, but many IP cores (including much Xilinx/AMD IP) only support them on the last beat of a packet (TLAST beat).
TSTRB | M→S | TDATA width/8 | Optional | Position byte qualifier. TSTRB=1,TKEEP=1 = data byte; TSTRB=0,TKEEP=1 = position byte (occupies a lane but carries no data); TSTRB=0,TKEEP=0 = null byte.
TUSER | M→S | variable | Optional | User-defined sideband information. Travels alongside TDATA (e.g., SOP flag, error flag, pixel component ID).
TID | M→S | variable | Optional | Stream identifier. Allows a single physical interface to carry multiple logical streams.
TDEST | M→S | variable | Optional | Routing destination. Used by stream switches and interconnects to direct data to the correct slave.

Minimum viable interface: A design that never needs backpressure (e.g., a data generator feeding a FIFO that never fills) only needs TDATA + TVALID. Add TREADY when the consumer can be slow. Add TLAST when data arrives in discrete packets or frames.

TVALID / TREADY Handshake

The handshake rule is identical to AXI4: a transfer occurs on the rising clock edge when both TVALID and TREADY are HIGH.

[Timing diagram: TVALID is held HIGH while the master has data. TREADY is LOW at first (slave not ready), so D0 is held on TDATA and no transfer occurs. Once TREADY goes HIGH, D0, D1, D2, and D3 transfer on consecutive cycles, with TLAST marking the last beat.]

When the slave de-asserts TREADY (stall), the master must hold TDATA and TVALID stable until TREADY returns. This is called backpressure — the slave is telling the master it is not ready to consume data.
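The handshake rule can be sketched in a few lines of Verilog. The module below is only an illustration and is not part of the original design: the name `axis_beat_counter`, the `CW` parameter, and the `beats` output are hypothetical. It counts a beat as transferred only in cycles where TVALID and TREADY are both HIGH, so stalled cycles do not advance the count.

```verilog
// Hypothetical helper (not from the article): counts accepted beats.
// A beat is transferred only when TVALID and TREADY are both HIGH
// at a rising edge of aclk.
module axis_beat_counter #(
  parameter CW = 16                // counter width (assumed value)
) (
  input  wire          aclk,
  input  wire          aresetn,
  input  wire          tvalid,
  input  wire          tready,
  output reg  [CW-1:0] beats       // number of completed transfers
);
  wire xfer = tvalid && tready;    // the handshake condition

  always @(posedge aclk) begin
    if (!aresetn)  beats <= {CW{1'b0}};
    else if (xfer) beats <= beats + 1'b1;  // stalled cycles are not counted
  end
endmodule
```

The same `tvalid && tready` term is the usual enable for any state that should advance once per accepted beat, on either side of the link.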

Packet Framing with TLAST

AXI4-Stream itself has no concept of a packet size — data beats flow continuously. TLAST provides a single-bit end-of-packet marker: when HIGH on a data beat, it signals that this is the last beat of the current packet or frame. The slave resets its framing state after the TLAST beat.

[Timing diagram: Packet A (3 beats: A0, A1, A2) is followed immediately by Packet B (4 beats: B0, B1, B2, B3), after which the bus goes idle. TLAST is HIGH on A2 and on B3; TVALID stays HIGH across both packets.]

Two back-to-back packets can be transferred without any idle cycle between them — the cycle after TLAST simply carries the first beat of the next packet.
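As a rough sketch of how a slave can use TLAST for framing, the module below measures the length of each packet in beats. It is illustrative only: the module name `axis_pkt_len` and the `pkt_len` / `pkt_len_valid` outputs are hypothetical, not part of any standard IP. The counter is cleared on the accepted TLAST beat, so a packet that starts in the very next cycle is counted correctly.

```verilog
// Hypothetical helper (not from the article): measures packet length
// in beats, using TLAST as the end-of-packet marker.
module axis_pkt_len #(
  parameter CW = 16                    // counter width (assumed value)
) (
  input  wire          aclk,
  input  wire          aresetn,
  input  wire          tvalid,
  input  wire          tready,
  input  wire          tlast,
  output reg  [CW-1:0] pkt_len,        // length of the most recent packet
  output reg           pkt_len_valid   // one-cycle pulse after each TLAST beat
);
  reg [CW-1:0] cnt;                    // beats accepted so far in this packet

  always @(posedge aclk) begin
    if (!aresetn) begin
      cnt           <= {CW{1'b0}};
      pkt_len       <= {CW{1'b0}};
      pkt_len_valid <= 1'b0;
    end else begin
      pkt_len_valid <= 1'b0;           // default: no new result this cycle
      if (tvalid && tready) begin      // only accepted beats count
        if (tlast) begin
          pkt_len       <= cnt + 1'b1; // include the TLAST beat itself
          pkt_len_valid <= 1'b1;
          cnt           <= {CW{1'b0}}; // framing state resets here
        end else begin
          cnt <= cnt + 1'b1;
        end
      end
    end
  end
endmodule
```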

TKEEP & TSTRB — Byte Qualifiers

When the final beat of a packet does not fill the entire data bus (e.g., a 10-byte payload on a 32-bit / 4-byte bus), TKEEP marks which byte lanes carry meaningful data.

TSTRB | TKEEP | Byte type | Description
1 | 1 | Data byte | Lane carries meaningful data; must be processed by slave.
0 | 1 | Position byte | Lane occupies a position in the stream but carries no application data (e.g., padding for alignment).
0 | 0 | Null byte | Lane is empty; slave must ignore it. Most often seen on a partially filled TLAST beat, and many IP cores accept null bytes only there.
1 | 0 | Reserved | Not permitted (undefined / illegal combination).

Most designs use TKEEP only. TSTRB is used in specialised applications that need to distinguish position bytes (e.g., protocol headers where zero-padding must be transmitted but not counted as data). If TSTRB is omitted, all kept bytes are implicitly data bytes.

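[Figure: 32-bit bus, last beat carrying only 2 valid bytes. As drawn: Byte 3 and Byte 2 have TKEEP=1 (TDATA[31:16] valid), Byte 1 and Byte 0 have TKEEP=0 (TDATA[15:0] null, ignored), and TLAST=1 on this beat. Note that lane 0 is the earliest byte in stream order, so most IP cores expect the valid bytes of a partial last beat packed into the low lanes instead (TKEEP=4'b0011).]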
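A minimal sketch of how a master might build TKEEP for a partially filled final beat is shown below. It assumes the common convention that valid bytes are packed into the lowest byte lanes (lane 0 is first in stream order); the module name `axis_last_tkeep` and the `nbytes` input are hypothetical. For the 10-byte payload on a 4-byte bus mentioned above, the first two beats are full and the third beat carries 2 bytes.

```verilog
// Hypothetical helper (not from the article): builds TKEEP for the
// final beat of a packet, ASSUMING valid bytes are packed into the
// lowest byte lanes.
module axis_last_tkeep #(
  parameter DW = 32                        // TDATA width in bits
) (
  input  wire [$clog2(DW/8):0] nbytes,     // valid bytes in this beat, 1..DW/8
  output wire [DW/8-1:0]       tkeep
);
  // Start from all ones and shift zeros in from the top, one per
  // unused high lane. Example for DW=32: nbytes=2 gives 4'b0011.
  assign tkeep = {(DW/8){1'b1}} >> (DW/8 - nbytes);
endmodule
```

On every beat other than the last, a master following this convention would simply drive TKEEP to all ones.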

Backpressure

Backpressure is the mechanism by which a slow slave signals the master to pause. When the slave de-asserts TREADY (TREADY=0), no transfer occurs even if TVALID is HIGH. The master must hold its data stable on TDATA (along with TVALID and all sideband signals) until TREADY returns HIGH and the beat is accepted.

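A common pattern is to insert a skid buffer (a two-entry register stage) between the master and slave. It allows the TREADY path to be registered: the extra entry catches the one beat that is already in flight when the downstream side stalls, so no data is lost and full throughput is kept once the stall clears.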

Design rule: A master must never wait for TREADY to be HIGH before asserting TVALID. If it does, and the slave in turn waits for TVALID before asserting TREADY (which the protocol permits), neither signal ever rises and the link deadlocks. The master must assert TVALID whenever it has data, and once asserted, TVALID must stay HIGH until the handshake completes.
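The hold requirement described in this section can be checked in simulation with a small monitor. The sketch below is an assumption-laden illustration, not a complete protocol checker: the module name `axis_stall_check` and the sticky `violation` flag are hypothetical, and it watches only TDATA and TLAST (other sideband signals would need the same treatment).

```verilog
// Hypothetical simulation monitor (not from the article): flags a
// master that drops TVALID or changes TDATA/TLAST while a beat is
// stalled (TVALID=1, TREADY=0 in the previous cycle).
module axis_stall_check #(
  parameter DW = 32
) (
  input  wire          aclk,
  input  wire          aresetn,
  input  wire [DW-1:0] tdata,
  input  wire          tvalid,
  input  wire          tlast,
  input  wire          tready,
  output reg           violation   // sticky: set on the first rule break
);
  reg          stalled;            // previous cycle was a stalled beat
  reg [DW-1:0] tdata_q;            // TDATA as seen in the previous cycle
  reg          tlast_q;            // TLAST as seen in the previous cycle

  always @(posedge aclk) begin
    if (!aresetn) begin
      stalled   <= 1'b0;
      violation <= 1'b0;
    end else begin
      // A stalled beat must still be present, unchanged, this cycle.
      if (stalled && (!tvalid || tdata !== tdata_q || tlast !== tlast_q))
        violation <= 1'b1;
      stalled <= tvalid && !tready;
      tdata_q <= tdata;
      tlast_q <= tlast;
    end
  end
endmodule
```

Attaching such a monitor to each stream interface in a testbench tends to catch hold violations much earlier than waiting for corrupted data to show up downstream.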

Verilog RTL

Two complementary modules: an AXI4-Stream master that generates a packet, and a skid-buffer slave that absorbs one cycle of backpressure.

Verilog axis_master.v — simple packet sender
module axis_master #(
  parameter DW        = 32,
  parameter PKT_BEATS = 4   // beats per packet
) (
  input  wire          aclk,
  input  wire          aresetn,

  // AXI4-Stream master output
  output reg  [DW-1:0] m_tdata,
  output reg           m_tvalid,
  output reg           m_tlast,
  output wire [DW/8-1:0] m_tkeep,
  input  wire          m_tready,

  // trigger to start sending one packet
  input  wire          send
);

  reg [$clog2(PKT_BEATS):0] beat_cnt;
  assign m_tkeep = {(DW/8){1'b1}};  // all bytes valid

  always @(posedge aclk) begin
    if (!aresetn) begin
      m_tvalid <= 1'b0; m_tlast <= 1'b0;
      m_tdata  <= {DW{1'b0}};  beat_cnt <= 0;  // '0 is SystemVerilog-only
    end else begin
      if (send && !m_tvalid) begin
        m_tvalid <= 1'b1;
        m_tdata  <= 32'hA000_0000;
        beat_cnt <= 1;
        m_tlast  <= (PKT_BEATS == 1);
      end else if (m_tvalid && m_tready) begin
        if (m_tlast) begin  // packet done
          m_tvalid <= 1'b0;
          m_tlast  <= 1'b0;
          beat_cnt <= 0;
        end else begin
          beat_cnt <= beat_cnt + 1;
          m_tdata  <= m_tdata + 1;
          m_tlast  <= (beat_cnt + 1 == PKT_BEATS);  // next beat is the last one
        end
      end
    end
  end
endmodule
Verilog axis_skid_buf.v — backpressure absorber
// Skid buffer: holds one extra beat so that s_tready can be a registered
// output. The beat already in flight when downstream de-asserts TREADY
// is saved in the skid slot instead of being lost.
module axis_skid_buf #(parameter DW = 32) (
  input  wire          aclk, aresetn,

  input  wire [DW-1:0] s_tdata,
  input  wire          s_tvalid, s_tlast,
  output reg           s_tready,

  output reg  [DW-1:0] m_tdata,
  output reg           m_tvalid, m_tlast,
  input  wire          m_tready
);

  reg [DW-1:0] skid_data;
  reg          skid_valid, skid_last;

  always @(posedge aclk) begin
    if (!aresetn) begin
      m_tvalid <= 1'b0; skid_valid <= 1'b0; s_tready <= 1'b1;
    end else begin
      if (m_tready || !m_tvalid) begin
        if (skid_valid) begin          // drain skid slot first
          m_tdata  <= skid_data;
          m_tlast  <= skid_last;
          m_tvalid <= 1'b1;
          skid_valid <= 1'b0;
          s_tready   <= 1'b1;
        end else begin
          m_tdata  <= s_tdata;
          m_tlast  <= s_tlast;
          m_tvalid <= s_tvalid;
        end
      end else if (s_tvalid && s_tready) begin  // downstream stalled, save beat
        skid_data  <= s_tdata;
        skid_last  <= s_tlast;
        skid_valid <= 1'b1;
        s_tready   <= 1'b0;
      end
    end
  end
endmodule

Use Cases

Domain | Application | Key Signals Used
Video / Image | Camera sensor → ISP → display pipeline | TDATA (pixels), TLAST (end of line/frame), TUSER (SOF flag)
DSP | FFT, FIR filter chains, sample streaming | TDATA (samples), TLAST (end of block), TKEEP
Networking | Ethernet MAC → packet processor | TDATA, TLAST (end of frame), TKEEP (partial last word), TUSER (error flag)
DMA | Memory → peripheral data mover | TDATA, TLAST (transaction boundary)
FPGA IP Cores | Xilinx/Intel IP interconnect (FFT, DDS, FIR) | TDATA, TVALID, TREADY, TLAST
Crypto / Security | AES encryption pipeline, hash engines | TDATA (128/256-bit blocks), TLAST (last block), TUSER (key select)

Interview Q&A

What is the minimum set of signals needed for an AXI4-Stream interface?
The mandatory signals are ACLK, ARESETn, TDATA, and TVALID. TREADY is strongly recommended (without it the slave must always be ready), and TLAST is needed whenever data arrives in discrete packets or frames. All other signals — TKEEP, TSTRB, TUSER, TID, TDEST — are optional and should only be added when the application explicitly requires them.
What is the difference between TKEEP and TSTRB?
TKEEP marks whether each byte lane is a "live" byte (1) or a null byte to be discarded (0). TSTRB further distinguishes live bytes into data bytes (TSTRB=1, TKEEP=1) and position bytes (TSTRB=0, TKEEP=1). A position byte occupies a bus lane and represents a specific position in the data stream, but carries no actual application data — it is used for padding-aware protocols where alignment must be preserved. Most designs use TKEEP alone and treat all kept bytes as data bytes.
Can TLAST be omitted? What happens?
Yes — TLAST is optional. Without it, the stream is treated as an infinite sequence of beats with no packet boundaries. This is valid for continuous data sources like ADC sample streams or audio PCM feeds where there is no concept of a packet end. However, any slave or IP that needs to know when a packet or frame ends (e.g., a network packet buffer, a video line buffer) requires TLAST to function correctly.
Why must a master not make TVALID conditional on TREADY?
If the master only asserts TVALID when TREADY is already HIGH, and the slave only asserts TREADY when TVALID is already HIGH, neither signal will ever be set and the interface deadlocks. The AXI4-Stream spec explicitly forbids this dependency on the master side (a slave, by contrast, is allowed to wait for TVALID): TVALID must be driven based solely on whether the master has data to send, independent of the slave's TREADY state.
What is a skid buffer and why is it used?
A skid buffer is a 2-entry pipeline register inserted between an AXI4-Stream master and a slave. When the downstream slave de-asserts TREADY, the skid buffer stores the one beat that is already in flight and drives its own upstream TREADY LOW on the next cycle, giving the upstream master one extra cycle to see the stall before it needs to stop. This removes the need for backpressure to propagate upstream in zero cycles (which would require a purely combinational TREADY path), allowing fully registered, timing-friendly designs.
How do TID and TDEST work in a stream switch?
TID identifies the logical stream within a single physical interface — one master can multiplex several independent data streams onto one set of wires, tagging each beat with a stream ID. TDEST carries a routing destination tag used by a stream switch (interconnect) to direct beats to the correct output port. In a video system, for example, TID might distinguish between Y, Cb, and Cr component streams, while TDEST selects which of several display pipelines receives the data.
How does AXI4-Stream achieve 100% bus utilisation?
Because there is no address phase, no response phase, and no burst overhead, every clock cycle where TVALID=1 and TREADY=1 transfers exactly one data beat. A master that keeps TVALID continuously asserted and a slave that keeps TREADY continuously asserted will transfer one beat per cycle, every cycle, limited only by the clock frequency and data bus width. Memory-mapped AXI4 can also keep its data channels busy, but only by issuing bursts and overlapping address, data, and response traffic across outstanding transactions; AXI4-Stream reaches the same rate with a single channel and no transaction bookkeeping.
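The dependence on clock frequency and bus width can be written out as a short worked formula. The symbols below are introduced here for illustration (W is the TDATA width in bits, f_clk the ACLK frequency, N_xfer the number of cycles with a completed handshake out of N_cycles total), and the numeric example uses assumed values rather than figures from any particular device.

```latex
B_{\text{peak}} = \frac{W}{8} \cdot f_{\text{clk}} \quad \text{[bytes/s]},
\qquad
B_{\text{eff}} = B_{\text{peak}} \cdot \frac{N_{\text{xfer}}}{N_{\text{cycles}}}

% Example with assumed values: W = 64 bits, f_clk = 250 MHz
% B_peak = (64 / 8) bytes * 250e6 /s = 2.0e9 bytes/s (2 GB/s)
% If the slave stalls one cycle in four, N_xfer / N_cycles = 0.75,
% so B_eff = 1.5e9 bytes/s.
```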