AXI4-Stream – AMBA Streaming Interface
AXI4-Stream is the streaming subset of the AMBA 4 family. It strips away the address channel entirely and focuses on moving a continuous, ordered flow of data beats from a single master to a single slave — making it the standard interface for DSP pipelines, video pixel streams, network packet engines, and DMA data paths.
Overview & Comparison
AXI4 (full) is designed for memory-mapped transactions — each transfer has an address, allowing random access to any location. AXI4-Stream removes the address dimension completely. Data simply flows from producer to consumer in the order it is presented, with no concept of memory location.
This simplicity is its strength: there is only one channel with a handful of signals, yet it can sustain 100% bus utilisation (one data beat every clock cycle) as long as the master keeps TVALID asserted and the slave keeps TREADY asserted. It is the standard streaming interface of AMD (Xilinx) FPGA IP cores, is offered by many Intel (Altera) and third-party cores (Intel's native streaming interface is Avalon-ST), and is widely used in DMA data paths and video/image signal processing pipelines.
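As a rough worked example of what one beat per cycle buys, payload throughput is the TDATA width times the clock frequency times the fraction of cycles in which TVALID and TREADY are both HIGH. The helper below is a back-of-envelope sketch: the function name and the 64-bit / 250 MHz figures are illustrative assumptions, and it ignores any null bytes flagged by TKEEP.

```python
def axis_throughput_gbps(tdata_bits: int, fclk_mhz: float, utilisation: float = 1.0) -> float:
    """Payload throughput in Gbit/s for one AXI4-Stream link (hypothetical helper).

    utilisation is the fraction of clock cycles in which a transfer occurs
    (TVALID and TREADY both HIGH); 1.0 means one beat every cycle.
    """
    if tdata_bits % 8 != 0:
        raise ValueError("TDATA width must be a whole number of bytes")
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0 and 1")
    return tdata_bits * fclk_mhz * 1e6 * utilisation / 1e9


print(axis_throughput_gbps(64, 250))       # 16.0 (full utilisation)
print(axis_throughput_gbps(64, 250, 0.5))  # 8.0  (slave ready half the time)
```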
| Feature | AXI4 (full) | AXI4-Stream |
|---|---|---|
| Address channel | Yes (AW / AR) | None |
| Response channel | Yes (B / R includes RRESP) | None |
| Data direction | Bidirectional | Unidirectional |
| Max burst / packet length | 256 beats (INCR bursts) | Unlimited (TLAST-delimited) |
| Packet framing | WLAST / RLAST | TLAST |
| Byte lanes | WSTRB | TKEEP / TSTRB |
| Typical use | Memory, peripherals | DSP, video, networking |
Signal Reference
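In practice every interface carries TDATA and TVALID, and TREADY is strongly recommended; when TREADY is omitted, the slave is treated as always ready. All other signals are optional and are included only when the application needs them.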
| Signal | Dir | Width | Required | Description |
|---|---|---|---|---|
| ACLK | – | 1 | Yes | Global clock. All signals sampled on rising edge. |
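| ARESETn | – | 1 | Yes | Active-low reset. May be asserted asynchronously but must be deasserted synchronously with a rising edge of ACLK; the master must drive TVALID LOW while reset is asserted. |
| TDATA | M→S | 8×N | Yes | Data payload. Width must be an integer number of bytes; power-of-two widths (8, 16, 32, 64, 128…) are recommended. |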
| TVALID | M→S | 1 | Yes | Master: TDATA and sideband signals are valid this cycle. |
| TREADY | S→M | 1 | Recommended | Slave: ready to accept data. When omitted, slave is always ready. |
| TLAST | M→S | 1 | Optional | Marks the last beat of a packet. Resets framing state in the slave. |
| TKEEP | M→S | TDATA/8 | Optional | Byte qualifier: 1 = this byte is part of the stream (a data byte, or a position byte when TSTRB is used), 0 = null byte (should be ignored). Null bytes may be removed from a stream; many packet-oriented cores only permit them on the final (TLAST) beat. |
| TSTRB | M→S | TDATA/8 | Optional | Position byte qualifier, interpreted together with TKEEP: TSTRB=1, TKEEP=1 is a data byte; TSTRB=0, TKEEP=1 is a position byte (occupies a lane but carries no data); TSTRB=0, TKEEP=0 is a null byte. |
| TUSER | M→S | variable | Optional | User-defined sideband information. Travels alongside TDATA (e.g., SOP flag, error flag, pixel component ID). |
| TID | M→S | variable | Optional | Stream identifier. Allows a single physical interface to carry multiple logical streams. |
| TDEST | M→S | variable | Optional | Routing destination. Used by stream switches and interconnects to direct data to the correct slave. |
Minimum viable interface: A design that never needs backpressure (e.g., a data generator feeding a FIFO that never fills) only needs TDATA + TVALID. Add TREADY when the consumer can be slow. Add TLAST when data arrives in discrete packets or frames.
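The width and qualifier rules in the table can be captured in a small behavioural model, for example to sanity-check beats in a Python testbench. This is a sketch, not part of the specification: `AxisBeat` and `check_beat` are hypothetical names, TDATA is modelled as a Python int, and byte lane i is assumed to correspond to bit i of TKEEP/TSTRB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AxisBeat:
    """One transfer on a hypothetical AXI4-Stream link model."""
    tdata: int
    tlast: bool = False
    tkeep: Optional[int] = None  # None = TKEEP absent (all bytes kept)
    tstrb: Optional[int] = None  # None = TSTRB absent (defaults to TKEEP)


def check_beat(beat: AxisBeat, tdata_bits: int) -> None:
    """Raise ValueError if the beat breaks the width or byte-qualifier rules."""
    if tdata_bits <= 0 or tdata_bits % 8 != 0:
        raise ValueError("TDATA width must be a positive multiple of 8")
    nbytes = tdata_bits // 8
    all_ones = (1 << nbytes) - 1
    if not 0 <= beat.tdata < (1 << tdata_bits):
        raise ValueError("TDATA does not fit the bus width")
    keep = all_ones if beat.tkeep is None else beat.tkeep
    strb = keep if beat.tstrb is None else beat.tstrb
    if not (0 <= keep <= all_ones and 0 <= strb <= all_ones):
        raise ValueError("TKEEP/TSTRB must be TDATA/8 bits wide")
    if strb & ~keep:  # any lane with TSTRB=1 but TKEEP=0
        raise ValueError("TSTRB=1 with TKEEP=0 is not permitted")


check_beat(AxisBeat(tdata=0x0908, tlast=True, tkeep=0b0011), tdata_bits=32)  # passes
try:
    check_beat(AxisBeat(tdata=0, tkeep=0b0011, tstrb=0b0100), tdata_bits=32)
except ValueError as err:
    print(err)  # TSTRB=1 with TKEEP=0 is not permitted
```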
TVALID / TREADY Handshake
The handshake rule is identical to AXI4: a transfer occurs on the rising clock edge when both TVALID and TREADY are HIGH.
When the slave de-asserts TREADY (stall), the master must hold TDATA and TVALID stable until TREADY returns. This is called backpressure — the slave is telling the master it is not ready to consume data.
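The handshake can be illustrated with a cycle-by-cycle model. This sketch assumes a master with a queue of words and a slave whose TREADY per cycle is given as a list; `simulate_handshake` is a hypothetical name. The master raises TVALID whenever it has data, never gated by TREADY, and only moves to the next word after a transfer.

```python
def simulate_handshake(data, tready_pattern):
    """Cycle-level model of one AXI4-Stream link (illustrative only).

    data: words the master wants to send, in order.
    tready_pattern: TREADY driven by the slave in each cycle (0/1).
    Returns a list of (cycle, word) for every completed transfer.
    """
    transfers = []
    idx = 0  # index of the word currently presented on TDATA
    for cycle, tready in enumerate(tready_pattern):
        tvalid = idx < len(data)               # asserted whenever data is pending
        tdata = data[idx] if tvalid else None  # held stable until accepted
        if tvalid and tready:                  # transfer on this rising edge
            transfers.append((cycle, tdata))
            idx += 1                           # only now may TDATA change
    return transfers


# The slave stalls in cycles 1 and 2; word 0xB waits on the bus until cycle 3.
print(simulate_handshake([0xA, 0xB, 0xC], [1, 0, 0, 1, 1]))
# [(0, 10), (3, 11), (4, 12)]
```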
Packet Framing with TLAST
AXI4-Stream itself has no concept of a packet size — data beats flow continuously. TLAST provides a single-bit end-of-packet marker: when HIGH on a data beat, it signals that this is the last beat of the current packet or frame. The slave resets its framing state after the TLAST beat.
Two back-to-back packets can be transferred without any idle cycle between them — the cycle after TLAST simply carries the first beat of the next packet.
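On the receiving side, TLAST is all a slave needs to recover packet boundaries. The sketch below (hypothetical `split_packets`, with beats modelled as `(tdata, tlast)` tuples) closes a packet on every TLAST beat and resets its framing state, so back-to-back packets need no idle cycle between them.

```python
def split_packets(beats):
    """Group (tdata, tlast) beats into packets using TLAST.

    Beats after the final TLAST belong to an unfinished packet and are
    returned separately.
    """
    packets, current = [], []
    for tdata, tlast in beats:
        current.append(tdata)
        if tlast:                  # last beat: close packet, reset framing
            packets.append(current)
            current = []
    return packets, current


# Two back-to-back packets (no idle beat between them) and a partial third one.
stream = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 1), (6, 0)]
print(split_packets(stream))
# ([[1, 2, 3], [4, 5]], [6])
```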
TKEEP & TSTRB — Byte Qualifiers
When the final beat of a packet does not fill the entire data bus (e.g., a 10-byte payload on a 32-bit / 4-byte bus), TKEEP marks which byte lanes carry meaningful data.
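To make the 10-byte example concrete, the sketch below packs a byte payload into bus-width beats and derives TKEEP for each. It assumes the usual little-endian lane mapping (payload byte 0 in TDATA[7:0], lane i qualified by TKEEP[i]) and that null bytes appear only in the final beat, as in a packed packet stream; `pack_payload` is a hypothetical name.

```python
def pack_payload(payload: bytes, bus_bytes: int):
    """Split payload into (tdata, tkeep, tlast) beats for a bus_bytes-wide TDATA."""
    beats = []
    for offset in range(0, len(payload), bus_bytes):
        chunk = payload[offset:offset + bus_bytes]
        tdata = int.from_bytes(chunk, "little")  # byte 0 -> TDATA[7:0]
        tkeep = (1 << len(chunk)) - 1            # one TKEEP bit per occupied lane
        tlast = offset + bus_bytes >= len(payload)
        beats.append((tdata, tkeep, tlast))
    return beats


for tdata, tkeep, tlast in pack_payload(bytes(range(10)), 4):
    print(f"TDATA=0x{tdata:08X} TKEEP=0b{tkeep:04b} TLAST={int(tlast)}")
# TDATA=0x03020100 TKEEP=0b1111 TLAST=0
# TDATA=0x07060504 TKEEP=0b1111 TLAST=0
# TDATA=0x00000908 TKEEP=0b0011 TLAST=1
```

The unused upper lanes of the last beat are shown as zero here; in hardware their TDATA contents are don't-care once TKEEP marks them as null.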
| TSTRB | TKEEP | Byte type | Description |
|---|---|---|---|
| 1 | 1 | Data byte | Lane carries meaningful data — must be processed by slave. |
| 0 | 1 | Position byte | Lane occupies a position in the stream but carries no application data (e.g., padding for alignment). |
| 0 | 0 | Null byte | Lane is empty and carries no information; slave must ignore it. Many packet-oriented cores allow null bytes only on the TLAST beat. |
| 1 | 0 | — | Not permitted (undefined / illegal combination). |
Most designs use TKEEP only. TSTRB is used in specialised applications that need to distinguish position bytes (e.g., protocol headers where zero-padding must be transmitted but not counted as data). If TSTRB is omitted, all kept bytes are implicitly data bytes.
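The qualifier table above maps directly onto a per-lane decode. This sketch (hypothetical `classify_lanes`, lane 0 first) returns the byte type of each lane and rejects the reserved TSTRB=1, TKEEP=0 combination. When TSTRB is not present on the interface, pass `tstrb=tkeep`, so every kept byte decodes as a data byte.

```python
def classify_lanes(tkeep: int, tstrb: int, nbytes: int):
    """Return the byte type of each lane (lane 0 first) from TKEEP/TSTRB."""
    kinds = []
    for lane in range(nbytes):
        keep = (tkeep >> lane) & 1
        strb = (tstrb >> lane) & 1
        if keep and strb:
            kinds.append("data")
        elif keep:
            kinds.append("position")  # occupies a lane, no application data
        elif not strb:
            kinds.append("null")      # empty lane, ignored by the slave
        else:
            raise ValueError(f"lane {lane}: TSTRB=1 with TKEEP=0 is not permitted")
    return kinds


print(classify_lanes(tkeep=0b0111, tstrb=0b0101, nbytes=4))
# ['data', 'position', 'data', 'null']
```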
Backpressure
Backpressure is the mechanism by which a slow slave signals the master to pause. When the slave de-asserts TREADY=0, no transfer occurs even if TVALID is HIGH. The master must hold its data stable on TDATA (along with TVALID and all sideband signals) until TREADY returns.
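A common pattern is to insert a skid buffer (a two-entry register FIFO) between the master and slave. Its upstream TREADY comes from a register instead of being passed combinationally from the downstream TREADY, which breaks long ready paths and eases timing closure. Because that registered TREADY reacts one cycle late, the second ("skid") register catches the beat accepted in the cycle the downstream stall begins, so no data is lost and the buffer still passes one beat per clock when there is no backpressure.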
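The same registered skid-buffer algorithm used by the `axis_skid_buf` RTL in the Verilog section can be checked as a Python cycle model. In this sketch the function name, the upstream stimulus (TVALID held HIGH while words remain) and the random downstream TREADY are assumptions for illustration; next-state values are computed from current-cycle values, mirroring non-blocking assignments.

```python
import random


def run_skid_buffer(data, ready_prob=0.5, seed=1, max_cycles=10_000):
    """Cycle model of a registered two-entry skid buffer (illustrative).

    Returns (words delivered downstream, cycles used).
    """
    rng = random.Random(seed)
    out_valid, out_data = False, None    # output register (m_tvalid / m_tdata)
    skid_valid, skid_data = False, None  # skid register
    s_tready = True                      # registered; always equals not skid_valid
    idx, delivered = 0, []
    for cycle in range(max_cycles):
        if idx == len(data) and not out_valid and not skid_valid:
            return delivered, cycle
        s_tvalid = idx < len(data)
        m_tready = rng.random() < ready_prob
        in_fire = s_tvalid and s_tready  # upstream handshake this edge
        if out_valid and m_tready:       # downstream handshake this edge
            delivered.append(out_data)
        if m_tready or not out_valid:    # output register is free
            if skid_valid:               # drain the skid slot first
                out_data, out_valid = skid_data, True
                skid_valid, s_tready = False, True
            else:                        # pass-through path
                out_data, out_valid = (data[idx], True) if in_fire else (None, False)
        elif in_fire:                    # output stalled: catch the beat
            skid_data, skid_valid, s_tready = data[idx], True, False
        if in_fire:
            idx += 1
    raise RuntimeError("did not drain within max_cycles")


words = list(range(200))
delivered, cycles = run_skid_buffer(words, ready_prob=0.5, seed=7)
print(delivered == words)  # True: order preserved, nothing dropped
```

With `ready_prob=1.0` the model delivers N words in N + 1 cycles: one beat per clock plus the single cycle of register latency.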
Design rule: A master must never make the assertion of TVALID depend on TREADY being HIGH. The protocol allows a slave to wait for TVALID before asserting TREADY, so a master that waits for TREADY can deadlock: each side waits for the other and no transfer ever occurs. The master must assert TVALID as soon as it has data, regardless of TREADY, and keep it asserted until the transfer completes.
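The deadlock is easy to reproduce in a toy model. This sketch (hypothetical `count_transfers`) pairs a slave that, legally, raises TREADY only in the cycle after it sees TVALID with either a compliant master or one that gates TVALID on TREADY; the master always has data.

```python
def count_transfers(master_waits_for_ready: bool, cycles: int = 10) -> int:
    """Count transfers between a master and a slave that waits for TVALID."""
    tready = False  # slave's registered TREADY
    transfers = 0
    for _ in range(cycles):
        # Compliant master: TVALID whenever it has data (always here).
        # Non-compliant master: TVALID only once TREADY is already HIGH.
        tvalid = tready if master_waits_for_ready else True
        if tvalid and tready:
            transfers += 1
        tready = tvalid  # slave raises TREADY the cycle after seeing TVALID
    return transfers


print(count_transfers(master_waits_for_ready=False))  # 9
print(count_transfers(master_waits_for_ready=True))   # 0 (deadlock)
```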
Verilog RTL
Two complementary modules: an AXI4-Stream master that generates a fixed-length packet, and a registered skid buffer that can sit between any master and slave.
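```verilog
module axis_master #(
    parameter DW        = 32,
    parameter PKT_BEATS = 4              // beats per packet
) (
    input  wire            aclk,
    input  wire            aresetn,
    // AXI4-Stream master output
    output reg  [DW-1:0]   m_tdata,
    output reg             m_tvalid,
    output reg             m_tlast,
    output wire [DW/8-1:0] m_tkeep,
    input  wire            m_tready,
    // trigger to start sending one packet
    input  wire            send
);
    // Number of the beat currently on the bus (1..PKT_BEATS)
    reg [$clog2(PKT_BEATS):0] beat_cnt;

    assign m_tkeep = {(DW/8){1'b1}};     // all bytes valid

    always @(posedge aclk) begin
        if (!aresetn) begin
            m_tvalid <= 1'b0;
            m_tlast  <= 1'b0;
            m_tdata  <= {DW{1'b0}};      // Verilog-2001 form of '0
            beat_cnt <= 0;
        end else begin
            if (send && !m_tvalid) begin
                // Start a packet; TVALID does not wait for TREADY
                m_tvalid <= 1'b1;
                m_tdata  <= 32'hA000_0000;   // zero-extended/truncated to DW
                beat_cnt <= 1;
                m_tlast  <= (PKT_BEATS == 1);
            end else if (m_tvalid && m_tready) begin
                if (m_tlast) begin
                    // Last beat accepted: packet done
                    m_tvalid <= 1'b0;
                    m_tlast  <= 1'b0;
                    beat_cnt <= 0;
                end else begin
                    // Beat accepted: present the next one. The next beat is
                    // number beat_cnt+1, and it is the last if that equals
                    // PKT_BEATS (comparing with PKT_BEATS-1 would end the
                    // packet one beat early).
                    beat_cnt <= beat_cnt + 1;
                    m_tdata  <= m_tdata + 1;
                    m_tlast  <= (beat_cnt + 1 == PKT_BEATS);
                end
            end
            // Otherwise (stall or idle) TDATA/TVALID/TLAST hold their values
        end
    end
endmodule
```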
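When simulating `axis_master`, a small golden model makes a convenient scoreboard. This sketch assumes the beats are captured at each handshake as `(tdata, tlast)` tuples; `expected_master_packet` and `check_capture` are hypothetical helpers. It encodes the intended behaviour: PKT_BEATS beats starting at 0xA000_0000 (truncated to DW bits), incrementing by one, with TLAST only on the final beat, so a packet whose TLAST arrives one beat early is flagged.

```python
def expected_master_packet(pkt_beats: int, dw: int = 32, base: int = 0xA000_0000):
    """Expected (tdata, tlast) beats for one packet from axis_master."""
    mask = (1 << dw) - 1  # RTL truncates the 32-bit seed and wraps at DW bits
    return [((base + i) & mask, i == pkt_beats - 1) for i in range(pkt_beats)]


def check_capture(captured, pkt_beats, dw=32):
    """Compare beats captured at the handshake against the model."""
    expected = expected_master_packet(pkt_beats, dw)
    if len(captured) != len(expected):
        return f"length mismatch: got {len(captured)} beats, expected {len(expected)}"
    for i, (got, exp) in enumerate(zip(captured, expected)):
        if got != exp:
            return f"beat {i}: got {got}, expected {exp}"
    return "ok"


# A capture where TLAST came one beat early is reported as a short packet.
short = [(0xA000_0000, False), (0xA000_0001, False), (0xA000_0002, True)]
print(check_capture(short, 4))  # length mismatch: got 3 beats, expected 4
```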
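```verilog
// Registered skid buffer. s_tready is driven from a flip-flop instead of
// combinationally from m_tready, which breaks the ready timing path. The
// skid register catches the one beat that can arrive in the cycle the
// downstream stall is first seen, so no data is lost. The upstream master
// still stalls (s_tready LOW) while the skid slot is occupied.
module axis_skid_buf #(parameter DW = 32) (
    input  wire          aclk, aresetn,
    // upstream (slave) side
    input  wire [DW-1:0] s_tdata,
    input  wire          s_tvalid, s_tlast,
    output reg           s_tready,
    // downstream (master) side
    output reg  [DW-1:0] m_tdata,
    output reg           m_tvalid, m_tlast,
    input  wire          m_tready
);
    reg [DW-1:0] skid_data;
    reg          skid_valid, skid_last;

    // Invariant: s_tready == !skid_valid
    always @(posedge aclk) begin
        if (!aresetn) begin
            m_tvalid   <= 1'b0;
            skid_valid <= 1'b0;
            s_tready   <= 1'b1;
        end else begin
            if (m_tready || !m_tvalid) begin
                // Output register is free (empty or being consumed)
                if (skid_valid) begin
                    // drain skid slot first; s_tready is LOW, so no new beat
                    m_tdata    <= skid_data;
                    m_tlast    <= skid_last;
                    m_tvalid   <= 1'b1;
                    skid_valid <= 1'b0;
                    s_tready   <= 1'b1;
                end else begin
                    // pass-through: s_tready is HIGH, so s_tvalid means a transfer
                    m_tdata  <= s_tdata;
                    m_tlast  <= s_tlast;
                    m_tvalid <= s_tvalid;
                end
            end else if (s_tvalid && s_tready) begin
                // downstream stalled: save the beat just accepted
                skid_data  <= s_tdata;
                skid_last  <= s_tlast;
                skid_valid <= 1'b1;
                s_tready   <= 1'b0;
            end
        end
    end
endmodule
```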
Use Cases
| Domain | Application | Key Signals Used |
|---|---|---|
| Video / Image | Camera sensor → ISP → display pipeline | TDATA (pixels), TLAST (end of line/frame), TUSER (SOF flag) |
| DSP | FFT, FIR filter chains, sample streaming | TDATA (samples), TLAST (end of block), TKEEP |
| Networking | Ethernet MAC → packet processor | TDATA, TLAST (end of frame), TKEEP (partial last word), TUSER (error flag) |
| DMA | Memory → peripheral data mover | TDATA, TLAST (transaction boundary) |
| FPGA IP Cores | AMD/Xilinx IP (FFT, DDS Compiler, FIR Compiler) | TDATA, TVALID, TREADY, TLAST |
| Crypto / Security | AES encryption pipeline, hash engines | TDATA (128/256-bit blocks), TLAST (last block), TUSER (key select) |
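For the video row, the sideband timing for one frame can be sketched as follows. This assumes one pixel per beat and the common AMD/Xilinx AXI4-Stream video convention (TUSER[0] marks start of frame on the first pixel, TLAST marks the last pixel of each line); `video_frame_beats` is a hypothetical generator and the pixel index stands in for real TDATA.

```python
def video_frame_beats(width: int, height: int):
    """Yield (pixel_index, tuser_sof, tlast_eol) for one frame, one pixel per beat."""
    for y in range(height):
        for x in range(width):
            sof = x == 0 and y == 0  # TUSER[0]: first pixel of the frame
            eol = x == width - 1     # TLAST: last pixel of each line
            yield (y * width + x, sof, eol)


beats = list(video_frame_beats(width=4, height=2))
print(sum(sof for _, sof, _ in beats), sum(eol for _, _, eol in beats))  # 1 2
```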