AMBA Protocol

CXS – CCIX Transport Interface

CXS (CCIX Transport Interface) is an AMBA-family link-layer interface that carries CCIX packets between a coherent transaction layer and a transport component — enabling cache-coherent communication between CPUs and accelerators over a flit-based, credit-controlled channel.

Standard — AMBA CXS (Arm)
Family — AMBA 5 / CCIX
Transfer Type — Flit-based, point-to-point
Flow Control — Link credit
Use Case — CPU–Accelerator coherency

Overview & Context

CCIX (Cache Coherent Interconnect for Accelerators) is a standard for sharing coherent memory between a host CPU and heterogeneous accelerators such as GPUs, FPGAs, and ML inference chips. CXS is the AMBA link-layer interface that sits directly below the CCIX Transaction Layer (CTL), providing the physical framing and flow control needed to move CCIX packets reliably across a point-to-point link.

Think of CXS as the "PHY handoff" layer: the CTL above it generates CCIX packets, and CXS wraps them into fixed-width flits and delivers them to the underlying transport (e.g., PCIe). It is not visible to application software, but every coherency transaction crosses it.

CXS defines separate TX (transmit) and RX (receive) paths, each carrying flits unidirectionally. It uses a link-credit mechanism to prevent the transmitter from overrunning the receiver's buffers, and a simple ACTIVEREQ / ACTIVEACK handshake for link-level power management.

Figure: protocol stack on each side of the CXS link. The host (CPU / Host Processor) and the device (GPU / FPGA / Accelerator) each stack a CCIX Transaction Layer (CTL) on a CXS Interface (this article) on a PCIe / Transport Layer; the two stacks are joined by the CXS Link.

CXS Signal Interface

CXS has two independent unidirectional paths: TX (transmitter side) and RX (receiver side). All signals are synchronous to a shared clock CXSCLK. The width of the FLIT bus is configurable at design time (128, 256, or 512 bits).

Signal | Dir | Width | Description
CXSCLK | - | 1 | System clock. All CXS signals are sampled on the rising edge.
CXSRESETn | - | 1 | Active-low asynchronous reset. De-assert synchronously after clock is stable.
TXFLIT[N:0] | TX→RX | 128/256/512 | Flit data bus. Carries one complete CCIX flit per cycle when TXFLITV is asserted.
TXFLITV | TX→RX | 1 | Transmit flit valid. When HIGH, TXFLIT contains a valid flit.
TXLCRDV | RX→TX | 1 | Transmit link credit valid. The receiver returns one credit to the transmitter per cycle in which it is asserted.
TXACTIVEREQ | TX→RX | 1 | TX requests link to enter ACTIVE state. Part of the activation handshake.
TXACTIVEACK | RX→TX | 1 | RX acknowledges TX activation request. Link becomes active when both REQ and ACK are HIGH.
RXFLIT[N:0] | RX→TX | 128/256/512 | Receive flit data bus. Mirror of TXFLIT in the opposite direction.
RXFLITV | RX→TX | 1 | Receive flit valid.
RXLCRDV | TX→RX | 1 | Receive link credit valid. TX returns credits to RX for the reverse channel.
RXACTIVEREQ | RX→TX | 1 | RX requests its transmit link to enter ACTIVE state.
RXACTIVEACK | TX→RX | 1 | TX acknowledges RX activation request.

Note: TX and RX paths are fully independent. Each has its own ACTIVEREQ/ACTIVEACK pair, allowing one direction to be powered down while the other remains active — useful for asymmetric workloads.
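The handshake for one direction can be sketched as a small decoder over the REQ/ACK pair. This is only an illustration: the module name, the four state names, and their encodings are assumptions made for the sketch, and it takes the ordering described in this article (REQ rises, ACK follows; REQ falls, ACK follows).

```verilog
// Illustrative decoder for one direction's activation handshake.
// Module name, state names, and encodings are assumptions for this sketch.
module cxs_link_state (
  input  wire       activereq,     // e.g. TXACTIVEREQ
  input  wire       activeack,     // e.g. TXACTIVEACK
  output reg  [1:0] state,
  output wire       flits_allowed  // HIGH only while the link is fully active
);
  localparam [1:0] INACTIVE     = 2'd0,
                   ACTIVATING   = 2'd1,
                   ACTIVE       = 2'd2,
                   DEACTIVATING = 2'd3;

  always @(*) begin
    case ({activereq, activeack})
      2'b00:   state = INACTIVE;      // both LOW: link is down
      2'b10:   state = ACTIVATING;    // request raised, waiting for ACK
      2'b11:   state = ACTIVE;        // both HIGH: flits may flow
      default: state = DEACTIVATING;  // 2'b01: request dropped, ACK still HIGH
    endcase
  end

  assign flits_allowed = (state == ACTIVE);
endmodule
```

Instantiating the decoder once per direction reflects the independence noted above: the TX instance can sit in INACTIVE while the RX instance stays ACTIVE.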

Flit Structure

A flit (flow control unit) is the atomic transfer unit on the CXS interface. In every clock cycle in which the flit-valid signal (TXFLITV or RXFLITV) is HIGH, exactly one flit is transferred. The flit width is fixed per link instance and must be agreed between the two endpoints at design time.

TXFLIT[511:0], 512-bit example layout:

Field | Bits | Width | Description
CRC | [511:496] | 16 bits | CRC-16 integrity check over the entire flit. Receiver validates before passing to CTL.
Protocol | [495:488] | 8 bits | Identifies the CCIX packet type (Request, Response, Snoop, Data, etc.).
Header | [487:448] | 40 bits | Contains source/target node ID, transaction ID, message class, and ordering attributes.
Payload | [447:0] | Remaining (448 bits) | CCIX packet body: address, data, or control information depending on packet type.

For 128-bit and 256-bit flit widths, large CCIX packets are segmented across multiple flits and reassembled by the receiving CTL.
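As a rough illustration of the 512-bit example layout in this section (CRC in [511:496], Protocol in [495:488], Header in [487:448], Payload in [447:0]), the fields can be sliced out with plain continuous assignments. The module and port names are hypothetical, and the sketch covers only the 512-bit case; it does no CRC checking and no reassembly of segmented packets.

```verilog
// Field extraction for the 512-bit example layout described in this article.
// Module and port names are hypothetical; no CRC check is performed here.
module cxs_flit_unpack_512 (
  input  wire [511:0] flit,
  output wire [15:0]  crc,       // flit[511:496]
  output wire [7:0]   protocol,  // flit[495:488]
  output wire [39:0]  header,    // flit[487:448]
  output wire [447:0] payload    // flit[447:0]
);
  assign crc      = flit[511:496];
  assign protocol = flit[495:488];
  assign header   = flit[487:448];
  assign payload  = flit[447:0];
endmodule
```

The four slices add up to 16 + 8 + 40 + 448 = 512 bits, so every bit of the flit is accounted for.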

Credit-Based Flow Control

CXS uses a link credit mechanism to prevent the transmitter from overrunning the receiver's input buffers. The receiver pre-loads the transmitter with a fixed number of credits at link activation. The transmitter spends one credit per flit sent and can only transmit when it holds at least one credit.

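Timing diagram: CXSCLK, TXLCRDV (credit returns, +1 each), TXFLITV (flits F0 to F4), and the transmitter's credit count. Each returned credit raises the count by one, each flit sent spends one credit, and the transmitter stalls while the count is zero.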

The number of credits granted at activation is implementation-defined (typically 4–16). Each TXLCRDV pulse returns one credit; each TXFLITV cycle spends one. When the credit count reaches zero, the transmitter must stall: it cannot assert TXFLITV until a credit is returned.
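One way to check the credit rule in simulation is a passive monitor that mirrors the credit count from the wire-level signals and flags any flit sent with no credit in hand. The sketch below is an assumption-laden illustration: the module name, the `violation` output, and the `MAX_CRED` bound are hypothetical, and it treats a credit arriving in the same cycle as a flit as not yet usable.

```verilog
// Passive monitor: mirrors the credit count seen on the link and flags a
// flit sent while that count is zero. Names other than the CXS signals
// used in this article are hypothetical.
module cxs_credit_monitor #(
  parameter MAX_CRED = 8          // assumed upper bound on credits held
) (
  input  wire cxsclk,
  input  wire cxsresetn,
  input  wire txflitv,            // a flit is on the link this cycle
  input  wire txlcrdv,            // a credit is returned this cycle
  output reg  violation           // sticky: flit seen while count was zero
);
  // Wide enough to hold 0..MAX_CRED inclusive
  reg [$clog2(MAX_CRED+1)-1:0] credits;

  always @(posedge cxsclk or negedge cxsresetn) begin
    if (!cxsresetn) begin
      credits   <= 0;
      violation <= 1'b0;
    end else begin
      // Assumption: a credit returned in this same cycle does not count
      if (txflitv && credits == 0)
        violation <= 1'b1;

      case ({txlcrdv, txflitv})
        2'b10: if (credits != MAX_CRED) credits <= credits + 1'b1; // return only
        2'b01: if (credits != 0)        credits <= credits - 1'b1; // spend only
        default: ;                                                 // both or neither
      endcase
    end
  end
endmodule
```

Because the monitor only observes TXFLITV and TXLCRDV, it can be bound to either end of the link without touching the design under test.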

Transfer Operation

A typical CXS flit transfer sequence proceeds as follows:

  1. Reset: CXSRESETn is asserted (LOW). Both TXACTIVEREQ and TXACTIVEACK are LOW, and no flits are sent.
  2. De-assert reset: CXSRESETn goes HIGH. Receiver initialises its credit counters.
  3. Link activation: Transmitter asserts TXACTIVEREQ=1. Receiver responds with TXACTIVEACK=1. The receiver now pushes initial credits via TXLCRDV.
  4. Flit transfer: Transmitter asserts TXFLITV=1 and places flit data on TXFLIT. The credit count is decremented by one per flit. Transfers continue back-to-back as long as credits are available.
  5. Credit return: Receiver pulses TXLCRDV=1 once per processed flit, returning a credit to the transmitter.
  6. Link deactivation: When idle, transmitter drops TXACTIVEREQ=0. Receiver responds with TXACTIVEACK=0. No flits may be sent while inactive.
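The receiver's part in steps 3, 5, and 6 can be sketched as a small credit issuer. Everything beyond the CXS signal names used in this article is an assumption: the module name, the `INIT_CRED` parameter, and the `flit_drained` input (a hypothetical strobe from the receive buffer meaning one slot has been freed) exist only for illustration.

```verilog
// Receiver-side sketch: acknowledge activation, push the initial credits,
// then return one credit per drained flit. INIT_CRED and flit_drained are
// hypothetical names introduced for this example.
module cxs_rx_credit #(
  parameter INIT_CRED = 8        // assumed credits granted at activation
) (
  input  wire cxsclk,
  input  wire cxsresetn,
  input  wire txactivereq,       // from transmitter
  output reg  txactiveack,       // to transmitter
  input  wire flit_drained,      // one receive-buffer slot freed this cycle
  output reg  txlcrdv            // one credit per asserted cycle
);
  reg [$clog2(INIT_CRED+1)-1:0] to_grant;   // credits still owed to the TX
  wire owed = (to_grant != 0);

  always @(posedge cxsclk or negedge cxsresetn) begin
    if (!cxsresetn) begin
      txactiveack <= 1'b0;
      txlcrdv     <= 1'b0;
      to_grant    <= 0;
    end else if (!txactivereq) begin
      // Step 6: follow the request LOW and forget any owed credits
      txactiveack <= 1'b0;
      txlcrdv     <= 1'b0;
      to_grant    <= 0;
    end else if (!txactiveack) begin
      // Step 3: acknowledge and queue the initial credits
      txactiveack <= 1'b1;
      to_grant    <= INIT_CRED;
    end else begin
      // Steps 3 and 5: issue one credit per cycle while any are owed
      txlcrdv <= owed;
      case ({flit_drained, owed})
        2'b10: to_grant <= to_grant + 1'b1;  // slot freed, nothing issued
        2'b01: to_grant <= to_grant - 1'b1;  // credit issued, nothing freed
        default: ;                           // both or neither: no change
      endcase
    end
  end
endmodule
```

Queuing credits in `to_grant` keeps TXLCRDV to one credit per cycle, so the initial grant takes `INIT_CRED` cycles to reach the transmitter.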

Verilog RTL — CXS Transmitter Controller

A simplified CXS transmitter that manages link activation and credit-based flit dispatch:

Verilog cxs_tx_ctrl.v
// CXS Transmitter Controller (simplified)
// Manages ACTIVEREQ handshake + credit-gated flit dispatch
module cxs_tx_ctrl #(
  parameter FLIT_W    = 256,   // flit width: 128, 256, or 512
  parameter MAX_CRED  = 8      // credits the receiver grants at activation
) (
  input  wire              cxsclk,
  input  wire              cxsresetn,

  // Activation handshake
  output reg               txactivereq,
  input  wire              txactiveack,

  // Flit interface to CTL (upper layer)
  input  wire              ctl_valid,     // CTL has a flit to send
  input  wire [FLIT_W-1:0] ctl_flit,
  output wire              ctl_ready,     // flit is accepted in any cycle where
                                          // ctl_valid and ctl_ready are both HIGH

  // CXS TX output (registered: appears one cycle after acceptance)
  output reg               txflitv,
  output reg  [FLIT_W-1:0] txflit,

  // Credit return from receiver
  input  wire              txlcrdv
);

  reg [$clog2(MAX_CRED):0] credits;
  wire link_active = txactivereq & txactiveack;

  // Combinational ready, so the CTL sees acceptance in the same cycle and
  // does not present the same flit twice
  assign ctl_ready = link_active & (credits != 0);
  wire   can_send  = ctl_valid & ctl_ready;

  always @(posedge cxsclk or negedge cxsresetn) begin
    if (!cxsresetn) begin
      txactivereq <= 1'b0;
      credits     <= 0;
      txflitv     <= 1'b0;
      txflit      <= {FLIT_W{1'b0}};
    end else begin
      // Request activation when the CTL has data and the previous
      // handshake has fully completed (ACK is back LOW)
      if (ctl_valid && !txactivereq && !txactiveack)
        txactivereq <= 1'b1;

      // Credit management: return +1, spend -1, both cancel.
      // A credit is spent in the cycle the flit is accepted (can_send).
      case ({txlcrdv, can_send})
        2'b10: credits <= credits + 1'b1;   // credit returned only
        2'b01: credits <= credits - 1'b1;   // flit accepted only
        default: ; // both or neither: no change
      endcase

      // Deactivate when idle and every credit has been returned.
      // Assumes the receiver re-issues its initial credits on the next
      // activation, so the local count is cleared here.
      if (!ctl_valid && link_active && credits == MAX_CRED) begin
        txactivereq <= 1'b0;
        credits     <= 0;
      end

      // Dispatch flit when link active and credit available
      txflitv <= can_send;
      txflit  <= can_send ? ctl_flit : {FLIT_W{1'b0}};
    end
  end
endmodule

The receiver side is symmetrical — it drives RXFLITV / RXFLIT toward the transmitter, with the transmitter returning credits via RXLCRDV.

Use Cases

Application | Why CXS | Typical Flit Width
CPU–GPU coherency | GPU caches snoop CPU cache lines without software intervention | 256 / 512 bit
CPU–FPGA accelerator | FPGA logic accesses main memory with full coherency for zero-copy DMA | 128 / 256 bit
ML inference accelerators | Shared weight tensors in CPU L3 accessible directly by AI chip | 256 / 512 bit
Smart NICs (DPU) | Network offload engine shares coherent memory with host application | 256 bit
Multi-die chiplets | Die-to-die coherent link over UCIe / AIB with CCIX on top | 128 bit

Interview Q&A

What is CXS and how does it relate to CCIX?
CXS (CCIX Transport Interface) is the AMBA-defined link-layer interface that sits between the CCIX Transaction Layer (CTL) and the physical transport (e.g., PCIe). CCIX is the broader coherency protocol standard; CXS is the specific hardware interface through which CCIX packets travel as fixed-width flits. Without CXS, the CTL has no standardised way to hand packets to the transport.

Why does CXS use credit-based flow control instead of a ready/valid handshake?
CXS links are often long (die-to-die or chip-to-chip over PCIe), meaning round-trip latency is large. A ready/valid scheme would stall on every cycle it waits for acknowledgement. Credit-based flow control pre-loads the transmitter with N credits, allowing it to send N flits before waiting, which effectively hides the round-trip latency. The transmitter only stalls when all credits are consumed, which is rare if the credit count is sized appropriately.

What are the supported CXS flit widths and when would you choose each?
CXS supports 128-bit, 256-bit, and 512-bit flit widths. 128-bit is used for area-constrained die-to-die links where routing width is limited. 256-bit is the most common; it balances bandwidth and area for typical CCIX workloads. 512-bit maximises bandwidth for high-performance CPU–GPU or CPU–ML accelerator links where peak throughput is critical. The width is fixed at design time and cannot be changed at runtime.

How does CXS handle link power management?
CXS TX and RX paths each have an independent ACTIVEREQ / ACTIVEACK handshake. When the transmitter has no more flits to send, it de-asserts ACTIVEREQ; the receiver acknowledges by de-asserting ACTIVEACK, and the link enters INACTIVE state. The underlying transport (PCIe) can then enter a low-power state. Because the two directions are independent, one path can be deactivated while the other remains active, which is useful for asymmetric read-heavy or write-heavy workloads.

What is the difference between CXS and AXI4?
AXI4 is a general-purpose on-chip memory-mapped bus that carries read/write transactions with address and data channels. CXS is not memory-mapped; it is a flit-based transport that carries opaque CCIX coherency packets. AXI4 operates within a single die at low latency; CXS is designed for inter-chip or die-to-die links, tolerating higher latency via credit buffering. CXS also carries cache-coherency semantics (snoops, responses) that AXI4 does not natively support without the ACE extension.

What happens if a CXS transmitter sends a flit with no credits remaining?
This is a protocol violation: the CXS specification requires that TXFLITV must never be asserted when the transmitter's credit count is zero. A compliant transmitter must stall (hold TXFLITV LOW) until TXLCRDV is received and credits become non-zero. If the protocol is violated, the receiver's input buffer may overflow, leading to flit corruption or loss, which would propagate as a coherency error to the CCIX transaction layer above.