CXS – CCIX Transport Interface
CXS (expanded by Arm as Credited eXtensible Stream, and described here in its role as the CCIX transport interface) is an AMBA-family link-layer interface that carries CCIX packets between a coherent transaction layer and a transport component. It enables cache-coherent communication between CPUs and accelerators over a flit-based, credit-controlled channel.
Overview & Context
CCIX (Cache Coherent Interconnect for Accelerators) is a standard for sharing coherent memory between a host CPU and heterogeneous accelerators such as GPUs, FPGAs, and ML inference chips. CXS is the AMBA link-layer interface that sits directly below the CCIX Transaction Layer (CTL), providing the physical framing and flow control needed to move CCIX packets reliably across a point-to-point link.
Think of CXS as the handoff point to the transport: the CTL above it generates CCIX packets; CXS wraps them into fixed-width flits and delivers them to the underlying transport (e.g., PCIe). It is not visible to application software, but every coherency transaction crosses it.
CXS defines separate TX (transmit) and RX (receive) paths, each carrying flits unidirectionally. It uses a link-credit mechanism to prevent the transmitter from overrunning the receiver's buffers, and a simple ACTIVEREQ / ACTIVEACK handshake for link-level power management.
CXS Signal Interface
CXS has two independent unidirectional paths: TX (transmitter side) and RX (receiver side). All signals are synchronous to a shared clock, CXSCLK. The width N of the FLIT bus is configurable at design time (128, 256, or 512 bits). In the table below, the Dir column names the driving side first.
| Signal | Dir | Width | Description |
|---|---|---|---|
| CXSCLK | – | 1 | System clock. All CXS signals are sampled on the rising edge. |
| CXSRESETn | – | 1 | Active-low asynchronous reset. De-assert synchronously after clock is stable. |
| TXFLIT[N-1:0] | TX→RX | 128/256/512 | Flit data bus. Carries one complete CCIX flit per cycle when TXFLITV is asserted. |
| TXFLITV | TX→RX | 1 | Transmit flit valid. When HIGH, TXFLIT contains a valid flit. |
| TXLCRDV | RX→TX | 1 | Transmit link credit valid. The receiver returns one credit to the transmitter per assertion. |
| TXACTIVEREQ | TX→RX | 1 | TX requests link to enter ACTIVE state. Part of the activation handshake. |
| TXACTIVEACK | RX→TX | 1 | RX acknowledges TX activation request. The link is active only while TXACTIVEREQ and TXACTIVEACK are both HIGH. |
| RXFLIT[N-1:0] | RX→TX | 128/256/512 | Receive flit data bus. Mirror of TXFLIT in the opposite direction. |
| RXFLITV | RX→TX | 1 | Receive flit valid. |
| RXLCRDV | TX→RX | 1 | Receive link credit valid. The TX side returns one credit per assertion for flits arriving on the RX path. |
| RXACTIVEREQ | RX→TX | 1 | RX requests its transmit link to enter ACTIVE state. |
| RXACTIVEACK | TX→RX | 1 | TX acknowledges RX activation request. |
Note: TX and RX paths are fully independent. Each has its own ACTIVEREQ/ACTIVEACK pair, allowing one direction to be powered down while the other remains active — useful for asymmetric workloads.
Flit Structure
A flit (flow control unit) is the atomic transfer unit on the CXS interface. In every clock cycle in which FLITV is HIGH, exactly one flit is transferred. The flit width is fixed per link instance and must be agreed between the two endpoints at design time.
| Field | Width | Description |
|---|---|---|
| CRC | 16 bits | CRC-16 integrity check over the remaining fields of the flit. Receiver validates before passing to CTL. |
| Protocol | 8 bits | Identifies the CCIX packet type (Request, Response, Snoop, Data, etc.). |
| Header | 40 bits | Contains source/target node ID, transaction ID, message class, and ordering attributes. |
| Payload | Remaining | CCIX packet body — address, data, or control information depending on packet type. |
For 128-bit and 256-bit flit widths, large CCIX packets are segmented across multiple flits and reassembled by the receiving CTL.
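To make the field split concrete, the sketch below slices a flit into the four fields from the table. The table gives widths but not bit positions, so the ordering used here (CRC in the most significant bits, payload in the least significant) is an assumption for illustration, as are the module name `cxs_flit_fields` and the helper `flits_needed`. The helper also assumes that every flit of a segmented packet carries the full 64 bits of CRC, Protocol, and Header.

```verilog
// Sketch only: field positions are assumed, not taken from the CXS specification.
module cxs_flit_fields #(
    parameter FLIT_W = 256                 // flit width: 128, 256, or 512
) (
    input  wire [FLIT_W-1:0]  flit,
    output wire [15:0]        crc,
    output wire [7:0]         protocol,
    output wire [39:0]        header,
    output wire [FLIT_W-65:0] payload
);
    // 16 (CRC) + 8 (Protocol) + 40 (Header) = 64 bits of per-flit overhead
    localparam PAYLOAD_W = FLIT_W - 64;

    // Assumed layout, most significant field first: CRC, Protocol, Header, Payload
    assign crc      = flit[FLIT_W-1  -: 16];
    assign protocol = flit[FLIT_W-17 -: 8];
    assign header   = flit[FLIT_W-25 -: 40];
    assign payload  = flit[PAYLOAD_W-1:0];

    // Hypothetical helper: number of flits needed for a packet body of
    // body_bits bits, rounding up to whole flits.
    function integer flits_needed(input integer body_bits);
        flits_needed = (body_bits + PAYLOAD_W - 1) / PAYLOAD_W;
    endfunction
endmodule
```

Under these assumptions the payload widths are 64, 192, and 448 bits, so a 512-bit packet body (one 64-byte cache line) would occupy 8 flits at 128 bits, 3 at 256 bits, and 2 at 512 bits.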
Link Activation States
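Each CXS direction (TX and RX) independently moves between an inactive and an ACTIVE state under control of the ACTIVEREQ / ACTIVEACK handshake, passing through a short transitional phase in each direction while only one of the two signals is HIGH.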
The transmitter drives ACTIVEREQ HIGH to request link activation. The receiver acknowledges by driving ACTIVEACK HIGH. Flit transfers are only permitted once both signals are HIGH (ACTIVE state). To deactivate, the transmitter drops ACTIVEREQ; the receiver responds by dropping ACTIVEACK.
The ACTIVEREQ/ACTIVEACK protocol supports low-power entry: when there are no pending transactions, the transmitter can drop ACTIVEREQ to shut down the link and save power, then reactivate on demand with a single request/acknowledge handshake.
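The handshake can be sketched from the receiver's side as a small responder that raises ACTIVEACK after ACTIVEREQ and lowers it again when the request drops. The module name `cxs_link_ack`, the phase names, the `link_state` output, and the `rx_ready` input are all hypothetical and serve only to illustrate the four combinations of the two signals.

```verilog
// Sketch: receiver-side responder for one CXS direction.
// State names and the rx_ready input are illustrative assumptions.
module cxs_link_ack (
    input  wire       cxsclk,
    input  wire       cxsresetn,
    input  wire       activereq,   // driven by the transmitter
    input  wire       rx_ready,    // hypothetical: receiver buffers are usable
    output reg        activeack,   // driven by the receiver
    output wire [1:0] link_state   // decoded handshake phase {REQ, ACK}
);
    // Encodings chosen so that link_state is simply {activereq, activeack}
    localparam [1:0] INACTIVE   = 2'b00,  // REQ=0, ACK=0: no flits
                     ACTIVATE   = 2'b10,  // REQ=1, ACK=0: waiting for ACK
                     ACTIVE     = 2'b11,  // REQ=1, ACK=1: flits permitted
                     DEACTIVATE = 2'b01;  // REQ=0, ACK=1: waiting for ACK to drop

    // The phase is fully determined by the two handshake signals.
    assign link_state = {activereq, activeack};

    always @(posedge cxsclk or negedge cxsresetn) begin
        if (!cxsresetn)
            activeack <= 1'b0;
        else if (activereq && rx_ready)
            activeack <= 1'b1;   // acknowledge the activation request
        else if (!activereq)
            activeack <= 1'b0;   // follow the request down to deactivate
    end
endmodule
```

Because ACTIVEACK only ever follows ACTIVEREQ, the link walks through the phases in a fixed order (inactive, activating, ACTIVE, deactivating), and flits are legal only in the ACTIVE phase.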
Credit-Based Flow Control
CXS uses a link credit mechanism to prevent the transmitter from overrunning the receiver's input buffers. The transmitter spends one credit per flit sent and can only transmit when it holds at least one credit.
At link activation, the receiver grants the transmitter an agreed number of initial credits (implementation-defined, typically 4–16) by pulsing TXLCRDV. Each TXLCRDV pulse returns one credit; each TXFLITV cycle spends one. When the credit count reaches zero, the transmitter must stall: it cannot assert TXFLITV until a credit is returned.
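The receiver's half of this scheme might look like the sketch below: it issues one credit per cycle until its whole buffer has been advertised, then issues one more each time a buffer slot is drained. The module name `cxs_rx_credit`, the `NUM_CRED` parameter, and the `link_active` and `flit_freed` inputs are hypothetical. The sketch also assumes that credits are forfeited whenever the link is inactive and granted afresh on the next activation.

```verilog
// Sketch: receiver-side credit granter. Every name except txlcrdv is hypothetical.
module cxs_rx_credit #(
    parameter NUM_CRED = 8     // assumed receive-buffer depth, in flits
) (
    input  wire cxsclk,
    input  wire cxsresetn,
    input  wire link_active,   // TXACTIVEREQ & TXACTIVEACK
    input  wire flit_freed,    // one-cycle pulse: a buffer slot was drained
    output reg  txlcrdv        // one credit granted per HIGH cycle
);
    // Credits not yet advertised to the transmitter
    reg [$clog2(NUM_CRED):0] grantable;
    wire can_grant = (grantable != 0);

    always @(posedge cxsclk or negedge cxsresetn) begin
        if (!cxsresetn) begin
            grantable <= NUM_CRED;
            txlcrdv   <= 1'b0;
        end else if (!link_active) begin
            // Assumption: the full pool is re-armed while the link is inactive
            grantable <= NUM_CRED;
            txlcrdv   <= 1'b0;
        end else begin
            // Grant one credit per cycle while any remain unadvertised
            txlcrdv <= can_grant;
            case ({flit_freed, can_grant})
                2'b10: grantable <= grantable + 1'b1; // slot freed, nothing granted
                2'b01: grantable <= grantable - 1'b1; // credit granted, nothing freed
                default: ;                            // both or neither: no change
            endcase
        end
    end
endmodule
```

With `NUM_CRED = 8`, TXLCRDV stays HIGH for eight consecutive cycles after activation (the initial grant) and then pulses once for each drained flit.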
Transfer Operation
A typical CXS flit transfer sequence proceeds as follows:
- Reset: Both TXACTIVEREQ and TXACTIVEACK are LOW. No flits are sent. CXSRESETn is asserted (LOW).
- De-assert reset: CXSRESETn goes HIGH. Receiver initialises its credit counters.
- Link activation: Transmitter asserts TXACTIVEREQ=1. Receiver responds with TXACTIVEACK=1. The receiver now pushes initial credits via TXLCRDV.
- Flit transfer: Transmitter asserts TXFLITV=1 and places flit data on TXFLIT. One credit is decremented per flit. Transfers continue back-to-back as long as credits are available.
- Credit return: Receiver pulses TXLCRDV=1 once per processed flit, returning a credit to the transmitter.
- Link deactivation: When idle, transmitter drops TXACTIVEREQ=0. Receiver responds with TXACTIVEACK=0. No flits may be sent while inactive.
Verilog RTL — CXS Transmitter Controller
A simplified CXS transmitter that manages link activation and credit-based flit dispatch:
```verilog
// CXS Transmitter Controller (simplified)
// Manages the ACTIVEREQ handshake and credit-gated flit dispatch
module cxs_tx_ctrl #(
    parameter FLIT_W   = 256,  // flit width: 128, 256, or 512
    parameter MAX_CRED = 8     // maximum credits
) (
    input  wire              cxsclk,
    input  wire              cxsresetn,
    // Activation handshake
    output reg               txactivereq,
    input  wire              txactiveack,
    // Flit interface to CTL (upper layer)
    input  wire              ctl_valid,   // CTL has a flit to send
    input  wire [FLIT_W-1:0] ctl_flit,
    output wire              ctl_ready,   // flit is accepted this cycle
    // CXS TX output
    output reg               txflitv,
    output reg  [FLIT_W-1:0] txflit,
    // Credit return from receiver
    input  wire              txlcrdv
);

    reg [$clog2(MAX_CRED):0] credits;

    wire link_active = txactivereq & txactiveack;
    wire can_send    = link_active & (credits != 0) & ctl_valid;

    // Ready is combinational so the CTL advances in the same cycle the flit
    // is captured; a registered ready would let the same flit be sent twice.
    assign ctl_ready = can_send;

    always @(posedge cxsclk or negedge cxsresetn) begin
        if (!cxsresetn) begin
            txactivereq <= 1'b0;
            credits     <= 0;
            txflitv     <= 1'b0;
            txflit      <= {FLIT_W{1'b0}};
        end else begin
            // Activate link when CTL has data to send. Wait for ACTIVEACK to be
            // LOW so a previous deactivation handshake has fully completed.
            if (ctl_valid && !txactivereq && !txactiveack)
                txactivereq <= 1'b1;

            // Deactivate when idle and every credit has been returned
            if (!ctl_valid && link_active && credits == MAX_CRED)
                txactivereq <= 1'b0;

            // Credit management: return +1, spend -1, both cancel.
            // A credit is spent in the cycle a flit is dispatched (can_send).
            case ({txlcrdv, can_send})
                2'b10: credits <= credits + 1'b1;  // credit returned only
                2'b01: credits <= credits - 1'b1;  // flit sent only
                default: ;                         // both or neither: no change
            endcase

            // Assumption of this simplified model: credits are forfeited while
            // ACTIVEREQ is LOW, and the receiver grants a fresh set on the next
            // activation. This overrides the case statement above.
            if (!txactivereq)
                credits <= 0;

            // Dispatch flit when link active and credit available
            txflitv <= can_send;
            txflit  <= can_send ? ctl_flit : {FLIT_W{1'b0}};
        end
    end
endmodule
```
The receiver side is symmetrical — it drives RXFLITV / RXFLIT toward the transmitter, with the transmitter returning credits via RXLCRDV.
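One way to exercise the transmitter controller is a small testbench with a deliberately stingy receiver model. In the sketch below the receiver acknowledges activation one cycle after the request, grants `MAX_CRED` initial credits, and never returns any more, so the transmitter should stall once its credits run out. The testbench name, the receiver model, and the stimulus values are assumptions made for illustration.

```verilog
`timescale 1ns/1ps
// Testbench sketch for cxs_tx_ctrl. The receiver model is an assumption:
// it grants MAX_CRED initial credits and then withholds all further credits.
module cxs_tx_ctrl_tb;
    localparam FLIT_W   = 128;
    localparam MAX_CRED = 4;

    reg               cxsclk      = 1'b0;
    reg               cxsresetn   = 1'b0;
    reg               ctl_valid   = 1'b0;
    reg  [FLIT_W-1:0] ctl_flit    = {FLIT_W{1'b0}};
    reg               txactiveack = 1'b0;
    reg               txlcrdv     = 1'b0;
    wire              txactivereq, ctl_ready, txflitv;
    wire [FLIT_W-1:0] txflit;

    integer granted    = 0;   // credits handed out by the receiver model
    integer flits_seen = 0;   // cycles observed with TXFLITV HIGH

    cxs_tx_ctrl #(.FLIT_W(FLIT_W), .MAX_CRED(MAX_CRED)) dut (
        .cxsclk(cxsclk),           .cxsresetn(cxsresetn),
        .txactivereq(txactivereq), .txactiveack(txactiveack),
        .ctl_valid(ctl_valid),     .ctl_flit(ctl_flit),
        .ctl_ready(ctl_ready),
        .txflitv(txflitv),         .txflit(txflit),
        .txlcrdv(txlcrdv)
    );

    always #5 cxsclk = ~cxsclk;   // 100 MHz clock

    // Receiver model
    always @(posedge cxsclk) begin
        txactiveack <= cxsresetn & txactivereq;   // ACK follows REQ by one cycle
        txlcrdv     <= 1'b0;
        if (txactivereq && txactiveack && granted < MAX_CRED) begin
            txlcrdv <= 1'b1;                      // initial credit grant
            granted  = granted + 1;
        end
        if (txflitv)
            flits_seen = flits_seen + 1;
    end

    // Stimulus is driven on falling edges to avoid races with the DUT
    initial begin
        repeat (2) @(negedge cxsclk);
        cxsresetn = 1'b1;
        @(negedge cxsclk);
        ctl_valid = 1'b1;                 // CTL always has a flit to send
        ctl_flit  = {FLIT_W{1'b1}};
        repeat (30) @(negedge cxsclk);
        $display("flits sent = %0d, credits granted = %0d", flits_seen, granted);
        $finish;
    end
endmodule
```

If the credit accounting is correct, the number of flits reported equals the number of credits granted (`MAX_CRED`, 4 here); a larger count would indicate that a flit was sent without spending a credit.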
Use Cases
| Application | Why CXS | Typical Flit Width |
|---|---|---|
| CPU–GPU coherency | GPU caches snoop CPU cache lines without software intervention | 256 / 512 bit |
| CPU–FPGA accelerator | FPGA logic accesses main memory with full coherency for zero-copy DMA | 128 / 256 bit |
| ML inference accelerators | Shared weight tensors in CPU L3 accessible directly by AI chip | 256 / 512 bit |
| Smart NICs (DPU) | Network offload engine shares coherent memory with host application | 256 bit |
| Multi-die chiplets | Die-to-die coherent link over UCIe / AIB with CCIX on top | 128 bit |