On-Chip Communication

Hardware Protocols

Signal definitions, timing diagrams, state machines, and RTL implementation for AXI4, AHB, APB, AXI-Stream, PCIe, DDR, CXL, UCIe, JTAG, I2C, SPI, and I3C.

In Depth

Why Hardware Protocols Are Core VLSI Knowledge

Modern SoCs (System-on-Chip) contain dozens of IP blocks — CPUs, GPU cores, DMA engines, memory controllers, peripherals, communication interfaces — all connected by standardized bus protocols. The protocol defines the electrical signaling, the handshake rules, the timing requirements, the error handling, and the arbitration behavior. Every chip that goes to silicon must implement these protocols correctly or the entire system fails.

The AMBA Protocol Hierarchy

ARM's AMBA (Advanced Microcontroller Bus Architecture) defines the most widely used family of on-chip bus protocols. APB (Advanced Peripheral Bus) handles low-bandwidth, low-frequency peripherals like UARTs, timers, and GPIO. AHB (Advanced High-performance Bus) handles mid-speed masters like DMA engines and instruction caches. AXI4 (Advanced eXtensible Interface) handles high-bandwidth, out-of-order transfers between CPU cores, GPUs, and memory controllers. AXI-Stream is a simpler streaming variant for data pipes with no address. Each protocol occupies a different point on the bandwidth/latency tradeoff in the SoC interconnect hierarchy, and most real chips use the three memory-mapped protocols (APB, AHB, AXI4) together, connected by bus bridges that translate between them.

AXI4 VALID/READY Handshake — The Most Important Protocol Rule

AXI4's handshake mechanism is simple but has a critical rule: once the source of a channel asserts VALID, it must keep VALID asserted, with the payload held stable, until the handshake completes (READY is sampled high). The source is the master on the address and write data channels, and the slave on the read data and write response channels. Violating this rule can cause the destination to miss or corrupt the transfer. Conversely, the destination may freely assert and de-assert READY at any time; doing so creates no obligation. A transfer occurs exactly on the rising clock edge at which VALID and READY are both high. This asymmetry (strict VALID behavior, loose READY behavior), together with the rule that a source must not wait for READY before asserting VALID, is fundamental to the protocol's deadlock-free properties. AXI4 also supports out-of-order completion through transaction IDs: transactions with different IDs may complete in any order, so a short read can return ahead of an earlier high-latency memory read, improving SoC throughput.
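The VALID rule above can be illustrated with a small trace checker. This is a minimal sketch in Python rather than RTL or SVA; the function name `check_handshake` and the trace format (one `(VALID, READY)` pair per rising clock edge) are assumptions made for illustration, and the companion rule that the payload must stay stable while VALID waits is not modelled.

```python
from typing import List, Tuple

# One entry per rising clock edge: (VALID, READY) as sampled on that edge.
Cycle = Tuple[int, int]


def check_handshake(trace: List[Cycle]) -> Tuple[int, List[int]]:
    """Return (transfer count, cycles at which VALID was illegally withdrawn)."""
    transfers = 0
    violations = []
    pending = False  # True if the previous edge saw VALID=1 with READY=0
    for cycle, (valid, ready) in enumerate(trace):
        if pending and not valid:
            # VALID dropped before the handshake completed.
            violations.append(cycle)
        if valid and ready:
            # Both high on the same edge: exactly one transfer.
            transfers += 1
        pending = bool(valid and not ready)
    return transfers, violations


# READY toggling freely is legal; VALID waits through two stall cycles.
good = [(0, 1), (1, 0), (1, 0), (1, 1), (0, 0), (1, 1)]
# VALID is withdrawn at cycle 1 while the transfer is still pending.
bad = [(1, 0), (0, 1), (1, 1)]

print(check_handshake(good))  # (2, [])
print(check_handshake(bad))   # (1, [1])
```

The checker treats READY as unconstrained, which mirrors the asymmetry described above: only the VALID side can violate the rule.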

PCIe, CXL, and UCIe — Off-Chip High-Speed Interconnects

PCI Express (PCIe) is the dominant high-speed chip-to-chip and chip-to-device interconnect in servers, workstations, and laptops. It uses serial differential pairs (lanes) at speeds from 2.5 GT/s (Gen 1) to 64 GT/s (Gen 6), with a 3-layer architecture (Physical, Data Link, Transaction). CXL (Compute Express Link) builds on the PCIe Gen 5/6 PHY to add cache coherence, allowing the CPU's cache coherency protocol to extend to attached accelerators and memory devices. UCIe (Universal Chiplet Interconnect Express) goes further, standardizing the die-to-die interconnect within a multi-chiplet package so that chiplets designed by different companies, on different process nodes, can communicate with much higher bandwidth density and lower latency than a board-level link allows.
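To make the lane-rate figures concrete, the following sketch estimates per-direction link bandwidth from the transfer rate and the line encoding (8b/10b for Gen 1–2, 128b/130b for Gen 3–5). The table and function names are illustrative. Gen 6 is entered with an efficiency of 1.0 as a simplifying assumption: it uses PAM4 signaling with FLIT mode, and the FEC and CRC overhead inside each flit is not modelled here, nor is packet header overhead for any generation.

```python
# generation -> (transfer rate in GT/s per lane, line-encoding efficiency)
PCIE_GENS = {
    1: (2.5, 8 / 10),      # 8b/10b
    2: (5.0, 8 / 10),      # 8b/10b
    3: (8.0, 128 / 130),   # 128b/130b
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
    6: (64.0, 1.0),        # assumption: flit FEC/CRC overhead ignored
}


def link_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Approximate usable bandwidth in GB/s, per direction, before packet overhead."""
    rate_gt, efficiency = PCIE_GENS[gen]
    return rate_gt * efficiency * lanes / 8  # 8 bits per byte


for gen, lanes in [(1, 1), (3, 16), (5, 8)]:
    print(f"Gen {gen} x{lanes}: {link_bandwidth_gbytes(gen, lanes):.2f} GB/s")
```

Under these assumptions a Gen 5 x8 link comes out at roughly 31.5 GB/s per direction, which is where the commonly quoted "about 32 GB/s" figure comes from.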

Why Protocol Knowledge Is Critical for RTL and Verification Engineers

An RTL engineer implementing an AXI4 slave must understand the protocol well enough to handle back-pressure correctly, never withdraw RVALID or BVALID before the master has accepted the response, and return the correct response code on BRESP (write response channel) and RRESP (read data channel). A verification engineer writing a UVM environment must model the VALID/READY handshake accurately enough to catch corner cases, such as a slave that asserts READY before it can actually accept the data. Protocol bugs (deadlocks, incorrect RESP values, wrong burst counts) are among the hardest to debug in silicon because they are timing-dependent and require specific traffic patterns to manifest. Understanding the protocol at the signal level, not just the API level, is the difference between writing correct RTL and hoping the VIP catches your bugs.

AMBA Family

Protocol Articles

From low-bandwidth peripheral buses to high-speed interconnects.

APB – Advanced Peripheral Bus
Low-bandwidth, low-power bus for peripheral access. Covers signals, IDLE/SETUP/ACCESS state machine, write/read transfers, PREADY wait states, PSLVERR, and RTL implementation.
CXS – CCIX Transport Interface
Flit-based AMBA link interface for cache-coherent CPU–accelerator communication. Covers signals, flit structure, link activation, credit-based flow control, and RTL implementation.
AXI4 – Advanced eXtensible Interface
High-performance protocol with 5 independent channels, out-of-order transactions, and burst support up to 256 beats. Covers handshake, write/read timing, burst types, response codes, and AXI4-Lite RTL.
AXI4-Stream
Unidirectional streaming interface with no address channel. Covers TDATA/TVALID/TREADY handshake, TLAST packet framing, TKEEP byte qualifiers, backpressure, skid buffer RTL, and use cases in DSP, video, and networking.
I2C – Inter-Integrated Circuit
Two-wire multi-slave serial bus. Covers open-drain pull-up, START/STOP conditions, 7-bit addressing, ACK/NACK, write/read/combined transactions, clock stretching, and speed modes.
SPI – Serial Peripheral Interface
Full-duplex off-chip serial protocol. Covers SCLK/MOSI/MISO/CS signals, CPOL/CPHA modes, Mode 0 timing diagram, Dual/Quad SPI variants, and a complete Verilog master with live simulation.
AHB – Advanced High-performance Bus
Pipelined bus with 2-stage address/data overlap, burst transfers (WRAP/INCR up to 16 beats), HTRANS transfer types, wait states, ERROR response, and AHB-Lite vs full AHB comparison with Verilog RTL.
DDR & LPDDR – Memory Protocol
DDR1 through DDR5, LPDDR4/5X — double data rate concept, memory hierarchy (channel→rank→bank→row→col), timing parameters (CL, tRCD, tRP, tRAS), DDR4 vs DDR5 sub-channels, refresh, Rowhammer, and HBM explained with diagrams.
PCIe – PCI Express
The universal high-speed interconnect for GPUs, NVMe SSDs, and NICs. Covers topology (RC → Switch → Endpoint), lane widths (x1–x16), Gen 1–6 speeds, 3-layer architecture, TLP packet structure, flow control, and BARs.
CXL – Compute Express Link
Cache-coherent interconnect built on PCIe PHY. Covers CXL.io / CXL.cache / CXL.mem sub-protocols, Type 1/2/3 devices, CXL 1.0–3.1 versions, memory pooling with CXL switch, rack-scale FAM, and comparison with PCIe DMA and NVLink.
UCIe – Universal Chiplet Interconnect Express
The open standard for die-to-die connectivity. Covers chiplet motivation, 3-layer stack (Protocol/D2D Adapter/Physical), FDI/RDI interfaces, standard vs advanced package (100–130 µm vs 25–55 µm bump pitch), bandwidth specs, lane repair, and comparison with AIB and BoW.
JTAG & Boundary Scan (IEEE 1149.1)
The universal 4-pin test interface. Covers TAP controller 16-state FSM, TCK/TMS/TDI/TDO signals, JTAG daisy chain, boundary scan cells, EXTEST/BYPASS/IDCODE/SAMPLE instructions, BSDL files, and JTAG vs SWD vs cJTAG.
⚡ AXI4 Handshake Lab
Live VALID/READY waveform simulator. Watch write & read channel handshakes, toggle back-pressure, count stall cycles — the #1 SoC bus interview topic.
⚗ Scrambler & Descrambler Lab
Interactive LFSR scrambling with PCIe (x²³+x²¹+1), SATA, and Ethernet presets. Watch bit-by-bit animation, verify self-cancellation, and generate Verilog.
I3C – Improved Inter-Integrated Circuit
MIPI's next-generation two-wire bus. Covers push-pull vs open-drain, dynamic address assignment (ENTDAA), common command codes, in-band interrupts, hot-join, HDR-DDR mode, and comparison with I2C and SPI.
CXL Memory Expander — Type 3 Device, CXL.mem Protocol
CXL meaning and architecture, CXL Type 1/2/3 device classification, real products (Samsung CMM-D 512 GB, Micron 128 GB, SK Hynix AiMX 96 GB), PCIe Gen5 x8 ~32 GB/s, ~150–300 ns latency, CXL.mem protocol, 4-tier memory hierarchy, CXL 3.1 fabric.
Deep Dive

Protocol Verification: How Professionals Test Compliance

Writing RTL that implements a protocol is only half the work. Verifying that the implementation is compliant with the specification — across all legal sequences of VALID and READY, all error injection scenarios, all burst types, and all boundary conditions — requires a structured verification environment. Protocol verification failures are among the most expensive bugs in ASIC design because they often manifest only in specific traffic patterns that are not covered by unit-level directed tests.

UVM Protocol VIPs

A Verification IP (VIP) is a reusable SystemVerilog UVM agent that drives and monitors a protocol interface. The AXI4 VIP drives randomized burst transfers with constrained-random READY de-assertion patterns, verifies that the master never illegally withdraws VALID, checks that BRESP and RRESP carry the correct response codes, and tracks transaction IDs so that responses returned out of order are matched to the right request. The APB VIP verifies that every setup phase (PSEL high, PENABLE low) lasts exactly one cycle and is followed by an access phase with PENABLE high. Using a commercial or open-source VIP saves the weeks it takes to hand-write a protocol monitor from the specification, and gives the verification team a monitor that has been validated against the spec independently of the DUT being tested.
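The APB check mentioned above amounts to a small state-machine monitor. The sketch below models it in Python over a cycle-by-cycle trace of `(PSEL, PENABLE, PREADY)` values; a real VIP would implement this as a SystemVerilog monitor or assertions, and the names `phase` and `check_apb` are hypothetical. It assumes a single PSEL line and ignores PADDR, PWRITE, and PSLVERR.

```python
def phase(psel, penable):
    """Classify one cycle from the APB control signals."""
    if psel:
        return "ACCESS" if penable else "SETUP"
    return "ILLEGAL" if penable else "IDLE"  # PENABLE without PSEL


def check_apb(trace):
    """trace: list of (PSEL, PENABLE, PREADY) per clock edge.

    Returns (completed transfers, list of (cycle, 'prev -> cur') errors).
    """
    errors = []
    transfers = 0
    prev, prev_done = "IDLE", False
    for cycle, (psel, penable, pready) in enumerate(trace):
        cur = phase(psel, penable)
        if prev == "SETUP" or (prev == "ACCESS" and not prev_done):
            # SETUP lasts exactly one cycle; ACCESS holds until PREADY.
            allowed = {"ACCESS"}
        else:
            # From IDLE, or after a completed ACCESS, start over.
            allowed = {"IDLE", "SETUP"}
        legal = cur in allowed
        if not legal:
            errors.append((cycle, f"{prev} -> {cur}"))
        done = cur == "ACCESS" and bool(pready)
        if done and legal:
            transfers += 1
        prev, prev_done = cur, done
    return transfers, errors


# Two back-to-back transfers, the first with one PREADY wait state.
good = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1),
        (1, 0, 0), (1, 1, 1), (0, 0, 0)]
# Setup phase stretched to two cycles: PENABLE arrives one cycle late.
bad = [(0, 0, 0), (1, 0, 0), (1, 0, 0), (1, 1, 1)]

print(check_apb(good))  # (2, [])
print(check_apb(bad))   # (1, [(2, 'SETUP -> SETUP')])
```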

Formal Protocol Checking

Formal verification tools like Cadence JasperGold and Synopsys VC Formal can prove that a protocol property holds for all possible input sequences without simulation. For AXI4, a formal app encodes the VALID-must-not-de-assert rule as an SVA (SystemVerilog Assertion) property and exhaustively proves it against the RTL model. Formal is particularly well suited to protocol compliance because the rules form a small, finite set of properties over a handful of interface signals, whereas the space of traffic sequences that simulation would have to cover is practically infinite. Formal can therefore catch the "one-in-a-million" traffic pattern that simulation misses and that would otherwise cause a functional failure at silicon bring-up.
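The idea of checking a property against every input sequence, rather than a random sample, can be shown with a toy bounded check. This sketch enumerates all READY patterns up to a fixed depth against a hypothetical VALID source and reports the first sequence that violates the VALID-stability rule. Real formal tools work symbolically on the RTL (with SAT or BDD engines) instead of enumerating; `source_step`, its `buggy` flag, and the three-cycle give-up behavior are invented purely for illustration.

```python
from itertools import product


def source_step(state, ready, buggy):
    """Advance a toy VALID source by one clock edge.

    state is (valid, stall_count). The buggy variant withdraws VALID
    after three consecutive stalled cycles.
    """
    valid, stalls = state
    if valid and ready:
        return (1, 0)          # transfer accepted; offer the next item
    if valid:
        stalls += 1
        if buggy and stalls >= 3:
            return (0, 0)      # bug: gives up while still waiting
        return (1, stalls)
    return (1, 0)              # idle source offers a new item


def find_counterexample(depth, buggy):
    """Try every READY sequence of the given length; return a failing prefix or None."""
    for readys in product((0, 1), repeat=depth):
        state = (1, 0)
        for cycle, ready in enumerate(readys):
            nxt = source_step(state, ready, buggy)
            # Property: VALID high and not accepted implies VALID high next cycle.
            if state[0] and not ready and not nxt[0]:
                return readys[:cycle + 1]
            state = nxt
    return None


print(find_counterexample(6, buggy=False))  # None
print(find_counterexample(6, buggy=True))   # (0, 0, 0)
print(find_counterexample(2, buggy=True))   # None
```

The last call shows why depth matters in a bounded check: the bug needs three stalled cycles to appear, so a bound of two misses it, whereas a full proof covers unbounded sequences.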

Protocol Fuzzing and Error Injection

Beyond well-formed traffic, a production verification environment injects errors and stress conditions to test the DUT's error handling. For AXI4, this includes returning SLVERR and DECERR on BRESP/RRESP to verify that the master propagates errors correctly to software. For PCIe, it includes bit-error injection at the data link layer to verify that a failed LCRC check triggers a replay. For I2C, it includes injecting clock stretching to verify that the master handles slow slave responses without timing out or corrupting subsequent transactions. Error injection requires the VIP to be configurable beyond its default mode, a feature that commercial VIPs typically include but that hand-written testbenches often omit, leaving error-handling paths untested until silicon.
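The bit-error injection scenario can be sketched as follows. This is a simplified model, not the PCIe data link layer: `zlib.crc32` stands in for the LCRC, the frame layout (payload followed by a 4-byte CRC) is an assumption, and the retry loop only loosely mirrors the ACK/NAK replay mechanism. All function names are hypothetical.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 32-bit CRC to the payload (stand-in for the LCRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes) -> bool:
    """Receiver-side check: recompute the CRC and compare."""
    payload, crc = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc


def flip_bit(frame: bytes, bit: int) -> bytes:
    """Error injector: invert a single bit of the frame."""
    data = bytearray(frame)
    data[bit // 8] ^= 1 << (bit % 8)
    return bytes(data)


def send_with_retry(payload: bytes, corrupt_attempts: dict, max_attempts: int = 4):
    """Return the attempt number that succeeded, or None if retries run out.

    corrupt_attempts maps an attempt number to the bit index to flip on that attempt.
    """
    frame = make_frame(payload)
    for attempt in range(1, max_attempts + 1):
        received = frame
        if attempt in corrupt_attempts:
            received = flip_bit(frame, corrupt_attempts[attempt])
        if check_frame(received):
            return attempt
    return None


print(send_with_retry(b"TLP payload", {}))           # 1
print(send_with_retry(b"TLP payload", {1: 5}))       # 2
print(send_with_retry(b"TLP payload", {1: 0, 2: 1, 3: 2, 4: 3}))  # None
```

The third call exercises the path that directed tests tend to skip: what the sender does when every replay fails.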

Coverage-Driven Verification

Coverage-driven verification (CDV) measures which parts of the protocol's functional space have been exercised by simulation. For AXI4, a functional coverage model defines bins for each burst type (INCR, WRAP, FIXED), each burst length (1–256 beats for INCR; WRAP is limited to 2, 4, 8, or 16 beats and FIXED to at most 16), each transfer size (1 to 128 bytes per beat, bounded by the data bus width), and each response combination. The simulation harness generates constrained-random traffic until all bins are filled, at which point every scenario the coverage model names has been exercised at least once. Crossing functional coverage with the DUT's RTL code coverage (which lines executed) gives a two-dimensional map of tested versus untested combinations. When all functional coverage bins are hit and RTL code coverage exceeds a project-defined target such as 95%, the verification team has far stronger evidence that the protocol implementation is correct than directed tests alone usually provide.
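A coverage loop of this kind can be sketched in a few lines. The example below generates constrained-random AXI4 bursts and runs until every legal cross of burst type and length range has been hit. The bin boundaries, the function names, and the uniform distributions are assumptions for illustration; a real environment would express this with SystemVerilog covergroups and constraints, and would cover size and response as well.

```python
import random


def random_burst(rng):
    """Draw one burst that respects AXI4 length limits per burst type."""
    kind = rng.choice(["INCR", "WRAP", "FIXED"])
    if kind == "INCR":
        length = rng.randint(1, 256)         # INCR: 1 to 256 beats
    elif kind == "WRAP":
        length = rng.choice([2, 4, 8, 16])   # WRAP: 2, 4, 8, or 16 beats
    else:
        length = rng.randint(1, 16)          # FIXED: 1 to 16 beats
    return kind, length


def length_bin(length):
    """Map a burst length onto a coarse coverage bin (illustrative choice)."""
    if length == 1:
        return "single"
    return "2-16" if length <= 16 else "17-256"


# Only crosses that the protocol allows are coverage goals.
LEGAL_BINS = {
    ("INCR", "single"), ("INCR", "2-16"), ("INCR", "17-256"),
    ("WRAP", "2-16"),
    ("FIXED", "single"), ("FIXED", "2-16"),
}


def run_until_covered(seed=1, max_bursts=100_000):
    """Return (bursts needed, bins hit); bursts needed is None if the budget runs out."""
    rng = random.Random(seed)
    hit = set()
    for n in range(1, max_bursts + 1):
        kind, length = random_burst(rng)
        hit.add((kind, length_bin(length)))
        if hit == LEGAL_BINS:
            return n, hit
    return None, hit


bursts_needed, bins_hit = run_until_covered()
print(f"all {len(bins_hit)} bins hit after {bursts_needed} bursts")
```

With uniform lengths, the single-beat INCR bin is hit only about once in every 768 bursts, so it tends to dominate closure time. This is the usual reason coverage models are paired with weighted constraints that steer traffic toward rare bins.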

Related Labs & Tools
Eye Diagram Lab — Signal Integrity Visualizer
AXI4 Handshake Lab