Chiplet Interconnect Standard

UCIe – Universal Chiplet Interconnect Express

UCIe is the open industry standard for die-to-die connectivity inside a package. Published in 2022 by a consortium including Intel, AMD, Arm, Qualcomm, Samsung, and TSMC, it defines the physical bumps, electrical signaling, and upper-layer protocol handshake that allow chiplets from different vendors to interoperate — the PCIe moment for the chiplet era.

Published March 2022 · 30+ consortium members · Up to 94 GB/s/mm (Advanced Package) · PCIe 6 + CXL 3 over D2D
Motivation

Why Chiplets? The End of the Monolithic Die

Moore's Law scaling costs are soaring, and yield loss on large dies makes monolithic integration increasingly impractical for complex SoCs.

Yield Problem
Yield drops roughly exponentially with die area. An 800 mm² monolithic GPU may yield 60%, but splitting it into two 400 mm² dies connected by UCIe can push combined yield above 85% — delivering far better wafer economics. A worked yield example appears at the end of this section.
Process Node Mismatch
Compute cores benefit from leading-edge nodes (2 nm, 3 nm), but analog, SerDes, and memory controllers do not. Chiplets allow each function to use the optimal process — mix N3 logic with N16 analog, no compromise needed.
Reuse & Time-to-Market
A validated SRAM or PHY chiplet can be reused across multiple products. With UCIe standardization, a chiplet from IP vendor A can plug into a package designed by SoC vendor B without custom interface design — analogous to PCIe plug-and-play.
Bandwidth Wall
Off-package bandwidth through a PCIe-attached interface tops out around ~130 GB/s (PCIe 5.0 x16). On-package chiplet interconnects via UCIe achieve hundreds of GB/s at roughly a picojoule per bit — critical for AI accelerators that demand memory bandwidth in the terabytes-per-second range.
Real-world adoption: AMD's EPYC "Genoa" uses chiplets (CCDs + IOD) connected by internal Infinity Fabric. Intel's Ponte Vecchio GPU uses 47 chiplets with EMIB and Foveros. UCIe standardizes the interface so future chiplets from any vendor can interoperate.
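The wafer-economics argument can be made concrete with the simple Poisson defect model Y = exp(−A·D0). The sketch below is illustrative only: the defect density D0 is an assumed value chosen so the monolithic die lands near the 60% figure quoted above, not a foundry number.

```python
# Hypothetical yield comparison using the Poisson defect model Y = exp(-A * D0).
# D0 is an assumed defect density, not data from any foundry or from the UCIe spec.
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to be defect-free under the Poisson model."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)

D0 = 0.064  # assumed defects per cm^2, picked so an 800 mm^2 die lands near 60%

monolithic  = poisson_yield(800, D0)  # the whole 800 mm^2 die must be defect-free
per_chiplet = poisson_yield(400, D0)  # each 400 mm^2 chiplet yields independently
print(f"800 mm2 monolithic die yield : {monolithic:.0%}")  # ~60%
print(f"400 mm2 chiplet die yield    : {per_chiplet:.0%}")  # ~77%
# Chiplets are tested before assembly (known-good-die), so a defect scraps one
# small die rather than an entire 800 mm^2 die, improving cost per good mm^2.
```

The exact crossover depends on defect density, test cost, and assembly yield; the point is that splitting a large die moves it down the exponential.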
Key Numbers

UCIe at a Glance

• 2022 — UCIe 1.0 published
• 30+ — consortium members
• 16 GB/s/mm — bandwidth density, Standard Package
• 94 GB/s/mm — bandwidth density, Advanced Package
• 3 — supported protocol mappings (PCIe / CXL / Streaming)
• <1 pJ/bit — energy target, Advanced Package
System View

Chiplet Package Architecture

Multiple dies sit side-by-side on an interposer or organic substrate; UCIe links bridge the die-to-die gaps entirely within a single IC package.


Fig 1 — Three chiplets on a common package substrate, connected by UCIe die-to-die links. Each chiplet can be manufactured on a different process node.

Architecture

UCIe 3-Layer Stack

UCIe mirrors the layered philosophy of PCIe — each layer has a well-defined responsibility and a standardized interface to the layer above and below.


Fig 2 — UCIe 3-layer stack. FDI (Flit-aware DIE Interface) separates the Protocol and D2D Adapter layers; RDI (Raw DIE Interface) separates the D2D Adapter and Physical layers. These interfaces enable IP from different vendors to interoperate.

Protocol Layer
Hosts the upper-level protocol: PCIe 5.0/6.0, CXL 2.0/3.0, or a raw Streaming interface. Responsible for generating and terminating protocol packets (TLPs for PCIe, flits for CXL/PCIe 6.0). This layer is protocol-aware and talks to the D2D Adapter via the FDI.
Die-to-Die (D2D) Adapter
The intelligence of the UCIe stack. Handles link training and initialization, lane scrambling (PRBS-based), optional FEC (Reed-Solomon), retiming for clock domain crossing, cyclic redundancy check, and credit-based flow control between dies. Connects to PHY via RDI.
Physical Layer
The bump interface and analog signaling circuitry. Defines bump pitch (25 µm standard, ≤10 µm advanced), differential AC-coupled signaling, forwarded clock distribution, and the bump map layout. The PHY is the only layer that differs between Standard and Advanced packaging.
FDI & RDI Interfaces
FDI (Flit-aware DIE Interface) is the logical boundary between Protocol and D2D Adapter — passes flits and link management signals. RDI (Raw DIE Interface) is the boundary between D2D Adapter and Physical Layer. Both are standardized, enabling separate sourcing of protocol IP and PHY IP.
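As a mental model of the layering, the toy sketch below passes one flit from a Protocol Layer through a D2D Adapter to a PHY, with the FDI and RDI boundaries marked at the method calls. All class and method names here are invented for illustration; the real FDI and RDI are signal-level interfaces defined by the specification, not software APIs.

```python
# Toy model of the 3-layer split. Names are invented for this sketch only;
# FDI and RDI are hardware interfaces in the UCIe spec, not Python methods.
from dataclasses import dataclass

@dataclass
class Flit:
    payload: bytes          # protocol content, e.g. a CXL or PCIe 6 flit
    crc: int | None = None  # integrity check added by the D2D Adapter

class PhysicalLayer:
    """Drives the bump array (here it just reports what would be transmitted)."""
    def rdi_tx(self, symbols: bytes) -> None:
        print(f"PHY: driving {len(symbols)} bytes onto the bump array")

class D2DAdapter:
    """Adds a CRC (toy stand-in) and forwards raw symbols to the PHY over RDI."""
    def __init__(self, phy: PhysicalLayer):
        self.phy = phy
    def fdi_tx(self, flit: Flit) -> None:          # FDI boundary (from Protocol Layer)
        flit.crc = sum(flit.payload) & 0xFFFF
        self.phy.rdi_tx(flit.payload + flit.crc.to_bytes(2, "little"))  # RDI boundary

class ProtocolLayer:
    """Generates protocol flits (PCIe 6 / CXL 3 / Streaming) and hands them to FDI."""
    def __init__(self, adapter: D2DAdapter):
        self.adapter = adapter
    def send(self, payload: bytes) -> None:
        self.adapter.fdi_tx(Flit(payload))

ProtocolLayer(D2DAdapter(PhysicalLayer())).send(b"example CXL.mem flit")
```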
Packaging

Standard Package vs Advanced Package

UCIe defines two physical packaging tiers. The bump pitch dictates bandwidth density and determines which packaging technology is required; a rough shoreline-bandwidth calculation follows the side-by-side comparison below.


Fig 3 — Standard Package (25 µm bump pitch) vs Advanced Package (≤10 µm). Smaller pitch means more bumps per mm, yielding ~6× higher bandwidth density. Advanced packages require silicon interposers, EMIB, or hybrid bonding technology.

Standard Package
Conventional organic substrate or leadframe
  • Bump pitch: 25 µm
  • Bandwidth density: up to 16 GB/s/mm
  • Max data rate: 16 GT/s per bump
  • Packaging: FCBGA, organic substrates
  • Cost: lower — uses mature packaging infrastructure
  • Use cases: chiplets with moderate bandwidth needs, mainstream SoCs
Advanced Package
Silicon interposer, EMIB, or hybrid bonding
  • Bump pitch: ≤10 µm (hybrid bonding: ~1 µm)
  • Bandwidth density: up to 94 GB/s/mm
  • Max data rate: 32 GT/s per bump
  • Packaging: 2.5D Si interposer, Intel EMIB, TSMC SoIC
  • Cost: higher — requires advanced foundry packaging
  • Use cases: AI/HPC chiplets, GPU stacking, CPU + memory on package
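To put the two density figures in context, the sketch below converts GB/s per mm of die edge ("shoreline") into aggregate link bandwidth, using the density numbers quoted above; the 5 mm edge length is an arbitrary example, not a UCIe requirement.

```python
# Aggregate die-to-die bandwidth available along a given length of die edge,
# using the bandwidth-density figures quoted above. The 5 mm shoreline is an
# arbitrary illustrative value, not something mandated by UCIe.
DENSITY_GBPS_PER_MM = {"Standard Package": 16.0, "Advanced Package": 94.0}

def shoreline_bandwidth_gbps(package: str, edge_mm: float) -> float:
    return DENSITY_GBPS_PER_MM[package] * edge_mm

for pkg in DENSITY_GBPS_PER_MM:
    print(f"{pkg}: 5 mm of die edge -> ~{shoreline_bandwidth_gbps(pkg, 5.0):.0f} GB/s")
# 94 / 16 ~= 5.9, the "~6x higher" density figure cited for Advanced Package.
```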
Protocol Support

Supported Upper-Layer Protocols

UCIe's Protocol Layer carries existing, well-defined protocols rather than defining a new one — it reuses PCIe and CXL to minimize adoption friction.

PCIe 5.0 / 6.0
The industry's universal I/O protocol. PCIe 5 uses 128b/130b encoding at 32 GT/s; PCIe 6 uses PAM4 signaling with flit mode at 64 GT/s. Over UCIe, PCIe traffic traverses a die-to-die link instead of a slot connector — same software stack, new physical medium (per-lane line-rate arithmetic is sketched after the key insight below).
CXL 2.0 / 3.0
Compute Express Link for cache-coherent CPU–accelerator communication. CXL.cache, CXL.mem, and CXL.io run on top of PCIe PHY. Over UCIe, AI accelerators or memory expanders can attach coherently to the CPU chiplet on the same package.
Streaming Interface
A raw, low-latency, vendor-defined protocol channel. Allows proprietary fabric (AXI streaming, Infinity Fabric, NVLink-like) to traverse a UCIe physical link. Enables custom chiplet topologies while still using standardized packaging and PHY.
Key insight: UCIe does not invent a new protocol. It wraps existing protocols (PCIe, CXL) in a standardized die-to-die physical layer. This means existing PCIe and CXL software stacks work unchanged — only the physical transport changes from a PCIe slot to a bump array on the same package.
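The per-lane line rates behind the two PCIe generations named above work out as follows. This is raw signaling arithmetic using the published encoding efficiencies; it ignores flit, header, and flow-control overheads, which reduce delivered throughput further.

```python
# Per-lane line-rate arithmetic for PCIe 5.0 vs PCIe 6.0 as carried over UCIe.
# Only the line encoding is accounted for; protocol overheads are ignored.
def lane_gbytes_per_s(gt_per_s: float, encoding_efficiency: float) -> float:
    return gt_per_s * encoding_efficiency / 8.0

gen5 = lane_gbytes_per_s(32, 128 / 130)  # PCIe 5.0: NRZ, 128b/130b line encoding
gen6 = lane_gbytes_per_s(64, 1.0)        # PCIe 6.0: PAM4, flit mode (no line encoding)
print(f"PCIe 5.0 lane: ~{gen5:.2f} GB/s")   # ~3.9 GB/s
print(f"PCIe 6.0 lane: ~{gen6:.2f} GB/s")   # ~8.0 GB/s
# Over UCIe the same flits cross bump lanes instead of a slot, so this math is
# unchanged; only the physical transport differs.
```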
Performance

Bandwidth & Signaling Specs

Parameter | Standard Package | Advanced Package
Bump Pitch | 25 µm | ≤10 µm
Max Data Rate per Bump | 16 GT/s | 32 GT/s
Bandwidth Density | ~16 GB/s/mm | ~94 GB/s/mm
Signaling | Differential, AC-coupled | Differential, AC-coupled / DC (hybrid bonding)
Clock | Forwarded clock per module | Forwarded clock per module
Packaging Technology | Organic substrate, FCBGA | Silicon interposer, EMIB, SoIC, Foveros
Energy Efficiency | ~2 pJ/bit typical | <1 pJ/bit target
FEC | Optional (Reed-Solomon) | Optional (Reed-Solomon)
Link Width | 64-bit module × N modules | 64-bit module × N modules
Latency | ~2–4 ns (PHY + D2D) | ~1–2 ns (PHY + D2D)
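The energy-efficiency row translates directly into link power at a given throughput. The sketch below uses the pJ/bit figures from the table; the 500 GB/s operating point is an arbitrary example, not a spec value.

```python
# Link power implied by the pJ/bit figures in the table above. The 500 GB/s
# throughput is an arbitrary example operating point, not a UCIe spec number.
def link_power_watts(bandwidth_gbytes_per_s: float, pj_per_bit: float) -> float:
    bits_per_second = bandwidth_gbytes_per_s * 1e9 * 8
    return bits_per_second * pj_per_bit * 1e-12

for label, pj in (("Standard (~2 pJ/bit)", 2.0), ("Advanced (<1 pJ/bit target)", 1.0)):
    print(f"{label}: 500 GB/s costs ~{link_power_watts(500, pj):.0f} W")
# Roughly 8 W vs 4 W at this rate: why sub-pJ/bit matters for AI packages that
# move terabytes per second between chiplets.
```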
Comparison

UCIe vs Other Die-to-Die Standards

Standard | Organization | Open? | Protocol Layer | Max BW Density | Status
UCIe 1.0 | UCIe Consortium | Open | PCIe 5/6, CXL 2/3, Streaming | 94 GB/s/mm | Published 2022
Intel AIB | Intel (Open Domain-Specific Architecture) | Partially open | Vendor-defined | ~2 TB/s/mm² (areal) | ODSA-licensed
BoW (Bunch of Wires) | Open Compute Project | Open | None (raw parallel) | ~128 GB/s/mm | Niche adoption
HBM (High Bandwidth Memory) | JEDEC | JEDEC standard | Memory-only | ~1 TB/s per stack | Widely deployed
NVLink-C2C | NVIDIA | Proprietary | NVLink | ~900 GB/s total | NVIDIA only
Infinity Fabric (IF) | AMD | Proprietary | AMD-defined | ~500 GB/s internal | AMD only
Initialization

UCIe Link Training Sequence

Before data can flow, the D2D Adapter performs a structured link initialization handshake — similar in spirit to PCIe LTSSM but optimized for the on-package environment.

Step | State | What Happens
1 | Reset | Both sides hold the PHY in reset; bump drivers inactive.
2 | Detect | Electrical detect — verifies receiver termination is present on the bump pins.
3 | Initialize | Clock forwarding starts; D2D Adapters lock to the forwarded clock.
4 | Lane Repair | Optional: identify defective bump lanes (due to packaging defects) and remap around them. Critical for advanced-packaging yield.
5 | Data Calibration | Eye-diagram scan; per-lane DFE/CTLE adjustment; PRBS bit-error-rate check.
6 | Link Up | Scrambling enabled; FEC (if used) activated; RDI signals to the D2D Adapter that the PHY is ready.
7 | Protocol Active | D2D Adapter signals FDI-ready to the Protocol Layer; PCIe/CXL configuration-space enumeration begins over the UCIe link.
Lane Repair is a UCIe feature aimed at advanced packaging. Because bumps at 10 µm pitch can have manufacturing defects, the D2D Adapter can remap around a small number of failed bumps during link training — improving assembly yield without reworking or scrapping the package. A minimal state-machine sketch of the training flow follows.
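The sequence reads naturally as a small state machine. The sketch below mirrors the states in the table, but the pass/fail checks are simplified stand-ins, not the spec's actual exit conditions.

```python
# Simplified walk through the link-training table above. State names follow the
# table; the checks are illustrative stand-ins for the real training conditions.
from enum import Enum, auto

class LinkState(Enum):
    RESET = auto(); DETECT = auto(); INITIALIZE = auto(); LANE_REPAIR = auto()
    CALIBRATION = auto(); LINK_UP = auto(); PROTOCOL_ACTIVE = auto()

def train_link(receiver_present: bool, bad_lanes: set[int], spare_lanes: int) -> LinkState:
    # Steps 1-2: leave Reset, perform electrical detect of receiver termination.
    if not receiver_present:
        return LinkState.DETECT          # stuck: no partner detected on the bumps
    # Step 3: clock forwarding starts; both D2D Adapters lock to the forwarded clock.
    # Step 4: Lane Repair, remap defective bump lanes onto spares if possible.
    if len(bad_lanes) > spare_lanes:
        return LinkState.LANE_REPAIR     # stuck: more failures than spare lanes
    # Steps 5-6: eye scan / PRBS BER check, then scrambling and (optional) FEC on.
    # Step 7: FDI-ready is raised to the Protocol Layer; enumeration can begin.
    return LinkState.PROTOCOL_ACTIVE

print(train_link(receiver_present=True, bad_lanes={13}, spare_lanes=2))
```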
Ecosystem

UCIe Consortium & Real-World Adoption

Intel
Founding member and lead contributor. Intel's EMIB (Embedded Multi-die Interconnect Bridge) and Foveros 3D stacking are UCIe-compatible packaging technologies. Ponte Vecchio GPU (Xe-HPC) uses 47 chiplets. Intel Meteor Lake (2023) is the first Intel consumer SoC with a chiplet architecture.
AMD
Founding member. AMD's EPYC "Genoa" and "Bergamo" CPUs already use chiplet architecture (CCDs + IOD) connected by Infinity Fabric. Future products are expected to migrate the inter-chiplet interface toward UCIe for multi-vendor compatibility.
Arm
Founding member. Arm is defining UCIe-compatible interfaces for future Arm Neoverse compute chiplets and the Arm Total Design ecosystem. The goal is to allow semiconductor companies to build Arm-based SoCs from pre-validated UCIe chiplets.
Qualcomm
Founding member. Qualcomm is exploring chiplet-based Snapdragon designs that use UCIe to reduce time-to-market and cost — for example, server-class Oryon CPU chiplets and modem chiplets mixed at the package level.
TSMC
Founding member providing the packaging technology. TSMC's SoIC (System on Integrated Chips) platform using chip-on-wafer bonding is a key advanced-package technology for UCIe, enabling sub-10 µm bump pitch for maximum bandwidth density.
Samsung, Google, Meta, Microsoft
All founding/early members. Cloud hyperscalers (Google, Meta, Microsoft) are driving UCIe adoption for custom AI accelerator chiplets — they can source compute chiplets from one vendor and I/O chiplets from another, assembled into a single package.
FAQ

Frequently Asked Questions

What is UCIe and why does it matter?
UCIe (Universal Chiplet Interconnect Express) is an open standard published in March 2022 by a consortium of 30+ companies including Intel, AMD, Arm, Qualcomm, Samsung, and TSMC. It defines a standardized die-to-die interface so chiplets from different vendors and foundries can interoperate on the same package. It matters because it enables a chiplet marketplace — just as PCIe allowed any vendor's GPU to plug into any PC, UCIe allows any vendor's compute chiplet to connect to any I/O chiplet, reducing design cost and time-to-market.

What are the three layers of the UCIe stack?
1. Protocol Layer — hosts PCIe 5/6, CXL 2/3, or a Streaming protocol; this layer is protocol-aware and generates/terminates packets.
2. Die-to-Die (D2D) Adapter — handles link training, scrambling, optional FEC, retiming, and flow control.
3. Physical Layer — manages the bump array, differential AC-coupled signaling, and the forwarded clock.
The FDI interface sits between the Protocol Layer and the D2D Adapter; the RDI interface sits between the D2D Adapter and the PHY.

When should I choose Standard vs Advanced Package?
Choose Standard Package (25 µm pitch, up to 16 GB/s/mm) if your die-to-die bandwidth requirement is modest and you want conventional organic-substrate packaging with mature supply chains. Choose Advanced Package (≤10 µm pitch, up to 94 GB/s/mm) if you need the highest possible bandwidth density — typically for AI/HPC accelerators, on-package DRAM-like memory chiplets, or CPU core clusters that need near-HBM bandwidth without an external memory slot. Advanced packaging requires silicon interposers (TSMC CoWoS), Intel EMIB, or hybrid bonding (TSMC SoIC), which adds cost and process complexity.

Does UCIe replace PCIe or CXL?
No. UCIe is a physical die-to-die transport, not a protocol. It carries PCIe and CXL traffic on top of its physical layer. Think of UCIe as defining the cable and connector, while PCIe/CXL are the protocols spoken over that cable. PCIe continues to be used for chip-to-board connections (slots, M.2, etc.); UCIe extends PCIe semantics to chip-to-chip communication within a package.

What is Lane Repair and why does it matter?
Lane Repair is a UCIe link-training feature that allows the D2D Adapter to identify defective bump lanes (caused by micro-bump defects at the tight pitches of advanced packaging) and remap around the faults during initialization. This is critical for advanced packaging at 10 µm pitch, where individual bump yield is a real concern. By tolerating a small number of defective bumps, Lane Repair significantly improves overall chiplet-assembly yield; a toy remapping example follows.
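Conceptually, the repair shifts logical lanes onto spare physical bumps past the defects. The sketch below is illustrative only; the 68-bump/64-lane split and the remap scheme are assumptions for the example, not the spec's actual spare-lane budget or muxing.

```python
# Toy lane-repair remap: logical lanes skip defective physical bumps and land on
# spares. The 68-bump / 64-lane split is an illustrative assumption, not a spec value.
def repair_map(physical_bumps: int, logical_lanes: int,
               defective: set[int]) -> dict[int, int] | None:
    """Return a logical-lane -> physical-bump mapping, or None if unrepairable."""
    good = [p for p in range(physical_bumps) if p not in defective]
    if len(good) < logical_lanes:
        return None                      # more failed bumps than spares: no link
    return {lane: good[lane] for lane in range(logical_lanes)}

mapping = repair_map(physical_bumps=68, logical_lanes=64, defective={5, 40})
print("logical lane 5 now rides physical bump", mapping[5])  # -> 6, shifted past the defect
```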

How does UCIe relate to HBM?
HBM is a stacked-DRAM standard (JEDEC) with a very wide parallel interface (1024 bits per stack) designed specifically for high-bandwidth memory access. UCIe is a general-purpose die-to-die interface that can carry any protocol, including PCIe and CXL. They are complementary: an AI accelerator chiplet might connect to its compute partner via UCIe (carrying CXL) and to HBM memory dies via the HBM interface simultaneously — different interfaces for different roles on the same package.

What is FDI?
FDI stands for Flit-aware DIE Interface. It is the standardized logical boundary between the Protocol Layer and the Die-to-Die Adapter in UCIe. FDI carries protocol flits (fixed-size data units used by PCIe 6.0 and CXL 3.0) and link-management control signals between the two layers. Because FDI is standardized, a company can independently source a PCIe 6 Protocol Layer IP core from one vendor and a UCIe D2D Adapter IP from another, and they will interoperate via FDI without custom integration work.