UCIe is the open industry standard for die-to-die connectivity inside a package. Published in 2022 by a consortium including Intel, AMD, Arm, Qualcomm, Samsung, and TSMC, it defines the physical bumps, electrical signaling, and upper-layer protocol handshake that allow chiplets from different vendors to interoperate — the PCIe moment for the chiplet era.
Published March 2022 · 30+ consortium members · up to ~1.3 TB/s/mm bandwidth density (Advanced Package) · carries PCIe 6 and CXL 3 over die-to-die links
Motivation
Why Chiplets? The End of the Monolithic Die
Moore's Law scaling costs are soaring and yield loss on large dies makes monolithic integration increasingly impractical for complex SoCs.
Yield Problem
Yield drops roughly exponentially with die area. An 800 mm² monolithic GPU might yield ~60%, while each of two 400 mm² dies yields closer to 80% — and because dies are tested before assembly (known-good die), far less wafer area is discarded, delivering far better wafer economics.
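The effect can be sketched with a simple Poisson yield model. The defect density below is back-solved from the illustrative 60% figure and is not foundry data; real assemblies do even better because chiplets are tested before packaging.

```python
import math

def die_yield(area_mm2: float, d0: float) -> float:
    """Poisson yield model: P(zero defects on die) = exp(-A * D0)."""
    return math.exp(-area_mm2 * d0)

# Defect density back-solved so an 800 mm^2 die yields 60% (illustrative).
d0 = -math.log(0.60) / 800

monolithic = die_yield(800, d0)   # one large die
chiplet = die_yield(400, d0)      # each of two smaller dies

print(f"800 mm^2 monolithic yield: {monolithic:.0%}")  # 60%
print(f"400 mm^2 chiplet yield:    {chiplet:.0%}")     # 77%
```

Note that the product of two 77% chiplet yields is still 60% — the economic win comes from testing each die before assembly, so a single defect scraps 400 mm² of silicon instead of 800 mm².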
Process Node Mismatch
Compute cores benefit from leading-edge nodes (2 nm, 3 nm), but analog, SerDes, and memory controllers do not. Chiplets allow each function to use the optimal process — mix N3 logic with N16 analog, no compromise needed.
Reuse & Time-to-Market
A validated SRAM or PHY chiplet can be reused across multiple products. With UCIe standardization, chiplets from IP vendor A can plug into a package designed by SoC vendor B without custom interface design — analogous to PCIe plug-and-play.
Bandwidth Wall
Off-package bandwidth is capped by the board-level link: a PCIe 5.0 x16 connection tops out near ~128 GB/s in both directions combined. On-package chiplet interconnects via UCIe achieve hundreds of GB/s per mm of die edge with sub-pJ/bit energy — critical for AI accelerators that demand memory bandwidth in the terabytes-per-second range.
Real-world adoption: AMD's EPYC "Genoa" uses chiplets (CCDs + IOD) connected by internal Infinity Fabric. Intel's Ponte Vecchio GPU uses 47 chiplets with EMIB and Foveros. UCIe standardizes the interface so future chiplets from any vendor can interoperate.
Key Numbers
UCIe at a Glance
- UCIe 1.0 published: March 2022
- Consortium members: 30+
- Bandwidth density, Standard Package: 28–224 GB/s/mm
- Bandwidth density, Advanced Package: 165–1,317 GB/s/mm
- Protocol options: 3 (PCIe / CXL / Streaming)
- Energy target, Advanced Package: 0.25 pJ/bit
System View
Chiplet Package Architecture
Multiple dies sit side by side on an interposer or organic substrate; UCIe links bridge the die-to-die gaps, all within a single IC package.
Fig 1 — Three chiplets on a common package substrate, connected by UCIe die-to-die links. Each chiplet can be manufactured on a different process node.
Architecture
UCIe 3-Layer Stack
UCIe mirrors the layered philosophy of PCIe — each layer has a well-defined responsibility and a standardized interface to the layer above and below.
Fig 2 — UCIe 3-layer stack. FDI (Flit-aware DIE Interface) separates the Protocol and D2D Adapter layers; RDI (Raw DIE Interface) separates the D2D Adapter and Physical layers. These interfaces enable IP from different vendors to interoperate.
Protocol Layer
Hosts the upper-level protocol: PCIe 5.0/6.0, CXL 2.0/3.0, or a raw Streaming interface. Responsible for generating and terminating protocol packets (TLPs for PCIe, flits for CXL/PCIe 6.0). This layer is protocol-aware and talks to the D2D Adapter via the FDI.
Die-to-Die (D2D) Adapter
The intelligence of the UCIe stack. Handles link training and initialization, lane scrambling (PRBS-based), CRC generation and checking with an optional retry mechanism, retiming for clock-domain crossing, and credit-based flow control between dies. Connects to the PHY via RDI.
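The credit-based flow control the adapter provides can be sketched as follows. This is a hypothetical simplified model, not the spec's actual credit scheme, which is defined per protocol and virtual channel.

```python
class CreditedLink:
    """Toy model of credit-based flow control between two dies.

    The receiver advertises a fixed number of buffer credits. The
    sender consumes one credit per flit and must stall at zero;
    credits return as the receiver drains its buffer.
    """

    def __init__(self, rx_buffer_slots: int):
        self.credits = rx_buffer_slots
        self.rx_buffer = []

    def send(self, flit: bytes) -> bool:
        if self.credits == 0:
            return False          # back-pressure: sender stalls
        self.credits -= 1
        self.rx_buffer.append(flit)
        return True

    def drain(self) -> bytes:
        flit = self.rx_buffer.pop(0)
        self.credits += 1         # credit returned to the sender
        return flit

link = CreditedLink(rx_buffer_slots=2)
assert link.send(b"flit0") and link.send(b"flit1")
assert not link.send(b"flit2")    # receiver buffer full
assert link.drain() == b"flit0"   # draining frees a credit
assert link.send(b"flit2")
```

Credits guarantee the sender never overruns the receiver's buffer, which matters on a die-to-die link where there is no slack for dropped flits.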
Physical Layer
The bump interface and analog signaling circuitry. Defines bump pitch (100–130 µm standard, 25–55 µm advanced), single-ended clock-forwarded signaling, clock distribution, and the bump map layout. The PHY is the only layer that differs between Standard and Advanced packaging.
FDI & RDI Interfaces
FDI (Flit-aware DIE Interface) is the logical boundary between Protocol and D2D Adapter — passes flits and link management signals. RDI (Raw DIE Interface) is the boundary between D2D Adapter and Physical Layer. Both are standardized, enabling separate sourcing of protocol IP and PHY IP.
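The separate-sourcing idea can be illustrated with a toy Python sketch. The class and method names below are invented for illustration and do not come from the spec.

```python
from typing import Protocol

class RDI(Protocol):
    """Stand-in for the Raw DIE Interface a PHY exposes upward."""
    def tx_raw(self, bits: bytes) -> None: ...

class VendorA_PHY:
    """PHY IP from one vendor; records what hits the bumps."""
    def __init__(self):
        self.wire = []

    def tx_raw(self, bits: bytes) -> None:
        self.wire.append(bits)

class VendorB_Adapter:
    """D2D Adapter IP from another vendor; depends only on RDI."""
    def __init__(self, phy: RDI):
        self.phy = phy

    def tx_flit(self, flit: bytes) -> None:
        # A real adapter would add CRC, retry, and flow control here.
        self.phy.tx_raw(flit)

phy = VendorA_PHY()
adapter = VendorB_Adapter(phy)    # mix-and-match via the shared boundary
adapter.tx_flit(b"hello-chiplet")
assert phy.wire == [b"hello-chiplet"]
```

Because the adapter depends only on the interface contract, either side can be swapped for another vendor's IP — the same role FDI and RDI play between hardware blocks.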
Packaging
Standard Package vs Advanced Package
UCIe defines two physical packaging tiers. The bump pitch dictates bandwidth density and determines which packaging technology is required.
Fig 3 — Standard Package (100–130 µm bump pitch) vs Advanced Package (25–55 µm). Smaller pitch packs more bumps per mm of die edge, yielding roughly 6× higher bandwidth density. Advanced packages require silicon interposers, EMIB, or hybrid bonding technology.
Standard Package
Conventional organic substrate or leadframe
Bump pitch: 100–130 µm
Bandwidth density: up to ~224 GB/s/mm
Max data rate: 32 GT/s per lane (16 lanes per module)
Packaging: FCBGA, organic substrates
Cost: lower — uses mature packaging infra
Use cases: chiplets with moderate bandwidth needs, mainstream SoCs
Advanced Package
Silicon interposer, EMIB, or hybrid bonding
Bump pitch: 25–55 µm (hybrid bonding pushes below 10 µm)
Bandwidth density: up to ~1,317 GB/s/mm
Max data rate: 32 GT/s per lane (64 lanes per module)
Packaging: 2.5D Si interposer, Intel EMIB, TSMC SoIC
Use cases: AI/HPC chiplets, GPU stacking, CPU + memory on package
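A back-of-envelope calculator shows where bandwidth density comes from. The lane counts and 32 GT/s rate match the tiers above; the pitch-scaling argument is a simplifying assumption (real bump maps add depth rows rather than scaling purely with pitch).

```python
def module_bandwidth_gbps(lanes: int, gt_per_s: float) -> float:
    """Raw per-module bandwidth in GB/s (encoding overhead ignored)."""
    return lanes * gt_per_s / 8

# Advanced Package module: 64 lanes at 32 GT/s
print(module_bandwidth_gbps(64, 32))   # 256.0 GB/s per module
# Standard Package module: 16 lanes at 32 GT/s
print(module_bandwidth_gbps(16, 32))   # 64.0 GB/s per module

# Density intuition (assumption): finer pitch packs more bumps per mm
# of die edge, so more modules fit along the same shoreline.
print(round(110 / 45, 1))              # ~2.4x from pitch alone
```

Stacking more modules along the die edge, plus deeper bump arrays, is what multiplies these per-module numbers into the per-mm shoreline figures the spec quotes.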
Protocol Support
Supported Upper-Layer Protocols
UCIe's Protocol Layer carries existing, well-defined protocols rather than defining a new one — it reuses PCIe and CXL to minimize adoption friction.
PCIe 5.0 / 6.0
The industry's universal I/O protocol. PCIe 5 uses 128b/130b encoding (32 GT/s). PCIe 6 uses PAM4 + FLIT mode (64 GT/s). Over UCIe, PCIe traffic traverses a die-to-die link instead of a slot connector — same software stack, new physical medium.
CXL 2.0 / 3.0
Compute Express Link for cache-coherent CPU–accelerator communication. CXL.cache, CXL.mem, and CXL.io run on top of PCIe PHY. Over UCIe, AI accelerators or memory expanders can attach coherently to the CPU chiplet on the same package.
Streaming Interface
A raw, low-latency, vendor-defined protocol channel. Allows proprietary fabric (AXI streaming, Infinity Fabric, NVLink-like) to traverse a UCIe physical link. Enables custom chiplet topologies while still using standardized packaging and PHY.
Key insight: UCIe does not invent a new protocol. It wraps existing protocols (PCIe, CXL) in a standardized die-to-die physical layer. This means existing PCIe and CXL software stacks work unchanged — only the physical transport changes from a PCIe slot to a bump array on the same package.
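The encodings above translate into per-lane payload rates as follows. This is raw arithmetic only; PCIe 6.0 flit mode additionally spends a few percent on FEC and CRC, which is ignored here.

```python
def lane_payload_gbps(gt_per_s: float, efficiency: float) -> float:
    """Approximate per-lane payload rate in GB/s."""
    return gt_per_s * efficiency / 8

# PCIe 5.0: 32 GT/s with 128b/130b line coding
print(round(lane_payload_gbps(32, 128 / 130), 2))  # 3.94
# PCIe 6.0: 64 GT/s PAM4 in flit mode (1b/1b, overhead ignored)
print(round(lane_payload_gbps(64, 1.0), 2))        # 8.0
```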
Performance
Bandwidth & Signaling Specs
| Parameter | Standard Package | Advanced Package |
|---|---|---|
| Bump pitch | 100–130 µm | 25–55 µm |
| Max data rate per lane | 32 GT/s | 32 GT/s |
| Bandwidth density | 28–224 GB/s/mm | 165–1,317 GB/s/mm |
| Signaling | Single-ended, clock-forwarded | Single-ended, clock-forwarded |
| Clock | Forwarded clock per module | Forwarded clock per module |
| Packaging technology | Organic substrate, FCBGA | Silicon interposer, EMIB, SoIC, Foveros |
| Energy efficiency | ~0.5 pJ/bit target | ~0.25 pJ/bit target |
| Error handling | CRC with optional retry | CRC with optional retry |
| Link width | 16 lanes per module × N modules | 64 lanes per module × N modules |
| Latency | ~2–4 ns (PHY + D2D) | ~1–2 ns (PHY + D2D) |
Comparison
UCIe vs Other Die-to-Die Standards
| Standard | Organization | Open? | Protocol Layer | Max BW Density | Status |
|---|---|---|---|---|---|
| UCIe 1.0 | UCIe Consortium | Open | PCIe 5/6, CXL 2/3, Streaming | ~1,317 GB/s/mm | Published 2022 |
| Intel AIB | Intel (Open Domain-Specific Architecture) | Partially open | Vendor-defined | ~2 TB/s/mm² (area) | ODSA-licensed |
| BoW (Bunch of Wires) | Open Compute Project | Open | None (raw parallel) | ~128 GB/s/mm | Niche adoption |
| HBM (High Bandwidth Memory) | JEDEC | JEDEC standard | Memory-only | ~1 TB/s per stack | Widely deployed |
| NVLink-C2C | NVIDIA | Proprietary | NVLink | ~900 GB/s total | NVIDIA only |
| Infinity Fabric (IF) | AMD | Proprietary | AMD-defined | ~500 GB/s internal | AMD only |
Initialization
UCIe Link Training Sequence
Before data can flow, the D2D Adapter performs a structured link initialization handshake — similar in spirit to PCIe LTSSM but optimized for the on-package environment.
| Step | State | What Happens |
|---|---|---|
| 1 | Reset | Both sides hold the PHY in reset; bump drivers inactive. |
| 2 | Detect | Electrical detect verifies receiver termination is present on the bump pins. |
| 3 | Initialize | Clock forwarding starts; D2D adapters lock to the forwarded clock. |
| 4 | Lane Repair | Optional: identify defective bump lanes (due to packaging defects) and remap around them. Critical for advanced-packaging yield. |
| 5–6 | Link Init | Scrambling enabled; CRC/retry (if negotiated) activated; RDI signals to the D2D Adapter that the PHY is ready. |
| 7 | Protocol Active | D2D Adapter signals FDI-ready to the Protocol Layer; PCIe/CXL configuration-space enumeration begins over the UCIe link. |
Lane Repair is a unique UCIe feature for advanced packaging. Because micro-bumps at tight advanced-package pitches can have manufacturing defects, the D2D Adapter can dynamically remap around a small number of failed bumps during link training, improving yield without scrapping the assembled package.
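A minimal sketch of the remapping idea, assuming a simple 1:1 logical-to-physical mapping with dedicated spare bumps (the spec defines the actual repair granularity and spare placement):

```python
def repair_map(total_lanes: int, failed: set, spares: list) -> dict:
    """Map each logical lane to a working physical lane.

    Toy model of UCIe lane repair: logical lanes normally map 1:1 to
    physical bumps; lanes that fail electrical detect are remapped to
    spare bumps. Raises if there are more failures than spares.
    """
    if len(failed) > len(spares):
        raise RuntimeError("unrepairable: not enough spare lanes")
    spare_iter = iter(spares)
    mapping = {}
    for logical in range(total_lanes):
        phys = logical
        if phys in failed:
            phys = next(spare_iter)   # steer around the bad bump
        mapping[logical] = phys
    return mapping

# 8 data lanes, lane 3 failed, two spare bumps at positions 8 and 9
m = repair_map(8, failed={3}, spares=[8, 9])
assert m[3] == 8 and m[2] == 2   # only the bad lane is remapped
```

The key property is that repair is transparent to the layers above: the Protocol Layer still sees a full-width logical link even though one physical bump is dead.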
Ecosystem
UCIe Consortium & Real-World Adoption
Intel
Founding member and lead contributor. Intel's EMIB (Embedded Multi-die Interconnect Bridge) and Foveros 3D stacking are UCIe-compatible packaging technologies. Ponte Vecchio GPU (Xe-HPC) uses 47 chiplets. Intel Meteor Lake (2023) is the first Intel consumer SoC with a chiplet architecture.
AMD
Founding member. AMD's EPYC "Genoa" and "Bergamo" CPUs already use chiplet architecture (CCDs + IOD) connected by Infinity Fabric. Future products are expected to migrate the inter-chiplet interface toward UCIe for multi-vendor compatibility.
Arm
Founding member. Arm is defining UCIe-compatible interfaces for future Arm Neoverse compute chiplets and Arm Total Design ecosystem. The goal is to allow semiconductor companies to build Arm-based SoCs from pre-validated UCIe chiplets.
Qualcomm
Founding member. Qualcomm is exploring UCIe-based chiplet designs to reduce time-to-market and cost, with Oryon CPU chiplets and modem chiplets that can be mixed at the package level.
TSMC
Founding member providing the packaging technology. TSMC's SoIC (System on Integrated Chips) platform using chip-on-wafer bonding is a key advanced-package technology for UCIe, enabling sub-10 µm bump pitch for maximum bandwidth density.
Samsung, Google, Meta, Microsoft
All founding/early members. Cloud hyperscalers (Google, Meta, Microsoft) are driving UCIe adoption for custom AI accelerator chiplets — they can source compute chiplets from one vendor and I/O chiplets from another, assembled into a single package.
FAQ
Frequently Asked Questions
UCIe (Universal Chiplet Interconnect Express) is an open standard published in March 2022 by a consortium of 30+ companies including Intel, AMD, Arm, Qualcomm, Samsung, and TSMC. It defines a standardized die-to-die interface so chiplets from different vendors and foundries can interoperate on the same package. It matters because it enables a chiplet marketplace — just as PCIe allowed any vendor's GPU to plug into any PC, UCIe allows any vendor's compute chiplet to connect to any I/O chiplet, reducing design cost and time-to-market.
1. Protocol Layer — hosts PCIe 5/6, CXL 2/3, or a Streaming protocol. This layer is protocol-aware and generates/terminates packets. 2. Die-to-Die (D2D) Adapter — handles link training, scrambling, CRC with optional retry, retiming, and flow control. 3. Physical Layer — manages the bump array, electrical signaling, and forwarded clock. The FDI interface sits between Protocol Layer and D2D Adapter; the RDI interface sits between D2D Adapter and PHY.
Choose Standard Package (100–130 µm pitch, up to ~224 GB/s/mm) if your die-to-die bandwidth requirement is modest and you want to use conventional organic-substrate packaging with mature supply chains. Choose Advanced Package (25–55 µm pitch, up to ~1,317 GB/s/mm) if you need the highest possible bandwidth density — typically for AI/HPC accelerators, on-package DRAM-like memory chiplets, or CPU core clusters that need near-HBM bandwidth without an external memory slot. Advanced packaging requires silicon interposers (TSMC CoWoS), Intel EMIB, or hybrid bonding (TSMC SoIC), which adds cost and process complexity.
No. UCIe is a physical die-to-die transport, not a protocol. It carries PCIe and CXL traffic on top of its physical layer. Think of UCIe as defining the cable and connector, while PCIe/CXL are the communication protocol spoken over that cable. PCIe continues to be used for chip-to-board connections (slots, M.2, etc.); UCIe extends PCIe semantics to chip-to-chip within a package.
Lane Repair is a UCIe link-training feature that allows the D2D Adapter to identify defective bump lanes (micro-bump defects are a real concern at the tight pitches of advanced packaging) and remap those lanes around the fault during initialization. By tolerating a small number of defective bumps at link bring-up, Lane Repair significantly improves overall chiplet-assembly yield.
HBM is a stacked DRAM standard (JEDEC) with a very wide parallel interface (1024 bits per stack) designed specifically for high-bandwidth memory access. UCIe is a general-purpose die-to-die interface that can carry any protocol including PCIe and CXL. They are complementary: an AI accelerator chiplet might connect to its compute partner via UCIe (carrying CXL) and to HBM memory dies via the HBM interface simultaneously — different interfaces for different roles on the same package.
FDI stands for Flit-aware DIE Interface. It is the standardized logical boundary between the Protocol Layer and the Die-to-Die Adapter in UCIe. FDI carries protocol flits (fixed-size data units used by PCIe 6.0 and CXL 3.0) and link management control signals between the two layers. Because FDI is standardized, a company can independently source a PCIe 6 Protocol Layer IP core from one vendor and a UCIe D2D Adapter IP from another vendor, and they will interoperate via FDI without custom integration work.