⚡ Live Build · Soft IP Series

Build an HBM3 Controller with Claude

A complete, production-quality High Bandwidth Memory 3 controller built from scratch in Verilog — with AI collaboration, full documentation, and a SystemVerilog testbench for every module.

18 Modules · 5 Phases · 819 GB/s per Stack · 32 Pseudo-Channels · 100% Free & Open
Standard: JEDEC JESD238 (HBM3)
Speed: 6.4 Gbps/pin (HBM3) · 9.6 Gbps (HBM3E)
Width: 1024-bit per stack
Channels: 16 channels × 2 pseudo-channels = 32 PC
Banks: 32 banks per pseudo-channel (8BG × 4B)
Host IF: AXI4 (synthesizable)
PHY: Behavioral model (standard practice)
ECC: SECDED per pseudo-channel
Architecture
HBM3 Controller — Block Diagram
Full stack from AXI4 host interface down to the HBM3 DRAM stack. PHY is a behavioral model; all other blocks are synthesizable Verilog.
AXI4 Host Interface: AW / W / B / AR / R channels
Request Scheduler & Reorder Buffer: priority · bank interleaving · QoS · open/closed page policy
Address Mapper: Row / Col / BG / Bank / PC / Stack decode
Pseudo-Channel Controller × 32 (PC0–PC31), per PC: Bank/BG FSM · row activation · tRCD/tRAS/tRP/CL/CWL state machine · page tracking
  DRAM Timing FSM: tRCD · tRAS · tRP · CL · CWL · tRFC · tREFI
  Bank & BG FSM: 8 BG × 4 banks · open/closed state
  Refresh Controller: ABR · PBR (per-bank) · tREFI · tRFC
  Temp Monitor: TUF flag · throttle · TEMPCO · derate
  Power Management: self-refresh · power-down entry/exit FSM
Write Data Path: write buffer · DQ/DQS burst align · write leveling model · BL8/BL16
ECC Engine (SECDED): single-bit correct · double-bit detect · syndrome generation · correction logic
Read Data Path: read FIFO · DQS capture model · CL/CWL latency · data alignment
CA Bus Controller: command/address encoding · parity · pseudo-channel select · CA training model
PHY Interface Model (behavioral, not synthesizable): SerDes model · DLL/PLL model · per-pin training · DQ/DQS/CA micro-bump abstraction
HBM3 DRAM Stack: 8 DRAM dies stacked · 2.5D interposer · micro-bumps · 6.4 Gbps/pin
Legend: Synthesizable RTL · Behavioral model
Why HBM3?
The Memory at the Heart of Every AI Chip
H100, MI300X, Gaudi3, TPU v5 — they all run on HBM. Understanding its controller is foundational to AI hardware design.
🤖
AI Training & Inference
NVIDIA H100 (SXM) carries 80 GB of HBM3 on five active stacks, for 3.35 TB/s. AMD MI300X has 8 stacks totalling 192 GB. Memory bandwidth, and how efficiently the controller uses it, often limits GPU performance.
819 GB/s Per Stack
HBM3 at 6.4 Gbps/pin × 1024 bits ÷ 8 bits/byte ≈ 819 GB/s. A 64-bit DDR5-6400 DIMM delivers 51.2 GB/s, so a single HBM3 stack provides roughly 16× the bandwidth.
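The per-stack figure is simply pin rate × bus width ÷ 8 bits per byte. This small Python helper reproduces it. It also shows which pin rate the 819 GB/s number actually corresponds to.

```python
def stack_bandwidth_gbs(gbps_per_pin: float, width_bits: int = 1024) -> float:
    """Peak bandwidth of one stack in GB/s: pin rate x bus width / 8 bits per byte."""
    return gbps_per_pin * width_bits / 8


if __name__ == "__main__":
    # 6.4 Gbps/pin is the HBM3 rate; 9.6 Gbps/pin is the HBM3E figure quoted above.
    for rate in (6.4, 8.0, 9.6):
        print(f"{rate} Gbps/pin -> {stack_bandwidth_gbs(rate):.1f} GB/s")
```

Running it prints 819.2, 1024.0 and 1228.8 GB/s. Only 6.4 Gbps/pin yields the ~819 GB/s headline number.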
🔬
No Free Reference RTL
We know of no open-source, fully documented HBM3 controller in Verilog. This build gives EcrioniX the chance to fill that gap.
📐
Chiplet Architecture
HBM sits on a silicon interposer alongside the compute die. Understanding HBM controllers is essential for UCIe and advanced packaging roles.
🏦
Industry Job Demand
Memory controller design roles at NVIDIA, AMD, Micron, SK Hynix, Samsung command $180–280K TC. Comparatively few engineers specialise in this.
🎓
Deep Learning Opportunity
Building this controller teaches DRAM timing, schedulers, ECC, AXI4, PHY abstraction, and power management — all in one project.
Full Roadmap
5 Phases · 18 Modules · Built Live
Each module = dedicated page with Verilog source, port table, waveform diagram, SystemVerilog testbench, and FAQ. Built one at a time with full documentation.
P1
Phase 1 — Foundation
Core DRAM timing and control logic · Start here to understand the heartbeat of HBM3
🔨 Building Now
📋
Hub Page — Architecture & Spec
Full block diagram, HBM3 spec summary, module index, JEDEC parameter table
✅ Done — You're here
🔄
Module 3 — Refresh Controller
All-bank refresh (ABR) and per-bank refresh (PBR). tREFI countdown, tRFC blocking window.
⏳ Phase 1
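To make the tREFI countdown and tRFC blocking window concrete, here is a cycle-level Python sketch of an all-bank refresh timer. It is not the Module 3 RTL and makes several simplifying assumptions. REF issues the moment the interval expires, without postponement or first precharging open banks. Both timings are given in controller cycles, and the values used in the tests are hypothetical.

```python
class RefreshController:
    """Behavioral sketch of an all-bank refresh (ABR) timer.

    t_refi: cycles between REF commands; t_rfc: cycles the banks stay blocked.
    """

    def __init__(self, t_refi: int, t_rfc: int):
        if not 0 < t_rfc < t_refi:
            raise ValueError("need 0 < t_rfc < t_refi")
        self.t_refi = t_refi
        self.t_rfc = t_rfc
        self.refi_count = t_refi  # cycles until the next REF is due
        self.rfc_count = 0        # cycles left in the tRFC blocking window
        self.refreshes = 0        # number of REF commands issued so far

    def tick(self) -> bool:
        """Advance one clock; return True while ACT/RD/WR must be held off."""
        if self.rfc_count > 0:
            self.rfc_count -= 1
        self.refi_count -= 1
        if self.refi_count == 0:
            # REF issues now: reload the interval and open the blocking window.
            self.refreshes += 1
            self.refi_count = self.t_refi
            self.rfc_count = self.t_rfc
        return self.rfc_count > 0
```

Each REF blocks exactly `t_rfc` cycles, starting with the cycle in which it issues. With the table's tRFC = 220 ns and tREFI = 3.9 µs, all-bank refresh keeps a pseudo-channel busy about 220 / 3900 ≈ 5.6% of the time. That overhead is what per-bank refresh tries to hide.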
📍
Module 4 — Row Activation & Page Policy
Open-page, closed-page, and adaptive page policy FSM. Row hit/miss/conflict detection.
⏳ Phase 1
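Row hit/miss/conflict detection reduces to comparing the requested row with the bank's open row. The Python sketch below shows that comparison for a single bank. It also shows how open-page and closed-page policy differ only in whether the row is left open afterwards. The class and enum names are hypothetical and not taken from the Module 4 RTL.

```python
from enum import Enum


class Access(Enum):
    HIT = "hit"            # requested row already open: issue RD/WR directly
    MISS = "miss"          # bank precharged: ACT, then RD/WR after tRCD
    CONFLICT = "conflict"  # another row open: PRE, wait tRP, ACT, wait tRCD


class BankState:
    def __init__(self):
        self.open_row = None  # None means the bank is precharged (idle)

    def classify(self, row: int) -> Access:
        if self.open_row is None:
            return Access.MISS
        return Access.HIT if self.open_row == row else Access.CONFLICT

    def access(self, row: int, closed_page: bool = False) -> Access:
        kind = self.classify(row)
        # Open-page leaves the row open to catch future hits;
        # closed-page auto-precharges after every access.
        self.open_row = None if closed_page else row
        return kind
```

Under closed-page policy every access is a miss, but none is ever a conflict. An adaptive policy would pick `closed_page` per access, for example from a recent hit-rate counter.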
P2
Phase 2 — Data Path
Write path · Read path · ECC · Address decode · How data actually flows through the controller
⏳ Upcoming
✍️
Module 5 — Write Data Path
Write buffer, DQ/DQS burst alignment, BL8/BL16 burst control, write leveling model.
⏳ Phase 2
📖
Module 6 — Read Data Path
Read FIFO, DQS capture model, CL/CWL latency pipeline, data re-alignment.
⏳ Phase 2
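A CL latency pipeline is essentially a delay line. The controller records each RD when it issues and expects the matching data a fixed number of cycles later. This Python sketch models that delay line under two assumptions: latency is expressed in whole controller cycles, and it stays fixed after configuration. It is an illustration, not the Module 6 RTL.

```python
from collections import deque


class LatencyPipe:
    """Returns each pushed item exactly `latency` ticks after it was pushed."""

    def __init__(self, latency: int):
        if latency < 1:
            raise ValueError("latency must be at least one cycle")
        # Fixed-length shift register; appending drops the oldest stage.
        self.stages = deque([None] * latency, maxlen=latency)

    def tick(self, item=None):
        out = self.stages[0]     # stage about to fall off the end
        self.stages.append(item)  # shift in this cycle's entry (or a bubble)
        return out
```

In RTL the same structure is a shift register of valid bits and tags. The output tag tells the read FIFO which request the arriving burst belongs to.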
🛡️
Module 7 — ECC Engine (SECDED)
Single-bit error correction, double-bit error detection. Syndrome generation and correction logic per pseudo-channel.
⏳ Phase 2
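SECDED is usually built as an extended Hamming code. Parity bits sit at power-of-two positions, and one extra overall-parity bit distinguishes single errors from double errors. This Python sketch shows the syndrome logic on a small word (8 data bits → 13-bit codeword) rather than the wider word a real controller would protect. The code and its function names are illustrative, not the Module 7 implementation.

```python
def _num_parity_bits(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1 (Hamming bound)."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def secded_encode(data: int, k: int) -> list:
    """Return codeword bits; index 0 = overall parity, 1..n = Hamming positions."""
    n = k + _num_parity_bits(k)
    code = [0] * (n + 1)
    bits = [(data >> i) & 1 for i in range(k)]
    j = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two -> data position
            code[pos] = bits[j]
            j += 1
    p = 1
    while p <= n:  # parity bit p covers every position with that bit set
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) & 1
        p <<= 1
    code[0] = sum(code[1:]) & 1  # overall parity over the whole Hamming word
    return code


def secded_decode(code: list):
    """Return (data, status) with status 'ok', 'corrected' or 'detected'."""
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    overall = sum(code) & 1
    fixed = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        fixed[syndrome] ^= 1  # single error; syndrome 0 means bit 0 flipped
        status = "corrected"
    else:
        status = "detected"  # even number of flips: uncorrectable
    data, j = 0, 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):
            data |= fixed[pos] << j
            j += 1
    return data, status
```

Any single flipped bit is corrected. Any two flipped bits leave the overall parity even with a non-zero syndrome, so they are flagged rather than miscorrected.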
🗺️
Module 8 — Address Mapper
Decode AXI4 address into Row / Column / Bank Group / Bank / Pseudo-Channel / Stack.
⏳ Phase 2
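Address decoding is bit slicing: each DRAM coordinate is a fixed-width field of the flat address. The field order and widths in this Python sketch are hypothetical and not taken from JESD238 or Module 8. The 8 BG × 4 bank split follows the spec summary above, and the stack field is omitted. Placing the pseudo-channel bits just above the byte offset spreads consecutive 32-byte blocks across pseudo-channels.

```python
# Hypothetical layout, least-significant field first. Widths are assumptions.
LAYOUT = [
    ("offset", 5),  # byte within a 32-byte access
    ("pc", 5),      # pseudo-channel (5 bits -> 32 PCs; size to the instantiated count)
    ("bg", 3),      # bank group (8 BG)
    ("bank", 2),    # bank within the group (4 banks)
    ("col", 5),     # column
    ("row", 14),    # row
]


def decode(addr: int, layout=LAYOUT) -> dict:
    """Slice a flat byte address into DRAM coordinates, low bits first."""
    fields = {}
    for name, width in layout:
        fields[name] = addr & ((1 << width) - 1)
        addr >>= width
    return fields


def encode(fields: dict, layout=LAYOUT) -> int:
    """Inverse of decode: pack coordinates back into a flat address."""
    addr, shift = 0, 0
    for name, width in layout:
        addr |= (fields[name] & ((1 << width) - 1)) << shift
        shift += width
    return addr
```

Swapping list entries changes the interleaving policy without touching the slicing logic. A parameterised Verilog mapper could work the same way.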
P3
Phase 3 — Intelligence
Scheduler · Pseudo-channel arbiter · CA bus · Where performance is won or lost
⏳ Upcoming
🧠
Module 9 — Request Scheduler
Reorder buffer, bank-interleaving, priority queue, FR-FCFS policy, QoS enforcement.
⏳ Phase 3
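FR-FCFS ("first-ready, first-come first-served") prefers requests that hit an already-open row and breaks ties by age. This Python sketch shows only that selection rule for one pseudo-channel. It ignores timing readiness, QoS and write/read turnaround, which a real scheduler must also check. The `Request` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: int  # cycle the request entered the queue
    bank: int     # flat bank index within one pseudo-channel
    row: int


def fr_fcfs_pick(queue, open_rows):
    """Return the request FR-FCFS would issue next, or None if the queue is empty.

    open_rows maps bank -> currently open row (banks absent are precharged).
    """
    if not queue:
        return None
    hits = [r for r in queue if open_rows.get(r.bank) == r.row]
    pool = hits if hits else queue  # first-ready: row hits win
    return min(pool, key=lambda r: r.arrival)  # FCFS among the candidates
```

Pure FR-FCFS can starve an old row-miss behind a stream of hits. That is one reason the scheduler also carries priority and QoS enforcement.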
🔀
Module 10 — Pseudo-Channel Controller
16 independent pseudo-channels, each with its own bank/row state, timing constraints, and command queue.
⏳ Phase 3
📡
Module 11 — CA Bus Controller
Command/Address bus encoding, parity generation, pseudo-channel select, CA training model.
⏳ Phase 3
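CA parity generation is an XOR reduction over the command/address bits, and checking it is the same reduction compared against the received parity bit. This Python sketch assumes even parity over a generic CA word of `width` bits. It does not reproduce the exact bits covered or the polarity, which are defined in JESD238.

```python
def ca_parity(ca_word: int, width: int) -> int:
    """Even-parity bit over the low `width` bits (makes the total count of 1s even)."""
    return bin(ca_word & ((1 << width) - 1)).count("1") & 1


def ca_parity_error(ca_word: int, width: int, par_bit: int) -> bool:
    """True if the received parity bit does not match the recomputed one."""
    return ca_parity(ca_word, width) != par_bit
```

In Verilog the same thing is a reduction XOR (`^ca`) on the transmit side and a compare on the checker side.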
🔌
Module 12 — AXI4 Host Interface
Full AXI4 slave: AW/W/B/AR/R channels, burst support, outstanding transaction tracking, ID mapping.
⏳ Phase 3
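Burst support means expanding one AW/AR handshake into per-beat addresses. This Python sketch follows the AXI4 rules for FIXED, INCR and WRAP bursts, given the start address, beat size in bytes (2^AxSIZE) and beat count (AxLEN + 1). It does not check the rule that INCR bursts must not cross a 4 KB boundary.

```python
def burst_addresses(addr: int, size_bytes: int, beats: int, burst: str) -> list:
    """Byte address of each beat of an AXI4 burst."""
    if burst == "FIXED":
        return [addr] * beats  # every beat targets the same address
    aligned = (addr // size_bytes) * size_bytes
    if burst == "INCR":
        # First beat keeps the (possibly unaligned) start; later beats are aligned.
        return [addr] + [aligned + i * size_bytes for i in range(1, beats)]
    if burst == "WRAP":
        if beats not in (2, 4, 8, 16) or addr % size_bytes:
            raise ValueError("WRAP needs 2/4/8/16 beats and an aligned start")
        total = size_bytes * beats
        lower = (addr // total) * total  # wrap boundary
        return [lower + (addr - lower + i * size_bytes) % total for i in range(beats)]
    raise ValueError(f"unknown burst type {burst}")
```

For example, a 4-beat, 16-byte WRAP starting at 0x30 visits 0x30, 0x00, 0x10, 0x20. This is the critical-word-first pattern that cache-line fills rely on.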
P4
Phase 4 — PHY, Power & Temperature
PHY behavioral model · Power management · Thermal throttling · Bringing it all together
⏳ Upcoming
🔌
Module 13 — PHY Interface Model
Behavioral SerDes, DLL/PLL model, per-pin training abstraction, DQ/DQS/CA micro-bump interface.
⏳ Phase 4
🌡️
Module 14 — Temperature Monitor & Throttle
TUF (Temperature Update Flag) polling, throttle FSM, TEMPCO derating, emergency self-refresh trigger.
⏳ Phase 4
💤
Module 15 — Power Management
Self-refresh entry/exit FSM, power-down mode, clock gating, idle detection, CKE control.
⏳ Phase 4
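Idle detection can be sketched as a counter that escalates from active to power-down to self-refresh, with any pending request waking the device. In this Python sketch the thresholds are hypothetical. Entry and exit latencies are not modelled, nor is the requirement to precharge all banks before self-refresh. It shows only the escalation logic, not the Module 15 FSM.

```python
class PowerFSM:
    """Idle-counter power-state sketch: ACTIVE -> POWER_DOWN -> SELF_REFRESH."""

    def __init__(self, pd_after: int, sr_after: int):
        if not 0 < pd_after < sr_after:
            raise ValueError("need 0 < pd_after < sr_after")
        self.pd_after, self.sr_after = pd_after, sr_after
        self.idle = 0
        self.state = "ACTIVE"

    def tick(self, request_pending: bool) -> str:
        if request_pending:
            self.idle = 0
            self.state = "ACTIVE"  # wake-up latency is not modelled here
        else:
            self.idle += 1
            if self.idle >= self.sr_after:
                self.state = "SELF_REFRESH"
            elif self.idle >= self.pd_after:
                self.state = "POWER_DOWN"
        return self.state
```

Short idle gaps only reach power-down, which exits quickly. Long gaps escalate to self-refresh, which saves more power but costs a longer exit.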
P5
Phase 5 — Verification & Integration
Full SV testbench · HBM3 DRAM behavioral model · Assertions · Coverage · Simulate the whole thing
⏳ Upcoming
🧪
Module 16 — HBM3 DRAM Memory Model
Full behavioral DRAM model: timing checks, bank state, data storage, error injection for ECC validation.
⏳ Phase 5
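The timing checks in a DRAM model reduce to remembering when each bank last saw ACT and PRE, then comparing elapsed cycles against the minimums. This Python sketch checks one bank for tRCD, tRAS and tRP, plus two basic state errors. It omits tRC, refresh and cross-bank rules, and the cycle values in the tests are hypothetical. It is not the Module 16 model.

```python
class BankTimingChecker:
    """Flags tRCD, tRAS and tRP violations for one bank (times in cycles)."""

    def __init__(self, t_rcd: int, t_ras: int, t_rp: int):
        self.t_rcd, self.t_ras, self.t_rp = t_rcd, t_ras, t_rp
        self.row_open = False
        self.last_act = None  # cycle of the most recent ACT
        self.last_pre = None  # cycle of the most recent PRE

    def command(self, cycle: int, cmd: str) -> list:
        """Apply one command; return the names of any violated rules."""
        errors = []
        if cmd == "ACT":
            if self.row_open:
                errors.append("ACT to open bank")
            if self.last_pre is not None and cycle - self.last_pre < self.t_rp:
                errors.append("tRP")
            self.row_open, self.last_act = True, cycle
        elif cmd in ("RD", "WR"):
            if not self.row_open:
                errors.append("bank not active")
            elif cycle - self.last_act < self.t_rcd:
                errors.append("tRCD")
        elif cmd == "PRE":
            if self.row_open and cycle - self.last_act < self.t_ras:
                errors.append("tRAS")
            self.row_open, self.last_pre = False, cycle
        else:
            raise ValueError(f"unsupported command {cmd}")
        return errors
```

Returning a list rather than raising lets a testbench log every violation in a run. It also lets ECC error-injection tests keep going past the first failure.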
Module 17 — SV Testbench & Verification
Self-checking testbench, constrained random traffic, SVA protocol assertions, functional coverage report.
⏳ Phase 5
📊
Module 18 — Integration & Benchmarks
Full controller integration, bandwidth utilisation benchmark, roofline analysis, final download package.
⏳ Phase 5
Tech Stack
What We're Building With
⚙️
Verilog (RTL)
Synthesizable RTL for all controller modules. Parametrised, clean, Quartus/Vivado compatible.
🧪
SystemVerilog (TB)
Self-checking testbenches, SVA assertions, constrained random stimulus, functional coverage.
🤖
Claude (AI Pair)
Every module co-developed with Claude — RTL, testbench, documentation, and review.
📐
JEDEC JESD238
HBM3 standard specification. All timing parameters, command encoding, and CA protocol from spec.
🔁
AXI4 Protocol
Industry-standard host interface. Compatible with any AXI4 master (ARM Cortex, RISC-V SoC).
🔬
Questa / VCS / Icarus
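Testbenches run on the open-source Icarus Verilog and are also tested on Questa and Synopsys VCS. Icarus supports only a subset of SystemVerilog, so SVA and constrained-random features need Questa or VCS.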
JEDEC Spec Reference
HBM3 Key Timing Parameters
All values from JEDEC JESD238. These drive the timing FSM in Module 1.
Parameter | Symbol | Value | What It Controls
--- | --- | --- | ---
RAS to CAS Delay | tRCD | 14 ns (min) | Time from ACT to first RD/WR
Row Active Time | tRAS | 32 ns (min) | Minimum time a row stays open
Precharge Time | tRP | 14 ns (min) | PRE to next ACT on the same bank
CAS Read Latency | CL | 35 ns | RD command to first data bit
CAS Write Latency | CWL | 18 ns | WR command to first data
Refresh Interval | tREFI | 3.9 µs (max) | Maximum time between refresh commands
Refresh Cycle Time | tRFC | 220 ns (min) | Duration of a refresh operation
Row Cycle Time | tRC | 46 ns (min) | ACT to ACT on the same bank (tRAS + tRP)
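The timing FSM counts clock cycles, so each nanosecond value must be converted for a given clock period. Minimum constraints round up, because waiting too short violates them. A maximum such as tREFI rounds down, because refreshing late violates it. This Python sketch does the conversion in integer picoseconds to avoid floating-point rounding. The 625 ps clock period in the tests is a hypothetical example, not a value from the spec.

```python
# Minimum timings from the table above, in picoseconds.
MIN_TIMINGS_PS = {"tRCD": 14_000, "tRAS": 32_000, "tRP": 14_000,
                  "tRC": 46_000, "tRFC": 220_000}
TREFI_PS = 3_900_000  # tREFI is a maximum interval


def to_cycles(t_ps: int, tck_ps: int) -> int:
    """Round a minimum delay up to whole cycles (ceiling division)."""
    return -(-t_ps // tck_ps)


def to_cycles_max(t_ps: int, tck_ps: int) -> int:
    """Round a maximum interval down so the limit is never exceeded."""
    return t_ps // tck_ps


def cycles_table(tck_ps: int) -> dict:
    table = {name: to_cycles(t, tck_ps) for name, t in MIN_TIMINGS_PS.items()}
    table["tREFI"] = to_cycles_max(TREFI_PS, tck_ps)
    return table
```

At 625 ps this gives tRCD = 23, tRAS = 52, tRP = 23 and tRC = 74 cycles. Rounding terms separately means tRAS + tRP (75 cycles) can end up stricter than tRC itself.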
Phase 1 starts now — DRAM Timing FSM
The timing FSM is the heartbeat of every DRAM controller. Module 1 covers all JEDEC timing parameters in synthesizable Verilog with a complete SV testbench.
FAQ
Common Questions
Do I need prior DRAM knowledge to follow this?

No. Each module starts from fundamentals. Module 1 explains what tRCD, tRAS, tRP mean physically before writing a single line of Verilog. If you know basic digital design (FSMs, flip-flops, synchronous logic) you can follow along.

Can this RTL actually be taped out or used in FPGA?

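The controller RTL (Modules 1–15, excluding the Module 13 PHY behavioral model) is fully synthesizable. For silicon, it would be paired with a licensed HBM3 PHY hard macro. On FPGA the options are limited: AMD (Xilinx) Virtex UltraScale+ HBM devices integrate HBM2, not HBM3, behind a hardened memory controller and PHY, so this RTL cannot drive that memory directly. The controller logic itself can still be synthesized for FPGA to check area and timing.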

Why not DDR5 instead of HBM3?

DDR5 controllers are well-documented and several open-source implementations exist. HBM3 has no good free RTL reference, and industry demand is growing fast, since nearly every high-end AI accelerator uses HBM. The pseudo-channel architecture also makes it architecturally richer to build and understand.

How long will the full build take?

We're building it live — one module at a time. Phase 1 (4 modules) is the starting point. Each module is a self-contained page with full RTL + testbench. Follow along, bookmark, and build your own version alongside us.