RISC-V · Hardware Accelerators · SoC Design

RISC-V + Custom Accelerator Design

Learn to design hardware accelerators and wire them into a real RISC-V processor. The course covers the RoCC coprocessor interface, AXI4 memory-mapped engines, custom ISA extensions, systolic arrays, neural network inference, and DMA. It ends with a complete RISC-V AI SoC taken from RTL to FPGA.

You'll build: a systolic array, a RoCC coprocessor, an AXI4 accelerator, an NN inference engine, and a full RISC-V SoC.
Figure: Full RISC-V AI SoC: Core → RoCC/AXI4 → Systolic Array → NN Engine → DMA → Memory → C Driver. Blocks shown: RISC-V core (RV64GC, Rocket / CVA6); RoCC accelerator (tightly coupled, custom instructions on the RoCC bus); AXI4 accelerator (memory-mapped MMIO registers over AXI4-Lite); systolic array (N×N MAC array for matrix multiply); NN inference engine (INT8 quantized Conv / Linear layers); DMA engine (bulk scatter-gather transfer); SRAM / DDR (shared memory for weights and activations); C driver (bare-metal software).
Course architecture: integrate the systolic array, neural network engine, and DMA into a complete RISC-V SoC.

15-Day Hands-On Course
RISC-V + Accelerator — Day by Day
Day 1
Architecture Overview — 4 Ways to Add an Accelerator
MMIO vs RoCC vs AXI4 vs custom ISA: the trade-offs between the four integration models. The day covers block diagrams for all four models, a latency/bandwidth comparison, and real examples from Rocket + Hwacha, SiFive, and CVA6.
RoCC · AXI4 · MMIO · Custom ISA
Day 2
RISC-V Custom ISA Extension
The custom-0 to custom-3 opcode spaces, R-type encoding (funct7 + funct3), the GNU assembler .insn directive used from GCC, C intrinsics built on inline assembly, Verilog decode logic, and a testbench.
Custom Opcode · .insn Directive · Verilog Decode
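The R-type layout behind the custom opcodes can be illustrated with a minimal C sketch that packs and unpacks the fields in software. The helper names (`encode_rtype`, `rtype_funct7`, `is_custom_opcode`, ...) are hypothetical. The argument order mirrors the assembler form `.insn r opcode, func3, func7, rd, rs1, rs2`. The extraction helpers take the same bit slices a Verilog decoder would, for example insn[31:25] for funct7.

```c
#include <stdint.h>

/* Major opcodes the base RISC-V ISA reserves for custom extensions. */
enum {
    OPC_CUSTOM0 = 0x0B, /* 0b0001011 */
    OPC_CUSTOM1 = 0x2B, /* 0b0101011 */
    OPC_CUSTOM2 = 0x5B, /* 0b1011011 */
    OPC_CUSTOM3 = 0x7B  /* 0b1111011 */
};

/* Hypothetical helper: pack an R-type word as
 * funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0].
 * Argument order follows ".insn r opcode, func3, func7, rd, rs1, rs2". */
static uint32_t encode_rtype(uint32_t opcode, uint32_t funct3, uint32_t funct7,
                             uint32_t rd, uint32_t rs1, uint32_t rs2)
{
    return ((funct7 & 0x7Fu) << 25) |
           ((rs2    & 0x1Fu) << 20) |
           ((rs1    & 0x1Fu) << 15) |
           ((funct3 & 0x07u) << 12) |
           ((rd     & 0x1Fu) << 7)  |
            (opcode & 0x7Fu);
}

/* Field extraction: the same slices a Verilog decode stage would take. */
static uint32_t rtype_opcode(uint32_t insn) { return insn & 0x7Fu; }
static uint32_t rtype_rd(uint32_t insn)     { return (insn >> 7) & 0x1Fu; }
static uint32_t rtype_funct3(uint32_t insn) { return (insn >> 12) & 0x07u; }
static uint32_t rtype_funct7(uint32_t insn) { return (insn >> 25) & 0x7Fu; }

/* True if the word uses one of the four custom major opcodes. */
static int is_custom_opcode(uint32_t insn)
{
    uint32_t op = rtype_opcode(insn);
    return op == OPC_CUSTOM0 || op == OPC_CUSTOM1 ||
           op == OPC_CUSTOM2 || op == OPC_CUSTOM3;
}
```

For example, `encode_rtype(OPC_CUSTOM0, 7, 1, 10, 11, 12)` uses rd = a0, rs1 = a1 and rs2 = a2, and packs to 0x02C5F50B.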
Day 3
RoCC Interface Deep Dive
Every RoCC signal explained, plus a valid/ready handshake timing diagram, the optional memory channel, a pipelined MAC coprocessor in Verilog, Chipyard LazyRoCC integration, and a protocol testbench.
RoCC Protocol · Handshake · Chipyard
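The key rule of the valid/ready handshake is that a command transfers only on a cycle where `valid` and `ready` are both high. The toy cycle model below makes that rule concrete. It is a hypothetical sketch, not Chipyard code. It assumes the coprocessor drops `cmd.ready` for a fixed number of cycles after accepting each command, and counts how many cycles the core needs to issue a burst.

```c
/* A beat crosses a valid/ready channel only when both are high in the same cycle. */
static int fires(int valid, int ready) { return valid && ready; }

/* Hypothetical cycle model: the core keeps cmd.valid high while commands
 * remain, and the coprocessor holds cmd.ready low for busy_cycles cycles
 * after each accepted command. Returns the cycles up to and including
 * the cycle on which the last command is accepted. */
static int cycles_to_issue(int n_cmds, int busy_cycles)
{
    int issued = 0, busy_left = 0, cycle = 0;
    while (issued < n_cmds) {
        int valid = 1;                 /* a command is pending */
        int ready = (busy_left == 0);  /* coprocessor is idle */
        if (fires(valid, ready)) {
            issued++;
            busy_left = busy_cycles;   /* start working on the command */
        } else {
            busy_left--;               /* stall: valid high, ready low, nothing moves */
        }
        cycle++;
    }
    return cycle;
}
```

In this model the count is n + (n − 1) × busy_cycles for n ≥ 1. A timing diagram makes the same arithmetic visible.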
Day 4
Systolic Array Design
A weight-stationary PE cell in Verilog, a 2×2 systolic array, a dataflow comparison table (output-, weight- and input-stationary), the latency formula (3N−1 cycles for an N×N array), a fill/drain waveform, and a testbench.
Systolic Array · PE Cell · Dataflow
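The weight-stationary dataflow can be made concrete with a small cycle-level C model. This is not the course's Verilog. Each PE(r, c) holds W[r][c]. Activations enter skewed from the left and move one PE right per cycle. Partial sums move one PE down per cycle, so the bottom of column c emits (X·W)[v][c].

The function name, the `MAXN` bound and the input-skew scheme are assumptions for illustration. The loop simply runs until this model's last result has drained.

```c
#include <string.h>

#define MAXN 8  /* hypothetical upper bound on array size for this sketch */

/* Cycle-level model of an n x n weight-stationary systolic array.
 *   W: n x n weights (row-major); PE(r,c) keeps W[r][c] for the whole run.
 *   X: m x n input vectors (row-major); array row r sees X[v][r] at cycle v + r.
 *   Y: m x n outputs, Y = X * W, collected at the bottom of each column. */
static void ws_systolic_matmul(int n, int m, const int *W, const int *X, int *Y)
{
    int a_reg[MAXN][MAXN] = {{0}};  /* activation registers, shift right */
    int p_reg[MAXN][MAXN] = {{0}};  /* partial-sum registers, shift down */
    if (n <= 0 || n > MAXN || m <= 0)
        return;

    /* enough cycles for the last vector to reach the last column and drain */
    for (int t = 0; t < m + 2 * n - 2; t++) {
        int a_next[MAXN][MAXN] = {{0}};
        int p_next[MAXN][MAXN] = {{0}};
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                int a_in, p_in;
                if (c == 0) {
                    int v = t - r;  /* skewed feed: row r lags by r cycles */
                    a_in = (v >= 0 && v < m) ? X[v * n + r] : 0;
                } else {
                    a_in = a_reg[r][c - 1];
                }
                p_in = (r == 0) ? 0 : p_reg[r - 1][c];
                a_next[r][c] = a_in;
                p_next[r][c] = p_in + a_in * W[r * n + c];  /* one MAC per PE */
            }
        }
        memcpy(a_reg, a_next, sizeof a_reg);  /* clock edge: all PEs update at once */
        memcpy(p_reg, p_next, sizeof p_reg);
        for (int c = 0; c < n; c++) {
            int v = t - (n - 1) - c;  /* vector whose sum just left column c */
            if (v >= 0 && v < m)
                Y[v * n + c] = p_reg[n - 1][c];
        }
    }
}
```

For the 2×2 case with W = {1, 2, 3, 4} and X = {5, 6, 7, 8}, the model produces Y = {23, 34, 31, 46}. That matches a plain matrix multiply, which is the check a self-checking testbench would perform.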
Day 5
Systolic Array via RoCC
funct7 command decode, the rocc_systolic FSM in Verilog, Chipyard LazyRoCC integration in Scala, and a C driver with .insn r 0x0B (custom-0) macros for load_weight and run_matmul.
RoCC · Chipyard · C Driver
Day 6
Memory Architecture & Cache Coherency
Scratchpad vs cache trade-offs, an AXI4 DMA engine in Verilog, double-buffering in C, cache flush/invalidate with the RISC-V Zicbom extension (cbo.flush / cbo.inval), and roofline model analysis.
Scratchpad · DMA · Zicbom
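Roofline analysis rests on one formula: attainable throughput = min(peak compute, arithmetic intensity × memory bandwidth). Here is a minimal C sketch of it, with hypothetical function names and placeholder numbers rather than measurements of the course's SoC.

```c
/* Attainable performance under the roofline model.
 *   peak_ops:  peak compute throughput (ops/s)
 *   bw_bytes:  sustained memory bandwidth (bytes/s)
 *   intensity: arithmetic intensity of the kernel (ops/byte) */
static double roofline_attainable(double peak_ops, double bw_bytes, double intensity)
{
    double mem_bound = intensity * bw_bytes;
    return mem_bound < peak_ops ? mem_bound : peak_ops;
}

/* Ridge point: the intensity above which the kernel is compute-bound. */
static double roofline_ridge(double peak_ops, double bw_bytes)
{
    return peak_ops / bw_bytes;
}

/* Intensity of an n x n x n matmul that moves each of A, B and C once,
 * assuming all elements are elem_bytes wide:
 * 2*n^3 ops (multiply + add per MAC) over 3*n^2*elem_bytes bytes. */
static double matmul_intensity(double n, double elem_bytes)
{
    return (2.0 * n * n * n) / (3.0 * n * n * elem_bytes);
}
```

Matmul intensity grows as 2n / (3 × elem_bytes). That is why reusing data in larger tiles moves a kernel to the right on the roofline, toward the compute-bound region.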
Day 7
AXI4 Integration & Tiling Strategy
An AXI4-Lite slave register block in Verilog, an 8-register MMIO map, a tiling algorithm for large matrices, an MMIO C driver with cache flush, and interrupt vs polling completion.
AXI4-Lite · MMIO · Tiling
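Tiling can be sketched in plain C. A large M×N×K multiply is split into T×T tiles, and each (i, j, k) tile triple becomes one call to a fixed-size kernel. On the real SoC that call would be a job handed to the accelerator over MMIO.

Here `tile_kernel` is a hypothetical software stand-in for that hardware job, and `TILE = 4` is an arbitrary choice. Edge tiles are clipped, so any matrix size works.

```c
#define TILE 4  /* hypothetical accelerator tile size */

/* Stand-in for one accelerator job: C_tile += A_tile * B_tile, clipped to
 * rows x cols x depth at the edges. Row-major with leading dims lda/ldb/ldc. */
static void tile_kernel(const int *A, int lda, const int *B, int ldb,
                        int *C, int ldc, int rows, int cols, int depth)
{
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++) {
            int acc = C[i * ldc + j];
            for (int k = 0; k < depth; k++)
                acc += A[i * lda + k] * B[k * ldb + j];
            C[i * ldc + j] = acc;
        }
}

static int min_int(int a, int b) { return a < b ? a : b; }

/* C (M x N) = A (M x K) * B (K x N), computed one tile job at a time. */
static void tiled_matmul(int M, int N, int K, const int *A, const int *B, int *C)
{
    for (int i = 0; i < M * N; i++)
        C[i] = 0;
    for (int i0 = 0; i0 < M; i0 += TILE)
        for (int j0 = 0; j0 < N; j0 += TILE)
            for (int k0 = 0; k0 < K; k0 += TILE)  /* accumulate partial products over K */
                tile_kernel(&A[i0 * K + k0], K, &B[k0 * N + j0], N,
                            &C[i0 * N + j0], N,
                            min_int(TILE, M - i0), min_int(TILE, N - j0),
                            min_int(TILE, K - k0));
}
```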
Day 8
Performance Profiling & Speedup Analysis
RISC-V CSR counters (mcycle, minstret, mhpmcounter), a hardware event counter in Verilog, compute/DMA/stall utilisation, Amdahl's Law, and a C benchmark framework.
CSR Counters · Speedup · Roofline
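Amdahl's Law caps the end-to-end gain. If a fraction f of the baseline runtime is offloaded and that part runs s times faster, the overall speedup is 1 / ((1 − f) + f / s).

The C helpers below compute that bound, plus a measured speedup from raw cycle counts, for example mcycle deltas taken around a CPU-only run and an accelerated run. All function names are hypothetical.

```c
/* Amdahl's Law: f = fraction of baseline runtime that is accelerated (0..1),
 * s = speedup of that fraction. Returns the overall speedup. */
static double amdahl_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}

/* Limit as s grows without bound: only the non-accelerated part remains. */
static double amdahl_limit(double f)
{
    return 1.0 / (1.0 - f);
}

/* Measured speedup from cycle counts (e.g. mcycle deltas around each run). */
static double measured_speedup(unsigned long long cpu_cycles,
                               unsigned long long accel_cycles)
{
    return (double)cpu_cycles / (double)accel_cycles;
}
```

For example, offloading 75% of the runtime to an engine that is 3× faster yields only 2× overall. Even an infinitely fast engine tops out at 4× in that case.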
Day 9
Bare-Metal C Software Driver
volatile MMIO macros, cache_flush/inval, polling vs interrupt-driven completion, DMA buffer alignment, a timeout based on the time CSR, and a full inference API.
volatile MMIO · Interrupt · Cache Flush
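The volatile-MMIO and timeout pattern can be sketched as follows. The register offsets, the DONE bit and the `fake_time` clock are hypothetical placeholders. On hardware, the clock function would read the `time` CSR, for example with the `rdtime` pseudo-instruction.

Passing the base address and clock in as parameters lets the same polling logic run on a host against an ordinary array.

```c
#include <stdint.h>

/* Hypothetical register map (byte offsets from the accelerator base). */
#define ACC_CTRL    0x00u
#define ACC_STATUS  0x04u
#define STATUS_DONE (1u << 0)

/* volatile forces a real load/store on every access, so the compiler can
 * neither cache a register value in a loop nor drop "redundant" writes. */
#define REG32(base, off) (*(volatile uint32_t *)((uintptr_t)(base) + (off)))

typedef uint64_t (*clock_fn)(void);

/* Host-side stand-in for the time CSR: each call advances one tick. */
static uint64_t fake_ticks;
static uint64_t fake_time(void) { return fake_ticks++; }

/* Poll STATUS until DONE is set or timeout_ticks elapse.
 * Returns 0 on completion, -1 on timeout. */
static int acc_wait_done(uintptr_t base, clock_fn now, uint64_t timeout_ticks)
{
    uint64_t start = now();
    while ((REG32(base, ACC_STATUS) & STATUS_DONE) == 0) {
        if (now() - start >= timeout_ticks)
            return -1;  /* accelerator hung or was never started */
    }
    return 0;
}
```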
Day 10
Full RISC-V SoC Integration
The AXI4 crossbar address map, SRAM/UART/PLIC integration, a boot ROM at address 0, the boot.S reset sequence in assembly, and mtvec setup and interrupt enable.
AXI4 Crossbar · Boot ROM · PLIC
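An AXI4 crossbar routes each transaction by comparing its address against a table of base/size windows. The C sketch below mirrors that decode. Only the boot ROM at address 0 comes from the course outline. Every other base address and size, and all names, are hypothetical placeholders.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { SLV_BOOTROM, SLV_SRAM, SLV_UART, SLV_PLIC, SLV_ACCEL, SLV_DECERR } slave_t;

typedef struct {
    uint64_t base;
    uint64_t size;   /* window is [base, base + size) */
    slave_t  slave;
} region_t;

/* Hypothetical address map; only "boot ROM at 0" is taken from the outline. */
static const region_t addr_map[] = {
    { 0x00000000u, 0x00001000u, SLV_BOOTROM },
    { 0x0C000000u, 0x04000000u, SLV_PLIC    },
    { 0x10000000u, 0x00001000u, SLV_UART    },
    { 0x60000000u, 0x00001000u, SLV_ACCEL   },
    { 0x80000000u, 0x00040000u, SLV_SRAM    },
};

/* Return the slave that owns addr. Unmapped addresses map to SLV_DECERR,
 * which a crossbar would answer with a DECERR response. */
static slave_t decode_addr(uint64_t addr)
{
    for (size_t i = 0; i < sizeof addr_map / sizeof addr_map[0]; i++)
        /* unsigned subtraction wraps when addr < base, so one compare suffices */
        if (addr - addr_map[i].base < addr_map[i].size)
            return addr_map[i].slave;
    return SLV_DECERR;
}
```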
Day 11
Performance Optimisation — Tiling & Double-Buffering
Data reuse through loop tiling, a double-buffered tiling loop in C, an INT8 vs FP32 comparison table, roofline-guided optimisation, and compute-bound vs memory-bound bottlenecks.
Double-Buffer · INT8 · Roofline
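The control flow of a double-buffered tiling loop is the part worth internalising. While the accelerator computes on one buffer, the DMA fills the other, and the two swap every iteration.

In the sketch below, `dma_start`, `dma_wait` and `compute_tile` are hypothetical stand-ins: a synchronous memcpy and a simple sum. They let the ping-pong structure run on a host. On the SoC, the DMA transfer would overlap with compute.

```c
#include <stdint.h>
#include <string.h>

#define TILE_ELEMS 64  /* hypothetical tile size in elements */

static int8_t bufs[2][TILE_ELEMS];  /* ping-pong scratchpad buffers */

/* Stand-ins: on hardware these would program and wait on the DMA engine. */
static void dma_start(int8_t *dst, const int8_t *src, size_t n) { memcpy(dst, src, n); }
static void dma_wait(void) { /* synchronous stand-in: nothing outstanding */ }

/* Stand-in for the accelerator kernel: sum of the tile. */
static long compute_tile(const int8_t *tile, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += tile[i];
    return s;
}

/* Process n_tiles consecutive tiles of src using two buffers. */
static long process_double_buffered(const int8_t *src, size_t n_tiles)
{
    long total = 0;
    if (n_tiles == 0)
        return 0;
    dma_start(bufs[0], src, TILE_ELEMS);               /* prefetch tile 0 */
    for (size_t t = 0; t < n_tiles; t++) {
        int cur = (int)(t & 1u);
        dma_wait();                                    /* tile t is now in bufs[cur] */
        if (t + 1 < n_tiles)                           /* fetch tile t+1 into the other buffer... */
            dma_start(bufs[cur ^ 1], src + (t + 1) * TILE_ELEMS, TILE_ELEMS);
        total += compute_tile(bufs[cur], TILE_ELEMS);  /* ...while computing on tile t */
    }
    return total;
}
```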
Day 12
Verification — SV Testbench, UVM & SVA
A self-checking SystemVerilog testbench with a reference model, SVA properties for the AXI4 protocol, functional coverage, and formal verification of FSM correctness.
SVA · UVM Scoreboard · Formal
Day 13
FPGA Implementation on Arty A7
Resource budget, an XDC constraint file, a Vivado Tcl build script, block RAM initialisation, ILA debug, timing closure, and a UART boot loader.
Vivado · XDC · ILA Debug
Day 14
Physical Design — Floorplan & Power Domains
SoC floorplan strategy, two power domains (UPF), isolation cells, CTS for the systolic array, the accumulator critical path, IR drop, and OCV derating.
Floorplan · UPF · CTS
Day 15 · CAPSTONE
RISC-V AI SoC — End-to-End Inference
Complete 3-layer INT8 MLP inference on the SoC. Benchmark: 153× speedup over CPU-only execution. Career paths in accelerator design.
INT8 MLP · 153× Speedup · Career Path
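The core of INT8 MLP inference is a quantized linear layer. It multiplies int8 weights by int8 activations and accumulates in int32. It then requantizes back to int8 with a scale, rounding and saturation, with an optional ReLU between layers.

The sketch below assumes symmetric per-tensor quantization and uses a floating-point requantization scale for readability. Hardware engines commonly replace that scale with an integer multiplier and shift. The function name and parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* y[o] = requant(bias[o] + sum_i W[o][i] * x[i]), with optional ReLU.
 * W is out_dim x in_dim (row-major); pass bias = NULL for no bias.
 * scale = (s_x * s_w) / s_y under symmetric per-tensor quantization. */
static void linear_int8(const int8_t *W, const int32_t *bias, const int8_t *x,
                        int8_t *y, int in_dim, int out_dim, float scale, int relu)
{
    for (int o = 0; o < out_dim; o++) {
        int32_t acc = bias ? bias[o] : 0;  /* int32 accumulator avoids overflow */
        for (int i = 0; i < in_dim; i++)
            acc += (int32_t)W[o * in_dim + i] * (int32_t)x[i];
        if (relu && acc < 0)
            acc = 0;
        float r = (float)acc * scale;
        if (r > 127.0f) r = 127.0f;        /* saturate to the int8 range */
        if (r < -128.0f) r = -128.0f;
        y[o] = (int8_t)(r >= 0.0f ? r + 0.5f : r - 0.5f);  /* round half away from zero */
    }
}
```

Chaining three calls, with relu = 1 on the first two, gives the shape of a 3-layer INT8 MLP forward pass.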