What is an AXI4 crossbar in a RISC-V SoC?

An AXI4 crossbar (also called an interconnect or fabric) connects multiple AXI masters (CPU instruction fetch, CPU data, DMA engine) to multiple AXI slaves (SRAM, accelerator MMIO, UART, PLIC). The crossbar decodes the address of each transaction to route it to the correct slave, and arbitrates when multiple masters want to access the same slave simultaneously. A simple round-robin or fixed-priority arbiter is common. The crossbar is the central communication backbone of the SoC: every transaction between a master and a slave passes through it.

RISC-V Accelerator Day 10 — Full SoC Integration: AXI4 Crossbar, Address Map & Boot ROM

SoC Address Map

Every SoC component lives at a unique, non-overlapping address range. The AXI4 crossbar decodes the upper bits of each transaction address to route it to the correct slave.

Base Address	Size	Peripheral	Masters
0x0000_0000	64 KB	Boot ROM (read-only)	CPU Instr
0x2000_0000	256 KB	SRAM (data + stack)	CPU Data, DMA
0x4000_0000	64 KB	CLINT (timer + IPI)	CPU Data
0x4C00_0000	64 MB	PLIC (interrupt controller)	CPU Data
0x5000_0000	4 KB	UART 16550	CPU Data
0x6000_0000	4 KB	Accelerator MMIO (AXI4-Lite slave)	CPU Data
0x8000_0000	512 MB	DDR DRAM (external)	CPU, DMA
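
The two properties the table relies on (no overlapping ranges, and every window decodable with a single base/mask pair) can be checked mechanically. The following is a minimal host-side sketch in C, not part of the SoC itself; the type `region_t` and the function names are hypothetical, and the rows are copied from the table above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the address map. Sizes are in bytes. */
typedef struct {
    const char *name;
    uint32_t    base;
    uint32_t    size;
} region_t;

/* Rows copied from the address map table. */
static const region_t soc_map[] = {
    { "Boot ROM",    0x00000000u,  64u * 1024u },
    { "SRAM",        0x20000000u, 256u * 1024u },
    { "CLINT",       0x40000000u,  64u * 1024u },
    { "PLIC",        0x4C000000u,  64u * 1024u * 1024u },
    { "UART",        0x50000000u,   4u * 1024u },
    { "Accelerator", 0x60000000u,   4u * 1024u },
    { "DDR",         0x80000000u, 512u * 1024u * 1024u },
};
#define SOC_MAP_LEN (sizeof soc_map / sizeof soc_map[0])

/* A window can be decoded with one base/mask pair only if its size is a
 * power of two and its base is aligned to that size. */
static bool region_is_maskable(const region_t *r)
{
    bool pow2 = r->size != 0 && (r->size & (r->size - 1u)) == 0;
    return pow2 && (r->base & (r->size - 1u)) == 0;
}

/* True if two regions share at least one byte. 64-bit arithmetic avoids
 * wrap-around at the top of the 32-bit address space. */
static bool regions_overlap(const region_t *a, const region_t *b)
{
    uint64_t a_end = (uint64_t)a->base + a->size;
    uint64_t b_end = (uint64_t)b->base + b->size;
    return a->base < b_end && b->base < a_end;
}

/* True when every region is maskable and no two regions overlap. */
static bool map_is_valid(const region_t *map, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!region_is_maskable(&map[i]))
            return false;
        for (size_t j = i + 1; j < n; j++)
            if (regions_overlap(&map[i], &map[j]))
                return false;
    }
    return true;
}
```

Calling `map_is_valid(soc_map, SOC_MAP_LEN)` from a host-side unit test is one way to catch a bad edit to the map before it reaches the crossbar parameters.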

Verilog — SoC top-level (simplified)

module riscv_accel_soc (
  input  clk, rst_n,
  output uart_tx, input uart_rx
);
  // AXI4 bus wires — master ports
  wire [31:0] cpu_imem_addr, cpu_dmem_addr, dma_addr;
  // ... (AWVALID/AWREADY/WVALID/WREADY/BVALID/ARVALID/RVALID etc.)

  // ── RISC-V Core ──────────────────────────────────────────
  picorv32_axi cpu (
    .clk(clk), .resetn(rst_n),
    .mem_axi_awvalid(cpu_aw_v), .mem_axi_awready(cpu_aw_r),
    .mem_axi_awaddr(cpu_dmem_addr),
    // ... connect all AXI ports
  );

  // ── AXI4 Crossbar (2 masters × 5 slaves) ─────────────────
  axi4_crossbar #(.NM(2), .NS(5),
    .SLAVE_ADDR({ 32'h60000000, 32'h50000000, 32'h4C000000, 32'h20000000, 32'h00000000 }),
    .SLAVE_MASK({ 32'hFFFF_F000, 32'hFFFF_F000, 32'hFC00_0000, 32'hFFFC_0000, 32'hFFFF_0000 })
  ) xbar (
    .clk(clk), .rst(~rst_n),
    // Masters: CPU data + DMA
    .s_axi_awaddr ({dma_addr, cpu_dmem_addr}),
    // Slaves: Boot ROM, SRAM, PLIC, UART, Accelerator
    // ... connect slave ports
  );

  // ── SRAM ─────────────────────────────────────────────────
  axi4_sram #(.DEPTH(65536)) sram (.clk(clk) /* , ... AXI slave ports */);  // 65536 words = 256 KB if words are 32 bits

  // ── UART ─────────────────────────────────────────────────
  uart_16550 uart (.clk(clk), .tx(uart_tx), .rx(uart_rx) /* , ... AXI ports */);

  // ── PLIC ─────────────────────────────────────────────────
  plic #(.N_SRC(4)) plic_inst (.clk(clk), .irq_src({accel_irq, uart_irq, 2'b0}) /* , ... AXI ports */);

  // ── Accelerator ──────────────────────────────────────────
  systolic_soc_top accel (.clk(clk), .rst(~rst_n),
    /* ... AXI4-Lite slave ports for control */
    /* ... AXI4 master ports for DMA data */
    .irq(accel_irq)
  );
endmodule

Boot ROM & Reset Sequence

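On reset, the CPU fetches from address 0x0000_0000 (Boot ROM). The boot ROM code: (1) initialises the stack pointer, (2) zeroes the BSS segment, (3) copies initialised data from ROM to SRAM, (4) jumps to main(). With no OS, the startup code must do all of this itself, in assembly, before any C code runs. The minimal listing below covers the stack pointer, BSS zeroing, and trap/interrupt setup; it omits the .data copy in step (3).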

Assembly — Minimal boot.S for bare-metal RISC-V SoC

.section .boot, "ax"
.global _start
_start:
  # Set stack pointer to top of SRAM
  li   sp, 0x20040000

  # Zero BSS
  la   a0, __bss_start
  la   a1, __bss_end
zero_bss:
  bgeu a0, a1, done_bss   # unsigned compare: a0 and a1 are addresses
  sw   zero, 0(a0)
  addi a0, a0, 4
  j    zero_bss
done_bss:

  # Set mtvec to trap handler
  la   t0, trap_vector
  csrw mtvec, t0

  # Enable external interrupts (mie.MEIE)
  li   t0, 0x800
  csrw mie, t0
  csrsi mstatus, 8    # mstatus.MIE = 1

  call main
hang: j hang

Day 10 — Interview Questions

Q1. What is the role of the AXI4 crossbar in a SoC?

The AXI4 crossbar (interconnect fabric) is the routing backbone that connects all AXI masters to all AXI slaves. It performs two key functions: (1) Address decoding: for each transaction, it compares AWADDR/ARADDR against a table of slave base addresses and masks and routes the transaction to the matching slave. (2) Arbitration: when multiple masters target the same slave simultaneously, the crossbar picks one (round-robin, fixed priority, or another fair scheme) and serialises the transactions. A full crossbar lets M masters and S slaves communicate in parallel as long as the masters target different slaves; a shared bus such as AHB carries only one transaction at a time. For a 2-master (CPU + DMA), 5-slave SoC, the crossbar allows the CPU to access the UART while the DMA simultaneously transfers data to SRAM.
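
The arbitration step can be illustrated with a small software model. This is a sketch in C of a round-robin policy for one slave port, not the crossbar's actual RTL; the function name `rr_arbitrate` and the convention that master 0 is the CPU and master 1 is the DMA are assumptions made for the example.

```c
#include <stdint.h>

/* Round-robin arbiter model for up to 32 requesters on one slave port.
 * req:  bit i set means master i is requesting this slave.
 * last: index of the master granted most recently (0..n-1).
 * n:    number of masters (1..32).
 * Returns the granted master index, or -1 if nobody is requesting. */
static int rr_arbitrate(uint32_t req, int last, int n)
{
    /* Start the search one past the previous winner so that a master
     * that was just served has the lowest priority this round. */
    for (int step = 1; step <= n; step++) {
        int candidate = (last + step) % n;
        if (req & (UINT32_C(1) << candidate))
            return candidate;
    }
    return -1;
}
```

With both masters requesting (`req = 0x3`), the grant alternates between 0 and 1 on successive calls, which is the fairness property that a fixed-priority arbiter lacks.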

Q2. Why is the boot ROM mapped at address 0 in most RISC-V SoCs?

RISC-V leaves the reset vector implementation-defined; many implementations set it to 0x00000000 or 0x10000 by convention. Placing the boot ROM at the reset vector (address 0 in this SoC) ensures the CPU starts executing known-good code immediately after reset, before any SRAM or DRAM is initialised. ROM is read-only and retains its contents through power cycles, unlike SRAM, whose contents are undefined at power-on. The boot code initialises memory (zeroes BSS, copies .data from ROM to SRAM), sets up the interrupt vector, and transfers control to the main application. Some SoCs place the boot ROM at 0x1000 and use a tiny jump instruction at 0x0000 to redirect there.

Q3. What is the PLIC and how does it handle multiple interrupt sources?

The PLIC (Platform-Level Interrupt Controller) is a RISC-V standard peripheral that aggregates multiple external interrupt sources (UART, accelerator, GPIO, etc.) and delivers them to one or more CPU harts. Each source has a configurable priority; the number of levels is implementation-defined (1 to 7 is a common range, higher = more urgent), and priority 0 means the source never interrupts. Each hart context has an enable register (one bit per source) and a threshold that masks every source whose priority is at or below it. When sources are pending, the PLIC signals an external interrupt to the hart on behalf of the highest-priority enabled source above the threshold. The hart reads the PLIC claim register to get the source ID, services it, then writes the same ID to the complete register to acknowledge. Between claim and complete the PLIC does not forward that source again, while other sources remain recorded in the pending bits, so no source is lost; a hart may also claim a second source before completing the first, which permits nested handling. In our SoC, the accelerator and UART are sources 1 and 2 connected to the single hart.
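
The selection rule described above can be written down as a small software model. This is a sketch in C for illustration only, not a driver for real PLIC registers; the struct `plic_model_t` and the function names are hypothetical, and the model assumes three source IDs (0 reserved, 1 = accelerator, 2 = UART) and a single hart context.

```c
#include <stdbool.h>
#include <stdint.h>

#define PLIC_NUM_SOURCES 3   /* IDs 0..2; ID 0 is reserved ("no interrupt") */

typedef struct {
    uint32_t priority[PLIC_NUM_SOURCES]; /* per-source priority, 0 = never interrupt */
    uint32_t pending;                    /* bit i set: source i is pending */
    uint32_t enable;                     /* bit i set: source i enabled for this hart */
    uint32_t threshold;                  /* hart context priority threshold */
} plic_model_t;

/* Highest-priority pending+enabled source with priority strictly greater
 * than min_prio. Ties go to the lowest ID. Returns 0 if none qualifies. */
static uint32_t plic_best(const plic_model_t *p, uint32_t min_prio)
{
    uint32_t best_id = 0, best_prio = min_prio;
    for (uint32_t id = 1; id < PLIC_NUM_SOURCES; id++) {
        uint32_t bit = UINT32_C(1) << id;
        if ((p->pending & p->enable & bit) && p->priority[id] > best_prio) {
            best_prio = p->priority[id];
            best_id = id;
        }
    }
    return best_id;
}

/* External-interrupt line to the hart: the threshold applies here. */
static bool plic_eip(const plic_model_t *p)
{
    return plic_best(p, p->threshold) != 0;
}

/* Claim: returns the best pending source (not gated by the threshold)
 * and clears its pending bit. Returns 0 if nothing is pending. */
static uint32_t plic_claim(plic_model_t *p)
{
    uint32_t id = plic_best(p, 0);
    if (id != 0)
        p->pending &= ~(UINT32_C(1) << id);
    return id;
}
```

In this model, if both sources are pending with the UART at priority 5 and the accelerator at priority 3, the first claim returns 2 and the second returns 1. Raising the threshold to 3 would stop the accelerator from asserting the interrupt line, although a claim would still return it.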

Q4. Explain the role of BSS zeroing in the boot sequence.

BSS (Block Started by Symbol) is the segment of a C program that holds global and static variables with no explicit initialiser. The C standard guarantees that such variables (objects with static storage duration) are zero at program start. In a hosted environment (Linux), the OS provides zeroed BSS before main is called. In bare-metal, no OS exists, and SRAM contents are undefined after power-on. The boot code must explicitly zero the BSS region (from __bss_start to __bss_end, as defined by the linker script) before calling main(). Skipping this causes subtle bugs where global variables that should be 0 contain arbitrary values. These bugs are often hard to reproduce because the same memory may happen to read as zero after one power cycle and not the next. The boot code also copies the .data section from ROM to SRAM for initialised globals.
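
The two memory-initialisation steps are simple word loops. The following is a minimal sketch in C that mirrors the `zero_bss` loop from boot.S and adds the .data copy; the function names `zero_words` and `copy_words` are hypothetical, and the sketch assumes both regions are 4-byte aligned and a multiple of 4 bytes long.

```c
#include <stdint.h>

/* Zero every 32-bit word in [start, end). */
static void zero_words(uint32_t *start, const uint32_t *end)
{
    while (start < end)
        *start++ = 0;
}

/* Copy 32-bit words from a load address (ROM) into [dst, dst_end) in RAM. */
static void copy_words(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end)
        *dst++ = *src++;
}
```

On the target, these would be called with linker-script symbols, for example `zero_words(&__bss_start, &__bss_end)` after declaring `extern uint32_t __bss_start, __bss_end;`. The .data copy needs three more symbols (load address in ROM, start and end in SRAM) whose names depend on the linker script and are not given in this document.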

Q5. What is address decoding and how does the crossbar implement it?

Address decoding maps a transaction's target address to a specific slave by comparing it against each slave's base address and mask. The mask defines which bits are significant: if SLAVE_MASK = 0xFFFF_F000, only bits [31:12] are compared, meaning the slave occupies a 4 KB window. For each transaction, the crossbar computes (ADDR & MASK) for each slave, checks whether it equals (SLAVE_BASE & MASK), and routes to the matching slave. If no slave matches, the crossbar returns a DECERR response. Masks should normally be chosen so that slave windows do not overlap. In hardware, the decoder is often priority-encoded, so the first matching slave wins; an intentional overlap then resolves to the higher-priority slave, which can be useful for debug address aliases.
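
The base/mask comparison can be modelled in a few lines. This is a software sketch in C of the decode rule, not the crossbar RTL; the base and mask values are taken from the `SLAVE_ADDR` and `SLAVE_MASK` parameters in the top-level listing, with index 0 assumed to be the rightmost element of each concatenation (Boot ROM). The function name `decode_slave` and the use of -1 for DECERR are choices made for the example.

```c
#include <stdint.h>

#define NUM_SLAVES 5
#define DECERR     (-1)

/* Index 0 = Boot ROM, 1 = SRAM, 2 = PLIC, 3 = UART, 4 = Accelerator. */
static const uint32_t slave_base[NUM_SLAVES] = {
    0x00000000u, 0x20000000u, 0x4C000000u, 0x50000000u, 0x60000000u
};
static const uint32_t slave_mask[NUM_SLAVES] = {
    0xFFFF0000u, 0xFFFC0000u, 0xFC000000u, 0xFFFFF000u, 0xFFFFF000u
};

/* Returns the index of the first slave whose window contains addr,
 * or DECERR if no window matches. Scanning from index 0 gives
 * first-match-wins behaviour if windows ever overlap. */
static int decode_slave(uint32_t addr)
{
    for (int i = 0; i < NUM_SLAVES; i++)
        if ((addr & slave_mask[i]) == (slave_base[i] & slave_mask[i]))
            return i;
    return DECERR;
}
```

For example, 0x2003_FFFC decodes to the SRAM (index 1), while 0x2004_0000, one word past the 256 KB window, matches nothing and yields DECERR. The CLINT and DDR ranges from the address map are not in the five-slave table of the simplified listing, so this model reports DECERR for them as well.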

Q6. What happens if mtvec is not initialised before an interrupt occurs?

mtvec holds the base address of the trap handler. Its reset value is implementation-defined and is often 0x00000000. If it is left uninitialised at that value and an interrupt fires, the CPU jumps to address 0x00000000. If that address holds the first instruction of the boot ROM, the SoC will appear to reset unexpectedly. If it holds uninitialised memory or garbage instructions, the CPU may behave unpredictably, corrupt registers, or enter an endless loop of nested faults. In a production bare-metal design, always initialise mtvec early in the boot sequence, before enabling any interrupt sources in mie or mstatus. Point it to a trap handler that at minimum reads mcause to identify the cause and, in debug builds, halts gracefully (infinite loop + LED blink).
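
Identifying the cause in a minimal trap handler comes down to splitting mcause into its interrupt flag and its cause code. The following is a sketch in C of that decoding for RV32, written as pure functions so it can be exercised on a host; the enum `trap_kind_t` and the function names are hypothetical, and reading the CSR itself (for example with inline assembly) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCAUSE_INTERRUPT_BIT UINT32_C(0x80000000)  /* RV32: bit 31 */
#define IRQ_M_SOFTWARE       3u   /* machine software interrupt */
#define IRQ_M_TIMER          7u   /* machine timer interrupt */
#define IRQ_M_EXTERNAL       11u  /* machine external interrupt (same bit index as mie.MEIE) */

typedef enum { TRAP_EXTERNAL_IRQ, TRAP_OTHER_IRQ, TRAP_EXCEPTION } trap_kind_t;

/* Top bit set means the trap was an interrupt rather than an exception. */
static bool mcause_is_interrupt(uint32_t mcause)
{
    return (mcause & MCAUSE_INTERRUPT_BIT) != 0;
}

/* Cause code with the interrupt flag stripped off. */
static uint32_t mcause_code(uint32_t mcause)
{
    return mcause & ~MCAUSE_INTERRUPT_BIT;
}

/* Coarse classification a minimal handler could branch on. */
static trap_kind_t classify_trap(uint32_t mcause)
{
    if (!mcause_is_interrupt(mcause))
        return TRAP_EXCEPTION;
    return mcause_code(mcause) == IRQ_M_EXTERNAL ? TRAP_EXTERNAL_IRQ
                                                 : TRAP_OTHER_IRQ;
}
```

A handler built on this would route `TRAP_EXTERNAL_IRQ` to the PLIC claim/complete sequence and treat `TRAP_EXCEPTION` as fatal in a debug build.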
