DAY 25 · PHASE 4 — ADVANCED & REAL HARDWARE

Caches & the Road to Linux-Capable

Q: How does a direct-mapped cache work?

A direct-mapped cache maps each memory address to exactly one cache line. The address is split into three fields: tag (the high bits, which identify the memory block), index (which selects the cache line), and offset (the byte position within the line). On a read, the cache looks up the line selected by the index; if that line's valid bit is set and its stored tag matches the address tag, the access is a hit. Otherwise it is a miss and the line is fetched from main memory.

Q: What does a RISC-V CPU need to boot Linux?

Linux on RISC-V requires: S-mode (supervisor privilege level) in addition to M-mode, a page-based virtual memory system (Sv32 or Sv39 MMU), OpenSBI firmware to bootstrap the system, a device tree describing the hardware, and a timer interrupt source (CLINT). This is substantially more complex than our 25-day RV32I core.

Q: What are good Linux-capable open RISC-V cores?

CVA6 (formerly Ariane), which originated at ETH Zurich, is a 64-bit RV64GC core with an MMU that runs Linux. VexRiscv, written in SpinalHDL, is a configurable 32-bit core with an optional MMU and Linux support. Rocket Chip from UC Berkeley is the original research generator for Linux-capable RISC-V cores. All three are open source on GitHub.

By EcrioniX · Updated 2026-06-11

You have built a complete RV32I CPU, pipelined it, added peripherals, and run it on an FPGA. For the final lesson we look at the next major performance step, caches, and map out what would be needed to run Linux on your core. That gap is what separates a microcontroller-class design from a full application processor.

Why Caches?

Modern DRAM has 50–100 ns access latency. At 500 MHz (2 ns per cycle) that is 25–50 cycles lost on every access that has to go all the way to DRAM. A small, fast SRAM cache (a typical L1 is 32 KB with a 1–4 cycle access time) sits between the CPU and DRAM and exploits temporal locality (recently used data is likely needed again) and spatial locality (nearby data is likely needed soon).
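The payoff can be estimated with the standard average memory access time (AMAT) relation. The 1-cycle hit time and 50-cycle miss penalty reuse figures quoted in this lesson; the 5% miss rate is an assumed, illustrative value rather than a measurement of any real program.

```latex
\text{AMAT} = t_{\text{hit}} + r_{\text{miss}} \cdot t_{\text{penalty}}
            = 1 + 0.05 \times 50
            = 3.5 \ \text{cycles}
```

Under those assumptions the average access costs 3.5 cycles instead of 50, which is why even a small cache changes performance so much.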

Cache Address Decomposition

For a 16-line, 4-byte-per-line direct-mapped cache on a 32-bit address:

32-bit address:  [ TAG (26 bits) | INDEX (4 bits) | OFFSET (2 bits) ]
                  bits 31..6       bits 5..2         bits 1..0

INDEX  selects which of the 16 cache lines to check
OFFSET selects which byte within the 4-byte cache line
TAG    stored alongside the line — must match for a hit
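The field boundaries above can be checked in simulation with a few lines of Verilog. This is a minimal sketch: the module name `addr_split_demo` and the three example addresses are illustrative choices, not part of the course sources.

```verilog
// addr_split_demo.v: simulation-only sketch (hypothetical module name)
`timescale 1ns/1ps
module addr_split_demo;
    localparam OFF_BITS = 2;                         // 4-byte line
    localparam IDX_BITS = 4;                         // 16 lines
    localparam TAG_BITS = 32 - IDX_BITS - OFF_BITS;  // 26

    reg  [31:0] addr;
    wire [OFF_BITS-1:0] offset = addr[OFF_BITS-1:0];                  // bits 1..0
    wire [IDX_BITS-1:0] index  = addr[OFF_BITS+IDX_BITS-1:OFF_BITS];  // bits 5..2
    wire [TAG_BITS-1:0] tag    = addr[31:OFF_BITS+IDX_BITS];          // bits 31..6

    // Print the three fields for the current address
    task show;
        begin
            #1; // let the continuous assignments settle
            $display("addr=%h tag=%0d index=%0d offset=%0d",
                     addr, tag, index, offset);
        end
    endtask

    initial begin
        addr = 32'h0000_0010; show;  // a word in line 4
        addr = 32'h0000_0110; show;  // same line, different tag
        addr = 32'h0000_0013; show;  // last byte of the first word
        $finish;
    end
endmodule
```

0x010 and 0x110 both decode to index 4 but carry tags 0 and 4, so in a direct-mapped cache they compete for the same line. 0x013 shares the tag and index of 0x010 and differs only in its offset (3).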

Cache Hit and Miss

Hit: valid bit is set AND stored tag == address tag → return cached data in 1 cycle.
Miss: valid bit clear OR tag mismatch → stall, fetch the line from DRAM, fill the cache line, return data (a penalty of tens of cycles).
Write-through: every store writes both the cache and DRAM. Simple, but every store consumes memory bandwidth.
Write-back: stores go only to the cache and set a dirty bit. DRAM is updated only when a dirty line is evicted. More complex, but it generates far less memory traffic when the same lines are written repeatedly.
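The write-back bookkeeping can be sketched as follows. This is a simplified illustration under stated assumptions, not the course's data cache: the module and port names are hypothetical, lines are one word wide, only full-word stores are modelled, and the next memory level is assumed to supply `fill_data` and accept the evicted word in the same cycle.

```verilog
// wb_cache_sketch.v: illustrative write-back bookkeeping (hypothetical names)
`timescale 1ns/1ps
module wb_cache_sketch #(
    parameter LINES = 16               // must be a power of 2
)(
    input         clk, rst,
    input         req,                 // 1 = access this cycle
    input         we,                  // 1 = full-word store, 0 = load
    input  [31:0] addr,
    input  [31:0] wdata,
    input  [31:0] fill_data,           // word at addr from the next level (assumed same-cycle)
    output        hit,
    output        evict_dirty,         // 1 = victim line must be written back first
    output [31:0] evict_addr,
    output [31:0] evict_data
);
    localparam IDX_BITS = $clog2(LINES);
    localparam TAG_BITS = 32 - IDX_BITS - 2;

    reg [TAG_BITS-1:0] tag_array  [0:LINES-1];
    reg [31:0]         data_array [0:LINES-1];
    reg                valid      [0:LINES-1];
    reg                dirty      [0:LINES-1];

    wire [IDX_BITS-1:0] index = addr[IDX_BITS+1:2];
    wire [TAG_BITS-1:0] tag   = addr[31:IDX_BITS+2];

    assign hit         = valid[index] && (tag_array[index] == tag);
    // A miss on a valid, dirty line means the old contents exist only in the cache
    assign evict_dirty = req && !hit && valid[index] && dirty[index];
    assign evict_addr  = {tag_array[index], index, 2'b00};
    assign evict_data  = data_array[index];

    integer i;
    always @(posedge clk) begin
        if (rst) begin
            for (i = 0; i < LINES; i = i+1) begin
                valid[i] <= 1'b0;
                dirty[i] <= 1'b0;
            end
        end else if (req) begin
            if (!hit) begin                    // allocate the line on any miss
                tag_array[index] <= tag;
                valid[index]     <= 1'b1;
            end
            if (we) begin                      // store: update cache only, mark dirty
                data_array[index] <= wdata;
                dirty[index]      <= 1'b1;
            end else if (!hit) begin           // load miss: fill a clean line
                data_array[index] <= fill_data;
                dirty[index]      <= 1'b0;
            end
        end
    end
endmodule

module tb_wb_cache_sketch;
    reg clk = 0, rst = 1, req = 0, we = 0;
    reg  [31:0] addr = 0, wdata = 0;
    wire        hit, evict_dirty;
    wire [31:0] evict_addr, evict_data;
    always #5 clk = ~clk;

    wb_cache_sketch dut(.clk(clk), .rst(rst), .req(req), .we(we), .addr(addr),
        .wdata(wdata), .fill_data(32'h0), .hit(hit), .evict_dirty(evict_dirty),
        .evict_addr(evict_addr), .evict_data(evict_data));

    initial begin
        @(negedge clk); @(negedge clk); rst = 0;
        // Store to 0x010: cold miss, nothing to evict, line becomes dirty
        addr = 32'h010; wdata = 32'hAAAA_0001; req = 1; we = 1;
        #1 $display("store 0x010: hit=%b evict_dirty=%b", hit, evict_dirty);
        @(negedge clk);
        // Store to 0x110: same index, different tag, so the dirty victim is evicted
        addr = 32'h110; wdata = 32'hBBBB_0002;
        #1 $display("store 0x110: hit=%b evict_dirty=%b evict_addr=%h evict_data=%h",
                    hit, evict_dirty, evict_addr, evict_data);
        @(negedge clk); req = 0;
        $finish;
    end
endmodule
```

The first store touches only the cache. The second store conflicts with it, so `evict_dirty` goes high and `evict_addr`/`evict_data` present the old word (address 0x010) for write-back. A write-through cache would instead have sent both stores to memory immediately.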

icache.v

// icache.v — 16-line direct-mapped instruction cache
// Line size: 1 word (4 bytes). Tag+index+offset from 32-bit address.
// On a miss, the line is filled from the backing imem (next-level store)
// in the same cycle, so mem_rdata must be a combinational read of mem_addr.
module icache #(
    parameter LINES = 16 // must be power of 2
)(
    input         clk, rst,
    input  [31:0] addr,          // fetch address from PC
    input         req,           // 1 = fetch requested
    output reg [31:0] rdata,     // instruction data
    output reg    hit,           // 1 = cache hit this cycle
    // Backing store (e.g. an imem with combinational read)
    output [31:0] mem_addr,
    input  [31:0] mem_rdata
);
    localparam IDX_BITS = $clog2(LINES); // = 4 for LINES = 16; tracks the parameter
    localparam OFF_BITS = 2;     // log2(4 bytes) = 2
    localparam TAG_BITS = 32 - IDX_BITS - OFF_BITS; // = 26

    reg [TAG_BITS-1:0] tag_array  [0:LINES-1];
    reg [31:0]         data_array [0:LINES-1];
    reg                valid      [0:LINES-1];

    wire [OFF_BITS-1:0] offset = addr[OFF_BITS-1:0];
    wire [IDX_BITS-1:0] index  = addr[OFF_BITS+IDX_BITS-1:OFF_BITS];
    wire [TAG_BITS-1:0] tag    = addr[31:OFF_BITS+IDX_BITS];

    // Connect miss path to backing store
    assign mem_addr = addr;

    integer i;
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            for (i = 0; i < LINES; i = i+1)
                valid[i] <= 1'b0;
            hit <= 0;
        end else if (req) begin
            if (valid[index] && (tag_array[index] == tag)) begin
                // Cache hit
                rdata <= data_array[index];
                hit   <= 1'b1;
            end else begin
                // Cache miss: fill from backing store
                data_array[index] <= mem_rdata;
                tag_array[index]  <= tag;
                valid[index]      <= 1'b1;
                rdata <= mem_rdata;
                hit   <= 1'b0; // miss this cycle; hit next cycle
            end
        end
    end
endmodule

Testbench — Hit and Miss Verification

tb_icache.v

// tb_icache.v — Verify cache hit after first miss
`timescale 1ns/1ps
module tb_icache;
    reg clk=0, rst=1;
    always #5 clk=~clk;

    reg  [31:0] addr;
    reg         req;
    wire [31:0] rdata;
    wire        hit;
    wire [31:0] mem_addr;
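    // Stub memory: returns 0xDEAD_BEEF for any address.
    // A constant-driven wire is used because always @(*) with a constant
    // right-hand side has an empty sensitivity list and would never run.
    wire [31:0] mem_rdata = 32'hDEAD_BEEF;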

    icache dut(.clk(clk),.rst(rst),.addr(addr),.req(req),
               .rdata(rdata),.hit(hit),
               .mem_addr(mem_addr),.mem_rdata(mem_rdata));

    initial begin
        $dumpfile("tb_icache.vcd"); $dumpvars(0,tb_icache);
        req=0; addr=0;
        // Drive inputs on falling edges so the DUT samples stable values on
        // the following rising edge (avoids a testbench/DUT race).
        @(negedge clk); @(negedge clk); rst=0;

        // First access: expect MISS
        addr=32'h0000_0010; req=1;
        @(negedge clk); req=0; // request was sampled at the rising edge in between
        if(!hit) $display("PASS: first access = miss (cold)");
        else     $display("FAIL: expected miss on cold cache");

        // Second access same addr: expect HIT
        req=1;
        @(negedge clk); req=0;
        if(hit && rdata===32'hDEAD_BEEF) $display("PASS: second access = hit, data=DEADBEEF");
        else $display("FAIL: expected hit, hit=%b data=%h",hit,rdata);

        // Different address (same index, different tag): expect MISS (conflict)
        addr=32'h0000_0110; req=1; // tag 4 instead of tag 0, index 4 in both cases
        @(negedge clk); req=0;
        if(!hit) $display("PASS: conflict miss on different tag");
        else     $display("FAIL: expected conflict miss");

        $finish;
    end
endmodule

What a Linux-Capable RISC-V Core Needs

Feature	Why Linux needs it	Complexity
S-mode privilege	Linux kernel runs in S-mode; user apps run in U-mode	Medium: adds supervisor CSRs and privilege switching
Sv32 MMU	Virtual memory, required for process isolation and per-process address spaces	High: page-table walker, TLB, PTEs
CLINT timer	Core-Local Interruptor: provides mtime/mtimecmp for timer interrupts	Low: simple MMIO counter and comparator
PLIC	Platform-Level Interrupt Controller: routes external interrupts to harts	Medium: priority encoder plus claim/complete protocol
OpenSBI	Open Source RISC-V Supervisor Binary Interface: M-mode firmware that Linux calls via ECALL	Software only: open-source firmware
Device tree	Describes hardware to Linux (memory ranges, UART, PLIC addresses)	Low: a .dts text file
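To show why the timer row is rated "Low", here is a minimal sketch of the mtime/mtimecmp mechanism. The module name, the two-word write interface, and the reset value of `mtimecmp` are assumptions made for illustration; a real CLINT also has a memory-mapped register layout and software-interrupt registers that are omitted here.

```verilog
// clint_timer_sketch.v: illustrative mtime/mtimecmp timer (hypothetical names)
`timescale 1ns/1ps
module clint_timer_sketch (
    input             clk, rst,
    input             we,        // 1 = write one 32-bit half of mtimecmp
    input             sel_hi,    // 0 = low word, 1 = high word
    input      [31:0] wdata,
    output reg [63:0] mtime,
    output reg [63:0] mtimecmp,
    output            timer_irq  // feeds the machine timer interrupt pending bit
);
    // The privileged spec makes the timer interrupt pending whenever
    // mtime >= mtimecmp, compared as unsigned integers.
    assign timer_irq = (mtime >= mtimecmp);

    always @(posedge clk) begin
        if (rst) begin
            mtime    <= 64'd0;
            mtimecmp <= 64'hFFFF_FFFF_FFFF_FFFF; // assumed reset value: no interrupt yet
        end else begin
            mtime <= mtime + 64'd1;              // assumed to tick once per clk
            if (we) begin
                if (sel_hi) mtimecmp[63:32] <= wdata;
                else        mtimecmp[31:0]  <= wdata;
            end
        end
    end
endmodule

module tb_clint_timer_sketch;
    reg clk = 0, rst = 1, we = 0, sel_hi = 0;
    reg  [31:0] wdata = 0;
    wire [63:0] mtime, mtimecmp;
    wire        timer_irq;
    always #5 clk = ~clk;

    clint_timer_sketch dut(.clk(clk), .rst(rst), .we(we), .sel_hi(sel_hi),
        .wdata(wdata), .mtime(mtime), .mtimecmp(mtimecmp), .timer_irq(timer_irq));

    initial begin
        @(negedge clk); @(negedge clk); rst = 0;
        // Program mtimecmp = 20: low word first, then clear the high word,
        // so the comparison never sees a spuriously small value.
        we = 1; sel_hi = 0; wdata = 32'd20; @(negedge clk);
        sel_hi = 1; wdata = 32'd0;          @(negedge clk);
        we = 0;
        wait (timer_irq);
        $display("timer_irq asserted at mtime=%0d (mtimecmp=%0d)", mtime, mtimecmp);
        $finish;
    end
endmodule
```

The interrupt stays pending until software writes a later deadline into `mtimecmp`. On a Linux system the firmware normally does this in response to the kernel's SBI timer call.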

Open-Source Linux-Capable RISC-V Cores

CVA6 (Ariane): originated at ETH Zurich, RV64GC, 6-stage in-order, MMU, runs Linux. github.com/openhwgroup/cva6
VexRiscv: written in SpinalHDL, configurable 32-bit core, MMU plugin available, compact enough for small FPGAs. github.com/SpinalHDL/VexRiscv
Rocket Chip: UC Berkeley, Chisel, RV64GC, complete SoC generator, reference Linux platform. github.com/chipsalliance/rocket-chip

Course Complete

You have gone from a register file in Day 9 to a pipelined, hazard-handled, FPGA-proven RV32I CPU in 25 days. You understand every wire in the datapath, every pipeline register, every hazard, and the path forward to a production-grade Linux core. That knowledge is the foundation for anything in VLSI, FPGA design, or computer architecture.

What next? Explore VLSI Design, FPGA from Scratch, or study CVA6/VexRiscv source code to see how the concepts from this course scale to a full application processor.

Day 25 Takeaways

A direct-mapped cache splits addresses into tag, index, offset — valid + tag match = hit; otherwise miss.
Write-back caches use a dirty bit to defer DRAM writes until eviction, which cuts memory traffic when the same lines are written repeatedly.
Linux needs S-mode, Sv32 MMU, CLINT, PLIC, OpenSBI firmware — these are non-trivial additions to our RV32I core.
Open-source Linux-capable cores exist: CVA6, VexRiscv, Rocket Chip — study them to see the full picture.
This course is complete — you have built a real CPU from scratch.

FAQ

How does a direct-mapped cache work?

The address is split into tag, index, and offset. The index selects a cache line. If the valid bit is set and the stored tag matches the address tag, it is a hit. Otherwise it is a miss and the line is fetched from main memory.

What does a RISC-V CPU need to boot Linux?

S-mode privilege, Sv32 virtual memory (MMU), CLINT timer, PLIC interrupt controller, OpenSBI M-mode firmware, and a device tree. Taken together they are a substantial step up in complexity from our 25-day RV32I base.

What are good Linux-capable open RISC-V cores?

CVA6 (ETH Zurich, RV64GC), VexRiscv (SpinalHDL, configurable), and Rocket Chip (UC Berkeley, full SoC generator). All are open source on GitHub.
