DAY 25 · PHASE 4 — ADVANCED & REAL HARDWARE

Caches & the Road to Linux-Capable

By EcrioniX · Updated 2026-06-11

You have built a complete RV32I CPU, pipelined it, added peripherals, and run it on an FPGA. For the final lesson we look at the next major performance step — caches — and map out what would be needed to run Linux on your core. Understanding that gap is what separates a microcontroller-class design from a full application processor.

Why Caches?

Modern DRAM has an access latency of roughly 50–100 ns. At 500 MHz (2 ns per cycle) that is 25–50 cycles stalled on every access that has to go all the way to DRAM. A small, fast SRAM cache (a typical L1 is around 32 KB with a 1–4 cycle hit time) sits between the CPU and DRAM and exploits temporal locality (recently used data is likely to be needed again) and spatial locality (nearby data is likely to be needed soon).
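
The cycle counts above are just latency multiplied by clock frequency. The second line applies the standard average-memory-access-time (AMAT) formula to show what a cache buys you; the 2-cycle hit time, 5% miss rate, and 50-cycle miss penalty are illustrative assumptions, not measurements from this course's core.

```latex
\begin{align*}
\text{stall cycles} &= t_{\text{DRAM}} \times f_{\text{clk}}
  : \quad 50\,\text{ns} \times 500\,\text{MHz} = 25,
  \qquad 100\,\text{ns} \times 500\,\text{MHz} = 50 \\
\text{AMAT} &= t_{\text{hit}} + r_{\text{miss}} \times t_{\text{penalty}}
  = 2 + 0.05 \times 50 = 4.5 \ \text{cycles}
\end{align*}
```

Under those assumed numbers the average access costs 4.5 cycles instead of 50, which is why even a small cache pays off.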

Cache Address Decomposition

For a direct-mapped cache with 16 lines of 4 bytes each and 32-bit addresses:

32-bit address:  [ TAG (26 bits) | INDEX (4 bits) | OFFSET (2 bits) ]
                  bits 31..6       bits 5..2         bits 1..0

INDEX  selects which of the 16 cache lines to check
OFFSET selects which byte within the 4-byte cache line
TAG    stored alongside the line — must match for a hit
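
To make the split concrete, here is a minimal simulation-only sketch that slices a few example addresses with the same bit ranges. The module name `addr_split`, the `show` task, and the test addresses are assumptions chosen for illustration; they are not part of the course sources.

```verilog
// addr_split.v: illustrative only (hypothetical module, not a course file)
`timescale 1ns/1ps
module addr_split;
    localparam OFF_BITS = 2;                        // 4-byte line
    localparam IDX_BITS = 4;                        // 16 lines
    localparam TAG_BITS = 32 - IDX_BITS - OFF_BITS; // 26

    reg  [31:0] addr;
    wire [OFF_BITS-1:0] offset = addr[OFF_BITS-1:0];
    wire [IDX_BITS-1:0] index  = addr[OFF_BITS+IDX_BITS-1:OFF_BITS];
    wire [TAG_BITS-1:0] tag    = addr[31:OFF_BITS+IDX_BITS];

    // Apply one address and print its three fields
    task show;
        input [31:0] a;
        begin
            addr = a;
            #1; // let the continuous assignments settle
            $display("addr=%h tag=%0d index=%0d offset=%0d",
                     addr, tag, index, offset);
        end
    endtask

    initial begin
        show(32'h0000_0010); // tag 0, index 4, offset 0
        show(32'h0000_0110); // tag 4, index 4, offset 0 (same line, different tag)
        show(32'h0000_0057); // tag 1, index 5, offset 3
        $finish;
    end
endmodule
```

The first two addresses map to the same index with different tags, which is exactly the conflict case a direct-mapped cache cannot hold simultaneously.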

Cache Hit and Miss

icache.v
// icache.v — 16-line direct-mapped instruction cache
// Line size: 1 word (4 bytes). Tag+index+offset from 32-bit address.
// On miss, fetches from the backing imem (next-level store).
module icache #(
    parameter LINES = 16 // must be power of 2
)(
    input         clk, rst,
    input  [31:0] addr,          // fetch address from PC
    input         req,           // 1 = fetch requested
    output reg [31:0] rdata,     // instruction data
    output reg    hit,           // 1 = request sampled at the last clock edge hit (registered)
    // Backing store (e.g. BRAM imem)
    output [31:0] mem_addr,
    input  [31:0] mem_rdata
);
    localparam IDX_BITS = $clog2(LINES); // derived from LINES; 4 when LINES = 16
    localparam OFF_BITS = 2;     // log2(4 bytes) = 2
    localparam TAG_BITS = 32 - IDX_BITS - OFF_BITS; // = 26 when LINES = 16

    reg [TAG_BITS-1:0] tag_array  [0:LINES-1];
    reg [31:0]         data_array [0:LINES-1];
    reg                valid      [0:LINES-1];

    wire [OFF_BITS-1:0] offset = addr[OFF_BITS-1:0];
    wire [IDX_BITS-1:0] index  = addr[OFF_BITS+IDX_BITS-1:OFF_BITS];
    wire [TAG_BITS-1:0] tag    = addr[31:OFF_BITS+IDX_BITS];

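    // Connect miss path to backing store.
    // The fill logic below assumes mem_rdata is valid in the same cycle as
    // mem_addr (combinational read). A synchronous-read BRAM returns data one
    // cycle later and would need a wait state.
    assign mem_addr = addr;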

    integer i;
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            for (i = 0; i < LINES; i = i+1)
                valid[i] <= 1'b0;
            hit <= 0;
        end else if (req) begin
            if (valid[index] && (tag_array[index] == tag)) begin
                // Cache hit
                rdata <= data_array[index];
                hit   <= 1'b1;
            end else begin
                // Cache miss: fill from backing store
                data_array[index] <= mem_rdata;
                tag_array[index]  <= tag;
                valid[index]      <= 1'b1;
                rdata <= mem_rdata;
                hit   <= 1'b0; // miss this cycle; hit next cycle
            end
        end
    end
endmodule
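
Once the cache is in place, the natural next question is how often it actually hits. The following is a minimal sketch of a hit-rate counter, assuming it is wired to the same `req` and the registered `hit` output of `icache`; the module name `cache_stats` and its ports are hypothetical, not part of the course sources.

```verilog
// cache_stats.v: hypothetical hit/miss counter for the icache above
module cache_stats (
    input             clk, rst,
    input             req,    // same request signal that feeds icache
    input             hit,    // icache.hit (valid the cycle after req is sampled)
    output reg [31:0] n_req,  // total requests observed
    output reg [31:0] n_hit   // requests that hit
);
    reg req_d; // req delayed one cycle to line up with the registered hit

    always @(posedge clk) begin
        if (rst) begin
            req_d <= 1'b0;
            n_req <= 32'd0;
            n_hit <= 32'd0;
        end else begin
            req_d <= req;
            if (req_d) begin
                n_req <= n_req + 32'd1;
                if (hit) n_hit <= n_hit + 32'd1;
            end
        end
    end
endmodule
```

Because `hit` is a registered output, it describes the request sampled one clock edge earlier, which is why the counter qualifies it with the delayed `req_d` rather than with `req` itself. The hit rate is then `n_hit / n_req`, read out in a testbench or over a debug port.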

Testbench — Hit and Miss Verification

tb_icache.v
// tb_icache.v — Verify cache hit after first miss
`timescale 1ns/1ps
module tb_icache;
    reg clk=0, rst=1;
    always #5 clk=~clk;

    reg  [31:0] addr;
    reg         req;
    wire [31:0] rdata;
    wire        hit;
    wire [31:0] mem_addr;
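    // Stub memory: returns 0xDEAD_BEEF for any address.
    // A wire is used because always @(*) with a constant right-hand side has
    // an empty sensitivity list and would never execute, leaving the value X.
    wire [31:0] mem_rdata = 32'hDEAD_BEEF;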

    icache dut(.clk(clk),.rst(rst),.addr(addr),.req(req),
               .rdata(rdata),.hit(hit),
               .mem_addr(mem_addr),.mem_rdata(mem_rdata));

    initial begin
        $dumpfile("tb_icache.vcd"); $dumpvars(0,tb_icache);
        req=0; addr=0;
        // Drive inputs on the falling edge so they are stable when the
        // cache samples them on the rising edge (avoids a simulation race).
        @(negedge clk); @(negedge clk); rst=0;

        // First access: expect MISS
        addr=32'h0000_0010; req=1;
        @(negedge clk); req=0; // the rising edge in between sampled req=1
        if(!hit) $display("PASS: first access = miss (cold)");
        else     $display("FAIL: expected miss on cold cache");

        // Second access same addr: expect HIT
        req=1;
        @(negedge clk); req=0;
        if(hit && rdata===32'hDEAD_BEEF) $display("PASS: second access = hit, data=DEADBEEF");
        else $display("FAIL: expected hit, hit=%b data=%h",hit,rdata);

        // Different address (same index, different tag): expect MISS (conflict)
        addr=32'h0000_0110; req=1; // different tag, same index
        @(negedge clk); req=0;
        if(!hit) $display("PASS: conflict miss on different tag");
        else     $display("FAIL: expected conflict miss");

        $finish;
    end
endmodule

What a Linux-Capable RISC-V Core Needs

| Feature | Needed for Linux | Complexity |
|---|---|---|
| S-mode privilege | Linux kernel runs in S-mode; U-mode for user apps | Medium: adds supervisor CSRs, privilege switching |
| Sv32 MMU | Virtual memory, required for process isolation and large address spaces | High: page-table walker, TLB, PTEs |
| CLINT timer | Core-Local Interruptor, provides mtime/mtimecmp for timer interrupts | Low: simple MMIO counter |
| PLIC | Platform-Level Interrupt Controller, routes external interrupts to harts | Medium: priority encoder + claim/complete protocol |
| OpenSBI | Open Source RISC-V Supervisor Binary Interface, M-mode firmware that Linux calls via ECALL | Software-only: open-source firmware |
| Device tree | Describes hardware to Linux (memory ranges, UART, PLIC addresses) | Low: a .dts text file (compiled to a .dtb blob) |
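
To show why the timer is rated "Low", here is a minimal sketch of a machine timer in the spirit of the CLINT's mtime/mtimecmp pair. It is not a full CLINT: the module name `clint_timer`, the direct 64-bit write port, the all-ones reset value of `mtimecmp`, and counting once per core clock are all simplifying assumptions. A real design would map the registers onto the 32-bit bus as two words each and usually counts at a fixed real-time rate.

```verilog
// clint_timer.v: hypothetical minimal machine timer, not a complete CLINT
module clint_timer (
    input             clk, rst,
    input             cmp_we,     // 1 = write mtimecmp this cycle (assumed interface)
    input      [63:0] cmp_wdata,  // new compare value
    output reg [63:0] mtime,      // free-running counter
    output reg [63:0] mtimecmp,   // compare register
    output            timer_irq   // would drive the MTIP bit in mip
);
    always @(posedge clk) begin
        if (rst) begin
            mtime    <= 64'd0;
            // Assumption: start at the maximum value so no interrupt is
            // pending until software programs a compare value.
            mtimecmp <= {64{1'b1}};
        end else begin
            mtime <= mtime + 64'd1;
            if (cmp_we) mtimecmp <= cmp_wdata;
        end
    end

    // The timer interrupt is pending whenever mtime >= mtimecmp (unsigned)
    assign timer_irq = (mtime >= mtimecmp);
endmodule
```

Software clears the interrupt by writing a later value to `mtimecmp`, which is how an operating system schedules its next tick.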

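Open-Source Linux-Capable RISC-V Cores

CVA6 (ETH Zurich, RV64GC), VexRiscv (SpinalHDL, configurable), and Rocket Chip (UC Berkeley, full SoC generator) are open-source cores that can run Linux.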

Course Complete

You have gone from a register file in Day 9 to a pipelined, hazard-handled, FPGA-proven RV32I CPU in 25 days. You understand every wire in the datapath, every pipeline register, every hazard, and the path forward to a production-grade Linux core. That knowledge is the foundation for anything in VLSI, FPGA design, or computer architecture.

What next? Explore VLSI Design, FPGA from Scratch, or study CVA6/VexRiscv source code to see how the concepts from this course scale to a full application processor.

Day 25 Takeaways

FAQ

How does a direct-mapped cache work?

The address is split into tag, index, and offset. The index selects a cache line. If the valid bit is set and the stored tag matches the address tag, it is a hit. Otherwise it is a miss and the line is fetched from main memory.

What does a RISC-V CPU need to boot Linux?

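S-mode privilege, Sv32 virtual memory (MMU), a CLINT timer, a PLIC interrupt controller, OpenSBI M-mode firmware, and a device tree. Some pieces are simple on their own (the timer, the device tree), but together, and especially the MMU, they are a substantial step beyond our 25-day RV32I base.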

What are good Linux-capable open RISC-V cores?

CVA6 (ETH Zurich, RV64GC), VexRiscv (SpinalHDL, configurable), and Rocket Chip (UC Berkeley, full SoC generator). All are open source on GitHub.
