DAY 7 · PHASE 2 — BUILD THE CPU

Datapath Overview — Fetch, Decode, Execute

By EcrioniX · Updated Jun 9, 2026

Phase 1 gave you the instruction set. Phase 2 is where the real fun starts: we build the hardware that runs it. This lesson gives you the complete picture of the single-cycle RISC-V CPU — every component, every stage, every wire — before we implement each piece. By the end you'll have the datapath diagram in your head and the first real CPU module, the Program Counter, running in Verilog.

🚀 Welcome to Phase 2

Days 7–14 build a complete single-cycle RV32I processor from scratch in Verilog. Each lesson adds one module. Day 7 gives you the map; every subsequent day fills in one piece. By Day 14, it all runs together.

1. The big idea: what a CPU really is

Strip away the marketing. A CPU is just a machine that repeats one loop forever:

  1. Fetch an instruction from memory at the current address.
  2. Decode it — figure out what kind of instruction it is and which registers it needs.
  3. Execute it — compute the result.
  4. Access memory — if it's a load or store.
  5. Write back the result to a register.
  6. Advance the program counter. Go back to step 1.

That loop is what every piece of the datapath serves. Let's map the hardware to the loop.

2. The five stages, one at a time

Stage 1: Fetch (IF). Read the 32-bit instruction from IMEM at address PC. Compute PC+4.
Stage 2: Decode (ID). Split the instruction into fields. Read rs1 and rs2 from the register file. Sign-extend the immediate.
Stage 3: Execute (EX). ALU computes the result or branch target. For branches, compare operands.
Stage 4: Memory (MEM). Load: read DMEM. Store: write DMEM. Non-memory instructions pass through.
Stage 5: Write-back (WB). Write the result to destination register rd.

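In a single-cycle CPU all five stages happen in one clock cycle: no pipeline registers, no stalls. The clock period has to be long enough for the slowest instruction (typically a load, which uses every stage), so it runs slower than a pipelined design, but it is far simpler to understand and build. Once this works, pipelining it is a natural next step.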
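To make the Decode stage concrete, here is a minimal sketch of the field slicing and the I-type sign extension. The module name `decode_fields` and its port names are illustrative assumptions, not the decoder built later in the course; the bit positions are the standard RV32I ones.

```verilog
// Sketch only: slice a 32-bit RV32I instruction into its fixed-position fields.
// Module and port names are hypothetical, chosen for this example.
module decode_fields (
    input  wire [31:0] inst,
    output wire [6:0]  opcode,
    output wire [4:0]  rd,
    output wire [2:0]  funct3,
    output wire [4:0]  rs1,
    output wire [4:0]  rs2,
    output wire [6:0]  funct7,
    output wire [31:0] imm_i    // I-type immediate, sign-extended to 32 bits
);
    assign opcode = inst[6:0];
    assign rd     = inst[11:7];
    assign funct3 = inst[14:12];
    assign rs1    = inst[19:15];
    assign rs2    = inst[24:20];
    assign funct7 = inst[31:25];
    // Sign extension: copy the sign bit inst[31] into the upper 20 bits.
    assign imm_i  = {{20{inst[31]}}, inst[31:20]};
endmodule
```

Because every field sits at a fixed bit position, decoding is pure wiring: no clock and no logic gates beyond the sign-bit replication.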

3. Every component on the datapath

Component | Abbrev | What it does
Program Counter | PC | 32-bit register holding the address of the next instruction to fetch. Clocked: updates every cycle to PC+4 or a jump/branch target.
Instruction Memory | IMEM | Read-only memory (ROM) that holds the program. Takes an address in, returns a 32-bit instruction word out. Combinational: no clock.
Instruction Decoder | DECODE | Splits the 32-bit instruction into fields: opcode[6:0], rd[11:7], rs1[19:15], rs2[24:20], funct3[14:12], funct7[31:25], and immediate (format-dependent).
Register File | REGFILE | 32 × 32-bit registers (x0–x31). Two async read ports (rs1, rs2) and one synchronous write port (rd). x0 always reads 0.
Immediate Generator | IMMGEN | Sign-extends the immediate field from the instruction based on the instruction format (I/S/B/U/J, from Day 3).
ALU | ALU | Arithmetic Logic Unit. Performs ADD, SUB, AND, OR, XOR, SLT, SLL, SRL, SRA. Takes two 32-bit operands, outputs a 32-bit result and a zero flag (for branches).
ALU MUX | ALUMUX | Selects the ALU's second operand: either rs2 (R-type, and B-type branches, which compare rs1 with rs2) or the immediate (I/S/U/J-type). Controlled by the ALUSrc signal from the control unit.
Data Memory | DMEM | Read/write memory for loads and stores. Address and write data come from the ALU/register file. MemRead and MemWrite control signals enable it.
Write-back MUX | WBMUX | Selects what gets written to rd: ALU result, DMEM read data, PC+4 (for JAL/JALR), or the upper immediate (LUI; AUIPC writes PC + upper immediate instead).
Branch Unit | BRANCH | Combines the ALU zero flag with funct3 to decide if a branch is taken, then computes the branch target (PC + B-imm).
PC MUX | PCMUX | Selects the next PC value: PC+4 (no branch) or branch/jump target.
Control Unit | CTRL | Reads the opcode (and funct3/funct7) and generates all control signals for the datapath: RegWrite, ALUSrc, MemRead, MemWrite, Branch, ALUOp, WBSel.
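As a preview of one row in the table, the register file's behaviour (two asynchronous read ports, one synchronous write port, x0 always reading 0) could be sketched like this. It is a minimal illustration under assumed port names (`we`, `wdata`, `rdata1`, `rdata2`), not necessarily the module the later lessons build.

```verilog
// Sketch only: 32 x 32-bit register file with x0 hard-wired to zero.
// Port names are assumptions made for this example.
module regfile (
    input  wire        clk,
    input  wire        we,       // RegWrite from the control unit
    input  wire [4:0]  rs1,      // read address 1
    input  wire [4:0]  rs2,      // read address 2
    input  wire [4:0]  rd,       // write address
    input  wire [31:0] wdata,    // value to write into rd
    output wire [31:0] rdata1,
    output wire [31:0] rdata2
);
    reg [31:0] regs [0:31];

    // Asynchronous (combinational) reads; x0 always reads 0.
    assign rdata1 = (rs1 == 5'd0) ? 32'd0 : regs[rs1];
    assign rdata2 = (rs2 == 5'd0) ? 32'd0 : regs[rs2];

    // Synchronous write on the rising edge; writes to x0 are ignored.
    always @(posedge clk) begin
        if (we && rd != 5'd0)
            regs[rd] <= wdata;
    end
endmodule
```

Forcing the read of x0 to zero, rather than relying on the stored value, means x0 behaves correctly even before any register has been written. In simulation, the other registers read as X until they are written.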

4. The full datapath — wired together

[Figure: Single-Cycle RISC-V Datapath. Blocks in order: PC (32-bit reg) → IMEM (inst[31:0] out) → DECODE (opcode, rd, rs1, rs2, funct3/7, imm) → REGFILE (x0–x31; rdata1, rdata2 out) → ALU MUX (rs2/imm) → ALU (result[31:0], zero flag) → DMEM (addr/wdata in, rdata out) → WB MUX (write-back to rd). PC + 4 and the PC MUX produce the next PC. The CONTROL UNIT drives RegWrite, ALUSrc, MemR/W, Branch, ALUOp, and WBSel.]
Figure — Single-cycle RV32I datapath. Data flows left to right; write-back loops back to the register file. The control unit (bottom) steers every mux.

5. Tracing one instruction through the datapath

Let's walk add x3, x1, x2 through every stage:

Stage | What happens
Fetch | IMEM reads 0x002081B3 (the encoding of add x3,x1,x2) at address PC.
Decode | Decoder extracts opcode=0110011 (R-type), rd=3, rs1=1, rs2=2, funct3=000, funct7=0000000. REGFILE reads x1 and x2.
Execute | ALUSrc=0 → ALUMUX picks rs2. ALU does ADD → result = x1+x2.
Memory | MemRead=0, MemWrite=0 → DMEM idle. Result passes through.
Write-back | WBSel=ALU → WBMUX routes ALU result to rd. REGFILE writes result into x3.
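The Execute row can be sketched as a mux feeding an adder. This is a deliberately stripped-down illustration: it only implements ADD, and the module name `ex_add_sketch` and its port names are assumptions for this example. A full ALU would pick the operation from ALUOp.

```verilog
// Sketch only: ALUMUX plus an ADD-only ALU, enough to trace "add".
// Names are hypothetical; a real ALU selects its operation via ALUOp.
module ex_add_sketch (
    input  wire [31:0] rdata1,   // value read from rs1
    input  wire [31:0] rdata2,   // value read from rs2
    input  wire [31:0] imm,      // sign-extended immediate from IMMGEN
    input  wire        alu_src,  // ALUSrc: 0 = use rs2 value, 1 = use immediate
    output wire [31:0] result,
    output wire        zero      // 1 when the result is all zeros
);
    // ALUMUX: choose the second operand.
    wire [31:0] operand_b = alu_src ? imm : rdata2;

    // ADD (32-bit, wraps on overflow).
    assign result = rdata1 + operand_b;
    assign zero   = (result == 32'd0);
endmodule
```

With alu_src = 0, as in the trace, the immediate input is ignored and result is simply x1 + x2.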

At the end of the cycle, PC ← PC+4 (no branch taken) and the loop repeats.
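The next-PC choice is just an adder and a mux. Here is a minimal sketch, assuming a hypothetical `take_branch` signal from the branch unit and a sign-extended branch offset `imm_b`; neither name comes from the lesson, and jumps are left out to keep it short.

```verilog
// Sketch only: PC+4 adder, branch-target adder, and the PC MUX.
// take_branch and imm_b are assumed names for this example.
module next_pc_sketch (
    input  wire [31:0] pc,           // current PC
    input  wire [31:0] imm_b,        // sign-extended branch offset
    input  wire        take_branch,  // 1 when the branch is taken
    output wire [31:0] pc_next       // value the PC register latches next
);
    wire [31:0] pc_plus4 = pc + 32'd4;   // sequential path
    wire [31:0] target   = pc + imm_b;   // branch target (PC + B-imm)

    // PC MUX: sequential address unless a branch is taken.
    assign pc_next = take_branch ? target : pc_plus4;
endmodule
```

For add, take_branch is 0, so pc_next is PC+4. A taken branch with a negative offset wraps backwards thanks to two's-complement addition.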

6. Build it: the Program Counter (pc.v)

Every journey starts with the PC. It's the simplest module: a 32-bit register that latches pc_next on every rising clock edge, and resets to 0.

Port table:

Port | Dir | Width | Meaning
clk | input | 1 | clock; PC updates on the rising edge
rst | input | 1 | synchronous reset (active high); drives PC to 0x00000000
pc_next | input | 32 | the next address to load (PC+4 or branch target)
pc | output | 32 | current PC; feeds the IMEM address input
pc.v — Program Counter
// Program Counter — 32-bit register, synchronous reset
module pc (
    input  wire        clk,      // clock
    input  wire        rst,      // synchronous reset (active high)
    input  wire [31:0] pc_next,  // next PC value (PC+4 or branch target)
    output reg  [31:0] pc        // current PC value → IMEM address
);
    always @(posedge clk) begin
        if (rst) pc <= 32'h0000_0000;
        else     pc <= pc_next;
    end
endmodule

7. Test it: testbench (tb_pc.v)

A self-checking testbench: reset the PC, then step through several cycles and verify the address advances correctly:

tb_pc.v — Program Counter testbench
`timescale 1ns/1ps
module tb_pc;
    reg        clk, rst;
    reg [31:0] pc_next;
    wire[31:0] pc;

    pc dut (.clk(clk), .rst(rst), .pc_next(pc_next), .pc(pc));

    // 10 ns clock
    initial clk = 0;
    always #5 clk = ~clk;

    integer errors = 0;
    // Compare pc against the expected value; %h prints all 8 hex digits
    task check(input [31:0] exp);
        if (pc !== exp) begin
            $display("FAIL: pc=%h expected=%h", pc, exp);
            errors = errors + 1;
        end else
            $display("ok:   pc=%h", pc);
    endtask

    initial begin
        // --- reset ---
        rst=1; pc_next=32'h0; @(posedge clk); #1;
        check(32'h0000_0000);    // after reset, PC=0

        // --- advance PC+4 each cycle ---
        rst=0;
        pc_next=32'h0000_0004; @(posedge clk); #1; check(32'h0000_0004);
        pc_next=32'h0000_0008; @(posedge clk); #1; check(32'h0000_0008);
        pc_next=32'h0000_000C; @(posedge clk); #1; check(32'h0000_000C);

        // --- branch jump to address ---
        pc_next=32'h0000_0100; @(posedge clk); #1; check(32'h0000_0100);

        // --- reset again ---
        rst=1; pc_next=32'h0; @(posedge clk); #1; check(32'h0000_0000);

        if (errors==0) $display("ALL TESTS PASSED");
        else           $display("%0d TEST(S) FAILED", errors);
        $finish;
    end
endmodule
run.sh — compile & simulate
iverilog -o pc_tb pc.v tb_pc.v
vvp pc_tb

Expected output:

expected output
ok:   pc=00000000
ok:   pc=00000004
ok:   pc=00000008
ok:   pc=0000000c
ok:   pc=00000100
ok:   pc=00000000
ALL TESTS PASSED

💡 The datapath is a factory production line

Each stage is a workstation. The instruction is the item on the conveyor belt. It starts at the PC (the dispatch desk), gets read from IMEM, decoded at the decode station, processed by the ALU, optionally handled by the memory station, and finally the result is placed back in the register file. The control unit is the factory manager reading the instruction ticket and flipping the right switches at each station.

🎯 Day 7 takeaways

Quick check

  1. Name the five stages of the single-cycle RISC-V datapath.
  2. Which component decides whether the next PC is PC+4 or a branch target?
  3. What does the ALUMUX select between?
  4. Why does the write-back path loop back from right to left in the datapath?

FAQ

What is a CPU datapath?

The hardware (wires, ALU, memories, muxes, registers) that instructions flow through. The control unit generates signals to steer it but doesn't touch the data.

Five stages of a single-cycle RISC-V CPU?

Fetch, Decode, Execute, Memory, Write-back. In a single-cycle design all five happen in one clock cycle.

What does the Program Counter do?

Holds the address of the next instruction to fetch. Updates every clock cycle to PC+4 or a branch/jump target.

Datapath vs control unit?

Datapath moves and transforms data. Control unit reads the opcode and generates the binary signals (ALUSrc, RegWrite, etc.) that steer the datapath muxes.

Why single-cycle first?

It's the simplest CPU — no pipeline hazards, no stalls. Once it works, pipelining is a natural upgrade.
