Phase 1 gave you the instruction set. Phase 2 is where the real fun starts: we build the hardware that runs it. This lesson gives you the complete picture of the single-cycle RISC-V CPU — every component, every stage, every wire — before we implement each piece. By the end you'll have the datapath diagram in your head and the first real CPU module, the Program Counter, running in Verilog.
Days 7–14 build a complete single-cycle RV32I processor from scratch in Verilog. Each lesson adds one module. Day 7 gives you the map; every subsequent day fills in one piece. By Day 14, it all runs together.
Strip away the marketing. A CPU is just a machine that repeats one loop forever: fetch an instruction, decode it, execute it, access memory if needed, and write back the result.
That loop is what every piece of the datapath serves. Let's map the hardware to the loop.
In a single-cycle CPU all five stages happen in one clock cycle, with no pipeline registers and no stalls. The clock period has to fit the slowest instruction, so it's slower than a pipelined design but far simpler to understand and build. Once this works, pipelining it is a natural next step.
| Component | Abbrev | What it does |
|---|---|---|
| Program Counter | PC | 32-bit register holding the address of the next instruction to fetch. Clocked — updates every cycle to PC+4 or a jump/branch target. |
| Instruction Memory | IMEM | Read-only memory (ROM) that holds the program. Takes address in, returns 32-bit instruction word out. Combinational — no clock. |
| Instruction Decoder | DECODE | Splits the 32-bit instruction into fields: opcode[6:0], rd[11:7], rs1[19:15], rs2[24:20], funct3[14:12], funct7[31:25], and immediate (format-dependent). |
| Register File | REGFILE | 32 × 32-bit registers (x0–x31). Two async read ports (rs1, rs2) and one synchronous write port (rd). x0 always reads 0. |
| Immediate Generator | IMMGEN | Sign-extends the immediate field from the instruction based on the instruction format (I/S/B/U/J — from Day 3). |
| ALU | ALU | Arithmetic Logic Unit — performs ADD, SUB, AND, OR, XOR, SLT, SLL, SRL, SRA. Takes two 32-bit operands, outputs 32-bit result and a zero flag (for branches). |
| ALU MUX | ALUMUX | Selects the ALU's second operand: either rs2 (R-type, and B-type branches, where the ALU compares rs1 with rs2) or the immediate (I-type and S-type). Controlled by the ALUSrc signal from the control unit. |
| Data Memory | DMEM | Read/write memory for loads and stores. Address and write data come from the ALU/register file. MemRead and MemWrite control signals enable it. |
| Write-back MUX | WBMUX | Selects what gets written to rd: ALU result, DMEM read data, PC+4 (for JAL/JALR), or the upper immediate (LUI; AUIPC writes PC plus the upper immediate). |
| Branch Unit | BRANCH | Combines the ALU zero flag with funct3 to decide if a branch is taken, then computes the branch target (PC + B-imm). |
| PC MUX | PCMUX | Selects the next PC value: PC+4 (no branch) or branch/jump target. |
| Control Unit | CTRL | Reads the opcode (and funct3/funct7) and generates all control signals for the datapath: RegWrite, ALUSrc, MemRead, MemWrite, Branch, ALUOp, WBSel. |
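As a rough illustration of the DECODE row, field extraction in RV32I is just fixed bit slicing of the instruction word. The sketch below is a minimal assumption of how that could look: the module name `decode_fields` and its port names are hypothetical and not taken from later lessons, and immediate generation is left out because it depends on the instruction format.

```verilog
// Hypothetical field-slicing decoder; module and port names are illustrative.
module decode_fields (
    input  wire [31:0] instr,   // 32-bit instruction word from IMEM
    output wire [6:0]  opcode,  // instr[6:0]
    output wire [4:0]  rd,      // instr[11:7]
    output wire [2:0]  funct3,  // instr[14:12]
    output wire [4:0]  rs1,     // instr[19:15]
    output wire [4:0]  rs2,     // instr[24:20]
    output wire [6:0]  funct7   // instr[31:25]
);
    // Pure wiring: each field is a fixed bit range of the instruction,
    // so no clock and no logic gates are needed.
    assign opcode = instr[6:0];
    assign rd     = instr[11:7];
    assign funct3 = instr[14:12];
    assign rs1    = instr[19:15];
    assign rs2    = instr[24:20];
    assign funct7 = instr[31:25];
endmodule
```

Because the register specifiers sit at the same bit positions in every format that uses them, the register file can start reading rs1 and rs2 before the control unit has finished interpreting the opcode.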
Let's walk add x3, x1, x2 through every stage:
| Stage | What happens |
|---|---|
| Fetch | IMEM reads 0x002081B3 (the encoding of add x3,x1,x2) at address PC. |
| Decode | Decoder extracts opcode=0110011 (R-type), rd=3, rs1=1, rs2=2, funct3=000, funct7=0000000. REGFILE reads x1 and x2. |
| Execute | ALUSrc=0 → ALUMUX picks rs2. ALU does ADD → result = x1+x2. |
| Memory | MemRead=0, MemWrite=0 → DMEM idle. Result passes through. |
| Write-back | WBSel=ALU → WBMUX routes ALU result to rd. REGFILE writes result into x3. |
At the end of the cycle, PC ← PC+4 (no branch taken) and the loop repeats.
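To double-check the instruction word used in the walk-through, the fields from the Decode row can be packed back together in R-type order. This is only a sketch for checking the arithmetic: the module name `add_encoding` and the parameter names are made up for illustration.

```verilog
// Sketch: rebuild the R-type word for add x3, x1, x2 from its fields.
// The module name and localparam names are illustrative assumptions.
module add_encoding (
    output wire [31:0] instr   // assembled 32-bit instruction word
);
    // Field values taken from the Decode row of the walk-through.
    localparam [6:0] FUNCT7 = 7'b0000000;
    localparam [4:0] RS2    = 5'd2;
    localparam [4:0] RS1    = 5'd1;
    localparam [2:0] FUNCT3 = 3'b000;
    localparam [4:0] RD     = 5'd3;
    localparam [6:0] OPCODE = 7'b0110011;

    // R-type layout, MSB to LSB: funct7 | rs2 | rs1 | funct3 | rd | opcode
    assign instr = {FUNCT7, RS2, RS1, FUNCT3, RD, OPCODE};

    // Print the assembled word once the continuous assignment has settled.
    initial #1 $display("add x3, x1, x2 = 0x%h", instr);
endmodule
```

Simulated on its own, this should print `add x3, x1, x2 = 0x002081b3`. Changing `RD` to another register number changes only bits [11:7] of the word.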
Every journey starts with the PC. It's the simplest module: a 32-bit register that latches pc_next on every rising clock edge, and resets to 0.
Port table:
| Port | Dir | Width | Meaning |
|---|---|---|---|
| clk | input | 1 | clock — PC updates on rising edge |
| rst | input | 1 | synchronous reset — drives PC to 0x00000000 |
| pc_next | input | 32 | the next address to load (PC+4 or branch target) |
| pc | output | 32 | current PC — feeds IMEM address input |
// Program Counter: 32-bit register, synchronous reset
module pc (
    input  wire        clk,      // clock
    input  wire        rst,      // synchronous reset (active high)
    input  wire [31:0] pc_next,  // next PC value (PC+4 or branch target)
    output reg  [31:0] pc        // current PC value → IMEM address
);
    always @(posedge clk) begin
        if (rst) pc <= 32'h0000_0000;
        else     pc <= pc_next;
    end
endmodule
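The `pc` module only stores whatever arrives on `pc_next`; something upstream has to choose between PC+4 and a branch/jump target. A minimal sketch of that selection, which is the PCMUX row of the component table, might look like the following. The module name `pc_next_mux` and the signals `pc_src` and `branch_target` are assumptions for illustration, not names from the rest of the series.

```verilog
// Hypothetical next-PC selection; pc_src and branch_target are assumed names.
module pc_next_mux (
    input  wire [31:0] pc,             // current PC from the pc register
    input  wire [31:0] branch_target,  // branch/jump target address
    input  wire        pc_src,         // 1 = take branch/jump, 0 = sequential
    output wire [31:0] pc_next         // feeds the pc module's pc_next input
);
    // RV32I instructions are 4 bytes wide, so sequential flow adds 4.
    wire [31:0] pc_plus4 = pc + 32'd4;

    // Combinational 2-to-1 mux: no clock, the pc register does the latching.
    assign pc_next = pc_src ? branch_target : pc_plus4;
endmodule
```

Keeping this logic outside the register leaves `pc` as a plain clocked element, which is why the testbench can drive `pc_next` directly with arbitrary addresses.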
A self-checking testbench: reset the PC, then step through several cycles and verify the address advances correctly:
`timescale 1ns/1ps
module tb_pc;
    reg         clk, rst;
    reg  [31:0] pc_next;
    wire [31:0] pc;

    pc dut (.clk(clk), .rst(rst), .pc_next(pc_next), .pc(pc));

    // 10 ns clock
    initial clk = 0;
    always #5 clk = ~clk;

    integer errors = 0;

    // Compare pc against the expected value and count mismatches.
    // %h prints all 8 hex digits, matching the expected output.
    task check(input [31:0] exp);
        if (pc !== exp) begin
            $display("FAIL: pc=%h expected=%h", pc, exp);
            errors = errors + 1;
        end else
            $display("ok: pc=%h", pc);
    endtask

    initial begin
        // --- reset ---
        rst=1; pc_next=32'h0; @(posedge clk); #1;
        check(32'h0000_0000); // after reset, PC=0

        // --- advance PC+4 each cycle ---
        rst=0;
        pc_next=32'h0000_0004; @(posedge clk); #1; check(32'h0000_0004);
        pc_next=32'h0000_0008; @(posedge clk); #1; check(32'h0000_0008);
        pc_next=32'h0000_000C; @(posedge clk); #1; check(32'h0000_000C);

        // --- branch/jump to a target address ---
        pc_next=32'h0000_0100; @(posedge clk); #1; check(32'h0000_0100);

        // --- reset again ---
        rst=1; pc_next=32'h0; @(posedge clk); #1; check(32'h0000_0000);

        if (errors==0) $display("ALL TESTS PASSED");
        else           $display("%0d TEST(S) FAILED", errors);
        $finish;
    end
endmodule
iverilog -o pc_tb pc.v tb_pc.v
vvp pc_tb
Expected output:
ok: pc=00000000
ok: pc=00000004
ok: pc=00000008
ok: pc=0000000c
ok: pc=00000100
ok: pc=00000000
ALL TESTS PASSED
Think of the datapath as a factory assembly line. Each stage is a workstation. The instruction is the item on the conveyor belt. It starts at the PC (the dispatch desk), gets read from IMEM, decoded at the decode station, processed by the ALU, optionally handled by the memory station, and finally the result is placed back in the register file. The control unit is the factory manager reading the instruction ticket and flipping the right switches at each station.
Program Counter (pc.v): the first real CPU module.
Datapath: the hardware (wires, ALU, memories, muxes, registers) that instructions flow through. The control unit generates signals to steer it but doesn't touch the data.
The five stages: Fetch, Decode, Execute, Memory, Write-back. In a single-cycle design all five happen in one clock cycle.
What the PC does: holds the address of the next instruction to fetch. It updates every clock cycle to PC+4 or a branch/jump target.
Datapath vs. control unit: the datapath moves and transforms data. The control unit reads the opcode and generates the binary signals (ALUSrc, RegWrite, etc.) that steer the datapath muxes.
Why single-cycle first: it's the simplest CPU design, with no pipeline hazards and no stalls. Once it works, pipelining is a natural upgrade.