DAY 17 · PHASE 3 — PIPELINE & OPTIMIZE

The 5-Stage Pipeline

By EcrioniX · Updated 2026-06-11

The single-cycle CPU from Days 7–15 works correctly, but it is slow: the clock period must be long enough for the slowest instruction path. Pipelining splits execution into 5 stages that work in parallel on different instructions, so that while one instruction is in EX, the next is in ID and the one after is already being fetched. The result: once the pipeline is full and no hazards intervene, one instruction completes every clock cycle, and the clock can run at a much higher frequency.

The 5 Pipeline Stages

Stage | Name | What Happens | Hardware
IF | Instruction Fetch | Read instruction at PC; PC += 4 | IMEM, PC register
ID | Instruction Decode | Read rs1/rs2 from RegFile; generate immediate; decode control signals | RegFile, ImmGen, Control
EX | Execute | ALU computes result or load/store address; branch condition evaluated | ALU, BranchUnit
MEM | Memory | Load or store to DMEM | DMEM
WB | Write-Back | Write result to register file | RegFile write port, WB mux

Pipeline Throughput vs Latency

Consider 5 instructions executing on both CPU types:

Single-cycle (5 ns/instr):
  I1: |─────────────────────|  5 ns
  I2:                        |─────────────────────|  10 ns
  I3:                                               |─────────────────────| 15 ns

Pipelined (1 ns/stage):
  Cycle:  1    2    3    4    5    6    7    8    9
  I1:    [IF] [ID] [EX] [MEM][WB]
  I2:         [IF] [ID] [EX] [MEM][WB]
  I3:              [IF] [ID] [EX] [MEM][WB]
  I4:                   [IF] [ID] [EX] [MEM][WB]
  I5:                        [IF] [ID] [EX] [MEM][WB]

  5 instructions complete in 9 cycles = 9 ns (versus 5 × 5 ns = 25 ns single-cycle). Throughput once the pipeline is full: 1 instr/cycle.
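
The cycle count in the diagram follows a general pattern, which can be written out as a short worked calculation. The symbols here are introduced for illustration only: k is the number of stages, n the number of instructions, and t the stage delay. It assumes perfectly balanced stages, no pipeline-register overhead, and no hazards, so the numbers are an upper bound rather than what a real design would achieve.

```latex
\begin{align*}
T_{\text{single}} &= n \cdot k \cdot t = 5 \cdot 5 \cdot 1\,\text{ns} = 25\,\text{ns} \\
T_{\text{pipe}}   &= (k + n - 1) \cdot t = (5 + 5 - 1) \cdot 1\,\text{ns} = 9\,\text{ns} \\
S &= \frac{T_{\text{single}}}{T_{\text{pipe}}} = \frac{n k}{k + n - 1} = \frac{25}{9} \approx 2.78,
\qquad \lim_{n \to \infty} S = k = 5
\end{align*}
```

The first k cycles fill the pipeline; after that, each additional instruction costs one cycle. For a short run of 5 instructions the fill time dominates, which is why the speedup is about 2.78× rather than the ideal 5×.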

Pipeline Registers

Between each pair of stages sits a pipeline register — a bank of flip-flops that captures all the values the next stage needs. At each rising clock edge, every pipeline register latches its inputs simultaneously, moving every in-flight instruction one stage forward.

There are four pipeline registers:

pipe_regs.v
// pipe_regs.v — The four inter-stage pipeline registers
// for a 5-stage RISC-V pipeline

// ── IF/ID Pipeline Register ───────────────────────────────────────
module if_id_reg (
    input         clk, rst,
    input         stall,   // hold current value if stall=1
    input  [31:0] if_pc, if_inst,
    output reg [31:0] id_pc, id_inst
);
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            id_pc   <= 0;
            id_inst <= 32'h00000013; // NOP
        end else if (!stall) begin
            id_pc   <= if_pc;
            id_inst <= if_inst;
        end
    end
endmodule

// ── ID/EX Pipeline Register ───────────────────────────────────────
module id_ex_reg (
    input        clk, rst, flush,
    // Control signals
    input        id_RegWrite, id_ALUSrc, id_MemRead,
    input        id_MemWrite, id_WBSel, id_Branch,
    input        id_Jal, id_Jalr,
    input [3:0]  id_ALUOp,
    // Data
    input [31:0] id_pc, id_rdata1, id_rdata2, id_imm,
    input [4:0]  id_rs1, id_rs2, id_rd,
    input [2:0]  id_funct3,
    // Outputs
    output reg        ex_RegWrite, ex_ALUSrc, ex_MemRead,
    output reg        ex_MemWrite, ex_WBSel, ex_Branch,
    output reg        ex_Jal, ex_Jalr,
    output reg [3:0]  ex_ALUOp,
    output reg [31:0] ex_pc, ex_rdata1, ex_rdata2, ex_imm,
    output reg [4:0]  ex_rs1, ex_rs2, ex_rd,
    output reg [2:0]  ex_funct3
);
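    // rst is asynchronous; flush is synchronous, so it gets its own branch.
    // Testing "rst || flush" in a single condition does not match the
    // async-reset template that many synthesis tools expect.
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            ex_RegWrite <= 0; ex_ALUSrc <= 0; ex_MemRead <= 0;
            ex_MemWrite <= 0; ex_WBSel  <= 0; ex_Branch  <= 0;
            ex_Jal <= 0; ex_Jalr <= 0; ex_ALUOp <= 0;
            ex_pc <= 0; ex_rdata1 <= 0; ex_rdata2 <= 0; ex_imm <= 0;
            ex_rs1 <= 0; ex_rs2 <= 0; ex_rd <= 0; ex_funct3 <= 0;
        end else if (flush) begin
            // Insert NOP bubble: all control signals cleared, so the
            // instruction in EX changes no register or memory state
            ex_RegWrite <= 0; ex_ALUSrc <= 0; ex_MemRead <= 0;
            ex_MemWrite <= 0; ex_WBSel  <= 0; ex_Branch  <= 0;
            ex_Jal <= 0; ex_Jalr <= 0; ex_ALUOp <= 0;
            ex_pc <= 0; ex_rdata1 <= 0; ex_rdata2 <= 0; ex_imm <= 0;
            ex_rs1 <= 0; ex_rs2 <= 0; ex_rd <= 0; ex_funct3 <= 0;
        end else begin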
            ex_RegWrite <= id_RegWrite; ex_ALUSrc <= id_ALUSrc;
            ex_MemRead  <= id_MemRead;  ex_MemWrite <= id_MemWrite;
            ex_WBSel    <= id_WBSel;   ex_Branch   <= id_Branch;
            ex_Jal      <= id_Jal;     ex_Jalr     <= id_Jalr;
            ex_ALUOp    <= id_ALUOp;
            ex_pc       <= id_pc;   ex_rdata1 <= id_rdata1;
            ex_rdata2   <= id_rdata2; ex_imm   <= id_imm;
            ex_rs1      <= id_rs1;  ex_rs2    <= id_rs2;
            ex_rd       <= id_rd;   ex_funct3 <= id_funct3;
        end
    end
endmodule

// ── EX/MEM Pipeline Register ──────────────────────────────────────
module ex_mem_reg (
    input        clk, rst,
    input        ex_RegWrite, ex_MemRead, ex_MemWrite,
    input        ex_WBSel, ex_Branch, ex_branch_taken,
    input [31:0] ex_alu_out, ex_rdata2, ex_branch_target,
    input [4:0]  ex_rd,
    input [2:0]  ex_funct3,
    output reg        mem_RegWrite, mem_MemRead, mem_MemWrite,
    output reg        mem_WBSel, mem_Branch, mem_branch_taken,
    output reg [31:0] mem_alu_out, mem_rdata2, mem_branch_target,
    output reg [4:0]  mem_rd,
    output reg [2:0]  mem_funct3
);
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            mem_RegWrite <= 0; mem_MemRead <= 0; mem_MemWrite <= 0;
            mem_WBSel <= 0; mem_Branch <= 0; mem_branch_taken <= 0;
            mem_alu_out <= 0; mem_rdata2 <= 0; mem_branch_target <= 0;
            mem_rd <= 0; mem_funct3 <= 0;
        end else begin
            mem_RegWrite     <= ex_RegWrite;
            mem_MemRead      <= ex_MemRead;
            mem_MemWrite     <= ex_MemWrite;
            mem_WBSel        <= ex_WBSel;
            mem_Branch       <= ex_Branch;
            mem_branch_taken <= ex_branch_taken;
            mem_alu_out      <= ex_alu_out;
            mem_rdata2       <= ex_rdata2;
            mem_branch_target <= ex_branch_target;
            mem_rd           <= ex_rd;
            mem_funct3       <= ex_funct3;
        end
    end
endmodule

// ── MEM/WB Pipeline Register ──────────────────────────────────────
module mem_wb_reg (
    input        clk, rst,
    input        mem_RegWrite, mem_WBSel,
    input [31:0] mem_alu_out, mem_dmem_rdata,
    input [4:0]  mem_rd,
    output reg        wb_RegWrite, wb_WBSel,
    output reg [31:0] wb_alu_out, wb_dmem_rdata,
    output reg [4:0]  wb_rd
);
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            wb_RegWrite <= 0; wb_WBSel <= 0;
            wb_alu_out <= 0; wb_dmem_rdata <= 0; wb_rd <= 0;
        end else begin
            wb_RegWrite   <= mem_RegWrite;
            wb_WBSel      <= mem_WBSel;
            wb_alu_out    <= mem_alu_out;
            wb_dmem_rdata <= mem_dmem_rdata;
            wb_rd         <= mem_rd;
        end
    end
endmodule
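
To see the stall input of if_id_reg in action, the register can be exercised with a small self-checking testbench. The following is a minimal sketch, not part of the original series: the file name tb_if_id.v, the module name tb_if_id, the check task, and the 10 ns clock period are all hypothetical choices made for illustration. It needs to be compiled together with pipe_regs.v.

```verilog
// tb_if_id.v : hypothetical self-checking testbench for if_id_reg
// (names and timing are illustrative assumptions, not from the series)
`timescale 1ns/1ps
module tb_if_id;
    reg         clk = 0, rst = 1, stall = 0;
    reg  [31:0] if_pc = 0, if_inst = 32'h00000013;
    wire [31:0] id_pc, id_inst;

    if_id_reg dut (.clk(clk), .rst(rst), .stall(stall),
                   .if_pc(if_pc), .if_inst(if_inst),
                   .id_pc(id_pc), .id_inst(id_inst));

    always #5 clk = ~clk;   // 10 ns period, arbitrary for simulation

    // Stop the simulation with a message if ID holds unexpected values
    task check(input [31:0] exp_pc, input [31:0] exp_inst);
        if (id_pc !== exp_pc || id_inst !== exp_inst) begin
            $display("FAIL: id_pc=%h id_inst=%h", id_pc, id_inst);
            $finish;
        end
    endtask

    initial begin
        // Reset should leave a NOP (addi x0, x0, 0) in the ID stage
        #12 check(32'h0, 32'h00000013);
        rst = 0;

        // Normal operation: fetched values reach ID after one rising edge
        if_pc = 32'h4; if_inst = 32'h00500093;   // addi x1, x0, 5
        @(posedge clk); #1 check(32'h4, 32'h00500093);

        // Stall: IF presents a new instruction, but ID keeps the old one
        stall = 1;
        if_pc = 32'h8; if_inst = 32'h00A00113;   // addi x2, x0, 10
        @(posedge clk); #1 check(32'h4, 32'h00500093);

        // Release the stall: the held-off instruction now advances
        stall = 0;
        @(posedge clk); #1 check(32'h8, 32'h00A00113);

        $display("PASS");
        $finish;
    end
endmodule
```

With Icarus Verilog, for example, this can be run as `iverilog -o tb_if_id pipe_regs.v tb_if_id.v` followed by `vvp tb_if_id`. The same pattern (drive inputs, wait for a rising edge, compare outputs) carries over to the other three registers.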

Hazards — The Challenge of Pipelining

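Pipelining creates hazards: situations where an instruction cannot safely execute in its scheduled cycle, most often because it depends on a result that has not yet been written back. There are three types:

Structural hazards: two instructions need the same hardware resource in the same cycle. This design avoids the most common case by using separate instruction and data memories (IMEM and DMEM).
Data hazards: an instruction reads a register that an earlier, still in-flight instruction has not yet written back.
Control hazards: the instructions fetched after a branch or jump may be the wrong ones, because the outcome is not known until a later stage (EX in this design).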
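
One hazard that calls for a stall is the load-use case: a load is in EX, and the instruction right behind it in ID needs the loaded value, which will not exist until the end of MEM. As a minimal sketch of how the stall input of if_id_reg and the flush input of id_ex_reg might be driven, consider the following. The module name hazard_unit is hypothetical, and the sketch assumes that the PC register is also held while stall is high and that other data hazards are handled elsewhere (for example by forwarding); neither is shown in the listing above.

```verilog
// hazard_unit.v : hypothetical load-use hazard detector (sketch only)
// Assumes: PC is held while stall=1, and other data hazards are
// resolved by separate logic such as forwarding.
module hazard_unit (
    input        ex_MemRead,       // instruction in EX is a load
    input  [4:0] ex_rd,            // destination register of that load
    input  [4:0] id_rs1, id_rs2,   // source registers of the instruction in ID
    output       stall,            // hold IF/ID (and the PC) for one cycle
    output       flush             // insert a NOP bubble into ID/EX
);
    // x0 is hardwired to zero, so a load targeting x0 creates no dependency.
    // Conservative: this also stalls when the instruction in ID does not
    // actually read rs2 (e.g. I-type), which costs a cycle but is still correct.
    wire load_use = ex_MemRead && (ex_rd != 5'd0) &&
                    ((ex_rd == id_rs1) || (ex_rd == id_rs2));

    assign stall = load_use;
    assign flush = load_use;
endmodule
```

Holding IF/ID while bubbling ID/EX makes the dependent instruction repeat its decode one cycle later, by which time the load has moved on to MEM.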

Day 17 Takeaways

FAQ

What are the 5 stages of a RISC-V pipeline?

IF fetches the instruction; ID decodes it and reads registers; EX runs the ALU; MEM accesses data memory; WB writes the result back to the register file.

What are pipeline registers?

Banks of flip-flops between stages that hold all values needed by the next stage. They update on every rising clock edge (unless stalled), moving instructions forward one stage per cycle.

How does pipelining improve throughput?

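Different instructions occupy different stages simultaneously. Once full, the pipeline completes one instruction per cycle, and because each stage does only about one-fifth of the work, the clock can run up to roughly 5× faster than in the single-cycle design. Ideal throughput therefore approaches 5× that of the single-cycle CPU; pipeline-register overhead, unbalanced stages, and hazards reduce this in practice.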
