DAY 17 · PHASE 3 — PIPELINE & OPTIMIZE

The 5-Stage Pipeline

Q: What are the 5 stages of a RISC-V pipeline?

IF (Instruction Fetch) reads the instruction from imem using the current PC. ID (Instruction Decode) reads register operands from the register file and generates the immediate. EX (Execute) runs the ALU operation or computes the load/store address. MEM (Memory) reads or writes data memory. WB (Write-Back) writes the result back to the register file.

Q: What are pipeline registers?

Pipeline registers are flip-flop banks placed between pipeline stages. They hold all the values that the next stage needs — instruction fields, PC, control signals, ALU results. At each clock edge all pipeline registers update simultaneously, advancing each instruction one stage forward.

Q: How does pipelining improve throughput?

In a single-cycle CPU, the clock period must be long enough for the slowest instruction (e.g. a load, which passes through all 5 sub-operations). In a pipelined CPU the stages work in parallel, each on a different instruction. Once the pipeline is full, one instruction completes every clock cycle, and the clock period shrinks to roughly one stage delay. With perfectly balanced stages and no hazards, that is up to five times the throughput of the single-cycle design.

By EcrioniX · Updated 2026-06-11

The single-cycle CPU from Days 7–15 works correctly, but it is slow: the clock must wait for the longest instruction path. Pipelining splits execution into 5 overlapping stages, so that while one instruction is in EX, the next is in ID and the one after that is already being fetched. The result: ideally, one instruction completes every clock cycle, at a much higher clock frequency.

The 5 Pipeline Stages

Stage	Name	What Happens	Hardware
IF	Instruction Fetch	Read instruction at PC; PC += 4	IMEM, PC register
ID	Instruction Decode	Read rs1/rs2 from RegFile; generate immediate; decode control signals	RegFile, ImmGen, Control
EX	Execute	ALU computes result or load/store address; branch condition evaluated	ALU, BranchUnit
MEM	Memory	Load or store to DMEM	DMEM
WB	Write-Back	Write result to register file	RegFile write port, WB mux
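The EX row's branch check could be sketched as a small combinational unit that produces the `ex_branch_taken` and `ex_branch_target` values the EX/MEM register in pipe_regs.v captures. This is a minimal sketch, not the series' BranchUnit: the module name `branch_unit` is hypothetical, it assumes ImmGen delivers the sign-extended B-type offset in `ex_imm`, and it covers only conditional branches (JAL/JALR targets are left out). The funct3 encodings are the standard RV32I ones.

```verilog
// branch_unit.v (hypothetical): EX-stage decision for B-type branches
module branch_unit (
    input         ex_Branch,            // 1 = instruction in EX is a branch
    input  [2:0]  ex_funct3,            // selects the comparison
    input  [31:0] ex_rdata1, ex_rdata2, // rs1 and rs2 values
    input  [31:0] ex_pc, ex_imm,        // ex_imm: assumed sign-extended B offset
    output        ex_branch_taken,
    output [31:0] ex_branch_target
);
    reg cond;

    // PC-relative target for B-type (JALR would use rs1 + imm instead)
    assign ex_branch_target = ex_pc + ex_imm;

    always @(*) begin
        case (ex_funct3)
            3'b000:  cond = (ex_rdata1 == ex_rdata2);                   // BEQ
            3'b001:  cond = (ex_rdata1 != ex_rdata2);                   // BNE
            3'b100:  cond = ($signed(ex_rdata1) <  $signed(ex_rdata2)); // BLT
            3'b101:  cond = ($signed(ex_rdata1) >= $signed(ex_rdata2)); // BGE
            3'b110:  cond = (ex_rdata1 <  ex_rdata2);                   // BLTU
            3'b111:  cond = (ex_rdata1 >= ex_rdata2);                   // BGEU
            default: cond = 1'b0;  // 010 and 011 are not branch encodings
        endcase
    end

    // Only a real branch can be taken
    assign ex_branch_taken = ex_Branch && cond;
endmodule
```

The only difference between BLT/BGE and BLTU/BGEU is the `$signed` cast; the comparator inputs and the target adder are shared by all six branches.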

Pipeline Throughput vs Latency

Consider 5 instructions executing on both CPU types:

Single-cycle (5 ns/instr):
  I1: |─────────────────────|  5 ns
  I2:                        |─────────────────────|  10 ns
  I3:                                               |─────────────────────|  15 ns
  I4, I5: same pattern, finishing at 20 ns and 25 ns

Pipelined (1 ns/stage):
  Cycle:  1    2    3    4    5    6    7    8    9
  I1:    [IF] [ID] [EX] [MEM][WB]
  I2:         [IF] [ID] [EX] [MEM][WB]
  I3:              [IF] [ID] [EX] [MEM][WB]
  I4:                   [IF] [ID] [EX] [MEM][WB]
  I5:                        [IF] [ID] [EX] [MEM][WB]

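  5 instructions complete in 9 cycles (9 ns), versus 25 ns single-cycle. Cycles 1–4 only fill the pipeline; from cycle 5 on, one instruction completes per cycle (steady-state throughput: 1 instr/cycle).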
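The diagram generalizes to N instructions on a k-stage pipeline with stage delay t_stage (symbols introduced here): the first instruction needs k cycles to drain through, and each later one adds a single cycle. The derivation below assumes perfectly balanced stages and ignores pipeline-register overhead and hazards, as the diagram does.

```latex
\begin{aligned}
T_{\text{single}} &= N\,k\,t_{\text{stage}} = 5 \cdot 5 \cdot 1\,\text{ns} = 25\,\text{ns} \\
T_{\text{pipe}}   &= (k + N - 1)\,t_{\text{stage}} = (5 + 5 - 1) \cdot 1\,\text{ns} = 9\,\text{ns} \\
S &= \frac{T_{\text{single}}}{T_{\text{pipe}}} = \frac{N\,k}{k + N - 1} = \frac{25}{9} \approx 2.8,
\qquad \lim_{N \to \infty} S = k = 5
\end{aligned}
```

So 5× is a limit approached only for long instruction streams: for N = 1000 the formula gives 5000/1004 ≈ 4.98. Real pipelines land somewhat lower still, because stages are never perfectly balanced and each pipeline register adds its own clock-to-Q and setup time to every cycle.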

Pipeline Registers

Between each pair of stages sits a pipeline register, a bank of flip-flops that captures all the values the next stage needs. At each rising clock edge, every pipeline register latches its inputs simultaneously (unless a stall holds it), moving every in-flight instruction one stage forward.

There are four pipeline registers:

IF/ID: the fetched instruction word and its PC
ID/EX: decoded operands (rdata1, rdata2, imm), control signals, rs1, rs2, rd, funct3, PC
EX/MEM: ALU result, rdata2 (store data), branch outcome and target, control signals, funct3, rd
MEM/WB: both the ALU result and the DMEM read data (WBSel picks one in WB), RegWrite, WBSel, rd
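All four registers share one pattern: capture the input on the rising edge, hold it while stalled, and load a bubble on reset or flush. A minimal parameterized sketch of that pattern follows; the module `pipe_reg` and its parameters `W` (bus width) and `BUBBLE` (the value loaded on reset or flush) are hypothetical, not part of the series' code. Flush is given priority over stall: if both are asserted together, the held instruction is on the wrong path anyway, so discarding it is the safe choice.

```verilog
// pipe_reg.v (hypothetical): generic stall/flush pipeline register
module pipe_reg #(
    parameter W      = 32,
    parameter BUBBLE = {W{1'b0}}    // value loaded on reset or flush
) (
    input              clk, rst,
    input              stall,       // 1 = keep current contents
    input              flush,       // 1 = replace contents with BUBBLE
    input      [W-1:0] d,
    output reg [W-1:0] q
);
    always @(posedge clk or posedge rst) begin
        if (rst)
            q <= BUBBLE;            // asynchronous reset
        else if (flush)
            q <= BUBBLE;            // synchronous flush beats stall
        else if (!stall)
            q <= d;                 // normal advance
    end
endmodule
```

For example, IF/ID corresponds to W = 64 with d = {if_pc, if_inst} and BUBBLE = {32'd0, 32'h00000013}, so a flush leaves a NOP (addi x0, x0, 0) in ID.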

pipe_regs.v

// pipe_regs.v — The four inter-stage pipeline registers
// for a 5-stage RISC-V pipeline

// ── IF/ID Pipeline Register ───────────────────────────────────────
module if_id_reg (
    input         clk, rst,
    input         stall,   // hold current value if stall=1
    input  [31:0] if_pc, if_inst,
    output reg [31:0] id_pc, id_inst
);
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            id_pc   <= 0;
            id_inst <= 32'h00000013; // NOP
        end else if (!stall) begin
            id_pc   <= if_pc;
            id_inst <= if_inst;
        end
    end
endmodule

// ── ID/EX Pipeline Register ───────────────────────────────────────
module id_ex_reg (
    input        clk, rst, flush,
    // Control signals
    input        id_RegWrite, id_ALUSrc, id_MemRead,
    input        id_MemWrite, id_WBSel, id_Branch,
    input        id_Jal, id_Jalr,
    input [3:0]  id_ALUOp,
    // Data
    input [31:0] id_pc, id_rdata1, id_rdata2, id_imm,
    input [4:0]  id_rs1, id_rs2, id_rd,
    input [2:0]  id_funct3,
    // Outputs
    output reg        ex_RegWrite, ex_ALUSrc, ex_MemRead,
    output reg        ex_MemWrite, ex_WBSel, ex_Branch,
    output reg        ex_Jal, ex_Jalr,
    output reg [3:0]  ex_ALUOp,
    output reg [31:0] ex_pc, ex_rdata1, ex_rdata2, ex_imm,
    output reg [4:0]  ex_rs1, ex_rs2, ex_rd,
    output reg [2:0]  ex_funct3
);
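    // rst is asynchronous; flush is synchronous, so it gets its own branch
    // instead of being OR-ed into the async-reset test (many synthesis
    // tools reject `if (rst || flush)` under `posedge rst`).
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            ex_RegWrite <= 0; ex_ALUSrc <= 0; ex_MemRead <= 0;
            ex_MemWrite <= 0; ex_WBSel  <= 0; ex_Branch  <= 0;
            ex_Jal <= 0; ex_Jalr <= 0; ex_ALUOp <= 0;
            ex_pc <= 0; ex_rdata1 <= 0; ex_rdata2 <= 0; ex_imm <= 0;
            ex_rs1 <= 0; ex_rs2 <= 0; ex_rd <= 0; ex_funct3 <= 0;
        end else if (flush) begin
            // Insert NOP bubble: zero all control signals and fields
            ex_RegWrite <= 0; ex_ALUSrc <= 0; ex_MemRead <= 0;
            ex_MemWrite <= 0; ex_WBSel  <= 0; ex_Branch  <= 0;
            ex_Jal <= 0; ex_Jalr <= 0; ex_ALUOp <= 0;
            ex_pc <= 0; ex_rdata1 <= 0; ex_rdata2 <= 0; ex_imm <= 0;
            ex_rs1 <= 0; ex_rs2 <= 0; ex_rd <= 0; ex_funct3 <= 0;
        end else begin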
            ex_RegWrite <= id_RegWrite; ex_ALUSrc <= id_ALUSrc;
            ex_MemRead  <= id_MemRead;  ex_MemWrite <= id_MemWrite;
            ex_WBSel    <= id_WBSel;   ex_Branch   <= id_Branch;
            ex_Jal      <= id_Jal;     ex_Jalr     <= id_Jalr;
            ex_ALUOp    <= id_ALUOp;
            ex_pc       <= id_pc;   ex_rdata1 <= id_rdata1;
            ex_rdata2   <= id_rdata2; ex_imm   <= id_imm;
            ex_rs1      <= id_rs1;  ex_rs2    <= id_rs2;
            ex_rd       <= id_rd;   ex_funct3 <= id_funct3;
        end
    end
endmodule

// ── EX/MEM Pipeline Register ──────────────────────────────────────
module ex_mem_reg (
    input        clk, rst,
    input        ex_RegWrite, ex_MemRead, ex_MemWrite,
    input        ex_WBSel, ex_Branch, ex_branch_taken,
    input [31:0] ex_alu_out, ex_rdata2, ex_branch_target,
    input [4:0]  ex_rd,
    input [2:0]  ex_funct3,
    output reg        mem_RegWrite, mem_MemRead, mem_MemWrite,
    output reg        mem_WBSel, mem_Branch, mem_branch_taken,
    output reg [31:0] mem_alu_out, mem_rdata2, mem_branch_target,
    output reg [4:0]  mem_rd,
    output reg [2:0]  mem_funct3
);
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            mem_RegWrite <= 0; mem_MemRead <= 0; mem_MemWrite <= 0;
            mem_WBSel <= 0; mem_Branch <= 0; mem_branch_taken <= 0;
            mem_alu_out <= 0; mem_rdata2 <= 0; mem_branch_target <= 0;
            mem_rd <= 0; mem_funct3 <= 0;
        end else begin
            mem_RegWrite     <= ex_RegWrite;
            mem_MemRead      <= ex_MemRead;
            mem_MemWrite     <= ex_MemWrite;
            mem_WBSel        <= ex_WBSel;
            mem_Branch       <= ex_Branch;
            mem_branch_taken <= ex_branch_taken;
            mem_alu_out      <= ex_alu_out;
            mem_rdata2       <= ex_rdata2;
            mem_branch_target <= ex_branch_target;
            mem_rd           <= ex_rd;
            mem_funct3       <= ex_funct3;
        end
    end
endmodule

// ── MEM/WB Pipeline Register ──────────────────────────────────────
module mem_wb_reg (
    input        clk, rst,
    input        mem_RegWrite, mem_WBSel,
    input [31:0] mem_alu_out, mem_dmem_rdata,
    input [4:0]  mem_rd,
    output reg        wb_RegWrite, wb_WBSel,
    output reg [31:0] wb_alu_out, wb_dmem_rdata,
    output reg [4:0]  wb_rd
);
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            wb_RegWrite <= 0; wb_WBSel <= 0;
            wb_alu_out <= 0; wb_dmem_rdata <= 0; wb_rd <= 0;
        end else begin
            wb_RegWrite   <= mem_RegWrite;
            wb_WBSel      <= mem_WBSel;
            wb_alu_out    <= mem_alu_out;
            wb_dmem_rdata <= mem_dmem_rdata;
            wb_rd         <= mem_rd;
        end
    end
endmodule
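On the far side of MEM/WB, the WB row of the stage table reduces to a 2:1 mux plus the register-file write enable. A minimal sketch, assuming `WBSel = 1` selects the DMEM read data (the code above does not fix the encoding); the module `wb_stage` and its `rf_*` outputs are hypothetical names:

```verilog
// wb_stage.v (hypothetical): write-back mux driven by MEM/WB outputs
module wb_stage (
    input         wb_RegWrite, wb_WBSel,
    input  [31:0] wb_alu_out, wb_dmem_rdata,
    input  [4:0]  wb_rd,
    output        rf_we,      // register-file write enable
    output [4:0]  rf_waddr,
    output [31:0] rf_wdata
);
    // Assumed encoding: WBSel = 1 -> load data, 0 -> ALU result
    assign rf_wdata = wb_WBSel ? wb_dmem_rdata : wb_alu_out;
    assign rf_waddr = wb_rd;
    // Never write x0: it must always read as zero
    assign rf_we    = wb_RegWrite && (wb_rd != 5'd0);
endmodule
```

Blocking writes to x0 here, rather than inside the register file, is just one place to enforce that x0 reads as zero; either location works as long as it is done once.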

Hazards — The Challenge of Pipelining

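Pipelining creates hazards: situations where the next instruction cannot safely proceed in the following cycle, either because it needs a register value that an earlier instruction has not written back yet, or because the address of the next instruction is not yet known. This series handles three cases: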

Data hazard (RAW, Read After Write): a later instruction (typically N+1 or N+2) reads a register that instruction N has not yet written back. Solved by forwarding (Day 18).
Load-use hazard: a special case of RAW. A load's result is not available until the end of MEM, so an instruction that uses it immediately still needs a 1-cycle stall even with forwarding (Day 20).
Control hazard: a branch's outcome and target are not known until EX, so if the branch is taken, the two instructions fetched behind it must be flushed (Day 19).
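As a preview of the load-use case, detection needs only signals already held in the pipeline registers: the instruction in EX is a load (`ex_MemRead`) whose `ex_rd` matches a source register of the instruction in ID. The sketch below assumes rs1 and rs2 are taken straight from `id_inst[19:15]` and `id_inst[24:20]` (the standard RISC-V field positions) and, for simplicity, compares both even for formats that do not read rs2, which can cost an occasional unnecessary stall. The module name `load_use_detect` is hypothetical, and the series' own Day 20 logic may differ.

```verilog
// load_use_detect.v (hypothetical): load-use hazard check in ID
module load_use_detect (
    input         ex_MemRead,  // instruction in EX is a load
    input  [4:0]  ex_rd,       // its destination register
    input  [31:0] id_inst,     // instruction currently in ID
    output        stall        // 1 = hold PC and IF/ID, flush ID/EX
);
    wire [4:0] id_rs1 = id_inst[19:15];
    wire [4:0] id_rs2 = id_inst[24:20];

    // x0 never carries a real dependency, so ignore ex_rd == 0
    assign stall = ex_MemRead && (ex_rd != 5'd0) &&
                   ((ex_rd == id_rs1) || (ex_rd == id_rs2));
endmodule
```

When `stall` is 1, it would drive the `stall` input of `if_id_reg` (and hold the PC) while asserting `flush` on `id_ex_reg`, so the load moves on to MEM and a bubble enters EX behind it.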

Day 17 Takeaways

The 5 stages are IF → ID → EX → MEM → WB. Each takes one clock cycle.
Pipeline registers are the only new hardware — four banks of flip-flops between stages.
Throughput becomes 1 instruction per cycle (IPC = 1) once the pipeline is full.
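Latency for a single instruction does not improve: it still takes 5 cycles (5 ns at 1 ns/stage), the same 5 ns a single-cycle instruction took. The gain comes from overlapping many instructions, which raises throughput.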
Days 18–20 handle the three hazard cases, which would otherwise produce wrong results and, where a stall or flush is unavoidable, keep the pipeline below IPC = 1.

FAQ

What are the 5 stages of a RISC-V pipeline?

IF fetches the instruction; ID decodes it and reads registers; EX runs the ALU; MEM accesses data memory; WB writes the result back to the register file.

What are pipeline registers?

Banks of flip-flops between stages that hold all values needed by the next stage. They update on every clock edge, moving instructions forward one stage per cycle.

How does pipelining improve throughput?

Different instructions occupy different stages simultaneously. Once full, the pipeline completes one instruction per cycle. Ideally, that matches the throughput of five single-cycle CPUs running in parallel, or of one single-cycle CPU clocked five times faster.
