The single-cycle CPU from Days 7–15 works correctly, but it is slow: the clock period must be long enough for the slowest instruction path. Pipelining splits execution into 5 stages that work on different instructions at the same time. While one instruction is in EX, the next is in ID and the one after is already being fetched. Once the pipeline is full, ideally one instruction completes every clock cycle. Because each stage does only a fraction of the work, that cycle can be much shorter.
| Stage | Name | What Happens | Hardware |
|---|---|---|---|
| IF | Instruction Fetch | Read instruction at PC; PC += 4 | IMEM, PC register |
| ID | Instruction Decode | Read rs1/rs2 from RegFile; generate immediate; decode control signals | RegFile, ImmGen, Control |
| EX | Execute | ALU computes result or load/store address; branch condition evaluated | ALU, BranchUnit |
| MEM | Memory | Load or store to DMEM | DMEM |
| WB | Write-Back | Write result to register file | RegFile write port, WB mux |
Consider 5 instructions executing on both CPU types:
```text
Single-cycle (5 ns per instruction):
I1: |────────|                       5 ns
I2:          |────────|             10 ns
I3:                   |────────|    15 ns

Pipelined (1 ns per stage):
Cycle: 1    2    3    4    5    6    7    8    9
I1:    [IF] [ID] [EX] [MEM][WB]
I2:         [IF] [ID] [EX] [MEM][WB]
I3:              [IF] [ID] [EX] [MEM][WB]
I4:                   [IF] [ID] [EX] [MEM][WB]
I5:                        [IF] [ID] [EX] [MEM][WB]
```

Five instructions finish in 9 cycles (9 ns) instead of the 25 ns the single-cycle CPU needs. Once the first instruction has filled the pipeline (cycle 5), one instruction completes every cycle, so steady-state throughput is 1 instruction per cycle.
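The 9-cycle count follows a general pattern. Assuming an ideal pipeline (no hazards, no pipeline-register delay) with $k$ stages of length $t_\text{stage}$ running $n$ instructions, the first instruction needs $k$ cycles to reach WB and each later one finishes one cycle after its predecessor:

$$
T_\text{single} = n\, t_\text{instr} = 5 \times 5\ \text{ns} = 25\ \text{ns}
$$

$$
T_\text{pipe} = (k + n - 1)\, t_\text{stage} = (5 + 5 - 1) \times 1\ \text{ns} = 9\ \text{ns}
$$

$$
\text{Speedup} = \frac{T_\text{single}}{T_\text{pipe}} = \frac{25}{9} \approx 2.8,
\qquad
\lim_{n \to \infty} \frac{n\, t_\text{instr}}{(k + n - 1)\, t_\text{stage}} = \frac{t_\text{instr}}{t_\text{stage}} = 5
$$

For only five instructions the fill time dominates; the speedup approaches 5 only for long instruction streams.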
Between each pair of stages sits a pipeline register: a bank of flip-flops that captures all the values the next stage needs. At each rising clock edge, every pipeline register latches its inputs simultaneously, moving every in-flight instruction one stage forward. The only exceptions are a stalled register, which holds its value, and a flushed one, which is cleared to a NOP bubble.
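In Verilog, that simultaneous latching is what nonblocking assignments (`<=`) express. The sketch below assumes a hypothetical `stage_shift` module, not part of the CPU, whose four registers carry only an 8-bit instruction tag instead of real fields, so you can watch a tag advance one stage per edge:

```verilog
// stage_shift.v: hypothetical demo, not part of the CPU.
// Four 8-bit registers stand in for IF/ID, ID/EX, EX/MEM and MEM/WB and
// carry only an instruction tag (e.g. 1 for I1) instead of real fields.
module stage_shift (
  input            clk, rst,
  input      [7:0] if_tag,   // tag of the instruction currently in IF
  output reg [7:0] id_tag, ex_tag, mem_tag, wb_tag
);
  always @(posedge clk or posedge rst) begin
    if (rst) begin
      id_tag <= 8'd0; ex_tag <= 8'd0; mem_tag <= 8'd0; wb_tag <= 8'd0;
    end else begin
      // Nonblocking: every right-hand side is sampled before any register
      // updates, so each tag moves exactly one stage per rising edge.
      id_tag  <= if_tag;
      ex_tag  <= id_tag;
      mem_tag <= ex_tag;
      wb_tag  <= mem_tag;
    end
  end
endmodule
```

Driving `if_tag` with 1 through 5 on five consecutive edges leaves ID, EX, MEM, and WB holding 5, 4, 3, and 2. That matches cycle 6 of the pipelined diagram. With blocking assignments (`=`) in this order, a tag would instead ripple from IF to WB in a single edge.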
There are four pipeline registers, each named for the two stages it separates: IF/ID, ID/EX, EX/MEM, and MEM/WB.
```verilog
// pipe_regs.v: The four inter-stage pipeline registers
// for a 5-stage RISC-V pipeline

// ── IF/ID Pipeline Register ───────────────────────────────────────
module if_id_reg (
  input             clk, rst,
  input             stall,            // hold current value if stall=1
  input      [31:0] if_pc, if_inst,
  output reg [31:0] id_pc, id_inst
);
  always @(posedge clk or posedge rst) begin
    if (rst) begin
      id_pc   <= 32'b0;
      id_inst <= 32'h00000013;        // NOP (addi x0, x0, 0)
    end else if (!stall) begin
      id_pc   <= if_pc;
      id_inst <= if_inst;
    end
  end
endmodule

// ── ID/EX Pipeline Register ───────────────────────────────────────
module id_ex_reg (
  input             clk, rst, flush,
  // Control signals
  input             id_RegWrite, id_ALUSrc, id_MemRead,
  input             id_MemWrite, id_WBSel, id_Branch,
  input             id_Jal, id_Jalr,
  input      [3:0]  id_ALUOp,
  // Data
  input      [31:0] id_pc, id_rdata1, id_rdata2, id_imm,
  input      [4:0]  id_rs1, id_rs2, id_rd,
  input      [2:0]  id_funct3,
  // Outputs
  output reg        ex_RegWrite, ex_ALUSrc, ex_MemRead,
  output reg        ex_MemWrite, ex_WBSel, ex_Branch,
  output reg        ex_Jal, ex_Jalr,
  output reg [3:0]  ex_ALUOp,
  output reg [31:0] ex_pc, ex_rdata1, ex_rdata2, ex_imm,
  output reg [4:0]  ex_rs1, ex_rs2, ex_rd,
  output reg [2:0]  ex_funct3
);
  // The asynchronous reset is tested on its own so synthesis tools can
  // match it to "posedge rst"; flush is synchronous (sampled on clk).
  always @(posedge clk or posedge rst) begin
    if (rst) begin
      ex_RegWrite <= 0; ex_ALUSrc <= 0; ex_MemRead <= 0;
      ex_MemWrite <= 0; ex_WBSel  <= 0; ex_Branch  <= 0;
      ex_Jal      <= 0; ex_Jalr   <= 0; ex_ALUOp   <= 0;
      ex_pc  <= 0; ex_rdata1 <= 0; ex_rdata2 <= 0; ex_imm    <= 0;
      ex_rs1 <= 0; ex_rs2    <= 0; ex_rd     <= 0; ex_funct3 <= 0;
    end else if (flush) begin
      // Insert NOP bubble: all-zero fields mean no register write,
      // no memory access, and no branch or jump
      ex_RegWrite <= 0; ex_ALUSrc <= 0; ex_MemRead <= 0;
      ex_MemWrite <= 0; ex_WBSel  <= 0; ex_Branch  <= 0;
      ex_Jal      <= 0; ex_Jalr   <= 0; ex_ALUOp   <= 0;
      ex_pc  <= 0; ex_rdata1 <= 0; ex_rdata2 <= 0; ex_imm    <= 0;
      ex_rs1 <= 0; ex_rs2    <= 0; ex_rd     <= 0; ex_funct3 <= 0;
    end else begin
      ex_RegWrite <= id_RegWrite; ex_ALUSrc   <= id_ALUSrc;
      ex_MemRead  <= id_MemRead;  ex_MemWrite <= id_MemWrite;
      ex_WBSel    <= id_WBSel;    ex_Branch   <= id_Branch;
      ex_Jal      <= id_Jal;      ex_Jalr     <= id_Jalr;
      ex_ALUOp    <= id_ALUOp;
      ex_pc       <= id_pc;       ex_rdata1   <= id_rdata1;
      ex_rdata2   <= id_rdata2;   ex_imm      <= id_imm;
      ex_rs1      <= id_rs1;      ex_rs2      <= id_rs2;
      ex_rd       <= id_rd;       ex_funct3   <= id_funct3;
    end
  end
endmodule

// ── EX/MEM Pipeline Register ──────────────────────────────────────
module ex_mem_reg (
  input             clk, rst,
  input             ex_RegWrite, ex_MemRead, ex_MemWrite,
  input             ex_WBSel, ex_Branch, ex_branch_taken,
  input      [31:0] ex_alu_out, ex_rdata2, ex_branch_target,
  input      [4:0]  ex_rd,
  input      [2:0]  ex_funct3,
  output reg        mem_RegWrite, mem_MemRead, mem_MemWrite,
  output reg        mem_WBSel, mem_Branch, mem_branch_taken,
  output reg [31:0] mem_alu_out, mem_rdata2, mem_branch_target,
  output reg [4:0]  mem_rd,
  output reg [2:0]  mem_funct3
);
  always @(posedge clk or posedge rst) begin
    if (rst) begin
      mem_RegWrite <= 0; mem_MemRead <= 0; mem_MemWrite      <= 0;
      mem_WBSel    <= 0; mem_Branch  <= 0; mem_branch_taken  <= 0;
      mem_alu_out  <= 0; mem_rdata2  <= 0; mem_branch_target <= 0;
      mem_rd       <= 0; mem_funct3  <= 0;
    end else begin
      mem_RegWrite      <= ex_RegWrite;
      mem_MemRead       <= ex_MemRead;
      mem_MemWrite      <= ex_MemWrite;
      mem_WBSel         <= ex_WBSel;
      mem_Branch        <= ex_Branch;
      mem_branch_taken  <= ex_branch_taken;
      mem_alu_out       <= ex_alu_out;
      mem_rdata2        <= ex_rdata2;
      mem_branch_target <= ex_branch_target;
      mem_rd            <= ex_rd;
      mem_funct3        <= ex_funct3;
    end
  end
endmodule

// ── MEM/WB Pipeline Register ──────────────────────────────────────
module mem_wb_reg (
  input             clk, rst,
  input             mem_RegWrite, mem_WBSel,
  input      [31:0] mem_alu_out, mem_dmem_rdata,
  input      [4:0]  mem_rd,
  output reg        wb_RegWrite, wb_WBSel,
  output reg [31:0] wb_alu_out, wb_dmem_rdata,
  output reg [4:0]  wb_rd
);
  always @(posedge clk or posedge rst) begin
    if (rst) begin
      wb_RegWrite <= 0; wb_WBSel      <= 0;
      wb_alu_out  <= 0; wb_dmem_rdata <= 0; wb_rd <= 0;
    end else begin
      wb_RegWrite   <= mem_RegWrite;
      wb_WBSel      <= mem_WBSel;
      wb_alu_out    <= mem_alu_out;
      wb_dmem_rdata <= mem_dmem_rdata;
      wb_rd         <= mem_rd;
    end
  end
endmodule
```
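The four modules repeat one pattern with different field lists. One way to cut the repetition is to pack each register's fields into a single bus and give every instance the same reset, stall, and flush behavior. The sketch below assumes a hypothetical parameterized `pipe_reg` module (not part of the original design); the equally hypothetical `if_id_generic` shows it reproducing `if_id_reg`:

```verilog
// pipe_reg.v: hypothetical generic pipeline register (not from the original design).
// rst and flush load BUBBLE; stall holds q; otherwise q captures d on the rising edge.
module pipe_reg #(
  parameter         W      = 64,
  parameter [W-1:0] BUBBLE = {W{1'b0}}   // value that encodes a NOP bubble
) (
  input              clk, rst,
  input              stall, flush,
  input      [W-1:0] d,
  output reg [W-1:0] q
);
  always @(posedge clk or posedge rst) begin
    if (rst)         q <= BUBBLE;  // asynchronous reset
    else if (flush)  q <= BUBBLE;  // synchronous flush wins over stall
    else if (!stall) q <= d;       // normal advance
    // stall=1 and flush=0: q keeps its value
  end
endmodule

// Hypothetical example: the IF/ID register rebuilt from pipe_reg.
// BUBBLE keeps the original reset state (pc = 0, inst = NOP = 32'h00000013).
module if_id_generic (
  input         clk, rst, stall,
  input  [31:0] if_pc, if_inst,
  output [31:0] id_pc, id_inst
);
  pipe_reg #(.W(64), .BUBBLE({32'h0, 32'h00000013})) u_reg (
    .clk(clk), .rst(rst), .stall(stall), .flush(1'b0),
    .d({if_pc, if_inst}), .q({id_pc, id_inst})
  );
endmodule
```

Giving `flush` priority over `stall` is a design choice: a bubble can be inserted even while the register would otherwise hold. The cost is visibility, since signals packed into one bus lose their names at the register boundary, which the explicit per-field modules above avoid.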
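Pipelining also creates hazards: situations in which an instruction cannot safely proceed in its scheduled cycle. There are three types: structural hazards (two instructions need the same hardware resource in the same cycle), data hazards (a later instruction depends on a result that an earlier instruction has not yet written back), and control hazards (instructions after a branch or jump are fetched before the branch outcome and target are known).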
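Detecting a data hazard comes down to comparing register numbers. The sketch below assumes a hypothetical combinational `raw_check` module that reuses the `ex_rd`/`ex_RegWrite` outputs of `id_ex_reg` and the `mem_rd`/`mem_RegWrite` outputs of `ex_mem_reg`. It flags when the instruction in ID reads a register that an older instruction in EX or MEM will write but has not yet written back. It also assumes the register file passes a same-cycle WB write through to its read ports, so an instruction already in WB is ignored. How the pipeline then resolves the hazard (stalling or forwarding) is left out.

```verilog
// raw_check.v: hypothetical read-after-write (RAW) detector, not part of
// the original design. Assumes the register file passes a value written
// in WB straight through to a same-cycle read, so only EX and MEM matter.
module raw_check (
  input  [31:0] id_inst,                    // instruction in the ID stage
  input         ex_RegWrite, mem_RegWrite,  // do the older instructions write rd?
  input  [4:0]  ex_rd,       mem_rd,        // their destination registers
  output        raw_hazard
);
  // RISC-V places rs1 in bits 19:15 and rs2 in bits 24:20 in the formats
  // that have them. Treating both as sources for every instruction is
  // conservative: I-, U- and J-type bits there can cause false hazards.
  wire [4:0] rs1 = id_inst[19:15];
  wire [4:0] rs2 = id_inst[24:20];

  // Writes to x0 are discarded, so they never create a dependency.
  wire ex_writes  = ex_RegWrite  && (ex_rd  != 5'd0);
  wire mem_writes = mem_RegWrite && (mem_rd != 5'd0);

  assign raw_hazard = (ex_writes  && (ex_rd  == rs1 || ex_rd  == rs2)) ||
                      (mem_writes && (mem_rd == rs1 || mem_rd == rs2));
endmodule
```

For example, with `addi x1, x0, 5` in EX and `add x2, x1, x1` (encoded as `32'h00108133`) in ID, `ex_RegWrite` is 1, `ex_rd` is 1, and rs1 = rs2 = 1, so `raw_hazard` is 1: without intervention, `add` would read the stale x1 from the register file.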
IF fetches the instruction; ID decodes it and reads registers; EX runs the ALU; MEM accesses data memory; WB writes the result back to the register file.
Pipeline registers are banks of flip-flops between stages that hold every value the next stage needs. They latch on each rising clock edge (unless stalled or flushed), moving instructions forward one stage per cycle.
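Different instructions occupy different stages simultaneously. Once full, the pipeline completes one instruction per cycle. Because each cycle only has to cover one stage, the clock can ideally run about 5× faster than in the single-cycle design, so throughput approaches 5× that of the single-cycle CPU. In practice, pipeline-register overhead, unbalanced stage delays, and hazard stalls keep the gain below 5×.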