The pipeline from Day 17 completes one instruction per cycle — in theory. In practice, instructions depend on each other. When instruction N+1 needs the result of instruction N, and that result is still flowing through the pipeline, we have a RAW (Read After Write) hazard. The solution — forwarding — routes results directly from later pipeline stages back to the ALU inputs.
Consider this sequence:
add x1, x2, x3   // writes x1 in WB (cycle 5)
sub x4, x1, x5   // reads x1 in ID (cycle 3): WRONG VALUE!
and x6, x1, x7   // reads x1 in ID (cycle 4): still wrong!
Without forwarding, the register file read in ID returns the old value of x1 because WB has not happened yet. The pipeline would produce incorrect results.
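Instead of waiting for WB, we forward the result from whichever pipeline register already holds it. A forwarding unit compares register numbers and drives the select lines of the ALU input muxes; its ports are: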
| Port | Direction | Width | Description |
|---|---|---|---|
| ex_rs1 | Input | 5 | Source register 1 of current EX-stage instruction |
| ex_rs2 | Input | 5 | Source register 2 of current EX-stage instruction |
| mem_rd | Input | 5 | Destination register of instruction in MEM stage |
| mem_RegWrite | Input | 1 | MEM stage instruction writes a register |
| wb_rd | Input | 5 | Destination register of instruction in WB stage |
| wb_RegWrite | Input | 1 | WB stage instruction writes a register |
| forwardA | Output | 2 | ALU input A mux select: 00=regfile, 10=EX/MEM, 01=MEM/WB |
| forwardB | Output | 2 | ALU input B mux select: 00=regfile, 10=EX/MEM, 01=MEM/WB |
// forward_unit.v — Detects RAW hazards and generates forwarding mux selects
// forwardA/B encoding:
// 2'b00 — use register file value (no hazard)
// 2'b10 — forward from EX/MEM pipeline register (1 instruction ago)
// 2'b01 — forward from MEM/WB pipeline register (2 instructions ago)
module forward_unit (
input [4:0] ex_rs1,
input [4:0] ex_rs2,
input [4:0] mem_rd,
input mem_RegWrite,
input [4:0] wb_rd,
input wb_RegWrite,
output reg [1:0] forwardA,
output reg [1:0] forwardB
);
always @(*) begin
// Default: no forwarding
forwardA = 2'b00;
forwardB = 2'b00;
// EX/MEM forwarding (higher priority — more recent result)
if (mem_RegWrite && (mem_rd != 5'b0)) begin
if (mem_rd == ex_rs1) forwardA = 2'b10;
if (mem_rd == ex_rs2) forwardB = 2'b10;
end
// MEM/WB forwarding (lower priority)
if (wb_RegWrite && (wb_rd != 5'b0)) begin
// Only forward from WB if EX/MEM didn't already cover it
if (wb_rd == ex_rs1 && !(mem_RegWrite && mem_rd == ex_rs1))
forwardA = 2'b01;
if (wb_rd == ex_rs2 && !(mem_RegWrite && mem_rd == ex_rs2))
forwardB = 2'b01;
end
end
endmodule
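Because `forward_unit` is purely combinational, it can be checked on its own before it is wired into the CPU. The testbench below is a minimal sketch and not part of the original series: the file name `tb_forward_unit.v`, the `check` task, and the chosen register numbers are assumptions made for illustration. It drives the ports from the table above and compares `forwardA`/`forwardB` against the documented encoding, including the case where MEM and WB both write the same register.

```verilog
// tb_forward_unit.v (hypothetical): unit test sketch for forward_unit.
// Compile together with forward_unit.v from this section.
`timescale 1ns/1ps
module tb_forward_unit;
    reg  [4:0] ex_rs1, ex_rs2, mem_rd, wb_rd;
    reg        mem_RegWrite, wb_RegWrite;
    wire [1:0] forwardA, forwardB;
    integer    errors;

    forward_unit dut (
        .ex_rs1(ex_rs1), .ex_rs2(ex_rs2),
        .mem_rd(mem_rd), .mem_RegWrite(mem_RegWrite),
        .wb_rd(wb_rd),   .wb_RegWrite(wb_RegWrite),
        .forwardA(forwardA), .forwardB(forwardB)
    );

    // Wait for the combinational logic to settle, then compare both selects
    task check;
        input [7:0] id;
        input [1:0] expA;
        input [1:0] expB;
        begin
            #1;
            if (forwardA !== expA || forwardB !== expB) begin
                errors = errors + 1;
                $display("FAIL case %0d: forwardA=%b forwardB=%b",
                         id, forwardA, forwardB);
            end
        end
    endtask

    initial begin
        errors = 0;
        // The EX-stage instruction reads x2 and x3 in cases 1 to 4
        ex_rs1 = 5'd2; ex_rs2 = 5'd3;
        mem_RegWrite = 1'b1; wb_RegWrite = 1'b1;

        mem_rd = 5'd1; wb_rd = 5'd4; check(1, 2'b00, 2'b00); // no match
        mem_rd = 5'd2; wb_rd = 5'd4; check(2, 2'b10, 2'b00); // EX/MEM writes x2
        mem_rd = 5'd1; wb_rd = 5'd3; check(3, 2'b00, 2'b01); // MEM/WB writes x3
        mem_rd = 5'd2; wb_rd = 5'd2; check(4, 2'b10, 2'b00); // both write x2: newer wins

        // Case 5: x0 is never forwarded, even when rd == rs
        ex_rs1 = 5'd0; ex_rs2 = 5'd0; mem_rd = 5'd0; wb_rd = 5'd0;
        check(5, 2'b00, 2'b00);

        // Case 6: a matching rd is ignored when RegWrite is low
        ex_rs1 = 5'd2; ex_rs2 = 5'd3; mem_rd = 5'd2; wb_rd = 5'd3;
        mem_RegWrite = 1'b0; wb_RegWrite = 1'b0;
        check(6, 2'b00, 2'b00);

        if (errors == 0) $display("PASS: all forward_unit cases");
        else             $display("%0d case(s) failed", errors);
        $finish;
    end
endmodule
```

Case 4 is the one worth watching in a waveform: without the extra `!(mem_RegWrite && mem_rd == ex_rs1)` term in the MEM/WB branch, the later assignment would overwrite `2'b10` with `2'b01` and the ALU would receive the older value.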
The ALU inputs are now driven by 3-input muxes. The forwardA and forwardB signals from forward_unit select which value to feed:
// In the EX stage of the pipelined CPU:
wire [31:0] fwd_a =
(forwardA == 2'b10) ? mem_alu_out : // from EX/MEM
(forwardA == 2'b01) ? wb_result : // from MEM/WB
ex_rdata1; // from register file
wire [31:0] fwd_b_pre =
(forwardB == 2'b10) ? mem_alu_out :
(forwardB == 2'b01) ? wb_result :
ex_rdata2;
// ALUSrc still selects between register and immediate
wire [31:0] fwd_b = ex_ALUSrc ? ex_imm : fwd_b_pre;
alu alu0 (.a(fwd_a), .b(fwd_b), .op(ex_ALUOp),
.result(alu_out), .zero(zero), .lt(lt), .ltu(ltu));
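The two ternary chains above implement the same 3-input selection twice. As a sketch of one way to factor that out, the hypothetical module `fwd_mux3` below wraps the selection and checks it in isolation. The module name, its port names, and the test constants are assumptions for illustration and do not appear elsewhere in this series.

```verilog
// fwd_mux3.v (hypothetical helper): 3-input forwarding mux.
// Select encoding follows forward_unit: 00=regfile, 10=EX/MEM, 01=MEM/WB.
module fwd_mux3 (
    input      [1:0]  sel,       // forwardA or forwardB
    input      [31:0] rf_data,   // value read from the register file
    input      [31:0] mem_data,  // result held in EX/MEM
    input      [31:0] wb_data,   // result held in MEM/WB
    output reg [31:0] out
);
    always @(*) begin
        case (sel)
            2'b10:   out = mem_data;
            2'b01:   out = wb_data;
            default: out = rf_data;  // 2'b00, and the unused 2'b11
        endcase
    end
endmodule

// Self-checking testbench: each source carries a distinct constant
module tb_fwd_mux3;
    reg  [1:0]  sel;
    wire [31:0] out;
    integer     errors;

    fwd_mux3 dut (.sel(sel), .rf_data(32'd1), .mem_data(32'd2),
                  .wb_data(32'd3), .out(out));

    initial begin
        errors = 0;
        sel = 2'b00; #1; if (out !== 32'd1) errors = errors + 1;
        sel = 2'b10; #1; if (out !== 32'd2) errors = errors + 1;
        sel = 2'b01; #1; if (out !== 32'd3) errors = errors + 1;
        sel = 2'b11; #1; if (out !== 32'd1) errors = errors + 1;
        if (errors == 0) $display("PASS: fwd_mux3 selects the expected source");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

In the EX stage, one instance would take `forwardA`, `ex_rdata1`, `mem_alu_out`, and `wb_result` and drive `fwd_a`; a second would do the same for `forwardB`, `ex_rdata2`, and `fwd_b_pre`. The behavior matches the ternary form, including falling back to the register file value for the unused `2'b11` code.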
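// tb_forward.v: Test forwarding between back-to-back dependent instructions
// addi x1,x0,10 → add x2,x1,x1 → addi x3,x2,3 (each consumer needs an EX/MEM forward)
// MEM/WB forwarding is not exercised by this program
// Expected: x1=10, x2=20, x3=23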
`timescale 1ns/1ps
module tb_forward;
reg clk=0, rst=1;
always #5 clk=~clk;
// Assume pipelined_core is the pipelined CPU module
pipelined_core dut(.clk(clk),.rst(rst));
initial begin
// addi x1,x0,10 = 00a00093
dut.imem0.mem[0] = 32'h00a00093;
// add x2,x1,x1 = 00108133 (x2 = x1+x1 = 20 — tests forwarding twice)
dut.imem0.mem[1] = 32'h00108133;
// addi x3,x2,3 = 00310193 (x3 = x2+3 = 23; x2 is forwarded from EX/MEM, since add is one instruction ahead)
dut.imem0.mem[2] = 32'h00310193;
dut.imem0.mem[3] = 32'h0000006f; // jal x0,0 (jump to self), used as halt
$dumpfile("tb_forward.vcd"); $dumpvars(0,tb_forward);
@(posedge clk); @(posedge clk); rst=0;
repeat(15) @(posedge clk); #1;
if(dut.rf.regs[1]===32'd10) $display("PASS: x1=10");
else $display("FAIL: x1=%0d",dut.rf.regs[1]);
if(dut.rf.regs[2]===32'd20) $display("PASS: x2=20");
else $display("FAIL: x2=%0d",dut.rf.regs[2]);
if(dut.rf.regs[3]===32'd23) $display("PASS: x3=23");
else $display("FAIL: x3=%0d",dut.rf.regs[3]);
$finish;
end
endmodule
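Read After Write: a later instruction reads a register before an earlier instruction's write to it has reached the register file. In a 5-stage pipeline, an instruction's WB comes three cycles after its own ID, so the value written by instruction N is not yet in the register file when instructions N+1 and N+2 read it in ID. Any instruction that reads a register written by one of the two instructions ahead of it therefore has a RAW hazard; N+3 reads in the same cycle as the write and is safe only if the register file writes before it reads.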
Forwarding routes the ALU result from the EX/MEM or MEM/WB pipeline register directly to the ALU input of the current EX-stage instruction, bypassing the register file. It eliminates most RAW stalls without adding clock cycles.
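Load-use hazards: the data from a LW is not available until the end of MEM, one cycle too late to forward to the instruction immediately behind it. The hazard unit (Day 20) inserts a one-cycle stall in this case, after which the loaded value can be forwarded from MEM/WB.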
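The load-use condition can be sketched as a single comparison made while the load is in EX and its consumer is in ID. The module below is a hypothetical illustration, not the Day 20 hazard unit: the name `load_use_detect` and the signals `id_rs1`, `id_rs2`, `ex_rd`, and `ex_MemRead` are assumptions, and the real design may use different names or extra conditions.

```verilog
// load_use_detect.v (hypothetical sketch, not the Day 20 hazard unit)
module load_use_detect (
    input  [4:0] id_rs1,      // source registers of the instruction in ID
    input  [4:0] id_rs2,
    input  [4:0] ex_rd,       // destination register of the instruction in EX
    input        ex_MemRead,  // assumed control signal: EX-stage instruction is a load
    output       stall        // request a one-cycle stall
);
    // Stall when a load in EX writes a register (other than x0)
    // that the instruction in ID is about to read
    assign stall = ex_MemRead && (ex_rd != 5'd0) &&
                   ((ex_rd == id_rs1) || (ex_rd == id_rs2));
endmodule

// Self-checking testbench with three representative pairs
module tb_load_use_detect;
    reg  [4:0] id_rs1, id_rs2, ex_rd;
    reg        ex_MemRead;
    wire       stall;
    integer    errors;

    load_use_detect dut (.id_rs1(id_rs1), .id_rs2(id_rs2), .ex_rd(ex_rd),
                         .ex_MemRead(ex_MemRead), .stall(stall));

    initial begin
        errors = 0;
        // lw x1,0(x2) in EX, add x3,x1,x4 in ID: must stall
        ex_MemRead = 1'b1; ex_rd = 5'd1; id_rs1 = 5'd1; id_rs2 = 5'd4; #1;
        if (stall !== 1'b1) errors = errors + 1;
        // lw x1,0(x2) in EX, add x3,x5,x4 in ID: independent, no stall
        id_rs1 = 5'd5; #1;
        if (stall !== 1'b0) errors = errors + 1;
        // add x1,x2,x3 in EX (not a load), sub x4,x1,x5 in ID: forwarding covers it
        ex_MemRead = 1'b0; id_rs1 = 5'd1; id_rs2 = 5'd5; #1;
        if (stall !== 1'b0) errors = errors + 1;

        if (errors == 0) $display("PASS: load-use detection behaves as expected");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

This sketch compares both source fields unconditionally, so an instruction whose `rs2` field holds immediate bits can trigger a stall it does not need. That is safe but slightly pessimistic; qualifying the `id_rs2` comparison with a "uses rs2" signal would remove those extra stalls.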