RISC-V from Scratch
DAY 18 · PHASE 3 — PIPELINE & OPTIMIZE

Data Hazards & Forwarding

By EcrioniX · Updated 2026-06-11

The pipeline from Day 17 completes one instruction per cycle — in theory. In practice, instructions depend on each other. When instruction N+1 needs the result of instruction N, and that result is still flowing through the pipeline, we have a RAW (Read After Write) hazard. The solution — forwarding — routes results directly from later pipeline stages back to the ALU inputs.

The RAW Hazard Problem

Consider this sequence:

add  x1, x2, x3   // writes x1 in WB (cycle 5)
sub  x4, x1, x5   // reads  x1 in ID (cycle 3) — WRONG VALUE!
and  x6, x1, x7   // reads  x1 in ID (cycle 4) — still wrong!

Without forwarding, the register file read in ID returns the old value of x1, because add does not write x1 until its WB stage in cycle 5. Both sub and and would compute with a stale operand.
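Forwarding from EX/MEM and MEM/WB covers consumers one and two instructions behind the producer (sub and and above). A consumer three instructions behind reads the register file in ID during cycle 5, the same cycle add is in WB, and by the time it reaches EX the add has left the pipeline. It only gets the right value if the register file returns the value being written that cycle. Textbook designs write in the first half of the cycle; a common alternative is a read-port bypass. The sketch below assumes a synchronous-write, combinational-read register file; the module and port names are hypothetical and may not match the Day 17 register file.

```verilog
// regfile_bypass.v (hypothetical): 32 x 32-bit register file whose read
// ports return the value being written in the same cycle (write-through).
module regfile_bypass (
    input             clk,
    input             we,       // write enable, driven by the WB stage
    input      [4:0]  rd,       // write address
    input      [31:0] wdata,    // write data
    input      [4:0]  rs1,
    input      [4:0]  rs2,
    output     [31:0] rdata1,
    output     [31:0] rdata2
);
    reg [31:0] regs [0:31];

    integer i;
    initial for (i = 0; i < 32; i = i + 1) regs[i] = 32'd0;

    // Synchronous write; x0 is never written.
    always @(posedge clk)
        if (we && rd != 5'd0) regs[rd] <= wdata;

    // Combinational read: x0 always reads 0; if WB is writing the register
    // being read this cycle, pass the incoming value through.
    assign rdata1 = (rs1 == 5'd0)     ? 32'd0 :
                    (we && rd == rs1) ? wdata : regs[rs1];
    assign rdata2 = (rs2 == 5'd0)     ? 32'd0 :
                    (we && rd == rs2) ? wdata : regs[rs2];
endmodule
```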

The Solution: Forwarding

Instead of waiting for WB, we forward the result from whichever pipeline register already holds it:

forward_unit.v — Port Table

Port           Direction  Width (bits)  Description
ex_rs1         Input      5             Source register 1 of current EX-stage instruction
ex_rs2         Input      5             Source register 2 of current EX-stage instruction
mem_rd         Input      5             Destination register of instruction in MEM stage
mem_RegWrite   Input      1             MEM-stage instruction writes a register
wb_rd          Input      5             Destination register of instruction in WB stage
wb_RegWrite    Input      1             WB-stage instruction writes a register
forwardA       Output     2             ALU input A mux select: 00=regfile, 10=EX/MEM, 01=MEM/WB
forwardB       Output     2             ALU input B mux select: 00=regfile, 10=EX/MEM, 01=MEM/WB
forward_unit.v
// forward_unit.v — Detects RAW hazards and generates forwarding mux selects
// forwardA/B encoding:
//   2'b00 — use register file value (no hazard)
//   2'b10 — forward from EX/MEM pipeline register (1 instruction ago)
//   2'b01 — forward from MEM/WB pipeline register (2 instructions ago)
module forward_unit (
    input [4:0] ex_rs1,
    input [4:0] ex_rs2,
    input [4:0] mem_rd,
    input       mem_RegWrite,
    input [4:0] wb_rd,
    input       wb_RegWrite,
    output reg [1:0] forwardA,
    output reg [1:0] forwardB
);
    always @(*) begin
        // Default: no forwarding
        forwardA = 2'b00;
        forwardB = 2'b00;

        // EX/MEM forwarding (higher priority — more recent result)
        if (mem_RegWrite && (mem_rd != 5'b0)) begin
            if (mem_rd == ex_rs1) forwardA = 2'b10;
            if (mem_rd == ex_rs2) forwardB = 2'b10;
        end

        // MEM/WB forwarding (lower priority)
        if (wb_RegWrite && (wb_rd != 5'b0)) begin
            // Only forward from WB if EX/MEM didn't already cover it
            if (wb_rd == ex_rs1 && !(mem_RegWrite && mem_rd == ex_rs1))
                forwardA = 2'b01;
            if (wb_rd == ex_rs2 && !(mem_RegWrite && mem_rd == ex_rs2))
                forwardB = 2'b01;
        end
    end
endmodule
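The `!(mem_RegWrite && mem_rd == ex_rs1)` guard is there because the MEM/WB check runs second and would otherwise overwrite the EX/MEM select. An if/else-if chain expresses the same priority directly. The sketch below is a hypothetical, behaviour-equivalent variant (the module name forward_unit_fn and the function fwd_sel are ours, not part of the series' code) that writes the per-operand logic once and reuses it for both ALU inputs:

```verilog
// forward_unit_fn.v (hypothetical alternative): same encoding and priority
// as forward_unit, with the select logic written once as a function.
module forward_unit_fn (
    input  [4:0] ex_rs1,
    input  [4:0] ex_rs2,
    input  [4:0] mem_rd,
    input        mem_RegWrite,
    input  [4:0] wb_rd,
    input        wb_RegWrite,
    output [1:0] forwardA,
    output [1:0] forwardB
);
    // Forwarding select for one source register. EX/MEM is tested first,
    // so it wins when both stages write that register.
    function [1:0] fwd_sel;
        input [4:0] rs;
        input [4:0] m_rd;
        input       m_we;
        input [4:0] w_rd;
        input       w_we;
        begin
            if (m_we && m_rd != 5'd0 && m_rd == rs)
                fwd_sel = 2'b10;    // EX/MEM: most recent result
            else if (w_we && w_rd != 5'd0 && w_rd == rs)
                fwd_sel = 2'b01;    // MEM/WB
            else
                fwd_sel = 2'b00;    // register file
        end
    endfunction

    assign forwardA = fwd_sel(ex_rs1, mem_rd, mem_RegWrite, wb_rd, wb_RegWrite);
    assign forwardB = fwd_sel(ex_rs2, mem_rd, mem_RegWrite, wb_rd, wb_RegWrite);
endmodule
```

The priority matters for a sequence such as addi x1,x0,1; addi x1,x0,2; add x2,x1,x1. When add is in EX, both EX/MEM and MEM/WB hold a write to x1, and only the EX/MEM value (2) is current, so x2 must be 4.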

Using the Forward Unit in the EX Stage

The ALU inputs are now driven by 3-input muxes. The forwardA and forwardB signals from forward_unit select which value to feed:

alu_mux_snippet.v
// In the EX stage of the pipelined CPU:
wire [31:0] fwd_a =
    (forwardA == 2'b10) ? mem_alu_out  : // from EX/MEM
    (forwardA == 2'b01) ? wb_result    : // from MEM/WB
                          ex_rdata1;     // from register file

wire [31:0] fwd_b_pre =
    (forwardB == 2'b10) ? mem_alu_out  :
    (forwardB == 2'b01) ? wb_result    :
                          ex_rdata2;

// ALUSrc still selects between register and immediate
wire [31:0] fwd_b = ex_ALUSrc ? ex_imm : fwd_b_pre;

alu alu0 (.a(fwd_a), .b(fwd_b), .op(ex_ALUOp),
          .result(alu_out), .zero(zero), .lt(lt), .ltu(ltu));
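When the instruction in EX is a store such as sw x1, 0(x2), ALUSrc routes the immediate to ALU input B for the address calculation, but the value being stored is still rs2 and can be just as stale as an ALU operand. One way to handle this, sketched below as a self-contained module with hypothetical names (ex_operands, rs2_fwd, store_data), is to take the store data from the forwarded rs2 before the ALUSrc mux instead of from ex_rdata2, and latch that value into EX/MEM:

```verilog
// ex_operands.v (hypothetical): EX-stage operand selection, packaged as a
// module so the forwarded rs2 can also be exported as store data.
module ex_operands (
    input  [1:0]  forwardA,
    input  [1:0]  forwardB,
    input         ex_ALUSrc,
    input  [31:0] ex_rdata1,     // register file value of rs1
    input  [31:0] ex_rdata2,     // register file value of rs2
    input  [31:0] ex_imm,
    input  [31:0] mem_alu_out,   // result held in EX/MEM
    input  [31:0] wb_result,     // final write-back value held in MEM/WB
    output [31:0] alu_a,
    output [31:0] alu_b,
    output [31:0] store_data     // forwarded rs2, latched into EX/MEM for sw
);
    assign alu_a = (forwardA == 2'b10) ? mem_alu_out :
                   (forwardA == 2'b01) ? wb_result   : ex_rdata1;

    wire [31:0] rs2_fwd = (forwardB == 2'b10) ? mem_alu_out :
                          (forwardB == 2'b01) ? wb_result   : ex_rdata2;

    // The immediate replaces rs2 only on the ALU input; a store still needs rs2.
    assign alu_b      = ex_ALUSrc ? ex_imm : rs2_fwd;
    assign store_data = rs2_fwd;
endmodule
```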

Testbench — Forwarding Verification

tb_forward.v
// tb_forward.v: tests EX/MEM and MEM/WB forwarding
// addi x1,x0,10 → add x2,x1,x1 → addi x3,x2,3 → add x4,x2,x3
// Expected: x1=10, x2=20, x3=23, x4=43
`timescale 1ns/1ps
module tb_forward;
    reg clk=0, rst=1;
    always #5 clk=~clk;

    // Assume pipelined_core is the pipelined CPU module
    pipelined_core dut(.clk(clk),.rst(rst));

    initial begin
        // addi x1,x0,10  = 00a00093
        dut.imem0.mem[0] = 32'h00a00093;
        // add  x2,x1,x1  = 00108133  (both operands take x1 from EX/MEM: x2 = 20)
        dut.imem0.mem[1] = 32'h00108133;
        // addi x3,x2,3   = 00310193  (x2 from EX/MEM: x3 = 23)
        dut.imem0.mem[2] = 32'h00310193;
        // add  x4,x2,x3  = 00310233  (x2 from MEM/WB, x3 from EX/MEM: x4 = 43)
        dut.imem0.mem[3] = 32'h00310233;
        dut.imem0.mem[4] = 32'h0000006f; // halt (jal x0,0 loops on itself)
        $dumpfile("tb_forward.vcd"); $dumpvars(0,tb_forward);
        // Release reset 1 ns after a rising edge so it cannot race the DUT's flops
        @(posedge clk); @(posedge clk); #1 rst=0;
        repeat(15) @(posedge clk); #1;
        if(dut.rf.regs[1]===32'd10) $display("PASS: x1=10");
        else $display("FAIL: x1=%0d",dut.rf.regs[1]);
        if(dut.rf.regs[2]===32'd20) $display("PASS: x2=20");
        else $display("FAIL: x2=%0d",dut.rf.regs[2]);
        if(dut.rf.regs[3]===32'd23) $display("PASS: x3=23");
        else $display("FAIL: x3=%0d",dut.rf.regs[3]);
        if(dut.rf.regs[4]===32'd43) $display("PASS: x4=43");
        else $display("FAIL: x4=%0d",dut.rf.regs[4]);
        $finish;
    end
endmodule

Day 18 Takeaways

FAQ

What is a RAW data hazard?

Read After Write: a later instruction reads a register before instruction N's write to it has reached the register file. In a 5-stage pipeline, instruction N writes back in cycle 5, while instructions N+1 and N+2 read their operands in ID during cycles 3 and 4. Whenever either of them reads N's destination register, it gets the stale value.

What is forwarding (bypassing)?

Forwarding routes the ALU result from the EX/MEM or MEM/WB pipeline register directly to the ALU input of the current EX-stage instruction, bypassing the register file. It eliminates most RAW stalls without adding clock cycles.

When does forwarding not help?

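Load-use hazards: a LW result isn't available until the end of MEM, but an instruction immediately after the load needs it at the start of its EX stage in that same cycle, so the value arrives one cycle too late to forward. The hazard unit (Day 20) inserts a one-cycle stall in this case, after which the loaded value can be forwarded from MEM/WB.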
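The condition such a hazard unit has to detect can be previewed with a short sketch. It assumes an ex_MemRead signal that marks a load in EX and the rs1/rs2 fields of the instruction in ID; these names are hypothetical, and the Day 20 version may differ:

```verilog
// load_use_detect.v (hypothetical): flags the case forwarding cannot cover,
// a load in EX whose destination register is read by the instruction in ID.
module load_use_detect (
    input  [4:0] id_rs1,       // source registers of the instruction in ID
    input  [4:0] id_rs2,
    input  [4:0] ex_rd,        // destination register of the instruction in EX
    input        ex_MemRead,   // 1 when the EX-stage instruction is a load
    output       stall         // request: hold PC and IF/ID, bubble into ID/EX
);
    assign stall = ex_MemRead && (ex_rd != 5'd0) &&
                   ((ex_rd == id_rs1) || (ex_rd == id_rs2));
endmodule
```

This check is conservative: for an I-type instruction such as addi, the rs2 field holds immediate bits, so a coincidental match costs an extra cycle but never produces a wrong result.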
