DAY 20 · PHASE 3 — PIPELINE & OPTIMIZE

Load-Use Stalls & the Hazard Unit

By EcrioniX · Updated 2026-06-11

Forwarding (Day 18) eliminates most RAW hazards. But there is one case forwarding cannot solve: a load followed immediately by an instruction that uses the loaded value. The load data is not available until the end of the MEM stage, one cycle too late for the next instruction's EX stage. The solution is a mandatory 1-cycle stall. Today we build hazard_unit.v, the module that detects this case and asserts the stall signal.

Why Forwarding Cannot Help

Cycle:        1     2     3     4     5     6     7
LW  x1:       [IF]  [ID]  [EX]  [MEM] [WB]
ADD x2,x1:          [IF]  [ID]  [EX]  ...
                                ↑ ADD needs x1 here (cycle 4, EX start)
                                ↑ but LW only produces x1 here (cycle 4, MEM end)

Even with EX/MEM forwarding, the data arrives one cycle too late.

In this pipeline the only correct solution is to delay ADD by one cycle, so that it enters EX in cycle 5. By then LW is in WB, and the loaded value can be forwarded from the MEM/WB register.

hazard_unit.v — Port Table

| Port | Direction | Width | Description |
|---|---|---|---|
| id_ex_MemRead | Input | 1 | 1 if instruction in EX is a load (LW/LH/LB) |
| id_ex_rd | Input | 5 | Destination register of the load in EX |
| if_id_rs1 | Input | 5 | Source register 1 of next instruction (in ID) |
| if_id_rs2 | Input | 5 | Source register 2 of next instruction (in ID) |
| stall | Output | 1 | 1 = freeze PC and IF/ID; insert bubble in ID/EX |

hazard_unit.v
// hazard_unit.v — Detects load-use hazards and generates stall
// When a load is in EX and the next instruction in ID uses the
// loaded register, we must stall for one cycle.
module hazard_unit (
    input       id_ex_MemRead, // is EX instruction a load?
    input [4:0] id_ex_rd,      // load destination register
    input [4:0] if_id_rs1,     // ID instruction source 1
    input [4:0] if_id_rs2,     // ID instruction source 2
    output reg  stall          // 1 = insert stall
);
    always @(*) begin
        stall = 1'b0;
        if (id_ex_MemRead && (id_ex_rd != 5'd0)) begin
            if ((id_ex_rd == if_id_rs1) || (id_ex_rd == if_id_rs2))
                stall = 1'b1;
        end
    end
endmodule
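
The detection rule above is small enough to check exhaustively by hand, but a self-checking testbench makes the intent explicit. The following is a minimal sketch, not part of the original design: the module name hazard_unit_tb, the check task, and the chosen register numbers are all hypothetical. It assumes it is compiled together with hazard_unit.v, for example with Icarus Verilog (iverilog -o hazard_tb hazard_unit.v hazard_unit_tb.v, then vvp hazard_tb).

```verilog
// hazard_unit_tb.v: self-checking testbench sketch (hypothetical file name).
// Must be compiled together with hazard_unit.v from this article.
`timescale 1ns/1ps
module hazard_unit_tb;
    reg        id_ex_MemRead;
    reg  [4:0] id_ex_rd, if_id_rs1, if_id_rs2;
    wire       stall;
    integer    errors;

    hazard_unit dut (
        .id_ex_MemRead(id_ex_MemRead),
        .id_ex_rd     (id_ex_rd),
        .if_id_rs1    (if_id_rs1),
        .if_id_rs2    (if_id_rs2),
        .stall        (stall)
    );

    // Drive one input combination and compare stall with the expected value.
    task check;
        input       memread;
        input [4:0] rd, rs1, rs2;
        input       expected;
        begin
            id_ex_MemRead = memread;
            id_ex_rd      = rd;
            if_id_rs1     = rs1;
            if_id_rs2     = rs2;
            #1; // let the combinational logic settle
            if (stall !== expected) begin
                errors = errors + 1;
                $display("FAIL: MemRead=%b rd=%0d rs1=%0d rs2=%0d -> stall=%b, expected %b",
                         memread, rd, rs1, rs2, stall, expected);
            end
        end
    endtask

    initial begin
        errors = 0;
        check(1'b1, 5'd1, 5'd1, 5'd3, 1'b1); // load to x1, ID reads x1 as rs1
        check(1'b1, 5'd1, 5'd3, 5'd1, 1'b1); // load to x1, ID reads x1 as rs2
        check(1'b1, 5'd1, 5'd2, 5'd3, 1'b0); // load, but no dependency
        check(1'b0, 5'd1, 5'd1, 5'd1, 1'b0); // not a load: forwarding handles it
        check(1'b1, 5'd0, 5'd0, 5'd0, 1'b0); // load to x0: never stall
        if (errors == 0) $display("PASS: all hazard_unit checks passed");
        else             $display("FAILED: %0d check(s)", errors);
        $finish;
    end
endmodule
```

Note that hazard_unit compares against if_id_rs2 even when the instruction in ID has no rs2 operand (for example an I-type instruction, where those bits belong to the immediate). That can cause an occasional unnecessary stall, which costs a cycle but never produces a wrong result.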

How the Stall Works

When stall=1 is asserted, three things happen simultaneously on the next clock edge:

  1. PC is frozen — it holds its current value so the same instruction is fetched again next cycle.
  2. IF/ID register is frozen — the instruction in ID is replayed next cycle (held, not advanced).
  3. ID/EX register is cleared — a NOP bubble is inserted, so the EX stage does nothing useful this cycle.

The net effect: the load instruction advances normally through MEM, while the dependent instruction stays in ID for one extra cycle. When it reaches EX a cycle later, the load is in WB and the loaded value arrives via MEM/WB forwarding.

stall_wiring_snippet.v
wire stall;
hazard_unit hu (
    .id_ex_MemRead(ex_MemRead),
    .id_ex_rd     (ex_rd),
    .if_id_rs1    (id_rs1),
    .if_id_rs2    (id_rs2),
    .stall        (stall)
);

// PC: hold when stall
always @(posedge clk or posedge rst)
    if (rst)        pc <= 0;
    else if (!stall) pc <= pc_next;
    // if stall, pc keeps its current value

// IF/ID: pass stall through to the register
if_id_reg ifid (.stall(stall), ...);

// ID/EX: force flush (NOP bubble) on stall
id_ex_reg idex (.flush(stall), ...);
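
The snippet above relies on if_id_reg having a stall input and id_ex_reg having a flush input. The real modules come from earlier days of the series and are not shown here, so the following is only a sketch of how those two inputs could behave. The module names (with a _sketch suffix), the port lists, and the reduced set of control signals are assumptions for illustration; the branch flush of IF/ID is left out.

```verilog
// Hypothetical IF/ID register: holds its contents while stall=1.
module if_id_reg_sketch (
    input             clk,
    input             rst,
    input             stall,      // 1 = keep the current instruction in ID
    input      [31:0] if_pc,
    input      [31:0] if_instr,
    output reg [31:0] id_pc,
    output reg [31:0] id_instr
);
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            id_pc    <= 32'd0;
            id_instr <= 32'h0000_0013; // NOP encoding: ADDI x0, x0, 0
        end else if (!stall) begin
            id_pc    <= if_pc;
            id_instr <= if_instr;
        end
        // stall=1: no assignment, so both registers keep their values
    end
endmodule

// Hypothetical ID/EX register: flush=1 turns the next EX instruction
// into a bubble by zeroing every signal that could change state.
// Only a few control signals are shown; a real one carries many more.
module id_ex_reg_sketch (
    input            clk,
    input            rst,
    input            flush,       // 1 = insert a NOP bubble
    input            id_MemRead,
    input            id_MemWrite,
    input            id_RegWrite,
    input      [4:0] id_rd,
    output reg       ex_MemRead,
    output reg       ex_MemWrite,
    output reg       ex_RegWrite,
    output reg [4:0] ex_rd
);
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            ex_MemRead  <= 1'b0;
            ex_MemWrite <= 1'b0;
            ex_RegWrite <= 1'b0;
            ex_rd       <= 5'd0;
        end else if (flush) begin
            // Bubble: no memory access, no register write
            ex_MemRead  <= 1'b0;
            ex_MemWrite <= 1'b0;
            ex_RegWrite <= 1'b0;
            ex_rd       <= 5'd0;
        end else begin
            ex_MemRead  <= id_MemRead;
            ex_MemWrite <= id_MemWrite;
            ex_RegWrite <= id_RegWrite;
            ex_rd       <= id_rd;
        end
    end
endmodule
```

Clearing ex_MemRead in the bubble matters for the hazard unit itself: once the bubble is in EX, id_ex_MemRead is 0, so stall drops back to 0 and the pipeline resumes after exactly one cycle.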

The Complete Hazard Picture

| Hazard Type | Detection | Resolution | CPI impact |
|---|---|---|---|
| RAW (non-load) | forward_unit: EX/MEM rd == EX rs1/rs2 | Forwarding, no stall | 0 cycles lost |
| Load-use | hazard_unit: EX is load AND rd matches ID rs1 or rs2 | 1-cycle stall + MEM/WB forward | 1 cycle per load-use pair |
| Branch taken | EX branch_taken=1 | Flush IF/ID and ID/EX (2 NOP bubbles) | 2 cycles per taken branch |
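
The CPI column of the table above can be folded into a rough estimate of overall CPI. This is a back-of-the-envelope sketch, not a measurement of this core: it assumes an ideal CPI of 1 when no hazard occurs, and the symbols f_lu (fraction of instructions that trigger a load-use stall) and f_br (fraction of instructions that are taken branches) as well as the sample values are illustrative assumptions.

```latex
\[
\mathrm{CPI} \approx 1 + 1 \cdot f_{\mathrm{lu}} + 2 \cdot f_{\mathrm{br}}
\]
% Illustrative values only: f_lu = 0.10, f_br = 0.15
\[
\mathrm{CPI} \approx 1 + 1 \cdot 0.10 + 2 \cdot 0.15 = 1.40
\]
```

With these made-up fractions, taken branches cost three times as many cycles as load-use stalls, even though each load-use stall is individually cheap.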

Day 20 Takeaways

FAQ

What is a load-use hazard?

When a load instruction (LW/LH/LB) is immediately followed by an instruction that reads the loaded register, the data arrives from DMEM one cycle too late for forwarding. A 1-cycle stall is mandatory.

How does the hazard unit stall the pipeline?

It asserts stall=1. The PC and IF/ID register hold their values (replay the same instruction). The ID/EX register is flushed to a NOP bubble. One cycle later, forwarding from MEM/WB provides the loaded value.

How do compilers help with load-use hazards?

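Compilers perform instruction scheduling: where possible, they place an independent instruction between the load and the instruction that uses its result. That independent instruction fills the cycle that would otherwise be a stall, so no cycle is lost. RISC-V has no architectural load delay slot; the hardware stall remains in place for code where no independent instruction is available.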
