Forwarding (Day 18) eliminates most RAW hazards. But there is one case forwarding cannot solve: a load followed immediately by an instruction that uses the loaded value. The load data is not available until the end of the MEM stage, one cycle too late for the next instruction's EX stage. The solution is a mandatory 1-cycle stall. Today we build hazard_unit.v, the module that detects this case and asserts the stall.
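| Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| LW x1 | IF | ID | EX | MEM | WB | | |
| ADD x2,x1 | | IF | ID | EX | ... | | |

In cycle 4, ADD needs x1 at the start of its EX stage, but the load only produces x1 at the end of its MEM stage in that same cycle.
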
Even with EX/MEM forwarding, the data arrives one cycle too late.
The only correct solution is to delay ADD by one cycle, so it enters EX in cycle 5 when the load data is available from MEM/WB forwarding.
| Port | Direction | Width | Description |
|---|---|---|---|
| id_ex_MemRead | Input | 1 | 1 if instruction in EX is a load (LW/LH/LB) |
| id_ex_rd | Input | 5 | Destination register of the load in EX |
| if_id_rs1 | Input | 5 | Source register 1 of next instruction (in ID) |
| if_id_rs2 | Input | 5 | Source register 2 of next instruction (in ID) |
| stall | Output | 1 | 1 = freeze PC and IF/ID; insert bubble in ID/EX |
    // hazard_unit.v — Detects load-use hazards and generates stall
    // When a load is in EX and the next instruction in ID uses the
    // loaded register, we must stall for one cycle.
    module hazard_unit (
        input            id_ex_MemRead, // is EX instruction a load?
        input      [4:0] id_ex_rd,      // load destination register
        input      [4:0] if_id_rs1,     // ID instruction source 1
        input      [4:0] if_id_rs2,     // ID instruction source 2
        output reg       stall          // 1 = insert stall
    );
        always @(*) begin
            stall = 1'b0;
            if (id_ex_MemRead && (id_ex_rd != 5'd0)) begin
                if ((id_ex_rd == if_id_rs1) || (id_ex_rd == if_id_rs2))
                    stall = 1'b1;
            end
        end
    endmodule
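To check the detection logic in isolation, a small self-checking testbench can help. The sketch below is not part of the original design: the module name `tb_hazard_unit`, the `check` task, and the chosen register numbers are all illustrative assumptions. It only drives the five ports listed in the table above.

```verilog
`timescale 1ns/1ps
// tb_hazard_unit.v: self-checking testbench sketch (hypothetical file name)
module tb_hazard_unit;
    reg        id_ex_MemRead;
    reg  [4:0] id_ex_rd, if_id_rs1, if_id_rs2;
    wire       stall;
    integer    errors;

    hazard_unit dut (
        .id_ex_MemRead(id_ex_MemRead),
        .id_ex_rd     (id_ex_rd),
        .if_id_rs1    (if_id_rs1),
        .if_id_rs2    (if_id_rs2),
        .stall        (stall)
    );

    // Apply one input combination and compare stall with the expected value.
    task check;
        input       mem_read;
        input [4:0] rd, rs1, rs2;
        input       expected;
        begin
            id_ex_MemRead = mem_read;
            id_ex_rd      = rd;
            if_id_rs1     = rs1;
            if_id_rs2     = rs2;
            #1; // let the combinational logic settle
            if (stall !== expected) begin
                errors = errors + 1;
                $display("FAIL: MemRead=%b rd=%0d rs1=%0d rs2=%0d stall=%b (expected %b)",
                         mem_read, rd, rs1, rs2, stall, expected);
            end
        end
    endtask

    initial begin
        errors = 0;
        check(1'b1, 5'd1, 5'd1, 5'd3, 1'b1); // load to x1, next reads x1 as rs1: stall
        check(1'b1, 5'd1, 5'd3, 5'd1, 1'b1); // match on rs2: stall
        check(1'b1, 5'd1, 5'd2, 5'd3, 1'b0); // no dependency: no stall
        check(1'b0, 5'd1, 5'd1, 5'd1, 1'b0); // EX is not a load: forwarding handles it
        check(1'b1, 5'd0, 5'd0, 5'd0, 1'b0); // load to x0: never stall
        if (errors == 0) $display("All hazard_unit checks passed");
        else             $display("%0d hazard_unit check(s) failed", errors);
        $finish;
    end
endmodule
```

With Icarus Verilog installed, something like `iverilog -o tb hazard_unit.v tb_hazard_unit.v && vvp tb` should run it; other simulators will need their own invocation.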
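When stall=1 is asserted, three things happen simultaneously on the next clock edge:
1. The PC holds its current value, so the instruction currently in IF is fetched again.
2. The IF/ID register holds its contents, so the dependent instruction stays in ID.
3. The ID/EX register is flushed to a NOP bubble, which follows the load down the pipeline.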
The net effect: the load advances normally into MEM, while the dependent instruction is held in ID for one cycle and then receives the loaded value through MEM/WB forwarding when it reaches EX.
    wire stall;
    hazard_unit hu (
        .id_ex_MemRead(ex_MemRead),
        .id_ex_rd     (ex_rd),
        .if_id_rs1    (id_rs1),
        .if_id_rs2    (id_rs2),
        .stall        (stall)
    );

    // PC: hold when stall
    always @(posedge clk or posedge rst)
        if (rst)         pc <= 0;
        else if (!stall) pc <= pc_next;
        // if stall, pc keeps its current value

    // IF/ID: pass stall through to the register
    if_id_reg ifid (.stall(stall), ...);

    // ID/EX: force flush (NOP bubble) on stall
    id_ex_reg idex (.flush(stall), ...);
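The listing above relies on a `stall` port on `if_id_reg` and a `flush` port on `id_ex_reg`. As a rough sketch of what those ports might do internally, here is a reduced version of both registers. Only `stall` and `flush` come from the text; every other port name, the 32-bit widths, and the small set of control signals shown are assumptions, and a real ID/EX register carries many more fields.

```verilog
// Sketch only: port names other than stall/flush are assumptions.
module if_id_reg (
    input             clk,
    input             rst,
    input             stall,      // 1 = hold current contents
    input      [31:0] pc_in,
    input      [31:0] instr_in,
    output reg [31:0] pc_out,
    output reg [31:0] instr_out
);
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            pc_out    <= 32'd0;
            instr_out <= 32'h0000_0013; // RISC-V NOP (addi x0, x0, 0)
        end else if (!stall) begin
            pc_out    <= pc_in;
            instr_out <= instr_in;
        end
        // stall = 1: nothing is assigned, so the dependent instruction stays in ID
    end
endmodule

module id_ex_reg (
    input            clk,
    input            rst,
    input            flush,       // 1 = insert a NOP bubble
    input            MemRead_in,
    input            MemWrite_in,
    input            RegWrite_in,
    input      [4:0] rd_in,
    output reg       MemRead_out,
    output reg       MemWrite_out,
    output reg       RegWrite_out,
    output reg [4:0] rd_out
);
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            MemRead_out  <= 1'b0;
            MemWrite_out <= 1'b0;
            RegWrite_out <= 1'b0;
            rd_out       <= 5'd0;
        end else if (flush) begin
            // Bubble: clear every control signal so the slot changes no state
            MemRead_out  <= 1'b0;
            MemWrite_out <= 1'b0;
            RegWrite_out <= 1'b0;
            rd_out       <= 5'd0;
        end else begin
            MemRead_out  <= MemRead_in;
            MemWrite_out <= MemWrite_in;
            RegWrite_out <= RegWrite_in;
            rd_out       <= rd_in;
        end
    end
endmodule
```

Clearing `MemRead_out` in the bubble also matters for the hazard unit itself: on the cycle after the stall, the instruction in EX is the bubble, so `id_ex_MemRead` is 0 and the stall is not asserted a second time.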
| Hazard Type | Detection | Resolution | CPI impact |
|---|---|---|---|
| RAW (non-load) | forward_unit: EX/MEM rd == EX rs1/rs2 | Forwarding — no stall | 0 cycles lost |
| Load-use | hazard_unit: EX is load AND rd matches ID rs | 1-cycle stall + MEM/WB forward | 1 cycle per load-use pair |
| Branch taken | EX branch_taken=1 | Flush IF/ID and ID/EX (2 NOP bubbles) | 2 cycles per taken branch |
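The CPI column can be folded into a rough estimate. The formula below assumes an ideal base CPI of 1 and no stall sources other than the ones in the table; the symbols $f_{\text{load-use}}$ (fraction of instructions that trigger a load-use stall) and $f_{\text{taken}}$ (fraction of instructions that are taken branches) are introduced here for illustration.

$$
\text{CPI} \approx 1 + 1 \cdot f_{\text{load-use}} + 2 \cdot f_{\text{taken}}
$$

With purely hypothetical values $f_{\text{load-use}} = 0.10$ and $f_{\text{taken}} = 0.15$, this gives

$$
\text{CPI} \approx 1 + 0.10 + 2 \cdot 0.15 = 1.40
$$

These frequencies are made up to show the arithmetic, not measured on any program.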
When a load instruction (LW/LH/LB) is immediately followed by an instruction that reads the loaded register, the data arrives from DMEM one cycle too late for forwarding. A 1-cycle stall is mandatory.
The hazard unit asserts stall=1. The PC and IF/ID register hold their values, so the dependent instruction is decoded again. The ID/EX register is flushed to a NOP bubble. One cycle later, forwarding from MEM/WB provides the loaded value.
Compilers can avoid the stall through instruction scheduling: placing an independent instruction between the load and its consumer fills the load delay with useful work and eliminates the stall cycle.