Watch stall bubbles appear in the Gantt chart. Toggle forwarding and see them disappear. Trigger a branch flush and count the penalty cycles. CPI updates in real time.
Pipelining overlaps the execution of multiple instructions by dividing the processor into stages. While I1 is in EX, I2 is in ID and I3 is in IF, all in the same clock cycle. Once it is full, an ideal N-stage pipeline completes one instruction per clock cycle: CPI = 1.0.
The classic 5-stage RISC pipeline: IF (fetch instruction) → ID (decode + read registers) → EX (ALU operation) → MEM (memory access) → WB (write result to register file).
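The overlap can be tabulated with a short timing model. The sketch below uses Python rather than HDL and assumes a hazard-free pipeline; the names `ideal_schedule` and `total_cycles` are illustrative and do not come from any simulator.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def ideal_schedule(num_instructions, stages=STAGES):
    """Map each instruction index to the cycle in which it occupies each stage.

    Cycles are 1-based and no hazards are modelled.
    """
    return {
        i: {stage: i + offset + 1 for offset, stage in enumerate(stages)}
        for i in range(num_instructions)
    }


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles until the last instruction leaves the final stage."""
    if num_instructions <= 0:
        return 0
    # The first instruction needs num_stages cycles; each later one adds a cycle.
    return num_stages + num_instructions - 1


schedule = ideal_schedule(3)
# I1 (index 0) is in EX while I2 is in ID and I3 is in IF.
print(schedule[0]["EX"], schedule[1]["ID"], schedule[2]["IF"])

for n in (1, 10, 1000):
    print(n, total_cycles(n), round(total_cycles(n) / n, 3))
```

For 1000 instructions the model gives 1004 cycles, so CPI approaches 1.0 as the fill cost of the first instruction is amortized.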
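A data hazard occurs when two nearby instructions access the same register and the pipeline overlap could change the order of those accesses. The most common case is an instruction that needs a value that a previous instruction has not finished computing yet. The three types: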
| Type | Full Name | Description | In 5-stage pipeline? |
|---|---|---|---|
| RAW | Read After Write | I2 reads a register before I1 writes it | Yes — the main hazard |
| WAR | Write After Read | I2 writes a register before I1 reads it | Not in a simple in-order pipeline |
| WAW | Write After Write | I2 writes a register before I1 writes it | Not in a simple in-order pipeline |
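As a rough illustration of the table, the three dependency types can be detected by comparing the destination and source registers of two instructions in program order. The dictionary layout and the function name `classify_hazards` are assumptions made for this sketch, and the zero register is not treated specially. Whether a detected dependency actually causes a stall depends on the pipeline, as the last column notes.

```python
def classify_hazards(first, second):
    """Return the dependency types between two instructions in program order.

    Each instruction is a dict with 'dst' (register name or None)
    and 'srcs' (a set of register names).
    """
    hazards = set()
    if first["dst"] is not None and first["dst"] in second["srcs"]:
        hazards.add("RAW")  # second reads what first writes
    if second["dst"] is not None and second["dst"] in first["srcs"]:
        hazards.add("WAR")  # second writes what first reads
    if first["dst"] is not None and first["dst"] == second["dst"]:
        hazards.add("WAW")  # both write the same register
    return hazards


add_instr = {"dst": "x1", "srcs": {"x2", "x3"}}  # add x1, x2, x3
sub_instr = {"dst": "x4", "srcs": {"x1", "x5"}}  # sub x4, x1, x5
print(classify_hazards(add_instr, sub_instr))
```

The `add`/`sub` pair is a RAW dependency because `sub` reads `x1`, which `add` writes.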
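Without forwarding, an instruction that reads a register written by the instruction just before it must wait in ID until the producer reaches WB, which costs 2 stall cycles (assuming the register file can be written and read in the same cycle). Forwarding (bypassing) routes the ALU result directly from the EX/MEM pipeline register back to the ALU input, before WB writes it to the register file. For ALU-to-ALU dependencies this removes the 2-cycle stall completely.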
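The stall count for an ALU-to-ALU dependency can be written as a small function. This is a sketch under one assumption that the text does not state explicitly: the register file writes in the first half of a cycle and reads in the second half, so a consumer in ID can read a value that the producer writes in WB during the same cycle. The name `alu_raw_stalls` is hypothetical.

```python
def alu_raw_stalls(distance, forwarding):
    """Stall cycles for a consumer `distance` instructions after an ALU producer.

    distance=1 means the consumer immediately follows the producer.
    Assumes same-cycle register file write-then-read (see lead-in).
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if forwarding:
        return 0  # EX/MEM or MEM/WB bypass delivers the value in time
    # Without forwarding the consumer's ID must line up with the producer's WB.
    return max(0, 3 - distance)


for d in (1, 2, 3):
    print(d, alu_raw_stalls(d, forwarding=False), alu_raw_stalls(d, forwarding=True))
```

Under that assumption a back-to-back dependency costs 2 stalls without forwarding, a dependency with one instruction in between costs 1, and anything further apart costs none.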
A load-use hazard occurs when the instruction immediately after a LW (load word) reads the loaded register. The load result is only available at the end of MEM, but the dependent instruction needs it at the start of EX, one cycle too early. Even with forwarding, 1 stall cycle is required in this case; the value is then forwarded from the MEM/WB register. The compiler can often hide the stall by placing an independent instruction between the load and its consumer.
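The effect of reordering can be checked with a tiny counter. The sketch below assumes full forwarding, so only a load followed immediately by a consumer stalls; the tuple format `(op, dst, srcs)` and the function name `load_use_stalls` are made up for this example.

```python
def load_use_stalls(program):
    """Count load-use stalls in a list of (op, dst, srcs) tuples.

    Forwarding is assumed, so only a load immediately followed by a
    reader of its destination register costs a stall cycle.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dst, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "lw" and prev_dst in cur_srcs:
            stalls += 1
    return stalls


original = [
    ("lw",  "x1", ("x10",)),       # lw  x1, 0(x10)
    ("add", "x2", ("x1", "x3")),   # needs x1 right after the load
    ("sub", "x4", ("x5", "x6")),   # independent of both instructions above
]
# Moving the independent sub between the load and its consumer is legal
# here because sub shares no registers with add.
reordered = [original[0], original[2], original[1]]

print(load_use_stalls(original), load_use_stalls(reordered))
```

The original order has one load-use stall; the reordered sequence has none, because `sub` fills the slot after the load.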
A control hazard occurs when a branch instruction changes the PC. In a 5-stage pipeline, the branch outcome and target are resolved at the end of the EX stage (cycle 3, counting the branch's own fetch as cycle 1). By that time, two instructions have already been fetched from the wrong path and must be flushed (turned into bubbles). This is a 2-cycle branch penalty.
Branch prediction eliminates this penalty for correctly-predicted branches. Modern CPUs achieve >95% prediction accuracy, reducing the average branch penalty to <0.1 cycles per branch.
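The relationship between accuracy and average penalty is simple arithmetic. The sketch below assumes that a correctly predicted branch costs no cycles and that every misprediction costs the full 2-cycle flush; `average_branch_penalty` is an illustrative name.

```python
def average_branch_penalty(accuracy, mispredict_penalty=2):
    """Expected penalty cycles per branch for a given prediction accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * mispredict_penalty


for acc in (0.0, 0.90, 0.95, 0.99):
    print(acc, round(average_branch_penalty(acc), 3))
```

At 95% accuracy the average works out to about 0.1 cycles per branch, which is why accuracies above 95% push the figure below 0.1.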
| Strategy | Branch Penalty | Notes |
|---|---|---|
| Flush on branch | 2 cycles always | Simple, correct, used in basic RISC-V cores |
| Static prediction (not taken) | 0 if not taken, 2 if taken | Good for code where most branches fall through; loop back-edges are usually taken |
| Dynamic 2-bit predictor | ≤0.1 cycles avg | 2 cycles × (1 − 0.95) at 95%+ accuracy; dynamic prediction is standard in modern CPUs |
| Delayed branch slot | 0 if the slot holds a useful instruction | Used in MIPS, which resolves branches in ID so one slot covers the penalty |
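The dynamic 2-bit predictor in the table can be modelled as a saturating counter. The sketch below is deliberately minimal: it uses a single counter rather than a table indexed by branch address, and the class name `TwoBitPredictor` and helper `accuracy` are assumptions for illustration.

```python
class TwoBitPredictor:
    """Saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes, predictor=None):
    """Fraction of branch outcomes predicted correctly."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    if predictor is None:
        predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch that is taken 9 times and then falls through, run 10 times.
loop_outcomes = ([True] * 9 + [False]) * 10
print(accuracy(loop_outcomes))
```

On this pattern the counter mispredicts the first two branches while warming up and then only the loop exit in each pass, so it gets 88 of 100 predictions right. A single mispredicted exit does not flip the prediction for the next pass, which is the advantage of two bits over one.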
A structural hazard occurs when two instructions need the same hardware resource simultaneously. Example: if there is only one memory port, a load instruction in MEM and an instruction fetch in IF cannot both proceed in the same cycle. Solution: separate instruction memory (I-cache) from data memory (D-cache). Modern processors implement this with separate L1-I and L1-D caches, eliminating structural hazards in the standard 5-stage pipeline.
| Hazard Type | Cause | Solution | CPI Impact |
|---|---|---|---|
| RAW (no fwd) | Result reaches the register file only in WB | Stall in ID (or add forwarding) | +2 stalls per back-to-back dependency |
| RAW (with fwd) | EX/MEM result needed immediately | EX→EX + MEM→EX forward paths | 0 extra (except load-use) |
| Load-Use | Load result in MEM, needed in EX | 1 stall + instruction reordering | +1 stall per occurrence |
| Control (branch) | Target not known until EX | Branch prediction | +2 cycles per taken branch (no prediction) |
| Structural | Single memory port | Split I/D cache | 0 with separate caches |
The hazard detection unit detects RAW and load-use hazards. It outputs a stall signal that freezes the PC and the IF/ID pipeline register and inserts a bubble into EX.
```verilog
// Hazard Detection Unit
// Stalls the pipeline when a RAW hazard cannot be forwarded
module hazard_detect (
    input  wire       id_rs1_valid, id_rs2_valid,  // source regs used?
    input  wire [4:0] id_rs1, id_rs2,              // source registers (ID stage)
    input  wire [4:0] ex_rd, mem_rd,               // dest registers (EX, MEM stages)
    input  wire       ex_reg_write, mem_reg_write, // do EX/MEM stages write?
    input  wire       ex_is_load,                  // is EX a load instruction?
    input  wire       forwarding_enabled,          // forwarding unit present?
    output reg        stall                        // 1 = insert bubble, freeze IF/ID
);
    wire raw_ex  = ex_reg_write && (ex_rd != 5'd0) &&
                   ((id_rs1_valid && ex_rd == id_rs1) ||
                    (id_rs2_valid && ex_rd == id_rs2));
    wire raw_mem = mem_reg_write && (mem_rd != 5'd0) &&
                   ((id_rs1_valid && mem_rd == id_rs1) ||
                    (id_rs2_valid && mem_rd == id_rs2));

    always @(*) begin
        stall = 1'b0;
        if (ex_is_load && raw_ex)
            stall = 1'b1;          // Load-use: always 1 stall
        else if (!forwarding_enabled) begin
            if (raw_ex || raw_mem)
                stall = 1'b1;      // No forwarding: stall for RAW
        end
        // With forwarding + non-load RAW: no stall (forwarding unit handles it)
    end
endmodule
```
```verilog
// Forwarding Unit: selects ALU input source
// ForwardA/B: 00=register file, 01=MEM/WB, 10=EX/MEM
module forwarding_unit (
    input  wire [4:0] ex_rs1, ex_rs2,              // source regs in EX stage
    input  wire [4:0] mem_rd, wb_rd,               // dest regs in MEM, WB stages
    input  wire       mem_reg_write, wb_reg_write, // do they write?
    output reg  [1:0] forwardA, forwardB           // mux select for ALU inputs
);
    always @(*) begin
        // Default: use register file value
        forwardA = 2'b00;
        forwardB = 2'b00;

        // EX/MEM forwarding (higher priority: most recent value)
        if (mem_reg_write && mem_rd != 5'd0) begin
            if (mem_rd == ex_rs1) forwardA = 2'b10; // EX→EX forward
            if (mem_rd == ex_rs2) forwardB = 2'b10;
        end

        // MEM/WB forwarding (lower priority: only if EX/MEM did not match)
        if (wb_reg_write && wb_rd != 5'd0) begin
            if (wb_rd == ex_rs1 && forwardA == 2'b00) forwardA = 2'b01;
            if (wb_rd == ex_rs2 && forwardB == 2'b00) forwardB = 2'b01;
        end
    end
endmodule
```
```verilog
// Pipeline register flush for control hazards
// When a branch resolves in EX, flush the IF/ID and ID/EX registers
module pipeline_ctrl (
    input  wire clk, rst_n,     // unused here: the control logic is purely combinational
    input  wire stall,          // from hazard detection unit
    input  wire branch_taken,   // branch resolved in EX stage
    // Control signals for pipeline registers
    output reg  if_id_write,    // 1=update, 0=freeze (stall)
    output reg  pc_write,       // 1=update PC, 0=freeze
    output reg  if_id_flush,    // flush IF/ID register
    output reg  id_ex_flush     // flush ID/EX register (insert NOP)
);
    always @(*) begin
        if_id_write = ~stall;
        // A taken branch must redirect the PC even if ID is stalled in the
        // same cycle; otherwise the branch target would be lost.
        pc_write    = ~stall | branch_taken;
        if_id_flush = branch_taken;         // wrong-path instr in IF → NOP
        id_ex_flush = stall | branch_taken; // stall bubble OR branch flush
    end
endmodule

// In the IF/ID pipeline register:
//   if (if_id_flush)      if_id_reg <= NOP;
//   else if (if_id_write) if_id_reg <= {pc+4, instruction};
// In the ID/EX pipeline register:
//   if (id_ex_flush) id_ex_reg <= NOP; // insert bubble
//   else             id_ex_reg <= {control_signals, ...};
```
A pipeline hazard is any condition that prevents the next instruction from executing in the next clock cycle. The three types are: data hazards (instruction needs a result not yet produced), control hazards (branch target not yet known), and structural hazards (two instructions need the same hardware resource simultaneously). Hazards reduce pipeline efficiency and increase CPI above the ideal value of 1.0.
A RAW (Read After Write) hazard occurs when an instruction reads a register before a preceding instruction has written its result. Without forwarding, a 5-stage pipeline requires 2 stall cycles for each back-to-back RAW dependency. With EX→EX forwarding, zero stalls are needed for arithmetic dependencies. The exception is load-use hazards, which require 1 stall even with forwarding.
Forwarding routes the ALU result directly from the EX/MEM pipeline register (or the MEM/WB register) back to the ALU input, without waiting for the result to be written to the register file. A forwarding unit detects the dependency and drives the select lines of a multiplexer in front of each ALU input. The correct value arrives just in time for the dependent instruction's EX stage, eliminating the stall entirely for ALU results.
A load instruction produces its result at the end of the MEM stage. The dependent instruction needs that value at the start of its EX stage. If the dependent instruction immediately follows the load, the load is in MEM during the same cycle in which the dependent instruction is in EX, so the loaded value does not exist yet when EX begins. Forwarding cannot deliver a value before it has been produced, so one stall cycle is unavoidable. The compiler can often reorder instructions to fill this slot.
In a classic 5-stage pipeline, the branch target is resolved at the end of the EX stage (cycle 3). By that point, two instructions have been fetched from the wrong path (in cycles 2 and 3) and must be flushed — this is the 2-cycle branch penalty. Modern processors use dynamic branch predictors to achieve >95% accuracy, reducing the average penalty to under 0.1 cycles per branch instruction.
CPI (Cycles Per Instruction) = total cycles / instructions completed. An ideal pipeline achieves CPI = 1.0. RAW hazards without forwarding add 2 stall cycles per back-to-back dependency, pushing CPI toward 3.0 for instruction streams in which every instruction depends on the previous one. Load-use hazards add 1 stall cycle. Taken branches add a 2-cycle penalty when there is no prediction. With full forwarding and good branch prediction, real programs typically achieve a CPI of 1.05–1.3.
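These contributions can be combined into a back-of-the-envelope CPI estimate. The sketch below is a steady-state model that ignores pipeline fill time and assumes every counted dependency is back-to-back; the function name `effective_cpi` and its parameters are hypothetical.

```python
def effective_cpi(instructions, alu_raw=0, load_use=0, taken_branches=0,
                  forwarding=True, branch_penalty=2):
    """Estimate CPI from hazard counts, ignoring pipeline fill time.

    alu_raw and load_use count back-to-back dependencies only.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    if forwarding:
        data_stalls = load_use                    # ALU results are bypassed
    else:
        data_stalls = 2 * (alu_raw + load_use)    # wait for WB in both cases
    control_stalls = branch_penalty * taken_branches
    return (instructions + data_stalls + control_stalls) / instructions


# Fully dependent ALU stream without forwarding.
print(effective_cpi(100, alu_raw=100, forwarding=False))
# Forwarding enabled, 10 load-use stalls, 5 taken branches with no prediction.
print(effective_cpi(100, load_use=10, taken_branches=5))
```

The first call reproduces the CPI of 3.0 for a fully dependent stream; the second gives (100 + 10 + 10) / 100 = 1.2, which falls inside the 1.05–1.3 range quoted above.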