How does forwarding (bypassing) work?

Forwarding routes the result of an instruction directly from the pipeline register where it appears to the stage that needs it, without waiting for the result to be written to the register file. EX→EX forwarding takes the ALU result from the EX/MEM pipeline register and feeds it to the ALU input in the following cycle, when the next instruction is in EX. MEM→EX forwarding takes the result from the MEM/WB register and serves a consumer two instructions behind the producer. Together these eliminate RAW stalls between ALU instructions; the load-use hazard is the exception and still costs one stall.

What is a load-use hazard?

A load-use hazard occurs when the instruction immediately following a load reads the register being loaded. The load result is not available until the end of the MEM stage, but the dependent instruction needs it at the start of its EX stage, so one stall cycle is required even with full forwarding. The hardware inserts a pipeline bubble; the compiler can often avoid the stall by placing an independent instruction between the load and its first use.

Pipeline Hazard Lab – Data, Control & Structural Hazards

Q: What is a pipeline hazard?

A pipeline hazard is any condition that prevents the next instruction from executing in the next clock cycle. The three types are: data hazards (instruction needs a result not yet produced), control hazards (branch target not yet known), and structural hazards (two instructions need the same hardware resource simultaneously). Hazards reduce pipeline efficiency and increase CPI above the ideal value of 1.0.

Q: What is a RAW data hazard?

A RAW (Read After Write) hazard occurs when an instruction tries to read a register before a preceding instruction has written its result. In a 5-stage pipeline, if instruction I1 is fetched in cycle 1 and writes register R1 in the WB stage (cycle 5), and the next instruction I2 reads R1 in the ID stage (cycle 3), I2 reads a stale value. Without forwarding, 2 stall cycles must be inserted so that I2's ID stage lines up with I1's WB stage (this assumes the register file is written in the first half of the cycle and read in the second half). With EX→EX forwarding, the stall is eliminated entirely.

Q: What is CPI and how do hazards affect it?

CPI (Cycles Per Instruction) measures pipeline efficiency: CPI = total cycles / instructions completed. An ideal pipeline achieves CPI = 1.0. Data hazards without forwarding add 2 stall cycles per back-to-back RAW dependency, pushing CPI toward 3.0 when every instruction depends on its predecessor. Load-use hazards add 1 stall cycle each, even with forwarding. A taken branch adds a 2-cycle penalty unless it is correctly predicted. Forwarding removes the stalls for all other RAW dependencies. Real programs with forwarding typically achieve CPI of 1.1–1.4.

What is Pipelining?

Pipelining overlaps the execution of multiple instructions by dividing the processor into stages. While I1 is in EX, I2 is in ID and I3 is in IF, all in the same cycle. Once it is full, an ideal N-stage pipeline completes one instruction per clock cycle: CPI = 1.0.

The classic 5-stage RISC pipeline: IF (fetch instruction) → ID (decode + read registers) → EX (ALU operation) → MEM (memory access) → WB (write result to register file).

Ideal Pipeline Timing (No Hazard)
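
As a rough worked check of the ideal case, assume an N-stage pipeline that never stalls and a run of k instructions (the symbols N and k are introduced here for illustration). The first instruction needs N cycles to pass through every stage, and each later instruction completes one cycle after its predecessor:

```latex
\begin{align*}
\text{cycles}(k) &= N + (k - 1) \\
\text{CPI}(k) &= \frac{N + k - 1}{k} \;\xrightarrow{\,k \to \infty\,}\; 1 \\
N = 5,\ k = 100: \quad \text{cycles} &= 104, \quad \text{CPI} = 1.04
\end{align*}
```

The extra N - 1 cycles are the pipeline fill time; they become negligible for long instruction streams, which is why the ideal CPI is quoted as 1.0.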

Data Hazards

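A data hazard occurs when two nearby instructions access the same register and pipelining could change the order of those accesses. The most common case is an instruction that needs a value a previous instruction has not finished computing yet. The three types: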

Type	Full Name	Description	In 5-stage pipeline?
RAW	Read After Write	I2 reads a register before I1 writes it	Yes — the main hazard
WAR	Write After Read	I2 writes a register before I1 reads it	Not in a simple in-order pipeline
WAW	Write After Write	I2 writes a register before I1 writes it	Not in a simple in-order pipeline

RAW Hazard — Without Forwarding

RAW Hazard — With EX→EX Forwarding

Forwarding (bypassing) routes the ALU result directly from the EX/MEM pipeline register back to the ALU input — before WB writes it to the register file. This eliminates the 2-cycle stall completely.
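
The forwarding path described above ends in a multiplexer in front of each ALU input. The following is a minimal sketch of that mux, not the lab's actual datapath: the module name, port names, and the `WIDTH` parameter are hypothetical, and the 2-bit select encoding (00 register file, 01 MEM/WB, 10 EX/MEM) follows the forwarding unit's own comment.

```verilog
// Sketch of one ALU operand mux driven by a forwarding select.
// Encoding: 00 = value read from the register file (held in ID/EX),
//           10 = EX/MEM result, 01 = MEM/WB result.
// Module and port names are hypothetical, chosen for illustration.
module forward_mux #(
  parameter WIDTH = 32
) (
  input  wire [WIDTH-1:0] id_ex_value,   // operand read from the register file in ID
  input  wire [WIDTH-1:0] ex_mem_result, // ALU result of the previous instruction
  input  wire [WIDTH-1:0] mem_wb_result, // value about to be written back
  input  wire [1:0]       sel,           // forwardA or forwardB
  output reg  [WIDTH-1:0] alu_operand
);
  always @(*) begin
    case (sel)
      2'b10:   alu_operand = ex_mem_result;  // EX->EX forward
      2'b01:   alu_operand = mem_wb_result;  // MEM->EX forward
      default: alu_operand = id_ex_value;    // no forwarding needed
    endcase
  end
endmodule
```

One instance would be needed per ALU input, with `sel` driven by `forwardA` and `forwardB` respectively.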

Load-Use Hazard — Always 1 Stall (even with forwarding)

A Load-Use hazard occurs when the instruction immediately after a LW (load word) reads the loaded register. The load result is only available at the end of MEM, but the dependent instruction needs it at the start of EX — one cycle too early. Even with forwarding, exactly 1 stall cycle is always required. The compiler can often hide this by reordering instructions.

Control Hazards — Branch

A control hazard occurs when a branch instruction changes the PC. In a 5-stage pipeline, the branch target is resolved at the end of the EX stage (cycle 3). By that time, two instructions have already been fetched from the wrong path — they must be flushed (turned into bubbles). This is a 2-cycle branch penalty.

Branch prediction eliminates this penalty for correctly-predicted branches. Modern CPUs achieve >95% prediction accuracy, reducing the average branch penalty to <0.1 cycles per branch.
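
The relationship between prediction accuracy and average penalty can be checked with a short calculation. This is a sketch that assumes every misprediction costs the full 2-cycle flush and every correct prediction costs nothing; p denotes the prediction accuracy (a symbol introduced here):

```latex
\begin{align*}
\text{average penalty} &= (1 - p) \cdot 2 \ \text{cycles} \\
p = 0.95: \quad (1 - 0.95) \cdot 2 &= 0.10 \ \text{cycles per branch} \\
p = 0.98: \quad (1 - 0.98) \cdot 2 &= 0.04 \ \text{cycles per branch}
\end{align*}
```

Any accuracy above 95% therefore brings the average below 0.1 cycles per branch under these assumptions.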

Strategy	Branch Penalty	Notes
Flush on branch	2 cycles always	Simple, correct, used in basic RISC-V cores
Static prediction (not taken)	0 if not taken, 2 if taken	Good when most branches fall through; loop back-edges are usually taken and pay the penalty
Dynamic 2-bit predictor	2 × misprediction rate (≤0.1 cycles avg at 95% accuracy)	Modern CPUs; 95%+ accuracy
Delayed branch slot	0 when the slot holds a useful instruction	Used in MIPS, which resolves branches in ID so one slot covers the branch shadow
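
To make the "dynamic 2-bit predictor" row concrete, here is a minimal sketch of a single 2-bit saturating counter, the building block of such a predictor. It is an illustration under stated assumptions rather than the lab's implementation: the module and port names are hypothetical, the reset state (weakly not-taken) is an arbitrary choice, and a real predictor would hold a table of these counters indexed by low-order PC bits.

```verilog
// Sketch of one 2-bit saturating counter (a single predictor entry).
// States: 00 strongly not-taken, 01 weakly not-taken,
//         10 weakly taken,       11 strongly taken.
// Module and port names are hypothetical.
module two_bit_counter (
  input  wire clk,
  input  wire rst_n,         // active-low synchronous reset
  input  wire update,        // 1 when a branch has just resolved
  input  wire actual_taken,  // resolved outcome of that branch
  output wire predict_taken  // current prediction
);
  reg [1:0] state;

  // The upper bit alone decides the prediction.
  assign predict_taken = state[1];

  always @(posedge clk) begin
    if (!rst_n)
      state <= 2'b01;                                // start weakly not-taken
    else if (update) begin
      if (actual_taken) begin
        if (state != 2'b11) state <= state + 2'b01;  // count up, saturate at 11
      end else begin
        if (state != 2'b00) state <= state - 2'b01;  // count down, saturate at 00
      end
    end
  end
endmodule
```

Because the counter has to be wrong twice in a row before its prediction flips, a single loop exit does not disturb the "taken" prediction for the next run of the loop.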

Structural Hazards

A structural hazard occurs when two instructions need the same hardware resource simultaneously. Example: if there is only one memory port, a load instruction in MEM and an instruction fetch in IF cannot both proceed in the same cycle. Solution: separate instruction memory (I-cache) from data memory (D-cache). Modern processors implement this with separate L1-I and L1-D caches, eliminating structural hazards in the standard 5-stage pipeline.

Hazard Type	Cause	Solution	CPI Impact
RAW (no fwd)	Result written in WB, 2 cycles after the consumer's ID	Stall, or add forwarding (bypassing)	+2 stalls per back-to-back dependency
RAW (with fwd)	EX/MEM result needed immediately	EX→EX + MEM→EX forward paths	0 extra (except load-use)
Load-Use	Load result in MEM, needed in EX	1 stall + instruction reordering	+1 stall per occurrence
Control (branch)	Target not known until EX	Branch prediction	+2 cycles per taken branch (no prediction)
Structural	Single memory port	Split I/D cache	0 with separate caches
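
To see how the per-hazard penalties in the table combine, here is a rough worked estimate. The instruction-mix numbers are hypothetical and chosen only for illustration: 5% of instructions are an immediate use of a loaded value, 15% are branches, and the predictor is right 90% of the time (p = 0.90). Forwarding is assumed, so other RAW dependencies cost nothing, and every misprediction is assumed to cost the full 2-cycle flush.

```latex
\begin{align*}
\text{CPI} &= 1 + f_{\text{load-use}} \cdot 1 + f_{\text{branch}} \cdot (1 - p) \cdot 2 \\
           &= 1 + 0.05 \cdot 1 + 0.15 \cdot 0.10 \cdot 2 \\
           &= 1 + 0.05 + 0.03 = 1.08
\end{align*}
```

With a different mix the result changes accordingly; the point is only that each hazard contributes its frequency times its penalty on top of the ideal CPI of 1.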

Hazard Detection Unit

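Detects load-use hazards and, when forwarding is disabled, any RAW hazard against the instruction in the EX or MEM stage. Outputs a stall signal that freezes the PC and the IF/ID pipeline register and inserts a bubble into the ID/EX register.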

verilog

// Hazard Detection Unit
// Stalls the pipeline when a RAW hazard cannot be forwarded
module hazard_detect (
  input  wire       id_rs1_valid, id_rs2_valid,  // source regs used?
  input  wire [4:0] id_rs1, id_rs2,              // source registers (ID stage)
  input  wire [4:0] ex_rd, mem_rd,               // dest registers (EX, MEM stages)
  input  wire       ex_reg_write, mem_reg_write, // do EX/MEM stages write?
  input  wire       ex_is_load,                  // is EX a load instruction?
  input  wire       forwarding_enabled,          // forwarding unit present?
  output reg        stall                        // 1 = insert bubble, freeze IF/ID
);
  wire raw_ex  = ex_reg_write  && (ex_rd  != 5'd0) &&
                 ((id_rs1_valid && ex_rd  == id_rs1) ||
                  (id_rs2_valid && ex_rd  == id_rs2));

  wire raw_mem = mem_reg_write && (mem_rd != 5'd0) &&
                 ((id_rs1_valid && mem_rd == id_rs1) ||
                  (id_rs2_valid && mem_rd == id_rs2));

  always @(*) begin
    stall = 1'b0;
    if (ex_is_load && raw_ex)
      stall = 1'b1;                  // Load-use: always 1 stall
    else if (!forwarding_enabled) begin
      if (raw_ex || raw_mem)
        stall = 1'b1;                // No forwarding: stall for RAW
    end
    // With forwarding + non-load RAW: no stall (forwarding unit handles it)
  end
endmodule

Forwarding Unit

verilog

// Forwarding Unit — selects ALU input source
// ForwardA/B: 00=register file, 01=MEM/WB, 10=EX/MEM
module forwarding_unit (
  input  wire [4:0] ex_rs1, ex_rs2,             // source regs in EX stage
  input  wire [4:0] mem_rd, wb_rd,              // dest regs in MEM, WB stages
  input  wire       mem_reg_write, wb_reg_write,// do they write?
  output reg  [1:0] forwardA, forwardB          // mux select for ALU inputs
);
  always @(*) begin
    // Default: use register file value
    forwardA = 2'b00; forwardB = 2'b00;

    // EX/MEM forwarding (higher priority — most recent value)
    if (mem_reg_write && mem_rd != 5'd0) begin
      if (mem_rd == ex_rs1) forwardA = 2'b10;  // EX→EX forward
      if (mem_rd == ex_rs2) forwardB = 2'b10;
    end

    // MEM/WB forwarding (lower priority)
    if (wb_reg_write && wb_rd != 5'd0) begin
      if (wb_rd == ex_rs1 && forwardA == 2'b00) forwardA = 2'b01;
      if (wb_rd == ex_rs2 && forwardB == 2'b00) forwardB = 2'b01;
    end
  end
endmodule

Branch Flush Logic (in Pipeline Controller)

verilog

// Pipeline register flush for control hazards
// When a branch resolves as taken in EX, flush the IF/ID and ID/EX registers.
// A taken branch overrides a stall: the stalled instruction in ID is on the
// wrong path, so the PC must still be redirected to the branch target.
module pipeline_ctrl (
  input  wire clk, rst_n,      // unused: this block is purely combinational
  input  wire stall,           // from hazard detection unit
  input  wire branch_taken,    // branch resolved in EX stage
  // Control signals for pipeline registers
  output reg  if_id_write,     // 1=update, 0=freeze (stall)
  output reg  pc_write,        // 1=update PC, 0=freeze
  output reg  if_id_flush,     // flush IF/ID register
  output reg  id_ex_flush      // flush ID/EX register (insert NOP)
);
  always @(*) begin
    if_id_write  = ~stall | branch_taken;    // never freeze during a redirect
    pc_write     = ~stall | branch_taken;    // PC must load the branch target
    if_id_flush  = branch_taken;             // wrong-path instr in IF → NOP
    id_ex_flush  = stall | branch_taken;     // stall bubble OR branch flush
  end
endmodule

// In the IF/ID pipeline register:
//   if (if_id_flush) if_id_reg <= NOP;
//   else if (if_id_write) if_id_reg <= {pc+4, instruction};

// In the ID/EX pipeline register:
//   if (id_ex_flush) id_ex_reg <= NOP;  // insert bubble
//   else id_ex_reg <= {control_signals, ...};
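
The commented pseudocode for the IF/ID register can be written out as a small module. This is a sketch under stated assumptions: the module and port names are hypothetical, the reset is synchronous and active-low, and the NOP encoding assumes an RV32I core (`addi x0, x0, 0`, i.e. `32'h0000_0013`); another ISA would use a different constant.

```verilog
// Sketch of the IF/ID pipeline register described in the comments above.
// Priority: reset or flush first, then write-enable; otherwise hold (stall).
// The NOP encoding assumes RV32I (addi x0, x0, 0); names are hypothetical.
module if_id_register (
  input  wire        clk,
  input  wire        rst_n,          // active-low synchronous reset
  input  wire        if_id_flush,    // 1 = replace contents with a NOP
  input  wire        if_id_write,    // 1 = capture new fetch, 0 = hold
  input  wire [31:0] if_pc_plus4,
  input  wire [31:0] if_instruction,
  output reg  [31:0] id_pc_plus4,
  output reg  [31:0] id_instruction
);
  localparam [31:0] NOP = 32'h0000_0013;

  always @(posedge clk) begin
    if (!rst_n || if_id_flush) begin
      id_pc_plus4    <= 32'd0;
      id_instruction <= NOP;
    end else if (if_id_write) begin
      id_pc_plus4    <= if_pc_plus4;
      id_instruction <= if_instruction;
    end
    // else: hold the current contents while the pipeline is stalled
  end
endmodule
```

Giving flush priority over write-enable matches the order of the `if`/`else if` in the pseudocode, so a taken branch always clears the register even if a stall is requested in the same cycle.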

Frequently Asked Questions

What is a RAW data hazard and how many stalls does it cause?

A RAW (Read After Write) hazard occurs when an instruction reads a register before a preceding instruction has written its result. Without forwarding, a 5-stage pipeline requires 2 stall cycles per RAW dependency. With EX→EX forwarding, zero stalls are needed for arithmetic dependencies. The exception is load-use hazards, which always require 1 stall even with forwarding.

How does forwarding eliminate stalls?

Forwarding routes the ALU result directly from the EX/MEM pipeline register (or MEM/WB register) back to the ALU input, without waiting for the result to be written to the register file. A forwarding unit detects the dependency by comparing register numbers and drives the select lines of a multiplexer in front of each ALU input. The correct value arrives just in time for the dependent instruction's EX stage, eliminating the stall entirely.

Why can't forwarding eliminate load-use stalls?

A load instruction produces its result at the end of the MEM stage. The dependent instruction needs that value at the start of its EX stage. If the dependent instruction immediately follows the load, the load is in MEM during the same cycle that the dependent instruction is in EX, so the loaded value does not yet exist when the ALU needs it. One stall cycle is therefore unavoidable, after which the value is forwarded from the MEM/WB register. The compiler can often reorder instructions to fill this slot.

What is the branch penalty in a 5-stage pipeline?

In a classic 5-stage pipeline, the branch target is resolved at the end of the EX stage (cycle 3). By that point, two instructions have been fetched from the wrong path (in cycles 2 and 3) and must be flushed — this is the 2-cycle branch penalty. Modern processors use dynamic branch predictors to achieve >95% accuracy, reducing the average penalty to under 0.1 cycles per branch instruction.

What is CPI and how do hazards affect it?

CPI (Cycles Per Instruction) = total cycles / instructions completed. An ideal pipeline achieves CPI = 1.0. RAW hazards without forwarding add 2 stall cycles per dependency, pushing CPI toward 3.0 for dependent instruction streams. Each load-use hazard adds 1 stall cycle, and each taken branch adds a 2-cycle penalty when it is not predicted. With full forwarding and good branch prediction, real programs typically achieve CPI of 1.05–1.3.
