Topic 17 · Digital Electronics

Pipelining in Digital Design
Stages · Hazards · Throughput

How to multiply throughput by splitting long combinational paths into registered stages, and how to handle the hazards that arise when stages interact.


What is Pipelining?

Without pipelining, the next computation can start only after the current one completes, because the entire combinational path must settle within one clock period. Pipelining inserts registers between stages so that each stage works on a different data item every clock cycle, like a factory assembly line.
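The assembly-line behavior can be sketched as a small software model in Python (a behavioral sketch, not hardware; the function name `run_pipeline` is hypothetical). Each list slot stands for one pipeline register, and on every clock edge all items shift one stage forward at once.

```python
from collections import deque


def run_pipeline(items, k):
    """Clock a k-stage pipeline until it drains.

    Returns (cycle, item) for every completed item. stages[i] holds the item
    currently in stage i + 1; None marks an empty slot (a bubble), so the
    items themselves must not be None.
    """
    pending = deque(items)
    stages = [None] * k
    completed = []
    cycle = 0
    while pending or any(s is not None for s in stages):
        cycle += 1
        # Clock edge: every register captures the previous stage's output,
        # so all in-flight items advance together and a new one enters stage 1.
        incoming = pending.popleft() if pending else None
        stages = [incoming] + stages[:-1]
        # The item in the last stage finishes at the end of this cycle.
        if stages[-1] is not None:
            completed.append((cycle, stages[-1]))
    return completed


print(run_pipeline(["A", "B", "C"], k=5))
# [(5, 'A'), (6, 'B'), (7, 'C')]
```

The first result appears after k cycles (the latency), and one more follows every cycle after that (the throughput), so N items take k + N − 1 cycles.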

Unpipelined: T_logic = 10 ns → Fmax = 100 MHz
5-stage pipeline: Stage 1 (2 ns) | Stage 2 (2 ns) | Stage 3 (2 ns) | Stage 4 (2 ns) | Stage 5 (2 ns)
T_stage = 2 ns → Fmax = 500 MHz (5× throughput in the ideal case, ignoring pipeline-register overhead)
Throughput = 1 output / Tclk (steady state)  |  Latency = k × Tclk (k stages)
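These formulas can be turned into a small calculator. This Python sketch assumes the logic divides into k perfectly balanced stages and that every register adds a fixed overhead t_reg (setup time plus clock-to-Q, e.g., 0.2 ns), which the idealized 10 ns → 2 ns example leaves out. `PipelineTiming` and `pipeline_timing` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PipelineTiming:
    t_clk_ns: float
    fmax_mhz: float
    throughput_mops: float  # millions of results per second
    latency_ns: float
    speedup: float          # throughput relative to the same logic unpipelined


def pipeline_timing(t_logic_ns, k, t_reg_ns=0.0):
    """Steady-state timing of an ideal k-stage pipeline (balanced stages, no stalls)."""
    t_clk = t_logic_ns / k + t_reg_ns     # slowest stage plus register overhead
    t_clk_1 = t_logic_ns + t_reg_ns       # k = 1: one register around all the logic
    fmax_mhz = 1000.0 / t_clk             # 1 / (1 ns) = 1000 MHz
    return PipelineTiming(
        t_clk_ns=t_clk,
        fmax_mhz=fmax_mhz,
        throughput_mops=fmax_mhz,         # one result per clock in steady state
        latency_ns=k * t_clk,             # Latency = k x Tclk
        speedup=t_clk_1 / t_clk,
    )


ideal = pipeline_timing(10.0, 5)
real = pipeline_timing(10.0, 5, t_reg_ns=0.2)
print(f"ideal: {ideal.fmax_mhz:.0f} MHz, latency {ideal.latency_ns:.1f} ns, {ideal.speedup:.2f}x")
print(f"0.2 ns overhead: {real.fmax_mhz:.0f} MHz, latency {real.latency_ns:.1f} ns, {real.speedup:.2f}x")
```

With 0.2 ns of register overhead, the 5-stage design reaches about 455 MHz instead of 500 MHz, latency grows to 11 ns (versus 10.2 ns unpipelined with the same overhead), and the speedup drops to roughly 4.6× rather than 5×.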


Pipeline Hazards

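| Hazard type | Cause | Resolution |
|---|---|---|
| Data: RAW (Read-After-Write) | A later instruction reads a register before an earlier instruction, further down the pipeline, has written it | Forwarding (bypass); stall if it is a load-use case |
| Data: WAR (Write-After-Read) | A later instruction writes a register before an earlier instruction has read it | Rare in in-order pipelines (operands are read early, results written late); register renaming in out-of-order (OOO) cores |
| Data: WAW (Write-After-Write) | Two instructions write the same destination, and the later write could complete first | Stall or register renaming |
| Structural | Two stages need the same hardware in the same cycle (e.g., one memory port) | Separate instruction/data memories; dual-port RAM |
| Control | A branch or jump changes the PC after the instructions that follow it have already been fetched | Flush misfetched instructions; branch prediction; branch delay slot |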
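The three data-hazard types can be told apart mechanically from the registers two instructions read and write. This Python sketch (the `Instr` tuple and `classify_hazards` are hypothetical names) reports the dependences between an earlier and a later instruction in program order; whether a dependence actually becomes a hazard depends on the pipeline, and a hardwired zero register is not special-cased here.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[str]        # register written, or None (e.g., a store or branch)
    srcs: Tuple[str, ...]      # registers read


def classify_hazards(earlier: Instr, later: Instr) -> set:
    """Name the data dependences of `later` on `earlier`."""
    hazards = set()
    if earlier.dest is not None and earlier.dest in later.srcs:
        hazards.add("RAW")     # later reads what earlier writes
    if later.dest is not None and later.dest in earlier.srcs:
        hazards.add("WAR")     # later writes what earlier still has to read
    if earlier.dest is not None and earlier.dest == later.dest:
        hazards.add("WAW")     # both write the same destination
    return hazards


lw = Instr("R1", ("R2",))             # LW  R1, 0(R2)
add = Instr("R3", ("R1", "R4"))       # ADD R3, R1, R4
print(classify_hazards(lw, add))      # {'RAW'}
```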

Load-Use Hazard (must stall)

Cycle             1     2     3     4     5     6     7
LW  R1, 0(R2)     IF    ID    EX    MEM   WB
ADD R3, R1, R4          IF    ID    stall EX    MEM   WB

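R1 is not available until the end of LW's MEM stage, but ADD would need it at the start of its EX stage in that same cycle. ADD must therefore wait one cycle even with forwarding; after the stall, the MEM/WB forwarding path supplies R1.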
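The stall can be derived rather than drawn. This Python sketch assumes the classic five stages, full forwarding from EX/MEM and MEM/WB into EX, no branches, and a stalled instruction waiting in ID; `Op`, `schedule`, and `render` are hypothetical names. It reproduces the one-cycle load-use bubble, while an ALU-to-ALU dependence needs none.

```python
from typing import List, NamedTuple, Optional, Tuple


class Op(NamedTuple):
    text: str
    dest: Optional[str]          # register written (None for stores/branches)
    srcs: Tuple[str, ...]        # registers read
    is_load: bool = False


def schedule(program: List[Op]) -> List[dict]:
    """Cycle -> stage name for each instruction (5 stages, full forwarding)."""
    rows = []
    ready_ex = {}                # reg -> earliest EX cycle a consumer may use it in
    prev_id = prev_ex = None
    for op in program:
        fetch = 1 if prev_id is None else prev_id   # fetched while the previous op is in ID
        dec = 2 if prev_ex is None else prev_ex     # enters ID when the previous op enters EX
        ex = dec + 1
        for r in op.srcs:
            ex = max(ex, ready_ex.get(r, 0))        # wait for the most recent producer
        row = {fetch: "IF", dec: "ID", ex: "EX", ex + 1: "MEM", ex + 2: "WB"}
        for c in range(fetch + 1, dec):
            row[c] = "stall"                        # held in IF behind a stalled op
        for c in range(dec + 1, ex):
            row[c] = "stall"                        # held in ID until the operand can be forwarded
        rows.append(row)
        if op.dest is not None:
            # ALU results forward to the very next EX (EX/MEM path); load data
            # exists only after MEM, so a consumer's EX must be 2 cycles later.
            ready_ex[op.dest] = ex + (2 if op.is_load else 1)
        prev_id, prev_ex = dec, ex
    return rows


def render(program, rows):
    last = max(max(row) for row in rows)
    lines = ["cycle".ljust(16) + "".join(f"{c:<6}" for c in range(1, last + 1))]
    for op, row in zip(program, rows):
        cells = "".join(f"{row.get(c, ''):<6}" for c in range(1, last + 1))
        lines.append(op.text.ljust(16) + cells)
    return "\n".join(line.rstrip() for line in lines)


prog = [Op("LW R1, 0(R2)", "R1", ("R2",), is_load=True),
        Op("ADD R3, R1, R4", "R3", ("R1", "R4"))]
print(render(prog, schedule(prog)))
```

Appending an independent third instruction shows it held in IF for one cycle as well, because the stall freezes everything behind ADD.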

SystemVerilog: Pipeline Register

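// 3-stage pipeline: IF → EX → WB
// Each pipeline register carries a valid bit, so a bubble is explicit
// (valid = 0) instead of being encoded as an all-zero data word.
module pipeline3 #(parameter int W = 32) (
  input  logic         clk, rst_n,
  input  logic         stall,    // EX must wait: hold IF→EX, send a bubble to WB (the PC, outside this module, must hold too)
  input  logic         flush,    // squash the incoming wrong-path item (e.g., taken branch)
  input  logic         if_valid,
  input  logic [W-1:0] if_data,
  output logic         wb_valid,
  output logic [W-1:0] wb_result
);
  logic [W-1:0] if_ex, ex_wb;
  logic         if_ex_v, ex_wb_v;

  // IF → EX pipeline register
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      if_ex_v <= 1'b0;
      if_ex   <= '0;
    end else if (!stall) begin         // when stalled, hold the current contents
      if_ex_v <= if_valid && !flush;   // flush turns the incoming item into a bubble
      if_ex   <= if_data;
    end
  end

  // EX stage: combinational logic (e.g., ALU)
  logic [W-1:0] ex_out;
  assign ex_out = if_ex + 1'b1;        // placeholder ALU op

  // EX → WB pipeline register
  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      ex_wb_v <= 1'b0;
      ex_wb   <= '0;
    end else begin
      ex_wb_v <= if_ex_v && !stall;    // a stalled EX sends a bubble downstream
      ex_wb   <= ex_out;               // data is don't-care while ex_wb_v = 0
    end
  end

  assign wb_result = ex_wb;
  assign wb_valid  = ex_wb_v;
endmodule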

Forwarding Logic (RAW Hazard Resolution)

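// Forwarding unit for ALU operand A (operand B is identical, using rs2)
// plus load-use detection, for a classic 5-stage pipeline with 32
// registers where register 0 is hardwired to zero (assumed here).
module fwd_unit #(parameter int W = 32) (
  // Instruction in EX (ID/EX register)
  input  logic [4:0]   id_ex_rs1, id_ex_rd,
  input  logic         id_ex_memread,
  input  logic [W-1:0] id_ex_rs1_val,   // operand read from the register file
  // Older instructions further down the pipeline
  input  logic         ex_mem_regwrite, mem_wb_regwrite,
  input  logic [4:0]   ex_mem_rd, mem_wb_rd,
  input  logic [W-1:0] ex_mem_aluout, mem_wb_result,
  // Instruction in ID (IF/ID register)
  input  logic [4:0]   if_id_rs1, if_id_rs2,
  output logic [W-1:0] alu_a,
  output logic         load_use_stall
);
  // Forward EX/MEM or MEM/WB result back to the EX stage input.
  // EX/MEM is checked first because it holds the most recent write.
  always_comb begin
    if (ex_mem_regwrite && ex_mem_rd != 0 && ex_mem_rd == id_ex_rs1)
      alu_a = ex_mem_aluout;      // forward from EX/MEM
    else if (mem_wb_regwrite && mem_wb_rd != 0 && mem_wb_rd == id_ex_rs1)
      alu_a = mem_wb_result;      // forward from MEM/WB
    else
      alu_a = id_ex_rs1_val;      // no hazard: use register file
  end

  // Load-use stall detection: a load in EX has no data until the end of MEM,
  // so a dependent instruction in ID must wait one cycle.
  assign load_use_stall = id_ex_memread && id_ex_rd != 0 &&
    (id_ex_rd == if_id_rs1 || id_ex_rd == if_id_rs2);
endmodule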
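For checking the RTL in a testbench, a software golden model of the same selection priority is convenient. This Python sketch (function names hypothetical) takes register numbers as integers and treats register 0 as hardwired zero, as the `!= 0` checks in the forwarding conditions imply; its load-use check also ignores register 0, which avoids a needless stall when a load targets it.

```python
def forward_select(rs, ex_mem, mem_wb):
    """Which value feeds an ALU operand that reads register `rs`.

    ex_mem and mem_wb are (regwrite, rd) pairs describing the instructions in
    the EX/MEM and MEM/WB pipeline registers. Returns "EX/MEM", "MEM/WB", or "RF".
    """
    # EX/MEM is checked first: it holds the most recent write to the register.
    for source, (regwrite, rd) in (("EX/MEM", ex_mem), ("MEM/WB", mem_wb)):
        if regwrite and rd != 0 and rd == rs:
            return source
    return "RF"  # no pending write: the register-file value is current


def load_use_stall(id_ex_memread, id_ex_rd, if_id_rs1, if_id_rs2):
    """True when a load in EX writes a register the instruction in ID reads.

    Conservatively compares both source fields, even if the instruction uses one.
    """
    return bool(id_ex_memread and id_ex_rd != 0 and id_ex_rd in (if_id_rs1, if_id_rs2))


# Both older instructions write register 1: the newer one (in EX/MEM) wins.
print(forward_select(1, ex_mem=(True, 1), mem_wb=(True, 1)))  # EX/MEM
```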

Pipeline Performance Analysis

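Speedup (N operations, no stalls) = N·k / (k + N − 1)   → k as N grows large
Speedup (steady state) = k × (1 − f_stall) = k / (1 + s)   where f_stall = fraction of clock cycles stalled and s = stall cycles per operation (register overhead ignored)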
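| Design | Logic delay per stage | Stages | Tclk | Fmax | Stalled cycles | Effective throughput |
|---|---|---|---|---|---|---|
| Unpipelined | 10 ns | 1 | 10 ns | 100 MHz | n/a | 100 Mop/s |
| 5-stage (ideal) | 2 ns | 5 | 2 ns | 500 MHz | 0% | 500 Mop/s |
| 5-stage (10% stall) | 2 ns | 5 | 2 ns | 500 MHz | 10% | 450 Mop/s |
| 5-stage (30% stall) | 2 ns | 5 | 2 ns | 500 MHz | 30% | 350 Mop/s |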
As with Amdahl's Law, the speedup is capped by the part that does not benefit, here the stalled cycles. A 5-stage pipeline with 30% stalled cycles achieves only 5 × (1 − 0.30) = 3.5× speedup, not 5×. Reducing hazards (forwarding, branch prediction) is therefore essential.
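The table's rows can be reproduced by assuming that a stalled cycle completes no operation, so effective throughput = Fmax × (1 − f_stall). This Python sketch uses hypothetical function names and, like the table, ignores register overhead; it also shows the equivalent stall-cycles-per-operation form and the fill/drain effect for a finite number of operations.

```python
def effective_throughput_mops(fmax_mhz, f_stall):
    """Completed operations per microsecond when a fraction f_stall of cycles complete nothing."""
    return fmax_mhz * (1.0 - f_stall)


def speedup_from_stall_fraction(k, f_stall):
    return k * (1.0 - f_stall)


def speedup_from_stall_cycles(k, s):
    """s = stall cycles per operation; matches k * (1 - f) with f = s / (1 + s)."""
    return k / (1.0 + s)


def speedup_n_items(k, n):
    """No stalls: n ops take n*k pipelined-clock periods unpipelined vs k + n - 1 pipelined."""
    return n * k / (k + n - 1)


for label, f in [("ideal", 0.0), ("10% stall", 0.10), ("30% stall", 0.30)]:
    print(f"5-stage ({label}): {effective_throughput_mops(500.0, f):.0f} Mop/s, "
          f"{speedup_from_stall_fraction(5, f):.1f}x vs. unpipelined")
```

The loop prints 500, 450, and 350 Mop/s with speedups of 5.0×, 4.5×, and 3.5×, matching the table. At the other extreme, `speedup_n_items(5, 1)` is 1.0: a single operation gains nothing from pipelining.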

Frequently Asked Questions

What is pipelining and why use it?

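Pipelining splits a long combinational path into k shorter stages with registers between them. Each stage works in parallel on different data, so throughput rises by up to k×. Latency does not grow k×: it becomes k clock cycles, which in time is roughly the original T_logic plus the setup and clock-to-Q overhead of the added registers (and any imbalance between stages).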

What is the difference between a data hazard and a control hazard?

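Data hazard: an instruction needs a register value that an earlier instruction, still further down the pipeline, has not yet produced (RAW), or reads and writes of the same register could happen out of program order (WAR/WAW). Control hazard: a branch or jump changes the program counter after the pipeline has already fetched instructions from the wrong path.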

Can you always fix a data hazard with forwarding?

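No. A load-use hazard requires one stall cycle even with forwarding, because the loaded value is not available until the end of the MEM stage, while the next instruction needs it at the start of its EX stage in that same cycle. In the classic 5-stage pipeline, forwarding eliminates the other RAW hazards between ALU instructions.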

How do you implement a stall in Verilog?

Hold the PC and every pipeline register upstream of the stalled stage (freeze the data), and insert a bubble (all-zero control signals, i.e., a no-op) into the stalled stage's output register. The hold itself is just a conditional load: if (!stall) pipe_reg <= next; // else hold
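The same rule can be checked in a cycle-level model. In this Python sketch (names hypothetical), a list holds one item per stage, index 0 being IF, and `None` stands for a bubble; the usage replays the load-use case, with ADD stalled in ID while LW is in EX.

```python
BUBBLE = None  # an empty slot: no valid instruction, all control signals inactive


def clock(stages, fetched, stall_at=None):
    """One clock edge for a pipeline whose stage i holds stages[i] (index 0 = IF).

    Without a stall, everything shifts one stage and `fetched` enters IF.
    With stall_at = s: stage s and every stage upstream of it hold their
    contents (so does the PC, reported as fetch_accepted = False), a bubble is
    inserted into stage s's output register, and downstream stages advance.
    Returns (new_stages, retired, fetch_accepted).
    """
    last = len(stages) - 1
    if stall_at is None:
        return [fetched] + stages[:-1], stages[last], True
    held = list(stages[: stall_at + 1])          # freeze the stalled stage and upstream
    if stall_at == last:
        return held, BUBBLE, False               # nothing can leave the pipeline
    moved = stages[stall_at + 1 : last]          # downstream items advance one stage
    return held + [BUBBLE] + moved, stages[last], False


pipe = ["OR", "ADD", "LW", BUBBLE, BUBBLE]       # LW in EX, ADD in ID, OR in IF
pipe, _, _ = clock(pipe, "XOR", stall_at=1)      # load-use: stall ID for one cycle
print(pipe)  # ['OR', 'ADD', None, 'LW', None]
```

On the next edge, with the stall released, ADD enters EX directly behind the bubble, and the MEM/WB forwarding path can supply LW's result.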