Memory Design Series

Register File Verilog

2R1W, 2R2W, flip-flop based design, read-during-write forwarding, and a full RISC-V style 32×32 register file.

What is a Register File?

A register file is a small, fast, multi-port memory array that stores the CPU's working registers. Unlike a typical single-port RAM, a register file supports several simultaneous reads (and one or more writes) per cycle, so an instruction can fetch two source operands and write one result in a single clock cycle.

Figure: 32 × 32-bit RISC-V register file. Example contents: x0 = 0x00000000 (hardwired 0), x1 = 0x1000ABCD, x2 = 0xDEADBEEF, x3 = 0x00000042, … x31. Ports: raddr_a[4:0] → rdata_a[31:0], raddr_b[4:0] → rdata_b[31:0], write port waddr[4:0], wdata[31:0], we, and clk.
Key property: all reads are combinational (no clock-cycle latency), so data appears in the same cycle as the address. The write is synchronous: it is registered on the clock edge. This makes the register file a natural fit for the decode (read) and write-back (write) stages of a classic 5-stage pipeline.
2R1W Register File

Two read ports and one write port. This is the most common configuration, matching one ALU instruction (read rs1, read rs2, write rd).

module regfile_2r1w #(
  parameter WIDTH = 32,
  parameter DEPTH = 32,
  parameter AW    = $clog2(DEPTH)
)(
  input  wire          clk,
  // Read port A (combinational)
  input  wire [AW-1:0] raddr_a,
  output wire [WIDTH-1:0] rdata_a,
  // Read port B (combinational)
  input  wire [AW-1:0] raddr_b,
  output wire [WIDTH-1:0] rdata_b,
  // Write port (synchronous)
  input  wire          we,
  input  wire [AW-1:0] waddr,
  input  wire [WIDTH-1:0] wdata
);
  reg [WIDTH-1:0] rf [0:DEPTH-1];

  // Synchronous write
  always @(posedge clk) begin
    if (we) rf[waddr] <= wdata;
  end

  // Combinational reads
  assign rdata_a = rf[raddr_a];
  assign rdata_b = rf[raddr_b];
endmodule
// Read-during-write forwarding (write-first / bypass)
// If read address == write address AND write enable is asserted,
// return the new write data instead of the stale register value.
module regfile_2r1w_fwd #(
  parameter WIDTH = 32,
  parameter DEPTH = 32,
  parameter AW    = $clog2(DEPTH)
)(
  input  wire             clk,
  input  wire [AW-1:0]    raddr_a,
  output wire [WIDTH-1:0] rdata_a,
  input  wire [AW-1:0]    raddr_b,
  output wire [WIDTH-1:0] rdata_b,
  input  wire             we,
  input  wire [AW-1:0]    waddr,
  input  wire [WIDTH-1:0] wdata
);
  reg [WIDTH-1:0] rf [0:DEPTH-1];

  always @(posedge clk) begin
    if (we) rf[waddr] <= wdata;
  end

  // Forward new data when addresses collide
  assign rdata_a = (we && (waddr == raddr_a)) ? wdata : rf[raddr_a];
  assign rdata_b = (we && (waddr == raddr_b)) ? wdata : rf[raddr_b];
endmodule
// RISC-V style: x0 hardwired to 0, with forwarding
module regfile_riscv (
  input  wire        clk,
  input  wire        rst_n,
  // Read port A
  input  wire [4:0]  rs1,
  output wire [31:0] rdata1,
  // Read port B
  input  wire [4:0]  rs2,
  output wire [31:0] rdata2,
  // Write port
  input  wire        we,
  input  wire [4:0]  rd,
  input  wire [31:0] wdata
);
  reg [31:0] rf [1:31];   // x1–x31 only; x0 is a constant

  // x0 is never written — gate the write enable
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin : clr
      integer i;
      for (i = 1; i <= 31; i = i+1) rf[i] <= 32'h0;
    end else begin
      if (we && rd != 5'b0) rf[rd] <= wdata;
    end
  end

  // Read: x0 always returns 0; forwarding on all others
  assign rdata1 = (rs1 == 5'b0)               ? 32'b0  :
                  (we && (rd == rs1) && rd!=0) ? wdata  :
                  rf[rs1];

  assign rdata2 = (rs2 == 5'b0)               ? 32'b0  :
                  (we && (rd == rs2) && rd!=0) ? wdata  :
                  rf[rs2];
endmodule
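To make the "one ALU instruction per cycle" mapping concrete, here is a minimal sketch of a single-cycle add datapath built around a 2R1W array. The module name add_datapath and its preload port (ld, ld_addr, ld_data) are hypothetical, added only so a testbench can seed registers. Both operands are read combinationally, summed, and written back at the next rising edge through the single write port.

```verilog
// Hypothetical single-cycle "add rd, rs1, rs2" datapath on a 2R1W array.
// x0 is NOT special-cased here; see regfile_riscv for that.
module add_datapath (
  input  wire        clk,
  input  wire        exec,     // perform rd = rs1 + rs2 at the next rising edge
  input  wire [4:0]  rs1,
  input  wire [4:0]  rs2,
  input  wire [4:0]  rd,
  input  wire        ld,       // hypothetical preload path (used when exec = 0)
  input  wire [4:0]  ld_addr,
  input  wire [31:0] ld_data
);
  reg [31:0] rf [0:31];

  wire [31:0] op_a   = rf[rs1];       // read port A (combinational)
  wire [31:0] op_b   = rf[rs2];       // read port B (combinational)
  wire [31:0] result = op_a + op_b;   // ALU, add only

  // Single write port, shared by the ALU result and the preload path
  always @(posedge clk) begin
    if (exec)    rf[rd]      <= result;
    else if (ld) rf[ld_addr] <= ld_data;
  end
endmodule
```

Both reads and the add settle within one cycle, and the result is committed at the clock edge, which is exactly the read-read-write pattern the 2R1W port count is sized for.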
2R2W Register File

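Two read ports and two write ports, for designs that write back two results per cycle, such as a dual-issue (superscalar) core. A full dual-issue core usually also needs four read ports (two per instruction); the 2R2W case here focuses on the write side. When both ports write the same register in the same cycle, a fixed priority decides which value is stored; in this design port A wins.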

Verilog — 2R2W Register File with port-A priority
module regfile_2r2w #(
  parameter WIDTH = 32,
  parameter DEPTH = 32,
  parameter AW    = $clog2(DEPTH)
)(
  input  wire             clk,
  // Read ports (combinational)
  input  wire [AW-1:0]    raddr_a,
  output wire [WIDTH-1:0] rdata_a,
  input  wire [AW-1:0]    raddr_b,
  output wire [WIDTH-1:0] rdata_b,
  // Write port A
  input  wire             wea,
  input  wire [AW-1:0]    waddr_a,
  input  wire [WIDTH-1:0] wdata_a,
  // Write port B
  input  wire             web,
  input  wire [AW-1:0]    waddr_b,
  input  wire [WIDTH-1:0] wdata_b
);
  reg [WIDTH-1:0] rf [0:DEPTH-1];

  always @(posedge clk) begin
    // Port B writes first, then port A overwrites if same address (A has priority)
    if (web) rf[waddr_b] <= wdata_b;
    if (wea) rf[waddr_a] <= wdata_a;
  end

  // Read with forwarding from both write ports
  // Port A write takes priority over Port B on same address
  wire [WIDTH-1:0] fwd_a = (wea && waddr_a == raddr_a) ? wdata_a :
                            (web && waddr_b == raddr_a) ? wdata_b :
                            rf[raddr_a];

  wire [WIDTH-1:0] fwd_b = (wea && waddr_a == raddr_b) ? wdata_a :
                            (web && waddr_b == raddr_b) ? wdata_b :
                            rf[raddr_b];

  assign rdata_a = fwd_a;
  assign rdata_b = fwd_b;
endmodule
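The module above relies on a Verilog rule: when two nonblocking assignments in one always block target the same array element, the later one wins. A variant that makes port A's priority explicit, by suppressing port B's write on an address collision, might look like the sketch below. The name regfile_2r2w_explicit and the signal web_eff are hypothetical, and forwarding is omitted to keep it short.

```verilog
// Hypothetical 2R2W variant: port A priority made explicit by gating port B.
// Forwarding omitted for brevity; reads are plain combinational reads.
module regfile_2r2w_explicit #(
  parameter WIDTH = 32,
  parameter DEPTH = 32,
  parameter AW    = $clog2(DEPTH)
)(
  input  wire             clk,
  input  wire [AW-1:0]    raddr_a,
  output wire [WIDTH-1:0] rdata_a,
  input  wire [AW-1:0]    raddr_b,
  output wire [WIDTH-1:0] rdata_b,
  input  wire             wea,
  input  wire [AW-1:0]    waddr_a,
  input  wire [WIDTH-1:0] wdata_a,
  input  wire             web,
  input  wire [AW-1:0]    waddr_b,
  input  wire [WIDTH-1:0] wdata_b
);
  reg [WIDTH-1:0] rf [0:DEPTH-1];

  // Port B is blocked whenever port A writes the same address this cycle
  wire web_eff = web && !(wea && (waddr_a == waddr_b));

  always @(posedge clk) begin
    if (wea)     rf[waddr_a] <= wdata_a;
    if (web_eff) rf[waddr_b] <= wdata_b;   // can no longer collide with port A
  end

  assign rdata_a = rf[raddr_a];
  assign rdata_b = rf[raddr_b];
endmodule
```

Behavior matches the original write logic; the difference is that the priority no longer depends on statement order, which some teams prefer for readability and lint cleanliness.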
Read-During-Write Forwarding

Without forwarding, a hazard occurs when the write-back stage writes the same register that the decode stage reads in the same cycle. The read returns the stale value, because the new value is stored only at the clock edge that ends the cycle. Forwarding resolves this by muxing the new write data onto the read output when the addresses match.

we | waddr == raddr? | No forwarding           | With forwarding
0  | any              | rf[raddr] (correct)     | rf[raddr] (same)
1  | No               | rf[raddr] (correct)     | rf[raddr] (same)
1  | Yes              | rf[raddr] (stale data!) | wdata (new data forwarded)
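To see the table's last row in simulation, here is a minimal sketch that keeps one storage array but exposes both read paths side by side. The module name rdw_demo and its ports are hypothetical: rdata_raw is the plain array read (no forwarding), and rdata_fwd adds the same bypass mux used in regfile_2r1w_fwd.

```verilog
// Hypothetical demo module: one storage array, two read paths.
// rdata_raw = plain combinational read (no forwarding)
// rdata_fwd = same read with a read-during-write bypass
module rdw_demo #(
  parameter WIDTH = 8,
  parameter DEPTH = 4,
  parameter AW    = 2
)(
  input  wire             clk,
  input  wire             we,
  input  wire [AW-1:0]    waddr,
  input  wire [WIDTH-1:0] wdata,
  input  wire [AW-1:0]    raddr,
  output wire [WIDTH-1:0] rdata_raw,
  output wire [WIDTH-1:0] rdata_fwd
);
  reg [WIDTH-1:0] rf [0:DEPTH-1];

  // The new value lands in the array only at the rising edge
  always @(posedge clk)
    if (we) rf[waddr] <= wdata;

  assign rdata_raw = rf[raddr];                                    // stale during the write cycle
  assign rdata_fwd = (we && (waddr == raddr)) ? wdata : rf[raddr]; // bypass on address match
endmodule
```

During a cycle that writes and reads the same entry, rdata_raw still shows the old contents while rdata_fwd already shows wdata; after the edge both agree.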
Forwarding in a pipelined CPU: in a 5-stage pipeline (IF → ID → EX → MEM → WB), the write-back stage writes register results while the decode stage reads source registers. If both access the same register in the same cycle, forwarding inside the register file removes the need for a stall, without adding logic to the hazard unit for this case.
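For comparison, here is a minimal sketch of the check a hazard unit would need for this one case if the register file did not forward. The module wb_id_stall and its signals (wb_we, wb_rd, id_rs1, id_rs2) are hypothetical, and it assumes both source registers are always used: decode stalls whenever write-back is writing a register that decode is reading. With the bypass inside the register file, this condition never needs a stall.

```verilog
// Hypothetical WB -> ID stall detector, needed only when the register file
// does NOT forward write data to its read ports.
// Simplification: assumes the decoding instruction uses both rs1 and rs2.
module wb_id_stall (
  input  wire       wb_we,    // write-back stage writes wb_rd this cycle
  input  wire [4:0] wb_rd,
  input  wire [4:0] id_rs1,   // decode-stage source registers
  input  wire [4:0] id_rs2,
  output wire       stall
);
  // x0 is excluded: writes to it are discarded, so reading it is never stale
  assign stall = wb_we && (wb_rd != 5'd0) &&
                 ((wb_rd == id_rs1) || (wb_rd == id_rs2));
endmodule
```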
Implementation Notes
Aspect         | Flip-flop based                         | Latch-based (custom cells)        | SRAM based
Read ports     | Any number (each port adds its own mux) | 2–8 (shared bitline)              | 1 per SRAM port
Area per bit   | Large (~20 or more transistors per FF)  | Medium (~8–10T)                   | Small (~6T SRAM cell)
Speed          | Fastest                                 | Fast                              | Slowest (sense amps)
FPGA synthesis | Infers FF registers or LUT RAM          | Not available (custom cells)      | BRAM (if synthesizer maps it)
Typical depth  | ≤64 entries                             | 32–128 entries                    | ≥256 entries
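FPGA note: Vivado maps small register files to LUT RAM (distributed RAM) when it recognizes the pattern. Use the attribute (* ram_style = "distributed" *) on the array declaration to force this. With two read ports and one write port, Vivado typically infers dual-read-port LUT RAM correctly. LUT RAM contents cannot be reset, so an array-wide reset such as the one in regfile_riscv forces the array into flip-flops instead.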
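A minimal sketch of where the attribute goes, assuming Vivado as the synthesis tool (ram_style is tool-specific; simulators and other synthesizers generally ignore it). The module name regfile_2r1w_lutram is hypothetical; otherwise it mirrors regfile_2r1w.

```verilog
// 2R1W register file annotated for Vivado distributed (LUT) RAM.
// No reset on the array: LUT RAM contents cannot be cleared by a reset.
module regfile_2r1w_lutram #(
  parameter WIDTH = 32,
  parameter DEPTH = 32,
  parameter AW    = $clog2(DEPTH)
)(
  input  wire             clk,
  input  wire [AW-1:0]    raddr_a,
  output wire [WIDTH-1:0] rdata_a,
  input  wire [AW-1:0]    raddr_b,
  output wire [WIDTH-1:0] rdata_b,
  input  wire             we,
  input  wire [AW-1:0]    waddr,
  input  wire [WIDTH-1:0] wdata
);
  // The attribute is attached to the array declaration it applies to
  (* ram_style = "distributed" *) reg [WIDTH-1:0] rf [0:DEPTH-1];

  always @(posedge clk)
    if (we) rf[waddr] <= wdata;

  assign rdata_a = rf[raddr_a];   // asynchronous reads suit LUT RAM
  assign rdata_b = rf[raddr_b];
endmodule
```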
Testbench
SystemVerilog testbench for regfile_riscv (2R1W with forwarding and x0 checks)
module tb_regfile;
  logic        clk, rst_n;
  logic [4:0]  rs1, rs2, rd;
  logic [31:0] rdata1, rdata2, wdata;
  logic        we;

  // .* requires every DUT port, including rst_n, to be declared here
  regfile_riscv dut(.*);

  initial clk = 0;
  always #5 clk = ~clk;

  // Drive inputs on the falling edge so they are stable at the rising edge
  task write(input [4:0] r, input [31:0] d);
    @(negedge clk); we=1; rd=r; wdata=d;
    @(negedge clk); we=0;
  endtask

  initial begin
    we=0; rs1=0; rs2=0; rd=0; wdata=0;
    rst_n=1; #2 rst_n=0;   // assert the asynchronous reset
    #10 rst_n=1;           // release it before the first write

    // Write x1=0xABCD, x2=0x1234
    write(5'd1, 32'hABCD);
    write(5'd2, 32'h1234);

    // Normal read
    rs1=5'd1; rs2=5'd2; #1;
    $display("x1=%0h x2=%0h", rdata1, rdata2);
    // Expect: x1=abcd x2=1234

    // Read-during-write forwarding: write x1=0xDEAD while reading x1
    @(negedge clk); we=1; rd=5'd1; wdata=32'hDEAD;
    rs1=5'd1; #1;
    $display("Forward x1=%0h (expect dead)", rdata1);

    // x0 should always return 0 (x1=0xDEAD is stored at the next rising edge)
    write(5'd0, 32'hFFFF);   // write to x0: should be ignored
    rs1=5'd0; #1;
    $display("x0=%0h (expect 0)", rdata1);

    $finish;
  end
endmodule
FAQ
What is a register file in CPU design?

A register file is a fast multi-port memory array storing the CPU's working registers. Unlike a typical single-port RAM, it supports multiple simultaneous reads (and sometimes multiple writes) per cycle, so operands can be fetched and results written back in a single cycle without stalls.

What is a 2R1W register file?

2R1W = two read ports + one write port. In one cycle, two source registers (rs1, rs2) can be read simultaneously while one destination register (rd) is written. This matches the datapath of a standard ALU instruction (rd = rs1 OP rs2).

What is read-during-write forwarding?

If a read and a write address the same register in the same clock cycle, forwarding returns the new write data instead of the stale stored value. Implemented as: rdata = (we && waddr == raddr) ? wdata : rf[raddr]. This removes the one-cycle stall that would otherwise be needed when decode and write-back touch the same register.

Why is x0 hardwired to zero in RISC-V?

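x0 always reads as 0. Writes to x0 are silently discarded: if (we && rd != 5'b0) rf[rd] <= wdata. This lets the ISA express common operations as ordinary instructions that use x0, for example nop (addi x0, x0, 0), li with a small immediate (addi rd, x0, imm), and unconditional jumps that discard the return address (jal x0, offset), without dedicated opcodes, reducing instruction set complexity.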

How is a register file different from SRAM?

Register files are flip-flop or latch based: each read port is an independent multiplexer with no clock-cycle latency. SRAM uses shared bit lines and sense amplifiers, and each port performs one read per access. Register files are faster but much larger per bit. A common rule of thumb is to use flip-flop register files up to about 64 entries and SRAM macros for larger arrays.
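Because each SRAM (or FPGA block RAM) port delivers one read per access, one common way to get two read ports from single-read memories is replication: keep one copy of the array per read port and send every write to all copies. The sketch below is behavioral; regfile_2r1w_repl, copy_a and copy_b are hypothetical names, and each copy is modeled as a plain array standing in for a 1R1W memory (real SRAM macros usually have synchronous reads, which this sketch does not model).

```verilog
// Hypothetical 2R1W register file built from two 1R1W copies.
// Every write goes to both copies, so they always hold the same data;
// each copy serves exactly one read port.
module regfile_2r1w_repl #(
  parameter WIDTH = 32,
  parameter DEPTH = 32,
  parameter AW    = $clog2(DEPTH)
)(
  input  wire             clk,
  input  wire [AW-1:0]    raddr_a,
  output wire [WIDTH-1:0] rdata_a,
  input  wire [AW-1:0]    raddr_b,
  output wire [WIDTH-1:0] rdata_b,
  input  wire             we,
  input  wire [AW-1:0]    waddr,
  input  wire [WIDTH-1:0] wdata
);
  reg [WIDTH-1:0] copy_a [0:DEPTH-1];   // serves read port A only
  reg [WIDTH-1:0] copy_b [0:DEPTH-1];   // serves read port B only

  always @(posedge clk) begin
    if (we) begin
      copy_a[waddr] <= wdata;   // broadcast the single write to both copies
      copy_b[waddr] <= wdata;
    end
  end

  assign rdata_a = copy_a[raddr_a];
  assign rdata_b = copy_b[raddr_b];
endmodule
```

Replication costs one full copy of storage per read port and does not add write ports, which is part of why flip-flop register files remain attractive at small depths.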
