2R1W and 2R2W register files, flip-flop based design, read-during-write forwarding, and a full RISC-V style 32×32 register file.
A register file is a small, fast, multi-port memory array that stores the CPU's working registers. A typical single-port RAM serves one access per cycle. A register file instead supports several simultaneous reads and writes per cycle, so an instruction can fetch two source operands and write one result in a single clock cycle.
Two read ports, one write port: the most common configuration, matching one ALU instruction (read rs1, read rs2, write rd).
```verilog
module regfile_2r1w #(
    parameter WIDTH = 32,
    parameter DEPTH = 32,
    parameter AW    = $clog2(DEPTH)
)(
    input  wire             clk,
    // Read port A (combinational)
    input  wire [AW-1:0]    raddr_a,
    output wire [WIDTH-1:0] rdata_a,
    // Read port B (combinational)
    input  wire [AW-1:0]    raddr_b,
    output wire [WIDTH-1:0] rdata_b,
    // Write port (synchronous)
    input  wire             we,
    input  wire [AW-1:0]    waddr,
    input  wire [WIDTH-1:0] wdata
);
    reg [WIDTH-1:0] rf [0:DEPTH-1];

    // Synchronous write
    always @(posedge clk) begin
        if (we) rf[waddr] <= wdata;
    end

    // Combinational reads
    assign rdata_a = rf[raddr_a];
    assign rdata_b = rf[raddr_b];
endmodule
```
```verilog
// Read-during-write forwarding (write-first / bypass)
// If read address == write address AND write enable is asserted,
// return the new write data instead of the stale register value.
module regfile_2r1w_fwd #(
    parameter WIDTH = 32,
    parameter DEPTH = 32,
    parameter AW    = $clog2(DEPTH)
)(
    input  wire             clk,
    input  wire [AW-1:0]    raddr_a,
    output wire [WIDTH-1:0] rdata_a,
    input  wire [AW-1:0]    raddr_b,
    output wire [WIDTH-1:0] rdata_b,
    input  wire             we,
    input  wire [AW-1:0]    waddr,
    input  wire [WIDTH-1:0] wdata
);
    reg [WIDTH-1:0] rf [0:DEPTH-1];

    always @(posedge clk) begin
        if (we) rf[waddr] <= wdata;
    end

    // Forward new data when addresses collide
    assign rdata_a = (we && (waddr == raddr_a)) ? wdata : rf[raddr_a];
    assign rdata_b = (we && (waddr == raddr_b)) ? wdata : rf[raddr_b];
endmodule
```
```verilog
// RISC-V style: x0 hardwired to 0, with forwarding
module regfile_riscv (
    input  wire        clk,
    input  wire        rst_n,
    // Read port A
    input  wire [4:0]  rs1,
    output wire [31:0] rdata1,
    // Read port B
    input  wire [4:0]  rs2,
    output wire [31:0] rdata2,
    // Write port
    input  wire        we,
    input  wire [4:0]  rd,
    input  wire [31:0] wdata
);
    reg [31:0] rf [1:31];   // x1–x31 only; x0 is a constant

    // x0 is never written: the write enable is gated for rd == 0.
    // Clearing every entry on reset forces a flip-flop implementation,
    // since FPGA LUT RAM and block RAM contents cannot be reset.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin : clr
            integer i;
            for (i = 1; i <= 31; i = i + 1) rf[i] <= 32'h0;
        end else begin
            if (we && rd != 5'b0) rf[rd] <= wdata;
        end
    end

    // Read: x0 always returns 0; forwarding on all other registers.
    // rf[0] is out of range, but the rs == 0 check selects 32'b0 first.
    assign rdata1 = (rs1 == 5'b0)                  ? 32'b0 :
                    (we && (rd == rs1) && rd != 0) ? wdata :
                                                     rf[rs1];
    assign rdata2 = (rs2 == 5'b0)                  ? 32'b0 :
                    (we && (rd == rs2) && rd != 0) ? wdata :
                                                     rf[rs2];
endmodule
```
Two read ports and two write ports, as needed by superscalar CPUs that retire two instructions per cycle. When both ports write the same register in the same cycle, the design must arbitrate; the version below gives write port A priority.
```verilog
module regfile_2r2w #(
    parameter WIDTH = 32,
    parameter DEPTH = 32,
    parameter AW    = $clog2(DEPTH)
)(
    input  wire             clk,
    // Read ports (combinational)
    input  wire [AW-1:0]    raddr_a,
    output wire [WIDTH-1:0] rdata_a,
    input  wire [AW-1:0]    raddr_b,
    output wire [WIDTH-1:0] rdata_b,
    // Write port A
    input  wire             wea,
    input  wire [AW-1:0]    waddr_a,
    input  wire [WIDTH-1:0] wdata_a,
    // Write port B
    input  wire             web,
    input  wire [AW-1:0]    waddr_b,
    input  wire [WIDTH-1:0] wdata_b
);
    // On FPGAs this array is typically built from flip-flops, because
    // distributed (LUT) RAM primitives have only one write port.
    reg [WIDTH-1:0] rf [0:DEPTH-1];

    // Both writes are nonblocking. If both target the same address, the
    // later assignment (port A) wins, so port A has priority.
    always @(posedge clk) begin
        if (web) rf[waddr_b] <= wdata_b;
        if (wea) rf[waddr_a] <= wdata_a;
    end

    // Read with forwarding from both write ports.
    // Port A is checked first, matching its priority on the write side.
    wire [WIDTH-1:0] fwd_a = (wea && waddr_a == raddr_a) ? wdata_a :
                             (web && waddr_b == raddr_a) ? wdata_b :
                                                           rf[raddr_a];
    wire [WIDTH-1:0] fwd_b = (wea && waddr_a == raddr_b) ? wdata_a :
                             (web && waddr_b == raddr_b) ? wdata_b :
                                                           rf[raddr_b];
    assign rdata_a = fwd_a;
    assign rdata_b = fwd_b;
endmodule
```
Without forwarding, a hazard occurs when the write-back stage writes the same register that the decode stage reads in the same cycle. The write only takes effect at the next clock edge, so the read returns the stale value. Forwarding resolves this by muxing the new write data onto the read output when the addresses match.
| we | waddr == raddr? | No forwarding | With forwarding |
|---|---|---|---|
| 0 | — | rf[raddr] (correct) | rf[raddr] (same) |
| 1 | No | rf[raddr] (correct) | rf[raddr] (same) |
| 1 | Yes | rf[raddr] — stale data! | wdata — new data forwarded |
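The last row of the table can be reproduced in simulation with a small sketch. The module below, `rf_bypass_demo`, is a hypothetical single-read-port variant of the 2R1W file whose `FORWARD` parameter switches the bypass mux on or off. Instantiating it twice and reading an address in the same cycle it is written should give the old value from the `FORWARD = 0` copy and the new value from the `FORWARD = 1` copy.

```verilog
// Hypothetical demo module: one write port, one read port, with the
// read-during-write bypass enabled or disabled by the FORWARD parameter.
module rf_bypass_demo #(
    parameter WIDTH   = 8,
    parameter DEPTH   = 4,
    parameter FORWARD = 1,              // 1 = return wdata on address match
    parameter AW      = $clog2(DEPTH)
)(
    input  wire             clk,
    input  wire             we,
    input  wire [AW-1:0]    waddr,
    input  wire [WIDTH-1:0] wdata,
    input  wire [AW-1:0]    raddr,
    output wire [WIDTH-1:0] rdata
);
    reg [WIDTH-1:0] rf [0:DEPTH-1];

    // The stored value only changes at the rising edge.
    always @(posedge clk)
        if (we) rf[waddr] <= wdata;

    // With FORWARD = 0 the comparator is constant-false and the read
    // returns rf[raddr], which is stale during a same-address write.
    wire hit = (FORWARD != 0) && we && (waddr == raddr);
    assign rdata = hit ? wdata : rf[raddr];
endmodule
```

Because `FORWARD` is a constant, synthesis either builds the comparator and mux or removes them entirely; nothing is decided at run time.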
| Aspect | Flip-Flop based | Latch-based (custom cells) | SRAM based |
|---|---|---|---|
| Read ports | Any number (one read mux per port) | 2–8 (shared bitline) | 1 per SRAM port |
| Area per bit | Large (roughly 20+ transistors per flip-flop) | Medium (~8–10T) | Small (~6T SRAM cell) |
| Speed | Fastest | Fast | Slowest (sense amps) |
| FPGA synthesis | Infers flip-flops or LUT RAM | Generally avoided (latches are poorly supported) | BRAM (if the synthesizer maps it) |
| Typical depth | ≤64 entries | 32–128 entries | ≥256 entries |
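On Xilinx FPGAs, small register files map well to distributed (LUT) RAM; in Vivado, place the attribute (* ram_style = "distributed" *) on the array declaration to force this. With two read ports and one write port, Vivado typically infers dual-read-port LUT RAM correctly.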
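A minimal sketch of where the attribute goes, assuming a Vivado flow (tools that do not recognize the attribute ignore it). The module name `regfile_2r1w_lutram` is hypothetical; apart from the attribute it is the plain 2R1W file. It deliberately has no reset: an array whose entries are all cleared on reset, as in `regfile_riscv`, cannot map to LUT RAM and is built from flip-flops instead.

```verilog
// Hypothetical 2R1W register file annotated for Vivado LUT RAM inference.
module regfile_2r1w_lutram #(
    parameter WIDTH = 32,
    parameter DEPTH = 32,
    parameter AW    = $clog2(DEPTH)
)(
    input  wire             clk,
    input  wire [AW-1:0]    raddr_a,
    output wire [WIDTH-1:0] rdata_a,
    input  wire [AW-1:0]    raddr_b,
    output wire [WIDTH-1:0] rdata_b,
    input  wire             we,
    input  wire [AW-1:0]    waddr,
    input  wire [WIDTH-1:0] wdata
);
    // Request distributed (LUT) RAM for this array. No reset is applied
    // to the contents, which LUT RAM could not implement.
    (* ram_style = "distributed" *) reg [WIDTH-1:0] rf [0:DEPTH-1];

    // Single synchronous write port
    always @(posedge clk)
        if (we) rf[waddr] <= wdata;

    // Two asynchronous (combinational) read ports
    assign rdata_a = rf[raddr_a];
    assign rdata_b = rf[raddr_b];
endmodule
```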
```systemverilog
module tb_regfile;
    logic        clk, rst_n;
    logic [4:0]  rs1, rs2, rd;
    logic [31:0] rdata1, rdata2, wdata;
    logic        we;

    // .* connects each DUT port to the same-named signal above,
    // so rst_n must be declared here as well.
    regfile_riscv dut (.*);

    initial clk = 0;
    always #5 clk = ~clk;

    // Drive inputs on the falling edge so they are stable at the next
    // rising edge (avoids a race with the DUT's posedge write).
    task write(input [4:0] r, input [31:0] d);
        @(negedge clk); we = 1; rd = r; wdata = d;
        @(negedge clk); we = 0;
    endtask

    initial begin
        rst_n = 1; we = 0; rs1 = 0; rs2 = 0; rd = 0; wdata = 0;
        #1  rst_n = 0;   // 1 -> 0 edge triggers the asynchronous reset
        #11 rst_n = 1;

        // Write x1 = 0xABCD, x2 = 0x1234
        write(5'd1, 32'hABCD);
        write(5'd2, 32'h1234);

        // Normal read
        rs1 = 5'd1; rs2 = 5'd2; #1;
        $display("x1=%0h x2=%0h", rdata1, rdata2);   // expect: x1=abcd x2=1234

        // Read-during-write forwarding: write x1 = 0xDEAD while reading x1
        @(negedge clk); we = 1; rd = 5'd1; wdata = 32'hDEAD; rs1 = 5'd1; #1;
        $display("Forward x1=%0h (expect dead)", rdata1);
        @(negedge clk); we = 0;

        // x0 must always read as 0
        write(5'd0, 32'hFFFF);   // write to x0: should be ignored
        rs1 = 5'd0; #1;
        $display("x0=%0h (expect 0)", rdata1);
        $finish;
    end
endmodule
```
A register file is a fast multi-port memory array storing the CPU's working registers. Unlike RAM, it supports multiple simultaneous reads (and sometimes writes) per cycle, enabling operand fetch and result writeback in a single cycle without stalls.
2R1W = two read ports + one write port. In one cycle, two source registers (rs1, rs2) can be read simultaneously while one destination register (rd) is written. Matches the datapath of a standard ALU instruction (rd = rs1 OP rs2).
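To make the rd = rs1 OP rs2 correspondence concrete, here is a hedged sketch of a single-cycle datapath slice built around a 2R1W array, with OP fixed to addition. The module `rr_add_datapath` and its `load`/`imm` setup path are hypothetical, added only so a testbench can preload operands through the single write port; x0 handling and forwarding are omitted for brevity.

```verilog
// Hypothetical datapath slice: both operands are read through two
// combinational ports and the sum is written back through the single
// write port at the next rising edge, one instruction per cycle.
module rr_add_datapath (
    input  wire        clk,
    input  wire        en,      // write enable for this cycle
    input  wire        load,    // 1: rd <= imm (setup), 0: rd <= rs1 + rs2
    input  wire [4:0]  rs1,
    input  wire [4:0]  rs2,
    input  wire [4:0]  rd,
    input  wire [31:0] imm,
    output wire [31:0] result   // value that will be written to rd
);
    reg [31:0] rf [0:31];

    // Read ports 1 and 2: both source operands in the same cycle
    wire [31:0] op_a = rf[rs1];
    wire [31:0] op_b = rf[rs2];

    // Write data: the ALU result, or a setup value when load is high
    assign result = load ? imm : (op_a + op_b);

    // Write port: commit the result at the rising edge
    always @(posedge clk)
        if (en) rf[rd] <= result;
endmodule
```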
If a read and a write address the same register in the same clock cycle, forwarding returns the new write data instead of the stale stored value. Implemented as: rdata = (we && waddr == raddr) ? wdata : rf[raddr]. This removes the one-cycle stall that would otherwise occur when the instruction in decode reads a register that the instruction in writeback is writing in the same cycle.
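x0 = 0 always. Writes to x0 are silently discarded: if (we && rd != 5'b0) rf[rd] <= wdata. This lets the ISA express common operations with ordinary instructions plus x0 instead of dedicated opcodes: nop is addi x0, x0, 0; li rd, imm (for 12-bit immediates) is addi rd, x0, imm; and j offset is jal x0, offset, which discards the link address. This reduces instruction set complexity.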
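These x0 idioms can be checked against the RV32I encoding. The sketch below assumes the standard I-type layout (imm[11:0], rs1, funct3, rd, opcode) with the OP-IMM opcode 7'b0010011 and the ADDI funct3 3'b000; the module `rv32_x0_idioms` and its function `enc_addi` are hypothetical helpers, not part of any library.

```verilog
// Hypothetical helper: encodes RV32I ADDI to show that nop and li are
// ordinary ADDI instructions that use x0 as a source or destination.
module rv32_x0_idioms;
    localparam [6:0] OP_IMM  = 7'b0010011;
    localparam [2:0] F3_ADDI = 3'b000;

    // I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode  (12+5+3+5+7 = 32 bits)
    function [31:0] enc_addi(input [4:0] rd, input [4:0] rs1, input [11:0] imm);
        enc_addi = {imm, rs1, F3_ADDI, rd, OP_IMM};
    endfunction

    initial begin
        // nop = addi x0, x0, 0: the result written to x0 is discarded
        $display("nop      = %h", enc_addi(5'd0, 5'd0, 12'd0));    // 00000013
        // li x5, 42 = addi x5, x0, 42: x0 supplies the constant 0 operand
        $display("li x5,42 = %h", enc_addi(5'd5, 5'd0, 12'd42));   // 02a00293
    end
endmodule
```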
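Register files are flip-flop or latch based: in a flip-flop design each read port is its own multiplexer over the stored bits, so a read completes combinationally within the cycle with only mux delay. SRAM shares bit lines among the cells of a column, so each port performs one access per cycle and needs sense amplifiers to resolve the small bit-line swing. Register files are therefore faster but much larger per bit. Practical designs typically use flip-flop register files for up to about 64 entries and custom SRAM for larger arrays.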
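The claim that each read port is just another mux over the same flip-flops can be sketched with a generate loop. The module `regfile_nr1w` and its flattened `raddr`/`rdata` buses are hypothetical. Each loop iteration creates one independent DEPTH:1 read multiplexer, so adding ports grows mux area and the fanout on each storage bit, but not the storage itself.

```verilog
// Hypothetical flip-flop register file with NREAD combinational read
// ports and one synchronous write port.
module regfile_nr1w #(
    parameter WIDTH = 32,
    parameter DEPTH = 32,
    parameter NREAD = 4,
    parameter AW    = $clog2(DEPTH)
)(
    input  wire                   clk,
    input  wire [NREAD*AW-1:0]    raddr,   // port i uses raddr[i*AW +: AW]
    output wire [NREAD*WIDTH-1:0] rdata,   // port i drives rdata[i*WIDTH +: WIDTH]
    input  wire                   we,
    input  wire [AW-1:0]          waddr,
    input  wire [WIDTH-1:0]       wdata
);
    reg [WIDTH-1:0] rf [0:DEPTH-1];

    // One write port shared by all entries
    always @(posedge clk)
        if (we) rf[waddr] <= wdata;

    // One independent read mux per port, all reading the same array
    genvar i;
    generate
        for (i = 0; i < NREAD; i = i + 1) begin : rd_port
            assign rdata[i*WIDTH +: WIDTH] = rf[raddr[i*AW +: AW]];
        end
    endgenerate
endmodule
```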