Single Port RAM — Verilog Guide
The most common memory primitive: one port, shared by reads and writes. Covers the synchronous write-first, read-first, and no-change modes, asynchronous read, byte write enables, and BRAM inference.
Single Port RAM — Block Diagram
Port Description
| Port | Width | Direction | Description |
|---|---|---|---|
| clk | 1 | Input | Clock — all synchronous operations occur on rising edge |
| en | 1 | Input | Chip enable: when 0, no read or write occurs. Optional; the mode examples in this guide omit it |
| we | 1 | Input | Write enable — 1 = write, 0 = read |
| addr | ADDR_W | Input | Address bus — selects memory location. DEPTH = 2^ADDR_W |
| din | DATA_W | Input | Write data — valid when we=1 |
| dout | DATA_W | Output | Read data — registered (sync) or combinational (async) |
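The mode examples in this guide leave out the optional en port. As a minimal sketch of how a chip enable could gate both the write and the output update, assuming read-first behavior (the module name sp_ram_en is hypothetical):

```verilog
// Hypothetical sketch: single port RAM with chip enable (read-first)
module sp_ram_en #(
    parameter DATA_W = 8,
    parameter ADDR_W = 8
)(
    input  wire              clk,
    input  wire              en,    // chip enable: gates both read and write
    input  wire              we,    // write enable, only honored when en = 1
    input  wire [ADDR_W-1:0] addr,
    input  wire [DATA_W-1:0] din,
    output reg  [DATA_W-1:0] dout
);
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

    always @(posedge clk) begin
        if (en) begin
            if (we)
                mem[addr] <= din;   // write
            dout <= mem[addr];      // old value on a write cycle (read-first)
        end
        // en = 0: mem and dout both hold their values
    end
endmodule
```

Holding en low on idle cycles keeps dout stable.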
Synchronous Read Modes — Verilog Code
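A single port RAM has only one address, so every write cycle is also a potential read of the location being written. The three synchronous modes differ in what dout shows during that write cycle. The coding style you choose determines which write mode the inferred BRAM is configured with.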
Write-first: on a write, dout = new data (din). The write takes effect first, and the output reflects the written value. Maps to Xilinx "write-first" BRAM mode.
```verilog
// Single Port RAM: Write-First (Write-Through) Mode
// dout shows the NEW data on a write-read to same address
module sp_ram_write_first #(
    parameter DATA_W = 8,
    parameter ADDR_W = 8              // depth = 2^8 = 256 locations
)(
    input  wire              clk,
    input  wire              we,      // write enable
    input  wire [ADDR_W-1:0] addr,
    input  wire [DATA_W-1:0] din,
    output reg  [DATA_W-1:0] dout
);
    // Memory array
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

    always @(posedge clk) begin
        if (we) begin
            mem[addr] <= din;         // write the new data
            dout      <= din;         // output = new data (write-first)
        end else begin
            dout <= mem[addr];        // read normally
        end
    end
endmodule
```
Read-first: on a write, dout = old data (the value stored before the write). The read samples the location before the write takes effect. Maps to Xilinx "read-first" / Intel "old data" mode.
```verilog
// Single Port RAM: Read-First Mode
// dout shows the OLD data on a write-read to same address
module sp_ram_read_first #(
    parameter DATA_W = 8,
    parameter ADDR_W = 8
)(
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [DATA_W-1:0] din,
    output reg  [DATA_W-1:0] dout
);
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;         // write (non-blocking: lands at the end of the time step)
        dout <= mem[addr];            // mem[addr] is sampled before the write lands: sees OLD value
        // Note: read is always registered regardless of we
    end
endmodule
```
No-change: on a write, dout holds its previous value and does not update. This is the most power-efficient mode because the output does not toggle during writes. Maps to Xilinx "no-change" BRAM mode.
```verilog
// Single Port RAM: No-Change Mode
// dout holds its previous value during a write cycle
module sp_ram_no_change #(
    parameter DATA_W = 8,
    parameter ADDR_W = 8
)(
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [DATA_W-1:0] din,
    output reg  [DATA_W-1:0] dout
);
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

    always @(posedge clk) begin
        if (we) begin
            mem[addr] <= din;         // write only
            // dout NOT updated: holds previous value
        end else begin
            dout <= mem[addr];        // read only on non-write cycles
        end
    end
endmodule
```
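To see the three synchronous modes diverge, the sketch below drives three copies of a small RAM with identical stimulus and checks the outputs after a write. The sp_ram_mode wrapper, its MODE parameter, and the 16×8 size are hypothetical, introduced only to keep the example in one file. Each MODE setting is intended to behave like the corresponding module above.

```verilog
`timescale 1ns/1ps
// Hypothetical compact DUT: MODE 0 = write-first, 1 = read-first, 2 = no-change
module sp_ram_mode #(parameter MODE = 0)(
    input  wire       clk,
    input  wire       we,
    input  wire [3:0] addr,
    input  wire [7:0] din,
    output reg  [7:0] dout
);
    reg [7:0] mem [0:15];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        if (we && MODE == 0)
            dout <= din;              // write-first: new data
        else if (!we || MODE == 1)
            dout <= mem[addr];        // plain read, or read-first on a write
        // MODE 2 during a write: dout holds
    end
endmodule

module tb_modes;
    reg        clk  = 0;
    reg        we   = 0;
    reg  [3:0] addr = 0;
    reg  [7:0] din  = 0;
    wire [7:0] q_wf, q_rf, q_nc;

    sp_ram_mode #(.MODE(0)) u_wf (.clk(clk), .we(we), .addr(addr), .din(din), .dout(q_wf));
    sp_ram_mode #(.MODE(1)) u_rf (.clk(clk), .we(we), .addr(addr), .din(din), .dout(q_rf));
    sp_ram_mode #(.MODE(2)) u_nc (.clk(clk), .we(we), .addr(addr), .din(din), .dout(q_nc));

    always #5 clk = ~clk;             // stimulus changes on negedge, RAM samples on posedge

    initial begin
        @(negedge clk); we = 1; addr = 4'd3; din = 8'h11;   // write 0x11 to address 3
        @(negedge clk);         addr = 4'd5; din = 8'h55;   // write 0x55 to address 5
        @(negedge clk); we = 0;                             // read address 5
        @(negedge clk); we = 1; addr = 4'd3; din = 8'h22;   // overwrite address 3
        @(negedge clk);
        // Write-first shows the new data, read-first the old data at address 3,
        // no-change the value left over from the earlier read of address 5.
        if (q_wf !== 8'h22 || q_rf !== 8'h11 || q_nc !== 8'h55)
            $display("FAIL: wf=%h rf=%h nc=%h", q_wf, q_rf, q_nc);
        else
            $display("PASS: wf=%h rf=%h nc=%h", q_wf, q_rf, q_nc);
        $finish;
    end
endmodule
```

The read of address 5 before the final write is what separates read-first from no-change. Without it, both would show the same stale value.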
Async read: dout changes as soon as addr changes, with no clock edge needed. The write is still synchronous. Infers distributed (LUT) RAM, not BRAM. Lower latency, higher LUT usage.
```verilog
// Single Port RAM: Asynchronous Read (Distributed RAM)
// Read is combinational: dout changes when addr changes
// Write is synchronous: data captured on posedge clk
// Synthesis: infers LUT-based distributed RAM, NOT BRAM
module sp_ram_async_read #(
    parameter DATA_W = 8,
    parameter ADDR_W = 6              // 64 locations: typical for distributed RAM
)(
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [DATA_W-1:0] din,
    output wire [DATA_W-1:0] dout     // wire, not reg: combinational output
);
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

    // Synchronous write
    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;         // non-blocking: write on clock edge
    end

    // Asynchronous (combinational) read
    assign dout = mem[addr];          // output changes immediately with addr
endmodule
```
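For contrast with the asynchronous read, one common way to make the read synchronous is to register the address rather than the data output. The sketch below assumes that style. The module name sp_ram_addr_reg and the addr_r register are hypothetical. Whether a given tool maps it to block RAM should be confirmed in the synthesis report.

```verilog
// Hypothetical sketch: synchronous read through a registered address
module sp_ram_addr_reg #(
    parameter DATA_W = 8,
    parameter ADDR_W = 8
)(
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [DATA_W-1:0] din,
    output wire [DATA_W-1:0] dout
);
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];
    reg [ADDR_W-1:0] addr_r;          // registered copy of the address

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;         // synchronous write
        addr_r <= addr;               // capture the address on the clock edge
    end

    // The read goes through addr_r, so dout has 1-cycle latency.
    // After a write, addr_r points at the location just written,
    // so dout shows the new data (write-first behavior).
    assign dout = mem[addr_r];
endmodule
```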
With Byte Write Enable (Byte-Enable RAM)
SoC memories usually need byte-granularity writes: updating any subset of bytes 0 to 3 of a 32-bit word without disturbing the others. Each bit in we controls one byte lane.
```verilog
// Single Port RAM: Byte Write Enable (32-bit word, 4 byte lanes by default)
// we[0] = byte 0 (bits 7:0), we[1] = byte 1, we[2] = byte 2, we[3] = byte 3
module sp_ram_byte_en #(
    parameter DATA_W = 32,            // must be a multiple of 8
    parameter ADDR_W = 10             // 1024 words × 4 bytes = 4 KB
)(
    input  wire                clk,
    input  wire [DATA_W/8-1:0] we,    // per-byte write enables, one bit per lane
    input  wire [ADDR_W-1:0]   addr,
    input  wire [DATA_W-1:0]   din,
    output reg  [DATA_W-1:0]   dout
);
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];
    integer i;

    always @(posedge clk) begin
        // Byte-granularity write: lane i covers bits [8*i+7 : 8*i]
        for (i = 0; i < DATA_W/8; i = i + 1)
            if (we[i])
                mem[addr][8*i +: 8] <= din[8*i +: 8];
        // Read: always registered
        dout <= mem[addr];            // read-first mode (reads before writes take effect)
    end
endmodule
```
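A short self-checking sketch of the byte-lane behavior: write a full word, overwrite only byte 1, then read the word back. The byte_en_ram DUT is a hypothetical fixed-size (16×32) stand-in for the module above, included so the example runs on its own.

```verilog
`timescale 1ns/1ps
// Hypothetical fixed-size DUT: 16 words × 32 bits, 4 byte lanes
module byte_en_ram (
    input  wire        clk,
    input  wire [3:0]  we,
    input  wire [3:0]  addr,
    input  wire [31:0] din,
    output reg  [31:0] dout
);
    reg [31:0] mem [0:15];
    integer i;

    always @(posedge clk) begin
        for (i = 0; i < 4; i = i + 1)
            if (we[i])
                mem[addr][8*i +: 8] <= din[8*i +: 8];   // write enabled lanes only
        dout <= mem[addr];                              // registered read
    end
endmodule

module tb_byte_en;
    reg         clk  = 0;
    reg  [3:0]  we   = 0;
    reg  [3:0]  addr = 0;
    reg  [31:0] din  = 0;
    wire [31:0] dout;

    byte_en_ram dut (.clk(clk), .we(we), .addr(addr), .din(din), .dout(dout));

    always #5 clk = ~clk;

    initial begin
        @(negedge clk); we = 4'b1111; addr = 4'd2; din = 32'hAABBCCDD;  // full-word write
        @(negedge clk); we = 4'b0010;              din = 32'h0000EE00;  // byte 1 only
        @(negedge clk); we = 4'b0000;                                   // read back
        @(negedge clk);
        // Only bits 15:8 should have changed: CC -> EE
        if (dout !== 32'hAABBEEDD)
            $display("FAIL: dout=%h", dout);
        else
            $display("PASS");
        $finish;
    end
endmodule
```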
Timing Diagram — Read-First vs Write-First
Read Mode Comparison
Write-First
Same-address simultaneous R/W: dout = din (new data). Useful when you write and immediately need to read the new value. Xilinx RAMB36: WRITE_MODE = "WRITE_FIRST".
Read-First
Same-address simultaneous R/W: dout = old data. Needed for shift-register-based constructs. Default mode for many tools. Supports ECC in Xilinx BRAMs. WRITE_MODE = "READ_FIRST".
No-Change
dout unchanged during write. Best power efficiency — output register doesn't toggle. WRITE_MODE = "NO_CHANGE". Cannot be used if you need read-while-write behavior.
Async Read
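dout = mem[addr] combinationally. Zero read latency, but infers distributed LUT RAM (not BRAM). The read path is a LUT multiplexer tree that grows with depth, which costs LUTs and lowers the achievable clock frequency. Use only for small memories (roughly 64 entries or fewer).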
BRAM Inference Tips
| Tool | Requirement for BRAM Inference | Attribute to Force |
|---|---|---|
| Xilinx Vivado | Synchronous read, array ≥ 1Kbit, no asynchronous reset on dout | (* ram_style = "block" *) |
| Intel Quartus | Synchronous read, registered output, single clock | // synthesis ramstyle = "M20K" |
| Synopsys DC | Memory compiler + rf2gen for ASIC; BRAM not applicable | Instantiate hard macro directly |
| Cadence Genus | Same as DC — use memory compiler for SRAM macros | Use // cadence map_to_module |
```verilog
// Force BRAM inference: Xilinx/Vivado attribute
(* ram_style = "block" *)             // force Block RAM (not distributed)
module sp_ram_bram #(
    parameter DATA_W = 18,
    parameter ADDR_W = 10             // 1024 × 18 = 18 Kbits: fits in one RAMB18
)(
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [DATA_W-1:0] din,
    output reg  [DATA_W-1:0] dout
);
    (* ram_style = "block" *)
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

    // Initialize from file (optional, for ROM-like use)
    // initial $readmemh("init.hex", mem);

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;         // synchronous write
        dout <= mem[addr];            // synchronous read (read-first)
    end
endmodule
```
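The same Vivado attribute can push in the opposite direction. As a hedged sketch, setting ram_style to "distributed" on the array asks the tool for LUT RAM even though the read is synchronous, which can be useful for small memories when block RAMs are scarce. The module name sp_ram_dist is hypothetical.

```verilog
// Hypothetical sketch: request distributed (LUT) RAM in Vivado
module sp_ram_dist #(
    parameter DATA_W = 8,
    parameter ADDR_W = 5              // 32 × 8 = 256 bits: small enough for LUT RAM
)(
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [DATA_W-1:0] din,
    output reg  [DATA_W-1:0] dout
);
    (* ram_style = "distributed" *)   // attribute goes on the memory array
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;         // synchronous write
        dout <= mem[addr];            // registered read, built from fabric flip-flops
    end
endmodule
```

Check the utilization report after synthesis to confirm which resource was actually used.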
FAQ
Can a single port RAM read and write simultaneously?
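Not at two different addresses. A single port RAM has one address bus shared between read and write, so each cycle accesses exactly one location. During a write cycle dout can still update for that same address (old or new data, depending on the mode), but you cannot write one location while reading another. If your design needs that, use a simple dual-port RAM (separate read address + write address) or a true dual-port RAM.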
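A simple dual-port RAM can be sketched by splitting the single address into a write address and a read address. The module name sdp_ram and the port names waddr and raddr are hypothetical. When both addresses match on a write cycle, this code returns the old value in simulation, but the hardware result depends on the device and tool settings.

```verilog
// Hypothetical sketch: simple dual-port RAM (one write port, one read port)
module sdp_ram #(
    parameter DATA_W = 8,
    parameter ADDR_W = 8
)(
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] waddr,   // write address
    input  wire [DATA_W-1:0] din,
    input  wire [ADDR_W-1:0] raddr,   // read address, independent of waddr
    output reg  [DATA_W-1:0] dout
);
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

    always @(posedge clk) begin
        if (we)
            mem[waddr] <= din;        // write one location...
        dout <= mem[raddr];           // ...while reading another in the same cycle
    end
endmodule
```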
Why does adding a reset to dout break BRAM inference?
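Reset support on BRAM outputs is limited and vendor-specific. On Xilinx devices, for example, the block RAM output can only be reset synchronously. If your Verilog puts dout in an asynchronous reset block (always @(posedge clk or negedge rst_n) with if (!rst_n) dout <= 0), the synthesis tool may not be able to map the output register into the BRAM primitive. It will then build that register from fabric flip-flops, or abandon BRAM inference altogether. Solution: remove the reset on dout (or make it synchronous where the target supports that), or accept that the tool will use flip-flops for the output register.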
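One way to keep the RAM path reset-free is to reset a separate valid flag instead of dout, so downstream logic ignores the output until a real read has completed. This is a minimal sketch. The read strobe re, the flag dout_valid, and the module name sp_ram_valid are hypothetical.

```verilog
// Hypothetical sketch: reset-free RAM with a resettable valid flag
module sp_ram_valid #(
    parameter DATA_W = 8,
    parameter ADDR_W = 8
)(
    input  wire              clk,
    input  wire              rst_n,   // resets only the valid flag
    input  wire              re,      // hypothetical read strobe
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [DATA_W-1:0] din,
    output reg  [DATA_W-1:0] dout,
    output reg               dout_valid
);
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

    // RAM path: no reset on mem or dout
    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        dout <= mem[addr];
    end

    // Control path: an ordinary flip-flop that can be reset freely
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            dout_valid <= 1'b0;
        else
            dout_valid <= re && !we;  // high one cycle after a read request
    end
endmodule
```

dout_valid rises in the same cycle that dout carries the requested data, so the two stay aligned.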
What is the latency of a synchronous RAM read?
One clock cycle. You present the address on cycle N, and dout is valid on cycle N+1. This is called "1-cycle read latency" or "registered read." Some BRAMs support a pipeline register that adds a second cycle of latency but allows higher clock frequency — useful for very deep memory arrays where internal propagation is the bottleneck.
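The optional second stage can be sketched as one extra register after the RAM read, giving 2-cycle read latency. The module name sp_ram_pipelined and the intermediate register ram_q are hypothetical. Synthesis tools can often absorb the second register into the BRAM's built-in output register, but that should be checked in the synthesis report.

```verilog
// Hypothetical sketch: single port RAM with an output pipeline register
module sp_ram_pipelined #(
    parameter DATA_W = 8,
    parameter ADDR_W = 12
)(
    input  wire              clk,
    input  wire              we,
    input  wire [ADDR_W-1:0] addr,
    input  wire [DATA_W-1:0] din,
    output reg  [DATA_W-1:0] dout
);
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];
    reg [DATA_W-1:0] ram_q;           // stage 1: raw RAM read data

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        ram_q <= mem[addr];           // address at cycle N, ram_q valid at N+1
        dout  <= ram_q;               // stage 2: dout valid at N+2
    end
endmodule
```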