Single Port RAM — Verilog Guide

The most common memory primitive: one port, shared between read and write. This guide covers the synchronous write-first, read-first, and no-change modes, asynchronous read, byte write enables, and BRAM inference.

Single Port RAM — Block Diagram

[Block diagram: single port RAM of DEPTH × WIDTH bits (e.g. 256 × 8 = 2 Kbits). Inputs: addr[A-1:0], din[W-1:0], we, clk, en. Output: dout[W-1:0]. One shared port: READ or WRITE per cycle, not both simultaneously.]

Port Description

| Port | Width | Direction | Description |
| --- | --- | --- | --- |
| clk | 1 | Input | Clock. All synchronous operations occur on the rising edge |
| en | 1 | Input | Chip enable. When 0, no read or write occurs (optional in some designs) |
| we | 1 | Input | Write enable. 1 = write, 0 = read |
| addr | ADDR_W | Input | Address bus. Selects the memory location. DEPTH = 2^ADDR_W |
| din | DATA_W | Input | Write data. Valid when we=1 |
| dout | DATA_W | Output | Read data. Registered (sync) or combinational (async) |
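
The listings in this guide leave out the optional en port from the table. As a minimal sketch of how a chip enable is commonly wired in, the module below gates both the write and the registered read with en. The module name sp_ram_en is hypothetical, and the read-first behavior is an assumption made for the example, not a requirement of the port list.

```verilog
// Sketch: single port RAM with chip enable (hypothetical module name).
// Assumption: en gates BOTH the write and the registered read.
module sp_ram_en #(
  parameter DATA_W = 8,
  parameter ADDR_W = 8
)(
  input  wire              clk,
  input  wire              en,    // chip enable: 0 = port idle
  input  wire              we,    // write enable (only seen when en = 1)
  input  wire [ADDR_W-1:0] addr,
  input  wire [DATA_W-1:0] din,
  output reg  [DATA_W-1:0] dout
);
  reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

  always @(posedge clk) begin
    if (en) begin
      if (we)
        mem[addr] <= din;  // write only when enabled
      dout <= mem[addr];   // read-first: samples the stored value
    end
    // en = 0: memory contents and dout both hold their values
  end
endmodule
```

With en = 0 the output register keeps its last value, so downstream logic sees stable data while the memory is idle.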

Synchronous Read Modes — Verilog Code

A single port RAM has only one address, so every write cycle also addresses a read of the same location. The three synchronous modes differ in what dout shows on that write cycle, and the choice sets the write mode of the inferred BRAM primitive.

Write-First: on a write cycle, dout = new data (din). The write happens first, then the read sees the written value. Maps to Xilinx "write-first" BRAM mode.

// Single Port RAM — Write-First (Write-Through) Mode
// dout shows the NEW data on a write-read to same address
module sp_ram_write_first #(
  parameter DATA_W = 8,
  parameter ADDR_W = 8   // depth = 2^8 = 256 locations
)(
  input  wire              clk,
  input  wire              we,      // write enable
  input  wire [ADDR_W-1:0] addr,
  input  wire [DATA_W-1:0] din,
  output reg  [DATA_W-1:0] dout
);
  // Memory array
  reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

  always @(posedge clk) begin
    if (we) begin
      mem[addr] <= din;   // write the new data
      dout      <= din;   // output = new data (write-first)
    end else begin
      dout <= mem[addr];  // read normally
    end
  end
endmodule

Read-First: on a write cycle, dout = old data (the value stored before the write). The read happens before the write. Maps to Xilinx "read-first" / Intel "old data" mode.

// Single Port RAM — Read-First Mode
// dout shows the OLD data on a write-read to same address
module sp_ram_read_first #(
  parameter DATA_W = 8,
  parameter ADDR_W = 8
)(
  input  wire              clk,
  input  wire              we,
  input  wire [ADDR_W-1:0] addr,
  input  wire [DATA_W-1:0] din,
  output reg  [DATA_W-1:0] dout
);
  reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

  always @(posedge clk) begin
    if (we)
      mem[addr] <= din;    // write
    dout <= mem[addr];     // nonblocking: samples mem before the write lands, so dout gets the OLD value
    // Note: the read is registered on every clock edge, regardless of we
  end
endmodule

No-Change: on a write cycle, dout holds its previous value and does not update. This is the most power-efficient mode, because the output does not toggle during writes. Maps to Xilinx "no-change" BRAM mode.

// Single Port RAM — No-Change Mode
// dout holds its previous value during a write cycle
module sp_ram_no_change #(
  parameter DATA_W = 8,
  parameter ADDR_W = 8
)(
  input  wire              clk,
  input  wire              we,
  input  wire [ADDR_W-1:0] addr,
  input  wire [DATA_W-1:0] din,
  output reg  [DATA_W-1:0] dout
);
  reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

  always @(posedge clk) begin
    if (we) begin
      mem[addr] <= din;    // write only
      // dout NOT updated — holds previous value
    end else begin
      dout <= mem[addr];   // read only on non-write cycles
    end
  end
endmodule

Asynchronous Read: dout changes as soon as addr changes, with no clock edge needed. The write is still synchronous. Infers distributed (LUT) RAM, not BRAM: lower latency, higher LUT usage.

// Single Port RAM — Asynchronous Read (Distributed RAM)
// Read is combinational: dout changes when addr changes
// Write is synchronous: data captured on posedge clk
// Synthesis: infers LUT-based distributed RAM, NOT BRAM
module sp_ram_async_read #(
  parameter DATA_W = 8,
  parameter ADDR_W = 6   // 64 locations — typical for distributed RAM
)(
  input  wire              clk,
  input  wire              we,
  input  wire [ADDR_W-1:0] addr,
  input  wire [DATA_W-1:0] din,
  output wire [DATA_W-1:0] dout  // wire, not reg — combinational output
);
  reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

  // Synchronous write
  always @(posedge clk) begin
    if (we)
      mem[addr] <= din;   // non-blocking: write on clock edge
  end

  // Asynchronous (combinational) read
  assign dout = mem[addr];   // output changes immediately with addr
endmodule

With Byte Write Enable (Byte-Enable RAM)

Real SoC designs need byte-granularity writes — write only byte 0, 1, 2, or 3 of a 32-bit word. Each bit in we controls one byte lane.

// Single Port RAM — Byte Write Enable (32-bit word, 4 byte lanes)
// we[0] = byte 0 (bits 7:0), we[1] = byte 1, we[2] = byte 2, we[3] = byte 3
module sp_ram_byte_en #(
  parameter DATA_W = 32,
  parameter ADDR_W = 10   // 1024 words × 4 bytes = 4 KB
)(
  input  wire              clk,
  input  wire [3:0]        we,     // per-byte write enables
  input  wire [ADDR_W-1:0] addr,
  input  wire [DATA_W-1:0] din,
  output reg  [DATA_W-1:0] dout
);
  reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

  always @(posedge clk) begin
    // Byte-granularity write
    if (we[0]) mem[addr][ 7: 0] <= din[ 7: 0];
    if (we[1]) mem[addr][15: 8] <= din[15: 8];
    if (we[2]) mem[addr][23:16] <= din[23:16];
    if (we[3]) mem[addr][31:24] <= din[31:24];
    // Read: always registered
    dout <= mem[addr];  // read-first mode (reads before writes take effect)
  end
endmodule
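
The module above hard-codes four byte lanes, so it only works for DATA_W = 32. A parameterized variant can be sketched with a loop and Verilog-2001 indexed part-selects. The name sp_ram_byte_en_param and the NUM_BYTES parameter are hypothetical, and whether a given synthesis tool infers a byte-enable BRAM from this exact pattern is an assumption to check against its coding guidelines.

```verilog
// Sketch: byte-enable RAM with a configurable number of byte lanes.
// Hypothetical names: sp_ram_byte_en_param, NUM_BYTES.
module sp_ram_byte_en_param #(
  parameter NUM_BYTES = 4,   // word width = NUM_BYTES * 8 bits
  parameter ADDR_W    = 10
)(
  input  wire                   clk,
  input  wire [NUM_BYTES-1:0]   we,    // one write enable per byte lane
  input  wire [ADDR_W-1:0]      addr,
  input  wire [NUM_BYTES*8-1:0] din,
  output reg  [NUM_BYTES*8-1:0] dout
);
  reg [NUM_BYTES*8-1:0] mem [0:(2**ADDR_W)-1];
  integer i;

  always @(posedge clk) begin
    // Write each enabled byte lane; disabled lanes keep their stored value
    for (i = 0; i < NUM_BYTES; i = i + 1)
      if (we[i])
        mem[addr][i*8 +: 8] <= din[i*8 +: 8];
    // Registered read, read-first: samples the word before the writes land
    dout <= mem[addr];
  end
endmodule
```

With NUM_BYTES = 4 this describes the same behavior as the fixed 32-bit version: we = 4'b0001 updates only bits 7:0 of the addressed word.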

Timing Diagram — Read-First vs Write-First

[Timing diagram: write of din = 0xAB to addr 0x05 with we = 1. DOUT (write-first): old value, then 0xAB (new data) one clock after the write is presented. DOUT (read-first): the old (previous) value on that edge, then 0xAB on the following clock edge.]
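
The difference between the two modes can be checked in simulation. The testbench below is a minimal sketch: it models both modes inline with two small 16 × 8 arrays rather than instantiating the modules from this guide, so it compiles on its own. The name tb_rf_vs_wf, the array names, and the old value 0x11 are made up for the illustration.

```verilog
`timescale 1ns/1ps
// Sketch: compare write-first and read-first dout on a write to addr 0x5.
// All names and the "old" value 0x11 are illustrative assumptions.
module tb_rf_vs_wf;
  reg        clk  = 1'b0;
  reg        we   = 1'b0;
  reg  [3:0] addr = 4'h5;
  reg  [7:0] din  = 8'h00;

  reg [7:0] mem_wf [0:15];
  reg [7:0] mem_rf [0:15];
  reg [7:0] dout_wf, dout_rf;

  // Write-first model: dout takes din on a write cycle
  always @(posedge clk) begin
    if (we) begin
      mem_wf[addr] <= din;
      dout_wf      <= din;
    end else begin
      dout_wf <= mem_wf[addr];
    end
  end

  // Read-first model: dout takes the stored (old) value on a write cycle
  always @(posedge clk) begin
    if (we)
      mem_rf[addr] <= din;
    dout_rf <= mem_rf[addr];
  end

  always #5 clk = ~clk;

  initial begin
    mem_wf[5] = 8'h11;            // preload the "old" value
    mem_rf[5] = 8'h11;
    @(negedge clk);
    we  = 1'b1;                   // present the write between clock edges
    din = 8'hAB;
    @(negedge clk);               // the write edge has now passed
    $display("after write edge: wf=%h rf=%h", dout_wf, dout_rf); // wf=ab rf=11
    we = 1'b0;
    @(negedge clk);               // one plain read of the same address
    $display("after next edge:  wf=%h rf=%h", dout_wf, dout_rf); // wf=ab rf=ab
    $finish;
  end
endmodule
```

Driving the inputs on the falling edge keeps them stable around each rising edge, which avoids race conditions between the stimulus and the memory models.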

Read Mode Comparison

Write-First

Same-address simultaneous R/W: dout = din (new data). Useful when you write and immediately need to read the new value. Xilinx RAMB36: WRITE_MODE = "WRITE_FIRST".

Read-First

Same-address simultaneous R/W: dout = old data. Needed for shift-register-based constructs. Default mode for many tools. Supports ECC in Xilinx BRAMs. WRITE_MODE = "READ_FIRST".

No-Change

dout unchanged during write. Best power efficiency — output register doesn't toggle. WRITE_MODE = "NO_CHANGE". Cannot be used if you need read-while-write behavior.

Async Read

dout = mem[addr] combinationally. Zero read latency, but infers distributed LUT RAM (not BRAM). The combinational read path grows with memory size and lowers the achievable clock frequency, so use it only for small memories (about 64 entries or fewer).

BRAM Inference Tips

| Tool | Requirement for BRAM inference | Attribute to force |
| --- | --- | --- |
| Xilinx Vivado | Synchronous read, array ≥ 1 Kbit, no reset on dout | (* ram_style = "block" *) |
| Intel Quartus | Synchronous read, registered output, single clock | // synthesis ramstyle = "M20K" |
| Synopsys DC | Memory compiler + rf2gen for ASIC; BRAM not applicable | Instantiate hard macro directly |
| Cadence Genus | Same as DC: use a memory compiler for SRAM macros | Use // cadence map_to_module |

// Force BRAM inference — Xilinx/Vivado attribute
(* ram_style = "block" *)    // force Block RAM (not distributed)
module sp_ram_bram #(
  parameter DATA_W = 18,
  parameter ADDR_W = 10    // 1024 × 18 = 18 Kbits — fits in one RAMB18
)(
  input  wire              clk,
  input  wire              we,
  input  wire [ADDR_W-1:0] addr,
  input  wire [DATA_W-1:0] din,
  output reg  [DATA_W-1:0] dout
);
  (* ram_style = "block" *)
  reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

  // Initialize from file (optional — for ROM-like use)
  // initial $readmemh("init.hex", mem);

  always @(posedge clk) begin
    if (we)
      mem[addr] <= din;    // synchronous write
    dout <= mem[addr];     // synchronous read (read-first)
  end
endmodule

FAQ

Can a single port RAM read and write simultaneously?

No — a single port RAM has one address bus shared between read and write. You can only do one operation per cycle. If your design needs simultaneous read and write, use a simple dual-port RAM (separate read address + write address) or a true dual-port RAM.
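
For comparison, a simple dual-port RAM can be sketched as below: the write side and the read side each get their own address, so one write and one read can happen in the same cycle. The module name sdp_ram and the port names waddr and raddr are hypothetical. What dout returns when raddr equals waddr on a write cycle depends on the target device, so the old-data behavior noted in the comment describes this RTL in simulation only.

```verilog
// Sketch: simple dual-port RAM (one write port, one read port, one clock).
// Hypothetical names: sdp_ram, waddr, raddr.
module sdp_ram #(
  parameter DATA_W = 8,
  parameter ADDR_W = 8
)(
  input  wire              clk,
  input  wire              we,
  input  wire [ADDR_W-1:0] waddr,  // write address
  input  wire [DATA_W-1:0] din,
  input  wire [ADDR_W-1:0] raddr,  // read address, independent of waddr
  output reg  [DATA_W-1:0] dout
);
  reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

  always @(posedge clk) begin
    if (we)
      mem[waddr] <= din;    // write port
    dout <= mem[raddr];     // read port: in simulation, returns the old
                            // data if raddr == waddr on a write cycle
  end
endmodule
```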

Why does adding a reset to dout break BRAM inference?

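Block RAM primitives support only limited reset behavior on their outputs. Xilinx block RAM, for example, offers a synchronous reset of its output latch or register but no asynchronous one, and the memory array itself cannot be reset. If your Verilog gives dout a reset the primitive cannot implement, such as an asynchronous if (!rst_n) dout <= 0, the synthesis tool may be unable to absorb the output stage into the BRAM: it will build the output register from fabric flip-flops, or fall back to distributed RAM. Solution: remove the reset on dout (or make it synchronous if your device supports that), or accept that the tool will use fabric flip-flops for the output register.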

What is the latency of a synchronous RAM read?

One clock cycle. You present the address on cycle N, and dout is valid on cycle N+1. This is called "1-cycle read latency" or "registered read." Some BRAMs support a pipeline register that adds a second cycle of latency but allows higher clock frequency — useful for very deep memory arrays where internal propagation is the bottleneck.
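
The optional pipeline register can be described in RTL by adding a second register after the RAM read. The sketch below assumes a read-first RAM and uses the hypothetical names sp_ram_pipelined and ram_q. Whether the second register is packed into the BRAM's built-in output register or placed in fabric flip-flops is up to the synthesis tool.

```verilog
// Sketch: single port RAM with an extra output pipeline stage.
// Read latency is 2 cycles: address on cycle N, dout valid on cycle N+2.
// Hypothetical names: sp_ram_pipelined, ram_q.
module sp_ram_pipelined #(
  parameter DATA_W = 8,
  parameter ADDR_W = 8
)(
  input  wire              clk,
  input  wire              we,
  input  wire [ADDR_W-1:0] addr,
  input  wire [DATA_W-1:0] din,
  output reg  [DATA_W-1:0] dout
);
  reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];
  reg [DATA_W-1:0] ram_q;   // stage 1: the RAM's own registered read

  always @(posedge clk) begin
    if (we)
      mem[addr] <= din;     // synchronous write
    ram_q <= mem[addr];     // stage 1: read-first registered read
    dout  <= ram_q;         // stage 2: pipeline register
  end
endmodule
```

The extra stage shortens the path from the memory array to downstream logic, at the cost of one more cycle before read data arrives.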