⚡ Phase 1 · Module 2 of 4

HBM3 Bank & Bank-Group FSM

Models all 32 banks (8 bank groups × 4 banks) per HBM3 pseudo-channel. Tracks open/precharged state per bank and enforces inter-bank timing: tRRDs, tRRDl, tCCDs, tCCDl, tFAW and tWTR. Fully synthesizable Verilog.

Files: hbm3_bank_fsm.v (synthesizable RTL) · tb_hbm3_bank_fsm.sv (testbench) · 32 banks per pseudo-channel · JEDEC JESD238

Bank Groups — The Key to HBM3 Bandwidth

HBM3 divides its 32 banks per pseudo-channel into 8 bank groups (BG0–BG7), each containing 4 banks (B0–B3). This is not just an addressing scheme — it unlocks tighter CAS-to-CAS spacing.

Within the same bank group, consecutive CAS commands must be spaced by tCCDl = 8 cycles (4 ns), because the banks of a group share its internal data path. Commands that alternate between different bank groups need only tCCDs = 4 cycles (2 ns), since the groups' data paths work in parallel and the bus can still sustain near-continuous data. A smart scheduler interleaves across BGs to maximise throughput.
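To make the interleaving benefit concrete, here is a small Python sketch. It is not part of the RTL, and the function name `cas_schedule` is hypothetical. It computes the earliest issue cycle for an in-order stream of CAS commands. It assumes the JEDEC convention that the long spacing (tCCDl) applies within a bank group and the short spacing (tCCDs) across groups, with the 4/8-cycle defaults used on this page.

```python
def cas_schedule(bgs, tccd_s=4, tccd_l=8):
    """Earliest issue cycle of each CAS, issued in order to the given bank groups.

    tccd_s: minimum spacing between any two CAS commands (different BGs).
    tccd_l: minimum spacing between CAS commands to the same BG.
    """
    times, last_any, last_in_bg = [], None, {}
    for bg in bgs:
        t = 0
        if last_any is not None:
            t = max(t, last_any + tccd_s)        # any-BG (short) constraint
        if bg in last_in_bg:
            t = max(t, last_in_bg[bg] + tccd_l)  # same-BG (long) constraint
        times.append(t)
        last_any = t
        last_in_bg[bg] = t
    return times


print(cas_schedule([0, 0, 0, 0]))   # [0, 8, 16, 24]  same BG every time
print(cas_schedule([0, 1, 0, 1]))   # [0, 4, 8, 12]   ping-pong between BG0 and BG1
```

With these defaults, four CAS commands to one group need until cycle 24 to issue. Alternating between two groups brings that down to cycle 12.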

Bank-group interleaving is one of the largest bandwidth levers in HBM3: correct BG-aware scheduling is what separates a fast controller from a slow one.
Figure: HBM3 pseudo-channel with 8 bank groups (BG0–BG7) × 4 banks (B0–B3) = 32 banks. The example shows BG0/B1 and BG1/B0 ACTIVE and the other BG0/BG1 banks IDLE. 16 PCs per HBM3 stack = 512 banks total.

Inter-Bank Timing Constraints

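| Parameter | Symbol | Cycles (2 GHz) | Time | Applies to |
|---|---|---|---|---|
| ACT-to-ACT, diff BG | tRRDs | 4 | 2 ns | Two ACTs targeting different bank groups |
| ACT-to-ACT, same BG | tRRDl | 8 | 4 ns | Two ACTs targeting the same bank group |
| CAS-to-CAS, diff BG | tCCDs | 4 | 2 ns | Back-to-back RD/WR to different bank groups |
| CAS-to-CAS, same BG | tCCDl | 8 | 4 ns | Back-to-back RD/WR to the same bank group |
| Four-Activate Window | tFAW | 32 | 16 ns | At most 4 ACTs in any rolling 32-cycle window |
| Write-to-Read, diff BG | tWTRs | 8 | 4 ns | WR then RD to different bank groups |
| Write-to-Read, same BG | tWTRl | 16 | 8 ns | WR then RD to the same bank group |

The suffixes follow JEDEC's short/long naming. The long (l) value applies within a bank group and the short (s) value applies across bank groups.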
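A software reference model is a handy way to cross-check the RTL. The sketch below is a hypothetical Python `ActChecker`; it is not something the module ships. It applies the ACT rules from the table exactly: tRRDs between any two ACTs, tRRDl within a bank group, and at most four ACTs in any tFAW window. It uses the default cycle counts and assumes cycle numbers are supplied in increasing order.

```python
from collections import deque


class ActChecker:
    """Reference ACT-timing rules: tRRDs (any BG), tRRDl (same BG), exact tFAW."""

    def __init__(self, trrd_s=4, trrd_l=8, tfaw=32):
        self.trrd_s, self.trrd_l, self.tfaw = trrd_s, trrd_l, tfaw
        self.last_any = None           # cycle of the most recent ACT
        self.last_bg = {}              # bank group -> cycle of its last ACT
        self.recent = deque(maxlen=4)  # cycles of the four most recent ACTs

    def allowed(self, cycle, bg):
        if self.last_any is not None and cycle - self.last_any < self.trrd_s:
            return False               # tRRDs: too close to any ACT
        if bg in self.last_bg and cycle - self.last_bg[bg] < self.trrd_l:
            return False               # tRRDl: too close to an ACT in this BG
        if len(self.recent) == 4 and cycle - self.recent[0] < self.tfaw:
            return False               # would be the 5th ACT inside tFAW
        return True

    def act(self, cycle, bg):
        if not self.allowed(cycle, bg):
            raise ValueError(f"illegal ACT to BG{bg} at cycle {cycle}")
        self.last_any = cycle
        self.last_bg[bg] = cycle
        self.recent.append(cycle)


chk = ActChecker()
chk.act(0, bg=0)
print(chk.allowed(4, bg=0), chk.allowed(4, bg=1), chk.allowed(8, bg=0))  # False True True
```

Feeding it the ACT stream seen on the DUT's command inputs gives an independent pass/fail for the tRRD and tFAW rules.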

Port Reference

| Port | Dir | Width | Description |
|---|---|---|---|
| i_clk / i_rst_n | in | 1 | Clock / active-low synchronous reset |
| i_cmd_act / i_cmd_rd / i_cmd_wr / i_cmd_pre | in | 1 each | ACTIVATE / READ / WRITE / PRECHARGE command pulses |
| i_cmd_prea | in | 1 | PRECHARGE ALL: closes every open bank in one command |
| i_cmd_ref | in | 1 | ALL-BANK REFRESH: all 32 banks enter the refresh state |
| i_bg_sel[2:0] | in | 3 | Target bank group (0–7) |
| i_ba_sel[1:0] | in | 2 | Target bank within group (0–3) |
| o_bank_active | out | 1 | Selected bank has an open row (ACTIVE state) |
| o_bank_idle | out | 1 | Selected bank is precharged (IDLE state) |
| o_act_allowed | out | 1 | ACT timing OK: tRRDs, tRRDl and tFAW all satisfied (bank state is reported separately by o_bank_idle) |
| o_cas_allowed | out | 1 | CAS timing OK: tCCDs, tCCDl, tWTRs and tWTRl all satisfied (the tWTR terms also hold off a WRITE) |
| o_banks_open[31:0] | out | 32 | Bitmap of all open banks (bit = BG*4 + BA) |
| o_open_count[4:0] | out | 5 | Number of open banks; 5 bits cover 0–31, so all 32 banks open reads back as 0 |
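The bitmap layout and the 5-bit count can be mirrored in a few lines of Python, for example in a testbench scoreboard. `bank_bit` and `open_count_5bit` are illustrative names, not ports or functions from the design. The last line shows the wrap-around of the 5-bit count.

```python
def bank_bit(bg: int, ba: int) -> int:
    """Bit position of bank (bg, ba) in o_banks_open: BG*4 + BA."""
    if not (0 <= bg < 8 and 0 <= ba < 4):
        raise ValueError("bg must be 0-7 and ba 0-3")
    return bg * 4 + ba


def open_count_5bit(bitmap: int) -> int:
    """Popcount of a 32-bit bitmap, truncated to 5 bits like o_open_count."""
    return bin(bitmap & 0xFFFF_FFFF).count("1") & 0x1F


bitmap = (1 << bank_bit(0, 1)) | (1 << bank_bit(1, 0))   # BG0/B1 and BG1/B0 open
print(hex(bitmap), open_count_5bit(bitmap))               # 0x12 2
print(open_count_5bit(0xFFFF_FFFF))                       # 0: all 32 open wraps
```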

Verilog Source — hbm3_bank_fsm.v

verilog · hbm3_bank_fsm.v
// ================================================================
//  hbm3_bank_fsm.v
//  HBM3 Bank & Bank-Group State Machine  —  Phase 1 · Module 2
//  8 Bank Groups x 4 Banks = 32 banks per pseudo-channel
//  Enforces tRRDs/tRRDl, tCCDs/tCCDl, tFAW, tWTRs/tWTRl
//  Synthesizable Verilog — EcrioniX HBM3 Controller Series
// ================================================================
module hbm3_bank_fsm #(
  // Defaults calibrated for 2 GHz controller clock (0.5 ns/cycle).
  // JEDEC naming: suffix l ("long") = same bank group,
  //               suffix s ("short") = different bank groups.
  // 5-bit counters: tRRD/tCCD/tWTR values up to 32 cycles.
  parameter tRRDs = 4,    // ACT-to-ACT, diff BG       2 ns
  parameter tRRDl = 8,    // ACT-to-ACT, same BG       4 ns
  parameter tCCDs = 4,    // CAS-to-CAS, diff BG       2 ns
  parameter tCCDl = 8,    // CAS-to-CAS, same BG       4 ns
  parameter tFAW  = 32,   // Four-Activate Window     16 ns
  parameter tWTRs = 8,    // Write-to-Read, diff BG    4 ns
  parameter tWTRl = 16,   // Write-to-Read, same BG    8 ns
  parameter tRFC  = 700   // All-bank refresh busy time in cycles (2..65535).
                          // Placeholder value: set it from the device datasheet.
)(
  input  wire        i_clk,
  input  wire        i_rst_n,     // Active-low synchronous reset

  // Command inputs (one-hot pulses from scheduler)
  input  wire        i_cmd_act,   // ACTIVATE
  input  wire        i_cmd_rd,    // READ
  input  wire        i_cmd_wr,    // WRITE
  input  wire        i_cmd_pre,   // PRECHARGE single bank
  input  wire        i_cmd_prea,  // PRECHARGE ALL banks
  input  wire        i_cmd_ref,   // ALL-BANK REFRESH (all banks must be idle)

  // Target address
  input  wire [2:0]  i_bg_sel,    // Bank group select (0-7)
  input  wire [1:0]  i_ba_sel,    // Bank address within group (0-3)

  // Status outputs
  output wire        o_bank_active, // Selected bank is ACTIVE (row open)
  output wire        o_bank_idle,   // Selected bank is IDLE (precharged)
  output wire        o_act_allowed, // ACT timing (tRRD/tFAW) satisfied
  output wire        o_cas_allowed, // RD/WR timing (tCCD/tWTR) satisfied
  output wire [31:0] o_banks_open,  // Bitmap: bit[BG*4+BA] = 1 if ACTIVE
  output wire [4:0]  o_open_count   // Open banks, 0-31 (all 32 open wraps to 0)
);

  // -- Bank state encoding ------------------------------------------
  localparam [1:0]
    BS_IDLE    = 2'd0,   // Precharged, ready for ACT
    BS_ACTIVE  = 2'd1,   // Row open, ready for RD/WR
    BS_REFRESH = 2'd2;   // All-bank refresh in progress (tRFC)

  // Per-bank state array [BG][Bank]
  reg [1:0] bstate [0:7][0:3];

  // Per-BG timing counters (same-BG "long" constraints)
  reg [4:0] bg_act_cnt [0:7];   // tRRDl countdown per BG
  reg [4:0] bg_cas_cnt [0:7];   // tCCDl countdown per BG
  reg [4:0] bg_wtr_cnt [0:7];   // tWTRl countdown per BG

  // Global timing counters (any-BG "short" constraints)
  reg [4:0] gl_act_cnt;          // tRRDs countdown
  reg [4:0] gl_cas_cnt;          // tCCDs countdown
  reg [4:0] gl_wtr_cnt;          // tWTRs countdown

  // Four-Activate Window (tFAW), conservative approximation
  reg [2:0] act_in_faw;          // ACTs counted in the window (0-4)
  reg [5:0] faw_timer;           // Reloads to tFAW-1 on each ACT

  // All-bank refresh busy counter (tRFC)
  reg [15:0] ref_cnt;

  integer i, j;

  // -- Sequential: bank states and timing counters ------------------
  always @(posedge i_clk) begin
    if (!i_rst_n) begin
      for (i = 0; i < 8; i = i+1) begin
        bg_act_cnt[i] <= 5'd0;
        bg_cas_cnt[i] <= 5'd0;
        bg_wtr_cnt[i] <= 5'd0;
        for (j = 0; j < 4; j = j+1)
          bstate[i][j] <= BS_IDLE;
      end
      gl_act_cnt <= 5'd0;
      gl_cas_cnt <= 5'd0;
      gl_wtr_cnt <= 5'd0;
      act_in_faw <= 3'd0;
      faw_timer  <= 6'd0;
      ref_cnt    <= 16'd0;
    end else begin

      // -- Decrement all per-BG and global timers -------------
      for (i = 0; i < 8; i = i+1) begin
        if (bg_act_cnt[i] > 0) bg_act_cnt[i] <= bg_act_cnt[i] - 1;
        if (bg_cas_cnt[i] > 0) bg_cas_cnt[i] <= bg_cas_cnt[i] - 1;
        if (bg_wtr_cnt[i] > 0) bg_wtr_cnt[i] <= bg_wtr_cnt[i] - 1;
      end
      if (gl_act_cnt > 0) gl_act_cnt <= gl_act_cnt - 1;
      if (gl_cas_cnt > 0) gl_cas_cnt <= gl_cas_cnt - 1;
      if (gl_wtr_cnt > 0) gl_wtr_cnt <= gl_wtr_cnt - 1;

      // -- FAW window approximation ----------------------------
      // Each ACT reloads faw_timer; once it runs out (no ACT for
      // tFAW cycles) the count drains by one per cycle
      if (faw_timer > 0) begin
        faw_timer <= faw_timer - 1;
      end else if (act_in_faw > 0) begin
        act_in_faw <= act_in_faw - 1;
      end

      // -- Refresh completion: all banks return to IDLE after tRFC
      if (ref_cnt > 0) begin
        ref_cnt <= ref_cnt - 1;
        if (ref_cnt == 16'd1)
          for (i = 0; i < 8; i = i+1)
            for (j = 0; j < 4; j = j+1)
              bstate[i][j] <= BS_IDLE;
      end

      // -- ACTIVATE --------------------------------------------
      if (i_cmd_act && o_act_allowed &&
          bstate[i_bg_sel][i_ba_sel] == BS_IDLE) begin
        bstate[i_bg_sel][i_ba_sel] <= BS_ACTIVE;
        bg_act_cnt[i_bg_sel]       <= tRRDl[4:0] - 1;  // same BG: long
        gl_act_cnt                 <= tRRDs[4:0] - 1;  // any BG: short
        faw_timer                  <= tFAW[5:0]  - 1;
        if (act_in_faw < 4) act_in_faw <= act_in_faw + 1;
      end

      // -- READ ------------------------------------------------
      if (i_cmd_rd && bstate[i_bg_sel][i_ba_sel] == BS_ACTIVE) begin
        bg_cas_cnt[i_bg_sel] <= tCCDl[4:0] - 1;
        gl_cas_cnt           <= tCCDs[4:0] - 1;
      end

      // -- WRITE -----------------------------------------------
      if (i_cmd_wr && bstate[i_bg_sel][i_ba_sel] == BS_ACTIVE) begin
        bg_cas_cnt[i_bg_sel] <= tCCDl[4:0] - 1;
        gl_cas_cnt           <= tCCDs[4:0] - 1;
        bg_wtr_cnt[i_bg_sel] <= tWTRl[4:0] - 1;
        gl_wtr_cnt           <= tWTRs[4:0] - 1;
      end

      // -- PRECHARGE single bank -------------------------------
      if (i_cmd_pre && bstate[i_bg_sel][i_ba_sel] == BS_ACTIVE)
        bstate[i_bg_sel][i_ba_sel] <= BS_IDLE;

      // -- PRECHARGE ALL ---------------------------------------
      if (i_cmd_prea) begin
        for (i = 0; i < 8; i = i+1)
          for (j = 0; j < 4; j = j+1)
            if (bstate[i][j] == BS_ACTIVE)
              bstate[i][j] <= BS_IDLE;
      end

      // -- ALL-BANK REFRESH: accepted only when every bank is idle
      if (i_cmd_ref && o_banks_open == 32'd0 && ref_cnt == 16'd0) begin
        ref_cnt <= tRFC[15:0] - 1;
        for (i = 0; i < 8; i = i+1)
          for (j = 0; j < 4; j = j+1)
            bstate[i][j] <= BS_REFRESH;
      end
    end
  end

  // -- Combinational outputs ----------------------------------------
  assign o_bank_active = (bstate[i_bg_sel][i_ba_sel] == BS_ACTIVE);
  assign o_bank_idle   = (bstate[i_bg_sel][i_ba_sel] == BS_IDLE);

  // ACT timing: same-BG tRRDl AND any-BG tRRDs met AND fewer than 4 ACTs
  // counted in the tFAW window. The scheduler must also check o_bank_idle.
  assign o_act_allowed = (bg_act_cnt[i_bg_sel] == 0) &&
                         (gl_act_cnt            == 0) &&
                         (act_in_faw            <  4);

  // CAS timing: tCCDl/tCCDs and tWTRl/tWTRs all satisfied. Conservative:
  // the tWTR terms also hold off a WRITE that follows a WRITE.
  assign o_cas_allowed = (bg_cas_cnt[i_bg_sel] == 0) &&
                         (gl_cas_cnt            == 0) &&
                         (bg_wtr_cnt[i_bg_sel]  == 0) &&
                         (gl_wtr_cnt            == 0);

  // All-banks open bitmap (synthesises as 32 comparators)
  genvar gi, gj;
  generate
    for (gi = 0; gi < 8; gi = gi+1)
      for (gj = 0; gj < 4; gj = gj+1)
        assign o_banks_open[gi*4 + gj] =
               (bstate[gi][gj] == BS_ACTIVE);
  endgenerate

  // Popcount of o_banks_open. 5-bit result, so all 32 banks open reads
  // back as 0; widen cnt_tmp and o_open_count to [5:0] if that matters.
  reg [4:0] cnt_tmp;
  integer k;
  always @(*) begin
    cnt_tmp = 5'd0;
    for (k = 0; k < 32; k = k+1)
      cnt_tmp = cnt_tmp + o_banks_open[k];
  end
  assign o_open_count = cnt_tmp;

endmodule
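The act_in_faw counter is a simplified stand-in for a sliding window. It increments on each ACT, and every ACT reloads faw_timer to tFAW - 1. The count only starts dropping, by one per cycle, once the timer reaches zero, which takes tFAW cycles with no ACT. This never admits a fifth ACT too early, but it is much more conservative than the real rule. Once four ACTs are counted, the next one waits tFAW + 1 cycles after the most recent ACT rather than after the oldest. Each later ACT then refills the count to 4, so a scheduler that issues ACTs as soon as they are allowed settles at about one ACT every tFAW + 1 cycles instead of four per tFAW. A cycle-accurate model needs a 4-entry timestamp FIFO.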
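The gap between the two models is easy to measure with a cycle-level Python sketch. `approx_faw_acts` mirrors the act_in_faw/faw_timer update rules of the RTL above, including the ACT increment overriding a same-cycle decrement. `exact_faw_acts` applies the true rolling-window rule. Both are hypothetical helpers. The sketch assumes a greedy scheduler that issues an ACT as soon as one is allowed. It also assumes every ACT targets a different bank group, so only tRRDs (`trrd`, 4 cycles) and tFAW (32 cycles) apply.

```python
def approx_faw_acts(n, trrd=4, tfaw=32):
    """ACT cycles chosen by a greedy scheduler under the RTL's tFAW approximation.

    One loop iteration = one clock edge; next-state values are computed from
    the current register values, like Verilog non-blocking assignments.
    """
    gl = timer = count = 0                # gl_act_cnt, faw_timer, act_in_faw
    times, t = [], 0
    while len(times) < n:
        act = gl == 0 and count < 4       # o_act_allowed (per-BG counters idle)
        n_gl = gl - 1 if gl > 0 else 0
        n_timer, n_count = timer, count
        if timer > 0:
            n_timer = timer - 1
        elif count > 0:
            n_count = count - 1           # drain one entry per cycle
        if act:                           # ACT branch wins, as in the RTL
            times.append(t)
            n_gl, n_timer = trrd - 1, tfaw - 1
            if count < 4:
                n_count = count + 1
        gl, timer, count = n_gl, n_timer, n_count
        t += 1
    return times


def exact_faw_acts(n, trrd=4, tfaw=32):
    """Same greedy scheduler, but with the exact rolling-window tFAW rule."""
    times, t = [], 0
    while len(times) < n:
        ok = not times or t - times[-1] >= trrd
        if len(times) >= 4 and t - times[-4] < tfaw:
            ok = False                    # would be a 5th ACT inside tFAW
        if ok:
            times.append(t)
        t += 1
    return times


print("approx:", approx_faw_acts(8))
print("exact: ", exact_faw_acts(8))
```

Both runs keep at most four ACTs in any 32-cycle window. After the first four ACTs, however, the approximation only admits one ACT every 33 cycles (tFAW + 1), while the exact rule keeps admitting bursts of four.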

SystemVerilog Testbench — tb_hbm3_bank_fsm.sv

systemverilog · tb_hbm3_bank_fsm.sv
// ================================================================
//  tb_hbm3_bank_fsm.sv
//  Self-checking testbench for hbm3_bank_fsm
//  Uses reduced timing parameters for simulation speed.
//  Stimulus changes just after a falling edge, the DUT samples it on
//  the next rising edge, and checks run just after a falling edge.
// ================================================================
`timescale 1ns/1ps
module tb_hbm3_bank_fsm;

  localparam tRRDs = 3;   // ACT-to-ACT, diff BG
  localparam tRRDl = 5;   // ACT-to-ACT, same BG
  localparam tCCDs = 3;   // CAS-to-CAS, diff BG
  localparam tCCDl = 5;   // CAS-to-CAS, same BG
  localparam tFAW  = 16;  // > 4*tRRDs, so the window really binds
  localparam tWTRs = 4;   // WR-to-RD, diff BG
  localparam tWTRl = 6;   // WR-to-RD, same BG

  logic        i_clk = 0, i_rst_n = 0;
  logic        i_cmd_act = 0, i_cmd_rd = 0, i_cmd_wr = 0;
  logic        i_cmd_pre = 0, i_cmd_prea = 0, i_cmd_ref = 0;
  logic [2:0]  i_bg_sel = 0;
  logic [1:0]  i_ba_sel = 0;
  wire         o_bank_active, o_bank_idle, o_act_allowed, o_cas_allowed;
  wire  [31:0] o_banks_open;
  wire  [4:0]  o_open_count;

  hbm3_bank_fsm #(
    .tRRDs(tRRDs), .tRRDl(tRRDl), .tCCDs(tCCDs),
    .tCCDl(tCCDl), .tFAW(tFAW),   .tWTRs(tWTRs), .tWTRl(tWTRl)
  ) dut (.*);

  always #0.25 i_clk = ~i_clk;   // 0.5 ns period (2 GHz)

  // -- Helpers --------------------------------------------------------
  // Point the status outputs at a bank and let them settle
  task automatic set_addr(input logic [2:0] bg, input logic [1:0] ba);
    i_bg_sel = bg; i_ba_sel = ba; #0.05;
  endtask

  // Call just after a falling edge: the DUT samples cmd on the next
  // rising edge; returns just after the following falling edge
  task automatic issue(ref logic cmd, input logic [2:0] bg, input logic [1:0] ba);
    i_bg_sel = bg; i_ba_sel = ba; cmd = 1'b1;
    @(negedge i_clk); cmd = 1'b0; #0.05;
  endtask

  // Advance n (>= 1) clock cycles, ending just after a falling edge
  task automatic wait_cycles(input int n);
    repeat (n) @(negedge i_clk);
    #0.05;
  endtask

  int pass = 0, fail = 0;
  task automatic check(input string name, input logic [31:0] got, input logic [31:0] exp);
    if (got === exp) begin $display("  PASS: %s", name); pass++; end
    else begin $display("  FAIL: %s (got=%0h exp=%0h)", name, got, exp); fail++; end
  endtask

  // -- SVA: the stimulus itself must obey the protocol ---------------
  property p_act_legal;   // ACT only to an idle bank, only when granted
    @(posedge i_clk) disable iff (!i_rst_n)
      i_cmd_act |-> (o_bank_idle && o_act_allowed);
  endproperty
  assert property (p_act_legal) else
    $error("ASSERT FAIL: ACT to an open bank or inside tRRD/tFAW");

  property p_cas_legal;   // RD/WR only to an open row, only when granted
    @(posedge i_clk) disable iff (!i_rst_n)
      (i_cmd_rd || i_cmd_wr) |-> (o_bank_active && o_cas_allowed);
  endproperty
  assert property (p_cas_legal) else
    $error("ASSERT FAIL: CAS without an open row or inside tCCD/tWTR");

  // Independent tFAW monitor: a 5th ACT must come >= tFAW cycles
  // after the first of the previous four
  int unsigned cyc = 0;
  int unsigned act_hist[$];
  always @(posedge i_clk) begin
    cyc <= cyc + 1;
    if (i_rst_n && i_cmd_act) begin
      if (act_hist.size() == 4) begin
        assert (cyc - act_hist[0] >= tFAW)
          else $error("ASSERT FAIL: more than 4 ACTs within tFAW");
        void'(act_hist.pop_front());
      end
      act_hist.push_back(cyc);
    end
  end

  // -- Tests --------------------------------------------------------
  initial begin
    $dumpfile("tb_hbm3_bank_fsm.vcd");
    $dumpvars(0, tb_hbm3_bank_fsm);

    repeat (4) @(negedge i_clk);            // hold synchronous reset
    i_rst_n = 1;
    wait_cycles(1);

    // TEST 1: ACT BG0/B0
    $display("\n[TEST 1] ACTIVATE BG0/B0");
    set_addr(0, 0);
    check("Bank idle before ACT",       o_bank_idle,     1);
    check("Bank not active before ACT", o_bank_active,   0);
    issue(i_cmd_act, 0, 0);
    check("Bank active after ACT",      o_bank_active,   1);
    check("BG0/B0 bitmap bit set",      o_banks_open[0], 1);
    check("open_count = 1",             o_open_count,    1);

    // TEST 2: same-BG ACT waits tRRDl
    $display("\n[TEST 2] tRRDl enforcement: same BG");
    set_addr(0, 1);                         // BG0/B1
    check("ACT blocked right after ACT",      o_act_allowed, 0);
    wait_cycles(tRRDl - 2);
    check("ACT blocked 1 cycle before tRRDl", o_act_allowed, 0);
    wait_cycles(1);
    check("ACT allowed after tRRDl",          o_act_allowed, 1);

    // TEST 3: diff-BG ACT waits only tRRDs
    $display("\n[TEST 3] tRRDs enforcement: diff BG");
    issue(i_cmd_act, 0, 1);                 // open BG0/B1
    set_addr(1, 0);                         // BG1/B0
    check("ACT blocked right after ACT",      o_act_allowed, 0);
    wait_cycles(tRRDs - 2);
    check("ACT blocked 1 cycle before tRRDs", o_act_allowed, 0);
    wait_cycles(1);
    check("Diff-BG ACT allowed after tRRDs",  o_act_allowed, 1);
    set_addr(0, 2);                         // back to BG0
    check("Same-BG ACT still blocked",        o_act_allowed, 0);

    // TEST 4: tFAW, 4 ACTs then blocked
    $display("\n[TEST 4] Four-Activate Window (tFAW)");
    wait_cycles(tFAW + 4);                  // drain the tFAW counter
    for (int g = 1; g <= 4; g++) begin      // ACT BG1..BG4, tRRDs apart
      issue(i_cmd_act, 3'(g), 0);
      if (g < 4) wait_cycles(tRRDs - 1);
    end
    set_addr(5, 0);
    wait_cycles(tRRDs - 1);                 // tRRD met: only tFAW can block
    check("5th ACT blocked by tFAW",          o_act_allowed, 0);
    wait_cycles(tFAW - (tRRDs - 1));        // tFAW cycles after the 4th ACT
    check("ACT allowed after tFAW",           o_act_allowed, 1);

    // TEST 5: PRECHARGE ALL
    $display("\n[TEST 5] PRECHARGE ALL");
    check("6 banks open before PREA",         o_open_count, 6);
    issue(i_cmd_prea, 0, 0);
    check("All banks closed after PREA",      o_banks_open, 0);
    check("open_count = 0",                   o_open_count, 0);

    // TEST 6: CAS spacing (tRCD is tracked by Module 1, not here)
    $display("\n[TEST 6] tCCDs / tCCDl CAS spacing");
    issue(i_cmd_act, 0, 0);
    wait_cycles(tRRDs - 1);
    issue(i_cmd_act, 1, 0);
    issue(i_cmd_rd, 0, 0);                  // RD BG0/B0
    set_addr(1, 0);
    check("Diff-BG CAS blocked right after RD", o_cas_allowed, 0);
    wait_cycles(tCCDs - 1);
    check("Diff-BG CAS allowed after tCCDs",    o_cas_allowed, 1);
    set_addr(0, 0);
    check("Same-BG CAS still blocked",          o_cas_allowed, 0);
    wait_cycles(tCCDl - tCCDs);
    check("Same-BG CAS allowed after tCCDl",    o_cas_allowed, 1);

    // TEST 7: write-to-read
    $display("\n[TEST 7] tWTRs / tWTRl write-to-read");
    issue(i_cmd_wr, 0, 0);                  // WR BG0/B0
    set_addr(1, 0);
    check("Diff-BG RD blocked right after WR",  o_cas_allowed, 0);
    wait_cycles(tWTRs - 1);
    check("Diff-BG RD allowed after tWTRs",     o_cas_allowed, 1);
    set_addr(0, 0);
    check("Same-BG RD still blocked",           o_cas_allowed, 0);
    wait_cycles(tWTRl - tWTRs);
    check("Same-BG RD allowed after tWTRl",     o_cas_allowed, 1);

    // Summary
    $display("\n========================================");
    $display("  RESULTS: %0d PASS  |  %0d FAIL", pass, fail);
    if (fail==0) $display("  ALL TESTS PASSED ✅");
    else         $display("  FAILURES DETECTED ❌");
    $display("========================================");
    $finish;
  end

endmodule

Frequently Asked Questions

What is a bank group and why does HBM3 have 8 of them?

A bank group is a cluster of DRAM banks that share internal data paths (I/O gating and data lines); each bank still has its own row decoders and sense amplifiers. Having 8 independent BGs lets the scheduler pipeline CAS commands across groups at the short tCCDs spacing (4 cycles) instead of the long tCCDl spacing (8 cycles) required within one group. More BGs means more parallelism and higher effective bandwidth.

What does tFAW actually protect against?

tFAW (Four Activate Window) limits peak current draw. Each ACTIVATE raises a wordline and fires the sense amplifiers for an entire row, a high-current event. If too many ACTs land in quick succession, the DRAM's internal supplies (including the wordline charge pump) can droop and corrupt data. tFAW is a rolling power budget: never more than 4 row activations in any tFAW window (32 cycles, or 16 ns, with this module's defaults).
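Whether tFAW actually limits anything depends on its ratio to tRRDs. Under tRRDs alone, a fifth ACT could issue 4 × tRRDs cycles after the first, so the window binds only when 4 × tRRDs < tFAW. The quick Python check below uses this module's defaults: a 0.5 ns clock, tRRDs = 4 and tFAW = 32. `faw_stats` is a hypothetical helper.

```python
def faw_stats(tck_ns, trrd_s, tfaw):
    """Peak sustained ACT rate (ACTs per ns) with and without the tFAW cap."""
    rrd_rate = 1.0 / (trrd_s * tck_ns)   # one ACT every tRRDs cycles
    faw_rate = 4.0 / (tfaw * tck_ns)     # four ACTs per tFAW window
    return {
        "binds": 4 * trrd_s < tfaw,
        "rrd_only_acts_per_ns": rrd_rate,
        "capped_acts_per_ns": min(rrd_rate, faw_rate),
    }


stats = faw_stats(tck_ns=0.5, trrd_s=4, tfaw=32)
print(stats)
```

With these defaults, tFAW halves the peak ACT rate from 0.5 to 0.25 ACTs per ns, which is 250 million ACTs per second per pseudo-channel.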

Why does the bank FSM need a separate tWTR constraint?

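A READ cannot follow a WRITE immediately because the DRAM is still moving the write data from its input path into the array. That transfer uses internal data lines that the READ also needs. tWTR is the minimum spacing that covers this. Each bank group has its own internal data path, so the wait is longer within the same group (tWTRl) than across groups (tWTRs). In JEDEC DRAM standards tWTR is normally counted from the end of the write data burst. This module starts the countdown at the WR command, so its tWTR parameters must also cover the write latency and burst.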
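Combining the tWTR and tCCD counters gives the earliest READ after a WRITE. The hypothetical Python helper below follows this module's simplification (the countdown starts at the WR command) and its default cycle counts. The global short counters always apply, and the per-BG long counters add to them when the READ targets the same group as the WRITE.

```python
def earliest_read(wr_cycle, wr_bg, rd_bg,
                  twtr_s=8, twtr_l=16, tccd_s=4, tccd_l=8):
    """Earliest cycle a READ may issue after a WRITE issued at wr_cycle."""
    gap = max(twtr_s, tccd_s)              # global (any-BG) counters
    if rd_bg == wr_bg:
        gap = max(gap, twtr_l, tccd_l)     # per-BG counters also apply
    return wr_cycle + gap


print(earliest_read(100, wr_bg=2, rd_bg=2))   # 116: same bank group
print(earliest_read(100, wr_bg=2, rd_bg=5))   # 108: different bank group
```

With the defaults, a READ to the same bank group waits 16 cycles (8 ns) after the WRITE, while a READ to another group waits only 8 cycles (4 ns).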

How does this module connect to Module 1 (Timing FSM)?

They run in parallel, and both must grant permission. Module 1 tracks per-bank timing (tRCD, tRAS, tRP, CL): it knows whether THIS bank's row is ready. Module 2 tracks inter-bank timing (tRRD, tCCD, tWTR, tFAW): it knows whether the shared data paths and the activation power budget allow another command. The scheduler AND-gates Module 1's ready output with Module 2's act_allowed/cas_allowed outputs before issuing any command.

Does the banks_open bitmap update in the same cycle as the ACTIVATE?

No. bstate is registered, so the bitmap changes on the same rising edge that samples the ACT. It is first visible in the following cycle, not combinationally during the ACT cycle itself. The scheduler therefore sees bank_active go high one cycle after it drives the ACT pulse, which is the correct pipeline timing. ACTIVE here means the row is opening or open; whether tRCD has elapsed is tracked by Module 1.

Is the tFAW model in this module cycle-accurate?

It is a conservative approximation. The act_in_faw counter increments on each ACT, and faw_timer reloads to tFAW - 1. Only after tFAW cycles with no ACT does the count begin to drain, one entry per cycle. It never lets a fifth ACT in early and needs only a few small registers. However, under back-to-back ACT traffic it can throttle the ACT rate to about one per tFAW + 1 cycles, far below the four per tFAW that the real rule allows. Module 18 (Integration) will include the precise 4-entry FIFO version.