Idle HBM3 still draws power. This module builds the power management FSM: CKE control, Active Power-Down, Precharge Power-Down, Self-Refresh entry and exit, idle detection, and wake-up latency tracking per JEDEC JESD238.
HBM3 consumes significant power even when no data is being transferred. Understanding the breakdown of power modes is essential for designing an effective power management controller.
Typical power figures for a single HBM3 stack (values vary by vendor and operating frequency):
| Power Mode | Typical Power | CKE | Banks | Exit Latency |
|---|---|---|---|---|
| NORMAL (max BW) | 10–15 W | High | Active | 0 cycles |
| Active Power-Down | 3–5 W | Low | Active | tXP = 10 cycles |
| Precharge Power-Down | 2–4 W | Low | Precharged | tXPDLL = 24 cycles |
| Self-Refresh | 0.5–1.5 W | Low | Precharged | tXS = 512 cycles |
| Deep Power-Down | ~0.05 W | Low | Precharged | Full re-init required |
The power management controller's job is to move the DRAM into the deepest power mode that can be exited within the latency budget imposed by the workload.
HBM3 power modes form a hierarchy from shallowest (fastest exit) to deepest (most savings, slowest exit). The controller must track the current bank state to determine which modes are available.
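The selection rule in the last two paragraphs can be sketched as a small lookup. This is a minimal Python sketch, not part of the RTL. It assumes the typical exit latencies from the table (in controller cycles) and reads the bank-state requirement from the table's Banks column: APD is reachable only with rows open, while PPD and SR are reachable only with every bank precharged. The names `MODES` and `deepest_mode` are hypothetical, and Deep Power-Down is left out because it needs a full re-initialization.

```python
# Typical exit latencies (controller cycles) and bank-state requirements taken
# from the power-mode table; real values come from the vendor datasheet.
MODES = [
    # (name, exit_latency_cycles, required bank state, or None for "any")
    ("NORMAL", 0, None),
    ("APD", 10, "open"),          # tXP
    ("PPD", 24, "precharged"),    # tXPDLL
    ("SR", 512, "precharged"),    # tXS
]


def deepest_mode(latency_budget: int, banks_open: bool) -> str:
    """Return the deepest mode whose exit latency fits latency_budget cycles."""
    bank_state = "open" if banks_open else "precharged"
    choice = "NORMAL"
    for name, exit_latency, required in MODES:  # ordered shallow -> deep
        if required is not None and required != bank_state:
            continue  # mode not reachable from the current bank state
        if exit_latency <= latency_budget:
            choice = name
    return choice
```

For example, `deepest_mode(100, banks_open=False)` returns `"PPD"`: Self-Refresh would save more power, but its 512-cycle exit does not fit a 100-cycle budget.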
Self-Refresh is the most complex power mode to implement correctly because it involves a handover of refresh responsibility from the controller to the DRAM, and vice versa on exit.
Self-Refresh entry:
1. Controller issues PRECHARGE ALL to close every bank, then waits tRP.
2. Controller issues the SELF-REFRESH ENTRY command (SRE), a specific command encoding on the CA bus.
3. Controller drives CKE low on the same cycle as SRE.
4. DRAM takes over refresh internally, timed by its ring oscillator.
5. Controller may stop the external CK after tCKESR cycles (the minimum time CKE must stay low before the clock is stopped).
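The controller-side timing of the entry steps can be written as a list of cycle-stamped events. This is a hypothetical Python helper, not part of the RTL. Its defaults `trp=10` and `tckesr=4` mirror the module's `TRP` and `TCKESR` parameters, and step 4 requires no controller action, so it does not appear.

```python
def sr_entry_schedule(t_start: int, trp: int = 10, tckesr: int = 4):
    """Return (cycle, action, cke) tuples for the controller side of SR entry."""
    events = [(t_start, "PRECHARGE ALL", 1)]           # step 1: close every bank
    t_sre = t_start + trp                              # SRE only after tRP has elapsed
    events.append((t_sre, "SRE", 0))                   # steps 2-3: SRE with CKE falling
    events.append((t_sre + tckesr, "CK may stop", 0))  # step 5: CKE held low for tCKESR
    return events
```

Starting at cycle 100 with the default parameters, SRE lands on cycle 110 and the clock may be stopped from cycle 114.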
Self-Refresh exit:
1. Controller restarts the external clock (if it was stopped) and waits tCKSRX cycles for it to stabilize.
2. Controller raises CKE to 1 (Self-Refresh Exit, SRX).
3. DRAM detects the CKE rising edge and begins DLL re-lock.
4. Controller must wait tXS cycles before issuing any non-NOP command.
5. Optionally, issue ZQCS (ZQ Short Calibration) as soon as tXS expires, before normal traffic, to recalibrate impedance.
6. Controller asserts o_power_ok, signaling that the o_wakeup_latency count-down has completed.
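The exit steps can be laid out the same way. This Python sketch is hypothetical: the document gives no value for tCKSRX, so the default of 5 cycles is a placeholder assumption, while `txs=512` matches the module's `TXS` default. The optional ZQCS step is not modeled.

```python
def sr_exit_schedule(t_wake: int, txs: int = 512, tcksrx: int = 5,
                     clock_stopped: bool = True):
    """Return (cycle, action) tuples for the controller side of SR exit.

    tcksrx=5 is a placeholder; take the real tCKSRX from the datasheet.
    """
    events = []
    t = t_wake
    if clock_stopped:
        events.append((t, "restart CK"))    # step 1
        t += tcksrx                         # clock must be stable before CKE rises
    events.append((t, "CKE high (SRX)"))    # step 2
    events.append((t + txs, "first command allowed, o_power_ok"))  # steps 4 and 6
    return events
```

With a stopped clock the first command can go out tCKSRX + tXS cycles after the wakeup decision; if the clock kept running, only tXS remains.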
The power management controller implements an 8-state FSM (NORMAL, IDLE_WAIT, PRECHARGING, APD, PPD, SR_ENTRY, SELF_REFRESH, SR_EXIT) that drives every CKE transition and enforces the minimum timing constraints between states.
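The power manager watches two status inputs. i_traffic_idle (no pending AXI4 transactions) is the idle condition: while it holds, the controller leaves NORMAL and counts idle cycles. i_all_banks_idle (all banks precharged) does not gate idle detection; it selects the target mode once the threshold is reached, APD if rows are still open and PPD if every bank is precharged.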
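The controller counts idle cycles using a 16-bit counter. The threshold i_idle_cycles[15:0] is software-programmable, allowing the OS to tune power aggressiveness. Recommended values are 16–256 cycles when the target is APD or PPD, and 4096–65535 cycles when software uses a long idle period to decide on Self-Refresh (in this RTL, Self-Refresh itself is requested through i_sr_req).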
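The counting rule can be sketched in Python. This is a simplified model, not a transcription of the RTL. It assumes one boolean per cycle for i_traffic_idle and saturates at the 16-bit limit. It also ignores the one-cycle NORMAL to IDLE_WAIT transition, so the hardware fires slightly later than this sketch. The function name `power_down_cycle` is hypothetical.

```python
def power_down_cycle(idle_trace, threshold: int):
    """Return the first cycle at which consecutive idle cycles reach threshold,
    or None. idle_trace holds one bool per cycle (True = no pending AXI work)."""
    count = 0
    for cycle, idle in enumerate(idle_trace):
        count = min(count + 1, 0xFFFF) if idle else 0  # 16-bit counter, saturating
        if idle and count >= threshold:
            return cycle
    return None
```

Any busy cycle resets the count, so a burst of traffic during a 5-cycle idle gap means a threshold of 8 is only met by a later, longer gap.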
The o_wakeup_latency[9:0] output is a down-counter that tells the AXI4 wrapper exactly how many cycles remain until the first command can be issued after a wakeup request. This allows the AXI bridge to insert precisely the right number of wait states rather than using a conservative fixed timeout.
When wakeup_req asserts, the power manager loads the latency that matches the current power mode: tXP when leaving APD, tXPDLL when leaving PPD, and tXS when leaving Self-Refresh.
The counter decrements every clock cycle. When it reaches zero, o_power_ok asserts and the AXI wrapper may resume transactions. The scheduler must not issue any DRAM commands while o_power_ok is deasserted.
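The AXI4 wrapper's view of this counter can be modeled in a few lines of Python. This sketch assumes the module's default `TXP`, `TXPDLL` and `TXS` values. It models the counter only: the RTL additionally spends one clock on the state transition before o_power_ok rises. The names `EXIT_LATENCY` and `wakeup_countdown` are hypothetical.

```python
# Exit latency loaded on wakeup_req, using the module's default parameters.
EXIT_LATENCY = {"APD": 10, "PPD": 24, "SELF_REFRESH": 512}  # tXP, tXPDLL, tXS


def wakeup_countdown(mode: str):
    """Return per-cycle (o_wakeup_latency, o_power_ok) after wakeup_req from mode."""
    latency = EXIT_LATENCY[mode]
    trace = []
    while latency > 0:
        trace.append((latency, False))  # still counting: no DRAM commands allowed
        latency -= 1
    trace.append((0, True))             # counter expired: o_power_ok asserts
    return trace


# Wait states the AXI wrapper inserts when waking from PPD
wait_states = sum(1 for _, ok in wakeup_countdown("PPD") if not ok)
print(wait_states)  # 24
```

Because the wrapper reads the remaining count directly, it can stall for exactly 24 cycles after a PPD wakeup instead of always assuming the 512-cycle Self-Refresh worst case.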
```verilog
// ============================================================
// hbm3_power_mgmt.v - HBM3 Power Management Controller
// EcrioniX · HBM3 Controller Build · Module 15
// Phase 4: Power and Thermal Management
// ============================================================
// Power state FSM with CKE control, idle detection, and
// wakeup latency tracking. Synthesizable Verilog-2001 RTL.
// States: NORMAL, IDLE_WAIT, PRECHARGING, APD, PPD,
//         SR_ENTRY, SELF_REFRESH, SR_EXIT
// SR_EXIT is the shared exit state: it counts down tXP (from
// APD), tXPDLL (from PPD) or tXS (from Self-Refresh) before
// o_power_ok is re-asserted.
// ============================================================
module hbm3_power_mgmt #(
    parameter TXP    = 10,   // Active Power-Down exit latency (cycles)
    parameter TXPDLL = 24,   // PPD exit latency (DLL re-lock)
    parameter TXS    = 512,  // Self-Refresh exit latency (cycles)
    parameter TRP    = 10,   // Precharge time (cycles)
    parameter TCKESR = 4     // Min CKE low before stopping CK
)(
    input  wire        i_clk,
    input  wire        i_rst_n,
    // Traffic status (from AXI4 bridge and bank FSM)
    input  wire        i_traffic_idle,    // no pending AXI transactions
    input  wire        i_all_banks_idle,  // all 32 banks precharged
    // Power mode control
    input  wire [15:0] i_idle_cycles,     // idle cycle threshold before power-down
    input  wire        i_sr_req,          // software SR request (a 1-cycle pulse is latched)
    input  wire        i_wakeup_req,      // host needs memory access (sampled as a level)
    // Outputs: CKE and power state
    output reg         o_cke,             // CKE to PHY (1=active, 0=power mode)
    output reg  [2:0]  o_pwr_state,       // encoded power state
    output reg  [9:0]  o_wakeup_latency,  // cycles until ready after wakeup
    output reg         o_power_ok,        // safe to issue DRAM commands
    output reg         o_precharge_all    // trigger bank FSM to close all banks
);
    // ============================================================
    // State encoding and sized timing constants
    // ============================================================
    localparam S_NORMAL       = 3'd0;
    localparam S_IDLE_WAIT    = 3'd1;
    localparam S_PRECHARGING  = 3'd2;
    localparam S_APD          = 3'd3;
    localparam S_PPD          = 3'd4;
    localparam S_SR_ENTRY     = 3'd5;
    localparam S_SELF_REFRESH = 3'd6;
    localparam S_SR_EXIT      = 3'd7;

    // 10-bit copies of the timing parameters (each must be below 1024)
    localparam [9:0] TXP_C    = TXP;
    localparam [9:0] TXPDLL_C = TXPDLL;
    localparam [9:0] TXS_C    = TXS;
    localparam [9:0] TRP_C    = TRP;
    localparam [9:0] TCKESR_C = TCKESR;

    reg [2:0]  state;
    reg [15:0] idle_cnt;    // counts idle cycles
    reg [9:0]  wait_cnt;    // counts tRP and tCKESR
    reg        in_sr;       // set from SR entry until SR exit completes
    reg        sr_pending;  // latched i_sr_req, cleared on SR entry

    // ============================================================
    // Power State FSM
    // ============================================================
    always @(posedge i_clk or negedge i_rst_n) begin
        if (!i_rst_n) begin
            state            <= S_NORMAL;
            o_cke            <= 1'b1;
            o_pwr_state      <= S_NORMAL;
            o_wakeup_latency <= 10'd0;
            o_power_ok       <= 1'b1;
            o_precharge_all  <= 1'b0;
            idle_cnt         <= 16'd0;
            wait_cnt         <= 10'd0;
            in_sr            <= 1'b0;
            sr_pending       <= 1'b0;
        end else begin
            o_precharge_all <= 1'b0;          // default: single-cycle pulse
            if (i_sr_req && !in_sr)
                sr_pending <= 1'b1;           // keep a one-cycle request alive

            case (state)
            // --------------------------------------------------
            S_NORMAL: begin
                o_cke       <= 1'b1;
                o_power_ok  <= 1'b1;
                o_pwr_state <= S_NORMAL;
                idle_cnt    <= 16'd0;
                in_sr       <= 1'b0;
                if (i_traffic_idle)
                    state <= S_IDLE_WAIT;
            end
            // --------------------------------------------------
            S_IDLE_WAIT: begin
                o_cke       <= 1'b1;
                o_power_ok  <= 1'b1;
                o_pwr_state <= S_IDLE_WAIT;
                if (!i_traffic_idle || i_wakeup_req) begin
                    // Traffic resumed: back to normal
                    idle_cnt <= 16'd0;
                    state    <= S_NORMAL;
                end else if (i_sr_req || sr_pending) begin
                    // Software SR request: close all banks first
                    o_precharge_all <= 1'b1;
                    wait_cnt        <= TRP_C;
                    state           <= S_PRECHARGING;
                end else if (idle_cnt >= i_idle_cycles) begin
                    // Idle threshold reached: CKE low gives APD (rows open)
                    // or PPD (all banks precharged)
                    idle_cnt <= 16'd0;
                    o_cke    <= 1'b0;
                    state    <= i_all_banks_idle ? S_PPD : S_APD;
                end else begin
                    idle_cnt <= idle_cnt + 16'd1;
                end
            end
            // --------------------------------------------------
            S_PRECHARGING: begin
                o_cke       <= 1'b1;
                o_power_ok  <= 1'b0;          // stall commands during precharge
                o_pwr_state <= S_PRECHARGING;
                if (wait_cnt == 10'd0) begin
                    // tRP elapsed: all banks closed, drop CKE
                    o_cke <= 1'b0;
                    if (sr_pending) begin
                        in_sr      <= 1'b1;
                        sr_pending <= 1'b0;
                        wait_cnt   <= TCKESR_C;
                        state      <= S_SR_ENTRY;
                    end else begin
                        state <= S_PPD;
                    end
                end else begin
                    wait_cnt <= wait_cnt - 10'd1;
                end
            end
            // --------------------------------------------------
            S_APD: begin
                o_cke       <= 1'b0;
                o_power_ok  <= 1'b0;
                o_pwr_state <= S_APD;
                if (i_wakeup_req) begin
                    // Raise CKE, then hold off commands for tXP
                    o_cke            <= 1'b1;
                    o_wakeup_latency <= TXP_C;
                    state            <= S_SR_EXIT;    // shared exit counter
                end
            end
            // --------------------------------------------------
            S_PPD: begin
                o_cke       <= 1'b0;
                o_power_ok  <= 1'b0;
                o_pwr_state <= S_PPD;
                if (i_sr_req || sr_pending) begin
                    // Deepen to SR (banks are already precharged)
                    in_sr      <= 1'b1;
                    sr_pending <= 1'b0;
                    wait_cnt   <= TCKESR_C;
                    state      <= S_SR_ENTRY;
                end else if (i_wakeup_req) begin
                    o_cke            <= 1'b1;
                    o_wakeup_latency <= TXPDLL_C;
                    state            <= S_SR_EXIT;    // shared exit counter
                end
            end
            // --------------------------------------------------
            S_SR_ENTRY: begin
                o_cke       <= 1'b0;
                o_power_ok  <= 1'b0;
                o_pwr_state <= S_SR_ENTRY;
                // Hold CKE low for tCKESR before SR counts as active
                if (wait_cnt == 10'd0)
                    state <= S_SELF_REFRESH;
                else
                    wait_cnt <= wait_cnt - 10'd1;
            end
            // --------------------------------------------------
            S_SELF_REFRESH: begin
                o_cke       <= 1'b0;
                o_power_ok  <= 1'b0;
                o_pwr_state <= S_SELF_REFRESH;
                if (i_wakeup_req) begin
                    // Raise CKE to exit self-refresh (SRX), then count tXS
                    o_cke            <= 1'b1;
                    o_wakeup_latency <= TXS_C;
                    state            <= S_SR_EXIT;
                end
            end
            // --------------------------------------------------
            S_SR_EXIT: begin
                o_cke       <= 1'b1;
                o_power_ok  <= 1'b0;
                o_pwr_state <= S_SR_EXIT;
                if (o_wakeup_latency != 10'd0) begin
                    o_wakeup_latency <= o_wakeup_latency - 10'd1;
                end else begin
                    // Exit latency expired: memory is ready
                    o_power_ok <= 1'b1;
                    in_sr      <= 1'b0;
                    state      <= S_NORMAL;
                end
            end
            default: state <= S_NORMAL;
            endcase
        end
    end
endmodule
```
```systemverilog
// ============================================================
// tb_hbm3_power_mgmt.sv - Testbench for Power Management FSM
// EcrioniX · HBM3 Controller Build · Module 15
// ============================================================
`timescale 1ns/1ps
module tb_hbm3_power_mgmt;
    logic        clk, rst_n;
    logic        traffic_idle, all_banks_idle;
    logic [15:0] idle_cycles;
    logic        sr_req, wakeup_req;
    logic        cke;
    logic [2:0]  pwr_state;
    logic [9:0]  wakeup_latency;
    logic        power_ok, precharge_all;

    // Short timings for simulation
    hbm3_power_mgmt #(
        .TXP(10), .TXPDLL(24), .TXS(64), .TRP(5), .TCKESR(3)
    ) dut (
        .i_clk(clk), .i_rst_n(rst_n),
        .i_traffic_idle(traffic_idle), .i_all_banks_idle(all_banks_idle),
        .i_idle_cycles(idle_cycles), .i_sr_req(sr_req), .i_wakeup_req(wakeup_req),
        .o_cke(cke), .o_pwr_state(pwr_state),
        .o_wakeup_latency(wakeup_latency), .o_power_ok(power_ok),
        .o_precharge_all(precharge_all)
    );

    initial clk = 0;
    always #1 clk = ~clk;

    integer errors = 0;

    // Stimulus changes and checks happen on the falling edge, so they
    // never race the DUT's rising-edge logic.
    task automatic wait_cyc(input integer n);
        repeat (n) @(negedge clk);
    endtask

    initial begin
        $dumpfile("tb_power_mgmt.vcd");
        $dumpvars(0, tb_hbm3_power_mgmt);
        // Reset
        rst_n = 0; traffic_idle = 0; all_banks_idle = 0;
        idle_cycles = 16'd8; sr_req = 0; wakeup_req = 0;
        wait_cyc(10);
        rst_n = 1;
        wait_cyc(5);

        // TEST 1: Normal operation, no idle
        $display("[%0t] TEST1: Normal, traffic active", $time);
        traffic_idle = 0;
        wait_cyc(20);
        if (pwr_state !== 3'd0) begin
            $error("FAIL: Expected NORMAL state, got %0d", pwr_state);
            errors++;
        end else $display("[%0t] PASS: Stayed in NORMAL", $time);
        if (cke !== 1'b1) begin $error("FAIL: CKE should be 1 in NORMAL"); errors++; end

        // TEST 2: Idle detection -> PPD (all banks closed)
        $display("[%0t] TEST2: Idle -> PPD", $time);
        traffic_idle = 1; all_banks_idle = 1;
        wait_cyc(15);                       // idle_cycles=8 + state entry + margin
        if (pwr_state !== 3'd4) begin
            $error("FAIL: Expected PPD(4), got %0d", pwr_state);
            errors++;
        end else $display("[%0t] PASS: Entered PPD", $time);
        if (cke !== 1'b0) begin $error("FAIL: CKE should be 0 in PPD"); errors++; end

        // TEST 3: Wakeup from PPD; traffic resumes with the wakeup request
        $display("[%0t] TEST3: Wakeup from PPD", $time);
        traffic_idle = 0; wakeup_req = 1;
        @(negedge clk); wakeup_req = 0;
        wait_cyc(35);                       // > TXPDLL=24 + margin
        if (pwr_state !== 3'd0) begin
            $error("FAIL: Expected NORMAL after PPD exit, got %0d", pwr_state);
            errors++;
        end else $display("[%0t] PASS: Returned to NORMAL from PPD", $time);
        if (power_ok !== 1'b1) begin $error("FAIL: power_ok should be 1 in NORMAL"); errors++; end

        // TEST 4: Self-Refresh entry and exit (sr_req is a one-cycle pulse)
        $display("[%0t] TEST4: Self-Refresh enter/exit", $time);
        all_banks_idle = 1; traffic_idle = 1;
        sr_req = 1;
        @(negedge clk); sr_req = 0;
        wait_cyc(25);                       // IDLE_WAIT + TRP + TCKESR + margin
        if (pwr_state !== 3'd6) begin
            $error("FAIL: Expected SELF_REFRESH(6), got %0d", pwr_state);
            errors++;
        end else $display("[%0t] PASS: In SELF_REFRESH", $time);
        // Wakeup from SR
        traffic_idle = 0; wakeup_req = 1;
        @(negedge clk); wakeup_req = 0;
        wait_cyc(75);                       // TXS=64 + margin
        if (pwr_state !== 3'd0) begin
            $error("FAIL: Expected NORMAL after SR exit, got %0d", pwr_state);
            errors++;
        end else $display("[%0t] PASS: NORMAL after SR exit", $time);
        if (power_ok !== 1'b1) begin $error("FAIL: power_ok should be 1"); errors++; end

        // TEST 5: Idle with open rows -> APD; exit must still honor tXP
        $display("[%0t] TEST5: Idle -> APD -> wakeup", $time);
        traffic_idle = 1; all_banks_idle = 0;
        wait_cyc(15);
        if (pwr_state !== 3'd3) begin
            $error("FAIL: Expected APD(3), got %0d", pwr_state);
            errors++;
        end else $display("[%0t] PASS: Entered APD", $time);
        traffic_idle = 0; wakeup_req = 1;
        @(negedge clk); wakeup_req = 0;
        if (power_ok !== 1'b0 || wakeup_latency !== 10'd10) begin
            $error("FAIL: APD exit must hold power_ok low for tXP (latency=%0d)", wakeup_latency);
            errors++;
        end
        wait_cyc(15);                       // > TXP=10 + margin
        if (pwr_state !== 3'd0 || power_ok !== 1'b1) begin
            $error("FAIL: Expected NORMAL with power_ok after APD exit, got %0d", pwr_state);
            errors++;
        end else $display("[%0t] PASS: NORMAL after APD exit", $time);

        // Summary
        wait_cyc(20);
        if (errors == 0)
            $display("[%0t] ALL TESTS PASSED", $time);
        else
            $display("[%0t] %0d TEST(S) FAILED", $time, errors);
        $finish;
    end

    initial begin #200000; $error("TIMEOUT"); $finish; end
endmodule
```
| State | o_pwr_state | CKE | Banks | Power (typ) | Exit Latency | Use Case |
|---|---|---|---|---|---|---|
| NORMAL | 3'b000 | 1 | Open/Closed | 10–15 W | 0 cycles | Active traffic |
| IDLE_WAIT | 3'b001 | 1 | Any | 10–15 W | 0 cycles | Brief idle detection |
| PRECHARGING | 3'b010 | 1 | Closing | 8–12 W | tRP only | Transition to PPD/SR |
| APD | 3'b011 | 0 | Open | 3–5 W | tXP = 10 | Short idle, open rows |
| PPD | 3'b100 | 0 | Precharged | 2–4 W | tXPDLL = 24 | Short idle, closed banks |
| SR_ENTRY | 3'b101 | 0 | Precharged | 1–2 W | — | Transitioning to SR |
| SELF_REFRESH | 3'b110 | 0 | Precharged | 0.5–1.5 W | tXS = 512 | Long compute idle |
| SR_EXIT | 3'b111 | 1 | Unchanged | 2–5 W | Counting exit latency | DLL re-lock; also reused for power-down exit |
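When reading o_pwr_state and o_cke from a waveform dump, the table can serve as a consistency check. The Python sketch below is hypothetical and transcribes only the encoding and CKE columns. One detail of the RTL matters here: o_pwr_state is assigned inside each state's branch, so it updates one cycle after the state register, while CKE changes on the transition edge itself. The checker therefore tolerates a single-cycle disagreement and flags only those that persist.

```python
# o_pwr_state encoding and expected CKE level, transcribed from the state table.
PWR_STATES = {
    0b000: ("NORMAL", 1),
    0b001: ("IDLE_WAIT", 1),
    0b010: ("PRECHARGING", 1),
    0b011: ("APD", 0),
    0b100: ("PPD", 0),
    0b101: ("SR_ENTRY", 0),
    0b110: ("SELF_REFRESH", 0),
    0b111: ("SR_EXIT", 1),
}


def cke_violations(samples):
    """samples: per-cycle (o_pwr_state, o_cke) pairs, e.g. read from a VCD dump.

    Returns the indices where CKE has disagreed with the table for two cycles
    in a row; single-cycle disagreements at state transitions are tolerated.
    """
    bad = [PWR_STATES[state][1] != cke for state, cke in samples]
    return [i for i in range(1, len(bad)) if bad[i] and bad[i - 1]]
```

For example, the IDLE_WAIT to PPD transition produces one sample of `(0b001, 0)` before o_pwr_state catches up, which this checker accepts.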
Active Power-Down (APD) keeps banks open and retains row state; CKE=0 gates the DRAM's internal clock to save dynamic power. Exit latency is short (tXP = 10 cycles) because nothing needs to be re-locked or re-trained. Self-Refresh (SR) requires all banks to be precharged, holds CKE low, lets the controller stop the external clock, and makes the DRAM run its own internal refresh. SR saves far more power but has a much longer exit latency (tXS = 512 cycles) due to DLL re-lock.
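To put rough numbers on that trade-off, the Python sketch below compares the energy saved during one idle window. It assumes the midpoint of each typical power range from the first table (12.5 W NORMAL, 4 W APD, 1 W SR) and ignores entry and exit energy. Real vendor figures will differ, and `energy_saved_uj` is a hypothetical name.

```python
# Midpoints of the typical power ranges in the first table (watts).
# Vendor figures will differ; entry and exit energy are ignored here.
TYPICAL_POWER_W = {"NORMAL": 12.5, "APD": 4.0, "SR": 1.0}


def energy_saved_uj(mode: str, idle_us: float) -> float:
    """Energy saved versus staying in NORMAL for idle_us microseconds.

    Watts multiplied by microseconds gives microjoules.
    """
    return (TYPICAL_POWER_W["NORMAL"] - TYPICAL_POWER_W[mode]) * idle_us


apd = energy_saved_uj("APD", 10.0)
sr = energy_saved_uj("SR", 10.0)
print(apd, sr, sr - apd)  # 85.0 115.0 30.0
```

Under these assumptions, a 10 µs idle window in SR saves about 30 µJ more than APD, at the price of a 512-cycle rather than a 10-cycle exit.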
During self-refresh the DRAM's internal oscillator replaces the external CK for refresh timing. When CKE is re-asserted, two things must happen: (1) the external clock must be stable, and (2) the DRAM must re-lock its internal DLL to that clock. Together these dominate the exit latency; DLL lock alone typically requires 200–500 cycles after a stable clock edge. Hence the 512-cycle minimum tXS in JEDEC JESD238.
CKE (Clock Enable) is driven by the memory controller to the DRAM. CKE=1 means the DRAM is active and its clock is enabled. CKE=0 puts the DRAM into a power-saving mode, and which one (APD, PPD, or SR) depends on the state when CKE goes low: open banks yield APD, while precharged banks yield PPD, or SR when the CKE falling edge accompanies a SELF-REFRESH ENTRY (SRE) command.
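Following the entry sequence described earlier, where Self-Refresh is entered by issuing SRE on the same cycle that CKE falls, the mode that results from a CKE falling edge can be sketched as a pure function. The function name `mode_on_cke_low` is hypothetical, and it treats an SRE with open rows as a controller bug because the entry sequence precharges every bank first.

```python
def mode_on_cke_low(banks_open: bool, sre_issued: bool) -> str:
    """Power mode entered when CKE falls, given bank state and whether SRE accompanies it."""
    if banks_open:
        if sre_issued:
            raise ValueError("SRE requires every bank to be precharged first")
        return "APD"  # rows stay open
    return "SR" if sre_issued else "PPD"
```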
The controller counts consecutive cycles with no pending AXI transactions (i_traffic_idle). When the idle count reaches the programmable threshold i_idle_cycles, it initiates the power-down sequence. The threshold is typically 16–256 cycles for PPD and 4096–65535 cycles for SR, reflecting the trade-off between power savings and the latency cost of exiting the mode.
Wake-up latency is the number of cycles the AXI interface must wait after wakeup_req before the first DRAM command can be issued. APD's tXP of 10 cycles is negligible, and PPD's tXPDLL of 24 cycles is small. SR's tXS of 512 cycles means every SR exit costs 256 ns of dead time at a 2 GHz clock. The o_wakeup_latency counter lets the AXI wrapper insert precise wait states rather than a conservative fixed timeout.
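The cycle-to-time arithmetic behind these figures is a single division, sketched here in Python with the 2 GHz clock from the text as the default. The function name `exit_latency_ns` is hypothetical.

```python
def exit_latency_ns(cycles: int, clk_ghz: float = 2.0) -> float:
    """Convert an exit latency in clock cycles to nanoseconds (1 GHz = 1 cycle/ns)."""
    return cycles / clk_ghz


for name, cycles in (("APD tXP", 10), ("PPD tXPDLL", 24), ("SR tXS", 512)):
    print(f"{name}: {exit_latency_ns(cycles):.1f} ns")
# APD tXP: 5.0 ns
# PPD tXPDLL: 12.0 ns
# SR tXS: 256.0 ns
```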