Phase 4 · Module 15

HBM3 Power Management

Idle HBM3 still draws power. This module builds the power management FSM: CKE control, Active Power-Down, Precharge Power-Down, Self-Refresh entry and exit, idle detection, and wake-up latency tracking per JEDEC JESD238.

Files: hbm3_power_mgmt.v, tb_hbm3_power_mgmt.sv · Synthesizable RTL · JEDEC JESD238 · Phase 4

1. HBM3 Power Consumption — Active vs Idle vs Self-Refresh

HBM3 consumes significant power even when no data is being transferred. Understanding the breakdown of power modes is essential for designing an effective power management controller.

Typical power figures for a single HBM3 stack (varies by vendor and operating frequency):

| Power Mode | Typical Power | CKE | Banks | Exit Latency |
|---|---|---|---|---|
| NORMAL (max BW) | 10–15 W | High | Active | 0 cycles |
| Active Power-Down | 3–5 W | Low | Active | tXP = 10 cycles |
| Precharge Power-Down | 2–4 W | Low | Precharged | tXPDLL = 24 cycles |
| Self-Refresh | 0.5–1.5 W | Low | Precharged | tXS = 512 cycles |
| Deep Power-Down | ~0.05 W | Low | Precharged | Full re-init required |

The power management controller's job is to move the DRAM into the deepest power mode that can be exited within the latency budget imposed by the workload.

For GPU workloads, self-refresh is typically viable only during long compute kernels (100+ µs idle), where the 512-cycle exit latency is negligible. For CPU memory, PPD is the practical sweet spot: the stack draws 2–4 W instead of 10–15 W, with only a 24-cycle wake-up penalty.
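The selection rule above can be sketched as a small lookup. This is a minimal Python sketch, assuming the exit latencies from the table in this section; the name `deepest_mode` and the example budgets are illustrative and not part of the module. It ignores bank state, which in practice decides between APD and PPD.

```python
# Exit latencies in controller clock cycles, taken from the table above.
# Deep Power-Down is left out because it needs a full re-init rather than
# a fixed cycle count.
EXIT_LATENCY = [
    ("NORMAL", 0),
    ("APD", 10),            # tXP
    ("PPD", 24),            # tXPDLL
    ("SELF_REFRESH", 512),  # tXS
]

def deepest_mode(latency_budget_cycles):
    """Return the deepest mode whose exit latency fits the budget."""
    chosen = "NORMAL"
    for mode, exit_cycles in EXIT_LATENCY:  # ordered shallow -> deep
        if exit_cycles <= latency_budget_cycles:
            chosen = mode
    return chosen

print(deepest_mode(5))     # NORMAL
print(deepest_mode(30))    # PPD
print(deepest_mode(1000))  # SELF_REFRESH
```

A workload that tolerates 30 cycles of wake-up delay can use PPD but not self-refresh; only a budget of 512 cycles or more unlocks SR.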

2. Power Mode Hierarchy

HBM3 power modes form a hierarchy from shallowest (fastest exit) to deepest (most savings, slowest exit). The controller must track the current bank state to determine which modes are available.

The controller must ensure all banks are precharged (i_all_banks_idle=1) before entering PPD or SR, issuing a PRECHARGE ALL command and waiting tRP if any bank is still open. Entering SR with open banks is a JEDEC protocol violation and causes data loss.
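The bank-state rule can be written down as a short planning function. This is an illustrative Python sketch, not the module's logic: `entry_steps` and the step strings are hypothetical names chosen for readability.

```python
def entry_steps(target, all_banks_idle):
    """List the controller actions needed to enter the `target` power mode."""
    if target == "APD":
        if all_banks_idle:
            # With every bank precharged, lowering CKE yields PPD, not APD.
            raise ValueError("APD needs at least one open bank")
        return ["CKE low"]
    if target in ("PPD", "SR"):
        steps = []
        if not all_banks_idle:
            # Open banks must be closed first.
            steps += ["PRECHARGE ALL", "wait tRP"]
        steps.append("SRE + CKE low" if target == "SR" else "CKE low")
        return steps
    raise ValueError(f"unknown mode: {target}")

print(entry_steps("PPD", all_banks_idle=True))
print(entry_steps("SR", all_banks_idle=False))
```

The second call shows the case the warning is about: with banks open, the precharge and the tRP wait come before the self-refresh entry.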

3. Self-Refresh Deep Dive

Self-Refresh is the most complex power mode to implement correctly because it involves a handover of refresh responsibility from the controller to the DRAM, and vice versa on exit.

SR Entry Sequence

1. Controller issues PRECHARGE ALL (ensure all banks closed) → waits tRP.
2. Controller issues SELF-REFRESH ENTRY command (SRE) — a specific command encoding on the CA bus.
3. Controller deasserts CKE: CKE goes low on the same cycle as SRE.
4. DRAM takes over refresh internally via its ring oscillator.
5. Controller may stop the external CK after tCKESR cycles (minimum CKE=0 before stopping clock).

SR Exit Sequence

1. Controller restarts external clock (if stopped) and waits tCKSRX cycles for stability.
2. Controller raises CKE=1 (Self-Refresh Exit — SRX).
3. DRAM detects CKE rising edge, begins DLL re-lock.
4. Controller must wait tXS cycles before issuing any non-NOP command.
5. Optionally issue ZQCS (ZQ Short Calibration) during tXS window to recalibrate impedance.
6. Controller asserts o_power_ok, signals o_wakeup_latency count-down complete.
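To see what one self-refresh round trip costs, the timing terms in the two sequences can be added up. The sketch below is a rough Python estimate, assuming the default parameter values of hbm3_power_mgmt.v (TRP=10, TCKESR=4, TXS=512). The source gives no value for tCKSRX, so it is a required argument and the 8 used in the example is purely hypothetical.

```python
def sr_overhead_cycles(tcksrx, banks_open=True, trp=10, tckesr=4, txs=512):
    """Return (entry, exit, total) overhead cycles for one SR round trip.

    tcksrx has no default because the document does not state its value.
    """
    # Entry: optional PRECHARGE ALL + tRP, then CKE must stay low for tCKESR.
    entry = (trp if banks_open else 0) + tckesr
    # Exit: clock restart settling (tCKSRX), then tXS before the first command.
    exit_cycles = tcksrx + txs
    return entry, exit_cycles, entry + exit_cycles

print(sr_overhead_cycles(tcksrx=8))                    # (14, 520, 534)
print(sr_overhead_cycles(tcksrx=0, banks_open=False))  # (4, 512, 516)
```

The exit side dominates either way, which is why the idle period has to be long before self-refresh pays off.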

4. CKE FSM — Power State Machine

The power management controller implements an 8-state FSM that manages all CKE transitions and enforces minimum timing constraints between states.

[Figure: power state diagram. States: NORMAL (CKE=1), IDLE_WAIT (counting idle), PRCHG (bank close), APD (CKE=0, active), PPD (CKE=0, precharged), SR_ENTRY (SRE cmd), SELF_REFRESH (CKE=0, internal refresh), SR_EXIT (tXS wait). Transition labels: traffic idle, wakeup_req, idle_cnt expired, banks open, tRP done, sr_req, wakeup, wakeup (tXP), tXS done → power_ok. Legend: CKE normal, CKE=0 (power-down), SR, awake.]
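One quick sanity check on a power FSM is that no state can trap the controller. The Python sketch below encodes the transitions as an edge list and checks reachability. The edge list is my reading of the diagram labels and of the case statement in section 7, so treat it as an assumption; pure counting waits (tXP, tXPDLL) are folded into the edge labels rather than modeled as states.

```python
from collections import deque

# (source, label, destination)
EDGES = [
    ("NORMAL", "traffic idle", "IDLE_WAIT"),
    ("IDLE_WAIT", "traffic resumes or wakeup_req", "NORMAL"),
    ("IDLE_WAIT", "idle_cnt expired, banks open", "APD"),
    ("IDLE_WAIT", "idle_cnt expired, banks closed", "PPD"),
    ("IDLE_WAIT", "sr_req", "PRCHG"),
    ("PRCHG", "tRP done, sr_req", "SR_ENTRY"),
    ("PRCHG", "tRP done", "PPD"),
    ("APD", "wakeup, tXP", "NORMAL"),
    ("PPD", "wakeup, tXPDLL", "NORMAL"),
    ("PPD", "sr_req", "SR_ENTRY"),
    ("SR_ENTRY", "tCKESR", "SELF_REFRESH"),
    ("SELF_REFRESH", "wakeup", "SR_EXIT"),
    ("SR_EXIT", "tXS done", "NORMAL"),
]

def reachable(start):
    """Breadth-first search over EDGES; returns every state reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for src, _label, dst in EDGES:
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen

STATES = sorted({s for src, _label, dst in EDGES for s in (src, dst)})
stuck = [s for s in STATES if "NORMAL" not in reachable(s)]
unreachable = [s for s in STATES if s not in reachable("NORMAL")]
print(len(STATES), stuck, unreachable)  # 8 [] []
```

Both lists come back empty: every state can return to NORMAL, and every state can be entered from NORMAL.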

5. Idle Detection

The power manager monitors the AXI4 interface for periods of no activity. Two conditions define "idle":

The controller counts idle cycles using a 16-bit counter. The threshold i_idle_cycles[15:0] is software-programmable, allowing the OS to tune power aggressiveness. Typical values are 16–256 cycles for PPD and 4096–65535 cycles for SR.
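The counting rule is easy to check against a traffic trace. The Python sketch below mirrors the compare-then-increment order used in the IDLE_WAIT state of the RTL (power down when idle_cnt >= threshold, otherwise increment, reset on any traffic). It is a simplified model: the one-cycle hop from NORMAL to IDLE_WAIT is ignored, and the function name and trace are made up for illustration.

```python
def idle_trigger_cycle(traffic_idle_trace, idle_cycles):
    """Return the index of the first cycle where the idle threshold is met,
    or None if the trace ends first."""
    idle_cnt = 0
    for cycle, idle in enumerate(traffic_idle_trace):
        if not idle:
            idle_cnt = 0          # any traffic restarts the count
        elif idle_cnt >= idle_cycles:
            return cycle          # threshold reached: enter power-down
        else:
            idle_cnt += 1
    return None

# Three idle cycles, one busy cycle, then a long idle stretch.
trace = [1, 1, 1, 0] + [1] * 12
print(idle_trigger_cycle(trace, idle_cycles=8))    # 12
print(idle_trigger_cycle([1] * 5, idle_cycles=8))  # None
```

The busy cycle at index 3 discards the first three idle cycles, so the threshold of 8 is only met at index 12.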

6. Wake-up Latency Management

The o_wakeup_latency[9:0] output is a down-counter that tells the AXI4 wrapper exactly how many cycles remain until the first command can be issued after a wakeup request. This allows the AXI bridge to insert precisely the right number of wait states rather than using a conservative fixed timeout.

When wakeup_req asserts, the power manager loads the latency that matches the current power mode: tXP (10 cycles) from APD, tXPDLL (24 cycles) from PPD, and tXS (512 cycles) from Self-Refresh.

The counter decrements every clock cycle. When it reaches zero, o_power_ok asserts and the AXI wrapper may resume transactions. The scheduler must not issue any DRAM commands while power_ok is deasserted.
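The load-and-count-down behavior can be modeled in a few lines. This is a simplified Python sketch that uses the default parameter values from hbm3_power_mgmt.v; the generator name is hypothetical, and the model leaves out the extra register delay that the RTL adds between the counter reaching zero and o_power_ok rising.

```python
# Latency loaded on wakeup, per power mode (module defaults: TXP, TXPDLL, TXS).
WAKEUP_LATENCY = {"APD": 10, "PPD": 24, "SELF_REFRESH": 512}

def wakeup_countdown(mode):
    """Yield (o_wakeup_latency, o_power_ok) once per clock after wakeup_req."""
    remaining = WAKEUP_LATENCY[mode]
    while remaining > 0:
        yield remaining, False   # still waiting: AXI wrapper inserts a wait state
        remaining -= 1
    yield 0, True                # counter expired: transactions may resume

samples = list(wakeup_countdown("APD"))
print(samples[0], samples[-1], len(samples))  # (10, False) (0, True) 11
```

An AXI wrapper that reads the first element of each sample knows exactly how many wait states are left, which is the point of exposing the counter.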

7. Full Verilog Source — Power State FSM

Verilog — hbm3_power_mgmt.v
// ============================================================
// hbm3_power_mgmt.v — HBM3 Power Management Controller
// EcrioniX · HBM3 Controller Build · Module 15
// Phase 4: Power and Thermal Management
// ============================================================
// Power state FSM with CKE control, idle detection, and
// wakeup latency tracking. Synthesizable RTL.
// States: NORMAL, IDLE_WAIT, PRECHARGING, APD, PPD,
//         SR_ENTRY, SELF_REFRESH, SR_EXIT
// ============================================================

module hbm3_power_mgmt #(
    parameter TXP     = 10,  // Active Power-Down exit latency (cycles)
    parameter TXPDLL  = 24,  // PPD exit latency (DLL re-lock)
    parameter TXS     = 512, // Self-Refresh exit latency (cycles)
    parameter TRP     = 10,  // Precharge time (cycles)
    parameter TCKESR  = 4    // Min CKE low before stopping CK
)(
    input  wire        i_clk,
    input  wire        i_rst_n,

    // Traffic status (from AXI4 bridge and bank FSM)
    input  wire        i_traffic_idle,    // no pending AXI transactions
    input  wire        i_all_banks_idle,  // all 32 banks precharged

    // Power mode control
    input  wire [15:0] i_idle_cycles,     // idle cycle threshold before power-down
    input  wire        i_sr_req,          // software-requested self-refresh
    input  wire        i_wakeup_req,      // host needs memory access

    // Outputs: CKE and power state
    output reg         o_cke,             // CKE to PHY (1=active, 0=power mode)
    output reg  [2:0]  o_pwr_state,       // encoded power state
    output reg  [9:0]  o_wakeup_latency,  // cycles until ready after wakeup
    output reg         o_power_ok,        // safe to issue DRAM commands
    output reg         o_precharge_all    // trigger bank FSM to close all banks
);

// ============================================================
// State encoding
// ============================================================
localparam S_NORMAL       = 3'd0;
localparam S_IDLE_WAIT    = 3'd1;
localparam S_PRECHARGING  = 3'd2;
localparam S_APD          = 3'd3;
localparam S_PPD          = 3'd4;
localparam S_SR_ENTRY     = 3'd5;
localparam S_SELF_REFRESH = 3'd6;
localparam S_SR_EXIT      = 3'd7;

reg [2:0]  state;
reg [15:0] idle_cnt;    // counts idle cycles
reg [9:0]  wait_cnt;    // counts timing constraint cycles
reg        in_sr;       // latches when self-refresh is active

// ============================================================
// Power State FSM
// ============================================================
always @(posedge i_clk or negedge i_rst_n) begin
    if (!i_rst_n) begin
        state            <= S_NORMAL;
        o_cke            <= 1'b1;
        o_pwr_state      <= 3'd0;
        // Sized literals: the '0 fill literal is SystemVerilog-only, and this is a .v file
        o_wakeup_latency <= 10'd0;
        o_power_ok       <= 1'b1;
        o_precharge_all  <= 1'b0;
        idle_cnt         <= 16'd0;
        wait_cnt         <= 10'd0;
        in_sr            <= 1'b0;
    end else begin
        o_precharge_all <= 1'b0; // default de-assert

        case (state)
            // --------------------------------------------------
            S_NORMAL: begin
                o_cke       <= 1'b1;
                o_power_ok  <= 1'b1;
                o_pwr_state <= S_NORMAL;
                idle_cnt    <= 16'd0;
                in_sr       <= 1'b0;

                if (i_traffic_idle)
                    state <= S_IDLE_WAIT;
            end

            // --------------------------------------------------
            S_IDLE_WAIT: begin
                o_cke       <= 1'b1;
                o_power_ok  <= 1'b1;
                o_pwr_state <= S_IDLE_WAIT;

                if (!i_traffic_idle || i_wakeup_req) begin
                    // Traffic resumed — back to normal
                    idle_cnt <= 16'd0;
                    state    <= S_NORMAL;
                end else if (i_sr_req) begin
                    // Explicit SR request
                    state           <= S_PRECHARGING;
                    o_precharge_all <= 1'b1;
                    wait_cnt        <= TRP[9:0];
                end else if (idle_cnt >= i_idle_cycles) begin
                    // Idle threshold reached — enter power-down
                    idle_cnt <= 16'd0;
                    if (!i_all_banks_idle) begin
                        // Banks open -> APD
                        o_cke  <= 1'b0;
                        state  <= S_APD;
                    end else begin
                        // Banks closed -> PPD
                        o_cke  <= 1'b0;
                        state  <= S_PPD;
                    end
                end else begin
                    idle_cnt <= idle_cnt + 1;
                end
            end

            // --------------------------------------------------
            S_PRECHARGING: begin
                o_cke       <= 1'b1;
                o_power_ok  <= 1'b0; // stall commands during precharge
                o_pwr_state <= S_PRECHARGING;

                if (wait_cnt == 10'd0) begin
                    // Precharge complete
                    if (i_sr_req) begin
                        state <= S_SR_ENTRY;
                    end else begin
                        o_cke <= 1'b0;
                        state <= S_PPD;
                    end
                end else begin
                    wait_cnt <= wait_cnt - 1;
                end
            end

            // --------------------------------------------------
            S_APD: begin
                o_cke       <= 1'b0;
                o_power_ok  <= 1'b0;
                o_pwr_state <= S_APD;

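                if (i_wakeup_req) begin
                    // Raise CKE, then keep o_power_ok low for tXP by reusing
                    // the exit counter in S_SR_EXIT (as the PPD path does).
                    // Jumping straight to S_NORMAL would skip the tXP wait.
                    o_cke            <= 1'b1;
                    wait_cnt         <= TXP[9:0];
                    o_wakeup_latency <= TXP[9:0];
                    state            <= S_SR_EXIT;
                end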
            end

            // --------------------------------------------------
            S_PPD: begin
                o_cke       <= 1'b0;
                o_power_ok  <= 1'b0;
                o_pwr_state <= S_PPD;

                if (i_sr_req) begin
                    // Deepen to SR
                    state <= S_SR_ENTRY;
                end else if (i_wakeup_req) begin
                    o_cke            <= 1'b1;
                    wait_cnt         <= TXPDLL[9:0];
                    o_wakeup_latency <= TXPDLL[9:0];
                    state            <= S_SR_EXIT; // reuse exit counter
                    in_sr            <= 1'b0;      // not SR, just PPD exit
                end
            end

            // --------------------------------------------------
            S_SR_ENTRY: begin
                o_cke       <= 1'b0;
                o_power_ok  <= 1'b0;
                o_pwr_state <= S_SR_ENTRY;

                if (!in_sr) begin
                    // First cycle in this state: start the tCKESR timer.
                    // in_sr is 0 on entry (cleared in S_NORMAL).
                    in_sr    <= 1'b1;
                    wait_cnt <= TCKESR[9:0];
                end else if (wait_cnt == 10'd0) begin
                    // CKE has been low for tCKESR: SR is now active
                    state <= S_SELF_REFRESH;
                end else begin
                    wait_cnt <= wait_cnt - 1;
                end
            end

            // --------------------------------------------------
            S_SELF_REFRESH: begin
                o_cke       <= 1'b0;
                o_power_ok  <= 1'b0;
                o_pwr_state <= S_SELF_REFRESH;

                if (i_wakeup_req) begin
                    // Assert CKE to exit self-refresh
                    o_cke            <= 1'b1;
                    wait_cnt         <= TXS[9:0];
                    o_wakeup_latency <= TXS[9:0];
                    state            <= S_SR_EXIT;
                end
            end

            // --------------------------------------------------
            S_SR_EXIT: begin
                o_cke       <= 1'b1;
                o_power_ok  <= 1'b0;
                o_pwr_state <= S_SR_EXIT;

                if (o_wakeup_latency != 10'd0) begin
                    o_wakeup_latency <= o_wakeup_latency - 1;
                    wait_cnt         <= wait_cnt - 1;
                end else begin
                    // Latency expired — memory is ready
                    o_power_ok <= 1'b1;
                    in_sr      <= 1'b0;
                    state      <= S_NORMAL;
                end
            end

            default: state <= S_NORMAL;
        endcase
    end
end

endmodule

8. SystemVerilog Testbench

SystemVerilog — tb_hbm3_power_mgmt.sv
// ============================================================
// tb_hbm3_power_mgmt.sv — Testbench for Power Management FSM
// EcrioniX · HBM3 Controller Build · Module 15
// ============================================================
`timescale 1ns/1ps

module tb_hbm3_power_mgmt;

logic        clk, rst_n;
logic        traffic_idle, all_banks_idle;
logic [15:0] idle_cycles;
logic        sr_req, wakeup_req;
logic        cke;
logic [2:0]  pwr_state;
logic [9:0]  wakeup_latency;
logic        power_ok, precharge_all;

// Short timings for simulation
hbm3_power_mgmt #(
    .TXP(10), .TXPDLL(24), .TXS(64), .TRP(5), .TCKESR(3)
) dut (
    .i_clk(clk), .i_rst_n(rst_n),
    .i_traffic_idle(traffic_idle), .i_all_banks_idle(all_banks_idle),
    .i_idle_cycles(idle_cycles), .i_sr_req(sr_req), .i_wakeup_req(wakeup_req),
    .o_cke(cke), .o_pwr_state(pwr_state),
    .o_wakeup_latency(wakeup_latency), .o_power_ok(power_ok),
    .o_precharge_all(precharge_all)
);

initial clk = 0;
always #1 clk = ~clk;

integer errors = 0;

task wait_cyc(input integer n); repeat(n) @(posedge clk); endtask

initial begin
    $dumpfile("tb_power_mgmt.vcd");
    $dumpvars(0, tb_hbm3_power_mgmt);

    // Reset
    rst_n = 0; traffic_idle = 0; all_banks_idle = 0;
    idle_cycles = 16'd8; sr_req = 0; wakeup_req = 0;
    wait_cyc(10);
    rst_n = 1;
    wait_cyc(5);

    // TEST 1: Normal operation — no idle
    $display("[%0t] TEST1: Normal — traffic active", $time);
    traffic_idle = 0;
    wait_cyc(20);
    if (pwr_state !== 3'd0) begin
        $error("FAIL: Expected NORMAL state, got %0d", pwr_state);
        errors++;
    end else $display("[%0t] PASS: Stayed in NORMAL", $time);
    if (!cke) begin $error("FAIL: CKE should be 1 in NORMAL"); errors++; end

    // TEST 2: Idle detection -> PPD (all banks closed)
    $display("[%0t] TEST2: Idle -> PPD", $time);
    traffic_idle = 1; all_banks_idle = 1;
    wait_cyc(12); // idle_cycles=8 + margin
    if (pwr_state !== 3'd4) begin
        $error("FAIL: Expected PPD(4), got %0d", pwr_state);
        errors++;
    end else $display("[%0t] PASS: Entered PPD", $time);
    if (cke) begin $error("FAIL: CKE should be 0 in PPD"); errors++; end

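    // TEST 3: Wakeup from PPD
    $display("[%0t] TEST3: Wakeup from PPD", $time);
    // Traffic resumes with the wakeup; if traffic_idle stayed high the FSM
    // would leave NORMAL for IDLE_WAIT again before the check below.
    // Nonblocking drives avoid a race with the DUT sampling on the same edge.
    traffic_idle <= 0; wakeup_req <= 1;
    @(posedge clk); wakeup_req <= 0;
    wait_cyc(30); // wait > TXPDLL=24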
    if (pwr_state !== 3'd0) begin
        $error("FAIL: Expected NORMAL after PPD exit, got %0d", pwr_state);
        errors++;
    end else $display("[%0t] PASS: Returned to NORMAL from PPD", $time);
    if (!power_ok) begin $error("FAIL: power_ok should be 1 in NORMAL"); errors++; end

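    // TEST 4: Self-Refresh entry and exit
    $display("[%0t] TEST4: Self-Refresh enter/exit", $time);
    all_banks_idle <= 1; traffic_idle <= 1;
    // i_sr_req is sampled as a level in IDLE_WAIT and again after tRP, so a
    // one-cycle pulse is lost. Hold it until self-refresh has been entered.
    sr_req <= 1;
    wait_cyc(20); // IDLE_WAIT -> PRECHARGING (tRP) -> SR_ENTRY -> SELF_REFRESH
    sr_req <= 0;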
    if (pwr_state !== 3'd6) begin
        $error("FAIL: Expected SELF_REFRESH(6), got %0d", pwr_state);
        errors++;
    end else $display("[%0t] PASS: In SELF_REFRESH", $time);

    // Wakeup from SR (traffic resumes, so the FSM stays in NORMAL afterwards)
    traffic_idle <= 0; wakeup_req <= 1;
    @(posedge clk); wakeup_req <= 0;
    wait_cyc(70); // TXS=64 + margin
    if (pwr_state !== 3'd0) begin
        $error("FAIL: Expected NORMAL after SR exit, got %0d", pwr_state);
        errors++;
    end else $display("[%0t] PASS: NORMAL after SR exit", $time);
    if (!power_ok) begin $error("FAIL: power_ok should be 1"); errors++; end

    // Summary
    wait_cyc(20);
    if (errors == 0)
        $display("[%0t] ALL TESTS PASSED", $time);
    else
        $display("[%0t] %0d TEST(S) FAILED", $time, errors);
    $finish;
end

initial begin #200000; $error("TIMEOUT"); $finish; end

endmodule

9. Power Mode Comparison Table

| State | o_pwr_state | CKE | Banks | Power (typ) | Exit Latency | Use Case |
|---|---|---|---|---|---|---|
| NORMAL | 3'b000 | 1 | Open/Closed | 10–15 W | 0 cycles | Active traffic |
| IDLE_WAIT | 3'b001 | 1 | Any | 10–15 W | 0 cycles | Brief idle detection |
| PRECHARGING | 3'b010 | 1 | Closing | 8–12 W | tRP only | Transition to PPD/SR |
| APD | 3'b011 | 0 | Open | 3–5 W | tXP = 10 | Short idle, open rows |
| PPD | 3'b100 | 0 | Precharged | 2–4 W | tXPDLL = 24 | Short idle, closed banks |
| SR_ENTRY | 3'b101 | 0 | Precharged | 1–2 W | n/a | Transitioning to SR |
| SELF_REFRESH | 3'b110 | 0 | Precharged | 0.5–1.5 W | tXS = 512 | Long compute idle |
| SR_EXIT | 3'b111 | 1 | Precharged | 2–5 W | Counting tXS | DLL re-lock in progress |
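The typical-power column can be turned into a rough average-power estimate once the time spent in each state is known. The Python sketch below computes lower and upper bounds from the ranges in the table. The residency fractions in the example are hypothetical, chosen only to show the arithmetic, and the short transition states are ignored.

```python
# (low, high) typical power in watts, from the table above.
POWER_W = {
    "NORMAL": (10.0, 15.0),
    "APD": (3.0, 5.0),
    "PPD": (2.0, 4.0),
    "SELF_REFRESH": (0.5, 1.5),
}

def average_power_bounds(residency):
    """residency maps state -> fraction of time; fractions must sum to 1."""
    total = sum(residency.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"residency fractions sum to {total}, expected 1.0")
    low = sum(frac * POWER_W[state][0] for state, frac in residency.items())
    high = sum(frac * POWER_W[state][1] for state, frac in residency.items())
    return low, high

# Hypothetical workload: half the time active, 30% in PPD, 20% in self-refresh.
low, high = average_power_bounds({"NORMAL": 0.5, "PPD": 0.3, "SELF_REFRESH": 0.2})
print(f"{low:.1f} W to {high:.1f} W")  # 5.7 W to 9.0 W
```

Against the 10–15 W of an always-active stack, this made-up residency profile roughly halves the average draw.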

Frequently Asked Questions

What is the difference between Active Power-Down and Self-Refresh in HBM3?

Active Power-Down (APD) keeps banks open and retains row state while CKE=0 gates the clock to save dynamic power. Exit latency is short (tXP = 10 cycles) because no re-training is needed. Self-Refresh (SR) requires all banks to be precharged, holds CKE=0 so the external clock can be stopped, and makes the DRAM run its own internal refresh. SR saves far more power but has a much longer exit latency (tXS = 512 cycles) due to DLL re-lock.

Why does self-refresh exit take so long (tXS = 512 cycles)?

During self-refresh the DRAM's internal oscillator replaces the external CK. When CKE is re-asserted, the DRAM must (1) wait for the external clock to stabilize and (2) re-lock the internal DLL to that clock. DLL re-lock dominates: it typically requires 200–500 cycles after a stable clock edge, hence the 512-cycle tXS used in this module.

What is CKE and how does it control power modes?

CKE (Clock Enable) is driven by the memory controller to the DRAM. CKE=1 means the DRAM is active and the clock is enabled. CKE=0 puts the DRAM into a power-saving mode; which one (APD, PPD, or SR) depends on the bank state and the command issued when CKE goes low. Open banks yield APD; precharged banks yield PPD, or SR when a Self-Refresh Entry (SRE) command accompanies the CKE transition.

How does the controller detect when to enter power-down?

The controller counts consecutive cycles with no pending AXI transactions (i_traffic_idle). When the idle count reaches the programmable threshold i_idle_cycles, it initiates the power-down sequence. The threshold is typically 16–256 cycles for PPD and 4096–65535 cycles for SR, reflecting the trade-off between power savings and the latency cost of exiting the mode.

How does wake-up latency affect system performance?

Wake-up latency is the number of cycles the AXI interface must wait after wakeup_req before the first command can be issued. APD (tXP = 10 cycles) is negligible and PPD (tXPDLL = 24 cycles) is small, but SR (tXS = 512 cycles) means every SR exit costs 256 ns of dead time at 2 GHz. The o_wakeup_latency counter lets the AXI wrapper insert precise wait states rather than a conservative fixed timeout.
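The 256 ns figure is a direct cycles-to-time conversion. A minimal Python sketch, assuming the 2 GHz controller clock quoted above (the function name is illustrative):

```python
def dead_time_ns(cycles, clock_hz):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles * 1e9 / clock_hz

CLOCK_HZ = 2e9  # 2 GHz, as in the answer above
for name, cycles in [("tXP", 10), ("tXPDLL", 24), ("tXS", 512)]:
    print(f"{name}: {cycles} cycles = {dead_time_ns(cycles, CLOCK_HZ):.0f} ns")
```

At this clock, APD and PPD exits cost 5 ns and 12 ns, against 256 ns for self-refresh.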