HBM3 ECC Engine (SECDED) — Module 7 | HBM3 Controller Build

Q: What is SECDED and what Hamming distance does it require?

SECDED stands for Single Error Correct, Double Error Detect. It requires a minimum Hamming distance of 4 between any two valid codewords. With distance 4, any single-bit error leaves the received word at distance 1 from the correct codeword and at least 3 from every other (correctable), while any double-bit error leaves it at distance 2 or more from every valid codeword (detectable but not correctable). The overall parity bit (P7) is what separates a correctable single error from an uncorrectable double error.

Q: How many parity bits are needed for 32-bit data?

A Hamming code needs the smallest r such that 2^r >= m + r + 1, where m is the number of data bits and r the number of parity bits. For 32 data bits, r = 6: six parity bits can address up to 63 positions, enough for the 32 + 6 = 38 positions in use (r = 5 falls short, since 2^5 = 32 < 38). Adding the 7th overall parity bit gives SECDED capability, for a total codeword width of 39 bits (32 data + 7 ECC).

Q: What does the syndrome value represent after decoding?

The syndrome is computed by XOR-ing the received parity bits against parity freshly computed over the received data. A syndrome of 0 with even overall parity means no error; a syndrome of 0 with odd overall parity means the overall parity bit itself flipped and the data is intact. A non-zero syndrome whose value is a valid bit position (1–38), combined with odd overall parity, means a correctable single-bit error at that position. A non-zero syndrome with even overall parity means an uncorrectable double-bit error has been detected.

Q: Is this ECC engine synthesizable for ASIC and FPGA?

Yes. The hbm3_ecc_engine module uses only XOR trees, a small compare-and-flip stage, and registered outputs: no RAM, no loops with variable bounds, no unsupported constructs. It synthesizes cleanly in Synopsys Design Compiler for ASIC and in Vivado/Quartus for FPGA. The XOR trees are shallow: the widest Hamming check has 19 inputs (about 5 levels of 2-input XOR) and the 39-bit overall parity needs about 6, so timing is usually easy to meet, although whether it closes at 2 GHz depends on the process node and cell library.

1. Why HBM3 Needs ECC

DRAM cells are capacitors. They leak charge, they can be struck by alpha particles from packaging materials, and cosmic-ray neutrons cause soft errors at a measurable rate. The industry metric is BER (Bit Error Rate): for modern HBM3 at 64 Gb it sits around 10^-12 to 10^-14 raw errors per bit per hour. That sounds tiny until you have a GPU with 96 GB of HBM3: that is about 8 × 10^11 bits, so at 10^-12 you statistically see a raw bit flip roughly once an hour of continuous use.
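The estimate above is simple arithmetic, and a few lines make it easy to re-run for other capacities. This is a rough sketch: it assumes 96 GB means 96 × 2^30 bytes and that raw errors are independent and uniform per bit, and the function name is made up for illustration.

```python
def expected_flips_per_hour(capacity_bytes: int, ber_per_bit_hour: float) -> float:
    """Expected raw bit flips per hour, assuming independent, uniform errors."""
    return capacity_bytes * 8 * ber_per_bit_hour

CAPACITY = 96 * 2**30  # 96 GB of HBM3, taken as binary gigabytes (assumption)

for ber in (1e-12, 1e-14):
    rate = expected_flips_per_hour(CAPACITY, ber)
    print(f"BER {ber:g}: {rate:.4f} flips/hour, one every {1 / rate:.1f} hours")
```

At the pessimistic end of the range this gives a little under one flip per hour; at the optimistic end it is closer to one flip every five days.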

Without ECC that flip is silent data corruption. With SECDED ECC the controller catches and corrects it in hardware, transparently. The application never sees the error; the OS may log a corrected-error counter.

HBM3 uses inline ECC. Unlike DDR5 which pins a separate x8 ECC DRAM rank onto the bus, HBM3 stores its ECC bits in the same die array as data. Every 32-bit data burst carried over a pseudo-channel also carries 7 ECC bits, forming a 39-bit codeword. The PHY reads all 39 bits and passes them to this engine before forwarding corrected data to the memory subsystem.

Key numbers: 32 data bits + 7 ECC bits = 39 bits total per pseudo-channel per access. ECC overhead = 7/39 ≈ 18%. This is the price HBM3 pays for hardware-corrected reliability.

2. SECDED Theory — Hamming Distance

The Hamming distance between two binary strings is the number of bit positions where they differ. Error-correcting codes work by ensuring every valid codeword is sufficiently far from every other valid codeword.

d=1: No redundancy. Any error hits another valid codeword — undetectable.
d=2: Single parity bit. Detects 1-bit errors, corrects none.
d=3: Hamming SEC. Corrects 1-bit errors, but a 2-bit error looks like a 1-bit error from a different codeword and gets miscorrected.
d=4: SECDED. Corrects 1-bit errors, unambiguously detects 2-bit errors.

SECDED requires d=4. The key insight: with d=4, a single-bit error moves the received word to distance 1 from exactly one valid codeword (the original) and distance 3+ from all others — so correction is unambiguous. A double-bit error moves the received word to distance 2 from the nearest valid codewords — not correctable, but detectable because it cannot be confused with a single-bit error from another codeword.

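The extra bit that pushes from d=3 to d=4 is the overall parity bit P7 (P_all), an XOR of all 38 other bits in the codeword. After decoding there are four cases. Syndrome = 0 and overall parity even → no error. Syndrome ≠ 0 and overall parity odd → single-bit error, correct it. Syndrome ≠ 0 and overall parity even → double-bit error, flag it uncorrectable. Syndrome = 0 and overall parity odd → the overall parity bit itself flipped; the data is intact.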
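The jump from d=3 to d=4 is easy to confirm by brute force on a small code. The sketch below uses the classic Hamming(7,4) code rather than the 38-position code from this module, purely because all 16 codewords can be enumerated at once; the helper names are illustrative.

```python
from itertools import combinations, product

def hamming74(d):
    """Encode 4 data bits into positions 1..7 (parity at 1, 2, 4)."""
    cw = [0] * 8                      # index 0 unused, positions 1..7
    for pos, bit in zip((3, 5, 6, 7), d):
        cw[pos] = bit
    for p in (1, 2, 4):               # even parity over each covered group
        cw[p] = sum(cw[i] for i in range(1, 8) if (i & p) and i != p) % 2
    return cw[1:]

def extend(cw):
    """Append an overall parity bit so the whole word has even parity."""
    return cw + [sum(cw) % 2]

def min_distance(codewords):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(sum(a != b for a, b in zip(x, y))
               for x, y in combinations(codewords, 2))

sec = [hamming74(d) for d in product((0, 1), repeat=4)]
secded = [extend(cw) for cw in sec]
print(min_distance(sec), min_distance(secded))  # prints: 3 4
```

The same argument carries over to the 38 + 1 bit code: any two Hamming codewords at odd distance 3 have different overall parity, so the appended bit separates them by one more position.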

3. Hamming Code Construction

Parity Bit Positions (1-indexed)

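In a Hamming code, parity bits occupy positions that are powers of 2. With 38 Hamming positions that means 1, 2, 4, 8, 16, and 32 (the next power, 64, is out of range). Every other position carries a data bit, and the overall parity bit is appended at position 39. For our 39-bit codeword the positions are: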

Codeword Position	Role	Label
1	Parity	P1
2	Parity	P2
3	Data	D0
4	Parity	P4
5	Data	D1
6	Data	D2
7	Data	D3
8	Parity	P8
9–15	Data	D4–D10
16	Parity	P16
17–31	Data	D11–D25
32	Parity	P32
33–38	Data	D26–D31
39	Overall parity	P7 (P_all)
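The table can be regenerated mechanically, which is a handy cross-check when adapting the layout to another data width. A minimal sketch, assuming the same convention as above (parity at powers of two, data in ascending order elsewhere, P_all last); the function name and label strings are illustrative.

```python
def codeword_layout(data_bits: int = 32, hamming_positions: int = 38):
    """Return {position: label} for the 1-indexed codeword, P_all last."""
    layout, d = {}, 0
    for pos in range(1, hamming_positions + 1):
        if (pos & (pos - 1)) == 0:        # power of two -> parity slot
            layout[pos] = f"P{pos}"
        else:                             # everything else carries data
            layout[pos] = f"D{d}"
            d += 1
    assert d == data_bits, "position count does not match data width"
    layout[hamming_positions + 1] = "P_all"
    return layout

layout = codeword_layout()
for pos in (1, 2, 3, 4, 9, 16, 17, 32, 33, 38, 39):
    print(pos, layout[pos])
```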

Which Bits Each Parity Covers

Each parity bit Pk (k = 1, 2, 4, 8, 16, 32) covers every codeword position from 1 to 38 whose binary representation has the bit of weight k set. Position 39 holds P_all and is not part of any Hamming check:

P1 (bit 0): covers positions 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37 (all odd positions up to 37)
P2 (bit 1): covers positions 2,3,6,7,10,11,14,15,18,19,22,23,26,27,30,31,34,35,38
P4 (bit 2): covers positions 4–7,12–15,20–23,28–31,36–38
P8 (bit 3): covers positions 8–15,24–31
P16 (bit 4): covers positions 16–31
P32 (bit 5): covers positions 32–38
P7/P_all: XOR of all 38 bits (positions 1–38)

Each parity bit is set so that the XOR of all bits it covers (including itself) equals zero in a correct codeword. During decoding, re-computing these XORs over the received data gives the syndrome.
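The coverage lists follow directly from a bitwise AND between the position number and the parity bit's own position. A short sketch that reproduces them, assuming the Hamming checks span positions 1 to 38 only (position 39 is reserved for P_all); the names are illustrative.

```python
HAMMING_POSITIONS = range(1, 39)   # positions 1..38; 39 is P_all

def coverage(parity_pos: int) -> list[int]:
    """Positions checked by the parity bit that sits at parity_pos."""
    return [p for p in HAMMING_POSITIONS if p & parity_pos]

for k in (1, 2, 4, 8, 16, 32):
    print(f"P{k}: {coverage(k)}")
```

Because every position has a unique binary pattern, the set of checks that fail after a single flip spells out that position, which is exactly why the syndrome can be used as an index.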

4. Data Flow Diagram

5. Encoder Logic

The encoder takes i_enc_data[31:0] and computes 6 Hamming parity bits plus one overall parity bit. Each of the 6 parity bits is the XOR of specific data bits — those whose codeword positions (after inserting parity slots) have the corresponding power-of-2 bit set.

With the 32 data bits D0–D31 laid into the non-parity slots of positions 1–38, each parity bit covers the following data bits:

Parity Bit	Position	Data bits covered (D-index)
P1	1	D0,D1,D3,D4,D6,D8,D10,D11,D13,D15,D17,D19,D21,D23,D25,D26,D28,D30
P2	2	D0,D2,D3,D5,D6,D9,D10,D12,D13,D16,D17,D20,D21,D24,D25,D27,D28,D31
P4	4	D1,D2,D3,D7,D8,D9,D10,D14,D15,D16,D17,D22,D23,D24,D25,D29,D30,D31
P8	8	D4,D5,D6,D7,D8,D9,D10,D18,D19,D20,D21,D22,D23,D24,D25
P16	16	D11,D12,D13,D14,D15,D16,D17,D18,D19,D20,D21,D22,D23,D24,D25
P32	32	D26,D27,D28,D29,D30,D31
P_all	39	XOR of all 38 bits (P1..P32 + D0..D31)

The 39-bit output codeword layout places parity bits at their power-of-2 positions and data bits at the remaining positions. In zero-indexed o_enc_codeword[38:0]: index 0 = position 1 (P1), index 1 = position 2 (P2), index 2 = position 3 (D0), index 3 = position 4 (P4), and so on.
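A software reference model of the encoder is useful for generating expected values in a testbench. The following is a minimal sketch under the layout described above, not a transcription of the RTL: the function name is made up, and it assumes even parity for every check and the bit-i-equals-position-i+1 packing.

```python
def secded_encode(data: int) -> int:
    """Return the 39-bit codeword; bit i of the result is position i + 1."""
    assert 0 <= data < (1 << 32)
    cw = [0] * 40                                   # index = 1-indexed position
    data_pos = [p for p in range(1, 39) if p & (p - 1)]   # non-powers of two
    for d, pos in enumerate(data_pos):              # D0 -> pos 3, D1 -> pos 5, ...
        cw[pos] = (data >> d) & 1
    for k in (1, 2, 4, 8, 16, 32):                  # even parity over each group
        cw[k] = sum(cw[p] for p in range(1, 39) if (p & k) and p != k) % 2
    cw[39] = sum(cw[1:39]) % 2                      # overall parity over 1..38
    return sum(bit << (pos - 1) for pos, bit in enumerate(cw) if pos)

print(f"{secded_encode(0x0000_0001):010x}")
print(f"{secded_encode(0xDEAD_BEEF):010x}")
```

For data = 1 only D0 (position 3) is set, so P1, P2 and P_all come out as 1 and the codeword has weight 4, the minimum distance of the code.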

6. Full Verilog Source

Verilog — hbm3_ecc_engine.v

// ============================================================
// hbm3_ecc_engine.v
// SECDED (Single Error Correct, Double Error Detect) ECC
// for HBM3 pseudo-channel: 32-bit data + 7 parity = 39-bit
// codeword.  Inline ECC — no separate ECC DRAM needed.
// Phase 2, Module 7 — EcrioniX HBM3 Controller Series
// ============================================================
module hbm3_ecc_engine (
    input  wire        i_clk,
    input  wire        i_rst_n,

    // Encode path
    input  wire        i_enc_valid,          // encode request
    input  wire [31:0] i_enc_data,           // 32-bit data in
    output reg  [38:0] o_enc_codeword,       // 39-bit codeword out

    // Decode path
    input  wire        i_dec_valid,          // decode request
    input  wire [38:0] i_dec_codeword,       // received 39-bit word
    output reg  [31:0] o_dec_data,           // corrected data
    output reg         o_single_err,         // 1 = 1-bit error corrected
    output reg         o_double_err,         // 1 = 2-bit error, uncorrectable
    output reg  [5:0]  o_err_pos             // bit position of error (0=none)
);

// ─────────────────────────────────────────────────────────────
// Internal wires
// ─────────────────────────────────────────────────────────────
// Codeword layout (1-indexed position → 0-indexed array index):
// Pos 1  → cw[0]  : P1
// Pos 2  → cw[1]  : P2
// Pos 3  → cw[2]  : D[0]
// Pos 4  → cw[3]  : P4
// Pos 5  → cw[4]  : D[1]
// Pos 6  → cw[5]  : D[2]
// Pos 7  → cw[6]  : D[3]
// Pos 8  → cw[7]  : P8
// Pos 9  → cw[8]  : D[4]
// Pos 10 → cw[9]  : D[5]
// Pos 11 → cw[10] : D[6]
// Pos 12 → cw[11] : D[7]
// Pos 13 → cw[12] : D[8]
// Pos 14 → cw[13] : D[9]
// Pos 15 → cw[14] : D[10]
// Pos 16 → cw[15] : P16
// Pos 17 → cw[16] : D[11]
// Pos 18 → cw[17] : D[12]
// Pos 19 → cw[18] : D[13]
// Pos 20 → cw[19] : D[14]
// Pos 21 → cw[20] : D[15]
// Pos 22 → cw[21] : D[16]
// Pos 23 → cw[22] : D[17]
// Pos 24 → cw[23] : D[18]
// Pos 25 → cw[24] : D[19]
// Pos 26 → cw[25] : D[20]
// Pos 27 → cw[26] : D[21]
// Pos 28 → cw[27] : D[22]
// Pos 29 → cw[28] : D[23]
// Pos 30 → cw[29] : D[24]
// Pos 31 → cw[30] : D[25]
// Pos 32 → cw[31] : P32
// Pos 33 → cw[32] : D[26]
// Pos 34 → cw[33] : D[27]
// Pos 35 → cw[34] : D[28]
// Pos 36 → cw[35] : D[29]
// Pos 37 → cw[36] : D[30]
// Pos 38 → cw[37] : D[31]
// Pos 39 → cw[38] : P_all (overall parity)

// ─────────────────────────────────────────────────────────────
// ENCODER
// ─────────────────────────────────────────────────────────────
wire [5:0]  enc_p;      // 6 Hamming parity bits
wire        enc_pall;   // overall parity
wire [37:0] cw_pre;     // codeword before overall parity

// P1: covers positions with bit0=1: 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37
// → data bits at those positions: D0,D1,D3,D4,D6,D8,D10,D11,D13,D15,D17,D19,D21,D23,D25,D26,D28,D30
assign enc_p[0] = i_enc_data[0]  ^ i_enc_data[1]  ^ i_enc_data[3]  ^ i_enc_data[4]  ^
                   i_enc_data[6]  ^ i_enc_data[8]  ^ i_enc_data[10] ^ i_enc_data[11] ^
                   i_enc_data[13] ^ i_enc_data[15] ^ i_enc_data[17] ^ i_enc_data[19] ^
                   i_enc_data[21] ^ i_enc_data[23] ^ i_enc_data[25] ^ i_enc_data[26] ^
                   i_enc_data[28] ^ i_enc_data[30];

// P2: covers positions with bit1=1: 2,3,6,7,10,11,14,15,18,19,22,23,26,27,30,31,34,35,38
assign enc_p[1] = i_enc_data[0]  ^ i_enc_data[2]  ^ i_enc_data[3]  ^ i_enc_data[5]  ^
                   i_enc_data[6]  ^ i_enc_data[9]  ^ i_enc_data[10] ^ i_enc_data[12] ^
                   i_enc_data[13] ^ i_enc_data[16] ^ i_enc_data[17] ^ i_enc_data[20] ^
                   i_enc_data[21] ^ i_enc_data[24] ^ i_enc_data[25] ^ i_enc_data[27] ^
                   i_enc_data[28] ^ i_enc_data[31];

// P4: covers positions with bit2=1: 4-7,12-15,20-23,28-31,36-38
assign enc_p[2] = i_enc_data[1]  ^ i_enc_data[2]  ^ i_enc_data[3]  ^ i_enc_data[7]  ^
                   i_enc_data[8]  ^ i_enc_data[9]  ^ i_enc_data[10] ^ i_enc_data[14] ^
                   i_enc_data[15] ^ i_enc_data[16] ^ i_enc_data[17] ^ i_enc_data[22] ^
                   i_enc_data[23] ^ i_enc_data[24] ^ i_enc_data[25] ^ i_enc_data[29] ^
                   i_enc_data[30] ^ i_enc_data[31];

// P8: covers positions with bit3=1: 8-15,24-31
assign enc_p[3] = i_enc_data[4]  ^ i_enc_data[5]  ^ i_enc_data[6]  ^ i_enc_data[7]  ^
                   i_enc_data[8]  ^ i_enc_data[9]  ^ i_enc_data[10] ^ i_enc_data[18] ^
                   i_enc_data[19] ^ i_enc_data[20] ^ i_enc_data[21] ^ i_enc_data[22] ^
                   i_enc_data[23] ^ i_enc_data[24] ^ i_enc_data[25];

// P16: covers positions with bit4=1: 16-31
assign enc_p[4] = i_enc_data[11] ^ i_enc_data[12] ^ i_enc_data[13] ^ i_enc_data[14] ^
                   i_enc_data[15] ^ i_enc_data[16] ^ i_enc_data[17] ^ i_enc_data[18] ^
                   i_enc_data[19] ^ i_enc_data[20] ^ i_enc_data[21] ^ i_enc_data[22] ^
                   i_enc_data[23] ^ i_enc_data[24] ^ i_enc_data[25];

// P32: covers positions with bit5=1: 32-38
assign enc_p[5] = i_enc_data[26] ^ i_enc_data[27] ^ i_enc_data[28] ^ i_enc_data[29] ^
                   i_enc_data[30] ^ i_enc_data[31];

// Assemble pre-parity codeword (no P_all yet)
assign cw_pre = {
    i_enc_data[31], i_enc_data[30], i_enc_data[29], i_enc_data[28],
    i_enc_data[27], i_enc_data[26], enc_p[5],
    i_enc_data[25], i_enc_data[24], i_enc_data[23], i_enc_data[22],
    i_enc_data[21], i_enc_data[20], i_enc_data[19], i_enc_data[18],
    i_enc_data[17], i_enc_data[16], i_enc_data[15], i_enc_data[14],
    i_enc_data[13], i_enc_data[12], i_enc_data[11], enc_p[4],
    i_enc_data[10], i_enc_data[9],  i_enc_data[8],  i_enc_data[7],
    i_enc_data[6],  i_enc_data[5],  i_enc_data[4],  enc_p[3],
    i_enc_data[3],  i_enc_data[2],  i_enc_data[1],  enc_p[2],
    i_enc_data[0],  enc_p[1],       enc_p[0]
};

// Overall parity = XOR of all 38 bits
assign enc_pall = ^cw_pre;

always @(posedge i_clk or negedge i_rst_n) begin
    if (!i_rst_n)
        o_enc_codeword <= 39'd0;
    else if (i_enc_valid)
        o_enc_codeword <= {enc_pall, cw_pre};
end

// ─────────────────────────────────────────────────────────────
// DECODER — syndrome computation
// ─────────────────────────────────────────────────────────────
wire [5:0]  synd;           // 6-bit Hamming syndrome
wire        synd_pall;      // overall parity of received word
wire        any_err;
wire        sec;            // single error correctable
wire        ded;            // double error detected
wire [37:0] rx_cw;          // received codeword bits 37:0
wire        rx_pall;        // received overall parity bit (cw[38])

assign rx_cw   = i_dec_codeword[37:0];
assign rx_pall = i_dec_codeword[38];

// Recompute Hamming parities over received bits, XOR with received parity bits
assign synd[0] = rx_cw[0] ^ rx_cw[2] ^ rx_cw[4] ^ rx_cw[6] ^ rx_cw[8]  ^
                  rx_cw[10] ^ rx_cw[12] ^ rx_cw[14] ^ rx_cw[16] ^ rx_cw[18] ^
                  rx_cw[20] ^ rx_cw[22] ^ rx_cw[24] ^ rx_cw[26] ^ rx_cw[28] ^
                  rx_cw[30] ^ rx_cw[32] ^ rx_cw[34] ^ rx_cw[36];

assign synd[1] = rx_cw[1] ^ rx_cw[2] ^ rx_cw[5] ^ rx_cw[6] ^ rx_cw[9]  ^
                  rx_cw[10] ^ rx_cw[13] ^ rx_cw[14] ^ rx_cw[17] ^ rx_cw[18] ^
                  rx_cw[21] ^ rx_cw[22] ^ rx_cw[25] ^ rx_cw[26] ^ rx_cw[29] ^
                  rx_cw[30] ^ rx_cw[33] ^ rx_cw[34] ^ rx_cw[37];

assign synd[2] = rx_cw[3] ^ rx_cw[4] ^ rx_cw[5] ^ rx_cw[6] ^ rx_cw[11] ^
                  rx_cw[12] ^ rx_cw[13] ^ rx_cw[14] ^ rx_cw[19] ^ rx_cw[20] ^
                  rx_cw[21] ^ rx_cw[22] ^ rx_cw[27] ^ rx_cw[28] ^ rx_cw[29] ^
                  rx_cw[30] ^ rx_cw[35] ^ rx_cw[36] ^ rx_cw[37];

assign synd[3] = rx_cw[7]  ^ rx_cw[8]  ^ rx_cw[9]  ^ rx_cw[10] ^ rx_cw[11] ^
                  rx_cw[12] ^ rx_cw[13] ^ rx_cw[14] ^ rx_cw[23] ^ rx_cw[24] ^
                  rx_cw[25] ^ rx_cw[26] ^ rx_cw[27] ^ rx_cw[28] ^ rx_cw[29] ^ rx_cw[30];

assign synd[4] = rx_cw[15] ^ rx_cw[16] ^ rx_cw[17] ^ rx_cw[18] ^ rx_cw[19] ^
                  rx_cw[20] ^ rx_cw[21] ^ rx_cw[22] ^ rx_cw[23] ^ rx_cw[24] ^
                  rx_cw[25] ^ rx_cw[26] ^ rx_cw[27] ^ rx_cw[28] ^ rx_cw[29] ^ rx_cw[30];

assign synd[5] = rx_cw[31] ^ rx_cw[32] ^ rx_cw[33] ^ rx_cw[34] ^
                  rx_cw[35] ^ rx_cw[36] ^ rx_cw[37];

// Overall parity check (XOR of all 39 received bits)
assign synd_pall = ^i_dec_codeword;  // should be 0 if no error or even-count error

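assign any_err = |synd;
// Odd overall parity with synd in 0..38 → treated as one flipped bit (correctable).
// synd == 0 here means P_all itself flipped: data is intact and o_err_pos stays 0.
assign sec     = synd_pall & (synd <= 6'd38);
// Even parity + non-zero syndrome → double error. Odd parity with synd > 38 is not
// a valid position, so it must be 3+ flipped bits: flag it uncorrectable as well.
assign ded     = (any_err & ~synd_pall) | (synd_pall & (synd > 6'd38));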

// ─────────────────────────────────────────────────────────────
// Error correction — flip the bit at position synd (1-indexed)
// synd is the error position in the 1-indexed Hamming space
// Map it back to data bit index
// ─────────────────────────────────────────────────────────────
wire [38:0] corrected_cw;
genvar gi;
generate
    for (gi = 0; gi < 39; gi = gi + 1) begin : g_flip
        assign corrected_cw[gi] = (sec && (synd == (gi+1))) ?
                                   ~i_dec_codeword[gi] : i_dec_codeword[gi];
    end
endgenerate

// Extract data bits from corrected codeword
wire [31:0] ext_data;
assign ext_data = {
    corrected_cw[37], corrected_cw[36], corrected_cw[35], corrected_cw[34],
    corrected_cw[33], corrected_cw[32], corrected_cw[30], corrected_cw[29],
    corrected_cw[28], corrected_cw[27], corrected_cw[26], corrected_cw[25],
    corrected_cw[24], corrected_cw[23], corrected_cw[22], corrected_cw[21],
    corrected_cw[20], corrected_cw[19], corrected_cw[18], corrected_cw[17],
    corrected_cw[16], corrected_cw[14], corrected_cw[13], corrected_cw[12],
    corrected_cw[11], corrected_cw[10], corrected_cw[9],  corrected_cw[8],
    corrected_cw[6],  corrected_cw[5],  corrected_cw[4],  corrected_cw[2]
};

always @(posedge i_clk or negedge i_rst_n) begin
    if (!i_rst_n) begin
        o_dec_data   <= 32'd0;
        o_single_err <= 1'b0;
        o_double_err <= 1'b0;
        o_err_pos    <= 6'd0;
    end else if (i_dec_valid) begin
        o_dec_data   <= ext_data;
        o_single_err <= sec;
        o_double_err <= ded;
        o_err_pos    <= sec ? synd : 6'd0;
    end
end

endmodule

7. Syndrome Table (First 16 Entries)

For a single-bit error in positions 1–38, the 6-bit syndrome value directly equals the 1-indexed error position in the codeword. Syndrome 0 means no Hamming check failed. When the syndrome is non-zero and the overall parity check is odd, the syndrome gives the exact position to flip.

Syndrome [5:0]	Error Position	Codeword Bit	Maps To
000000	None	—	No error
000001	1	cw[0]	P1 (parity bit)
000010	2	cw[1]	P2 (parity bit)
000011	3	cw[2]	D[0]
000100	4	cw[3]	P4 (parity bit)
000101	5	cw[4]	D[1]
000110	6	cw[5]	D[2]
000111	7	cw[6]	D[3]
001000	8	cw[7]	P8 (parity bit)
001001	9	cw[8]	D[4]
001010	10	cw[9]	D[5]
001011	11	cw[10]	D[6]
001100	12	cw[11]	D[7]
001101	13	cw[12]	D[8]
001110	14	cw[13]	D[9]
001111	15	cw[14]	D[10]
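The decode decision can be modelled in a few lines of software, which helps when checking the table against corner cases. This is a sketch of one reasonable policy, not a description of the RTL: it reports a flip of the overall parity bit as a single error with position 0, and treats odd parity with a syndrome above 38 as uncorrectable. Both choices, and all names, are assumptions made for illustration.

```python
DATA_POS = [p for p in range(1, 39) if p & (p - 1)]   # the 32 non-power-of-two slots

def secded_decode(cw: int):
    """Classify a 39-bit received word; bit i of cw is codeword position i + 1."""
    def bit(pos: int) -> int:
        return (cw >> (pos - 1)) & 1

    syndrome = 0
    for k in (1, 2, 4, 8, 16, 32):                     # re-run each Hamming check
        if sum(bit(p) for p in range(1, 39) if p & k) % 2:
            syndrome |= k
    odd = bin(cw).count("1") % 2                       # parity over all 39 bits

    corrected = cw
    if not odd:
        status = "ok" if syndrome == 0 else "double"
    elif syndrome <= 38:
        status = "single"                              # syndrome 0: P_all flipped
        if syndrome:
            corrected ^= 1 << (syndrome - 1)
    else:
        status = "uncorrectable"                       # odd parity, bad position
    data = sum(((corrected >> (p - 1)) & 1) << d for d, p in enumerate(DATA_POS))
    err_pos = syndrome if status == "single" else 0
    return data, status, err_pos

good = 0b111 | (1 << 38)     # valid codeword for data = 1 (P1, P2, D0, P_all set)
print(secded_decode(good))
print(secded_decode(good ^ (1 << 2)))    # flip D0 at position 3
print(secded_decode(good ^ 0b11))        # flip P1 and P2
```

Flipping position 3 yields syndrome 000011 with odd parity, matching the D[0] row of the table; flipping positions 1 and 2 yields the same syndrome with even parity, which is why the overall parity bit is needed to tell the two apart.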

8. Port Reference Table

Port	Dir	Width	Description
i_clk	In	1	System clock
i_rst_n	In	1	Active-low asynchronous reset
i_enc_valid	In	1	Encode request — latch i_enc_data this cycle
i_enc_data	In	32	Raw 32-bit data to protect
o_enc_codeword	Out	39	39-bit SECDED codeword (data + 7 ECC bits), registered
i_dec_valid	In	1	Decode request — evaluate i_dec_codeword this cycle
i_dec_codeword	In	39	Received codeword from DRAM (may have errors)
o_dec_data	Out	32	Corrected data output, registered
o_single_err	Out	1	High when a single-bit error was detected and corrected
o_double_err	Out	1	High when a double-bit error is detected (uncorrectable)
o_err_pos	Out	6	Codeword position (1–38) of the corrected bit; 0 when no bit in positions 1–38 was corrected

9. SystemVerilog Testbench

SystemVerilog — tb_hbm3_ecc_engine.sv

// ============================================================
// tb_hbm3_ecc_engine.sv
// Self-checking testbench with SVA assertions
// Tests: no-error, single-bit error at every position, DED
// ============================================================
`timescale 1ns/1ps
module tb_hbm3_ecc_engine;

logic        clk, rst_n;
logic        enc_valid;
logic [31:0] enc_data;
logic [38:0] enc_cw;

logic        dec_valid;
logic [38:0] dec_cw;
logic [31:0] dec_data;
logic        single_err, double_err;
logic [5:0]  err_pos;

hbm3_ecc_engine dut (
    .i_clk(clk), .i_rst_n(rst_n),
    .i_enc_valid(enc_valid), .i_enc_data(enc_data),
    .o_enc_codeword(enc_cw),
    .i_dec_valid(dec_valid), .i_dec_codeword(dec_cw),
    .o_dec_data(dec_data),
    .o_single_err(single_err), .o_double_err(double_err),
    .o_err_pos(err_pos)
);

// 500 MHz clock
initial clk = 0;
always #1 clk = ~clk;

// SVA: one cycle after enc_valid, the codeword must have no X/Z bits
property p_enc_latency;
    @(posedge clk) enc_valid |=> !$isunknown(enc_cw);
endproperty
assert property(p_enc_latency) else
    $error("ENC: codeword X after enc_valid");

// SVA: one cycle after dec_valid, the decoded data must have no X/Z bits
property p_dec_latency;
    @(posedge clk) dec_valid |=> !$isunknown(dec_data);
endproperty
assert property(p_dec_latency) else
    $error("DEC: data X after dec_valid");

// SVA: single_err and double_err must be mutually exclusive
property p_exclusive;
    @(posedge clk) !(single_err && double_err);
endproperty
assert property(p_exclusive) else
    $error("ECC: single_err and double_err both asserted!");

task automatic encode_and_check;
    input [31:0] data;
    output [38:0] cw_out;
    begin
        @(negedge clk);
        enc_valid = 1; enc_data = data;
        @(posedge clk); #0.1;
        enc_valid = 0;
        @(posedge clk); #0.1;
        cw_out = enc_cw;
    end
endtask

task automatic decode_cw;
    input [38:0] cw;
    begin
        @(negedge clk);
        dec_valid = 1; dec_cw = cw;
        @(posedge clk); #0.1;
        dec_valid = 0;
        @(posedge clk); #0.1;
    end
endtask

integer i, pass, fail;
logic [38:0] cw_good;

initial begin
    pass = 0; fail = 0;
    rst_n = 0; enc_valid = 0; dec_valid = 0;
    enc_data = 0; dec_cw = 0;
    repeat(4) @(posedge clk);
    rst_n = 1;
    @(posedge clk);

    // ── Test 1: No error ──────────────────────────────────────
    encode_and_check(32'hDEAD_BEEF, cw_good);
    decode_cw(cw_good);
    if (dec_data === 32'hDEAD_BEEF && !single_err && !double_err) begin
        $display("PASS: No-error decode: data=0x%08X", dec_data); pass++;
    end else begin
        $display("FAIL: No-error decode: got=0x%08X sec=%0b ded=%0b",
                  dec_data, single_err, double_err); fail++;
    end

    // ── Test 2: Single-bit errors at all 39 positions ─────────
    encode_and_check(32'hA5A5_A5A5, cw_good);
    for (i = 0; i < 39; i++) begin
        decode_cw(cw_good ^ (39'd1 << i));
        if (single_err && !double_err && dec_data === 32'hA5A5_A5A5) begin
            pass++;
        end else begin
            $display("FAIL: SEC at pos %0d: dec=0x%08X sec=%b ded=%b pos=%0d",
                      i, dec_data, single_err, double_err, err_pos);
            fail++;
        end
    end
    $display("SEC sweep: %0d/39 passed", pass-1); // -1 for test1

    // ── Test 3: Double-bit error ───────────────────────────────
    encode_and_check(32'h1234_5678, cw_good);
    decode_cw(cw_good ^ 39'h3);  // flip bits 0 and 1
    if (double_err && !single_err) begin
        $display("PASS: DED detected"); pass++;
    end else begin
        $display("FAIL: DED not flagged: sec=%0b ded=%0b", single_err, double_err);
        fail++;
    end

    $display("═══ RESULTS: PASS=%0d  FAIL=%0d ═══", pass, fail);
    if (fail === 0) $display("ALL TESTS PASSED");
    $finish;
end
endmodule

10. Frequently Asked Questions

Why does HBM3 use inline ECC instead of a separate ECC DRAM?

HBM3 stores ECC bits in the same memory array as data (inline ECC), unlike DDR5 which uses a separate x8 DRAM rank. Inline ECC lets HBM3 maintain its wide-bus efficiency without needing extra dies. The trade-off is capacity: the 7 ECC bits take about 18% of every 39-bit codeword (equivalently, about 22% on top of the 32 data bits), but the latency savings from avoiding a separate ECC transaction dominate at HBM3 speeds.

What is SECDED and what Hamming distance does it require?

SECDED (Single Error Correct, Double Error Detect) requires a minimum Hamming distance of 4. With d=4, any single-bit error leaves the received word at distance 1 from the correct codeword and at least 3 from every other (correctable), while any double-bit error leaves it at distance 2 or more from every valid codeword (detectable but not correctable). The overall parity bit P7 is what pushes the minimum distance from 3 to 4 and disambiguates correctable from uncorrectable errors.

How many parity bits are needed for 32-bit data?

Standard Hamming codes require r parity bits where 2^r >= m + r + 1, with m = 32 data bits. Solving: r=5 fails (2^5=32 < 32+5+1=38), while r=6 gives 2^6=64 >= 32+6+1=39. So 6 Hamming parity bits suffice. Adding the 7th overall parity bit converts SEC to SECDED, for a total codeword width of 39 bits (32 data + 7 ECC).
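The inequality is easy to solve by simple search, which also shows how the overhead shrinks for wider words. A small sketch (the function name is illustrative):

```python
def hamming_parity_bits(m: int) -> int:
    """Smallest r with 2**r >= m + r + 1 (single-error-correcting Hamming code)."""
    r = 1
    while 2**r < m + r + 1:
        r += 1
    return r

for m in (8, 16, 32, 64):
    r = hamming_parity_bits(m)
    print(f"m={m}: SEC needs {r} bits, SECDED needs {r + 1}")
```

For m = 32 this gives 6 Hamming bits and 7 with the overall parity bit, matching the 39-bit codeword used here.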

What does the syndrome value represent after decoding?

The syndrome is a 6-bit value computed by XOR-ing the received parity bits against parity freshly recomputed over the received data bits. Syndrome = 0 with even overall parity means no error; syndrome = 0 with odd overall parity means only the overall parity bit flipped, so the data is intact. A non-zero syndrome whose value falls within the codeword range (1–38), together with odd overall parity, means a single-bit error at that exact position: correct it by flipping that bit. A non-zero syndrome with even overall parity means a double-bit error has been detected and cannot be corrected.

Is this ECC engine synthesizable for ASIC and FPGA?

Yes. The module uses only XOR trees, a small compare-and-flip stage, and registered outputs: no RAM, no variable-bound loops, no unsupported constructs. It synthesizes cleanly in Synopsys Design Compiler for ASIC and in Vivado/Quartus for FPGA. The XOR trees are shallow: the widest Hamming check has 19 inputs (about 5 levels of 2-input XOR) and the 39-bit overall parity needs about 6, so timing is usually comfortable, although whether it closes at 2 GHz depends on the process node and cell library. The generate loop for bit-flip correction produces 39 parallel 2:1 muxes.
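The gate-depth figure can be sanity-checked with one line of arithmetic. The sketch assumes a balanced tree of 2-input XOR gates, which is an idealization: synthesis tools may pick wider cells or restructure the logic, so treat the result as an upper-level estimate rather than a timing report.

```python
def xor_tree_depth(fan_in: int) -> int:
    """Levels in a balanced tree of 2-input XOR gates: ceil(log2(fan_in))."""
    return (fan_in - 1).bit_length()

for name, fan_in in (("encoder P1 (18 data bits)", 18),
                     ("decoder synd[0] (19 received bits)", 19),
                     ("overall parity (39 received bits)", 39)):
    print(f"{name}: {xor_tree_depth(fan_in)} levels")
```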
