Topic 20 · Digital Electronics

Carry Lookahead Adder
Generate · Propagate · O(log N)

The key to fast binary arithmetic: compute all carries in parallel using G and P signals — eliminating the ripple carry bottleneck.


Why Ripple Carry is Slow

In a ripple-carry adder, each stage computes Ci+1 = Gi + Pi·Ci (with Gi = Ai·Bi and Pi = Ai ⊕ Bi), so every carry has to wait for the one before it. In a 32-bit adder, the worst-case carry ripples through all 32 stages.
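
The serial dependency can be made explicit with a small parameterized ripple-carry model. This is a minimal sketch, not taken from the original page: the module name `rca`, the parameter `N`, and the generate-block label are assumptions chosen for illustration.

```systemverilog
// Ripple-carry baseline (illustrative; names are assumptions).
module rca #(parameter int N = 32) (
  input  logic [N-1:0] a, b,
  input  logic         cin,
  output logic [N-1:0] sum,
  output logic         cout
);
  logic [N-1:0] g, p;
  logic [N:0]   c;

  assign g    = a & b;   // per-bit generate
  assign p    = a ^ b;   // per-bit propagate
  assign c[0] = cin;

  // Each c[i+1] is a function of c[i], so the carries form
  // a chain of N dependent stages.
  for (genvar i = 0; i < N; i++) begin : g_stage
    assign c[i+1] = g[i] | (p[i] & c[i]);
  end

  assign sum  = p ^ c[N-1:0];
  assign cout = c[N];
endmodule
```

A synthesis tool may restructure this logic, so the sketch is best read as a description of the dependency chain rather than of the final gate-level timing.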

Adder type              Delay       Area
Ripple Carry (RCA)      O(N)        O(N)
Carry Lookahead (CLA)   O(log N)    O(N log N)
Kogge-Stone (prefix)    O(log N)    O(N log N)
Carry Select            O(√N)       O(N)
Carry Skip              O(√N)       O(N)

Generate & Propagate — The Core Idea

Gi = Ai · Bi   (generate — carry created regardless of Cin)
Pi = Ai ⊕ Bi   (propagate — carry-in is passed through)
Ci+1 = Gi + Pi · Ci

Expanding recursively — all computed simultaneously:

C1 = G0 + P0·C0
C2 = G1 + P1·G0 + P1·P0·C0
C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0

Once the G and P signals are available, all four carries need only 2 gate levels (AND, then OR), versus a chain of 4 full-adder stages in an RCA. Sum bits: Si = Pi ⊕ Ci.
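
As a sanity check on the expansions above, a short self-checking testbench can compare the flattened carry equations with the recurrence Ci+1 = Gi + Pi·Ci and with ordinary addition over all 512 input combinations. This is a sketch under stated assumptions: the module name `tb_lookahead_eq` and its signal names are hypothetical, and it needs a SystemVerilog simulator to run.

```systemverilog
// Exhaustive check of the flattened 4-bit lookahead equations
// (illustrative testbench; all names are assumptions).
module tb_lookahead_eq;
  logic [3:0] a, b, g, p;
  logic       c0;
  logic [4:0] c_flat, c_rec, total;
  int         errors = 0;

  initial begin
    for (int i = 0; i < 512; i++) begin
      {a, b, c0} = i[8:0];
      g = a & b;
      p = a ^ b;

      // Flattened (two-level AND/OR) form
      c_flat[0] = c0;
      c_flat[1] = g[0] | (p[0] & c0);
      c_flat[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
      c_flat[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                | (p[2] & p[1] & p[0] & c0);
      c_flat[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                | (p[3] & p[2] & p[1] & g[0])
                | (p[3] & p[2] & p[1] & p[0] & c0);

      // Ripple recurrence, used as the reference
      c_rec[0] = c0;
      for (int k = 0; k < 4; k++)
        c_rec[k+1] = g[k] | (p[k] & c_rec[k]);

      // 5-bit arithmetic result: {carry-out, sum}
      total = a + b + c0;

      if (c_flat !== c_rec || {c_flat[4], p ^ c_flat[3:0]} !== total)
        errors++;
    end
    $display("mismatches: %0d", errors);
    $finish;
  end
endmodule
```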

SystemVerilog: 4-bit CLA & 16-bit CLA from 4-bit Blocks

module cla4 (
  input  logic [3:0] a, b,
  input  logic       cin,
  output logic [3:0] sum,
  output logic       cout
);
  logic [3:0] g, p;  // per-bit generate and propagate
  logic [4:0] c;     // c[0] = cin, c[4] = cout

  assign g    = a & b;
  assign p    = a ^ b;

  // Flattened lookahead: every carry depends only on g, p, and cin.
  assign c[0] = cin;
  assign c[1] = g[0] | (p[0] & c[0]);
  assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
  assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
              | (p[2] & p[1] & p[0] & c[0]);
  assign c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
              | (p[3] & p[2] & p[1] & g[0])
              | (p[3] & p[2] & p[1] & p[0] & c[0]);

  assign sum  = p ^ c[3:0];  // Si = Pi XOR Ci
  assign cout = c[4];
endmodule

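// 16-bit CLA built from four 4-bit blocks.
// Note: the block carries gc[] ripple from one block to the next, so
// lookahead happens only inside each cla4. A two-level group CLA would
// also compute the block carries from group generate/propagate signals.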
module cla16 (
  input  logic [15:0] a, b,
  input  logic        cin,
  output logic [15:0] sum,
  output logic        cout
);
  logic [2:0] gc; // group carries
  cla4 b0(a[3:0],   b[3:0],   cin,   sum[3:0],   gc[0]);
  cla4 b1(a[7:4],   b[7:4],   gc[0], sum[7:4],   gc[1]);
  cla4 b2(a[11:8],  b[11:8],  gc[1], sum[11:8],  gc[2]);
  cla4 b3(a[15:12], b[15:12], gc[2], sum[15:12], cout);
endmodule
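
The cla16 module above passes each block's carry-out to the next block's carry-in. One way to add a second level of lookahead is to have every 4-bit block export a group generate and group propagate signal, then reuse the same carry equations across blocks. The sketch below assumes this arrangement; the module names `cla4_gp` and `cla16_2level` and the ports `gg` and `pg` are hypothetical and not part of the original listing.

```systemverilog
// 4-bit CLA block that also exports group generate/propagate
// (illustrative; names are assumptions).
module cla4_gp (
  input  logic [3:0] a, b,
  input  logic       cin,
  output logic [3:0] sum,
  output logic       gg, pg   // group generate, group propagate
);
  logic [3:0] g, p, c;

  assign g = a & b;
  assign p = a ^ b;

  assign c[0] = cin;
  assign c[1] = g[0] | (p[0] & cin);
  assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin);
  assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
              | (p[2] & p[1] & p[0] & cin);
  assign sum  = p ^ c;

  // gg: the block creates a carry-out on its own.
  // pg: the block passes its carry-in through all four bits.
  assign gg = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0]);
  assign pg = &p;
endmodule

// 16-bit adder with lookahead inside and across the 4-bit blocks.
module cla16_2level (
  input  logic [15:0] a, b,
  input  logic        cin,
  output logic [15:0] sum,
  output logic        cout
);
  logic [3:0] gg, pg;  // group signals, one pair per block
  logic [4:0] gc;      // carry into each block; gc[4] is cout

  // Second-level lookahead: the bit-level equations applied to groups.
  assign gc[0] = cin;
  assign gc[1] = gg[0] | (pg[0] & cin);
  assign gc[2] = gg[1] | (pg[1] & gg[0]) | (pg[1] & pg[0] & cin);
  assign gc[3] = gg[2] | (pg[2] & gg[1]) | (pg[2] & pg[1] & gg[0])
               | (pg[2] & pg[1] & pg[0] & cin);
  assign gc[4] = gg[3] | (pg[3] & gg[2]) | (pg[3] & pg[2] & gg[1])
               | (pg[3] & pg[2] & pg[1] & gg[0])
               | (pg[3] & pg[2] & pg[1] & pg[0] & cin);

  for (genvar k = 0; k < 4; k++) begin : g_blk
    cla4_gp blk (
      .a(a[4*k +: 4]), .b(b[4*k +: 4]), .cin(gc[k]),
      .sum(sum[4*k +: 4]), .gg(gg[k]), .pg(pg[k])
    );
  end

  assign cout = gc[4];
endmodule
```

Because `gg` and `pg` depend only on `a` and `b`, the block carries no longer wait on the carry chain through earlier blocks. Repeating the same grouping at each level is what gives the hierarchical design its O(log N) delay.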

Frequently Asked Questions

What is the carry lookahead adder?

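A CLA computes its carries in parallel from the generate (Gi = Ai·Bi) and propagate (Pi = Ai ⊕ Bi) signals instead of waiting for each carry to ripple. Arranged hierarchically, this reduces carry delay from O(N) to O(log N), which makes it essential for fast processor ALUs.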

What are generate and propagate signals?

Gi = Ai·Bi: bit i generates a carry regardless of its carry-in. Pi = Ai ⊕ Bi: bit i passes its carry-in through to the next bit. Within a 4-bit block, these signals let every carry be computed in 2 gate levels rather than waiting for each bit in sequence.

Why don't processors use pure CLA for 64-bit?

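The area of a pure (single-level) CLA grows as O(N²) for wide operands, and its widest gates need a fan-in that grows with N. Modern CPUs instead use parallel prefix adders such as Kogge-Stone (O(log N) delay, O(N log N) area) or Brent-Kung (O(N) area at roughly twice the logic depth), which are a better tradeoff at 64-bit widths.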
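
For comparison, a Kogge-Stone prefix network can be sketched with two nested generate loops. This is a minimal illustration under stated assumptions, not a production design: the module name `ks_adder`, the parameter `N`, and the choice to fold `cin` in after the prefix tree (which costs one extra gate level) are all assumptions.

```systemverilog
// Kogge-Stone parallel prefix adder (illustrative; names are assumptions).
module ks_adder #(parameter int N = 16) (
  input  logic [N-1:0] a, b,
  input  logic         cin,
  output logic [N-1:0] sum,
  output logic         cout
);
  localparam int L = $clog2(N);  // number of prefix levels

  // g[l][i], p[l][i]: generate/propagate of the bit span ending at i
  // after l levels (span length up to 2**l bits).
  logic [N-1:0] g [0:L];
  logic [N-1:0] p [0:L];
  logic [N:0]   c;

  assign g[0] = a & b;
  assign p[0] = a ^ b;

  for (genvar l = 0; l < L; l++) begin : g_level
    for (genvar i = 0; i < N; i++) begin : g_bit
      if (i >= (1 << l)) begin : g_merge
        // Combine this span with the span ending 2**l bits lower.
        assign g[l+1][i] = g[l][i] | (p[l][i] & g[l][i - (1 << l)]);
        assign p[l+1][i] = p[l][i] & p[l][i - (1 << l)];
      end else begin : g_pass
        // Span already reaches bit 0; pass it through unchanged.
        assign g[l+1][i] = g[l][i];
        assign p[l+1][i] = p[l][i];
      end
    end
  end

  // After L levels, g[L][i] and p[L][i] cover bits i down to 0.
  assign c[0] = cin;
  for (genvar i = 0; i < N; i++) begin : g_carry
    assign c[i+1] = g[L][i] | (p[L][i] & cin);
  end

  assign sum  = p[0] ^ c[N-1:0];
  assign cout = c[N];
endmodule
```

Every level uses only 2-input combine cells, so fan-in stays constant while the number of levels grows as log2(N). That is the property that a flat CLA lacks at wide operand sizes.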