Carry Lookahead Adder
Generate · Propagate · O(log N)
The key to fast binary addition: compute all carries in parallel from generate (G) and propagate (P) signals, eliminating the ripple-carry bottleneck.
Why Ripple Carry is Slow
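In a ripple-carry adder (RCA), each full adder computes its carry-out as $C_{i+1} = G_i + P_i \cdot C_i$ (generate and propagate are defined below), so every carry must wait for the carry from the bit below it. In a 32-bit adder, a carry produced at bit 0 can ripple through all 32 stages before the top sum bit settles, so delay grows linearly with the word width N.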
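A minimal parameterized ripple-carry adder makes the serial dependence concrete. This is a sketch; the module name `rca` and the parameter `N` are illustrative. Each stage's carry-out is computed from the previous stage's carry-out, so the longest path runs through all N stages.

```systemverilog
// Minimal N-bit ripple-carry adder sketch; the module name `rca` is illustrative.
// Carry c[i+1] depends on c[i], so the critical path runs through all N stages.
module rca #(parameter N = 32) (
  input  logic [N-1:0] a, b,
  input  logic         cin,
  output logic [N-1:0] sum,
  output logic         cout
);
  wire [N:0] c;                   // c[i] is the carry into bit i
  assign c[0] = cin;

  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : stage
      // Full-adder carry: C_{i+1} = G_i + P_i * C_i with G = a&b, P = a^b
      assign c[i+1] = (a[i] & b[i]) | ((a[i] ^ b[i]) & c[i]);
    end
  endgenerate

  assign sum  = a ^ b ^ c[N-1:0]; // S_i = P_i XOR C_i
  assign cout = c[N];
endmodule
```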
| Adder type | Delay | Area |
|---|---|---|
| Ripple Carry (RCA) | O(N) | O(N) |
| Carry Lookahead (CLA, multi-level) | O(log N) | O(N) |
| Kogge-Stone (prefix) | O(log N) | O(N log N) |
| Carry Select | O(√N) | O(N) |
| Carry Skip | O(√N) | O(N) |
Generate & Propagate — The Core Idea
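$G_i = A_i \cdot B_i$ (generate: bit i produces a carry regardless of its carry-in)
$P_i = A_i \oplus B_i$ (propagate: bit i passes its carry-in through to its carry-out)
$C_{i+1} = G_i + P_i \cdot C_i$
$S_i = P_i \oplus C_i$ (sum bit)
Here $+$ denotes OR and $\cdot$ denotes AND.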
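As a worked example, take the 4-bit operands A = 1011 (11) and B = 0110 (6) with $C_0 = 0$, using $G_i = A_i \cdot B_i$, $P_i = A_i \oplus B_i$ and $S_i = P_i \oplus C_i$. Vectors are written most significant bit first.

```latex
\begin{aligned}
G &= A \cdot B = 0010, \quad P = A \oplus B = 1101 \\
C_1 &= G_0 + P_0 C_0 = 0 + 1 \cdot 0 = 0 \\
C_2 &= G_1 + P_1 C_1 = 1 + 0 \cdot 0 = 1 \\
C_3 &= G_2 + P_2 C_2 = 0 + 1 \cdot 1 = 1 \\
C_4 &= G_3 + P_3 C_3 = 0 + 1 \cdot 1 = 1 \\
S &= P \oplus C_3 C_2 C_1 C_0 = 1101 \oplus 1100 = 0001
\end{aligned}
```

With $C_4 = 1$ the result is 10001 = 17 = 11 + 6. Evaluated this way, each carry waits for the one before it, which is exactly the ripple behavior that the expansion below removes.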
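Substituting the recurrence into itself removes every dependence on intermediate carries, so each carry depends only on G, P and $C_0$, and all of them can be computed simultaneously:
$C_1 = G_0 + P_0 \cdot C_0$
$C_2 = G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_0$
$C_3 = G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0 + P_2 \cdot P_1 \cdot P_0 \cdot C_0$
$C_4 = G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0 + P_3 \cdot P_2 \cdot P_1 \cdot P_0 \cdot C_0$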
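The pattern in these equations generalizes to any bit position: $C_{i+1}$ is $G_i$, plus one product term for every lower generate, plus a final term that carries $C_0$ through every propagate.

```latex
C_{i+1} = G_i + \sum_{j=0}^{i-1} \Bigl( \prod_{k=j+1}^{i} P_k \Bigr) G_j + \Bigl( \prod_{k=0}^{i} P_k \Bigr) C_0
```

For $i = 1$ this reduces to $C_2 = G_1 + P_1 G_0 + P_1 P_0 C_0$. Every carry is one AND level followed by one OR level; the cost is that $C_{i+1}$ needs $i+2$ product terms, the widest with $i+2$ inputs.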
SystemVerilog: 4-bit CLA and 16-bit Block CLA
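```systemverilog
// 4-bit carry lookahead adder: every carry is a two-level AND-OR
// function of g, p and cin, so no carry waits on another.
module cla4 (
  input  logic [3:0] a, b,
  input  logic       cin,
  output logic [3:0] sum,
  output logic       cout
);
  logic [3:0] g, p;   // per-bit generate and propagate
  logic [4:0] c;      // c[i] is the carry into bit i; c[4] is the carry-out

  assign g = a & b;
  assign p = a ^ b;

  assign c[0] = cin;
  assign c[1] = g[0] | (p[0] & c[0]);
  assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
  assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
              | (p[2] & p[1] & p[0] & c[0]);
  assign c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
              | (p[3] & p[2] & p[1] & g[0])
              | (p[3] & p[2] & p[1] & p[0] & c[0]);

  assign sum  = p ^ c[3:0];   // S_i = P_i XOR C_i
  assign cout = c[4];
endmodule

// 16-bit block CLA: four 4-bit CLA blocks chained. Carries are computed in
// parallel inside each block, but each block's carry-out ripples into the
// next block's carry-in, so delay still grows with the number of blocks.
module cla16 (
  input  logic [15:0] a, b,
  input  logic        cin,
  output logic [15:0] sum,
  output logic        cout
);
  logic [2:0] gc;   // carries between blocks

  cla4 b0 (.a(a[3:0]),   .b(b[3:0]),   .cin(cin),   .sum(sum[3:0]),   .cout(gc[0]));
  cla4 b1 (.a(a[7:4]),   .b(b[7:4]),   .cin(gc[0]), .sum(sum[7:4]),   .cout(gc[1]));
  cla4 b2 (.a(a[11:8]),  .b(b[11:8]),  .cin(gc[1]), .sum(sum[11:8]),  .cout(gc[2]));
  cla4 b3 (.a(a[15:12]), .b(b[15:12]), .cin(gc[2]), .sum(sum[15:12]), .cout(cout));
endmodule
```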
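Because `cla16` above chains its blocks, its carry path still crosses every block. A two-level lookahead instead has each 4-bit block export a group generate and group propagate, and a second lookahead unit, using the same equations as `cla4`, computes every block's carry-in directly from those signals and `cin`. The sketch below assumes that structure; the module names `cla4_gp` and `cla16_2level` are hypothetical.

```systemverilog
// Hypothetical two-level 16-bit CLA sketch. Each 4-bit block exports group
// generate (gg) and group propagate (gp); a second lookahead level computes
// all block carry-ins from gg, gp and cin without any inter-block ripple.
module cla4_gp (
  input  logic [3:0] a, b,
  input  logic       cin,
  output logic [3:0] sum,
  output logic       gg,   // block produces a carry-out by itself
  output logic       gp    // block passes cin through to its carry-out
);
  logic [3:0] g, p;
  wire  [3:0] c;           // c[i] is the carry into bit i

  assign g = a & b;
  assign p = a ^ b;

  assign c[0] = cin;
  assign c[1] = g[0] | (p[0] & cin);
  assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin);
  assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
              | (p[2] & p[1] & p[0] & cin);

  // Group signals depend only on a and b, never on cin.
  assign gg  = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
             | (p[3] & p[2] & p[1] & g[0]);
  assign gp  = &p;
  assign sum = p ^ c;
endmodule

module cla16_2level (
  input  logic [15:0] a, b,
  input  logic        cin,
  output logic [15:0] sum,
  output logic        cout
);
  wire [3:0] gg, gp;   // group generate/propagate from each block
  wire [4:0] bc;       // bc[k] is the carry into block k; bc[4] is the carry-out

  // Second-level lookahead unit: the cla4 carry equations applied to groups.
  assign bc[0] = cin;
  assign bc[1] = gg[0] | (gp[0] & cin);
  assign bc[2] = gg[1] | (gp[1] & gg[0]) | (gp[1] & gp[0] & cin);
  assign bc[3] = gg[2] | (gp[2] & gg[1]) | (gp[2] & gp[1] & gg[0])
               | (gp[2] & gp[1] & gp[0] & cin);
  assign bc[4] = gg[3] | (gp[3] & gg[2]) | (gp[3] & gp[2] & gg[1])
               | (gp[3] & gp[2] & gp[1] & gg[0])
               | (gp[3] & gp[2] & gp[1] & gp[0] & cin);

  genvar k;
  generate
    for (k = 0; k < 4; k = k + 1) begin : blk
      cla4_gp u (.a(a[4*k+3 -: 4]), .b(b[4*k+3 -: 4]), .cin(bc[k]),
                 .sum(sum[4*k+3 -: 4]), .gg(gg[k]), .gp(gp[k]));
    end
  endgenerate

  assign cout = bc[4];
endmodule
```

Each additional lookahead level multiplies the width by four at the cost of a constant number of extra gate levels, which is where the O(log N) delay of a multi-level CLA comes from.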
Frequently Asked Questions
What is the carry lookahead adder?
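A carry lookahead adder computes all carries simultaneously from per-bit generate ($G_i = A_i \cdot B_i$) and propagate ($P_i = A_i \oplus B_i$) signals instead of letting each carry ripple from the bit below. Built as a tree of lookahead blocks, it reduces carry delay from O(N) to O(log N), which makes lookahead techniques central to fast processor ALUs.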
What are generate and propagate signals?
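$G_i = A_i \cdot B_i$: bit i generates a carry on its own. $P_i = A_i \oplus B_i$: bit i passes an incoming carry through. With G and P available, every carry can be written as a sum of products and computed in two gate levels (AND, then OR) instead of waiting on each lower bit in turn; the catch is that gate fan-in grows with the bit position.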
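Why don't processors use a pure (flat) CLA for 64-bit addition?
A flat, single-level CLA needs on the order of N² product terms, and its widest AND gate needs N+1 inputs, which is impractical at 64 bits. Modern CPUs typically use parallel-prefix adders instead, such as Kogge-Stone (the minimum of log₂ N prefix levels, O(N log N) area, heavy wiring) or Brent-Kung (O(N) area, roughly twice as many levels). Both keep O(log N) delay with bounded fan-in, a better tradeoff at 64-bit widths.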
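To make the prefix idea concrete, here is a minimal 16-bit Kogge-Stone sketch. It is one possible way to write such an adder, not a description of any particular CPU; the module name `ks16` is hypothetical. Each of the log₂ 16 = 4 levels combines every (G, P) pair with the pair 2^l positions below it using the operator $(G, P) \circ (G', P') = (G + P \cdot G',\ P \cdot P')$, and `cin` is folded into bit 0's generate so the final group generate at bit i equals $C_{i+1}$.

```systemverilog
// Hypothetical 16-bit Kogge-Stone prefix adder sketch (module name `ks16`).
// Level l combines each bit's (g, p) with the pair 2**l positions below,
// so after 4 levels g[i] spans bits i..0 (plus cin) and equals C_{i+1}.
module ks16 (
  input  logic [15:0] a, b,
  input  logic        cin,
  output logic [15:0] sum,
  output logic        cout
);
  localparam N = 16;
  localparam L = 4;                // log2(N) prefix levels

  logic [N-1:0] p0;                // bitwise propagate, reused for the sum
  logic [N-1:0] g, p;              // running group generate / propagate
  logic [N-1:0] g_next, p_next;    // values for the next prefix level

  assign p0 = a ^ b;

  always_comb begin
    g = a & b;
    p = p0;
    g[0] = g[0] | (p0[0] & cin);   // fold cin into bit 0's generate
    for (int l = 0; l < L; l++) begin
      g_next = g;
      p_next = p;
      // Bits below 2**l already span down to bit 0 and pass through unchanged.
      for (int i = (1 << l); i < N; i++) begin
        g_next[i] = g[i] | (p[i] & g[i - (1 << l)]);
        p_next[i] = p[i] & p[i - (1 << l)];
      end
      g = g_next;
      p = p_next;
    end
  end

  // g[i] is the carry out of bit i, so the carry into bit i is g[i-1].
  assign sum  = p0 ^ {g[N-2:0], cin};
  assign cout = g[N-1];
endmodule
```

Synthesis tools unroll the constant-bound loops into 4 levels of prefix cells with up to N cells per level, which matches the O(N log N) area listed for Kogge-Stone in the table.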