Reconfigurable Hardware

What is an FPGA?

A Field-Programmable Gate Array is a chip you program after manufacturing — load a bitstream and it becomes any digital circuit you design. This guide explains the architecture, how every block works, and when to use FPGA vs ASIC vs CPU.

LUT · Flip-Flop · BRAM · DSP Slice · CLB · Routing Fabric · Bitstream · Verilog

FPGA Architecture Overview

[Figure: FPGA internal architecture. A grid of CLBs (LUT + FF) connected by the programmable routing fabric, with columns of 36 Kb block RAM and 18×18 DSP slices, surrounded by a ring of I/O banks.]

How an FPGA Works

An FPGA ships as a blank slate. You write RTL in Verilog or VHDL, synthesize it to a netlist, place and route it onto the FPGA's resource grid, then generate a bitstream: a binary file that configures every LUT, routing switch, and I/O standard. Most FPGAs hold this configuration in SRAM, so it is lost at power-off; at power-on the bitstream is reloaded, typically from flash.

The Configurable Logic Block (CLB)

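A CLB is the fundamental building block. In AMD/Xilinx UltraScale and UltraScale+ devices, each CLB holds a single slice containing the resources below. (The older 7-series packs two smaller slices into each CLB, each with four LUTs, eight flip-flops, and a 4-bit carry chain.)

| Resource | Count per slice | Function |
|---|---|---|
| 6-input LUT | 8 | Implements any 6-variable Boolean function (64 SRAM bits) |
| Flip-flop | 16 | D-type register for sequential logic, clocked from a global clock net |
| Carry chain | 8-bit | Dedicated fast carry logic for adders and counters, avoiding LUT-to-LUT chaining |
| MUX | Various | F7/F8 muxes that merge LUTs into wider functions (7-input, 8-input) |
| Distributed RAM | Optional | LUTs configured as 64-bit single-port SRAM |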
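As a rough behavioral sketch of the resources listed above (not vendor primitive code), the module below models two 6-input LUTs merged by an F7-style mux into one 7-input function, followed by a flip-flop. The module name `lut7_model`, its parameters, and its ports are hypothetical and chosen only for illustration.

```verilog
// Behavioral model: two LUT6 truth tables, an F7-style mux, and one flip-flop.
// All names here are illustrative assumptions, not vendor primitives.
module lut7_model #(
  parameter [63:0] INIT_LO = 64'h0,  // truth table used when in[6] = 0
  parameter [63:0] INIT_HI = 64'h0   // truth table used when in[6] = 1
) (
  input  wire       clk,
  input  wire [6:0] in,
  output wire       comb,  // combinational output of the merged LUTs
  output reg        q      // registered copy, as from the slice flip-flop
);
  wire [63:0] tbl_lo = INIT_LO;
  wire [63:0] tbl_hi = INIT_HI;

  wire lo = tbl_lo[in[5:0]];        // first LUT6: a 64-entry lookup
  wire hi = tbl_hi[in[5:0]];        // second LUT6
  assign comb = in[6] ? hi : lo;    // F7-style mux picks one LUT output

  always @(posedge clk) q <= comb;  // optional register stage
endmodule
```

A synthesizer normally builds this structure for you from ordinary RTL; writing truth tables by hand is only useful for seeing what the hardware stores.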

Block RAM (BRAM)

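Dedicated true dual-port SRAM blocks, arranged in columns within the fabric. Each block is 36 Kb and can be split into two independent 18 Kb blocks. Both ports can read and write independently, each on its own clock, which suits asynchronous FIFOs, line buffers, and coefficient tables. The synthesizer usually infers BRAM when you declare a large array in RTL and read it synchronously (through a register); an asynchronous read typically pushes the array into LUT-based distributed RAM instead.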
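A minimal sketch of a true dual-port memory with two independent clocks, written in the style synthesis tools commonly recognize for block RAM. The module name `tdp_ram` and its parameters are assumptions for illustration, and whether a given tool maps it to BRAM depends on that tool's inference rules, so check its synthesis guide.

```verilog
// True dual-port RAM sketch: each port has its own clock, address, and data.
// Names and default sizes are illustrative assumptions.
module tdp_ram #(parameter W = 8, parameter AW = 10) (
  input  wire          clk_a, we_a,
  input  wire [AW-1:0] addr_a,
  input  wire [W-1:0]  din_a,
  output reg  [W-1:0]  dout_a,
  input  wire          clk_b, we_b,
  input  wire [AW-1:0] addr_b,
  input  wire [W-1:0]  din_b,
  output reg  [W-1:0]  dout_b
);
  reg [W-1:0] mem [0:(1<<AW)-1];

  // Port A: synchronous write and synchronous read.
  // On a write, dout_a returns the old contents (read-first behavior).
  always @(posedge clk_a) begin
    if (we_a) mem[addr_a] <= din_a;
    dout_a <= mem[addr_a];
  end

  // Port B: same behavior on an independent clock.
  always @(posedge clk_b) begin
    if (we_b) mem[addr_b] <= din_b;
    dout_b <= mem[addr_b];
  end
endmodule
```

The registered read is the important part: it matches how the hard memory block behaves. If both ports write the same address at the same time the result is undefined, so a real design has to prevent that case.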

DSP Slices

Hard-wired multiply-accumulate (MAC) units. A DSP48E1 slice in Xilinx 7-series contains a 25×18 signed multiplier feeding a 48-bit accumulator, plus a pre-adder. Cascading DSP slices lets you build FIR filters, FFTs, and matrix multipliers that can run at several hundred MHz (500+ MHz in the faster speed grades) while using few or no LUTs for the arithmetic.
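The multiply-accumulate pattern can be sketched in plain RTL as below. This is a hedged illustration, not a vendor template: the module name `mac`, the port names, and the default operand widths are assumptions, and whether a tool maps it onto a DSP slice depends on the widths fitting that device's multiplier. A synchronous clear is used because hard DSP registers generally support only synchronous resets.

```verilog
// Pipelined signed multiply-accumulate sketch.
// Widths are parameters; pick values that fit the target multiplier.
module mac #(
  parameter A_W   = 18,
  parameter B_W   = 18,
  parameter ACC_W = 48
) (
  input  wire                    clk, clr, en,
  input  wire signed [A_W-1:0]   a,
  input  wire signed [B_W-1:0]   b,
  output reg  signed [ACC_W-1:0] acc
);
  reg signed [A_W+B_W-1:0] prod;  // pipeline register after the multiplier

  always @(posedge clk) begin
    if (clr) begin
      prod <= 0;            // synchronous clear of both pipeline stages
      acc  <= 0;
    end else if (en) begin
      prod <= a * b;        // stage 1: signed multiply
      acc  <= acc + prod;   // stage 2: accumulate the previous product
    end
  end
endmodule
```

Because of the pipeline register, each product reaches `acc` one enabled cycle after its operands are presented, so one extra enabled cycle is needed to flush the last product.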

Interactive: 3-Input LUT Explorer

A 3-input LUT is an 8-entry truth table stored in SRAM. The LUT implements whatever function those 8 bits encode, and the bitstream sets them. (On the interactive version of this page, clicking an output cell toggles it between 0 and 1.)

Truth table columns: A, B, C, OUT
LUT INIT bits (binary, MSB = row 7): 00000000
INIT hex (for Verilog attribute): 8'h00
Verilog inference: assign out = 1'b0;

The INIT attribute tells the tools exactly which 8 bits to program into this LUT's SRAM. One LUT → one bitstream fragment.
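To make the INIT idea concrete, here is a small behavioral sketch of a 3-input LUT whose contents come from an 8-bit parameter. The names `lut3_model` and `xor3_demo` are hypothetical, and the row index is assumed to be `{a, b, c}` with A as the most significant bit. The example value 8'h96 sets rows 1, 2, 4, and 7, which is the 3-input XOR.

```verilog
// Behavioral 3-input LUT: INIT bit i is the output for truth-table row i.
// Row index is assumed to be {a, b, c}; names are illustrative only.
module lut3_model #(parameter [7:0] INIT = 8'h00) (
  input  wire a, b, c,
  output wire out
);
  wire [7:0] tbl = INIT;
  assign out = tbl[{a, b, c}];  // look up the row selected by the inputs
endmodule

// INIT = 8'h96 = 8'b1001_0110: output is 1 for rows 1, 2, 4, 7 (odd parity),
// so this instance behaves like y = a ^ b ^ c.
module xor3_demo (
  input  wire a, b, c,
  output wire y
);
  lut3_model #(.INIT(8'h96)) u_lut (.a(a), .b(b), .c(c), .out(y));
endmodule
```

XOR is symmetric in its inputs, so this particular INIT value gives the same function whichever input is treated as the most significant address bit.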

FPGA vs ASIC vs CPU — When to Use Which

FPGA

  • Reprogrammable any time
  • Parallel hardware execution
  • Medium NRE cost (tools only)
  • 100× area vs ASIC
  • 10–100× more power
  • 10× lower clock vs ASIC
✓ Prototyping, low volume, DSP

ASIC

  • Fixed function post tape-out
  • Maximum performance (GHz)
  • Very high NRE ($1M–$50M)
  • Smallest die area
  • Lowest power
  • Best for 1M+ units
✓ High-volume, power-critical

CPU / GPU

  • Software-defined, flexible
  • Sequential with SIMD
  • Zero NRE — buy off shelf
  • Shortest time to market
  • Higher power than ASIC
  • General-purpose
✓ General compute, ML training

| Feature | FPGA | ASIC | CPU |
|---|---|---|---|
| Programmable after fab | Yes (always) | No | Software only |
| Typical clock speed | 200–500 MHz | 1–5 GHz | 3–5 GHz |
| Power efficiency | Medium | Best | Medium |
| Time to working HW | Hours to days | 6–18 months | Days (buy + code) |
| Unit cost @ 1M units | $20–$200 | $1–$10 | $50–$500 |
| Parallelism | Massive (HW) | Massive (HW) | Limited (cores) |
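The NRE and unit-cost figures in this comparison imply a break-even volume between FPGA and ASIC. As a rough worked example, assume an ASIC NRE of $10M, an FPGA unit cost of $50, and an ASIC unit cost of $5. These three values are illustrative picks from inside the ranges quoted in this guide, not data for any real product. The estimate also ignores FPGA tool costs and treats unit costs as independent of volume, which is a simplification.

```latex
\[
N^{*} \;=\; \frac{\mathrm{NRE}_{\mathrm{ASIC}}}{c_{\mathrm{FPGA}} - c_{\mathrm{ASIC}}}
      \;=\; \frac{\$10{,}000{,}000}{\$50 - \$5}
      \;\approx\; 222{,}000 \ \text{units}
\]
```

Under these assumptions the FPGA is cheaper in total below roughly 222,000 units, and above that volume the ASIC's lower unit cost repays its NRE.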

Simple Verilog for FPGA

Any synthesizable Verilog maps to FPGA resources. The synthesizer decides how many LUTs, FFs, and BRAMs are needed.

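// N-bit counter (N=4 by default): maps to N FFs + carry chain
module counter #(parameter N=4) (
  input  wire         clk, rst_n,
  output reg  [N-1:0] count
);
  // Asynchronous active-low reset; increments on every rising clock edge
  always @(posedge clk or negedge rst_n)
    if (!rst_n) count <= {N{1'b0}};     // clear all N bits
    else        count <= count + 1'b1;  // wraps to 0 after the maximum value
endmodule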

// 8-bit adder — maps to LUTs + carry chain (no DSP needed)
module adder8 (
  input  wire [7:0] a, b,
  input  wire       cin,
  output wire [7:0] sum,
  output wire       cout
);
  assign {cout, sum} = a + b + cin;
endmodule

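// 256-deep dual-clock FIFO: BRAM is inferred for mem; pointers cross domains in Gray code
module fifo256 #(parameter W=8) (
  input  wire         wr_clk, rd_clk, wr_en, rd_en,
  input  wire [W-1:0] din,
  output reg  [W-1:0] dout,
  output wire         full, empty
);
  reg [W-1:0] mem [0:255];
  reg [8:0] wr_ptr=0, rd_ptr=0, wr_gray=0, rd_gray=0;  // binary + Gray pointers, bit 8 = wrap
  reg [8:0] rd_g1=0, rd_g2=0, wr_g1=0, wr_g2=0;        // 2-FF synchronizers
  wire [8:0] wr_nxt = wr_ptr + 9'd1, rd_nxt = rd_ptr + 9'd1;
  assign full  = (wr_gray == {~rd_g2[8:7], rd_g2[6:0]}); // 256 apart: top two Gray bits differ
  assign empty = (rd_gray == wr_g2);
  always @(posedge wr_clk) begin
    {rd_g2, rd_g1} <= {rd_g1, rd_gray};                // sync read pointer into wr_clk domain
    if (wr_en && !full) begin
      mem[wr_ptr[7:0]] <= din;
      wr_ptr <= wr_nxt;  wr_gray <= wr_nxt ^ (wr_nxt >> 1);
    end
  end
  always @(posedge rd_clk) begin
    {wr_g2, wr_g1} <= {wr_g1, wr_gray};                // sync write pointer into rd_clk domain
    if (rd_en && !empty) begin
      dout <= mem[rd_ptr[7:0]];
      rd_ptr <= rd_nxt;  rd_gray <= rd_nxt ^ (rd_nxt >> 1);
    end
  end
endmodule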

FPGA Use Cases

🔬

ASIC Prototyping

Run real RTL on an FPGA before a $10M tape-out. Find functional bugs with real I/O at speeds far beyond simulation, though usually below the final ASIC clock rate.

📡

Software-Defined Radio

Implement modulation, demodulation, filtering in real-time. Change waveform without new hardware.

High-Frequency Trading

Sub-microsecond order execution. The FPGA parses market data and places orders in hardware, before a software stack on a CPU could even respond.

🎮

Video Processing

4K encode/decode, frame synchronization, multi-channel processing — pixel pipelines in hardware.

🌐

Network Offload

Line-rate packet parsing, routing table lookup, and encryption at up to 400 Gbps, rates that are impractical to sustain in software on a general-purpose CPU.

🤖

ML Inference

Low-latency neural network inference with custom bit-widths: a middle ground between a GPU (higher power) and an ASIC (high NRE).

Frequently Asked Questions

What does "field-programmable" mean?

"Field" means after leaving the factory — in the field, in your lab, on the production board. The chip arrives blank; you program it yourself by loading a bitstream. This is unlike gate arrays of the 1980s that required a mask revision to change the metal connections.

How many LUTs does a modern FPGA have?

Entry-level: Xilinx Spartan-7 devices range from about 4K to 64K LUTs. Mid-range: the largest Artix-7 (XC7A200T) has about 134K LUTs, marketed as 215K logic cells. High-end: the largest Virtex UltraScale+ devices have over 1.7 million LUTs. Each "LUT" in these counts is a 6-input LUT that can implement any 6-variable function.

Is Verilog or VHDL better for FPGA?

Both can describe the same hardware, and the synthesizer produces equivalent results from either. Verilog/SystemVerilog dominates in industry (especially in ASIC design and in the US). VHDL is common in Europe and in defense/aerospace. For learning, pick Verilog: it has more online resources, a terser syntax, and transfers directly to ASIC work.

Can you run Linux on an FPGA?

Yes. You can instantiate a soft-core CPU with an MMU (such as MicroBlaze or a Linux-capable RISC-V core) in the FPGA fabric and boot Linux on it. Alternatively, Zynq and Zynq UltraScale+ SoC-FPGAs embed hard Arm Cortex-A cores alongside the FPGA fabric, giving you both worlds: a hard CPU running Linux plus custom hardware accelerators in the fabric.