DAY 3 · FPGA FOUNDATIONS

Hard Blocks — Block RAM, DSP Slices & Clocking

By EcrioniX · Updated Jun 6, 2026

Day 2 showed the flexible "soft" fabric of LUTs and flip-flops. But building everything from LUTs would be wasteful — imagine constructing a megabyte of memory or a fast multiplier out of tiny truth tables. So FPGA vendors sprinkle hard blocks — purpose-built silicon for the things every design needs — across the fabric. Knowing them is the difference between a slow design and a fast one.

1. Why hard blocks exist

The LUT fabric is wonderfully general, but generality has a cost. Common functions like large memories, multipliers and clock management are needed in almost every design, and implementing them in soft logic would burn enormous resources and run slowly. The solution: build those functions once as optimised dedicated silicon ("hard" blocks) and scatter them through the chip, ready to use. You get ASIC-like efficiency for the common cases while keeping the fabric free for your custom logic.

[Figure: a modern FPGA floorplan showing soft fabric plus columns of hard blocks. Labelled regions: I/O blocks and transceivers (edges), CLBs (LUT+FF), Block RAM, DSP slices, clock network with PLL/MMCM.]
Figure — Hard blocks (Block RAM, DSP) sit in dedicated columns within the soft CLB fabric, with global clocking and I/O around it.

2. Block RAM (BRAM) — on-chip memory

Almost every design needs memory: a video frame buffer, a FIFO between clock domains, a packet buffer, a coefficient table. Block RAM is dedicated memory silicon — typically tens of kilobits per block (e.g. 18 Kb or 36 Kb), with many blocks across the chip that you can combine into larger or wider memories.
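To get a feel for how blocks combine into a larger memory, here is a minimal sizing sketch. It assumes one block holds 36 × 1024 bits and ignores the width/depth configurations real blocks are limited to, so the result is a lower bound, not a tool report. The function name and the frame-buffer dimensions are made up for illustration.

```python
BRAM_BITS = 36 * 1024  # assumed capacity of one 36 Kb block, in bits


def bram_blocks_needed(depth: int, width_bits: int, block_bits: int = BRAM_BITS) -> int:
    """Lower bound on the number of blocks for a depth x width memory."""
    total_bits = depth * width_bits
    # Integer ceiling division: a partly used block still costs a whole block.
    return (total_bits + block_bits - 1) // block_bits


# Hypothetical example: a 640 x 480 frame buffer with 8 bits per pixel.
frame_buffer_blocks = bram_blocks_needed(depth=640 * 480, width_bits=8)
print(frame_buffer_blocks)  # 67
```

A single small frame buffer already consumes dozens of blocks, which is why BRAM count is one of the first figures to check when choosing a device.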

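Key BRAM features: each block is dual-port, so two parts of a design can access the same memory at once (one side writing while the other reads, for example), and blocks can be combined into deeper or wider memories.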

BRAM vs distributed RAM

There's a second kind of memory: distributed RAM, which repurposes the LUTs from Day 2 as tiny memories. The tools pick automatically:

| | Block RAM | Distributed RAM (LUT) |
|---|---|---|
| Built from | dedicated memory blocks | logic-cell LUTs |
| Best for | larger, deeper memories | small, shallow memories near logic |
| Cost | uses a BRAM block | consumes LUTs you might need for logic |

Rule of thumb: small register files → distributed RAM; buffers and FIFOs → Block RAM (we'll build one on Day 15).
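As a preview of why FIFOs suit Block RAM, the sketch below models a FIFO as a circular buffer over a memory array with a separate write address and read address, one per port. It is a behavioural model only: clocks are not modelled, and a real FIFO between clock domains also needs pointer synchronisation, which is left out here. The class and method names are illustrative, not a vendor API.

```python
class BramFifo:
    """Behavioural model of a FIFO built on a dual-port memory.

    One port only writes and the other only reads. Clocks are not modelled.
    """

    def __init__(self, depth: int):
        self.mem = [0] * depth   # stands in for the BRAM storage array
        self.depth = depth
        self.wr_ptr = 0          # write-port address
        self.rd_ptr = 0          # read-port address
        self.count = 0           # number of stored words

    def full(self) -> bool:
        return self.count == self.depth

    def empty(self) -> bool:
        return self.count == 0

    def push(self, word: int) -> bool:
        """Write one word via the write port; returns False if full."""
        if self.full():
            return False
        self.mem[self.wr_ptr] = word
        self.wr_ptr = (self.wr_ptr + 1) % self.depth  # wrap around
        self.count += 1
        return True

    def pop(self):
        """Read one word via the read port; returns None if empty."""
        if self.empty():
            return None
        word = self.mem[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth  # wrap around
        self.count -= 1
        return word


fifo = BramFifo(depth=4)
for w in (10, 20, 30):
    fifo.push(w)
print(fifo.pop(), fifo.pop())  # 10 20
```

The two pointers never touch the same port, which is the property that lets a dual-port memory serve a producer and a consumer at the same time.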

3. DSP slices — fast multiply-accumulate

Multiplication is the bane of soft logic — a single 18×18 multiplier built from LUTs eats hundreds of cells and runs slowly. Yet multiply-accumulate (MAC) is the core of filters, FFTs, image processing and neural networks. So FPGAs include DSP slices: hard blocks built around a fast hardware multiplier plus a pre-adder, an adder/accumulator, and pipeline registers.
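To see where the soft-logic cost comes from, the sketch below multiplies two unsigned 18-bit values the shift-and-add way and counts the 1-bit partial-product terms a flat array multiplier would form. It is an illustration under simplifying assumptions (unsigned operands, no optimised encoding); it does not estimate an actual LUT count, which depends on the device and the tools.

```python
def shift_add_multiply(a: int, b: int, width: int = 18):
    """Multiply two unsigned width-bit values by shift-and-add.

    Returns (product, and_terms), where and_terms counts the 1-bit
    partial-product terms (a_i AND b_j) a flat array multiplier forms.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    and_terms = 0
    for j in range(width):
        # Row j of the array: every bit of a ANDed with bit j of b.
        b_bit = (b >> j) & 1
        partial = a if b_bit else 0
        product += partial << j   # in hardware, each row also needs a wide adder
        and_terms += width        # width AND terms per row
    return product, and_terms


product, and_terms = shift_add_multiply(200_000, 150_000)
print(product == 200_000 * 150_000, and_terms)  # True 324
```

That is 324 AND terms plus 18 rows of addition for one multiply, all of which a DSP slice replaces with a single hard multiplier.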

    // A FIR filter tap, exactly what a DSP slice does in one block:
    acc = acc + (sample * coeff);   // multiply-accumulate (MAC)
    // Hundreds of DSP slices running in parallel = huge DSP/AI throughput.

A large modern FPGA can have thousands of DSP slices, and their parallel MAC throughput is exactly why FPGAs are used for AI inference and signal processing. When you write a*b+c in HDL, synthesis will normally map it onto a DSP slice rather than LUTs, provided the operands fit the slice's multiplier width; wider operands are split across several slices. (This is the same MAC idea behind the systolic array.)
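The MAC pattern is easiest to see in a complete filter. Below is a minimal software sketch of a direct-form FIR filter using integer samples and coefficients; each pass through the inner loop is one multiply-accumulate, the operation a DSP slice performs in hardware. The function name and the test values are illustrative, and the sequential loop says nothing about how taps would be spread across slices in a real design.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR: y[n] = sum over k of coeffs[k] * samples[n - k]."""
    outputs = []
    for n in range(len(samples)):
        acc = 0
        for k, coeff in enumerate(coeffs):
            if n - k >= 0:  # samples before the start are treated as zero
                acc = acc + samples[n - k] * coeff  # one MAC per tap
        outputs.append(acc)
    return outputs


print(fir_filter([1, 2, 3, 4], [1, 1]))  # [1, 3, 5, 7]
```

In hardware the taps can run in parallel, one per DSP slice, so a new output can be produced every clock cycle instead of after a loop.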

4. Clocking — the most underrated resource

A synchronous design's clock must reach thousands of flip-flops at almost the same instant. If it arrived at wildly different times (high skew), flip-flops would sample at the wrong moments and the design would fail. Ordinary routing can't deliver a clock cleanly, so FPGAs have dedicated clocking hardware: global low-skew clock networks driven by dedicated clock buffers (BUFG), plus PLLs/MMCMs that generate, multiply and clean up clocks.

💡 The conductor and the orchestra

The clock network is like a conductor whose beat must reach every musician (flip-flop) at once. You can't relay the beat person-to-person (ordinary routing) — it'd drift hopelessly. Instead there's a dedicated visual line of sight to all players (the global clock tree), and a PLL is the metronome that sets, multiplies and steadies the tempo.
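The "sets and multiplies the tempo" part comes down to simple arithmetic. A common way to describe a PLL's output is input frequency times a multiplier, divided by an input divider and an output divider. The sketch below assumes that general form; the parameter names and the 100 MHz oscillator are hypothetical, and real devices restrict the legal values and the internal oscillator range, which this sketch does not check.

```python
from fractions import Fraction


def pll_output_mhz(f_in_mhz: int, mult: int, div: int, out_div: int) -> Fraction:
    """f_out = f_in * mult / (div * out_div), kept exact with Fraction."""
    return Fraction(f_in_mhz * mult, div * out_div)


# Hypothetical settings: a 100 MHz board oscillator.
print(pll_output_mhz(100, mult=10, div=1, out_div=8))  # 125
print(pll_output_mhz(100, mult=10, div=1, out_div=4))  # 250
```

Two outputs that share the same multiplier and input divider come from one internal oscillator, which is how a single PLL can supply several related clocks.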

5. I/O blocks & gigabit transceivers

At the edges sit I/O blocks (IOBs) — configurable pin drivers/receivers supporting dozens of electrical standards (LVCMOS, LVDS, SSTL…), with built-in registers and delay elements for precise timing. High-end FPGAs also include hard gigabit transceivers (SerDes) — the very blocks from our SerDes Lab — for multi-gigabit links like PCIe and Ethernet, far faster than the general fabric could drive.

6. SoC FPGAs — a CPU on the same die

The ultimate hard block is an entire processor. SoC FPGAs (e.g. AMD Zynq, Intel SoC FPGAs) put hard Arm Cortex-A cores right next to the FPGA fabric on one chip. The CPU runs software (even Linux — recall the MMU/Linux discussion from the ARM course) while the fabric implements custom accelerators, and the two talk over an on-chip AXI bus. It's the best of both worlds: software flexibility plus hardware acceleration.
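From the software side, talking to the fabric usually looks like reading and writing registers at fixed addresses. The sketch below models that view only: a made-up adder accelerator with three registers, written and read by "CPU" code. The register offsets, the class and its behaviour are all hypothetical, and none of the bus's handshake signalling is modelled.

```python
class FabricAdder:
    """Hypothetical accelerator in the fabric with three 32-bit registers."""

    REG_A, REG_B, REG_SUM = 0x00, 0x04, 0x08  # made-up register offsets

    def __init__(self):
        self.regs = {self.REG_A: 0, self.REG_B: 0, self.REG_SUM: 0}

    def write(self, offset: int, value: int) -> None:
        """Models a bus write from the CPU to a register."""
        if offset not in (self.REG_A, self.REG_B):
            raise ValueError("write to unmapped or read-only offset")
        self.regs[offset] = value & 0xFFFFFFFF
        # The "hardware" updates its result whenever an operand changes.
        total = self.regs[self.REG_A] + self.regs[self.REG_B]
        self.regs[self.REG_SUM] = total & 0xFFFFFFFF

    def read(self, offset: int) -> int:
        """Models a bus read from the CPU."""
        return self.regs[offset]


# "Software" side: write the operands, then read the result back.
accel = FabricAdder()
accel.write(FabricAdder.REG_A, 40)
accel.write(FabricAdder.REG_B, 2)
print(accel.read(FabricAdder.REG_SUM))  # 42
```

The division of labour is the point: software decides what to compute and when, while the fabric does the computation itself.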

✅ The mental model

Beyond the soft LUT/FF fabric, an FPGA includes hard blocks for what every design needs: Block RAM (efficient on-chip dual-port memory), DSP slices (fast multiply-accumulate for signal/AI math), dedicated clocking (global low-skew networks + PLL/MMCM), flexible I/O and gigabit transceivers, and on SoC FPGAs a hard CPU. Using the right hard block instead of soft logic is how you get speed and capacity.

🎯 Day 3 takeaways

Quick check

  1. Why not build a large memory or a multiplier from LUTs?
  2. What makes Block RAM ideal for a FIFO between two clock domains?
  3. What can a PLL/MMCM do that ordinary routing cannot?
  4. What does an SoC FPGA combine, and how do the parts communicate?

FAQ

What is Block RAM?

Dedicated on-chip memory hard blocks (tens of Kb each), dual-port and combinable, used for buffers, FIFOs and tables — far more efficient than LUT-based memory.

What is a DSP slice?

A hard block with a fast multiplier plus adder/accumulator for multiply-accumulate, giving FPGAs high DSP and AI throughput.

Why special clocking?

Clocks need ultra-low skew to thousands of flip-flops; dedicated global networks, BUFG buffers and PLLs/MMCMs provide and shape them.

What is an SoC FPGA?

A chip combining hard Arm CPU cores with FPGA fabric (e.g. Zynq), running software and custom hardware together over an AXI bus.
