Everything so far has been integers. But graphics, physics, audio, scientific code and AI all live in the world of real numbers with fractions — and that needs floating point. Today we'll see how ARM does fractional math in hardware via the VFP/FPU, what IEEE 754 actually stores, and the subtle pitfalls (rounding, denormals, ABIs) that trip up real engineers.
You can do fractional math with integers (fixed-point), and tiny microcontrollers without an FPU must emulate floats in software, which is slow: often tens to hundreds of cycles per operation. A hardware FPU (Floating-Point Unit) executes a float add or multiply in just a few cycles. For any workload heavy in fractional arithmetic (3D transforms, DSP, neural nets), a hardware FPU is the difference between smooth and unusable.
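Here is a minimal C sketch of the fixed-point alternative, assuming a Q16.16 format: a 32-bit value with 16 integer bits and 16 fraction bits. The type `q16_16` and the `q_*` helpers are hypothetical names for illustration, not a standard API. Addition is ordinary integer addition. Multiplication needs a 64-bit intermediate and a shift, and this is the bookkeeping an FPU removes.

```c
#include <stdint.h>

/* Q16.16 fixed point (hypothetical type for illustration):
   16 integer bits and 16 fraction bits packed into an int32_t. */
typedef int32_t q16_16;

#define Q16_ONE 65536 /* 1.0 in Q16.16, i.e. 1 << 16 */

/* Conversions use double only to set up and inspect values;
   an FPU-less target would keep them out of hot code. */
q16_16 q_from_double(double x) { return (q16_16)(x * Q16_ONE); }
double q_to_double(q16_16 q) { return (double)q / Q16_ONE; }

/* Addition is a plain integer add. */
q16_16 q_add(q16_16 a, q16_16 b) { return a + b; }

/* Multiplication: widen to 64 bits so the product cannot overflow,
   then shift right by 16 to drop the extra fraction bits.
   (Right-shifting a negative value is implementation-defined in C;
   GCC and Clang use an arithmetic shift.) */
q16_16 q_mul(q16_16 a, q16_16 b) {
    return (q16_16)(((int64_t)a * b) >> 16);
}
```

For example, `q_mul(q_from_double(1.5), q_from_double(2.25))` converts back to exactly 3.375. The trade-off is a fixed range and resolution: Q16.16 tops out near ±32768, whereas a float's exponent lets its range slide.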
On ARM this unit is historically called the VFP (Vector Floating Point). In modern AArch64 it's tightly integrated with NEON (Day 27) — they share the same V register file.
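Floating point follows the IEEE 754 standard. A number is stored as three fields (a sign bit, an exponent, and a fraction, also called the mantissa), which is essentially scientific notation in binary. For normal numbers, value = (−1)^sign × 1.fraction × 2^(exponent − bias). The stored exponent is biased so it can be kept as an unsigned field: the bias is 15 for half, 127 for single and 1023 for double precision.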
| Precision | Bits | Sign / Exponent / Fraction bits | AArch64 register view |
|---|---|---|---|
| Half | 16 | 1 / 5 / 10 | H |
| Single (float) | 32 | 1 / 8 / 23 | S |
| Double (double) | 64 | 1 / 11 / 52 | D |
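The following C sketch shows the single-precision row in practice. It extracts the three fields from a `float` and rebuilds the value using the bias of 127. The struct and function names are illustrative assumptions. The reconstruction covers normal numbers only, because zeros, denormals, infinities and NaN use the all-zeros and all-ones exponent encodings.

```c
#include <stdint.h>
#include <string.h>

/* The three IEEE 754 single-precision fields of a float. */
struct f32_fields {
    uint32_t sign;     /* 1 bit */
    uint32_t exponent; /* 8 bits, stored with a bias of 127 */
    uint32_t fraction; /* 23 bits; the leading 1 of normal numbers is implicit */
};

/* Split a float into its fields. memcpy is the well-defined way in C
   to reinterpret the 32 bits of a float as an integer. */
struct f32_fields f32_decode(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    struct f32_fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* Rebuild a normal number (exponent field 1..254) from its fields:
   (-1)^sign * 1.fraction * 2^(exponent - 127). */
double f32_normal_value(struct f32_fields f) {
    double significand = 1.0 + (double)f.fraction / 8388608.0; /* 2^23 */
    int e = (int)f.exponent - 127;
    double scale = 1.0;
    for (int i = 0; i < e; i++) scale *= 2.0;  /* positive exponents */
    for (int i = 0; i > e; i--) scale /= 2.0;  /* negative exponents */
    return (f.sign ? -1.0 : 1.0) * significand * scale;
}
```

Decoding `-6.25f` gives sign 1, exponent 129 (that is, 2 + 127) and fraction 0x480000, because 6.25 = 1.5625 × 2^2.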
More exponent bits give a larger range; more fraction bits give more precision. The standard also defines special values: ±0, ±∞ (infinity, e.g. from 1/0 or overflow) and NaN (Not-a-Number, e.g. from 0/0). Half precision has surged in importance because machine-learning inference often uses 16-bit floats. Compared with single precision, they halve memory use and can roughly double throughput.
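The special values behave in ways that surprise people. This small C sketch uses the standard `<math.h>` macros `fpclassify` and `signbit` to name each category. It also builds a NaN from 0/0 as described above. The helper names are hypothetical. The division relies on C's IEEE 754 (Annex F) semantics, which mainstream compilers provide.

```c
#include <math.h>

/* Name the IEEE 754 category of a double. */
const char *fp_category(double x) {
    switch (fpclassify(x)) {
    case FP_ZERO:      return signbit(x) ? "-0" : "+0";
    case FP_INFINITE:  return signbit(x) ? "-inf" : "+inf";
    case FP_NAN:       return "NaN";
    case FP_SUBNORMAL: return "subnormal";
    default:           return "normal";
    }
}

/* Produce a NaN from 0/0. volatile stops the compiler from
   folding the division at compile time. */
double nan_from_zero_over_zero(void) {
    volatile double zero = 0.0;
    return zero / zero;
}
```

`-0.0 == 0.0` is true even though the two sign bits differ. A NaN compares unequal to everything, including itself, so `n != n` holds for any NaN `n`.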
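Because the fraction has a finite number of binary digits, many decimal values cannot be stored exactly. The classic example: in double precision, 0.1 + 0.2 evaluates to 0.30000000000000004, not 0.3. This isn't a bug; it's the nature of IEEE 754. Never compare computed floats with ==; compare them within a small tolerance (epsilon) scaled to the size of the values.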
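Below is a minimal C sketch of a tolerance comparison. It combines a relative tolerance, scaled by the larger magnitude, with an absolute floor for values near zero. The function name `nearly_equal` and its parameters are assumptions for illustration. The right tolerances depend on the computation, and no single epsilon is universal.

```c
#include <math.h>
#include <stdbool.h>

/* Approximate equality: relative tolerance scaled by the larger magnitude,
   plus an absolute floor for values near zero. */
bool nearly_equal(double a, double b, double rel_tol, double abs_tol) {
    if (a == b) return true;            /* exact match, including equal infinities */
    double diff = fabs(a - b);
    if (!isfinite(diff)) return false;  /* NaN inputs or an infinite gap */
    double larger = fabs(a) > fabs(b) ? fabs(a) : fabs(b);
    return diff <= abs_tol || diff <= rel_tol * larger;
}
```

With this helper, `0.1 + 0.2 == 0.3` is false but `nearly_equal(0.1 + 0.2, 0.3, 1e-9, 0.0)` is true.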
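In AArch64 the FP and NEON units share 32 registers, V0–V31, each 128 bits wide. For scalar floating point you access them through width-specific views: Hn (16-bit half precision), Sn (32-bit single precision) and Dn (64-bit double precision), with Qn naming the full 128 bits.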
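So H5, S5, D5 and V5 are the same physical register viewed at different widths. S5 is the low 32 bits of V5 and D5 is the low 64 bits. This mirrors how W and X share the general-purpose registers in Day 26, and as with a W register, writing a scalar result to a narrower view zeroes the rest of the register.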
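Note FMADD, the fused multiply-add. It computes a*b+c with a single rounding: for example, FMADD D0, D1, D2, D3 computes D1 × D2 + D3 into D0. One instruction replaces a separate multiply and add, which is usually faster. It is also more accurate, because the intermediate product is never rounded. It's the backbone of dot products and matrix math.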
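Two special registers govern FP behaviour. FPCR (Floating-point Control Register) selects the rounding mode and denormal handling such as flush-to-zero. FPSR (Floating-point Status Register) records cumulative exception flags (invalid operation, divide-by-zero, overflow, underflow, inexact) that stay set until software clears them.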
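Since results rarely fit exactly, IEEE 754 defines how to round them. The FPCR selects one of four modes:

- round to nearest, ties to even (the default)
- round toward +∞
- round toward −∞
- round toward zero (truncation)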
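Numbers smaller in magnitude than the smallest normal value (about 1.18 × 10^-38 in single precision) are stored as denormals, also called subnormals. They drop the implicit leading 1 so that values shrink gradually toward zero instead of jumping to it. Denormals preserve accuracy near zero but can be much slower on some hardware. Performance-critical audio/DSP code often sets the flush-to-zero (FZ) bit in FPCR, trading a sliver of accuracy for consistent speed.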
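The sketch below detects denormals with the standard `fpclassify` macro, using `FLT_MIN` from `<float.h>` (the smallest normal float, 2^-126) as the boundary. `flush_to_zero` is a hypothetical software model of the effect of FZ on a single value. The real FPCR bit is applied by the hardware to every operation, not through a function call.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* True if x is a denormal (subnormal) float: nonzero but smaller
   in magnitude than FLT_MIN. */
bool is_denormal(float x) {
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Hypothetical software model of flush-to-zero: replace a denormal
   with a zero of the same sign and leave everything else unchanged. */
float flush_to_zero(float x) {
    if (is_denormal(x))
        return signbit(x) ? -0.0f : 0.0f;
    return x;
}
```

`FLT_MIN` itself is normal, but `FLT_MIN / 2.0f` is a denormal, and `flush_to_zero` turns it into 0.0f.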
The float ABI catches many embedded developers. There are two ways floating point reaches the hardware, and they're incompatible ABIs:
| | Hard-float | Soft-float |
|---|---|---|
| FP ops | real FPU instructions | emulated in software libraries |
| FP args passed in | FP registers (S/D) | integer registers |
| Speed | fast | slow |
| Needs FPU? | yes | no |
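Every object you link must use the same float ABI. Mixing a hard-float library with soft-float code produces link errors or subtle, maddening bugs. The choice mostly matters on 32-bit ARM. On AArch64, the standard calling convention passes FP arguments in the V registers, so hard-float is effectively universal. On the smallest MCUs without an FPU, soft-float is the only option. A middle option, softfp, uses FPU instructions but passes arguments in integer registers, so it stays call-compatible with soft-float code. With GCC for 32-bit ARM, the three are selected with -mfloat-abi=soft, -mfloat-abi=softfp and -mfloat-abi=hard.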
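One way to make the ABI visible is to report it from compiler-predefined macros. The sketch below assumes the ACLE macros behave as GCC and Clang define them: `__ARM_PCS_VFP` for the 32-bit hard-float calling convention, and `__ARM_FP` when FPU instructions are enabled. The function name is hypothetical.

```c
/* Report which float ABI this translation unit is compiled for:
     __aarch64__    64-bit ARM; FP arguments go in V registers
     __ARM_PCS_VFP  32-bit ARM, hard-float calling convention
     __ARM_FP       32-bit ARM with FPU instructions (softfp if not hard)
   Any other target falls through to the last case. */
const char *float_abi_name(void) {
#if defined(__aarch64__)
    return "aarch64 (hard-float by convention)";
#elif defined(__arm__) && defined(__ARM_PCS_VFP)
    return "arm32 hard-float";
#elif defined(__arm__) && defined(__ARM_FP)
    return "arm32 softfp";
#elif defined(__arm__)
    return "arm32 soft-float";
#else
    return "not an ARM target";
#endif
}
```

You can log this string at startup, or turn the same `#if` conditions into an `#error` in a library's header. Either approach can surface a build made with the wrong `-mfloat-abi` before it causes the subtle bugs described above.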
The VFP/FPU does fractional math in hardware using IEEE 754 (sign + exponent + fraction) at half (H), single (S) and double (D) precision, in registers that share NEON's V file. FPCR sets rounding and denormal behaviour; FPSR records exception flags. Match your hard-/soft-float ABI everywhere, never compare floats with ==, and prefer single precision + FMA for speed.
VFP / FPU: ARM's hardware floating-point unit; it executes IEEE 754 math directly and shares the V register file with NEON in AArch64.
IEEE 754: the standard for representing floats as sign/exponent/fraction, with half, single and double precision plus rules for rounding, infinities and NaN.
Hard-float vs soft-float: hard-float uses real FPU instructions and FP registers for arguments; soft-float emulates FP in software. They are incompatible ABIs.
Why 0.1 + 0.2 ≠ 0.3: those decimals can't be stored exactly in finite binary, so compare floats within a tolerance, not with ==.