DAY 13 · THE INSTRUCTION SET

Multiply, Divide & Saturating Arithmetic

By EcrioniX · Updated Jun 6, 2026

Add and subtract are cheap; multiply is bigger, and divide is so costly that early ARM cores left it out entirely. Today: how ARM multiplies, why divide is special, and the saturating math that keeps DSP from glitching.

1. Multiply instructions

Instr | Computes | Notes
MUL Rd,Rn,Rm | Rd = Rn × Rm (low 32 bits) | basic 32×32→32
MLA Rd,Rn,Rm,Ra | Rd = Rn × Rm + Ra | multiply-accumulate
UMULL RdLo,RdHi,Rn,Rm | 64-bit = Rn × Rm | unsigned long (full result)
SMULL RdLo,RdHi,Rn,Rm | 64-bit = Rn × Rm | signed long
UMLAL / SMLAL | 64-bit += Rn × Rm | long multiply-accumulate
MUL   r0, r1, r2        ; r0 = r1 * r2 (low 32 bits)
MLA   r0, r1, r2, r3    ; r0 = r1*r2 + r3 (great for dot products)
UMULL r0, r1, r2, r3    ; {r1:r0} = r2 * r3 (full 64-bit)

Use MUL for a plain product, MLA for "multiply then add" (the heart of filters and dot products), and the long forms when 32×32 can overflow 32 bits and you need the full 64-bit answer.
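To make the difference between the 32-bit and long forms concrete, here is a rough C model of what each instruction computes. The function names (mul32, mla32, umull, smull, dot_product) are made up for this illustration and are not ARM intrinsics; treat it as a sketch of the arithmetic, not of how a compiler maps C to these instructions.

```c
#include <stdint.h>

/* MUL: only the low 32 bits of the product survive. */
uint32_t mul32(uint32_t rn, uint32_t rm) {
    return rn * rm;                 /* unsigned math wraps modulo 2^32 */
}

/* MLA: multiply, then add an accumulator (still 32-bit, still wrapping). */
uint32_t mla32(uint32_t rn, uint32_t rm, uint32_t ra) {
    return rn * rm + ra;
}

/* UMULL: full unsigned 64-bit product, i.e. {RdHi:RdLo}. */
uint64_t umull(uint32_t rn, uint32_t rm) {
    return (uint64_t)rn * rm;       /* widen first so nothing is lost */
}

/* SMULL: full signed 64-bit product. */
int64_t smull(int32_t rn, int32_t rm) {
    return (int64_t)rn * rm;
}

/* SMLAL-style dot product: each 32x32 product is added into a
   64-bit accumulator, so individual products cannot overflow. */
int64_t dot_product(const int32_t *a, const int32_t *b, int n) {
    int64_t acc = 0;
    for (int i = 0; i < n; i++) {
        acc += (int64_t)a[i] * b[i];
    }
    return acc;
}
```

With these definitions, mul32(0x10000, 0x10000) loses the result entirely (it returns 0), while umull on the same inputs keeps the full value 0x100000000.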

💡 Often you don't need MUL at all

For a constant multiplier, the barrel shifter (Day 9) is often cheaper: r1 × 5 is just ADD r0, r1, r1, LSL #2, because r1 + (r1 << 2) = r1 + 4·r1. Compilers prefer shift-and-add for small constants and save MUL for variable multipliers.
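The same trick written out in C, as a small sketch. The helper names (times5, times7, times10) are hypothetical, and the instruction shown in each comment is one plausible encoding rather than guaranteed compiler output.

```c
#include <stdint.h>

/* x * 5 = x + 4x          ->  ADD r0, r1, r1, LSL #2 */
uint32_t times5(uint32_t x) {
    return x + (x << 2);
}

/* x * 7 = 8x - x          ->  RSB r0, r1, r1, LSL #3 */
uint32_t times7(uint32_t x) {
    return (x << 3) - x;
}

/* x * 10 = (x + 4x) * 2   ->  two instructions: ADD with shift, then LSL #1 */
uint32_t times10(uint32_t x) {
    uint32_t t = x + (x << 2);
    return t << 1;
}
```

Because unsigned arithmetic wraps modulo 2^32 in both forms, these give the same low 32 bits as a real multiply for every input.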

2. Division — the expensive one

Division is hard to build in hardware (it's inherently iterative), so to stay small and fast many early ARM cores had no divide instruction at all. Software handled it instead, typically with a runtime library routine built from shifts, compares and subtracts.
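A minimal C sketch of the classic shift-and-subtract (restoring) algorithm that software division is typically built on. It produces one quotient bit per loop iteration, which is exactly why division is called iterative. The name soft_udiv and the choice to return 0 for a zero divisor are assumptions made for this example, not the behaviour of any particular ARM runtime library.

```c
#include <stdint.h>

/* Unsigned 32-bit divide, one quotient bit per iteration.
   Returns the quotient; stores the remainder through rem if non-NULL.
   Assumption for this sketch: dividing by zero returns 0 with remainder a. */
uint32_t soft_udiv(uint32_t a, uint32_t b, uint32_t *rem) {
    uint32_t q = 0;
    uint64_t r = 0;                       /* 64-bit so (r << 1) cannot overflow */

    if (b == 0) {
        if (rem) *rem = a;
        return 0;
    }
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((a >> i) & 1u);   /* bring down the next dividend bit */
        if (r >= b) {                     /* does the divisor fit? */
            r -= b;
            q |= (uint32_t)1 << i;        /* yes: set this quotient bit */
        }
    }
    if (rem) *rem = (uint32_t)r;
    return q;
}
```

Thirty-two trips through a compare, subtract and shift is far more work than a single ADD, which gives a feel for why divide is the expensive operation.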

Modern cores do include hardware divide:

SDIV r0, r1, r2   ; signed   r0 = r1 / r2
UDIV r0, r1, r2   ; unsigned r0 = r1 / r2

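SDIV/UDIV are available on Cortex-M3 and later ARMv7-M cores (M4, M7), on ARMv8-M cores, and on modern A-profile / A64; the small ARMv6-M cores (Cortex-M0/M0+) still have no hardware divide. Note there's no remainder instruction: you compute it as a − (a/b)×b with a divide, a multiply and a subtract (the last two can be fused into a single MLS, or MSUB on A64).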
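The remainder recipe a − (a/b)×b looks like this in C. The function names are invented for the illustration. C's / truncates toward zero just as SDIV does, so the remainder takes the sign of the dividend. This sketch assumes b is non-zero and, for the signed version, that the pair (INT32_MIN, −1) is never passed, since both cases are undefined behaviour in C.

```c
#include <stdint.h>

/* Signed remainder from a divide plus multiply-subtract.
   Assumes b != 0 and not (a == INT32_MIN && b == -1). */
int32_t remainder_s32(int32_t a, int32_t b) {
    int32_t q = a / b;      /* SDIV: quotient, truncated toward zero */
    return a - q * b;       /* multiply and subtract */
}

/* Unsigned remainder, same recipe with UDIV. Assumes b != 0. */
uint32_t remainder_u32(uint32_t a, uint32_t b) {
    uint32_t q = a / b;     /* UDIV */
    return a - q * b;
}
```

For example, remainder_s32(-17, 5) gives −2, because the quotient truncates to −3 and −17 − (−15) = −2.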

3. Saturating arithmetic

Normal arithmetic wraps on overflow: add 1 to the largest value and it flips to the smallest (a huge, ugly jump). In signal processing that wraparound is catastrophic — a loud click, a corrupted sample. Saturating arithmetic instead clamps to the max or min:

normal: 0x7FFFFFFF + 1 = 0x80000000 (wraps to most-negative ❌)
saturate: 0x7FFFFFFF + 1 = 0x7FFFFFFF (clamps to max ✓)
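The two behaviours can be modelled in a few lines of C. This is a sketch of the arithmetic only: qadd and qsub here are plain C functions named after the instructions, not intrinsics, and the sticky Q flag that the real instructions set on saturation is not modelled.

```c
#include <stdint.h>

/* Ordinary two's-complement add: wraps on overflow.
   The cast back to int32_t is implementation-defined in C,
   but wraps on the common compilers (GCC, Clang). */
int32_t wrapping_add(int32_t a, int32_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Saturating add: compute in 64 bits, then clamp to the 32-bit range. */
int32_t qadd(int32_t a, int32_t b) {
    int64_t sum = (int64_t)a + b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Saturating subtract: same idea. */
int32_t qsub(int32_t a, int32_t b) {
    int64_t diff = (int64_t)a - b;
    if (diff > INT32_MAX) return INT32_MAX;
    if (diff < INT32_MIN) return INT32_MIN;
    return (int32_t)diff;
}
```

Clamping works in both directions: qadd(INT32_MIN, -1) stays at the most-negative value instead of flipping to the most-positive one.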

ARM's DSP instructions do this in hardware:

Instr | Does
QADD Rd,Rn,Rm | saturating add
QSUB Rd,Rn,Rm | saturating subtract
SSAT / USAT | clamp a value to a given bit-width

These are central to fixed-point DSP on cores like the Cortex-M4/M7 (which add a DSP extension and SIMD). Audio, control loops and filters rely on saturation to degrade gracefully instead of glitching.
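SSAT and USAT are easiest to understand as "clamp to an n-bit range". Below is a hedged C model; ssat and usat are illustrative function names, and the optional shift the real instructions can apply to the input is left out. The sketch assumes n is 1..32 for the signed form and 0..31 for the unsigned form.

```c
#include <stdint.h>

/* SSAT model: clamp x to the signed n-bit range
   [-2^(n-1), 2^(n-1) - 1]. Assumes 1 <= n <= 32. */
int32_t ssat(int32_t x, unsigned n) {
    int64_t max = ((int64_t)1 << (n - 1)) - 1;
    int64_t min = -max - 1;
    if (x > max) return (int32_t)max;
    if (x < min) return (int32_t)min;
    return x;
}

/* USAT model: clamp signed x to the unsigned n-bit range
   [0, 2^n - 1]. Assumes 0 <= n <= 31. */
uint32_t usat(int32_t x, unsigned n) {
    int64_t max = ((int64_t)1 << n) - 1;
    if (x < 0) return 0;
    if (x > max) return (uint32_t)max;
    return (uint32_t)x;
}
```

A typical use is narrowing a 32-bit intermediate result to a 16-bit audio sample: ssat(40000, 16) yields 32767 rather than a wrapped negative value.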

✅ The mental model

MUL/MLA for products (and dot-products); UMULL/SMULL when you need all 64 bits. Divide is costly — avoid it (shift/reciprocal) or use SDIV/UDIV on newer cores. Saturating math (QADD/QSUB/SSAT) clamps instead of wrapping — essential for DSP.
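The "shift/reciprocal" shortcut for avoiding a divide can be sketched in C as follows. Dividing by a power of two is a plain shift; dividing by another constant can be done by multiplying with a precomputed scaled reciprocal and shifting the 64-bit product back down. The constant 0xCCCCCCCD (the rounded-up value of 2^35 / 10) is the well-known reciprocal for unsigned division by 10; the function names are made up for this example.

```c
#include <stdint.h>

/* Unsigned divide by 8: a single logical shift right. */
uint32_t div8(uint32_t x) {
    return x >> 3;
}

/* Unsigned divide by 10 without a divide instruction:
   multiply by ceil(2^35 / 10) and keep the bits above bit 35.
   On ARM this is a UMULL followed by a shift of the high word. */
uint32_t div10(uint32_t x) {
    uint64_t product = (uint64_t)x * 0xCCCCCCCDu;
    return (uint32_t)(product >> 35);
}
```

This only works when the divisor is known in advance; for a variable divisor you still need SDIV/UDIV or a software routine.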

🎯 Day 13 takeaways

Quick check

  1. Which instruction computes a×b + c in one step?
  2. Why did early ARM cores omit a divide instruction?
  3. What does a saturating add do at the maximum value instead of wrapping?

FAQ

What are the ARM multiply instructions?

MUL (low 32), MLA (multiply-accumulate), UMULL/SMULL (full 64-bit unsigned/signed), and UMLAL/SMLAL (64-bit accumulate).

Does ARM have divide?

Not on early cores, where software handled it. SDIV/UDIV exist on Cortex-M3 and later ARMv7-M/ARMv8-M cores (but not the Cortex-M0/M0+) and on modern A-profile/A64. There's no remainder instruction.

What is saturating arithmetic?

Math that clamps to max/min on overflow instead of wrapping (QADD/QSUB/SSAT) — essential for DSP and fixed-point.
