Welcome to Phase 4. Most of what you learned in Days 1–25 lived in the 32-bit world. Now we step into AArch64, the 64-bit architecture that powers modern phones, Apple Silicon Macs and AWS Graviton servers. The good news: it is in many ways cleaner and simpler than 32-bit ARM. Let's rebuild the programmer's model for 64 bits.
ARMv8-A introduced a fundamental split. A core can run in one of two execution states:
- AArch64: the 64-bit state, running the A64 instruction set.
- AArch32: the 32-bit compatibility state, running the A32 (ARM) and T32 (Thumb) instruction sets.

This is why a single chip that implements both states can still run old 32-bit apps while delivering full 64-bit performance to new ones. Crucially, the state can only change on an exception boundary (exception entry or return): you don't flip between 64-bit and 32-bit mid-function the way you interworked ARM and Thumb (Day 16). From here on we focus entirely on AArch64, the state all new software targets.
Here's the headline change. AArch64 gives you 31 general-purpose registers, each 64 bits wide, named X0 through X30. That's nearly double the 16 you had in AArch32 — and more registers means fewer trips to memory, which is a direct performance win.
Every X register has a 32-bit view called W0–W30 that accesses its lower 32 bits. Writing to a W register zeroes the upper 32 bits of the corresponding X register — a deliberate rule that avoids partial-register stalls.
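The W-view rule can be modelled in a few lines of Python. This is only an illustrative sketch: the helper names `read_w` and `write_w` are made up for this lesson, and registers are represented as plain integers.

```python
MASK32 = 0xFFFF_FFFF

def read_w(x_value: int) -> int:
    """Reading Wn returns the low 32 bits of Xn."""
    return x_value & MASK32

def write_w(old_x: int, value: int) -> int:
    """Writing Wn gives Xn a new value whose upper 32 bits are zero.

    old_x is accepted only to make the point that none of it survives.
    """
    return value & MASK32

x0 = 0xDEAD_BEEF_CAFE_F00D
w0 = read_w(x0)                 # low half only
x0 = write_w(x0, 0x1234_5678)   # the old upper half 0xDEADBEEF is gone
```

Note that `write_w` never merges with the old contents, which is exactly why the hardware does not have to wait for the previous value of the register.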
| Name | Width | Meaning |
|---|---|---|
| X0–X30 | 64-bit | general-purpose registers |
| W0–W30 | 32-bit | lower-half views of X0–X30 |
| XZR / WZR | 64/32-bit | the zero register (reads 0, writes discarded) |
| SP | 64-bit | the stack pointer (must stay 16-byte aligned) |
| PC | 64-bit | program counter — not a general-purpose register |
You may have noticed the registers stop at 30, not 31. Encoding slot 31 is special — depending on the instruction it means either the zero register (XZR/WZR) or the stack pointer (SP).
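A toy register file makes the dual meaning of slot 31 concrete. In real hardware the instruction encoding decides which meaning applies; the `as_sp` flag below is a hypothetical stand-in for that decision, and the class name `RegisterFile` is likewise invented for illustration.

```python
MASK64 = (1 << 64) - 1

class RegisterFile:
    def __init__(self):
        self.x = [0] * 31   # X0..X30
        self.sp = 0         # the stack pointer lives outside X0..X30

    def read(self, n: int, as_sp: bool = False) -> int:
        if n == 31:
            # Slot 31: SP for some instructions, the zero register for others.
            return self.sp if as_sp else 0
        return self.x[n]

    def write(self, n: int, value: int, as_sp: bool = False) -> None:
        value &= MASK64
        if n == 31:
            if as_sp:
                self.sp = value
            # Otherwise the target is XZR and the write is discarded.
            return
        self.x[n] = value

rf = RegisterFile()
rf.write(31, 123)                    # write to XZR: thrown away
rf.write(31, 0x8000, as_sp=True)     # write to SP: kept
```

After these two writes, reading slot 31 as the zero register still yields 0, while reading it as SP yields 0x8000.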
In AArch32, the program counter was r15, a general-purpose register you could read and even write to jump. Powerful, but a security and predictability nightmare. AArch64 removes the PC from the general register file. You can no longer name it as the destination of an ordinary arithmetic or load instruction; control flow happens only through proper branch instructions. This single change makes a whole class of exploits harder and makes the pipeline easier to build.
The link register is now X30: a BL (branch with link) saves the return address there, and you return with RET (which defaults to X30). Compare this to the AArch32 BX LR from Day 15.
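The BL/RET pairing can be sketched as a tiny Python model. The class `Cpu` and its methods are hypothetical names for this lesson; the only architectural facts assumed are that every A64 instruction is 4 bytes long and that RET uses X30 unless told otherwise.

```python
INSTR_SIZE = 4  # A64 instructions are a fixed 4 bytes

class Cpu:
    def __init__(self, pc: int = 0):
        self.pc = pc
        self.x = [0] * 31   # X0..X30

    def bl(self, target: int) -> None:
        # Save the address of the instruction after the BL, then jump.
        self.x[30] = self.pc + INSTR_SIZE
        self.pc = target

    def ret(self, n: int = 30) -> None:
        # RET branches to the address held in Xn (X30 by default).
        self.pc = self.x[n]

cpu = Cpu(pc=0x1000)
cpu.bl(0x2000)        # call: X30 = 0x1004, PC = 0x2000
link = cpu.x[30]
cpu.ret()             # return: PC = 0x1004
```

A function that itself calls another function must save X30 first, because the nested BL overwrites it.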
The old AArch32 modes (User, IRQ, FIQ, Supervisor…) from Day 5 are gone. AArch64 replaces them with a clean ladder of four Exception Levels:
| Level | Runs | Privilege |
|---|---|---|
| EL0 | applications (user space) | least |
| EL1 | OS kernel (Linux, Android) | ↑ |
| EL2 | hypervisor (virtualization) | ↑↑ |
| EL3 | secure monitor (TrustZone, Day 23) | most |
Each level has its own stack pointer (SP_EL0 to SP_EL3), and EL1 to EL3 each have their own exception-handling system registers (with the _EL1, _EL2 suffixes you saw in Day 24). Taking an exception either keeps you at the same level or moves you up; the ERET instruction returns you to the same or a lower level. This maps neatly onto modern software: app → kernel → hypervisor → secure firmware.
The single CPSR register from Day 3 is replaced by PSTATE: processor state held as a set of independently accessible fields rather than one packed word. The familiar condition flags live here:
- N: negative
- Z: zero
- C: carry
- V: overflow
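As a rough illustration of how the four condition flags N, Z, C and V are derived, here is a Python model of a 64-bit flag-setting add. The function name `adds64` is invented for this sketch; it mirrors what an ADDS on X registers computes, but it is a model, not an emulator.

```python
MASK64 = (1 << 64) - 1

def adds64(a: int, b: int):
    """Return (result, (N, Z, C, V)) for a 64-bit flag-setting add."""
    a &= MASK64
    b &= MASK64
    full = a + b                 # unbounded Python integer
    result = full & MASK64       # what fits in the register
    n = result >> 63             # top bit set: negative when read as signed
    z = int(result == 0)
    c = int(full > MASK64)       # unsigned carry out of bit 63
    # Signed overflow: both operands share a sign that the result lacks.
    v = (((a ^ result) & (b ^ result)) >> 63) & 1
    return result, (n, z, c, v)

wrap = adds64(MASK64, 1)                     # unsigned wrap-around
overflow = adds64(0x7FFF_FFFF_FFFF_FFFF, 1)  # signed overflow
```

The first call sets Z and C, because the sum wraps to zero. The second sets N and V, because the largest positive signed value plus one looks negative.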
On taking an exception, PSTATE is saved into SPSR_ELx and the return address into ELR_ELx — the 64-bit equivalent of the banked SPSR/LR mechanism from Day 17.
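The save-and-restore mechanism can be sketched in Python. Everything here is a simplification: the class `Core` is hypothetical, PSTATE is reduced to the current exception level plus a 4-bit NZCV value, and the caller supplies the return address because the real value depends on the kind of exception.

```python
class Core:
    def __init__(self):
        self.el = 0        # current exception level
        self.pc = 0
        self.nzcv = 0      # stand-in for the rest of PSTATE
        self.spsr = {1: None, 2: None, 3: None}   # SPSR_EL1..SPSR_EL3
        self.elr = {1: None, 2: None, 3: None}    # ELR_EL1..ELR_EL3

    def take_exception(self, target_el: int, vector: int, return_address: int) -> None:
        # Exceptions are never taken to EL0 and never to a lower level.
        if target_el < 1 or target_el > 3 or target_el < self.el:
            raise ValueError("invalid target exception level")
        self.spsr[target_el] = (self.el, self.nzcv)   # save PSTATE
        self.elr[target_el] = return_address          # save return address
        self.el = target_el
        self.pc = vector

    def eret(self) -> None:
        if self.el == 0:
            raise ValueError("ERET is not available at EL0")
        saved_el, saved_nzcv = self.spsr[self.el]
        return_address = self.elr[self.el]
        self.el, self.nzcv, self.pc = saved_el, saved_nzcv, return_address

core = Core()
core.pc, core.nzcv = 0x4000, 0b0110
core.take_exception(1, vector=0x800, return_address=0x4004)  # e.g. a system call
core.nzcv = 0          # the handler changes the flags freely
core.eret()            # back at EL0 with the old flags restored
```

Because each target level has its own SPSR and ELR, a hypervisor at EL2 can take an exception while a kernel's saved state in the EL1 registers stays untouched.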
A64 keeps the RISC spirit but tidies up the quirks:
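- Fixed-width 32-bit instructions.
- LDP and STP, which load or store a pair of registers at once.
- Conditional select instructions such as CSEL, CSET and CSINC that avoid branches without bloating the encoding.
- PC-relative addressing (ADR/ADRP) for the 64-bit address space.

The 64-bit procedure call standard (AAPCS64; cf. Day 15's AAPCS) takes full advantage of the larger register file: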
| Registers | Role | Preserved by |
|---|---|---|
| X0–X7 | arguments 1–8; X0 = return value | caller-saved |
| X8 | indirect result location | caller-saved |
| X9–X15 | scratch / temporaries | caller-saved |
| X16–X17 | intra-procedure-call (IP0/IP1) | caller-saved |
| X18 | platform register (reserved) | platform |
| X19–X28 | local variables | callee-saved |
| X29 | frame pointer (FP) | callee-saved |
| X30 | link register (LR) | special |
With eight argument registers instead of four, most function calls pass everything in registers and never touch the stack — a real speed advantage over AArch32.
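To make the register-versus-stack split concrete, here is a small Python sketch that says where each argument of a call would land. It assumes every argument is a 64-bit integer or pointer and that stack arguments occupy 8-byte slots starting at the stack pointer on entry; floating-point arguments, structs and platform-specific variations are ignored. The function name is invented for this lesson.

```python
def assign_integer_args(count: int) -> list[str]:
    """Return the location of each of `count` integer/pointer arguments."""
    locations = []
    for i in range(count):
        if i < 8:
            locations.append(f"X{i}")                  # arguments 1-8 in X0-X7
        else:
            locations.append(f"[SP, #{(i - 8) * 8}]")  # the rest go on the stack
    return locations

small_call = assign_integer_args(3)    # everything in registers
big_call = assign_integer_args(10)     # two arguments spill to the stack
```

Under the same assumptions the AArch32 convention would already spill at the fifth argument, which is why the wider register set matters for call-heavy code.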
AArch64 also opened the door to the Scalable Vector Extension (SVE/SVE2) — a vector instruction set whose register width is not fixed in the program (it can be 128 to 2048 bits depending on the chip), so the same binary scales across implementations. It's huge for HPC and machine learning. We'll meet ARM's mainstream vector engine, NEON, next lesson; SVE is its supercomputer-class cousin.
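The idea of code that does not know its own vector width can be sketched in plain Python. This is a model of the programming style, not of any real SVE instruction: `vla_add` is a hypothetical helper, elements are assumed to be 64 bits wide, and the `active` count plays the role of the predicate that masks off lanes past the end of the data.

```python
def vla_add(a: list[int], b: list[int], vector_bits: int):
    """Add two equal-length lists using 'vectors' of a width chosen at run time.

    Returns (result, iterations).
    """
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("vector width must be a multiple of 128 from 128 to 2048")
    lanes = vector_bits // 64            # 64-bit elements per vector
    out = [0] * len(a)
    iterations = 0
    i = 0
    while i < len(a):
        active = min(lanes, len(a) - i)  # predicate: only lanes that hold data
        for lane in range(active):
            out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes
        iterations += 1
    return out, iterations

data = list(range(10))
narrow = vla_add(data, [10] * 10, 128)    # 2 lanes per step
wide = vla_add(data, [10] * 10, 2048)     # 32 lanes per step
```

The same loop produces identical results at both widths; only the number of iterations changes, which is the property that lets one binary scale across chips.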
People assume 64-bit just means handling larger integers and more than 4 GB of RAM. True — but the bigger wins here are architectural: nearly 2× the registers, a cleaner instruction set, branchless conditional selects, 8 argument registers, and a simpler privilege model. That's why AArch64 code is often faster and easier for compilers to optimise.
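The branchless conditional selects mentioned above have simple semantics, modelled here in Python. The functions take the condition as a boolean, whereas the real instructions read it from the NZCV flags; the function names simply echo the mnemonics.

```python
MASK64 = (1 << 64) - 1

def csel(cond: bool, xn: int, xm: int) -> int:
    """CSEL Xd, Xn, Xm, cond: pick Xn if the condition holds, else Xm."""
    return xn if cond else xm

def csinc(cond: bool, xn: int, xm: int) -> int:
    """CSINC Xd, Xn, Xm, cond: pick Xn if the condition holds, else Xm + 1."""
    return xn if cond else (xm + 1) & MASK64

def cset(cond: bool) -> int:
    """CSET Xd, cond: 1 if the condition holds, else 0."""
    return 1 if cond else 0

def max_signed(a: int, b: int) -> int:
    # Roughly what 'CMP a, b' followed by 'CSEL Xd, a, b, GT' computes.
    return csel(a > b, a, b)

biggest = max_signed(3, 7)
```

CSET needs no encoding of its own: it behaves like CSINC with both sources set to the zero register and the condition inverted, which is one way the instruction set stays compact.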
AArch64 is the 64-bit state of ARMv8-A: 31 registers X0–X30 (with W views), a magic register-31 that's either XZR or SP, a PC that's no longer a GPR, four exception levels EL0–EL3 instead of modes, PSTATE instead of CPSR, the fixed-width A64 instruction set with LDP/STP and CSEL, and the AAPCS64 convention passing 8 args in registers. Cleaner, wider, faster.
AArch64 is the 64-bit execution state of ARMv8-A, running the A64 instruction set with 31 general-purpose registers and exception levels EL0–EL3.
X0–X30 are the full 64-bit registers; W0–W30 are their lower 32-bit views, and writing a W zeroes the upper 32 bits.
PSTATE holds the condition flags and masks; four exception levels (EL0–EL3) replace the old modes.
AArch32 is 32-bit compatibility (16 regs, modes, conditional execution); AArch64 is 64-bit with 31 regs, exception levels, fixed A64 instructions and better performance.