DAY 3 · THE RISC-V ISA

Instruction Formats — R, I, S, B, U, J

By EcrioniX · Updated Jun 7, 2026

Every instruction in the base RV32I ISA is exactly 32 bits: a single row of ones and zeros. But how does the CPU know that one row means "add" and another means "jump"? The answer is the instruction format: a fixed layout that says which bits mean what. Today we'll decode all six formats in plain English. This is the bridge from "assembly" to "what the hardware actually sees."

1. An instruction is a filled-in form

As we saw on Day 2, the CPU works with registers and a handful of operations. To tell it what to do, each instruction packs all the needed info into 32 bits. Think of it like a pre-printed form with labeled boxes:

💡 The form analogy

Imagine a tiny form with boxes: "Operation", "Put result in", "Input A", "Input B". To say "add register 5 and register 6, put it in register 7," you fill: Operation = add, Result = x7, A = x5, B = x6. An instruction format is just which boxes the form has — and a 32-bit instruction is the filled-in form, written as bits.

2. The fields (the boxes on the form)

RISC-V instructions are built from a small set of fields. Learn these seven and you can read almost any instruction:

Field    Bits    What it holds (plain English)
opcode   7       The broad kind of instruction (is it arithmetic? a load? a branch?).
rd       5       Destination register: where the result goes (5 bits = one of 32 registers).
rs1      5       Source register 1: first input.
rs2      5       Source register 2: second input.
funct3   3       Picks the operation within the opcode's family (e.g. add vs. xor vs. and).
funct7   7       Extra bits to distinguish operations that share a funct3 (e.g. add vs. sub).
imm      varies  Immediate: a constant value baked into the instruction (like the "1" in "add 1").

Not every instruction needs every box. An add needs three registers but no constant; addi needs a register and a constant; a jump needs a big offset. That's exactly why there are several formats — each is a different combination of these boxes.
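To make the "boxes" concrete, here is a minimal Python sketch that slices the fixed-position fields out of a 32-bit word with shifts and masks. The helper names `bits` and `split_fields` are made up for this illustration, and the bit ranges are the standard RV32I positions (opcode 6:0, rd 11:7, funct3 14:12, rs1 19:15, rs2 24:20, funct7 31:25). The sample word is the `add x7, x5, x6` encoding worked out in §6.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of a 32-bit word as an unsigned integer."""
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def split_fields(word: int) -> dict:
    """Slice out every fixed-position field; a given format uses only some of them."""
    return {
        "opcode": bits(word, 6, 0),
        "rd": bits(word, 11, 7),
        "funct3": bits(word, 14, 12),
        "rs1": bits(word, 19, 15),
        "rs2": bits(word, 24, 20),
        "funct7": bits(word, 31, 25),
    }


fields = split_fields(0x006283B3)  # add x7, x5, x6
print(fields["rd"], fields["rs1"], fields["rs2"])  # 7 5 6
```

The slicing never needs to know the format. Whether a slice is meaningful for a given instruction is decided afterwards, from the opcode.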

3. The six formats at a glance

RV32I defines six layouts of the same 32 bits. Here's the whole map — notice how rs1, rs2 and rd stay in the same place wherever they appear (that's deliberate — §5):

The 32 bits, six ways (bit 31 → bit 0)

  R   funct7 | rs2 | rs1 | f3 | rd | opcode                add, sub, and…
  I   imm[11:0] | rs1 | f3 | rd | opcode                   addi, lw, jalr
  S   imm[11:5] | rs2 | rs1 | f3 | imm[4:0] | opcode       sw, sh, sb
  B   imm(hi) | rs2 | rs1 | f3 | imm(lo) | opcode          beq, bne, blt
  U   imm[31:12] (upper 20 bits) | rd | opcode             lui, auipc
  J   imm (20-bit jump offset) | rd | opcode               jal
Figure — All six RV32I formats. The opcode (bits 6:0) is always last; rs1/rs2/rd never move.
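Since the opcode always sits in bits 6:0, a decoder can pick the format from those seven bits alone. The sketch below shows one way to do that in Python. The table name `OPCODE_FORMAT` and the function `format_of` are hypothetical, and the table is deliberately incomplete: it lists the RV32I opcodes behind the examples in the figure and leaves out the fence and system instructions.

```python
# Low 7 bits of the instruction -> format letter (partial RV32I table).
OPCODE_FORMAT = {
    0b0110011: "R",  # OP: add, sub, and, ...
    0b0010011: "I",  # OP-IMM: addi, ...
    0b0000011: "I",  # LOAD: lw, lh, lb, ...
    0b1100111: "I",  # JALR
    0b0100011: "S",  # STORE: sw, sh, sb
    0b1100011: "B",  # BRANCH: beq, bne, blt, ...
    0b0110111: "U",  # LUI
    0b0010111: "U",  # AUIPC
    0b1101111: "J",  # JAL
}


def format_of(word: int) -> str:
    """Look up the format from the opcode in bits 6:0."""
    opcode = word & 0x7F
    return OPCODE_FORMAT.get(opcode, "unknown")


print(format_of(0x006283B3))  # add x7, x5, x6  -> R
print(format_of(0x00128293))  # addi x5, x5, 1  -> I
```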

4. What each format is for

5. The clever part: why the boxes don't move

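Look again at the diagram: rs1, rs2 and rd are in the exact same bit positions in every format that uses them, and the opcode is always bits 6:0. This is deliberate: because those fields never move, the hardware can read out the register numbers without first working out which format it is looking at, which speeds up decoding.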

The price: the immediate gets chopped up and scattered (most visibly in the S, B and J formats) so the register fields can stay put. It looks messy, but the sign bit is always bit 31, so the hardware can sign-extend quickly. Don't memorize the exact immediate bit-shuffling; tools and our decoder handle it. Just know why it's split.
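For the curious, here is roughly what that bit-shuffling looks like for the S and B formats, as a Python sketch rather than the series' own decoder. The function names are invented for this example. The bit placements follow the RV32I layout: S-type stores imm[11:5] in bits 31:25 and imm[4:0] in bits 11:7, while B-type stores imm[12] in bit 31, imm[10:5] in bits 30:25, imm[4:1] in bits 11:8 and imm[11] in bit 7, with imm[0] always zero.

```python
def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def imm_s(word: int) -> int:
    """Reassemble the 12-bit store offset from its two pieces."""
    hi = (word >> 25) & 0x7F  # imm[11:5]
    lo = (word >> 7) & 0x1F   # imm[4:0]
    return sign_extend((hi << 5) | lo, 12)


def imm_b(word: int) -> int:
    """Reassemble the 13-bit branch offset (bit 0 is implicitly zero)."""
    imm = (
        ((word >> 31) & 0x1) << 12    # imm[12], the sign bit
        | ((word >> 7) & 0x1) << 11   # imm[11]
        | ((word >> 25) & 0x3F) << 5  # imm[10:5]
        | ((word >> 8) & 0xF) << 1    # imm[4:1]
    )
    return sign_extend(imm, 13)


print(imm_s(0x0062A423))  # sw x6, 8(x5)    -> 8
print(imm_s(0xFE62AE23))  # sw x6, -4(x5)   -> -4
print(imm_b(0x00628463))  # beq x5, x6, 8   -> 8
print(imm_b(0xFE000EE3))  # beq x0, x0, -4  -> -4
```

In both functions the sign comes from bit 31 of the word, which is the point of the layout: the sign is known before the rest of the immediate has been pieced together.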

6. Worked example: encoding "add x7, x5, x6"

Let's turn one line of assembly into the actual 32 bits the CPU sees. add is R-type.

encoding-add.txt — R-type worked example
Instruction:  add x7, x5, x6      (x7 = x5 + x6)   --> R-type

Fill the R-type boxes:
  funct7 = 0000000   (add; for sub it would be 0100000)
  rs2    = x6 = 6  = 00110
  rs1    = x5 = 5  = 00101
  funct3 = 000       (add/sub family)
  rd     = x7 = 7  = 00111
  opcode = 0110011   (OP - register-register arithmetic)

Lay them out, bit 31 -> bit 0:
  0000000 | 00110 | 00101 | 000 | 00111 | 0110011
  funct7    rs2     rs1    f3    rd      opcode

As one 32-bit word:
  0000 0000 0110 0010 1000 0011 1011 0011
  = 0x006283B3

That single number IS the instruction. Memory hands the CPU 0x006283B3,
the decoder slices out the fields, and the ALU computes x5 + x6 into x7.

That hex value, 0x006283B3, is exactly what sits in instruction memory and what the CPU fetches. On Day 11 we'll build the decoder — the hardware that chops a 32-bit word back into opcode/rd/rs1/rs2/imm so the rest of the CPU can act on it.
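The worksheet can be checked mechanically. The Python sketch below packs the six R-type boxes into one word by shifting each field to its position. `encode_r` is a name chosen for this example, and it assumes each argument already fits its field width.

```python
def encode_r(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack R-type fields into a 32-bit word, bit 31 -> bit 0."""
    return (
        (funct7 << 25)
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (rd << 7)
        | opcode
    )


OP = 0b0110011  # register-register arithmetic

add_word = encode_r(0b0000000, 6, 5, 0b000, 7, OP)  # add x7, x5, x6
sub_word = encode_r(0b0100000, 6, 5, 0b000, 7, OP)  # sub x7, x5, x6

print(f"0x{add_word:08X}")  # 0x006283B3
print(f"0x{sub_word:08X}")  # 0x406283B3
```

Only funct7 changes between the two calls, so add and sub differ in a single bit of the final word (bit 30).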

✅ Day 3 in one line

A RISC-V instruction is 32 bits arranged into fields — opcode, rd, rs1, rs2, funct3/7 and an immediate. RV32I has six formats (R, I, S, B, U, J), each a different mix of those fields for different jobs. The register fields never move (for speed), so the immediate is split instead. Assembly like add x7,x5,x6 becomes one number (0x006283B3) the CPU decodes and executes.

🎯 Day 3 takeaways

Quick check

  1. Which format does add use, and which does addi use? Why are they different?
  2. What do funct3 and funct7 decide?
  3. Why does RISC-V keep rs1/rs2/rd in the same bit positions across formats?
  4. Which formats carry a destination register rd, and which don't?

FAQ

What is an instruction format?

The fixed layout of bits inside a 32-bit instruction — which bits are the opcode, registers, and immediate. RV32I has six: R/I/S/B/U/J.

What are the fields?

opcode (kind), rd (destination), rs1/rs2 (sources), funct3/funct7 (exact operation), and imm (a constant). Not every format uses all of them.

Why six formats?

Different instructions need different info — three registers, a constant, an address, or a jump offset — so each format is a different mix of fields.

Why is the immediate split up?

So the register fields can stay in fixed positions (which speeds up decoding); the sign bit stays at bit 31 for fast sign-extension.
