
RISC-V Custom ISA Extension

RISC-V sets aside four major opcodes for user-defined instructions. Learn exactly how to encode, emit, decode, and execute custom instructions, from bit-field layout to GCC inline assembly to Verilog decode logic.

By EcrioniX Engineering Team · Published June 19, 2026 · ~4,600 words · 15 min read

1. The Four Custom Opcode Spaces

The RISC-V ISA specification divides the 32-bit instruction space into major opcodes using bits [6:0]. Standard extensions (I, M, A, F, D, V, …) occupy specific opcodes. Four major opcodes are set aside for custom use. custom-0 and custom-1 are reserved outright for custom extensions. custom-2 and custom-3 are earmarked for a possible future RV128 base ISA, but standard extensions otherwise avoid them, so they can also carry custom instructions on RV32 and RV64.

RISC-V 32-bit instruction opcode map (opcode field = bits [6:0]; bits [1:0] are always 11 for 32-bit instructions):

  Opcode [6:2]   Hex    Name             Reserved for
  ------------   ----   --------------   --------------------------------------------
  00010          0x0B   custom-0         Custom extensions
  01010          0x2B   custom-1         Custom extensions
  10110          0x5B   custom-2/rv128   Custom extensions on RV32/RV64; future RV128
  11110          0x7B   custom-3/rv128   Custom extensions on RV32/RV64; future RV128

Full 7-bit opcode values:
  custom-0: 0x0B = 000_1011
  custom-1: 0x2B = 010_1011
  custom-2: 0x5B = 101_1011
  custom-3: 0x7B = 111_1011

Instruction variants per opcode space:
  Each opcode supports R / I / S / U type encoding
  R-type: funct3 (3 bits) × funct7 (7 bits) = 8 × 128 = 1024 distinct operations
  I-type: funct3 (3 bits) selects the operation, imm[11:0] carries an immediate operand
  Total across 4 spaces: 4 × 1024 = 4096 distinct R-type custom instructions
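As a minimal sketch of how the opcode map above translates into code, the C helper below classifies a 32-bit instruction word by its major opcode. The names custom_space and OPC_CUSTOM0..3 are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Major opcodes (bits [6:0]) of the four custom spaces, taken from the map above. */
enum {
    OPC_CUSTOM0 = 0x0B,
    OPC_CUSTOM1 = 0x2B,
    OPC_CUSTOM2 = 0x5B,
    OPC_CUSTOM3 = 0x7B
};

/* Hypothetical helper: returns 0..3 for custom-0..custom-3, or -1 for any other opcode. */
static int custom_space(uint32_t instr)
{
    switch (instr & 0x7Fu) {          /* keep only the opcode field, bits [6:0] */
    case OPC_CUSTOM0: return 0;
    case OPC_CUSTOM1: return 1;
    case OPC_CUSTOM2: return 2;
    case OPC_CUSTOM3: return 3;
    default:          return -1;
    }
}

/* Usage example: one custom-0 word and one standard instruction. */
static void custom_space_example(void)
{
    assert(custom_space(0x00C5850Bu) == 0);   /* R-type word with opcode 0x0B */
    assert(custom_space(0x00000033u) == -1);  /* add x0, x0, x0 (opcode 0x33) */
}
```

A simulator or disassembler could call a helper like this first and only then look at funct3/funct7 to pick the operation.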

Why "permanently reserved"?

RISC-V's modularity promise means that any conforming implementation can ignore extensions it doesn't support. The custom opcode spaces are deliberately kept out of the standard allocation process: the specification commits custom-0 and custom-1 to custom extensions, and states that standard extensions will avoid custom-2 and custom-3 apart from a possible future RV128. This gives SoC designers a stable space for proprietary instructions whose encodings will not be claimed by a ratified extension, so they won't collide with instructions that future GCC/LLVM releases learn to emit.

2. Custom Instruction Encoding — R-Type Deep Dive

R-type is the most useful encoding for custom instructions that operate on CPU registers. The format gives you two source registers (rs1, rs2), one destination register (rd), and 10 bits of sub-operation selection (funct3 + funct7).

RISC-V R-Type Custom Instruction Encoding

  Bits      Field    Width
  [31:25]   funct7   7 bits
  [24:20]   rs2      5 bits
  [19:15]   rs1      5 bits
  [14:12]   funct3   3 bits
  [11:7]    rd       5 bits
  [6:0]     opcode   7 bits (0x0B for custom-0)

Example: custom_mac a0, a1, a2 → rd=a0 (x10), rs1=a1 (x11), rs2=a2 (x12), funct3=0, funct7=0

  funct7    rs2     rs1     funct3   rd      opcode
  0000000   01100   01011   000      01010   0001011

  Binary: 0000000_01100_01011_000_01010_0001011 = 0x00C5850B

Fig 1: R-type custom-0 instruction encoding. funct7+funct3 together select 1 of 1024 possible sub-operations. rd, rs1, rs2 index the CPU integer register file.

Sub-operation encoding strategy with funct7 + funct3:

  {funct7, funct3} = 10-bit sub-operation selector

  Example allocation for a custom math accelerator:
    {7'b000_0000, 3'b000} → CUSTOM_MAC     (multiply-accumulate: rd = rs1*rs2 + accum)
    {7'b000_0000, 3'b001} → CUSTOM_CLEAR   (clear accumulator: accum = 0)
    {7'b000_0000, 3'b010} → CUSTOM_DOT4    (4-element dot product)
    {7'b000_0001, 3'b000} → CUSTOM_AES_ENC (AES round encrypt)
    {7'b000_0001, 3'b001} → CUSTOM_AES_DEC (AES round decrypt)
    {7'b000_0010, 3'b000} → CUSTOM_SHA256  (SHA256 compression round)

  Using custom-1 (0x2B) for a separate crypto unit:
    {7'b000_0000, 3'b000} → CRYPTO_HASH
    {7'b000_0000, 3'b001} → CRYPTO_VERIFY
    ... up to 1024 operations in this opcode space
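The field layout and the 10-bit selector above can be checked with a few lines of host-side C. This is only a sketch: encode_rtype and subop_selector are hypothetical helper names, and the argument order is chosen to mirror the `.insn r` operand order (opcode, funct3, funct7, rd, rs1, rs2).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack the six R-type fields into a 32-bit instruction word. */
static uint32_t encode_rtype(uint32_t opcode, uint32_t funct3, uint32_t funct7,
                             uint32_t rd, uint32_t rs1, uint32_t rs2)
{
    /* Reject values that do not fit their bit-fields. */
    assert(opcode < 128 && funct3 < 8 && funct7 < 128);
    assert(rd < 32 && rs1 < 32 && rs2 < 32);

    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
           (funct3 << 12) | (rd << 7)   | opcode;
}

/* Hypothetical helper: recover the 10-bit {funct7, funct3} selector from a word. */
static uint32_t subop_selector(uint32_t instr)
{
    uint32_t funct7 = (instr >> 25) & 0x7Fu;
    uint32_t funct3 = (instr >> 12) & 0x7u;
    return (funct7 << 3) | funct3;
}
```

Calling encode_rtype(0x0B, 0, 0, 10, 11, 12) reproduces the custom_mac a0, a1, a2 word 0x00C5850B, and the selector for CUSTOM_AES_ENC (funct7=1, funct3=0) comes out as 8.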

3. GCC .insn Directive — Emitting Custom Instructions

The RISC-V GNU assembler (GAS) provides the .insn directive to emit arbitrary instruction encodings without modifying the assembler or compiler. This is the fastest path to testing a custom instruction in simulation — no toolchain patches required.

.insn Directive Syntax

.insn type opcode, [operands...]

R-type syntax:
  .insn r opcode, funct3, funct7, rd, rs1, rs2
    opcode   = immediate (0x0B for custom-0)
    funct3   = immediate (0–7)
    funct7   = immediate (0–127)
    rd       = destination register
    rs1, rs2 = source registers

  Example, emit custom_mac a0, a1, a2:
    .insn r 0x0B, 0, 0, a0, a1, a2
    → encodes: funct7=0, rs2=a2, rs1=a1, funct3=0, rd=a0, opcode=0x0B
    → binary:  0x00C5850B

I-type syntax (immediate operand):
  .insn i opcode, funct3, rd, rs1, imm

  Example:
    .insn i 0x0B, 1, a0, a1, 42
    → rs1=a1, imm=42, funct3=1, rd=a0, opcode=0x0B
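The I-type example above gives the fields but not the resulting word. A small C sketch of the I-type packing makes it easy to work out. encode_itype is a hypothetical helper, and the signed 12-bit range check reflects the standard I-type immediate width.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack I-type fields in the order used by
 * ".insn i opcode, funct3, rd, rs1, imm". */
static uint32_t encode_itype(uint32_t opcode, uint32_t funct3,
                             uint32_t rd, uint32_t rs1, int32_t imm)
{
    assert(opcode < 128 && funct3 < 8 && rd < 32 && rs1 < 32);
    assert(imm >= -2048 && imm <= 2047);        /* signed 12-bit immediate */

    uint32_t imm12 = (uint32_t)imm & 0xFFFu;    /* two's-complement, 12 bits */
    return (imm12 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode;
}
```

With these assumptions, encode_itype(0x0B, 1, 10, 11, 42) yields 0x02A5950B, which is the word the assembler should emit for `.insn i 0x0B, 1, a0, a1, 42`.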
Assembly — custom instructions with .insn directive
# RISC-V assembly using .insn for custom-0 instructions
# Assemble with: riscv64-unknown-elf-as -march=rv32i custom_test.s

    .section .text
    .global custom_mac_test

# Custom instruction definitions (using .insn directive)
# CUSTOM_MAC:   rd = rs1 * rs2 + accumulator (funct3=0, funct7=0)
# CUSTOM_CLEAR: clear accumulator            (funct3=1, funct7=0)

custom_mac_test:
    li a1, 3                        # a1 = 3 (rs1)
    li a2, 4                        # a2 = 4 (rs2)

    # Clear the accumulator first
    .insn r 0x0B, 1, 0, a0, a0, a0  # CUSTOM_CLEAR (rd=a0, ignored)

    # Compute 3*4 + 0 = 12
    .insn r 0x0B, 0, 0, a0, a1, a2  # CUSTOM_MAC: a0 = a1*a2 + accum

    # Compute 3*4 + 12 = 24 (accumulates)
    .insn r 0x0B, 0, 0, a0, a1, a2  # CUSTOM_MAC: a0 = a1*a2 + 12

    ret

4. C Intrinsics — Calling Custom Instructions from C

For production use, wrap the .insn directive in a static inline C function using GCC inline assembly. This gives you a callable C function with proper register constraints, and the compiler handles register allocation automatically.

C — custom instruction macros and intrinsics
#ifndef CUSTOM_INSN_H
#define CUSTOM_INSN_H

#include <stdint.h>

/*
 * CUSTOM_MAC: result = rs1 * rs2 + accumulator
 * Uses custom-0 opcode (0x0B), funct3=0, funct7=0
 */
static inline uint32_t custom_mac(uint32_t rs1, uint32_t rs2) {
    uint32_t rd;
    __asm__ volatile (
        ".insn r 0x0B, 0, 0, %0, %1, %2"
        : "=r"(rd)              /* output: rd register */
        : "r"(rs1), "r"(rs2)    /* inputs: rs1, rs2 */
        :                       /* no clobbers */
    );
    return rd;
}

/*
 * CUSTOM_CLEAR: clear the on-chip accumulator
 * Uses custom-0 opcode (0x0B), funct3=1, funct7=0
 * volatile keeps the compiler from deleting or reordering this
 * instruction, which has no register outputs.
 */
static inline void custom_clear(void) {
    __asm__ volatile (
        ".insn r 0x0B, 1, 0, x0, x0, x0"
        :   /* no outputs */
        :   /* no inputs */
        :   /* no clobbers */
    );
}

/*
 * CUSTOM_DOT4: 4-element dot product (packed 8-bit in 32-bit word)
 * a = {a3,a2,a1,a0} packed bytes, b = {b3,b2,b1,b0}
 * accumulator += a0*b0 + a1*b1 + a2*b2 + a3*b3
 * The running total can be read back with custom_mac(0, 0).
 * Uses custom-0 opcode (0x0B), funct3=2, funct7=0
 */
static inline uint32_t custom_dot4(uint32_t a, uint32_t b) {
    uint32_t rd;
    __asm__ volatile (
        ".insn r 0x0B, 2, 0, %0, %1, %2"
        : "=r"(rd)
        : "r"(a), "r"(b)
    );
    return rd;
}

#endif /* CUSTOM_INSN_H */
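Code that calls these wrappers cannot run on a development host, because the `.insn` lines only assemble for RISC-V. One possible way around that, sketched below, is a build-time switch between the real instruction and a software stand-in. Everything here is an assumption for illustration: the names mac_op and clear_op, the HAVE_CUSTOM0_MAC flag, and the stand-in's behavior (a 64-bit accumulator whose low 32 bits are returned, following the documented semantics rd = rs1*rs2 + accum).

```c
#include <assert.h>
#include <stdint.h>

#if defined(__riscv) && defined(HAVE_CUSTOM0_MAC)   /* hypothetical build flag */

/* Hardware path: same encodings as the wrappers above. */
static inline uint32_t mac_op(uint32_t rs1, uint32_t rs2)
{
    uint32_t rd;
    __asm__ volatile (".insn r 0x0B, 0, 0, %0, %1, %2"
                      : "=r"(rd) : "r"(rs1), "r"(rs2));
    return rd;
}

static inline void clear_op(void)
{
    __asm__ volatile (".insn r 0x0B, 1, 0, x0, x0, x0");
}

#else

/* Software stand-in for host builds (assumed behavior, not the RTL itself). */
static uint64_t model_accum;

static inline void clear_op(void)
{
    model_accum = 0;
}

static inline uint32_t mac_op(uint32_t rs1, uint32_t rs2)
{
    model_accum += (uint64_t)rs1 * rs2;   /* unsigned 32x32 -> 64-bit product */
    return (uint32_t)model_accum;         /* rd receives the low 32 bits */
}

#endif

/* Usage example mirroring the assembly test: 3*4 + 0, then 3*4 + 12. */
static void mac_example(void)
{
    clear_op();
    assert(mac_op(3, 4) == 12);
    assert(mac_op(3, 4) == 24);
}
```

The design choice is that application code only ever sees mac_op and clear_op, so unit tests can run on the host while the same source uses the accelerator on the target.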
C — using the custom instruction intrinsics
#include <stdint.h>
#include <stdio.h>
#include "custom_insn.h"

/* Pack 4 signed bytes into a 32-bit word, p[0] in bits [7:0].
 * Each byte is widened to uint32_t before shifting so that the
 * shift by 24 cannot overflow a signed int. */
static inline uint32_t pack4(const int8_t *p) {
    return  (uint32_t)(uint8_t)p[0]
         | ((uint32_t)(uint8_t)p[1] << 8)
         | ((uint32_t)(uint8_t)p[2] << 16)
         | ((uint32_t)(uint8_t)p[3] << 24);
}

/* Dot product of two int8 arrays using custom instruction.
 * len must be a multiple of 4: each step consumes 4 elements. */
int32_t dot_product(const int8_t *a, const int8_t *b, int len) {
    custom_clear();                     /* reset accumulator */
    for (int i = 0; i < len; i += 4) {
        uint32_t va = pack4(&a[i]);
        uint32_t vb = pack4(&b[i]);
        custom_dot4(va, vb);            /* accumulates internally */
    }
    /* Final MAC with zero to read accumulator */
    return (int32_t)custom_mac(0, 0);
}

int main(void) {
    int8_t a[] = {1, 2, 3, 4};
    int8_t b[] = {5, 6, 7, 8};
    /* Expected: 1*5 + 2*6 + 3*7 + 4*8 = 5+12+21+32 = 70 */
    printf("dot product = %d\n", (int)dot_product(a, b, 4));
    return 0;
}

/* Compile (printf needs the C library, so do not pass -nostdlib):
   riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -O2 -o dot_test dot_test.c */
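To check the accelerator's answer against something independent, a plain-C reference of the same packed dot product is useful. The sketch below assumes the byte order used above (element 0 in bits [7:0]) and signed 8-bit lanes. The function names are hypothetical, and the zero-padding of a partial final group is an added convenience rather than something the original code does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four signed bytes into one word, p[0] in bits [7:0]. */
static uint32_t pack4_i8(const int8_t p[4])
{
    return  (uint32_t)(uint8_t)p[0]
         | ((uint32_t)(uint8_t)p[1] << 8)
         | ((uint32_t)(uint8_t)p[2] << 16)
         | ((uint32_t)(uint8_t)p[3] << 24);
}

/* Reference for one DOT4 step: four signed byte products, summed. */
static int32_t dot4_ref(uint32_t a, uint32_t b)
{
    int32_t sum = 0;
    for (int lane = 0; lane < 4; lane++) {
        int8_t x = (int8_t)(uint8_t)(a >> (8 * lane));
        int8_t y = (int8_t)(uint8_t)(b >> (8 * lane));
        sum += (int32_t)x * y;
    }
    return sum;
}

/* Reference dot product; a short final group is zero-padded (assumption). */
static int32_t dot_product_ref(const int8_t *a, const int8_t *b, size_t len)
{
    assert(a != NULL && b != NULL);

    int32_t acc = 0;
    for (size_t i = 0; i < len; i += 4) {
        int8_t ta[4] = {0}, tb[4] = {0};
        for (size_t k = 0; k < 4 && i + k < len; k++) {
            ta[k] = a[i + k];
            tb[k] = b[i + k];
        }
        acc += dot4_ref(pack4_i8(ta), pack4_i8(tb));
    }
    return acc;
}
```

For the arrays in main above, dot_product_ref(a, b, 4) gives 70, the same value the comment expects from the hardware path.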

5. Verilog Decode Logic — Intercepting Custom Instructions

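In a simple in-order RISC-V pipeline (like the one built in RISC-V From Scratch), custom instructions are decoded at the Decode stage. The opcode field selects the custom execution unit, and funct3/funct7 select the specific operation. The modules below use SystemVerilog constructs (logic, always_comb, always_ff), so compile them in SystemVerilog mode.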

Verilog — custom instruction decode and dispatch
// Custom instruction decoder: plug into the main decode stage
// Detects custom-0 (0x0B) instructions and routes to custom EX unit
module custom_decode (
    input  logic [31:0] instr,      // raw 32-bit instruction word
    output logic        is_custom,  // 1 = this is a custom instruction
    output logic [6:0]  funct7,
    output logic [2:0]  funct3,
    output logic [4:0]  rd, rs1, rs2,
    output logic [2:0]  custom_op   // decoded operation for EX unit
);
    localparam CUSTOM0_OP = 7'b000_1011; // 0x0B

    // Extract fields
    assign funct7 = instr[31:25];
    assign rs2    = instr[24:20];
    assign rs1    = instr[19:15];
    assign funct3 = instr[14:12];
    assign rd     = instr[11:7];

    // Detect custom-0 opcode
    assign is_custom = (instr[6:0] == CUSTOM0_OP);

    // Map the full 10-bit {funct7, funct3} selector to an operation code.
    // Matching all of funct7 (not just bit 0) keeps unallocated encodings
    // from aliasing onto valid operations.
    always_comb begin
        custom_op = 3'd7; // default: ILLEGAL_CUSTOM (ignored when is_custom = 0)
        if (is_custom) begin
            case ({funct7, funct3})
                10'b0000000_000: custom_op = 3'd0; // CUSTOM_MAC
                10'b0000000_001: custom_op = 3'd1; // CUSTOM_CLEAR
                10'b0000000_010: custom_op = 3'd2; // CUSTOM_DOT4
                10'b0000001_000: custom_op = 3'd3; // CUSTOM_AES_ENC
                10'b0000010_000: custom_op = 3'd4; // CUSTOM_SHA256
                10'b0000001_001: custom_op = 3'd5; // CUSTOM_AES_DEC
                default:         custom_op = 3'd7; // ILLEGAL_CUSTOM
            endcase
        end
    end
endmodule
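A software mirror of the decoder is handy for generating expected values or for an instruction-set simulator. The C sketch below follows the {funct7, funct3} allocation listed in Section 2 and compares the whole 10-bit selector. The type and function names are hypothetical, and the numeric code chosen for AES_DEC (5) is an assumption made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Operation codes handed to the execution unit (AES_DEC = 5 is assumed). */
typedef enum {
    OP_MAC = 0, OP_CLEAR = 1, OP_DOT4 = 2, OP_AES_ENC = 3,
    OP_SHA256 = 4, OP_AES_DEC = 5, OP_ILLEGAL = 7
} custom_op_t;

typedef struct {
    bool        is_custom;            /* opcode field equals custom-0 (0x0B) */
    uint8_t     funct7, funct3;
    uint8_t     rd, rs1, rs2;
    custom_op_t op;
} decoded_t;

/* Hypothetical model: split an instruction word into fields and select the op. */
static decoded_t decode_custom0(uint32_t instr)
{
    decoded_t d;
    d.funct7    = (uint8_t)((instr >> 25) & 0x7Fu);
    d.rs2       = (uint8_t)((instr >> 20) & 0x1Fu);
    d.rs1       = (uint8_t)((instr >> 15) & 0x1Fu);
    d.funct3    = (uint8_t)((instr >> 12) & 0x07u);
    d.rd        = (uint8_t)((instr >> 7)  & 0x1Fu);
    d.is_custom = (instr & 0x7Fu) == 0x0Bu;
    d.op        = OP_ILLEGAL;

    if (d.is_custom) {
        /* Compare the whole 10-bit selector so unallocated funct7 values stay illegal. */
        switch (((uint32_t)d.funct7 << 3) | d.funct3) {
        case (0u << 3) | 0u: d.op = OP_MAC;     break;
        case (0u << 3) | 1u: d.op = OP_CLEAR;   break;
        case (0u << 3) | 2u: d.op = OP_DOT4;    break;
        case (1u << 3) | 0u: d.op = OP_AES_ENC; break;
        case (1u << 3) | 1u: d.op = OP_AES_DEC; break;
        case (2u << 3) | 0u: d.op = OP_SHA256;  break;
        default:             d.op = OP_ILLEGAL; break;
        }
    }
    return d;
}

/* Usage example: the custom_mac a0, a1, a2 word from Section 2. */
static void decode_example(void)
{
    decoded_t d = decode_custom0(0x00C5850Bu);
    assert(d.is_custom && d.op == OP_MAC);
    assert(d.rd == 10 && d.rs1 == 11 && d.rs2 == 12);
}
```

Note how a word such as 0x8000000B (funct7 = 0x40, funct3 = 0) decodes as illegal here, whereas a decoder that inspects only funct7[0] would treat it as CUSTOM_MAC.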
Verilog — custom execution unit (MAC + DOT4)
// Custom execution unit: connected to the pipeline after decode
module custom_exec (
    input  logic        clk, rst_n,
    input  logic        valid,      // instruction is valid (from decode)
    input  logic [2:0]  custom_op,  // operation from decoder
    input  logic [31:0] rs1_val,    // register file read value for rs1
    input  logic [31:0] rs2_val,    // register file read value for rs2
    output logic        done,       // result is ready (one cycle after valid)
    output logic [31:0] result      // value to write back to rd
);
    logic [63:0]        accumulator; // 64-bit accumulator (wider than 32-bit rd)
    logic [63:0]        next_accum;  // accumulator value after this operation
    logic signed [63:0] dot4_sum;    // signed sum of the four byte products

    localparam MAC   = 3'd0;
    localparam CLEAR = 3'd1;
    localparam DOT4  = 3'd2;

    // Compute the updated accumulator combinationally so that rd receives the
    // new value (rd = rs1*rs2 + accum) rather than the pre-operation value.
    always_comb begin
        // Unpack 4x8-bit signed elements and form the dot product
        dot4_sum = $signed(rs1_val[7:0])   * $signed(rs2_val[7:0])
                 + $signed(rs1_val[15:8])  * $signed(rs2_val[15:8])
                 + $signed(rs1_val[23:16]) * $signed(rs2_val[23:16])
                 + $signed(rs1_val[31:24]) * $signed(rs2_val[31:24]);

        case (custom_op)
            MAC:     next_accum = accumulator + (rs1_val * rs2_val);
            CLEAR:   next_accum = '0;
            DOT4:    next_accum = accumulator + dot4_sum;
            default: next_accum = accumulator;
        endcase
    end

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            accumulator <= '0;
            done        <= 1'b0;
            result      <= '0;
        end else if (valid) begin
            case (custom_op)
                MAC, CLEAR, DOT4: begin
                    accumulator <= next_accum;
                    result      <= next_accum[31:0];
                    done        <= 1'b1; // single-cycle for these ops
                end
                default: begin
                    done   <= 1'b0;
                    result <= '0;
                end
            endcase
        end else begin
            done <= 1'b0;
        end
    end
endmodule

6. Testbench — Verifying Custom Instruction Decode

Verilog — custom instruction testbench
`timescale 1ns/1ps
module tb_custom_exec;
    logic        clk = 0, rst_n;
    logic        valid;
    logic [2:0]  custom_op;
    logic [31:0] rs1_val, rs2_val;
    logic        done;
    logic [31:0] result;
    int          errors = 0;

    custom_exec dut (.*);

    always #5 clk = ~clk; // 100 MHz

    // Drive one operation; result is registered by the time the task returns
    task send_op(input [2:0] op, input [31:0] a, b);
        @(posedge clk);
        valid <= 1; custom_op <= op; rs1_val <= a; rs2_val <= b;
        @(posedge clk);
        valid <= 0;
        @(posedge clk);
    endtask

    // Compare result with the expected value and count mismatches
    task check(input string name, input [31:0] expected);
        if (result !== expected) begin
            $error("%s: got %0d, expected %0d", name, result, expected);
            errors++;
        end
    endtask

    initial begin
        rst_n = 0; valid = 0;
        repeat(3) @(posedge clk);
        rst_n = 1;

        // Test 1: CLEAR, accumulator and result become 0
        send_op(3'd1, 0, 0);
        check("CLEAR", 32'd0);

        // Test 2: MAC 3*4 + 0 = 12
        send_op(3'd0, 32'd3, 32'd4);
        check("MAC1", 32'd12);

        // Test 3: MAC 3*4 + 12 = 24
        send_op(3'd0, 32'd3, 32'd4);
        check("MAC2", 32'd24);

        // Test 4: DOT4, [1,2,3,4]·[5,6,7,8] = 70, added to 24 gives 94
        send_op(3'd2, 32'h04030201, 32'h08070605);
        check("DOT4", 32'd94);

        if (errors == 0) $display("All tests passed!");
        else             $display("%0d test(s) failed", errors);
        $finish;
    end
endmodule

7. Custom Opcode Space Usage Table

Opcode   | Hex  | Recommended Use                        | Notes
custom-0 | 0x0B | Math / ML accelerator (MAC, DOT, GEMM) | Most widely used, best tool support
custom-1 | 0x2B | Crypto accelerator (AES, SHA, HMAC)    | Separate namespace from compute
custom-2 | 0x5B | DSP / signal processing (FFT, FIR)     | May conflict with future rv128
custom-3 | 0x7B | Debug / profiling / special ops        | Avoid in shipping silicon: rv128 risk

custom-2 and custom-3 — Use With Caution

The RISC-V spec names these opcodes custom-2/rv128 and custom-3/rv128: they are reserved for a possible future RV128 base ISA, but standard extensions otherwise avoid them, so they remain available for custom extensions on RV32 and RV64. If a design might ever migrate to RV128, or you want the most conservative choice for production chips shipping today, stick to custom-0 and custom-1, which are reserved for custom use without that caveat.

8. Interview Q&A

1. Question: How many custom operations can you define per custom opcode space?
   Answer points: Using R-type encoding: funct3 (3 bits) × funct7 (7 bits) = 8 × 128 = 1024 distinct operations per opcode space. Across all 4 custom opcode spaces: 4096 total. Using I-type, the immediate field can encode additional sub-modes.

2. Question: What is the .insn directive and when would you use it?
   Answer points: A GAS (GNU Assembler) directive that emits any instruction encoding without compiler modification. Used to prototype custom instructions without patching GCC/LLVM. Syntax: .insn r 0x0B, funct3, funct7, rd, rs1, rs2. Good for simulation and early bring-up; production code uses compiler intrinsics or a full toolchain patch.

3. Question: What is the difference between a custom ISA extension and the RoCC interface?
   Answer points: A custom ISA extension modifies the CPU pipeline directly: you add decode and execute logic inside the processor core. RoCC (Rocket Custom Coprocessor) is the tightly coupled coprocessor interface of the Rocket Chip generator, where the custom instruction is intercepted at decode and dispatched to an external (but closely attached) coprocessor via the cmd/resp channels. RoCC is easier to integrate (no pipeline modification) but only works with Rocket/Chipyard cores. A custom ISA extension can be used with any RISC-V core you have source access to.

4. Question: How do you tell GCC to allocate registers for a custom instruction?
   Answer points: Use GCC inline assembly with output/input constraints: =r for the output register (rd), r for the input registers (rs1, rs2). The compiler resolves which physical registers to use; the .insn line refers to them by placeholder. Example: __asm__ volatile (".insn r 0x0B, 0, 0, %0, %1, %2" : "=r"(rd) : "r"(rs1), "r"(rs2));
