Welcome to the most hands-on course on the site. Over the coming lessons we won't just learn about a processor — we'll build one, line by line, until it runs real programs. And we'll do it with Claude writing and optimizing the Verilog at every step. Today is the map: what we're building, how, and the full roadmap.
By the end of this course you will have a working RISC-V CPU — a real RV32I processor, written in Verilog, that fetches, decodes and executes genuine RISC-V machine code. Then we go further and optimize it: turn the simple single-cycle design into a 5-stage pipelined core with forwarding and hazard handling, add CSRs and memory-mapped I/O, and even run it on an FPGA.
Crucially, you build it too. Every module — register file, ALU, control unit, datapath, pipeline — is given to you as copyable, downloadable code you can drop straight into our online Verilog simulator and watch run. No black boxes.
You can use computers your whole life and still find the processor "magic." Building one dissolves that mystery forever. You'll finally feel how an instruction becomes electrical control signals, how a register file and ALU cooperate, why pipelining is both powerful and tricky, and what a hazard really is. It's the single best way to understand computer architecture — and a standout portfolio project. Because RISC-V is an open ISA (no licence needed — see RISC-V vs ARM), we're legally free to implement every instruction ourselves, exactly like India's SHAKTI does.
Forget everything intimidating for a moment. Here's the whole idea in one picture:
Imagine a worker who can only do a handful of tiny jobs: "add these two numbers," "fetch a number from a shelf," "put a number back on a shelf," "if this is bigger than that, jump to a different step." That's literally all a processor does, millions or even billions of times per second. The fixed list of commands the worker understands is called the instruction set. RISC-V is one such list.
That "list of commands a chip understands" has a formal name: the ISA — Instruction Set Architecture. It's the language the hardware speaks. Every program — a game, a browser, an OS — is ultimately just a very long sequence of these simple commands. And a CPU's entire life is a four-step loop:
So what makes RISC-V special among instruction sets? Two things:
Think of a recipe written using only the most basic kitchen moves — "pick up," "pour," "stir," "wait." Each step is trivial on its own, but string thousands together and you can cook any dish. RISC-V instructions are those basic moves; a program is the recipe; the CPU is the cook following it exactly.
"RV32I" looks cryptic but it's just three pieces stuck together. Read it left to right:
So RV32I = the 32-bit, integer-only foundation of RISC-V. It's small enough to fully understand and build, yet complete enough to run real programs. That's why we start here.
The "I" base gives us everything we need to compute:
| Category | What it does | Examples |
|---|---|---|
| Arithmetic | add / subtract whole numbers | add, sub, addi |
| Logic | bitwise AND / OR / XOR, shifts | and, or, xor, sll, srl |
| Compare | set 1/0 if less-than | slt, slti |
| Load / Store | read from / write to memory | lw, sw, lb, sb |
| Branches | "if" — jump only when a condition holds | beq, bne, blt, bge |
| Jumps | go to another part of the program (loops, calls) | jal, jalr |
That's it — with just those, you can build loops, if-statements, function calls and any calculation. Anything extra is an optional add-on named by another letter, which RISC-V lets you bolt on only if you need it:
| Letter | Adds |
|---|---|
| M | multiply & divide (we add this in Phase 4) |
| F / D | floating-point (decimals) |
| A | atomic operations (for multi-core) |
| C | compressed 16-bit instructions (smaller code) |
A chip with all the common ones is written RV32IMAFDC — but the I is the heart, and the only part that's mandatory. Master RV32I and you understand the core of every RISC-V processor on Earth, including SHAKTI.
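To make the arithmetic, logic and compare rows of the RV32I table concrete, here is a minimal ALU sketch. It is not the course's actual ALU: the module name `alu_sketch` and the 3-bit `op` encoding are assumptions made up for this illustration, and the real design in Phase 2 may be organized differently.

```verilog
// Illustrative only: module name and op encoding are hypothetical.
module alu_sketch (
    input  wire [31:0] a,
    input  wire [31:0] b,
    input  wire [2:0]  op,
    output reg  [31:0] y
);
    localparam OP_ADD = 3'd0, OP_SUB = 3'd1, OP_AND = 3'd2, OP_OR  = 3'd3,
               OP_XOR = 3'd4, OP_SLL = 3'd5, OP_SRL = 3'd6, OP_SLT = 3'd7;

    always @(*) begin
        case (op)
            OP_ADD:  y = a + b;                        // add, addi
            OP_SUB:  y = a - b;                        // sub
            OP_AND:  y = a & b;                        // and
            OP_OR:   y = a | b;                        // or
            OP_XOR:  y = a ^ b;                        // xor
            OP_SLL:  y = a << b[4:0];                  // sll: only the low 5 bits of b count
            OP_SRL:  y = a >> b[4:0];                  // srl: logical shift, zero-filled
            OP_SLT:  y = ($signed(a) < $signed(b)) ? 32'd1 : 32'd0; // slt: signed compare
            default: y = 32'd0;
        endcase
    end
endmodule

// Tiny self-checking-by-eye testbench.
module alu_sketch_tb;
    reg  [31:0] a, b;
    reg  [2:0]  op;
    wire [31:0] y;

    alu_sketch dut (.a(a), .b(b), .op(op), .y(y));

    initial begin
        a = 32'd7; b = 32'd5; op = 3'd0; #1 $display("add: %0d", y); // prints add: 12
        op = 3'd1;                       #1 $display("sub: %0d", y); // prints sub: 2
        a = 32'hFFFFFFFF; b = 32'd1; op = 3'd7;
        #1 $display("slt: %0d", y);      // -1 < 1 when signed, prints slt: 1
        $finish;
    end
endmodule
```

Branches reuse the same comparisons: `beq`/`bne` test equality and `blt`/`bge` test the signed less-than result, so a real core usually derives them from this kind of logic rather than adding a separate unit.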
Our target is a clean, correct, then progressively faster RV32I core. Here's the high-level datapath we'll assemble piece by piece in Phase 2:
| Target spec | Choice |
|---|---|
| ISA | RV32I (32-bit base integer) — the clean RISC-V foundation |
| Language | Verilog (synthesizable, runs in our browser simulator) |
| v1 | Single-cycle core — simplest correct design |
| v2 | 5-stage pipeline + forwarding + hazard unit (the optimization) |
| Extras | CSRs/traps, memory-mapped UART, M-extension, FPGA |
Here's what makes this course different. For each module, Claude designs the architecture, writes clean and commented Verilog, and explains every decision — then progressively optimizes the design (sharing logic, pipelining, removing stalls). You get the reasoning and the ready-to-run code.
1. Concept: the idea explained in plain English with a diagram.
2. The RTL: Claude writes the module as copyable + downloadable code.
3. Test: paste it into the Verilog simulator and run it.
4. Optimize: we refine it toward a faster, cleaner core.

You end each lesson with working hardware you understand.
Honest scope: this is a genuine, standards-compliant educational RV32I core that executes real RISC-V code and gets real optimizations — not a commercial tape-out-grade CPU. It teaches exactly how production cores (like SHAKTI) are structured, which is the goal.
You asked for it, so here's the standard for the whole course: every code block has a Copy button and a Download button. Try it on the project's top-level skeleton — the shell we'll fill in across Phase 2:
    // EcrioniX · RISC-V from Scratch - top-level core (skeleton)
    // We flesh out each piece across Phase 2 (Days 7-15).
    module riscv_core (
        input  wire        clk,
        input  wire        rst_n,       // active-low reset
        // simple instruction + data memory interfaces
        output wire [31:0] imem_addr,
        input  wire [31:0] imem_data,
        output wire [31:0] dmem_addr,
        output wire [31:0] dmem_wdata,
        output wire        dmem_we,
        input  wire [31:0] dmem_rdata
    );
        // --- coming soon ---
        // wire [31:0] pc;             // Day 8 : program counter
        // wire [31:0] instr;          // Day 8 : fetched instruction
        // regfile / alu / control...  // Days 9-14
    endmodule
The lines inside module riscv_core (...) are the chip's ports — its electrical "pins," the only way it talks to the outside world. input = a signal coming into the CPU; output = a signal the CPU drives out. wire [31:0] just means "a 32-bit-wide connection." Let's decode every one:
| Port | Dir | What it is |
|---|---|---|
| clk | in | Clock — the CPU's heartbeat. On every tick (0→1) the processor takes one step. Everything is synchronized to it. |
| rst_n | in | Reset, active-low (the _n = "negated"). Drive it to 0 to force the CPU back to a known start state (PC = 0); 1 = run normally. |
| imem_addr | out | Instruction-memory address. The CPU outputs which instruction it wants next (this is the Program Counter). |
| imem_data | in | The instruction word (32 bits) read back from instruction memory at that address. |
| dmem_addr | out | Data-memory address the CPU wants to read from or write to (for lw/sw). |
| dmem_wdata | out | The data to write into data memory (used by store instructions). |
| dmem_we | out | Write-enable. 1 = "store this data," 0 = "I'm only reading." |
| dmem_rdata | in | The data read back from data memory (used by load instructions). |
Two of those names need their full forms — they confuse everyone at first:
- imem: instruction memory, which holds the program.
- dmem: data memory, which holds the values the program reads and writes with lw (load word) and sw (store word).

Keeping the program (imem) and the data (dmem) on separate connections lets the CPU fetch an instruction and access data at the same time, instead of waiting in line for one shared memory. This split is called a Harvard arrangement (the same idea behind the split I-cache/D-cache in real chips). For each, the CPU drives an address out and gets the data back in; for data memory it can also send data out with a write-enable. That's the entire conversation a simple CPU has with memory.
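The data-memory side of that conversation can be sketched as a small word-addressed RAM that speaks the same four signals as the `dmem_*` ports. This is a hedged illustration, not the course's memory model: the module name `dmem_sketch`, the 64-word size, and the combinational read are all assumptions chosen to suit a simple single-cycle core, and byte accesses (lb/sb) are deliberately left out.

```verilog
// Illustrative only: name, size and read style are assumptions.
module dmem_sketch (
    input  wire        clk,
    input  wire [31:0] dmem_addr,
    input  wire [31:0] dmem_wdata,
    input  wire        dmem_we,
    output wire [31:0] dmem_rdata
);
    reg [31:0] mem [0:63];               // 64 words = 256 bytes

    // Byte address -> word index: drop the two byte-offset bits.
    wire [5:0] idx = dmem_addr[7:2];

    assign dmem_rdata = mem[idx];        // read: address out, data back in

    always @(posedge clk)
        if (dmem_we)
            mem[idx] <= dmem_wdata;      // write: only when write-enable is 1
endmodule

module dmem_sketch_tb;
    reg         clk = 0;
    reg  [31:0] addr, wdata;
    reg         we;
    wire [31:0] rdata;

    dmem_sketch dut (
        .clk(clk), .dmem_addr(addr), .dmem_wdata(wdata),
        .dmem_we(we), .dmem_rdata(rdata)
    );

    always #5 clk = ~clk;

    initial begin
        addr = 32'h10; wdata = 32'd55; we = 1;  // behaves like "sw": store 55 at 0x10
        @(posedge clk); #1;
        we = 0;                                 // behaves like "lw": read it back
        #1 $display("read back: %0d", rdata);   // prints read back: 55
        $finish;
    end
endmodule
```

An instruction memory looks the same minus `dmem_wdata` and `dmem_we`, since the CPU only ever reads from it.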
So this top-level module is really saying: "Give me the instruction at imem_addr, and let me read/write data at dmem_addr." Everything we build in Phase 2 lives inside this shell and decides what those addresses and data should be each clock tick.
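As a taste of what goes inside the shell, the `clk`, `rst_n` and `imem_addr` rows of the port table can be sketched as a bare program counter. This is only a sketch under stated assumptions: the module name `pc_sketch` is made up, the reset is assumed to be asynchronous (the skeleton only says active-low), and it always steps by 4 bytes with no branches or jumps.

```verilog
// Illustrative only: hypothetical module, asynchronous reset assumed.
module pc_sketch (
    input  wire        clk,
    input  wire        rst_n,       // active-low: 0 = reset, 1 = run
    output wire [31:0] imem_addr
);
    reg [31:0] pc;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            pc <= 32'd0;            // known start state: PC = 0
        else
            pc <= pc + 32'd4;       // next instruction (each one is 4 bytes)
    end

    assign imem_addr = pc;          // the PC is the instruction address
endmodule

module pc_sketch_tb;
    reg         clk   = 0;
    reg         rst_n = 0;          // start in reset
    wire [31:0] imem_addr;

    pc_sketch dut (.clk(clk), .rst_n(rst_n), .imem_addr(imem_addr));

    always #5 clk = ~clk;

    initial begin
        #12 rst_n = 1;              // release reset between clock edges
        repeat (3) @(posedge clk);  // three rising edges while running
        #1 $display("pc = %0d", imem_addr);  // prints pc = 12
        $finish;
    end
endmodule
```

The real core replaces the fixed `pc + 4` with a choice between the next instruction and a branch or jump target.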
And here's the first program our finished CPU will run — sum the numbers 1 to 10 in RISC-V assembly:
    # sum 1..10 -> x10 (a0). Our CPU will execute this by Day 15.
            li   x5, 0           # sum = 0
            li   x6, 1           # i = 1
            li   x7, 11          # limit
    loop:   bge  x6, x7, done    # if i >= 11, stop
            add  x5, x5, x6      # sum += i
            addi x6, x6, 1       # i++
            jal  x0, loop        # repeat
    done:   add  x10, x5, x0     # result in a0 (should be 55)
Don't worry if the assembly looks unfamiliar — decoding exactly these instructions is what Phase 1 is about.
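For a preview of what "real machine code" means, here is the same program hand-assembled into 32-bit words and wrapped in a tiny read-only instruction memory. Treat it as a sketch: the module name `imem_sum_sketch` is hypothetical, the program is assumed to start at address 0, and `li` is expanded to `addi rd, x0, imm` as assemblers normally do for small constants. In practice you would let a RISC-V assembler produce these words.

```verilog
// Illustrative only: hypothetical ROM holding the sum 1..10 program at address 0.
module imem_sum_sketch (
    input  wire [31:0] imem_addr,
    output reg  [31:0] imem_data
);
    always @(*) begin
        case (imem_addr[31:2])                  // word index = byte address / 4
            30'd0:   imem_data = 32'h00000293;  // 0x00: addi x5, x0, 0    (li x5, 0)
            30'd1:   imem_data = 32'h00100313;  // 0x04: addi x6, x0, 1    (li x6, 1)
            30'd2:   imem_data = 32'h00B00393;  // 0x08: addi x7, x0, 11   (li x7, 11)
            30'd3:   imem_data = 32'h00735863;  // 0x0C: bge  x6, x7, +16  (to done)
            30'd4:   imem_data = 32'h006282B3;  // 0x10: add  x5, x5, x6
            30'd5:   imem_data = 32'h00130313;  // 0x14: addi x6, x6, 1
            30'd6:   imem_data = 32'hFF5FF06F;  // 0x18: jal  x0, -12      (to loop)
            30'd7:   imem_data = 32'h00028533;  // 0x1C: add  x10, x5, x0
            default: imem_data = 32'h00000013;  // addi x0, x0, 0 (nop)
        endcase
    end
endmodule

// Dump each word with its opcode field (the low 7 bits of every instruction).
module imem_sum_sketch_tb;
    reg  [31:0] addr;
    wire [31:0] data;
    integer     i;

    imem_sum_sketch rom (.imem_addr(addr), .imem_data(data));

    initial begin
        for (i = 0; i < 8; i = i + 1) begin
            addr = i * 4;
            #1 $display("%h: %h  opcode=%b", addr, data, data[6:0]);
        end
        $finish;
    end
endmodule
```

Wired to the `imem_addr` and `imem_data` ports of the skeleton, a ROM like this is all the finished core needs to start fetching the program.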
See the full lesson-by-lesson roadmap →
You should be comfortable with basic Verilog, including always blocks. New to it? Start with our Verilog tutorials.

We're going to build a real RV32I RISC-V CPU in Verilog, from a single-cycle core that runs a program to a pipelined, optimized processor, with Claude writing and optimizing every module, all copy/download-ready and runnable in your browser.
What does the sum.s program compute, and where's the result?

A complete RV32I RISC-V CPU in Verilog (single-cycle first, then pipelined and optimized) that executes real programs.
No CPU-design experience needed; just basic digital logic and Verilog. Everything is explained and written step by step.
Just a browser + the EcrioniX Verilog simulator. Local tools (Icarus/Verilator, a RISC-V assembler) are optional.
Yes — a standards-compliant, educational RV32I core that runs real machine code and gains real optimizations.