🎮 INTERACTIVE LAB

How a GPU Works — Watch It, Don't Just Read It

A CPU is a few geniuses. A GPU is a thousand interns doing the same tiny job at once. Race them, change the core count, and launch a real kernel across the cores below.

Lab 1 · The Race
CPU vs GPU — same job, who finishes first?

Both have to apply the same operation to 48 data elements (think: brighten 48 pixels). The CPU has 1 core — it does them one at a time. The GPU has many small cores — it does a whole batch every cycle. Hit Run and count the cycles.


Try it: drag the slider to 48 cores and the GPU finishes the whole array in a single cycle. That's the entire idea of a GPU — trade clever-and-serial for simple-and-massively-parallel.
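The race counters come from simple arithmetic. Below is a minimal sketch of that model. It assumes, as the lab does, that every core finishes exactly one element per cycle. The names cpu_cycles, gpu_cycles and gpu_speedup are hypothetical helpers, not part of any library.

```cuda
// Toy cycle model behind Lab 1 (an assumption, not real hardware timing):
// every core finishes exactly one element per cycle.

// CPU with 1 core: one element per cycle, so n elements take n cycles.
int cpu_cycles(int n) { return n; }

// GPU with `cores` cores: up to `cores` elements per cycle. The last batch
// may be only partly full, so round up: ceil(n / cores) in integer math.
int gpu_cycles(int n, int cores) { return (n + cores - 1) / cores; }

// Speedup = how many times fewer cycles the GPU needs than the CPU.
double gpu_speedup(int n, int cores) {
    return static_cast<double>(cpu_cycles(n)) / gpu_cycles(n, cores);
}
```

With n = 48, 16 cores need 3 GPU cycles (16x speedup), 32 cores need 2 (24x, because the second batch is only half full), and 48 cores need 1 (48x).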

The Idea
Few geniuses vs a thousand interns

🧠 CPU — few, powerful cores

A handful of complex cores with big caches and branch predictors. Brilliant at one complicated, decision-heavy task done fast — running your OS, logic, "if this then that". Latency-optimised.

⚡ GPU — thousands of simple cores

Thousands of small arithmetic units that all run the same instruction on different data at once. Brilliant at one simple calculation done a million times — pixels, matrices, AI. Throughput-optimised.

Neither is "better" — they're built for opposite goals. The GPU wins only when the work is wide, uniform and independent: every data element needs the same math and doesn't depend on its neighbours.
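The difference between independent and dependent work shows up directly in an ordinary loop. Here is a small host-side sketch in plain C++ that also compiles as CUDA host code. The function names brighten_all and running_sum are hypothetical.

```cuda
#include <algorithm>

// Independent: iteration i reads and writes only px[i], so the iterations
// can run in any order, or all at once (one GPU thread each).
void brighten_all(unsigned char* px, int n) {
    for (int i = 0; i < n; ++i)
        px[i] = static_cast<unsigned char>(std::min(255, px[i] + 80));
}

// Dependent: out[i] needs the running total from iteration i - 1, so this
// loop cannot simply hand iteration i to thread i. (Parallel prefix-sum
// algorithms exist, but they restructure the work rather than run this loop as-is.)
void running_sum(const int* in, int* out, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += in[i];
        out[i] = acc;
    }
}
```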

Lab 2 · SIMT
One line of code. Thousands of threads.

The magic word is SIMT — Single Instruction, Multiple Threads. You write one tiny program (a kernel), and the GPU runs it on thousands of threads at once — each thread handles one element, identified by its thread ID. Launch the kernel below and watch every thread brighten its own pixel simultaneously.

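```cuda
// the kernel: every thread runs THIS, once
__global__ void brighten(unsigned char* px, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // my global thread ID = my pixel
    if (i < n)                                       // threads past the last pixel do nothing
        px[i] = min(255, px[i] + 80);                // do my one job: brighten, capped at 255
}
```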
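A kernel does nothing until host code launches it. Below is a minimal host-side sketch that assumes the CUDA runtime API (cudaMalloc, cudaMemcpy, cudaGetLastError, cudaFree and the <<<blocks, threads>>> launch syntax). The helper names blocks_for and brighten_on_gpu are hypothetical. The sketch restates the kernel with a pixel count n and a bounds check, because a grid that spans several blocks usually has a few more threads than pixels.

```cuda
#include <cuda_runtime.h>

// Kernel restated with a bounds check so it works for any n.
__global__ void brighten(unsigned char* px, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread ID
    if (i < n) px[i] = min(255, px[i] + 80);
}

// Blocks needed so every pixel gets its own thread: ceil(n / threads_per_block).
int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical helper: copy pixels to the GPU, launch one thread per pixel,
// copy the result back. Returns false if any CUDA call reports an error.
bool brighten_on_gpu(unsigned char* host_px, int n) {
    if (n <= 0) return true;  // nothing to do; a 0-block launch would be invalid
    unsigned char* dev_px = nullptr;
    size_t bytes = static_cast<size_t>(n);
    if (cudaMalloc(&dev_px, bytes) != cudaSuccess) return false;
    bool ok = cudaMemcpy(dev_px, host_px, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;  // threads per block (a common choice; the limit is 1024)
        brighten<<<blocks_for(n, threads), threads>>>(dev_px, n);
        // The device-to-host copy waits for the kernel to finish first.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host_px, dev_px, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev_px);
    return ok;
}
```

For a million pixels at 256 threads per block, blocks_for returns 3907. The last block has 64 threads with real pixels, and the bounds check idles its other 192.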
The Honest Part
When a GPU is the wrong tool

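GPUs aren't magic speed buttons. They lose when the work is serial or dependency-heavy (each step needs the previous step's result), when it is branchy (threads that take different sides of an if/else end up waiting for each other), or when the dataset is so small that copying it to and from the GPU costs more time than the parallel speedup saves.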

That's why your computer has both: a CPU for the smart, sequential, decision-heavy work, and a GPU for the wide, repetitive number-crunching. Graphics, deep learning, physics and crypto live on the GPU; most other software stays on the CPU.
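Tiny datasets lose for a simple reason. Before a GPU does any useful work, data has to be copied over and a kernel launched, and that costs roughly a fixed amount of time. A toy cost model makes the trade-off concrete. The formula, the function names and every parameter are illustrative assumptions, not measurements of any real CPU or GPU.

```cuda
// Toy cost model (all parameters hypothetical):
//   CPU time = n * cpu_ns_per_elem
//   GPU time = fixed_overhead_ns + n * gpu_ns_per_elem
// fixed_overhead_ns stands for launch and setup costs; gpu_ns_per_elem
// folds together per-element transfer and compute time.
double cpu_time_ns(double n, double cpu_ns_per_elem) {
    return n * cpu_ns_per_elem;
}

double gpu_time_ns(double n, double fixed_overhead_ns, double gpu_ns_per_elem) {
    return fixed_overhead_ns + n * gpu_ns_per_elem;
}

// Size at which both take equal time (solve n*cpu = overhead + n*gpu);
// above it the GPU wins. Returns -1 if the GPU is not cheaper per element,
// because then it can never make up the fixed overhead.
double break_even_n(double fixed_overhead_ns, double cpu_ns_per_elem,
                    double gpu_ns_per_elem) {
    if (gpu_ns_per_elem >= cpu_ns_per_elem) return -1.0;
    return fixed_overhead_ns / (cpu_ns_per_elem - gpu_ns_per_elem);
}
```

Take made-up numbers: 10,000 ns of overhead, 1 ns per element on the CPU and 0.5 ns per element on the GPU. Then break_even_n gives 20,000 elements, and a 1,000-element job is faster on the CPU (1,000 ns vs 10,500 ns).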

Reference
FAQ

Why is a GPU faster than a CPU?

For wide, uniform, independent data it processes many elements per cycle (thousands of simple cores) instead of one. The race above shows it directly.

What is SIMT?

Single Instruction, Multiple Threads — one kernel runs on thousands of threads, each on its own data element identified by its thread ID.

When is a GPU not faster?

Serial, branchy or dependency-heavy work, or tiny datasets where data-transfer overhead dominates. Then the CPU's strong single-thread performance wins.
