Question 1

What is a GPU?

Accepted Answer

A GPU (Graphics Processing Unit) is a specialized processor designed to perform thousands of mathematical operations in parallel. Originally built to render graphics on screens, modern GPUs now also power AI and machine learning, scientific simulation, and some cryptocurrency mining. A GPU contains thousands of smaller cores (e.g., the NVIDIA A100 has 6,912 CUDA cores) optimized for parallel workloads, unlike a CPU, which has fewer but more powerful cores optimized for sequential processing.

Question 2

What is the difference between a GPU and a CPU?

Accepted Answer

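A CPU (Central Processing Unit) has a small number of powerful cores (typically 4–64 in desktops and workstations, and up to 128 or more in servers) optimized for sequential, low-latency tasks. A GPU has thousands of smaller cores (roughly 1,000–20,000+) optimized for parallel throughput. CPUs run the operating system, applications, and control logic; GPUs handle massively parallel workloads such as matrix multiplication (AI/ML), 3D rendering, and video encoding. For the large matrix operations at the heart of AI training, a GPU can be on the order of 100x faster than a CPU, although the actual speedup depends on the hardware and on how much of the workload can run in parallel.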
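A speedup like the 100x figure for matrix operations applies only to the part of a job that runs on the GPU; the sequential remainder still runs at CPU speed. A standard way to reason about the overall effect is Amdahl's law, sketched below. The parallel fractions and the 100x kernel speedup are illustrative assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when only `parallel_fraction` of the runtime is accelerated.

    Amdahl's law: S = 1 / ((1 - p) + p / s)
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if parallel_speedup <= 0:
        raise ValueError("parallel_speedup must be positive")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / parallel_speedup)


# Illustrative only: assume the GPU runs the parallel part 100x faster,
# and vary how much of the total runtime that parallel part represents.
for p in (0.50, 0.90, 0.95, 0.99, 1.00):
    print(f"parallel share {p:.0%}: overall speedup {amdahl_speedup(p, 100.0):.1f}x")
```

With these assumed numbers, a job that is 95% parallel speeds up only about 17x overall, which is why the CPU-side (sequential) part of an AI pipeline still matters.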

Question 3

What does VRAM mean in a GPU?

Accepted Answer

VRAM (Video RAM) is the dedicated high-speed memory on a GPU used to store textures, frame buffers, and data for active computations. VRAM offers far higher bandwidth than system RAM: the NVIDIA H100 SXM, for example, uses HBM3 with about 3.35 TB/s of bandwidth. For AI/ML, more VRAM means larger models can fit on a single GPU (e.g., 80 GB on the H100 vs. 24 GB on the RTX 4090).
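A rough first check of whether a model fits on one GPU is parameter count times bytes per parameter. The sketch below counts model weights only; activations, optimizer state (for training), and the KV cache (for inference) need additional memory, so treat the result as a lower bound. The 7B and 70B model sizes are illustrative assumptions.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


def weights_gb(num_params: float, precision: str) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_in_vram(num_params: float, precision: str, vram_gb: float) -> bool:
    """True if the weights alone fit; real workloads need extra headroom."""
    return weights_gb(num_params, precision) <= vram_gb


for params in (7e9, 70e9):
    for precision in ("fp16", "int4"):
        need = weights_gb(params, precision)
        print(
            f"{params / 1e9:.0f}B @ {precision}: {need:.1f} GB of weights | "
            f"RTX 4090 (24 GB): {fits_in_vram(params, precision, 24)} | "
            f"H100 (80 GB): {fits_in_vram(params, precision, 80)}"
        )
```

A 70B-parameter model needs about 140 GB for FP16 weights, more than a single 80 GB H100, which is why large models are quantized or split across several GPUs.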

Question 4

What are CUDA cores in NVIDIA GPUs?

Accepted Answer

CUDA cores are NVIDIA's parallel processing units inside a GPU. Each CUDA core performs basic floating-point arithmetic, and thousands of them work simultaneously. They are grouped into Streaming Multiprocessors (SMs); for example, the RTX 4090 has 16,384 CUDA cores across 128 SMs. More CUDA cores generally means higher throughput for parallel workloads like AI training and graphics rendering, although core counts are most meaningful when comparing GPUs of the same architecture, since core design changes between generations.
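Core count and clock speed together give a GPU's theoretical peak FP32 throughput, because each CUDA core can issue one fused multiply-add (counted as 2 floating-point operations) per clock. The sketch below applies that arithmetic to the RTX 4090, assuming its published boost clock of about 2.52 GHz.

```python
def cores_per_sm(total_cores: int, num_sms: int) -> int:
    """CUDA cores per Streaming Multiprocessor, assuming an even split."""
    if total_cores % num_sms:
        raise ValueError("cores do not divide evenly across SMs")
    return total_cores // num_sms


def peak_fp32_tflops(cuda_cores: int, boost_clock_ghz: float) -> float:
    """Theoretical peak: cores x clock x 2 FLOPs per fused multiply-add."""
    flops = cuda_cores * boost_clock_ghz * 1e9 * 2
    return flops / 1e12


# RTX 4090: 16,384 CUDA cores across 128 SMs, ~2.52 GHz boost clock (assumed).
print(cores_per_sm(16_384, 128))                  # 128
print(round(peak_fp32_tflops(16_384, 2.52), 1))   # 82.6
```

This is a ceiling, not a benchmark: real kernels rarely sustain it, and Tensor Core throughput at lower precisions is counted separately.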

Question 5

Which GPU is best for AI and machine learning?

Accepted Answer

For professional AI/ML workloads, the NVIDIA H100 (80 GB HBM3, 3.35 TB/s bandwidth in the SXM version) is the industry standard for training large models, and the previous-generation A100 data-center GPU is still widely used. For consumer AI, the RTX 4090 (24 GB GDDR6X) is widely regarded as offering the best price/performance. AMD's Instinct MI300X (192 GB HBM3) is AMD's strongest competitor, particularly for LLM inference, where its large memory capacity helps. NVIDIA dominates the market largely because of its CUDA software ecosystem.

Question 6

How is a GPU made? What process node?

Accepted Answer

Modern GPUs are fabricated by foundries such as TSMC and Samsung. NVIDIA's RTX 40-series (Ada Lovelace) and the H100 (Hopper) both use TSMC 4N, a process customized for NVIDIA. AMD's RDNA 4 uses TSMC N4P (4nm). GPUs are among the largest chips by die area: the H100's die measures 814 mm² and contains 80 billion transistors. This makes GPU design one of the most complex challenges in semiconductors, and data-center GPUs also rely on advanced packaging (such as TSMC's CoWoS) to place stacks of HBM memory next to the GPU die.
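Why a die as large as the H100's is expensive can be estimated with two textbook approximations: a gross dies-per-wafer formula with an edge-loss term, and a simple Poisson yield model, Y = exp(-A·D0). The 300 mm wafer is the industry-standard size; the defect density D0 = 0.1 defects/cm² is an assumed, illustrative value, not a published foundry figure.

```python
import math


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Gross dies per wafer, using a common edge-loss correction term."""
    radius = wafer_diameter_mm / 2
    wafer_area = math.pi * radius**2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies: Y = exp(-A * D0), with A in cm^2."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


D0 = 0.1  # assumed defect density (defects per cm^2), illustrative only
for area in (300.0, 814.0):  # a mid-size die vs. an H100-sized die
    gross = dies_per_wafer(area)
    y = poisson_yield(area, D0)
    print(f"{area:.0f} mm²: {gross} gross dies, {y:.0%} yield, ~{gross * y:.0f} good dies")
```

Under these assumptions, the 814 mm² die gets roughly a third as many candidates per wafer as a 300 mm² die and a much lower defect-free fraction. In practice, vendors recover partly defective dies by disabling units; the H100 SXM, for instance, enables 132 of its die's 144 SMs.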

Question 7

What is a GPU used for beyond graphics?

Accepted Answer

Modern GPUs are used for: (1) AI/ML training and inference (matrix math at scale); (2) scientific simulation (CFD, molecular dynamics, weather modeling); (3) cryptocurrency mining (proof-of-work hashing); (4) video encoding and streaming (dedicated H.264/H.265/AV1 hardware encoders); (5) EDA acceleration (some chip-design tools, including products from Synopsys, use GPUs to accelerate simulation); and (6) ray tracing for photorealistic rendering. The shift from graphics-only to general-purpose GPU computing (GPGPU) took off with NVIDIA's introduction of the CUDA platform in 2006.
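The core idea of GPGPU programming in CUDA is that one small function (a kernel) runs once per thread, and each thread combines its block index and thread index to pick the data element it handles. The pure-Python sketch below only emulates that indexing scheme, sequentially, for SAXPY (y = a·x + y); `saxpy_kernel` and `launch` are hypothetical names, and on a real GPU the iterations of the loops would run in parallel.

```python
from typing import Callable


def saxpy_kernel(block_idx: int, block_dim: int, thread_idx: int,
                 a: float, x: list[float], y: list[float]) -> None:
    """Work done by one 'thread', mirroring CUDA's global-index pattern."""
    i = block_idx * block_dim + thread_idx  # like blockIdx.x * blockDim.x + threadIdx.x
    if i < len(x):  # bounds check: the last block may have spare threads
        y[i] = a * x[i] + y[i]


def launch(kernel: Callable[..., None], grid_dim: int, block_dim: int, *args) -> None:
    """Sequential stand-in for a CUDA launch kernel<<<grid_dim, block_dim>>>(args)."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


n = 1000
x = [1.0] * n
y = [2.0] * n
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block  # ceil(1000 / 256) = 4
launch(saxpy_kernel, blocks, threads_per_block, 3.0, x, y)
print(y[0], y[-1])  # 5.0 5.0
```

Because every element is independent, the same kernel scales from one core to thousands, which is the property that made GPUs useful beyond graphics.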

Property	CPU	GPU
Core count	4 – 128 cores	1,000 – 20,000+ cores
Core design	Large, complex (out-of-order, branch predict)	Small, simple (in-order, scalar)
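Clock speed	~2 – 6 GHz (base to boost)	1.5 – 3 GHz
Cache	Large per core (hundreds of MB of L3 on some server parts)	Smaller per core (L1/shared memory per SM, plus a chip-wide L2)
Memory	System RAM (DDR5, ~60 – 500 GB/s depending on channel count)	VRAM (GDDR or HBM; ~1 TB/s on RTX 4090, 3.35 TB/s on H100 SXM)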
Best for	OS, apps, sequential logic, low-latency	AI, graphics, video, simulations
Programming	C, C++, Python (any language)	CUDA (NVIDIA), ROCm (AMD), OpenCL
Power (TDP)	15 – 500W	75 – 1,000W
Process node	e.g., TSMC 3nm, Intel 3	e.g., TSMC 4N/4NP, TSMC N4P, TSMC 5nm

Architecture	Series	Process	Key Feature	Use Case
Hopper	H100, H200	TSMC 4N	Transformer Engine, NVLink 4	AI Data Center
Blackwell	B100, B200, GB200	TSMC 4NP	5th gen Tensor Cores, FP4	AI Data Center
Ada Lovelace	RTX 4090–4060	TSMC 4N	3rd gen RT Core, DLSS 3	Gaming / Prosumer
Ampere	A100, RTX 30-series	TSMC 7nm (A100), Samsung 8nm (RTX 30)	Multi-instance GPU (MIG, A100)	AI + Gaming
Jetson Orin	AGX Orin	Samsung 8nm	Ampere GPU + Cortex-A78AE	Edge AI / Automotive

Architecture	Series	Process	Key Feature	Use Case
RDNA 4	RX 9000-series	TSMC N4P (4nm)	Ray accelerators, FSR 4	Gaming
CDNA 3	Instinct MI300X	TSMC 5nm + 6nm	192 GB HBM3, chiplet design (MI300A variant adds Zen 4 CPU cores)	AI / HPC
RDNA 3	RX 7000-series	TSMC 5nm + 6nm	Chiplet design (high-end models)	Gaming

Spec	What it means	Why it matters
CUDA cores / Shaders	Number of parallel compute units	More = higher throughput for parallel workloads
VRAM	Dedicated GPU memory (GB)	Model size limit for AI; texture budget for games
Memory Bandwidth	GB/s of data fed to GPU cores	Critical for memory-bound workloads like LLM inference
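TDP (Watts)	Thermal Design Power: the sustained power and heat the cooling system is designed to handle	Determines cooling, electricity cost, and data center density
TFLOPS	Trillion floating-point operations per second, quoted at a specific precision (FP32, FP16, FP8, etc.)	Peak compute throughput; higher generally means faster compute-bound work such as AI training
Die size (mm²)	Physical area of the chip	Larger die = more cores but lower yield = higher cost
Process node	Manufacturing process generation (the nm figure is now a marketing label, not a literal transistor size)	Newer nodes generally allow more transistors and better power efficiency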
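The bandwidth row can be made concrete for LLM inference. When generating one token at a time with batch size 1, roughly all model weights are read from VRAM for each token, so bandwidth divided by model size bounds tokens per second. A simple roofline test shows why this workload is memory-bound. The sketch assumes a 7B-parameter FP16 model (14 GB of weights), uses 1,008 GB/s for the RTX 4090 and 3.35 TB/s for the H100 SXM, and ignores KV-cache traffic, caching effects, and batching, so real throughput is lower.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, weight_bytes: float) -> float:
    """Upper bound for batch-1 decoding: each token streams all weights once."""
    return bandwidth_gb_s * 1e9 / weight_bytes


def is_memory_bound(flops_per_byte: float, peak_tflops: float,
                    bandwidth_gb_s: float) -> bool:
    """Roofline test: below the ridge point (peak / bandwidth), bandwidth limits speed."""
    ridge = peak_tflops * 1e12 / (bandwidth_gb_s * 1e9)
    return flops_per_byte < ridge


weights = 7e9 * 2  # assumed 7B-parameter model in FP16: 14 GB
for name, bw in (("RTX 4090", 1008.0), ("H100 SXM", 3350.0)):
    print(f"{name}: <= {max_decode_tokens_per_s(bw, weights):.0f} tokens/s")

# Batch-1 decode does ~2 FLOPs (multiply + add) per 2-byte FP16 weight: ~1 FLOP/byte,
# far below the RTX 4090's FP32 ridge point of ~82 FLOPs/byte.
print(is_memory_bound(1.0, peak_tflops=82.6, bandwidth_gb_s=1008.0))  # True
```

Using the FP32 peak here is conservative: Tensor Core peaks are higher, which moves the ridge point further right and makes batch-1 decoding even more bandwidth-limited.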

What is a GPU?
Graphics Processing Unit Explained

What is a GPU?
GPU vs CPU: Key Differences
GPU Architecture Deep Dive
  CUDA Core (Shader Unit)
  Streaming Multiprocessor (SM)
  Warp
  Tensor Core
  RT Core (Ray Tracing)
  HBM / GDDR VRAM
NVIDIA GPU Families
AMD GPU Families
GPU Specs Explained
GPU in AI and Machine Learning
How GPU Chips Are Made
Frequently Asked Questions
