Day 24

Specialized ASICs

Groq, Cerebras, and SambaNova build hyper-specialized AI accelerators. They make different tradeoffs than Google, Apple, or NVIDIA.

The Specialists vs Generalists

Company   | Chip     | Focus            | Max TFLOPS     | Cost / notes
Google    | TPU      | Train + Infer    | 430            | Data center
NVIDIA    | H100     | All compute      | 1,450          | Expensive, flexible
Groq      | LPU      | Inference (LLM)  | 3,800          | Narrow, fast
Cerebras  | Wafer    | Training         | Unknown (huge) | Experimental
SambaNova | Dataflow | Training + Infer | 12,800         | Limited capacity

Groq LPU (Language Processing Unit)

Extreme specialization: LLM inference only

Why This Works for Inference

LLM inference pattern:
1. Load model weights (one-time, slow)
2. Feed tokens through network
3. Generate next token (parallelizable with batch)

Traditional systolic: Designed for training (multiple epochs)
Groq LPU: Designed for serving (one-pass inference)

Result: Achieves 430 tokens/sec for GPT-3 (vs GPU's 50 tokens/sec)
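The three-step pattern above can be sketched in a few lines of Python. This is a toy illustration only: the "model" is a random score table, and `load_weights`, `next_token`, `generate`, and `speedup` are hypothetical names introduced here, not part of any Groq or GPU API. It shows that weights are loaded once, that each new token depends on the previous one, and that sequences in a batch are independent of each other.

```python
import random


def load_weights(vocab_size, seed=0):
    """Step 1: one-time setup. Toy 'weights': a table of next-token scores."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(vocab_size)] for _ in range(vocab_size)]


def next_token(weights, token):
    """Steps 2-3 for one sequence: read the scores, pick the highest (greedy)."""
    scores = weights[token]
    return max(range(len(scores)), key=scores.__getitem__)


def generate(weights, prompts, new_tokens):
    """Decode a batch of prompts.

    Within a sequence, each step needs the previous token, so steps run in
    order. Across the batch, sequences are independent, which is the part
    that hardware can process in parallel.
    """
    sequences = [list(p) for p in prompts]
    for _ in range(new_tokens):
        for seq in sequences:  # independent work per sequence
            seq.append(next_token(weights, seq[-1]))
    return sequences


def speedup(fast_tokens_per_sec, slow_tokens_per_sec):
    """Throughput ratio between two systems."""
    return fast_tokens_per_sec / slow_tokens_per_sec


weights = load_weights(vocab_size=16)           # paid once
out = generate(weights, prompts=[[1], [2, 3]], new_tokens=4)
print([len(seq) for seq in out])                # [5, 6]
print(f"{speedup(430, 50):.1f}x")               # 8.6x, using the figures above
```

Note that the quoted figures (430 vs 50 tokens/sec) work out to a ratio of 8.6, a little under 10×.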

Cerebras Wafer-Scale Engine

Extreme integration: a single chip that spans an entire silicon wafer

SambaNova Reconfigurable

Dataflow units that reconfigure per model:
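As a rough mental model of "reconfigure per model", the sketch below treats the chip as a pool of units that are wired into a pipeline matching the model's operator graph, then rewired for the next model. Everything here is hypothetical: `DataflowFabric`, `configure`, `run`, and the `OPS` table are invented for illustration and do not correspond to SambaNova's actual software stack or hardware interface.

```python
# Hypothetical operator library: each entry stands in for a compute unit's function.
OPS = {
    "scale2": lambda x: x * 2,
    "add1": lambda x: x + 1,
    "relu": lambda x: max(0, x),
}


class DataflowFabric:
    """Toy pool of reconfigurable units wired into a linear pipeline."""

    def __init__(self, num_units):
        self.num_units = num_units
        self.pipeline = []

    def configure(self, op_names):
        """Map a model (a list of operator names) onto the available units."""
        if len(op_names) > self.num_units:
            raise ValueError("model needs more units than the fabric has")
        self.pipeline = [OPS[name] for name in op_names]

    def run(self, inputs):
        """Stream each input through the configured pipeline, stage by stage."""
        outputs = []
        for x in inputs:
            for op in self.pipeline:
                x = op(x)
            outputs.append(x)
        return outputs


fabric = DataflowFabric(num_units=4)

fabric.configure(["scale2", "add1"])          # model A
print(fabric.run([1, -3]))                    # [3, -5]

fabric.configure(["add1", "relu", "scale2"])  # same hardware, model B
print(fabric.run([1, -3]))                    # [4, 0]
```

The design point this is meant to convey is that the hardware stays fixed while the mapping changes, which is where both the flexibility and the programming difficulty come from.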

The Tradeoff

Specialization gains:
- Groq: roughly 9× faster LLM inference (430 vs 50 tokens/sec), but only LLMs
- Cerebras: Massive parallelism (but software immature)
- SambaNova: Flexibility (but hard to program)

Generalization wins:
- TPU: train + infer (good for both)
- H100: all workloads (most flexible)

Production reality: Google and NVIDIA win market share despite lower peak TFLOPS because developers know how to use them.

Day 25: Other accelerators (Qualcomm Hexagon, Intel Gaudi, AWS Trainium).