AI Chip Design Day 5

What You Sacrifice for Efficiency

NPU Trade-off #1: Inference Only

Can it train? Most mobile NPUs: NO. They're inference-only machines.

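Why? Training needs gradient computation and backpropagation, which keeps intermediate activations in memory. It also needs more precision than INT8, typically FP32 or mixed FP16/BF16 with FP32 master weights. Overkill for mobile.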

Result: the Apple A18 can run a 3B-parameter LLM on device in about 50 ms, but it can't fine-tune that model on device.
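A quick memory estimate shows why training is out of reach for a phone. This is a back-of-envelope sketch under assumptions that are not from the text above: INT8 weights for inference, and FP32 weights, FP32 gradients and the two FP32 moment buffers of the Adam optimizer for training. Activations are ignored, and they would only make training look worse.

```python
# Back-of-envelope memory footprint for a 3B-parameter model.
# Assumptions: INT8 weights for inference; FP32 weights + FP32 gradients
# + Adam's two FP32 moment buffers for training; activations ignored.

PARAMS = 3e9  # 3B parameters
GB = 1e9

def inference_bytes(params: float, bytes_per_weight: int = 1) -> float:
    """Weights only: INT8 means 1 byte per parameter."""
    return params * bytes_per_weight

def training_bytes(params: float) -> float:
    """FP32 weights (4) + FP32 grads (4) + Adam m (4) + Adam v (4) = 16 bytes/param."""
    return params * (4 + 4 + 4 + 4)

print(f"Inference (INT8 weights): {inference_bytes(PARAMS) / GB:.0f} GB")  # 3 GB
print(f"Training (FP32 + Adam):   {training_bytes(PARAMS) / GB:.0f} GB")   # 48 GB
```

Under these assumptions training needs 16x the memory of inference before a single activation is stored.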

NPU Trade-off #2: Fixed Operations

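An NPU is built around essentially one operation: matrix multiplication on a systolic array, with a few fixed-function units (activations, pooling) around it.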
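To make "one operation" concrete, here is a small cycle-by-cycle simulation of a systolic array computing C = A × B. It is a simplified sketch of one common dataflow (output-stationary: each processing element, or PE, keeps its own running sum). Real NPUs differ; some use weight-stationary designs instead. Rows of A enter from the left and columns of B from the top, each skewed by one cycle per row or column, so matching operands meet in the right PE.

```python
def systolic_matmul(A, B):
    """Simulate an N x M output-stationary systolic array computing A @ B.

    Every PE does exactly the same thing every cycle: multiply the value
    arriving from the left by the value arriving from above and add it to
    its accumulator. There is no branching and no other instruction.
    """
    n, k_dim, m = len(A), len(A[0]), len(B[0])
    acc = [[0] * m for _ in range(n)]    # one accumulator per PE
    a_reg = [[0] * m for _ in range(n)]  # A values, flowing right
    b_reg = [[0] * m for _ in range(n)]  # B values, flowing down

    # The last operand pair reaches the bottom-right PE after K + N + M - 2 cycles.
    for t in range(k_dim + n + m - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:  # left edge: row i of A, delayed by i cycles
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < k_dim else 0
                else:       # take the value the left neighbour held last cycle
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:  # top edge: column j of B, delayed by j cycles
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < k_dim else 0
                else:       # take the value the upper neighbour held last cycle
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        # The single "instruction": multiply-accumulate in every PE at once.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Anything that is not a stream of multiply-accumulates, such as an if/else on a tensor value, simply has no place to run in this grid.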

Can't run efficiently (or at all):

• Custom layers invented after the chip was designed
• Complex branching logic
• General-purpose code

Workaround: a hybrid chip (SoC). The CPU handles control flow and any unsupported ops, and the NPU handles the heavy compute.
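The hybrid split can be sketched as a partitioning pass over a model's op list: supported ops go to the NPU, everything else falls back to the CPU. The supported-op set, op names and the `partition` helper below are hypothetical; real NPU toolchains each define their own rules.

```python
from dataclasses import dataclass

# Hypothetical op set for an imaginary NPU.
NPU_OPS = {"conv2d", "matmul", "relu", "add"}

@dataclass
class Op:
    name: str  # layer name in the model
    kind: str  # operation type

def partition(graph: list[Op]) -> list[tuple[str, list[str]]]:
    """Group consecutive ops into (device, [op names]) segments."""
    segments: list[tuple[str, list[str]]] = []
    for op in graph:
        device = "NPU" if op.kind in NPU_OPS else "CPU"
        if segments and segments[-1][0] == device:
            segments[-1][1].append(op.name)   # extend the current segment
        else:
            segments.append((device, [op.name]))  # device switch: new segment
    return segments

model = [
    Op("conv1", "conv2d"),
    Op("act1", "relu"),
    Op("my_norm", "custom_norm"),   # custom layer: not supported, CPU fallback
    Op("fc", "matmul"),
    Op("decode", "beam_search"),    # branchy control logic: CPU
]
for device, ops in partition(model):
    print(device, ops)
```

Each NPU/CPU boundary means handing tensors between processors, so one unsupported layer in the middle of a model can cost more than its own compute.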

NPU Trade-off #3: Precision

INT8 quantization loses some accuracy.
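Where does the loss come from? A minimal sketch of symmetric per-tensor INT8 quantization (one common scheme, assumed here) shows it: each weight is rounded to the nearest of 255 levels, so the error per weight is at most half a step. The Gaussian weights are synthetic test data, not from any real model.

```python
import random

def quantize_int8(xs):
    """Symmetric per-tensor quantization: map [-max|x|, max|x|] onto [-127, 127]."""
    max_abs = max(abs(x) for x in xs)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to real values."""
    return [v * scale for v in q]

random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(10_000)]  # synthetic weights
q, scale = quantize_int8(weights)
errors = [abs(a - b) for a, b in zip(dequantize(q, scale), weights)]

print(f"step size: {scale:.6f}")
print(f"max error: {max(errors):.6f} (never more than half a step)")
```

Small, bounded, roughly random errors like these mostly cancel out across a layer, which is why inference accuracy barely moves.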

Top-1 ImageNet accuracy of ResNet-50:

FP32: 76.1%
INT8: 75.8% (minimal loss)

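For inference, the loss is negligible. For training, INT8 is unacceptable: typical weight updates are smaller than one INT8 step, so they round away to nothing.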
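The training problem can be shown in a few lines. This sketch applies 100 identical gradient steps to one weight, once in full precision and once stored as an INT8 code that is re-rounded after every step. The scale, learning rate and gradient are hypothetical values chosen for illustration.

```python
SCALE = 1.0 / 127  # hypothetical INT8 scale: weights in [-1, 1], step ~0.0079
LR = 1e-3          # hypothetical learning rate
GRAD = 0.5         # hypothetical gradient, same every step

w_fp32 = 0.25
w_int8 = round(0.25 / SCALE)  # integer code for 0.25 -> 32

for _ in range(100):
    update = LR * GRAD  # 0.0005, well under one INT8 step
    w_fp32 -= update
    # Dequantize, apply the update, then re-quantize: the change rounds away.
    w_int8 = round((w_int8 * SCALE - update) / SCALE)

print(f"FP32 weight after 100 steps: {w_fp32:.4f}")            # 0.2000
print(f"INT8 weight after 100 steps: {w_int8 * SCALE:.4f}")    # still 0.2520
```

The full-precision weight learns; the INT8 weight never moves. This is why training hardware keeps a higher-precision copy of the weights.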

NPU vs GPU: The Design Decision

Choose NPU when:
• Inference only
• Energy budget tight (mobile, edge)
• Single workload (AI)
• High volume (billions of units) to amortize the custom design cost

Choose GPU when:
• Training required
• Multiple workloads
• Flexibility needed
• High power budget available
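The two checklists can be folded into a toy decision rule. The field names and both thresholds (10 W, 100 million units) are hypothetical, picked only to make the checklist concrete.

```python
from dataclasses import dataclass

# Hypothetical thresholds, not from the checklist itself.
TIGHT_POWER_W = 10.0
HIGH_VOLUME_UNITS = 100_000_000

@dataclass
class Workload:
    needs_training: bool
    power_budget_w: float
    distinct_workloads: int  # how many different kinds of work the chip must run
    units: int               # expected shipment volume

def choose_accelerator(w: Workload) -> str:
    # Training or several workloads need a programmable, high-precision chip.
    if w.needs_training or w.distinct_workloads > 1:
        return "GPU"
    # Inference-only, single workload: an NPU pays off under a tight
    # power budget or at high volume.
    if w.power_budget_w <= TIGHT_POWER_W or w.units >= HIGH_VOLUME_UNITS:
        return "NPU"
    # Generous power and low volume: flexibility wins.
    return "GPU"

phone = Workload(needs_training=False, power_budget_w=5.0, distinct_workloads=1, units=1_000_000_000)
datacenter = Workload(needs_training=True, power_budget_w=700.0, distinct_workloads=3, units=10_000)
print(choose_accelerator(phone), choose_accelerator(datacenter))  # NPU GPU
```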

Tomorrow (Day 6): the basic building block, multiply-accumulate (MAC) units.
