Production AI Chips
Apple A18 Neural Engine (2024)
iPhone 16
• 17 TOPS (INT8)
• ~2W power budget
• Inference only (no training)
• On-device: Face unlock, photo search, voice
Google TPU v4 (Data Center)
Cloud AI
• 430 TFLOPS
• 150W power
• Multi-chip (up to 4,096 chips in a pod)
• Trains & infers LLMs
NVIDIA H100 (GPU Alternative)
Data Center GPU
• 1,450 TFLOPS (sparse tensor)
• 700W power
• General-purpose (programmable for graphics, HPC, and AI, not only neural networks)
• Market leader, but lower peak throughput per watt than the TPU
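The efficiency comparison can be made concrete by dividing each chip's peak throughput by its power draw. The sketch below is only illustrative: it uses the figures quoted on this slide as-is, and the `Chip` class and `rank_by_efficiency` helper are hypothetical names introduced for the example. The three throughput figures are quoted at different precisions (INT8, unspecified, sparse tensor), so treat the ranking as a rough guide rather than a like-for-like benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Peak figures as quoted on the slide (not measured values)."""
    name: str
    tera_ops: float  # peak tera-operations per second
    watts: float     # approximate power draw

    @property
    def tera_ops_per_watt(self) -> float:
        # Peak throughput delivered per watt of power.
        return self.tera_ops / self.watts


# Figures copied from the slide; precisions differ between chips.
CHIPS = [
    Chip("Apple A18 Neural Engine", 17, 2),
    Chip("Google TPU v4", 430, 150),
    Chip("NVIDIA H100", 1450, 700),
]


def rank_by_efficiency(chips):
    """Return chips sorted from most to least efficient."""
    return sorted(chips, key=lambda c: c.tera_ops_per_watt, reverse=True)


if __name__ == "__main__":
    for chip in rank_by_efficiency(CHIPS):
        print(f"{chip.name:<26} {chip.tera_ops_per_watt:5.2f} tera-ops/s per watt")
```

On these figures the phone chip comes out at roughly 8.5 tera-ops/s per watt, the TPU at about 2.9, and the H100 at about 2.1: raw throughput rises with the power budget, while throughput per watt falls.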
Day 5: Design trade-offs. What do you sacrifice for efficiency?