Day 25

Mobile Accelerators

Qualcomm Hexagon, MediaTek, Samsung: bringing AI to smartphones, smartwatches, IoT. Power, latency, integration constraints.

Mobile NPU Landscape

OEM      | Name               | Peak TOPS | Power    | Use Case
Apple    | Neural Engine      | 17        | 2W       | iPhone (all models)
Qualcomm | Hexagon DSP        | 4-8       | 0.5-1W   | Android flagship
MediaTek | APU (AI Processor) | 2-4       | 0.3-0.5W | Budget/mid-range
Samsung  | NPU (Exynos)       | 1-2       | 0.2W     | Galaxy A/M series
Huawei   | Da Vinci (Kirin)   | 8         | 1W       | Restricted (sanctions)
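One way to read the table is in terms of efficiency (TOPS per watt). The sketch below takes the peak TOPS and power figures from the table and, where the table gives a range, uses the midpoint. That midpoint pairing is an assumption of this example, not a vendor-published number.

```python
# Peak TOPS and power from the table above, stored as (low, high) ranges.
NPUS = {
    "Apple Neural Engine": ((17, 17), (2.0, 2.0)),
    "Qualcomm Hexagon DSP": ((4, 8), (0.5, 1.0)),
    "MediaTek APU": ((2, 4), (0.3, 0.5)),
    "Samsung NPU (Exynos)": ((1, 2), (0.2, 0.2)),
    "Huawei Da Vinci (Kirin)": ((8, 8), (1.0, 1.0)),
}


def midpoint(value_range):
    """Midpoint of a (low, high) range; an illustrative simplification."""
    low, high = value_range
    return (low + high) / 2


def tops_per_watt(tops_range, power_range):
    """Efficiency estimate from range midpoints."""
    return midpoint(tops_range) / midpoint(power_range)


EFFICIENCY = {name: tops_per_watt(t, p) for name, (t, p) in NPUS.items()}

if __name__ == "__main__":
    for name, eff in EFFICIENCY.items():
        print(f"{name:24s} {eff:4.1f} TOPS/W")
```

Under this midpoint assumption every entry lands between 7.5 and 8.5 TOPS/W, which suggests the listed parts differ mainly in how much power they are allowed to draw rather than in efficiency.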

Qualcomm Hexagon DSP

Digital Signal Processor, not systolic

Architecture

Why Not Systolic?

Systolic arrays assume:
- Large, regular matrix multiplies
- Batch processing

Mobile use:
- Small models (MobileNetV3, pruned ResNet-50)
- One image at a time (batch=1)
- Variable layer sizes
- Tight latency budget (<10 ms)

Result: a SIMD DSP is more flexible, even if it delivers less peak throughput.
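To make the batch=1 argument concrete, here is a rough utilization estimate for a systolic array. It is a minimal sketch under a simplified weight-stationary model: the 128×128 array size, the function name, and the fill/drain cycle count are all assumptions chosen for illustration, not figures for any real chip.

```python
import math


def systolic_utilization(m, k, n, rows=128, cols=128):
    """Fraction of MAC slots doing useful work for an (m x k) @ (k x n) matmul.

    Simplified weight-stationary model (an assumption of this sketch):
    each rows x cols weight tile is loaded once, m activation rows are
    streamed through it, and filling plus draining the pipeline costs
    rows + cols - 2 extra cycles per tile.
    """
    tiles = math.ceil(k / rows) * math.ceil(n / cols)
    cycles_per_tile = m + rows + cols - 2
    useful_macs = m * k * n
    available_macs = tiles * cycles_per_tile * rows * cols
    return useful_macs / available_macs


if __name__ == "__main__":
    cases = {
        "large batched matmul (m=k=n=4096)": (4096, 4096, 4096),
        "batch=1 fully connected (1x1024 @ 1024x1000)": (1, 1024, 1000),
        "batch=1 pointwise conv, 7x7 map (49x160 @ 160x960)": (49, 160, 960),
    }
    for label, (m, k, n) in cases.items():
        print(f"{label}: {systolic_utilization(m, k, n):.1%}")
```

In this model the large batched multiply keeps the array above 90% busy, while the batch=1 fully connected layer uses well under 1% of the available MAC slots, because the pipeline fill and drain time dominates when only one activation row is streamed. Layer shapes that do not divide evenly into the array size lose further slots to padding.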

Power Budget Reality

Smartphone power consumption (active use):
- Screen: 2-3W
- CPU: 1-2W
- GPU: 2-3W
- Modem: 0.5W
- NPU: 0.2-1W ← This is the constraint!

Battery capacity: 3,000-4,000 mAh (10-15 Wh)
Target endurance: 10+ hours

NPU for facial recognition: ~10 ms per frame @ 30 fps
→ 30% duty cycle, so ~0.3W average at 1W peak if running continuously
→ roughly 3-5% of the active power budget above (acceptable)
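The arithmetic behind this budget can be checked with a few lines. This is a sketch, not a measurement: it assumes the NPU draws its 1W peak while busy and nothing while power-gated, and it assumes a 3.85 V nominal cell voltage to convert mAh to Wh. The function names are made up for this example.

```python
def npu_average_power(latency_ms, fps, peak_power_w):
    """Average power if the NPU is busy latency_ms per frame and gated otherwise."""
    duty_cycle = min(1.0, latency_ms * fps / 1000.0)
    return duty_cycle * peak_power_w


def battery_wh(capacity_mah, nominal_voltage=3.85):
    """Battery energy in Wh; 3.85 V is an assumed nominal Li-ion cell voltage."""
    return capacity_mah / 1000.0 * nominal_voltage


# Active-use component figures from the text above, in watts (low, high).
COMPONENTS_W = {
    "screen": (2.0, 3.0),
    "cpu": (1.0, 2.0),
    "gpu": (2.0, 3.0),
    "modem": (0.5, 0.5),
    "npu": (0.2, 1.0),
}

avg_w = npu_average_power(latency_ms=10, fps=30, peak_power_w=1.0)
total_low = sum(low for low, _ in COMPONENTS_W.values())
total_high = sum(high for _, high in COMPONENTS_W.values())

if __name__ == "__main__":
    print(f"NPU average power: {avg_w:.2f} W")
    print(f"Active total: {total_low:.1f}-{total_high:.1f} W")
    print(f"NPU share: {avg_w / total_high:.1%} to {avg_w / total_low:.1%}")
    print(f"Battery: {battery_wh(3000):.1f}-{battery_wh(4000):.1f} Wh")
```

With every listed component active, the 0.3W face-recognition load is a few percent of the total, so the NPU's own draw matters less than keeping its duty cycle low and letting it power-gate between frames.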

Real Mobile AI Workloads

Common Use Cases

  • Face recognition: 10-20 ms (MobileNetV2 backbone)
  • Object detection: 50-100 ms (SSD-MobileNet)
  • Scene understanding: 100-200 ms (semantic segmentation)
  • Speech recognition: Real-time (DSP or CPU)
  • Generative AI: Not yet (<100M param models only)
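The latencies above translate directly into an upper bound on frame rate. The sketch below assumes inferences run back to back with no pipelining or overlap, which is a simplification; the dictionary and function names are illustrative.

```python
# Per-inference latency ranges from the list above, in milliseconds (low, high).
USE_CASES_MS = {
    "face recognition": (10, 20),
    "object detection": (50, 100),
    "scene understanding": (100, 200),
}


def max_fps(latency_ms):
    """Upper bound on frame rate when each frame waits for the previous one."""
    return 1000.0 / latency_ms


# (slowest case, fastest case) frames per second for each workload.
FPS_RANGES = {
    name: (max_fps(high), max_fps(low))
    for name, (low, high) in USE_CASES_MS.items()
}

if __name__ == "__main__":
    for name, (slow, fast) in FPS_RANGES.items():
        print(f"{name:20s} {slow:5.1f}-{fast:5.1f} fps")
```

Of the three, only face recognition can keep pace with a 30 fps camera stream under this assumption; detection and segmentation would need to skip frames or run on a downscaled input.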

Model Sizes

Model       | Params | Size (INT8) | Device
MobileNetV3 | 5.4M   | ~5.4 MB     | Any phone
ResNet-50   | 25M    | ~25 MB      | Flagship
BERT-base   | 110M   | ~110 MB     | Rare (storage)
LLaMA-7B    | 7B     | ~7 GB       | Not feasible
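Weight storage follows from parameter count times bytes per parameter. The sketch below uses the parameter counts from the table; the FP32, FP16, and INT4 rows are added here for comparison and are not part of the original table. It ignores activations, file metadata, and per-channel quantization scales.

```python
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Parameter counts from the table above.
MODELS = {
    "MobileNetV3": 5.4e6,
    "ResNet-50": 25e6,
    "BERT-base": 110e6,
    "LLaMA-7B": 7e9,
}


def weight_size_mb(params, precision):
    """Weight storage in MB (10**6 bytes) at the given precision."""
    return params * BYTES_PER_PARAM[precision] / 1e6


if __name__ == "__main__":
    for name, params in MODELS.items():
        sizes = ", ".join(
            f"{p}: {weight_size_mb(params, p):,.1f} MB" for p in BYTES_PER_PARAM
        )
        print(f"{name:12s} {sizes}")
```

At INT8 each parameter takes one byte, so the size in MB equals the parameter count in millions. For reference, 100 MB for ResNet-50 and 440 MB for BERT-base are the FP32 figures, and 3.5 GB for LLaMA-7B corresponds to INT4.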

Integration: SoC Perspective

Mobile NPUs are on the same chip as CPU/GPU, sharing memory and power rails:

  • Reduced latency (no external I/O)
  • Shared HBM? No (size + cost constraints); the NPU shares the SoC's LPDDR DRAM instead
  • Shared cache? Partial (L3 sometimes shared)
  • Power gating: All NPU components can be disabled when idle

Day 26: Practical design: building a simple 4×4 systolic MAC in Verilog. From theory to HDL.