AI Chip Design Day 3

TOPS, FLOPS, and Power Metrics

Key Metrics

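FLOPS: Floating-point operations per second (quoted for FP32, FP16, or BF16)
TOPS: Tera (10^12) operations per second, usually quoted for integer math (INT8)
TOPS/W (or TFLOPS/W): Throughput per watt of power (efficiency)
Peak vs Sustained: Peak = theoretical maximum with every compute unit busy every cycle; Sustained = throughput on real workloads, limited by memory bandwidth, utilization, and thermal throttling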
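
To make the metrics concrete, here is a minimal sketch of how peak throughput and efficiency are commonly estimated. The chip parameters (4096 MAC units, 1 GHz clock, 10 W, 40% utilization) are hypothetical numbers chosen for illustration, not the specifications of any real device. It also assumes the common convention of counting one multiply-accumulate (MAC) as two operations.

```python
def peak_tops(num_macs: int, clock_hz: float) -> float:
    """Peak throughput in TOPS, counting each MAC as 2 ops (multiply + add)."""
    return 2 * num_macs * clock_hz / 1e12


def tops_per_watt(tops: float, power_w: float) -> float:
    """Efficiency: tera-operations per second delivered per watt."""
    return tops / power_w


# Hypothetical chip (assumed values, for illustration only)
NUM_MACS = 4096      # INT8 multiply-accumulate units
CLOCK_HZ = 1e9       # 1 GHz
POWER_W = 10.0       # power draw under load
UTILIZATION = 0.40   # fraction of peak reached on a real workload

peak = peak_tops(NUM_MACS, CLOCK_HZ)
sustained = peak * UTILIZATION

print(f"Peak:       {peak:.2f} TOPS")
print(f"Sustained:  {sustained:.2f} TOPS")
print(f"Efficiency: {tops_per_watt(sustained, POWER_W):.3f} TOPS/W (sustained)")
```

Quoting efficiency from the sustained figure rather than the peak figure gives a number closer to what a real workload would see.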

Why Quantization = Power Savings

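FP32: 32 bits per value. High precision, but a lot of data to move.
INT8: 8 bits per value. 4× less data to store and move, and much cheaper arithmetic, so energy per operation drops sharply (the exact saving depends on the chip). With proper quantization, inference accuracy usually stays close to FP32.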
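
The size difference and the precision cost can be sketched in a few lines. This is a simplified illustration assuming symmetric per-tensor quantization, which is one common scheme among several. The function names and the sample weights are made up for the example.

```python
def tensor_bytes(num_values: int, bits_per_value: int) -> int:
    """Storage needed for a tensor at a given bit width."""
    return num_values * bits_per_value // 8


def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats onto integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [x * scale for x in q]


# Data movement: 1 million weights in FP32 vs INT8
ratio = tensor_bytes(1_000_000, 32) / tensor_bytes(1_000_000, 8)
print(f"FP32 is {ratio:.0f}x larger than INT8")

# Precision: round-trip a few made-up weights through INT8
weights = [0.50, -1.27, 0.03, 0.91]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("INT8 codes:", q)
print(f"Max round-trip error: {max_error:.6f} (at most scale / 2 = {scale / 2:.6f})")
```

The rounding error per value is bounded by half the scale, which is why accuracy tends to hold up when the weight range is well covered by the 8-bit grid.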

Most modern AI chips rarely use FP32 for inference and run INT8 or 16-bit formats (FP16, BF16) instead. Why pay the extra power for precision you don't need?

Day 4: Real AI chip examples and their specifications.
