TOPS, FLOPS, and Power Metrics
Key Metrics
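- FLOPS: Floating-point operations per second, quoted for a specific precision (FP32, FP16, BF16)
- TOPS: Tera (10^12) operations per second, usually quoted for integer formats such as INT8 or INT4 (BF16 throughput is normally given in TFLOPS)
- TOPS/W or TFLOPS/W: Throughput per watt of power drawn (efficiency)
- Peak vs Sustained: Peak = theoretical maximum with every compute unit busy every cycle at full clock, ignoring thermal, power, and memory-bandwidth limits; Sustained = what the chip actually delivers over long real-world workloads under those limits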
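As a rough sketch of how these metrics relate, the snippet below computes peak TOPS from a MAC-unit count and clock speed, scales it by a utilization factor to approximate sustained throughput, and divides by power to get TOPS/W. It assumes the common convention that one multiply-accumulate (MAC) counts as 2 operations. The MAC count, clock, utilization, and wattage are hypothetical values, not any real chip's specs, and folding every real-world limit into a single utilization factor is a simplification.

```python
def peak_tops(num_macs: int, clock_ghz: float) -> float:
    """Peak TOPS, assuming every MAC unit finishes one multiply-accumulate per cycle.

    Convention (assumption): 1 MAC = 2 operations (one multiply + one add).
    """
    ops_per_second = 2 * num_macs * clock_ghz * 1e9
    return ops_per_second / 1e12


def sustained_tops(peak: float, utilization: float) -> float:
    """Approximate sustained TOPS as peak scaled by the fraction of cycles the MACs stay busy.

    Simplification: thermal throttling, power caps, and memory stalls all land in `utilization`.
    """
    return peak * utilization


def tops_per_watt(tops: float, watts: float) -> float:
    """Efficiency: throughput divided by power draw."""
    return tops / watts


if __name__ == "__main__":
    # Hypothetical accelerator: 4096 INT8 MAC units at 1 GHz, 60% utilization, 5 W.
    peak = peak_tops(num_macs=4096, clock_ghz=1.0)  # 2 * 4096 * 1e9 / 1e12 = 8.192
    sustained = sustained_tops(peak, utilization=0.6)
    print(f"peak:       {peak:.3f} TOPS")
    print(f"sustained:  {sustained:.3f} TOPS")
    print(f"efficiency: {tops_per_watt(sustained, watts=5.0):.3f} TOPS/W")
```

Note that a datasheet TOPS/W figure is often computed from peak throughput, so it can look considerably better than the sustained number above.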
Why Quantization = Power Savings
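FP32: 32 bits per value. Plenty of precision, but every value costs 4 bytes to store and move.
INT8: 8 bits per value. 4× less data to store and move, and INT8 multiply-accumulate units use far less energy than FP32 units, so power drops substantially (the exact factor depends on the chip). With proper calibration, many models lose little or no accuracy at inference.
This is why most inference-focused AI chips run INT8 (or lower) rather than FP32. Why pay the power cost of precision you don't need?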
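To make the "4× less data" point concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization: one scale maps the range [-max|x|, max|x|] onto [-127, 127], each value is rounded to an 8-bit integer, and dequantizing shows the rounding error. Per-tensor symmetric scaling is an assumption chosen for simplicity; real toolchains also use per-channel scales, zero points, and calibration data. The sample weights are made up.

```python
from array import array


def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (assumed scheme, for illustration)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid divide-by-zero for all-zero input
    # Round to the nearest integer step, then clamp into the signed 8-bit range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return array("b", q), scale  # "b" = signed char, 1 byte per value


def dequantize_int8(q, scale):
    """Map INT8 codes back to approximate real values."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = array("f", [0.82, -1.37, 0.05, 2.4, -0.61, 1.9, -2.2, 0.33])  # "f" = 4-byte float
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"FP32 bytes: {len(weights) * weights.itemsize}")
    print(f"INT8 bytes: {len(q) * q.itemsize}")
    print(f"scale: {scale:.5f}, max abs error: {max_err:.5f}")
```

The storage shrinks exactly 4×, while the per-value error is bounded by half a quantization step (scale / 2). Whether that error matters for accuracy depends on the model, which is what calibration checks.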
Day 4: Real AI chip examples and their specifications.