HomeDay 3

Energy Efficiency

TOPS, FLOPS, and Power Metrics

Key Metrics

Why Quantization = Power Savings

FP32: 32-bit per value. Lots of precision, big data movement.
INT8: 8-bit per value. 4× less data. 4× less power. Same accuracy for inference.

Most modern AI chips never use FP32 for inference. Why pay 4× power for precision you don't need?

Day 4: Real AI chip examples and their specifications.