Quantization
Quantization is another key step in reducing the size and computational requirements of a model. It involves converting the weights and activations to a lower-precision data type, such as 8-bit integers or 16-bit floating-point numbers. Several techniques are used for quantization, including:
- Fixed-point quantization: Representing the weights and activations as fixed-point numbers, with a set number of bits for the integer and fractional parts.
- Integer quantization: Mapping the weights and activations to integers (commonly 8-bit) via a scale factor and zero point, as shown in the first sketch after this list.
- Binary quantization: Constraining the weights and activations to two values, so each can be stored in a single bit; see the second sketch below.
- Perceptual quantization: Lowering the precision of the weights and activations adaptively, guided by the perceived quality of the model's output.
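To make integer quantization concrete, here is a minimal NumPy sketch of 8-bit affine quantization. The helper names `quantize_int8` and `dequantize` are illustrative, not part of any particular library.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine (asymmetric) quantization of a float array to int8.

    Returns the quantized array plus the scale and zero point
    needed to map values back to floating point.
    """
    x_min, x_max = float(x.min()), float(x.max())
    # Spread the observed float range across the 256 int8 levels.
    scale = (x_max - x_min) / 255.0 or 1.0  # guard against a constant tensor
    zero_point = int(round(-128 - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Approximately reconstruct the original float values."""
    return (q.astype(np.float32) - zero_point) * scale

# Quantize a random weight matrix and measure the round-trip error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
print("max round-trip error:", np.abs(w - w_hat).max())
```

In practice the scale and zero point are usually computed per tensor or per channel from calibration data rather than from a single array, but the mapping itself is the same.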
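Binary quantization can be sketched similarly. One common formulation from the binary-network literature keeps only the sign of each weight plus a single per-tensor scale factor; the `binarize` helper below is a hypothetical name for that idea.

```python
import numpy as np

def binarize(w: np.ndarray):
    """Reduce weights to {-alpha, +alpha}.

    Each weight keeps only its sign (1 bit); the scale alpha
    (the mean absolute value) shrinks the error of replacing
    w with alpha * sign(w).
    """
    alpha = float(np.abs(w).mean())
    # Map zeros to +1 so the output is strictly two-valued.
    signs = np.where(w >= 0, 1.0, -1.0).astype(np.float32)
    return signs, alpha

w = np.random.randn(4, 4).astype(np.float32)
signs, alpha = binarize(w)
w_hat = alpha * signs
print("mean abs error:", np.abs(w - w_hat).mean())
```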