Quantization
Quantization is another key step in reducing the size and computational requirements of a model. It involves converting the weights and activations to a lower-precision data type, such as 8-bit integers or 16-bit floating-point numbers. Several techniques are used for quantization, including:
- Fixed-point quantization: Representing the weights and activations as fixed-point numbers, with a set number of bits for the integer and fractional parts.
- Integer quantization: Mapping the weights and activations to integers (commonly 8-bit) via a scale factor and zero point, as shown in the first sketch after this list.
- Binary quantization: Constraining the weights and activations to two values, so each can be stored in a single bit; see the second sketch below.
- Perceptual quantization: Lowering the precision of the weights and activations adaptively, guided by the perceived quality of the model's output.
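To make integer quantization concrete, here is a minimal NumPy sketch of 8-bit affine quantization. The helper names `quantize_int8` and `dequantize` are illustrative, not part of any particular library.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine (asymmetric) quantization of a float array to int8.

    Returns the quantized array plus the scale and zero point
    needed to map values back to floating point.
    """
    x_min, x_max = float(x.min()), float(x.max())
    # Spread the observed float range across the 256 int8 levels.
    scale = (x_max - x_min) / 255.0 or 1.0  # guard against a constant tensor
    zero_point = int(round(-128 - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Approximately reconstruct the original float values."""
    return (q.astype(np.float32) - zero_point) * scale

# Quantize a random weight matrix and measure the round-trip error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
print("max round-trip error:", np.abs(w - w_hat).max())
```

In practice the scale and zero point are usually computed per tensor or per channel from calibration data rather than from a single array, but the mapping itself is the same.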
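Binary quantization can be sketched similarly. One common formulation from the binary-network literature keeps only the sign of each weight plus a single per-tensor scale factor; the `binarize` helper below is a hypothetical name for that idea.

```python
import numpy as np

def binarize(w: np.ndarray):
    """Reduce weights to {-alpha, +alpha}.

    Each weight keeps only its sign (1 bit); the scale alpha
    (the mean absolute value) shrinks the error of replacing
    w with alpha * sign(w).
    """
    alpha = float(np.abs(w).mean())
    # Map zeros to +1 so the output is strictly two-valued.
    signs = np.where(w >= 0, 1.0, -1.0).astype(np.float32)
    return signs, alpha

w = np.random.randn(4, 4).astype(np.float32)
signs, alpha = binarize(w)
w_hat = alpha * signs
print("mean abs error:", np.abs(w - w_hat).mean())
```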