# Lab 3: Mixed Precision Search
ELEC70109/EE9-AML3-10/EE9-AO25

Written by Aaron Zhao and Pedro Gimenes
## General introduction
You have looked at how to quantize models in Lab 0 and how to search for optimal architectures in Lab 2. In this lab, you will learn how to use Mase to search for an optimal quantization scheme for a model.
## Learning tasks
Go through “Tutorial 6: Mixed Precision Quantization Search with Mase and Optuna” to understand how to use Mase to search for optimal quantization schemes for a model.
## Implementation tasks
In Tutorial 6, every layer quantized to LinearInteger is given the same width and fractional width. This is suboptimal, as different layers may have different sensitivities to quantization.
Modify the code so that each layer can independently take a width from [8, 16, 32] and a fractional width from [2, 4, 8]. Expose these per-layer choices as additional hyperparameters for the Optuna sampler, as sketched below.
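A minimal sketch of one way to do this is given below. It assumes the `construct_model(trial)` pattern from Tutorial 6 and the `LinearInteger` config keys (`data_in_width`, `weight_width`, and so on); check these names against your copy of the tutorial before using it.

```python
WIDTH_CHOICES = [8, 16, 32]
FRAC_WIDTH_CHOICES = [2, 4, 8]

def sample_layer_config(trial, layer_name):
    # Register one (width, frac_width) pair per layer under a unique
    # hyperparameter name, so the Optuna sampler can tune each layer
    # independently instead of sharing a single global precision.
    width = trial.suggest_categorical(f"{layer_name}_width", WIDTH_CHOICES)
    frac_width = trial.suggest_categorical(
        f"{layer_name}_frac_width", FRAC_WIDTH_CHOICES
    )
    return {
        # Config keys assumed from the LinearInteger layer in Tutorial 6.
        "data_in_width": width,
        "data_in_frac_width": frac_width,
        "weight_width": width,
        "weight_frac_width": frac_width,
        "bias_width": width,
        "bias_frac_width": frac_width,
    }
```

Inside `construct_model`, pass `sample_layer_config(trial, name)` as the `config` argument whenever a layer is replaced with `LinearInteger`, instead of a fixed dictionary.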
Run the search again, and plot a figure with the number of trials on the x-axis and the maximum accuracy achieved up to that trial on the y-axis.
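A sketch of the plot is below, assuming each trial's objective value is its accuracy, as in Tutorial 6, and that you pass in the completed Optuna study.

```python
import matplotlib.pyplot as plt

def plot_search_progress(study):
    """Plot the best accuracy found so far against the trial number."""
    # Running best over completed trials (skip pruned/failed trials).
    values = [t.value for t in study.trials if t.value is not None]
    running_best = [max(values[: i + 1]) for i in range(len(values))]
    plt.plot(range(1, len(running_best) + 1), running_best)
    plt.xlabel("Number of trials")
    plt.ylabel("Max achieved accuracy")
    plt.show()
```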
In Section 1 of Tutorial 6, when defining the search space, a number of layer classes are imported; however, only LinearInteger and the full-precision nn.Linear are actually included in the search.
Now, extend the search to consider all supported precisions for the Linear layer in Mase, including Minifloat, BlockFP, BlockLog, Binary, etc. This may also require changing the model constructor so the required arguments are passed when instantiating each layer.
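One way to structure this is sketched below. The import path follows the Mase source tree but may differ between versions, and each quantized class expects its own config keys (widths, exponent bits, block size, ...), so `make_config` is a hypothetical helper you must write by reading each layer class's constructor.

```python
import torch.nn as nn
from chop.nn.quantized.modules.linear import (  # verify path in your Mase version
    LinearInteger,
    LinearMinifloatDenorm,
    LinearBlockFP,
    LinearBlockLog,
    LinearBinary,
)

# Map string keys to classes: Optuna hyperparameters must be simple
# serialisable values, so we suggest the key rather than the class itself.
LINEAR_CHOICES = {
    "full_precision": nn.Linear,
    "integer": LinearInteger,
    "minifloat": LinearMinifloatDenorm,
    "block_fp": LinearBlockFP,
    "block_log": LinearBlockLog,
    "binary": LinearBinary,
}

def replace_linear(trial, name, old_layer):
    choice = trial.suggest_categorical(f"{name}_type", list(LINEAR_CHOICES))
    cls = LINEAR_CHOICES[choice]
    kwargs = {
        "in_features": old_layer.in_features,
        "out_features": old_layer.out_features,
    }
    if cls is not nn.Linear:
        # Hypothetical helper: returns the config dict the chosen
        # quantized class requires for this trial.
        kwargs["config"] = make_config(choice, trial, name)
    new_layer = cls(**kwargs)
    new_layer.weight.data = old_layer.weight.data  # keep the trained weights
    return new_layer
```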
Run the search again, and plot a figure with the number of trials on the x-axis and the maximum accuracy achieved up to that trial on the y-axis. Plot one curve per precision to compare their performance.
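Since a single trial can mix precisions across layers, the cleanest way to get per-precision curves is to run a separate study for each precision, with the layer-type choice fixed. The sketch below assumes a hypothetical `make_objective(precision)` factory that returns your objective function restricted to one layer class.

```python
import matplotlib.pyplot as plt
import optuna

for precision in ["integer", "minifloat", "block_fp", "block_log", "binary"]:
    study = optuna.create_study(direction="maximize")
    # make_objective is a hypothetical factory: adapt your objective so
    # the sampled layer type is fixed to `precision` for the whole study.
    study.optimize(make_objective(precision), n_trials=100)

    values = [t.value for t in study.trials if t.value is not None]
    running_best = [max(values[: i + 1]) for i in range(len(values))]
    plt.plot(range(1, len(running_best) + 1), running_best, label=precision)

plt.xlabel("Number of trials")
plt.ylabel("Max achieved accuracy")
plt.legend()
plt.show()
```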