chop.passes.transform.tensorrt
tensorrt_fake_quantize_transform_pass
- chop.passes.graph.transforms.tensorrt.quantize.calibrate.tensorrt_fake_quantize_transform_pass(*args, **kwargs)
tensorrt_calibrate_transform_pass
- chop.passes.graph.transforms.tensorrt.quantize.calibrate.tensorrt_calibrate_transform_pass(*args, **kwargs)
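In a typical TensorRT quantization flow these two passes run back to back: fake quantization inserts quantize/dequantize behavior into the graph, and calibration then collects activation statistics to set the quantization ranges. A minimal sketch of the chaining is shown below; the contents of pass_args are an assumption for illustration (the exact keys depend on the pass implementation), and each call is assumed to follow the usual MASE convention of returning a (graph, info) tuple.

    from chop.passes.graph.transforms.tensorrt.quantize.calibrate import (
        tensorrt_calibrate_transform_pass,
        tensorrt_fake_quantize_transform_pass,
    )

    # graph is an existing, already-constructed MaseGraph.
    pass_args = {}  # illustrative placeholder; fill with quantization/calibration settings
    graph, _ = tensorrt_fake_quantize_transform_pass(graph, pass_args)  # insert fake-quant behavior
    graph, _ = tensorrt_calibrate_transform_pass(graph, pass_args)      # calibrate activation ranges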
tensorrt_fine_tune_transform_pass
- chop.passes.graph.transforms.tensorrt.quantize.fine_tune.tensorrt_fine_tune_transform_pass(graph, pass_args=None)
Fine-tunes a quantized model using Quantization Aware Training (QAT) to improve its accuracy post-quantization.
This pass adjusts the quantized model's weights while accounting for quantization effects, aiming to recover, or even surpass, the original model's accuracy. Fine-tuning runs for far fewer epochs and at a significantly lower learning rate than the initial training, following a cosine annealing learning-rate schedule.
- Parameters:
graph (MaseGraph) – The model graph to be fine-tuned. This graph should already be quantized.
pass_args (dict, optional) – A dictionary containing arguments for fine-tuning, such as the number of epochs (epochs), the initial learning rate (initial_learning_rate), and the final learning rate (final_learning_rate). These parameters allow customization of the training regime based on the specific needs of the model and dataset.
- Returns:
A tuple containing the fine-tuned graph and an empty dictionary. The empty dictionary is a placeholder for potential extensions.
- Return type:
tuple(MaseGraph, dict)
The default training regime is:
- Use 10% of the original training epochs.
- Start at 1% of the original training learning rate.
- Follow a cosine annealing schedule that reduces the learning rate to 0.01% of the original training learning rate by the end of fine-tuning.
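As a worked example of these defaults, assume an original training run of 100 epochs at a learning rate of 0.1: fine-tuning would then run for 10 epochs, starting at 1e-3 and cosine-annealed down to 1e-5. The sketch below reproduces this schedule with PyTorch's CosineAnnealingLR; the scheduler construction, model, and optimizer are illustrative assumptions, not the pass's internals.

    import torch
    from torch.optim.lr_scheduler import CosineAnnealingLR

    # Assumed original-training hyperparameters, for illustration only.
    orig_epochs, orig_lr = 100, 0.1

    ft_epochs = int(0.10 * orig_epochs)   # 10% of original epochs -> 10
    ft_initial_lr = 0.01 * orig_lr        # 1% of original LR      -> 1e-3
    ft_final_lr = 0.0001 * orig_lr        # 0.01% of original LR   -> 1e-5

    model = torch.nn.Linear(4, 2)         # stand-in for the quantized model
    optimizer = torch.optim.SGD(model.parameters(), lr=ft_initial_lr)
    scheduler = CosineAnnealingLR(optimizer, T_max=ft_epochs, eta_min=ft_final_lr)

    for epoch in range(ft_epochs):
        # ... one QAT training epoch would run here ...
        scheduler.step()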
The resulting fine-tuned model checkpoints are saved in the following directory structure, facilitating easy access and version control:
- mase_output
  - tensorrt
    - quantization
      - model_task_dataset_date
        - cache
        - ckpts
          - fine_tuning
        - json
        - onnx
        - trt
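Given this layout, the fine-tuning checkpoints for a particular run can be located as sketched below; the run_name value is a hypothetical instance of the model_task_dataset_date pattern.

    from pathlib import Path

    run_name = "vgg7_cls_cifar10_2024-01-01"  # hypothetical model_task_dataset_date
    ckpt_dir = Path("mase_output") / "tensorrt" / "quantization" / run_name / "ckpts" / "fine_tuning"
    print(sorted(ckpt_dir.glob("*")))  # list the saved fine-tuning checkpoints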
Example of usage:
    graph = MaseGraph(...)
    fine_tuned_graph, _ = tensorrt_fine_tune_transform_pass(
        graph, {'epochs': 5, 'initial_learning_rate': 0.001, 'final_learning_rate': 0.00001}
    )
This example initiates fine-tuning with a custom number of epochs and custom initial and final learning rates, adapting the training regime to the specific requirements of the quantized model.