chop.passes.transform.tensorrt#

tensorrt_fake_quantize_transform_pass#

chop.passes.graph.transforms.tensorrt.quantize.calibrate.tensorrt_fake_quantize_transform_pass(*args, **kwargs)#

tensorrt_calibrate_transform_pass#

chop.passes.graph.transforms.tensorrt.quantize.calibrate.tensorrt_calibrate_transform_pass(*args, **kwargs)#
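The two passes above are documented only with a generic (*args, **kwargs) signature. Below is a minimal usage sketch, assuming they follow the same calling convention as the fine-tune pass documented next (a MaseGraph plus a pass_args dict, returning a (graph, dict) tuple); the empty pass_args values are illustrative placeholders, not documented defaults.

    from chop.passes.graph.transforms.tensorrt.quantize.calibrate import (
        tensorrt_calibrate_transform_pass,
        tensorrt_fake_quantize_transform_pass,
    )

    # `graph` is an existing MaseGraph (construction elided, as in the
    # example at the bottom of this page).
    graph, _ = tensorrt_fake_quantize_transform_pass(graph, pass_args={})
    graph, _ = tensorrt_calibrate_transform_pass(graph, pass_args={})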

tensorrt_fine_tune_transform_pass#

chop.passes.graph.transforms.tensorrt.quantize.fine_tune.tensorrt_fine_tune_transform_pass(graph, pass_args=None)[source]#

Fine-tunes a quantized model using Quantization Aware Training (QAT) to improve its accuracy post-quantization.

This pass adjusts the quantized model's weights while accounting for quantization effects, aiming to recover, or even surpass, the original model's accuracy. Training runs for far fewer epochs and at a significantly lower learning rate than the initial training phase, following a cosine annealing learning rate schedule.
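The schedule itself is standard PyTorch machinery. The sketch below illustrates the kind of loop described above; the model, data, and hyperparameter values are placeholders for illustration, not the pass's actual internals.

    import torch
    from torch import nn
    from torch.optim.lr_scheduler import CosineAnnealingLR

    # Placeholder model and data; in the real pass, the forward runs through
    # fake-quantize nodes so gradients account for quantization error (QAT).
    model = nn.Linear(8, 2)
    data = [(torch.randn(4, 8), torch.randint(0, 2, (4,))) for _ in range(10)]

    epochs = 5                                  # far fewer than original training
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # much lower LR
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs, eta_min=1e-5)

    for _ in range(epochs):
        for inputs, targets in data:
            optimizer.zero_grad()
            loss = nn.functional.cross_entropy(model(inputs), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()                        # cosine decay toward eta_min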

Parameters:
  • graph (MaseGraph) – The model graph to be fine-tuned. This graph should already be quantized.

  • pass_args (dict, optional) – A dictionary containing arguments for fine-tuning, such as the number of epochs (epochs), the initial learning rate (initial_learning_rate), and the final learning rate (final_learning_rate). These parameters allow customization of the training regime based on the specific needs of the model and dataset.

Returns:

A tuple containing the fine-tuned graph and an empty dictionary. The empty dictionary is a placeholder for potential extensions.

Return type:

tuple(MaseGraph, dict)

The default training regime involves:

  • Using 10% of the original training epochs.

  • Starting with 1% of the original training learning rate.

  • Employing a cosine annealing schedule to reduce the learning rate to 0.01% of the original training learning rate by the end of fine-tuning.
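For concreteness, here is the arithmetic of those defaults under a hypothetical original run (100 epochs at a learning rate of 0.1):

    original_epochs = 100        # hypothetical original training run
    original_lr = 0.1

    fine_tune_epochs = int(0.10 * original_epochs)  # 10 epochs  (10%)
    initial_lr = 0.01 * original_lr                 # 0.001      (1%)
    final_lr = 0.0001 * original_lr                 # 0.00001    (0.01%)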

The resulting fine-tuned model checkpoints are saved in the following directory structure, facilitating easy access and version control:

  • mase_output
    • tensorrt
      • quantization
        • model_task_dataset_date
          • cache
          • ckpts
            • fine_tuning
          • json
          • onnx
          • trt
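A short sketch for locating the fine-tuned checkpoints under this layout; the model_task_dataset_date directory name is generated per run, so the concrete name below is hypothetical.

    from pathlib import Path

    # Hypothetical run directory; the real name encodes model, task,
    # dataset, and date.
    run_dir = Path("mase_output/tensorrt/quantization/resnet18_cls_cifar10_2024-01-01")
    ckpt_dir = run_dir / "ckpts" / "fine_tuning"

    for ckpt in sorted(ckpt_dir.glob("*")):
        print(ckpt)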

Example of usage:

    graph = MaseGraph(...)
    fine_tuned_graph, _ = tensorrt_fine_tune_transform_pass(
        graph,
        {"epochs": 5, "initial_learning_rate": 0.001, "final_learning_rate": 0.00001},
    )

This example initiates fine-tuning with a custom number of epochs and custom initial and final learning rates, adapting the training regime to the specific requirements of the quantized model.