Quickstart#

This page gives a brief overview of the main Mase workflows. For hands-on walkthroughs, follow the Tutorials.

Importing a model#

To import a model into Mase, wrap it in a MaseGraph. Mase uses Torch FX to trace the model into a computation graph that analysis and transform passes can then traverse and rewrite.

from transformers import AutoModelForSequenceClassification
from chop import MaseGraph
import chop.passes as passes

# Load a pretrained checkpoint from HuggingFace
model = AutoModelForSequenceClassification.from_pretrained("prajjwal1/bert-tiny")

# Trace the model, telling Mase which HuggingFace forward arguments to expose
mg = MaseGraph(model, hf_input_names=["input_ids", "attention_mask", "labels"])

# Attach Mase metadata to every node, then populate the common fields
mg, _ = passes.init_metadata_analysis_pass(mg)
mg, _ = passes.add_common_metadata_analysis_pass(mg)
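
The result is an ordinary Torch FX graph with Mase metadata attached to each node. As a rough sketch of what passes iterate over (assuming the traced graph is exposed as mg.fx_graph, as in recent chop releases):

# Walk the traced graph and print each node's FX opcode and call target
for node in mg.fx_graph.nodes:
    print(node.op, node.target)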

See Tutorial 1: Introduction to the Mase IR, MaseGraph and Torch FX passes for a full walkthrough.

Model Compression#

The CompressionPipeline chains the quantization and pruning transform passes into a single call, preparing a model for edge deployment.

from chop.pipelines import CompressionPipeline
from chop import MaseGraph

# Re-trace the model and instantiate the pipeline
mg = MaseGraph(model)
pipe = CompressionPipeline()

quantization_config = {
    "by": "type",  # match layers by type
    "default": {"config": {"name": None}},  # full precision by default
    # quantize all Linear layers to 8-bit fixed point (4 fractional bits)
    "linear": {
        "config": {
            "name": "integer",
            "data_in_width": 8,
            "data_in_frac_width": 4,
            "weight_width": 8,
            "weight_frac_width": 4,
            "bias_width": 8,
            "bias_frac_width": 4,
        }
    },
}
pruning_config = {
    # prune 50% of weights and activations per layer, ranked by L1 norm
    "weight": {"sparsity": 0.5, "method": "l1-norm", "scope": "local"},
    "activation": {"sparsity": 0.5, "method": "l1-norm", "scope": "local"},
}

# Quantize first, then prune, in one call
mg, _ = pipe(mg, pass_args={
    "quantize_transform_pass": quantization_config,
    "prune_transform_pass": pruning_config,
})
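
To sanity-check the result, run a forward pass through the compressed module. A minimal sketch, assuming the transformed module is exposed as mg.model and the traced forward accepts the usual BERT keyword arguments (the tokenizer name simply mirrors the checkpoint above):

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("prajjwal1/bert-tiny")
inputs = tokenizer("Mase compresses models for the edge.", return_tensors="pt")
inputs["labels"] = torch.tensor([0])  # dummy label, in case the graph was traced with one

# One forward pass through the quantized, pruned model
outputs = mg.model(**inputs)
print(outputs)  # exact output structure depends on how the model was traced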

See Tutorial 3: Running Quantization-Aware Training (QAT) on Bert for quantization, Tutorial 4: Unstructured Pruning on Bert for pruning, and Tutorial 5: Neural Architecture Search (NAS) with Mase and Optuna for combined compression via CompressionPipeline.