Step-by-Step Model Deployment Examples
This guide walks through converting, compiling, and running ML models on a Torq device.
Prerequisites
Activate the Python environment as explained in Getting Started. (Skip this step if using the Docker container.)
Navigate to the root directory of the Release Package, or run the Docker container. In the Docker container, the Release Package is located at:
$ cd /opt/release
Prepare Model (Convert to MLIR)
Before compiling, your model must be converted to MLIR. See Model Preparation for full instructions on each framework.
Here is a quick summary of conversion commands:
TFLite:
$ iree-import-tflite path/to/model.tflite -o model.tosa
ONNX:
$ python -m iree.compiler.tools.import_onnx path/to/model.onnx -o model.mlir --data-prop
Torch: See Convert Torch Model to MLIR for the Python-based conversion workflow.
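When converting many models, the ONNX importer command above can also be driven from a script. A minimal sketch, assuming the iree-compiler package is installed in the active environment (the `import_onnx` wrapper function is illustrative, not part of the toolchain):

```python
import subprocess
import sys

def import_onnx(onnx_path: str, mlir_path: str) -> None:
    """Run the same ONNX importer command shown above via subprocess."""
    subprocess.run(
        [
            sys.executable, "-m", "iree.compiler.tools.import_onnx",
            onnx_path, "-o", mlir_path, "--data-prop",
        ],
        check=True,  # raise CalledProcessError if the importer fails
    )
```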
Note
The tests/hf/ directory containing sample models is only included in the Release Package and is not available in the compiler GitHub repository.
Compile for Device
Use torq-compile to compile the MLIR model into a Torq runtime executable (.vmfb).
The key flags are:
| Flag | Description |
|---|---|
| `--iree-input-type` | Specifies the MLIR dialect of the input file. See table below. |
| `-o` | Output path for the compiled model. |
Input types by source framework:
| Source Framework | Input Type | Typical File Extension |
|---|---|---|
| Auto (default) | `auto` | all |
| TFLite | `tosa` | `.tosa` |
| ONNX | `onnx` | `.mlir` |
| Torch | `torch` | `.mlir` |
| Linalg | `none` | `.mlir` |
Note
Input type defaults to auto, which defers input type detection and conversion to the compiler. This behavior can be overridden by manually specifying --iree-input-type.
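When scripting the compile step, the extension-to-input-type mapping above can be mirrored by a small helper. This is a hypothetical convenience for build scripts, not part of the toolchain; leaving the flag at auto and letting the compiler detect the dialect is usually sufficient:

```python
from pathlib import Path

# Hypothetical helper mirroring the table above. Note that .mlir files can
# hold ONNX, Torch, or Linalg input, so they are left to auto-detection.
EXTENSION_TO_INPUT_TYPE = {
    ".tosa": "tosa",   # produced by iree-import-tflite
    ".mlir": "auto",   # ONNX/Torch/Linalg; let the compiler detect
}

def guess_input_type(model_path: str) -> str:
    """Return a plausible --iree-input-type value for a converted model file."""
    return EXTENSION_TO_INPUT_TYPE.get(Path(model_path).suffix, "auto")

print(guess_input_type("mobilenetv2.tosa"))  # tosa
print(guess_input_type("model.mlir"))        # auto
```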
Example: TFLite model (MobileNetV2)
Model Source: The MobileNetV2 model is generated from tf.keras.applications using tf_model_generator.py. The dataset used for int8 quantization consists of random data. This model is only included in the Release Package and is not available in the compiler GitHub repository.
# Convert TFLite to TOSA
$ iree-import-tflite tests/hf/Synaptics_MobileNetV2/MobileNetV2_int8.tflite -o mobilenetv2.tosa
# Compile for device
$ torq-compile mobilenetv2.tosa -o mobilenetv2.vmfb
Example: ONNX model
# Convert ONNX to MLIR
$ python -m iree.compiler.tools.import_onnx path/to/model.onnx -o model.mlir --data-prop
# Compile for device
$ torq-compile model.mlir -o model.vmfb
Run Inference on Torq
Use the compiled model with torq-run-module to run inference:
$ torq-run-module \
--module=mobilenetv2.vmfb \
--function=main \
--input="1x224x224x3xi8=1"
Note
To use an actual image as input, preprocess the image and save it as a NumPy array (.npy file). Then provide the path to the .npy file as input, as described in the Input Guide.
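As a sketch of that workflow with synthetic data standing in for a real image (actual resizing and int8 quantization parameters depend on the model):

```python
import numpy as np

# Synthetic stand-in for a preprocessed image: a real pipeline would resize
# the image to 224x224, apply the model's normalization, and quantize to
# int8 using the model's scale and zero-point.
image = np.random.randint(-128, 128, size=(1, 224, 224, 3), dtype=np.int8)

# Save as .npy; the file path can then be passed to torq-run-module --input
np.save("input.npy", image)
```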
Using the Python Bindings
You can also run inference programmatically using the torq-runtime Python package:
from torq.runtime import VMFBInferenceRunner

# Load the compiled model on the Torq device
runner = VMFBInferenceRunner("mobilenetv2.vmfb", device_uri="torq")

# `inputs` holds the model's input tensors (e.g., NumPy arrays)
outputs = runner.infer(inputs)
print(f"Inference took {runner.infer_time_ms} ms")
The VMFBInferenceRunner class supports options such as function and load_method. See the Torq Runtime Python API page for the full API reference.
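For completeness, `inputs` in the snippet above would be the model's input tensors. A hypothetical construction for the int8 MobileNetV2 example, assuming the runner accepts NumPy arrays:

```python
import numpy as np

# Hypothetical input for the int8 MobileNetV2 example: shapes and dtypes
# must match the compiled model's input signature (1x224x224x3xi8 here).
inputs = [np.ones((1, 224, 224, 3), dtype=np.int8)]
```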
Compiling for Host (Simulation)
To compile for the current host machine (e.g., for testing without device hardware), add simulator-specific flags:
$ torq-compile mobilenetv2.tosa \
--torq-css-qemu \
--torq-target-host-triple=native \
-o mobilenetv2.vmfb
Handling Unsupported Operations
For information about handling unsupported operations and CSS fallback, see Handling Unsupported Ops.