Torq Runtime Python API (Beta)
The `torq-runtime` Python package provides bindings for loading and running compiled `.vmfb` models on a Torq device directly from Python.
Warning
torq-runtime is currently in beta and is not yet available on PyPI.
Installation
The `torq-runtime` package is included in the GitHub release. Install the runtime wheel directly from any release snapshot.
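For example, after downloading the wheel from a release (the exact wheel filename varies by version and platform; the name below is illustrative only):

```shell
pip install torq_runtime-0.1.0-py3-none-any.whl
```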
Quick Start
```python
import numpy as np
from torq.runtime import VMFBInferenceRunner

# Load the compiled model
runner = VMFBInferenceRunner("mobilenetv2.vmfb", device_uri="torq")

# Prepare input data (int8 values must fall in -128..127)
input_data = np.random.randint(-128, 128, size=(1, 224, 224, 3), dtype=np.int8)

# Run inference
outputs = runner.infer([input_data])
print(f"Inference took {runner.infer_time_ms:.2f} ms")
```
API Reference
VMFBInferenceRunner
The main class for loading and running `.vmfb` models via the IREE runtime.
```python
VMFBInferenceRunner(
    model_path,
    *,
    function="main",
    device_uri="torq",
    n_threads=None,
    load_method="preload",
    load_model_to_mem=True,
    runtime_flags=None,
)
```
Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_path` | `str` | (required) | Path to the `.vmfb` file. |
| `function` | `str` | `"main"` | Exported function name inside the module. |
| `device_uri` | `str` | `"torq"` | IREE device identifier. |
| `n_threads` | `int \| None` | `None` | Worker thread count (only for the `llvm-cpu` device). |
| `load_method` | `str` | `"preload"` | Method used to load the module. |
| `load_model_to_mem` | `bool` | `True` | Whether to load the model into memory during initialization. |
| `runtime_flags` | `list[str] \| None` | `None` | Extra IREE runtime flags. |
Properties:

| Property | Type | Description |
|---|---|---|
| `model_path` | `str` | Path to the loaded model file. |
| `infer_time_ms` | `float` | Elapsed time in milliseconds for the last call to `infer()`. |
| `inputs_info` | `list[TensorInfo] \| None` | Input tensor metadata extracted from the model, or `None` if unavailable. |
| `outputs_info` | `list[TensorInfo] \| None` | Output tensor metadata extracted from the model, or `None` if unavailable. |
Methods:
`infer(inputs)`

Run inference and return the output arrays.

- `inputs` — Either an iterable of NumPy arrays or a mapping of input name to array.
- Returns — A list of NumPy arrays containing the model outputs.
profile_vmfb_inference_time
Load a `.vmfb` model and run inference multiple times for profiling.
```python
profile_vmfb_inference_time(
    model_path,
    inputs=None,
    *,
    n_iters=5,
    do_warmup=True,
    function="main",
    device="torq",
    n_threads=None,
    load_model_to_mem=True,
    runtime_flags=None,
)
```
Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_path` | `str` | (required) | Path to the `.vmfb` file. |
| `inputs` | `list[np.ndarray] \| None` | `None` | Input arrays. Generated randomly from model metadata when `None`. |
| `n_iters` | `int` | `5` | Number of timed inference iterations. |
| `do_warmup` | `bool` | `True` | Whether to run one untimed warmup pass first. |
| `function` | `str` | `"main"` | Exported function name inside the module. |
| `device` | `str` | `"torq"` | IREE device URI. |
| `n_threads` | `int \| None` | `None` | Worker thread count (only for the `llvm-cpu` device). |
| `load_model_to_mem` | `bool` | `True` | Whether to load the model into memory during initialization. |
| `runtime_flags` | `list[str] \| None` | `None` | Extra IREE runtime flags. |
Returns: Average wall-clock inference time in milliseconds.
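The timing strategy presumably resembles the sketch below: an optional untimed warmup pass, then the mean wall-clock time over the timed iterations. The function name and the stand-in callable are hypothetical, used here only to illustrate the averaging logic:

```python
import time

def average_inference_ms(run_once, n_iters=5, do_warmup=True):
    # Hypothetical sketch of the profiling loop: warm up once (untimed),
    # then average wall-clock time over n_iters timed calls.
    if do_warmup:
        run_once()
    start = time.perf_counter()
    for _ in range(n_iters):
        run_once()
    return (time.perf_counter() - start) / n_iters * 1000.0
```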
run_vmfb
Run a `.vmfb` model via the `iree-run-module` CLI and return wall-clock time.
```python
run_vmfb(
    model_path,
    inputs,
    outputs,
    device="torq",
    n_threads=None,
    iree_binary=None,
)
```
Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_path` | `str` | (required) | Path to the `.vmfb` file. |
| `inputs` | `list[str]` | (required) | Input descriptors forwarded to `iree-run-module`. |
| `outputs` | `list[str]` | (required) | Output descriptors forwarded to `iree-run-module`. |
| `device` | `str` | `"torq"` | IREE device URI. |
| `n_threads` | `int \| None` | `None` | Worker thread count (only for the `llvm-cpu` device). |
| `iree_binary` | `str \| None` | `None` | Path to the `iree-run-module` binary. |
Returns: Elapsed wall-clock time in milliseconds.
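Since this wraps the `iree-run-module` CLI, the descriptors map onto that tool's flags. A representative invocation might look like the following (file names and shapes are illustrative; see the IREE CLI documentation for the exact descriptor syntax):

```shell
iree-run-module \
  --module=model.vmfb \
  --function=main \
  --device=torq \
  --input="1x224x224x3xi8=@input.npy" \
  --output=@output.npy
```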
TensorInfo
Dataclass holding dtype and shape metadata for a tensor.
```python
@dataclass
class TensorInfo:
    dtype: DTypeLike
    shape: list[int | str]
```
| Field | Type | Description |
|---|---|---|
| `dtype` | `DTypeLike` | NumPy-compatible dtype. |
| `shape` | `list[int \| str]` | Tensor dimensions. |
Methods:
- `is_valid()` — Returns `True` if every dimension is an integer (i.e., no dynamic dimensions).
Utility Functions
`random_inputs_from_info(inputs_info)`
Generate random NumPy arrays matching the given tensor metadata. Useful for testing.
- `inputs_info` — Iterable of `TensorInfo`.
- Returns — List of NumPy arrays with appropriate shapes and dtypes.
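A minimal sketch of how these pieces fit together, assuming `TensorInfo` behaves as documented above (the local re-definitions below are stand-ins so the snippet is self-contained, not the package's actual implementation):

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class TensorInfo:
    dtype: object          # NumPy-compatible dtype
    shape: list            # list[int | str]; strings mark dynamic dims

    def is_valid(self):
        # Valid for random generation only when every dim is concrete.
        return all(isinstance(d, int) for d in self.shape)

def random_inputs_from_info(inputs_info):
    """Sketch of the documented helper: one random array per TensorInfo."""
    arrays = []
    for info in inputs_info:
        dt = np.dtype(info.dtype)
        if np.issubdtype(dt, np.integer):
            lo, hi = np.iinfo(dt).min, np.iinfo(dt).max
            arrays.append(np.random.randint(lo, hi, size=info.shape, dtype=dt))
        else:
            arrays.append(np.random.random(info.shape).astype(dt))
    return arrays
```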
Examples
Inspecting Model Inputs and Outputs
```python
from torq.runtime import VMFBInferenceRunner

runner = VMFBInferenceRunner("model.vmfb", device_uri="torq")

if runner.inputs_info:
    for i, info in enumerate(runner.inputs_info):
        print(f"Input {i}: dtype={info.dtype}, shape={info.shape}")

if runner.outputs_info:
    for i, info in enumerate(runner.outputs_info):
        print(f"Output {i}: dtype={info.dtype}, shape={info.shape}")
```
Profiling Inference Latency
```python
from torq.runtime import profile_vmfb_inference_time

avg_ms = profile_vmfb_inference_time(
    "model.vmfb",
    n_iters=10,
    do_warmup=True,
    device="torq",
)
print(f"Average inference time: {avg_ms:.2f} ms")
```
Running with Custom Inputs
```python
import numpy as np
from torq.runtime import VMFBInferenceRunner

runner = VMFBInferenceRunner("model.vmfb", device_uri="torq")

# Load preprocessed input from a .npy file
input_data = np.load("preprocessed_input.npy")
outputs = runner.infer([input_data])
```