Torq Runtime Python API (Beta)
The `torq-runtime` Python package provides bindings for loading and running compiled `.vmfb` models on a Torq device directly from Python.
Warning
torq-runtime is currently in beta and is not yet available on PyPI.
Installation
The `torq-runtime` package is included in the GitHub release. Install the runtime wheel directly from any release snapshot.
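For example, after downloading the wheel from a release (the exact wheel filename varies by version and platform; the name below is illustrative only):

```shell
pip install torq_runtime-0.1.0-py3-none-any.whl
```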
Quick Start
```python
import numpy as np
from torq.runtime import VMFBInferenceRunner

# Load the compiled model
runner = VMFBInferenceRunner("mobilenetv2.vmfb", device_uri="torq")

# Prepare input data (int8 values must fall in -128..127)
input_data = np.random.randint(-128, 128, size=(1, 224, 224, 3), dtype=np.int8)

# Run inference
outputs = runner.infer([input_data])
print(f"Inference took {runner.infer_time_ms:.2f} ms")
```
API Reference
VMFBInferenceRunner
The main class for loading and running `.vmfb` models via the IREE runtime.
```python
VMFBInferenceRunner(
    model_path,
    *,
    function="main",
    device_uri="torq",
    n_threads=None,
    load_method="preload",
    load_model_to_mem=True,
    runtime_flags=None,
)
```
Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_path` | `str` | (required) | Path to the `.vmfb` file. |
| `function` | `str` | `"main"` | Exported function name inside the module. |
| `device_uri` | `str` | `"torq"` | IREE device identifier. |
| `n_threads` | `int \| None` | `None` | Worker thread count (only for the `llvm-cpu` device). |
| `load_method` | `str` | `"preload"` | Method used to load the module. |
| `load_model_to_mem` | `bool` | `True` | Whether to load the model into memory during initialization. |
| `runtime_flags` | `list[str] \| None` | `None` | Extra IREE runtime flags. |
Properties:

| Property | Type | Description |
|---|---|---|
| `model_path` | `str` | Path to the loaded model file. |
| `infer_time_ms` | `float` | Elapsed time in milliseconds for the last call to `infer()`. |
| `inputs_info` | `list[TensorInfo] \| None` | Input tensor metadata extracted from the model, or `None` if unavailable. |
| `outputs_info` | `list[TensorInfo] \| None` | Output tensor metadata extracted from the model, or `None` if unavailable. |
Methods:
`infer(inputs)`

Run inference and return the output arrays.

- `inputs` — Either an iterable of NumPy arrays or a mapping of input name to array.
- Returns — A list of NumPy arrays containing the model outputs.
profile_vmfb_inference_time
Load a `.vmfb` model and run inference multiple times for profiling.
```python
profile_vmfb_inference_time(
    model_path,
    inputs=None,
    *,
    n_iters=5,
    do_warmup=True,
    function="main",
    device="torq",
    n_threads=None,
    load_model_to_mem=True,
    runtime_flags=None,
)
```
Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_path` | `str` | (required) | Path to the `.vmfb` file. |
| `inputs` | `list[np.ndarray] \| None` | `None` | Input arrays. Generated randomly from model metadata when `None`. |
| `n_iters` | `int` | `5` | Number of timed inference iterations. |
| `do_warmup` | `bool` | `True` | Whether to run one untimed warmup pass first. |
| `function` | `str` | `"main"` | Exported function name inside the module. |
| `device` | `str` | `"torq"` | IREE device URI. |
| `n_threads` | `int \| None` | `None` | Worker thread count (only for the `llvm-cpu` device). |
| `load_model_to_mem` | `bool` | `True` | Whether to load the model into memory during initialization. |
| `runtime_flags` | `list[str] \| None` | `None` | Extra IREE runtime flags. |
Returns: Average wall-clock inference time in milliseconds.
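The timing strategy presumably resembles the sketch below: an optional untimed warmup pass, then the mean wall-clock time over the timed iterations. The function name and the stand-in callable are hypothetical, used here only to illustrate the averaging logic:

```python
import time

def average_inference_ms(run_once, n_iters=5, do_warmup=True):
    # Hypothetical sketch of the profiling loop: warm up once (untimed),
    # then average wall-clock time over n_iters timed calls.
    if do_warmup:
        run_once()
    start = time.perf_counter()
    for _ in range(n_iters):
        run_once()
    return (time.perf_counter() - start) / n_iters * 1000.0
```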
run_vmfb
Run a `.vmfb` model via the `iree-run-module` CLI and return wall-clock time.
```python
run_vmfb(
    model_path,
    inputs,
    outputs,
    device="torq",
    n_threads=None,
    iree_binary=None,
)
```
Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model_path` | `str` | (required) | Path to the `.vmfb` file. |
| `inputs` | `list[str]` | (required) | Input descriptors forwarded to `iree-run-module`. |
| `outputs` | `list[str]` | (required) | Output descriptors forwarded to `iree-run-module`. |
| `device` | `str` | `"torq"` | IREE device URI. |
| `n_threads` | `int \| None` | `None` | Worker thread count (only for the `llvm-cpu` device). |
| `iree_binary` | `str \| None` | `None` | Path to the `iree-run-module` binary. |
Returns: Elapsed wall-clock time in milliseconds.
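Since this wraps the `iree-run-module` CLI, the descriptors map onto that tool's flags. A representative invocation might look like the following (file names and shapes are illustrative; see the IREE CLI documentation for the exact descriptor syntax):

```shell
iree-run-module \
  --module=model.vmfb \
  --function=main \
  --device=torq \
  --input="1x224x224x3xi8=@input.npy" \
  --output=@output.npy
```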
TensorInfo
Dataclass holding dtype and shape metadata for a tensor.
```python
@dataclass
class TensorInfo:
    dtype: DTypeLike
    shape: list[int | str]
```
| Field | Type | Description |
|---|---|---|
| `dtype` | `DTypeLike` | NumPy-compatible dtype. |
| `shape` | `list[int \| str]` | Tensor dimensions. |
Methods:
- `is_valid()` — Returns `True` if every dimension is an integer (i.e., no dynamic dimensions).
Utility Functions
`random_inputs_from_info(inputs_info)`
Generate random NumPy arrays matching the given tensor metadata. Useful for testing.
- `inputs_info` — Iterable of `TensorInfo`.
- Returns — List of NumPy arrays with appropriate shapes and dtypes.
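A minimal sketch of how these pieces fit together, assuming `TensorInfo` behaves as documented above (the local re-definitions below are stand-ins so the snippet is self-contained, not the package's actual implementation):

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class TensorInfo:
    dtype: object          # NumPy-compatible dtype
    shape: list            # list[int | str]; strings mark dynamic dims

    def is_valid(self):
        # Valid for random generation only when every dim is concrete.
        return all(isinstance(d, int) for d in self.shape)

def random_inputs_from_info(inputs_info):
    """Sketch of the documented helper: one random array per TensorInfo."""
    arrays = []
    for info in inputs_info:
        dt = np.dtype(info.dtype)
        if np.issubdtype(dt, np.integer):
            lo, hi = np.iinfo(dt).min, np.iinfo(dt).max
            arrays.append(np.random.randint(lo, hi, size=info.shape, dtype=dt))
        else:
            arrays.append(np.random.random(info.shape).astype(dt))
    return arrays
```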
Examples
Inspecting Model Inputs and Outputs
```python
from torq.runtime import VMFBInferenceRunner

runner = VMFBInferenceRunner("model.vmfb", device_uri="torq")

if runner.inputs_info:
    for i, info in enumerate(runner.inputs_info):
        print(f"Input {i}: dtype={info.dtype}, shape={info.shape}")

if runner.outputs_info:
    for i, info in enumerate(runner.outputs_info):
        print(f"Output {i}: dtype={info.dtype}, shape={info.shape}")
```
Profiling Inference Latency
```python
from torq.runtime import profile_vmfb_inference_time

avg_ms = profile_vmfb_inference_time(
    "model.vmfb",
    n_iters=10,
    do_warmup=True,
    device="torq",
)
print(f"Average inference time: {avg_ms:.2f} ms")
```
Running with Custom Inputs
```python
import numpy as np
from torq.runtime import VMFBInferenceRunner

runner = VMFBInferenceRunner("model.vmfb", device_uri="torq")

# Load preprocessed input from a .npy file
input_data = np.load("preprocessed_input.npy")
outputs = runner.infer([input_data])
```