Torq Runtime Python API (Beta)

The torq-runtime Python package provides bindings for loading and running compiled .vmfb models on a Torq device directly from Python.

Warning

torq-runtime is currently in beta and is not yet available on PyPI.

Installation

The torq-runtime wheel is included in each GitHub release. Download the wheel from any release snapshot and install it directly, for example by passing the path of the wheel file to pip install.

Quick Start

import numpy as np
from torq.runtime import VMFBInferenceRunner

# Load the compiled model
runner = VMFBInferenceRunner("mobilenetv2.vmfb", device_uri="torq")

# Prepare input data (int8 holds -128..127; randint's upper bound is exclusive)
input_data = np.random.randint(-128, 128, size=(1, 224, 224, 3), dtype=np.int8)

# Run inference
outputs = runner.infer([input_data])
print(f"Inference took {runner.infer_time_ms:.2f} ms")

API Reference

VMFBInferenceRunner

The main class for loading and running .vmfb models via the IREE runtime.

VMFBInferenceRunner(
    model_path,
    *,
    function="main",
    device_uri="torq",
    n_threads=None,
    load_method="preload",
    load_model_to_mem=True,
    runtime_flags=None,
)

Parameters:

Parameter          Type                  Default     Description
model_path         str | PathLike        (required)  Path to the .vmfb file.
function           str                   "main"      Exported function name inside the module.
device_uri         str                   "torq"      IREE device identifier.
n_threads          int | None            None        Worker thread count (only for llvm-cpu device).
load_method        "preload" | "mmap"    "preload"   "preload" copies into memory; "mmap" memory-maps the file.
load_model_to_mem  bool                  True        Whether to load the model into memory during initialization.
runtime_flags      Iterable[str] | None  None        Extra IREE runtime flags.

Properties:

Property       Type                     Description
model_path     PathLike                 Path to the loaded model file.
infer_time_ms  float                    Elapsed time in milliseconds for the last call to infer().
inputs_info    list[TensorInfo] | None  Input tensor metadata extracted from the model, or None if unavailable.
outputs_info   list[TensorInfo] | None  Output tensor metadata extracted from the model, or None if unavailable.

Methods:

infer(inputs)

Run inference and return the output arrays.

  • inputs — Either an iterable of NumPy arrays or a mapping of name to array.

  • Returns — A list of NumPy arrays containing the model outputs.
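To illustrate the two accepted input forms, the sketch below shows one way a caller might turn a name-to-array mapping into the positional list form before calling infer(). The helper to_positional and the input names used with it are hypothetical and not part of torq-runtime; how the library itself matches names to inputs is not specified here.

```python
from collections.abc import Mapping


def to_positional(inputs, input_names):
    """Return inputs as a list ordered like input_names.

    Hypothetical helper for illustration; not part of torq-runtime.
    """
    if isinstance(inputs, Mapping):
        # Mapping form: look each expected name up in order.
        missing = [name for name in input_names if name not in inputs]
        if missing:
            raise KeyError(f"missing inputs: {missing}")
        return [inputs[name] for name in input_names]
    # Iterable form: keep the caller's order, but check the count.
    values = list(inputs)
    if len(values) != len(input_names):
        raise ValueError(
            f"expected {len(input_names)} inputs, got {len(values)}"
        )
    return values
```

Either form can then be passed on unchanged, for example runner.infer(to_positional(batch, names)), where batch and names are supplied by the caller.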

profile_vmfb_inference_time

Load a .vmfb model and run inference multiple times for profiling.

profile_vmfb_inference_time(
    model_path,
    inputs=None,
    *,
    n_iters=5,
    do_warmup=True,
    function="main",
    device="torq",
    n_threads=None,
    load_model_to_mem=True,
    runtime_flags=None,
)

Parameters:

Parameter          Type                      Default     Description
model_path         str | PathLike            (required)  Path to the .vmfb file.
inputs             Iterable[NDArray] | None  None        Input arrays. Generated randomly from model metadata when None.
n_iters            int                       5           Number of timed inference iterations.
do_warmup          bool                      True        Whether to run one untimed warmup pass first.
function           str                       "main"      Exported function name inside the module.
device             str                       "torq"      IREE device URI.
n_threads          int | None                None        Worker thread count (only for llvm-cpu device).
load_model_to_mem  bool                      True        Whether to load the model into memory during initialization.
runtime_flags      Iterable[str] | None      None        Extra IREE runtime flags.

Returns: Average wall-clock inference time in milliseconds.
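The measurement pattern described by n_iters and do_warmup can be sketched in plain Python. This is an assumption about the general shape of such a loop, not the library's actual implementation; average_time_ms and run_once are hypothetical names, and run_once stands in for a single inference call.

```python
import time


def average_time_ms(run_once, n_iters=5, do_warmup=True):
    """Average wall-clock time of run_once() in milliseconds.

    Illustrative sketch only; not the torq-runtime implementation.
    """
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    if do_warmup:
        run_once()  # one untimed pass so one-time setup costs are excluded
    total_s = 0.0
    for _ in range(n_iters):
        start = time.perf_counter()
        run_once()
        total_s += time.perf_counter() - start
    return total_s / n_iters * 1000.0
```

With a runner in hand, a comparable manual measurement would be average_time_ms(lambda: runner.infer([input_data]), n_iters=10).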

run_vmfb

Run a .vmfb model via the iree-run-module CLI and return wall-clock time.

run_vmfb(
    model_path,
    inputs,
    outputs,
    device="torq",
    n_threads=None,
    iree_binary=None,
)

Parameters:

Parameter    Type                   Default     Description
model_path   str | PathLike         (required)  Path to the .vmfb file.
inputs       Iterable[str]          (required)  Input descriptors forwarded as --input flags.
outputs      Iterable[str]          (required)  Output descriptors forwarded as --output flags.
device       str                    "torq"      IREE device URI.
n_threads    int | None             None        Worker thread count (only for llvm-cpu device, defaults to os.cpu_count()).
iree_binary  str | PathLike | None  None        Path to the iree-run-module binary. Resolved from PATH if not provided.

Returns: Elapsed wall-clock time in milliseconds.
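Since inputs and outputs are forwarded as --input and --output flags, the resulting command line can be pictured roughly as follows. This is a sketch under assumptions: build_run_module_argv is a hypothetical helper, the exact argument list that run_vmfb assembles may differ, and thread-count handling is left out because the flag it maps to is not documented here.

```python
import os


def build_run_module_argv(model_path, inputs, outputs, device="torq",
                          iree_binary="iree-run-module"):
    """Assemble an iree-run-module argument list.

    Hypothetical helper; the real run_vmfb may build its command differently.
    """
    argv = [
        os.fspath(iree_binary),
        f"--module={os.fspath(model_path)}",
        f"--device={device}",
    ]
    # Each descriptor string becomes its own flag, in the order given.
    argv += [f"--input={spec}" for spec in inputs]
    argv += [f"--output={spec}" for spec in outputs]
    return argv
```

For example, build_run_module_argv("model.vmfb", ["@input.npy"], ["@output.npy"]) yields a list that could be handed to subprocess.run. The @file.npy descriptor form is the one iree-run-module accepts for NumPy files.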

TensorInfo

Dataclass holding dtype and shape metadata for a tensor.

@dataclass
class TensorInfo:
    dtype: DTypeLike
    shape: list[int | str]

Field  Type             Description
dtype  DTypeLike        NumPy-compatible dtype.
shape  list[int | str]  Tensor dimensions.

Methods:

  • is_valid() — Returns True if every dimension is an integer (i.e., no dynamic dimensions).
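The is_valid() rule can be illustrated with a small stand-in class. TensorInfoSketch is a hypothetical mirror of the dataclass above, written only from the description on this page; the "?" marker for a dynamic dimension is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class TensorInfoSketch:
    """Stand-in for TensorInfo, for illustration only."""
    dtype: str
    shape: list

    def is_valid(self):
        # A shape is fully static when every dimension is an integer.
        return all(isinstance(dim, int) for dim in self.shape)


static = TensorInfoSketch("int8", [1, 224, 224, 3])
dynamic = TensorInfoSketch("float32", ["?", 128])  # "?" is an assumed marker
print(static.is_valid(), dynamic.is_valid())
```

Checking is_valid() before allocating buffers avoids trying to build an array from a shape that still contains a non-integer dimension.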

Utility Functions

random_inputs_from_info(inputs_info)

Generate random NumPy arrays matching the given tensor metadata. Useful for testing.

  • inputs_info — Iterable of TensorInfo.

  • Returns — List of NumPy arrays with appropriate shapes and dtypes.
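A rough idea of how random data can be matched to a dtype and shape is sketched below using NumPy's Generator API. random_array is a hypothetical helper, not the library's implementation, and the choice to reject dynamic shapes and non-numeric dtypes is an assumption made to keep the sketch small.

```python
import numpy as np


def random_array(dtype, shape, rng=None):
    """Random array for one tensor; illustrative sketch only."""
    rng = rng if rng is not None else np.random.default_rng()
    dtype = np.dtype(dtype)
    if not all(isinstance(dim, int) for dim in shape):
        raise ValueError(f"shape {shape} has non-integer dimensions")
    size = tuple(shape)
    if np.issubdtype(dtype, np.integer):
        # Draw from the full range of the dtype, both ends included.
        limits = np.iinfo(dtype)
        return rng.integers(limits.min, limits.max, size=size,
                            dtype=dtype, endpoint=True)
    if np.issubdtype(dtype, np.floating):
        # Uniform values in [0, 1), cast to the requested precision.
        return rng.random(size=size).astype(dtype)
    raise TypeError(f"unsupported dtype: {dtype}")
```

Deriving the bounds from np.iinfo keeps the requested range inside what the dtype can represent, which matters for narrow types such as int8.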

Examples

Inspecting Model Inputs and Outputs

from torq.runtime import VMFBInferenceRunner

runner = VMFBInferenceRunner("model.vmfb", device_uri="torq")

if runner.inputs_info:
    for i, info in enumerate(runner.inputs_info):
        print(f"Input {i}: dtype={info.dtype}, shape={info.shape}")

if runner.outputs_info:
    for i, info in enumerate(runner.outputs_info):
        print(f"Output {i}: dtype={info.dtype}, shape={info.shape}")

Profiling Inference Latency

from torq.runtime import profile_vmfb_inference_time

avg_ms = profile_vmfb_inference_time(
    "model.vmfb",
    n_iters=10,
    do_warmup=True,
    device="torq",
)
print(f"Average inference time: {avg_ms:.2f} ms")

Running with Custom Inputs

import numpy as np
from torq.runtime import VMFBInferenceRunner

runner = VMFBInferenceRunner("model.vmfb", device_uri="torq")

# Load preprocessed input from a .npy file
input_data = np.load("preprocessed_input.npy")
outputs = runner.infer([input_data])