# Torq Runtime Python API (Beta)

The `torq-runtime` Python package provides bindings for loading and running compiled `.vmfb` models on a Torq device directly from Python.

```{warning}
`torq-runtime` is currently in beta and is not yet available on PyPI.
```

## Installation

The `torq-runtime` package is included in the GitHub release. Install the runtime wheel directly from any release snapshot.

## Quick Start

```python
import numpy as np
from torq.runtime import VMFBInferenceRunner

# Load the compiled model
runner = VMFBInferenceRunner("mobilenetv2.vmfb", device_uri="torq")

# Prepare input data
input_data = np.random.randint(0, 255, size=(1, 224, 224, 3), dtype=np.int8)

# Run inference
outputs = runner.infer([input_data])
print(f"Inference took {runner.infer_time_ms:.2f} ms")
```

## API Reference

### `VMFBInferenceRunner`

The main class for loading and running `.vmfb` models via the IREE runtime.

```python
VMFBInferenceRunner(
    model_path,
    *,
    function="main",
    device_uri="torq",
    n_threads=None,
    load_method="preload",
    load_model_to_mem=True,
    runtime_flags=None,
)
```

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model_path` | `str \| PathLike` | *(required)* | Path to the `.vmfb` file. |
| `function` | `str` | `"main"` | Exported function name inside the module. |
| `device_uri` | `str` | `"torq"` | IREE device identifier. |
| `n_threads` | `int \| None` | `None` | Worker thread count (only for llvm-cpu device). |
| `load_method` | `"preload" \| "mmap"` | `"preload"` | `"preload"` copies into memory; `"mmap"` memory-maps the file. |
| `load_model_to_mem` | `bool` | `True` | Whether to load the model into memory during initialization. |
| `runtime_flags` | `Iterable[str] \| None` | `None` | Extra IREE runtime flags. |

**Properties:**

| Property | Type | Description |
|----------|------|-------------|
| `model_path` | `PathLike` | Path to the loaded model file. |
| `infer_time_ms` | `float` | Elapsed time in milliseconds for the last call to `infer()`. |
| `inputs_info` | `list[TensorInfo] \| None` | Input tensor metadata extracted from the model, or `None` if unavailable. |
| `outputs_info` | `list[TensorInfo] \| None` | Output tensor metadata extracted from the model, or `None` if unavailable. |

**Methods:**

#### `infer(inputs)`

Run inference and return the output arrays.

- **inputs** — Either an iterable of NumPy arrays or a mapping of name to array.
- **Returns** — A list of NumPy arrays containing the model outputs.

### `profile_vmfb_inference_time`

Load a `.vmfb` model and run inference multiple times for profiling.

```python
profile_vmfb_inference_time(
    model_path,
    inputs=None,
    *,
    n_iters=5,
    do_warmup=True,
    function="main",
    device="torq",
    n_threads=None,
    load_model_to_mem=True,
    runtime_flags=None,
)
```

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model_path` | `str \| PathLike` | *(required)* | Path to the `.vmfb` file. |
| `inputs` | `Iterable[NDArray] \| None` | `None` | Input arrays. Generated randomly from model metadata when `None`. |
| `n_iters` | `int` | `5` | Number of timed inference iterations. |
| `do_warmup` | `bool` | `True` | Whether to run one untimed warmup pass first. |
| `function` | `str` | `"main"` | Exported function name inside the module. |
| `device` | `str` | `"torq"` | IREE device URI. |
| `n_threads` | `int \| None` | `None` | Worker thread count (only for llvm-cpu device). |
| `load_model_to_mem` | `bool` | `True` | Whether to load the model into memory during initialization. |
| `runtime_flags` | `Iterable[str] \| None` | `None` | Extra IREE runtime flags. |

**Returns:** Average wall-clock inference time in milliseconds.

### `run_vmfb`

Run a `.vmfb` model via the `iree-run-module` CLI and return wall-clock time.

```python
run_vmfb(
    model_path,
    inputs,
    outputs,
    device="torq",
    n_threads=None,
    iree_binary=None,
)
```

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `model_path` | `str \| PathLike` | *(required)* | Path to the `.vmfb` file. |
| `inputs` | `Iterable[str]` | *(required)* | Input descriptors forwarded as `--input` flags. |
| `outputs` | `Iterable[str]` | *(required)* | Output descriptors forwarded as `--output` flags. |
| `device` | `str` | `"torq"` | IREE device URI. |
| `n_threads` | `int \| None` | `None` | Worker thread count (only for llvm-cpu device, defaults to `os.cpu_count()`). |
| `iree_binary` | `str \| PathLike \| None` | `None` | Path to the `iree-run-module` binary. Resolved from `PATH` if not provided. |

**Returns:** Elapsed wall-clock time in milliseconds.

### `TensorInfo`

Dataclass holding dtype and shape metadata for a tensor.

```python
@dataclass
class TensorInfo:
    dtype: DTypeLike
    shape: list[int | str]
```

| Field | Type | Description |
|-------|------|-------------|
| `dtype` | `DTypeLike` | NumPy-compatible dtype. |
| `shape` | `list[int \| str]` | Tensor dimensions. |

**Methods:**

- `is_valid()` — Returns `True` if every dimension is an integer (i.e., no dynamic dimensions).

### Utility Functions

#### `random_inputs_from_info(inputs_info)`

Generate random NumPy arrays matching the given tensor metadata. Useful for testing.

- **inputs_info** — Iterable of `TensorInfo`.
- **Returns** — List of NumPy arrays with appropriate shapes and dtypes.

## Examples

### Inspecting Model Inputs and Outputs

```python
from torq.runtime import VMFBInferenceRunner

runner = VMFBInferenceRunner("model.vmfb", device_uri="torq")

if runner.inputs_info:
    for i, info in enumerate(runner.inputs_info):
        print(f"Input {i}: dtype={info.dtype}, shape={info.shape}")

if runner.outputs_info:
    for i, info in enumerate(runner.outputs_info):
        print(f"Output {i}: dtype={info.dtype}, shape={info.shape}")
```

### Profiling Inference Latency

```python
from torq.runtime import profile_vmfb_inference_time

avg_ms = profile_vmfb_inference_time(
    "model.vmfb",
    n_iters=10,
    do_warmup=True,
    device="torq",
)
print(f"Average inference time: {avg_ms:.2f} ms")
```

### Running with Custom Inputs

```python
import numpy as np
from torq.runtime import VMFBInferenceRunner

runner = VMFBInferenceRunner("model.vmfb", device_uri="torq")

# Load preprocessed input from a .npy file
input_data = np.load("preprocessed_input.npy")
outputs = runner.infer([input_data])
```