# Model testing and debugging workflow

```{note}
This page is under construction
```

The torq-compiler should by default produce an optimized model for hardware; however, sometimes the results may not be what we want. This chapter explains how to leverage the fact that torq-compiler is open source and that its behaviour can be changed.

In the following we imagine we have an ONNX model and we want to make it run as fast as possible on the Torq hardware. After installing the torq development environment you can start by adding the model to the testing framework. This framework automatically generates useful test cases to improve the compilation of a model.

The guide assumes you installed the development environment for the compiler, compiled it, and activated the compiler development environment.

## Running the test suite for a model

### Initial setup

To run a test suite for a model, add the ONNX file to ``tests/testdata/onnx_models``. In the rest of this guide we will use the model ``example-matmul.onnx`` already present in the directory. The pytest framework will automatically detect any new file found in this directory and create a set of test cases for it.

## Collecting all the tests for a model

To view the test cases run the following command:

```
$ pytest tests/test_onnx_model.py -k example-matmul --collect-only
```

The result will look as follows:

```
```

The framework created a test case for the full model (denoted with the ``_full_model`` suffix) and a test case for each of the layers of the model (denoted with the corresponding layer name).

## What the tests do

The test is defined in the function ``test_onnx_model_llvmcpu_torq`` in ``tests/test_onnx_model.py``. The test compiles the model for Torq simulation (using torq-compile) and for the host CPU (using iree-compile and the LLVMCPU backend), runs both against the same random inputs, and compares the results.
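Conceptually, the comparison at the heart of the test can be sketched as follows. This is an illustration only: the tolerance values and the commented-out runner names (``run_torq_sim``, ``run_llvmcpu``) are hypothetical, not the framework's actual defaults or API.

```python
import numpy as np

def compare_outputs(torq_out: np.ndarray, ref_out: np.ndarray,
                    rtol: float = 1e-3, atol: float = 1e-5) -> bool:
    """Element-wise comparison of the two backends' outputs.

    The tolerances here are illustrative, not the framework's defaults.
    """
    return np.allclose(torq_out, ref_out, rtol=rtol, atol=atol)

# Hypothetical usage: both backends run on the same random input.
x = np.random.rand(4, 8).astype(np.float32)
# torq_out = run_torq_sim(torq_compiled_model, x)   # torq-compile artifact
# ref_out  = run_llvmcpu(llvmcpu_compiled_model, x) # iree-compile artifact
# assert compare_outputs(torq_out, ref_out)
```

The real test obtains these artifacts through the pytest fixtures described later in this page.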
The test reports ERROR if any of the steps required to compute the two inference results fails, and FAIL if the two outputs are not sufficiently similar.

## Executing the tests

To run the tests use the following command line:

```
$ pytest tests/test_onnx_model.py -k example-matmul
```

Pytest will provide a report with all the successes, errors and failures, and potentially related error logs. In the ideal situation all the tests will pass. In some cases, especially if the model contains layers that are not yet supported, a subset of the tests will succeed while some will fail.

Once the tests have run it is possible to interpret the results as follows:

1) If the test with name ``_full_model-sim-default`` passed, the full model was successfully executed and its results are similar to those of the IREE default LLVMCPU backend.
2) If that's not the case, typically a subset of the other tests have passed, indicating the layers that were successfully compiled and whose output was validated.

To further understand failures it is possible to inspect the output of pytest. This will contain the exact point in the testing procedure that created a problem. Issues may be problems during compilation or simulation (these issues are marked as ERROR) or mismatches of outputs with the reference results (these are marked as FAIL). The two categories of problems need to be investigated differently.

## Investigating ERRORs

The first step to investigate an error is to pinpoint which step in the testing procedure failed. This can be identified by looking at the stack trace provided by pytest. In the line that starts with ``kwargs`` it is possible to see the sub-request that failed, in particular the name of the fixture that failed. For instance:

```
kwargs = {'llvmcpu_compiler': VersionedUncachedData(data='...>, ...}
```

shows that the fixture ``torq_compiled_model_dir`` failed to execute. This means that the test framework was not able to compile the model with ``torq-compile``.
When an external program fails it is possible to find the stdout and stderr of the invocation in the same pytest output log.

Errors in the full model are often associated with errors in some of the layer tests. It is often better to make sure all per-layer tests pass before diving into the full model errors.

Typical errors are:

- Errors while converting from ONNX to MLIR (fixture **onnx_mlir_model_file**). These errors are typically due to an unsupported version of ONNX being used. Converting the model to a recent version may help.
- Errors while compiling with IREE (fixture **llvmcpu_compiled_model**). These errors may be due to an unsupported version of ONNX or an unsupported operator.
- Errors while compiling with ``torq-compile`` (fixture **torq_compiled_model_dir**).

## Investigating FAILUREs

Debugging failures is more complex, as the compiler fully executed the model but the test found a mismatch in the outputs. The pytest result provides a command line that can be run to execute an output comparison tool. This helps to visualize the difference between the expected and actual results of the computation.

Accuracy failures in the full model are often associated with failures in some of the layer tests. It is often better to make sure all per-layer tests pass before diving into the full model failures.

Accuracy test failures, especially with floating point models, may sometimes be false negatives. Floating point operations are executed differently on the CPU and on Torq, and the exact numerical results are often different. In some cases the default tolerance of the testing framework is too strict and a test fails even if there is no actual problem.

## Debugging a test

In order to execute a single test you can invoke pytest as follows:

```
$ pytest tests/test_onnx_model.py -s -k example-matmul_layer_MatMul_0-sim-default
```

The ``-s`` option allows you to see the output of the tools called by the testing framework while the test is running.
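When you suspect an accuracy FAIL is a tolerance false negative, a quick sanity check is to look at the actual error magnitudes rather than the pass/fail verdict. A minimal sketch (the ``.npy`` file names in the comments are hypothetical; use whatever outputs the comparison tool dumps):

```python
import numpy as np

def max_errors(actual: np.ndarray, expected: np.ndarray):
    """Return the maximum absolute and relative error between two outputs."""
    abs_err = np.abs(actual - expected)
    # Avoid division by zero for exact-zero reference values.
    rel_err = abs_err / np.maximum(np.abs(expected), np.finfo(np.float32).tiny)
    return abs_err.max(), rel_err.max()

# Hypothetical usage with outputs dumped by the comparison tool:
# actual = np.load("torq_output.npy")
# expected = np.load("llvmcpu_output.npy")
actual = np.array([1.0, 2.0001, 3.0], dtype=np.float32)
expected = np.array([1.0, 2.0, 3.0], dtype=np.float32)
abs_max, rel_max = max_errors(actual, expected)
print(f"max abs err: {abs_max:.2e}, max rel err: {rel_max:.2e}")
```

If the maximum error is within a few units in the last place of the data type, the mismatch is likely due to the different floating point execution order on the two backends rather than a compiler bug.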
When debugging an error while compiling with ``torq-compile`` it is often useful to inspect the IR produced by the compiler. This allows you to narrow down the pass that is failing (or, by inspecting the IR, the pass that generated the problematic IR). This can be done with the following command line:

```
$ pytest tests/test_onnx_model.py -s -k example-matmul_layer_MatMul_0-sim-default --debug-ir ir_dump
```

The IR produced by the compiler will be stored in the directory ``ir_dump``. Make sure you clean this directory between runs.

## Performance debugging

Debugging the performance of the model to detect bottlenecks can be done with the same testing framework using two different approaches:

- Compile time profiling: the compiler estimates the time operations take on Torq hardware and creates a report
- Runtime profiling: the runtime records timestamps for the operations and creates a report

The first approach provides a very rough estimate while the second provides a more accurate measurement. In order to provide realistic results, however, runtime profiling must be executed on real hardware, as the simulation is not time accurate.

To perform compile time profiling use the following command line:

```
pytest tests/test_onnx_model.py -k example-matmul_layer --torq-compile-time-profiling-output-dir=profile --recompute-cache
```

The compile time logs will be available in the directory ``profile``.

To perform runtime profiling use the following command line:

```
pytest tests/test_onnx_model.py -k example-matmul_layer --torq-runtime-profiling-output-dir=profile --recompute-cache
```

The runtime logs will be available in the directory ``profile``.

Both runtime and compile time performance tracing produce ``.csv`` files as well as ``.pb`` files that can be viewed with the [Perfetto tool](https://ui.perfetto.dev/).
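To find bottlenecks in the ``.csv`` reports, it is usually enough to aggregate the per-operation durations and sort them. The sketch below is only an illustration: the column names (``op``, ``duration_us``) are assumptions, so check the header of the generated files for the actual schema.

```python
import csv
import io
from collections import defaultdict

# Inline sample standing in for a generated profile .csv file; the
# "op" and "duration_us" column names are assumptions for illustration.
sample = io.StringIO(
    "op,duration_us\n"
    "MatMul_0,120\n"
    "Add_1,15\n"
    "MatMul_0,118\n"
)

totals = defaultdict(float)
for row in csv.DictReader(sample):
    totals[row["op"]] += float(row["duration_us"])

# Print the most expensive operations first.
for op, us in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{op}: {us:.0f} us")
```

For a timeline view rather than an aggregate, open the ``.pb`` files in Perfetto instead.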
## Executing tests using real hardware

The tests by default run on a host-based simulator (as denoted by the "-sim" suffix). It is possible to run the tests on the hardware with the following command line:

```
pytest tests/test_onnx_model.py -k example-matmul_layer --torq-runtime-hw-type=astra_machina --torq-addr ${board_address}
```

where ``${board_address}`` denotes the address of an Astra board (e.g., ``root@10.3.120.55``).

### Auto-deploying the runtime to the board

When iterating on the runtime binary (``torq-run-module``) or the NPU kernel module, use ``--update-astra-runtime`` to automatically deploy your locally built artifacts to the board before running tests:

```
pytest tests/test_onnx_model.py -k example-matmul_layer \
    --torq-runtime-hw-type=astra_machina \
    --torq-addr ${board_address} \
    --update-astra-runtime
```

This flag enables:

- **Auto-deploy torq-run-module**: the locally cross-compiled binary is compared (MD5) against the one in your user-specific board path and copied only if it differs.
- **Per-user runner paths**: each developer gets an isolated binary on the board under ``/home/root/iree-build-soc//torq-run-module``. Runtime execution uses this copied binary, so tests run with the developer's locally built runtime rather than a shared system binary.
- **Board hostname validation**: the board's hostname must match the pattern ``sl2619-dev-board-NNN`` (e.g. ``sl2619-dev-board-002``). If it doesn't, pytest fails immediately with guidance on how to set the hostname using ``scripts/set_board_hostname.py``.
- **Exclusive board access**: a session-level lock is acquired at the start of the pytest session and released when it ends. The lock records the owner's username, hostname, and timestamp so others can see who is using the board.
  If a previous session crashed without releasing the lock you can clear it with:

  ```
  python3 scripts/reset_board_lock.py root@
  ```

  Even if the session is interrupted with Ctrl+C or killed with SIGTERM, the lock is released automatically via signal handlers and ``atexit``.

- **Wall-clock timing**: the remote execution time is measured **on the board itself** using the shell ``time`` built-in, so the reported value excludes SSH transport overhead. The ``real`` time printed by ``time`` is parsed from the command output and recorded in the profiling summary at the end of the session.

### Deploying a custom NPU kernel module

To deploy a custom ``.ko`` file to the board, pass ``--torq-ko-path``:

```
pytest tests/test_onnx_model.py -k example-matmul_layer \
    --torq-runtime-hw-type=astra_machina \
    --torq-addr ${board_address} \
    --update-astra-runtime \
    --torq-ko-path /path/to/syna_npu.ko
```

The framework compares the local ``.ko`` hash with the on-board module at ``/usr/lib/modules/6.12.11/updates/syna_npu.ko``. If they differ, the on-board copy is replaced and the board is **rebooted**; the pytest session exits with an error asking you to re-run after the board finishes booting. ``--torq-ko-path`` requires ``--update-astra-runtime`` to be set.

### Finding boards on the network

Three utility scripts help manage boards.

**Scan for reachable boards** (only boards with hostname ``sl2619-dev-board-NNN`` are shown):

```
python3 scripts/scan_boards.py
```

Output:

```
IP              HOSTNAME
----------------------------------------
10.46.130.17    sl2619-dev-board-001
10.46.130.42    sl2619-dev-board-002
----------------------------------------
2 board(s) reachable.
```

**Search for a specific board by hostname:**

```
python3 scripts/scan_boards.py --find sl2619-dev-board-001
```

**Set a hostname on a board** (persists across reboots):

```
python3 scripts/set_board_hostname.py 10.46.130.17 sl2619-dev-board-001
```

**Reset a stale board lock** (if a previous session crashed without releasing it):

```
python3 scripts/reset_board_lock.py root@10.46.130.17
```

## Using the results dashboard

In order to analyze a large number of tests, e.g. when the model contains many layers, it may be beneficial to use the performance dashboard. The dashboard is a web application that can be started by following the instructions in ``webapps/dashboard``. Once started you can point the testing framework to it by setting the following environment variable:

```
export TORQ_PERF_SERVER=http://localhost:8080
```

After each test you will see a link to the results in the dashboard. You can use the dashboard to compare the performance across different tests and to inspect detailed traces.

## Advanced: Test framework internals and PyTest fixtures

The tests leverage a set of [PyTest fixtures](https://docs.pytest.org/en/latest/how-to/fixtures.html) to obtain the different artifacts required to perform the comparison. Fixtures can in turn depend on other fixtures.
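The way artifacts flow through this dependency chain can be pictured with a plain-Python analogy. This is a sketch only: the real framework wires these up as pytest fixtures (see ``python/torq/testing``), and the function bodies below are placeholders for the actual tool invocations.

```python
from functools import lru_cache

@lru_cache(maxsize=None)             # fixtures are computed once and reused
def onnx_mlir_model_file() -> str:
    return "model.mlir"              # real fixture runs import_onnx

@lru_cache(maxsize=None)
def torq_compiled_model_dir() -> str:
    mlir = onnx_mlir_model_file()    # fixture dependency
    return mlir + ".torq-out"        # real fixture runs torq-compile

@lru_cache(maxsize=None)
def llvmcpu_compiled_model() -> str:
    mlir = onnx_mlir_model_file()    # the same upstream fixture is reused
    return mlir + ".vmfb"            # real fixture runs iree-compile

print(torq_compiled_model_dir())     # model.mlir.torq-out
print(llvmcpu_compiled_model())      # model.mlir.vmfb
```

Both compilation paths depend on the same converted MLIR, which is why a failure in an upstream fixture shows up in every test that transitively requests it.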
The most important are:

- **onnx_mlir_model_file**: this fixture returns the input model converted to MLIR using ``iree.compiler.tools.import_onnx``
- **torq_compiled_model_dir**: the model compiled with ``torq-compile``
- **llvmcpu_compiled_model**: the model compiled with ``iree-compile`` with the LLVMCPU backend
- **tweaked_random_input_data**: random inputs suitable for the model
- **llvmcpu_reference_results**: results of the inference using **tweaked_random_input_data** and the model **llvmcpu_compiled_model**
- **torq_results**: results of the inference using **tweaked_random_input_data** and the model in **torq_compiled_model_dir**
- **chip_config**: the target TORQ-enabled chip for which the model is compiled and simulated (different chips can be enabled with the ``--torq-chips`` command line option)
- **runtime_hw_type**: the target hardware emulation used to run the torq model (can be changed with ``--torq-runtime-hw-type``)

The source code of the fixtures and their dependency relationships can be found by inspecting the files in ``python/torq/testing``.

## TFLite model testing

The testing framework also supports TFLite models. Unlike ONNX models, which are imported via ``iree.compiler.tools.import_onnx``, TFLite models go through a different conversion pipeline: ``iree-import-tflite`` (TFLite → TOSA bytecode) followed by ``iree-opt`` (TOSA → text MLIR).

A key difference from the ONNX workflow is how layers are extracted. TFLite layers are extracted by directly manipulating the TFLite flatbuffer, which **preserves quantization parameters exactly** from the original model. This ensures that each extracted layer remains correctly quantized with its original scale and zero-point values.

The test file is ``tests/test_tflite_model.py``. Place your ``.tflite`` model files in ``tests/testdata/tflite_models/`` and the framework will automatically detect them and create test cases for each layer and the full model.
### Initial setup

Place the ``.tflite`` file in ``tests/testdata/tflite_models/``. The framework will automatically detect it and create test cases.

### Collecting tests

To view the test cases:

```
$ pytest tests/test_tflite_model.py --collect-only
```

The output will show individual layer tests and a full model test:

```
...
```

### What the tests do

Each TFLite test case:

1. **Extracts layers** from the TFLite flatbuffer (at collection time). Each layer becomes a standalone ``.tflite`` model preserving the original quantization.
2. **Converts to MLIR** via ``iree-import-tflite`` → ``iree-opt`` (at test runtime, cached by versioned fixtures).
3. **Compiles** the MLIR for Torq simulation (``torq-compile``) and LLVM-CPU (``iree-compile``).
4. **Runs inference** with random inputs on both backends.
5. **Compares results** element-wise between the two backends.

### Environment variables

The following environment variables control test behaviour:

- ``MAX_LAYERS``: limit the number of layers extracted (default: ``0`` = no limit). Example: ``MAX_LAYERS=5 pytest tests/test_tflite_model.py -v --collect-only``
- ``FORCE_EXTRACT``: set to ``1`` to force re-extraction of layers even if cached. Example: ``FORCE_EXTRACT=1 pytest tests/test_tflite_model.py -v --collect-only``

### Running specific subsets

```bash
# Full model only:
pytest tests/test_tflite_model.py -v -k "full_model"

# Specific layer type:
pytest tests/test_tflite_model.py -v -s -k "layer_CONV_2D"

# A single specific layer:
pytest tests/test_tflite_model.py -v -s -k "my_model_layer_CONV_2D_1"

# Parallel execution:
pytest tests/test_tflite_model.py -v -n 8
```

### How layer extraction works

The layer extraction is performed by ``torq.testing.tflite_layer_extractor.extract_all_layers``. It works directly on the TFLite flatbuffer:

1. Parses the flatbuffer to enumerate all operators.
2. For each operator, builds a new single-operator TFLite model that includes only the relevant tensors, buffers, and quantization parameters.
3. Saves each layer as a standalone ``.tflite`` file in ``tests/testdata/tflite_models/.mlir_cache/_layers/``.
4. A ``_cases_cache.json`` file is saved alongside so that subsequent runs skip extraction entirely and load the test case list from the JSON cache.

The extraction happens at **test collection time** (inside ``pytest_generate_tests``), since pytest needs the list of layers to parametrize the tests. The TOSA/MLIR conversion happens later at **test runtime** via the ``tflite_mlir_model_file`` versioned fixture.

### Caching

TFLite tests use two caching layers:

1. **Layer extraction cache** (``tests/testdata/tflite_models/.mlir_cache/``): stores extracted layer ``.tflite`` files and ``_cases_cache.json``. Cleared with ``FORCE_EXTRACT=1``.
2. **Versioned fixtures cache** (``.pytest_cache/d/versioned_fixtures/``): stores TOSA/MLIR files, compiled models, and inference results. Each artifact is keyed by a hash of its inputs, so changes propagate automatically. Cleared with ``--recompute-cache``.

### Investigating errors and failures

The same debugging approaches described in the ONNX sections above apply. The key fixture names differ for TFLite:

- **tflite_mlir_model_file**: converts TFLite → TOSA (via ``iree-import-tflite``) → text MLIR (via ``iree-opt``). Errors here indicate unsupported TFLite operators or conversion issues.
- **torq_compiled_model_dir**: compiles the MLIR with ``torq-compile``. Same as ONNX.

Debugging options (``--debug-ir``, ``-s``, ``--recompute-cache``), performance profiling (``--torq-compile-time-profiling-output-dir``, ``--torq-runtime-profiling-output-dir``), and hardware execution (``--torq-runtime-hw-type``, ``--torq-addr``, ``--update-astra-runtime``) all work identically to the ONNX workflow described above.
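The hash-keyed invalidation used by the versioned fixtures cache described above can be sketched as follows. This is a simplified illustration, not the framework's actual implementation (which lives in ``python/torq/testing``):

```python
import hashlib
import json

def cache_key(inputs: dict) -> str:
    """Derive a cache key from all inputs that produced an artifact.

    If any input changes (source file hash, compiler flags, tool
    version), the key changes, so stale artifacts are never reused.
    """
    blob = json.dumps(inputs, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]

# Changing a compiler flag yields a different key, forcing a recompute.
key_a = cache_key({"model": "abc123", "flags": ["-O2"], "tool": "1.0"})
key_b = cache_key({"model": "abc123", "flags": ["-O3"], "tool": "1.0"})
print(key_a != key_b)
```

Because downstream keys incorporate upstream artifacts, invalidating one input automatically invalidates everything derived from it.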
### Key TFLite fixtures

- **tflite_layer_model**: provides the test case data (layer path, op name, quantization info)
- **tflite_model_file**: versioned static file fixture wrapping the ``.tflite`` path
- **tflite_mlir_model_file**: converts TFLite → TOSA → MLIR (cached)
- **tweaked_random_input_data**: random inputs used for layer and full model tests
- **llvmcpu_reference_results**: results of inference using the LLVM-CPU backend
- **torq_results**: results of inference using the Torq backend