torq-gen-config: a TORQ config generation tool
A guide for finding the best execution configuration (NSS/CSS/Host) for each operation in a model.
1. Overview
Why torq-gen-config?
TORQ has three executors, each with different strengths:
| Executor | Description | Priority |
|---|---|---|
| NSS | NPU Subsystem (hardware) | 1st |
| CSS | CPU Subsystem (hardware) | 2nd |
| Host | CPU fallback | 3rd |
Different operations work better on different executors. For example, convolution layers often run efficiently on NSS, while some complex tensor operations may only work correctly on Host. Additionally, the same operation might work on one executor but fail on another due to hardware limitations, unsupported data types, or numerical precision issues. torq-gen-config automatically tests each operation with all executors and records which ones work correctly, ensuring optimal performance and correctness when running the full model.
How It Works
ONNX Model → Extract Layers → Test NSS/CSS/Host → Get Recommended Executor → Save JSON → Run Full Model
Example: SqueezeNet 1.0
This manual uses squeezenet1.0-12.onnx (66 operations: Conv, Relu, MaxPool, Concat, etc.) as a running example.
Tested Models
The following CNN models have been verified to work with torq-gen-config:
| Model | Type | Status |
|---|---|---|
| | CNN | Working |
| | CNN | Working |
| | CNN | Working |
| | CNN | Working |
| | CNN | Working |
Output Files
Discovery produces two JSON files:
| File | Suffix | Purpose |
|---|---|---|
| Report JSON | | Human-readable discovery results: statuses, timing, tolerances, per-layer details. Used by |
| Compiler JSON | | Minimal |
Key rules:
Report JSON is the source of truth. If it exists, run always regenerates the compiler JSON from it.
edit only touches report JSON. It never reads or writes compiler JSON directly.
Compiler JSON is always a derived artifact. It’s either regenerated from report JSON (on run) or hand-edited.
Compiler JSON alone is a valid input. If report JSON is absent, the compiler consumes the compiler JSON as-is.
Hand-edited compiler JSON is overwritten on the next run, unless the report JSON is missing, in which case your edits persist.
You normally only interact with the report JSON.
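The key rules above can be sketched as a small lookup. This is only an illustration, not the tool's implementation: the function name resolve_config is hypothetical, and the file-name patterns are assumed from the examples used elsewhere in this guide (torq_gen_config_<model>.json and torq_gen_config_<model>_compiler.json).

```python
from pathlib import Path


def resolve_config(output_dir, model_stem):
    """Return (kind, path) for the JSON a full-model run would start from.

    Assumed file names (taken from the examples in this guide):
      report   -> torq_gen_config_<model>.json
      compiler -> torq_gen_config_<model>_compiler.json
    """
    out = Path(output_dir)
    report = out / f"torq_gen_config_{model_stem}.json"
    compiler = out / f"torq_gen_config_{model_stem}_compiler.json"
    if report.exists():
        # Report JSON is the source of truth; compiler JSON is derived from it.
        return "report", report
    if compiler.exists():
        # No report JSON: the compiler JSON is consumed as-is.
        return "compiler", compiler
    return "missing", None
```

With both files present the sketch returns the report JSON, which mirrors the rule that hand edits to the compiler JSON are overwritten on the next run.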
2. Three-Step Workflow
Step 1: Discover
Test each operation to find the best executor:
# Using torq-gen-config (recommended)
torq-gen-config discover \
--model ./tests/testdata/onnx_models/squeezenet1.0-12.onnx \
--output-dir ./results \
--skip-mode
# Or using pytest directly
pytest tests/test_onnx_gen_config.py \
-v -k "_layer_" \
--model-path=./tests/testdata/onnx_models/squeezenet1.0-12.onnx \
--output-dir=./results \
--skip-mode --recompute-cache
What it does:
Extracts each layer from the model
Tests NSS → CSS → Host
Stops after first success (with --skip-mode)
Creates torq_gen_config_squeezenet1.0-12.json
Step 2: Review
View the JSON results with the built-in viewer:
# Using --model shortcut (auto-resolves JSON path from --output-dir)
torq-gen-config view \
--model ./tests/testdata/onnx_models/squeezenet1.0-12.onnx \
--output-dir ./results
# Or specify the JSON path directly
torq-gen-config view ./results/torq_gen_config_squeezenet1.0-12.json
Output shows each layer with status for all executors:
Layer NSS CSS HOST Recommended
----------------------------------------------------------------------------------------------------
Conv_conv1_1 success difference success nss
Relu_conv1_2 success - - nss
MaxPool_pool1_1 success - - nss
...
Status types:
| Status | Meaning | Can Use? |
|---|---|---|
| | Works correctly, output matches reference | Yes |
| | Runs but output differs from reference (within tolerance) | Yes, with adjusted tolerance |
| | Compilation or runtime failure | No |
Understanding difference status:
A difference status means the executor runs successfully but produces numerically different results. This often happens with BF16 models or hardware approximations. You can accept the difference by adjusting tolerance:
View the current difference:
torq-gen-config view torq_gen_config_squeezenet1.0-12.json Conv_conv1_1
Use the edit command to increase tolerance:
torq-gen-config edit \
--model model.onnx \
--layer Conv_conv1_1 \
--tolerance-avg 0.1 \
--tolerance-max 0.1
Re-test just that layer:
torq-gen-config discover --model model.onnx -- -k "Conv_0_css" --recompute-cache
If the test passes with the new tolerance, recommended_executor is updated to prefer that executor.
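To make the effect of the two tolerance flags concrete, here is a minimal sketch of a tolerance check. The guide does not state which error measures the tool uses, so this assumes mean and maximum absolute error compared against --tolerance-avg and --tolerance-max; the function name classify is hypothetical.

```python
def classify(output, reference, tolerance_avg, tolerance_max):
    """Return 'success' if both error measures are within tolerance, else 'difference'.

    Assumption: the average and maximum absolute errors are the quantities
    compared against the two tolerances.
    """
    if len(output) != len(reference) or not output:
        raise ValueError("output and reference must be non-empty and of equal length")
    errors = [abs(o - r) for o, r in zip(output, reference)]
    avg_err = sum(errors) / len(errors)
    max_err = max(errors)
    within = avg_err <= tolerance_avg and max_err <= tolerance_max
    return "success" if within else "difference"


# The same outputs flip from 'difference' to 'success' once tolerance is raised.
print(classify([1.0, 2.0], [1.0, 2.05], 0.01, 0.01))
print(classify([1.0, 2.0], [1.0, 2.05], 0.1, 0.1))
```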
Step 3: Run Full Model
Compile and run the complete model with discovered assignments:
# Using torq-gen-config (recommended)
torq-gen-config run \
--model ./tests/testdata/onnx_models/squeezenet1.0-12.onnx \
--output-dir ./results \
--debug-ir=tmp
# Or using pytest directly
pytest tests/test_onnx_gen_config.py \
-v -k "_full_model" \
--model-path=./tests/testdata/onnx_models/squeezenet1.0-12.onnx \
--output-dir=./results \
--debug-ir=tmp --recompute-cache
3. Viewing Results
Using the Viewer Script
The viewer displays results layer by layer:
# Using --model shortcut (auto-resolves JSON path from --output-dir)
torq-gen-config view --model squeezenet1.0-12.onnx --output-dir results/
# Or specify the JSON path directly
torq-gen-config view results/torq_gen_config_squeezenet1.0-12.json
# View details for a specific layer
torq-gen-config view --model squeezenet1.0-12.onnx --output-dir results/ Conv_conv1_1
Viewing Compiler JSON
The viewer also handles compiler-format JSON (*_compiler.json) gracefully:
# View compiler JSON — shows executor distribution and line:col assignments
torq-gen-config view --model squeezenet1.0-12.onnx --output-dir results/
# (auto-detects compiler format if report JSON is absent)
# Or point directly at the compiler JSON
torq-gen-config view torq_gen_config_squeezenet1.0-12_compiler.json
Output for a compiler JSON:
============================================================
MODEL: squeezenet1.0-12 (compiler format)
============================================================
Total assignments: 66
Executor distribution:
NSS: 52
CSS: 10
HOST: 4
Assignments:
42:12 → nss
43:12 → nss
...
Understanding the JSON
{
  "ops": {
    "Conv_conv1_1": {
      "executors": {
        "nss": {"status": "success"},
        "css": {"status": "difference"},
        "host": {"status": "success"}
      },
      "recommended_executor": "nss",
      "_node_index": 0,
      "mlir_location": "271:12"
    }
  },
  "discovery_report": {
    "summary": {"total_layers": 66, "status_counts": {...}},
    "critical_failures": []
  }
}
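As a rough illustration of how this structure can be consumed, the sketch below counts recommended executors across the ops section. It relies only on the keys shown above; the second sample entry (Relu_conv1_2 with a null recommendation) is made up for the example, and summarize is a hypothetical helper, not part of the tool.

```python
import json
from collections import Counter


def summarize(report):
    """Count recommended executors across the "ops" section of a report dict."""
    counts = Counter()
    for op in report.get("ops", {}).values():
        # A null or missing recommendation is counted under "none".
        counts[op.get("recommended_executor") or "none"] += 1
    return dict(counts)


sample = json.loads("""
{
  "ops": {
    "Conv_conv1_1": {
      "executors": {
        "nss": {"status": "success"},
        "css": {"status": "difference"},
        "host": {"status": "success"}
      },
      "recommended_executor": "nss",
      "_node_index": 0,
      "mlir_location": "271:12"
    },
    "Relu_conv1_2": {
      "executors": {"nss": {"status": "error"}},
      "recommended_executor": null
    }
  }
}
""")
print(summarize(sample))  # {'nss': 1, 'none': 1}
```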
How Recommended Executor is Determined
The recommended_executor is selected automatically based on executor priority and test results:
Priority Order: nss → css → host
Selection Logic:
| Priority | Status | Example |
|---|---|---|
| 1st | First executor with | NSS success → recommend NSS |
| 2nd | First executor with | NSS error, CSS difference → recommend CSS |
| 3rd | Fallback to | NSS/CSS error → recommend Host |
Example scenarios:
NSS=success, CSS=difference, Host=success → recommends nss (highest priority success)
NSS=error, CSS=difference, Host=success → recommends css (first working)
NSS=error, CSS=error, Host=success → recommends host (only option)
NSS=error, CSS=error, Host=error → no recommendation (critical failure - full model cannot run!)
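One reading of the selection logic that reproduces all four example scenarios is "take the first executor in priority order whose status is success or difference". The sketch below implements that reading; it is an assumption for illustration rather than the tool's actual code, and recommend is a hypothetical name.

```python
PRIORITY = ("nss", "css", "host")
USABLE = {"success", "difference"}


def recommend(statuses):
    """Return the first executor in priority order with a usable status, or None.

    `statuses` maps executor name to status; untested executors may be absent.
    """
    for executor in PRIORITY:
        if statuses.get(executor) in USABLE:
            return executor
    return None  # critical failure: no executor works for this operation


print(recommend({"nss": "success", "css": "difference", "host": "success"}))  # nss
print(recommend({"nss": "error", "css": "difference", "host": "success"}))    # css
print(recommend({"nss": "error", "css": "error", "host": "success"}))         # host
print(recommend({"nss": "error", "css": "error", "host": "error"}))           # None
```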
Changing the Recommended Executor
Use the edit command to safely override the recommendation. The command validates the JSON, updates the report, and regenerates the compiler config automatically.
Edit a single layer:
torq-gen-config edit \
--model ./tests/testdata/onnx_models/squeezenet1.0-12.onnx \
--layer Conv_conv1_1 \
--executor css
Layer matching: The --layer argument supports several strategies:
| Strategy | Example | Matches |
|---|---|---|
| Exact (case-insensitive) | | |
| Substring | | All layers containing “conv1” |
| fnmatch wildcard | | All layers starting with “Conv_” |
| ALL | | Every layer |
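The four strategies can be sketched with Python's standard fnmatch module. The precedence used here (ALL, then exact, then wildcard, then substring) and the case-insensitive handling of wildcards and substrings are assumptions for the example; the guide only states that exact matching is case-insensitive. match_layers is a hypothetical helper.

```python
import fnmatch


def match_layers(pattern, layers):
    """Resolve a --layer style pattern against a list of layer names."""
    if pattern == "ALL":
        return list(layers)
    lowered = pattern.lower()
    # Exact match, ignoring case.
    exact = [name for name in layers if name.lower() == lowered]
    if exact:
        return exact
    # fnmatch wildcard, only attempted when the pattern contains wildcard characters.
    if any(ch in pattern for ch in "*?["):
        return [name for name in layers if fnmatch.fnmatchcase(name.lower(), lowered)]
    # Fall back to substring matching.
    return [name for name in layers if lowered in name.lower()]


layers = ["Conv_conv1_1", "Relu_conv1_2", "MaxPool_pool1_1"]
print(match_layers("conv1", layers))   # ['Conv_conv1_1', 'Relu_conv1_2']
print(match_layers("Conv_*", layers))  # ['Conv_conv1_1']
```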
Batch edit multiple layers:
# All Conv layers → NSS
torq-gen-config edit --model model.onnx --layer "Conv_*" --executor nss
# Every layer → CSS
torq-gen-config edit --model model.onnx --layer ALL --executor css
Edit tolerance:
torq-gen-config edit \
--model model.onnx \
--layer Conv_conv1_1 \
--tolerance-avg 0.1 \
--tolerance-max 0.5
List available layers:
torq-gen-config edit --model model.onnx --list
torq-gen-config edit --model model.onnx --list conv
Notes:
The edit command only modifies the report JSON (torq_gen_config_*.json). It never reads or writes the compiler JSON directly.
If you accidentally pass the compiler JSON (e.g., torq_gen_config_model_compiler.json), edit detects it and refuses, pointing you to the correct report JSON file.
After editing, run torq-gen-config run; it will regenerate the compiler JSON from the updated report JSON before compiling.
No need to re-run discovery. The compiler reads recommended_executor from the JSON at compile time.
Timing-Based Executor Recommendation
By default, the recommended executor is determined by priority order (nss → css → host). However, you can use timing data to recommend the fastest executor instead.
How it works:
When --collect-timing is enabled, runtime performance is measured for each executor
With --recommend-by-timing, the fastest executor (lowest runtime_ms) with success status is recommended
If no executor has success status, the fastest with difference status is recommended
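The rules above can be sketched as follows. The per-executor dict layout (a status plus an optional runtime_ms) is assumed for the example, and the function names are hypothetical. The fallback to priority order when no timing data exists follows the behavior this guide describes for --recommend-by-timing without --collect-timing.

```python
PRIORITY = ("nss", "css", "host")


def by_priority(executors):
    """Priority-based fallback: first executor with a usable status."""
    for name in PRIORITY:
        if executors.get(name, {}).get("status") in ("success", "difference"):
            return name
    return None


def recommend_by_timing(executors):
    """Pick the fastest 'success' executor, else the fastest 'difference' one."""
    has_timing = any(info.get("runtime_ms") is not None for info in executors.values())
    if not has_timing:
        return by_priority(executors)
    for wanted in ("success", "difference"):
        timed = [
            (info["runtime_ms"], name)
            for name, info in executors.items()
            if info.get("status") == wanted and info.get("runtime_ms") is not None
        ]
        if timed:
            return min(timed)[1]  # lowest runtime_ms wins
    return by_priority(executors)


print(recommend_by_timing({
    "nss": {"status": "success", "runtime_ms": 5.0},
    "css": {"status": "success", "runtime_ms": 3.0},
    "host": {"status": "success", "runtime_ms": 9.0},
}))  # css
```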
Usage:
# Collect timing and recommend by performance
torq-gen-config discover \
--model ./model.onnx \
--collect-timing \
--timing-runs=5 \
--recommend-by-timing
# Or using pytest directly
pytest tests/test_onnx_gen_config.py \
-v -k "_layer_" \
--model-path=./model.onnx \
--collect-timing \
--timing-runs=5 \
--recommend-by-timing \
--recompute-cache
When to use:
When you want optimal performance rather than just functional correctness
When multiple executors work but you want the fastest one
For performance tuning and benchmarking
Note: --recommend-by-timing requires --collect-timing to be effective. If timing data is not available, it falls back to priority-based selection.
4. Handling Issues
Critical Failures (All Executors Failed)
A critical failure occurs when ALL three executors fail for an operation:
CRITICAL FAILURES (all executors error): 1
- ProblematicOp_output (node: 42)
Important: If any operation has a critical failure, the full model cannot run at all. Every operation must have at least one working executor for the full model to compile and execute.
Solutions (in order):
Debug specific layer (see below) - understand why it’s failing
Use subgraph mode - isolate the problematic operation and surrounding context
Skip crashing executors - if NSS/CSS hang, test with Host only
Report issue - if Host also fails, this may be an unsupported operation
Debugging a Specific Layer
When a layer fails (e.g., CSS shows error), debug it individually:
# Re-test a specific layer with all executors via torq-gen-config
# (pass pytest -k filter through extra options after --)
torq-gen-config discover \
--model ./tests/testdata/onnx_models/squeezenet1.0-12.onnx \
-- -k "Conv_0" --recompute-cache
# Or using pytest directly
pytest tests/test_onnx_gen_config.py \
-v -k "squeezenet1.0-12_layer_Conv_0" \
--model-path=./tests/testdata/onnx_models/squeezenet1.0-12.onnx \
--recompute-cache
This runs the layer (Conv_0) with NSS, CSS, and Host separately to see detailed error output.
Important: Layer Tests are for Discovery Only
Layer tests (-k "_layer_") are designed for torq-gen-config only. They test which executor works for each operation but do NOT perform C++ executor assignment.
To see executor assignment in the IR dump, use subgraph test or full model test:
# Subgraph test - shows executor assignment for that subgraph
torq-gen-config discover \
--model model.onnx \
--subgraph-from=Conv_0 \
--subgraph-to=Conv_0
# Full model test - shows executor assignment for all operations
torq-gen-config run --model model.onnx --debug-ir=tmp
In the dumped IR:
// With executor assignment
linalg.conv_2d_nchw_fchw {...} {torq-executor = "nss"}
Complete Workflow:
First discovery (get initial results):
torq-gen-config discover --model model.onnx --skip-mode
Debug specific layers (optional - set recommended_executor: null to re-test):
torq-gen-config edit --model model.onnx --layer Conv_0 --executor null
# Then re-test via torq-gen-config:
torq-gen-config discover --model model.onnx -- -k "Conv_0_nss" --recompute-cache
# JSON automatically updated with new results
Verify executor assignment (use subgraph or full model - layer tests won’t show assignment):
# Subgraph test shows executor assignment in IR
torq-gen-config discover \
--model model.onnx \
--subgraph-from=Conv_0 \
--subgraph-to=Conv_0
# Or full model test
torq-gen-config run --model model.onnx --debug-ir=tmp
Example scenario - CSS error on Conv_0:
View the error in JSON:
torq-gen-config view \
torq_gen_config_squeezenet1.0-12.json Conv_conv1_1
Re-test that specific layer via torq-gen-config:
torq-gen-config discover --model model.onnx -- -k "Conv_0_css" --recompute-cache
Or using pytest directly:
pytest ... -k "squeezenet1.0-12_layer_Conv_0_css" --recompute-cache
If CSS consistently fails, use the edit command to switch to a different executor:
torq-gen-config edit \
--model model.onnx \
--layer Conv_conv1_1 \
--executor nss
To test without any executor assignment (debug mode), set executor to null:
torq-gen-config edit \
--model model.onnx \
--layer Conv_conv1_1 \
--executor null
Subgraph Debugging
Subgraph mode tests a range of operations as a mini full-model. Use it to isolate problematic operations or debug specific model sections.
Find operation names:
python3 -c "
import onnx
model = onnx.load('./tests/testdata/onnx_models/squeezenet1.0-12.onnx')
for i, n in enumerate(model.graph.node):
print(f'{i}: {n.op_type}_{n.output[0]}')
"
Run subgraph discovery (nodes 10-16):
torq-gen-config discover \
--model ./tests/testdata/onnx_models/squeezenet1.0-12.onnx \
--subgraph-from=Conv_fire3/squeeze1x1_1 \
--subgraph-to=Concat_fire3/concat_1 \
--skip-mode
# Or using pytest directly
pytest tests/test_onnx_gen_config.py \
-v \
--model-path=./tests/testdata/onnx_models/squeezenet1.0-12.onnx \
--subgraph-from=Conv_fire3/squeeze1x1_1 \
--subgraph-to=Concat_fire3/concat_1 \
--skip-mode --recompute-cache
This creates torq_gen_config_squeezenet1.0-12_subgraph_10_16.json and runs layer discovery + full subgraph test.
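The numbers in the report file name appear to be node positions. The sketch below shows how start and end names could be turned into such a range, assuming names follow the op_type_output pattern printed by the listing snippet above and that the indices in the file name are the positions of the start and end operations. Both helpers are hypothetical.

```python
def subgraph_range(op_names, start_name, end_name):
    """Return (start_index, end_index) for a named range of operations."""
    start = op_names.index(start_name)  # raises ValueError if the name is unknown
    end = op_names.index(end_name)
    if start > end:
        raise ValueError("start operation must not come after end operation")
    return start, end


def subgraph_report_name(model_stem, start, end):
    """Build the report file name following the pattern shown in this guide."""
    return f"torq_gen_config_{model_stem}_subgraph_{start}_{end}.json"


# Made-up op list for illustration only.
names = ["Conv_a", "Relu_b", "MaxPool_c", "Conv_d"]
start, end = subgraph_range(names, "Relu_b", "Conv_d")
print(subgraph_report_name("squeezenet1.0-12", start, end))
```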
Subgraph Options:
| Option | Description |
|---|---|
| | Start operation name |
| | End operation name |
| | Layer discovery only |
| | Full subgraph test only |
Full Model Issues
If full model fails but individual layers pass:
Check debug IR: ls tmp/
Verify assignments in viewer: torq-gen-config view torq_gen_config_*.json
Re-run discovery for problematic layers
Skipping Executors (Extra Debug Option)
If NSS or CSS crashes/hangs during discovery, skip them:
# Skip NSS only
torq-gen-config discover --model model.onnx --skip-executors=nss
# Skip both NSS and CSS (test only Host)
torq-gen-config discover --model model.onnx --skip-executors=nss,css
# Or using pytest directly
pytest ... --skip-executors=nss
pytest ... --skip-executors=nss,css
This helps identify if an operation works on at least one executor when others are unstable.
Important: How --skip-mode and JSON Cache Work
Understanding the interaction between --skip-mode, JSON file, and test execution:
1. --skip-mode Behavior
When --skip-mode is enabled:
First run: Tests each executor (NSS → CSS → Host) until one succeeds, then saves "status": "success" to JSON
Subsequent runs: Checks JSON file first - if a layer already has "status": "success", the test is skipped entirely (pytest.skip)
This is designed for speeding up incremental discovery, not for re-testing.
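The skip decision can be pictured as a lookup in the report JSON. This is a sketch of the described behavior, not the actual fixture code: should_skip is a hypothetical helper, and treating a null recommended_executor as "re-test" follows the workaround this guide gives for forcing a re-run under --skip-mode.

```python
def should_skip(report, layer):
    """Decide whether a layer test would be skipped under --skip-mode."""
    op = report.get("ops", {}).get(layer)
    if op is None:
        return False  # never tested: run it
    if op.get("recommended_executor") is None:
        return False  # recommendation cleared: test all executors again
    # Skip only when some executor has already been recorded as successful.
    return any(e.get("status") == "success" for e in op.get("executors", {}).values())


report = {
    "ops": {
        "Conv_conv1_1": {
            "recommended_executor": "nss",
            "executors": {"nss": {"status": "success"}},
        }
    }
}
print(should_skip(report, "Conv_conv1_1"))  # True
print(should_skip(report, "Relu_conv1_2"))  # False
```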
2. Layer Test vs Full Model Test
Layer Test (-k "_layer_"):
Purpose: Discover which executor works for each operation
Test passes/fails based on comparison with reference results
JSON is updated with test results
Does NOT perform C++ executor assignment (layer MLIR has different line numbers)
Full Model Test (-k "_full_model"):
Purpose: Run the complete model with discovered assignments
If recommended_executor exists and is not null in JSON, the C++ ExecutorAssignmentPass will assign that executor
The full model runs end-to-end
3. Debugging Specific Layers - Common Pitfall
Problem: You want to debug a layer and check its executor assignment, but:
Layer test shows “SKIPPED” even with --recompute-cache
No executor assignment happens in the dumped IR
The test seems to use cached results
Root Cause: --skip-mode reads the JSON file and skips tests for layers with "status": "success". The --recompute-cache flag only invalidates the ONNX/MLIR file cache, not the JSON test results.
Solution - To actually re-run and check executor assignment:
Option A: Remove --skip-mode (recommended for debugging)
# This will re-run all tests regardless of JSON status
pytest ... -k "squeezenet1.0-12_layer_Conv_0" --recompute-cache
# Note: WITHOUT --skip-mode
Option B: Set recommended_executor to null
{
  "ops": {
    "Conv_conv1_1": {
      "recommended_executor": null,
      "executors": {
        "nss": {"status": "success"},
        "css": {"status": "success"},
        "host": {"status": "success"}
      }
    }
  }
}
Then run with --skip-mode - it will test all executors again.
Option C: Delete the JSON file
rm torq_gen_config_*.json
pytest ... -k "_layer_" --skip-mode --recompute-cache
4. Verifying Executor Assignment
To verify executor assignment in the IR:
Use subgraph or full model test (layer tests don’t show assignment):
# Subgraph test
pytest ... --subgraph-from=Conv_0 --subgraph-to=Conv_0 -k "_full" --debug-ir=tmp
# Or full model test
pytest ... -k "_full_model" --debug-ir=tmp
Check the dumped IR in tmp/ - look for torq-executor attributes:
// Example: operation assigned to NSS
linalg.conv_2d_nchw_fchw {...} {torq-executor = "nss"}
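Scanning a dump by eye gets tedious for larger models, so a small script can tally the attributes instead. The sketch below counts torq-executor attributes in IR text using the attribute syntax shown above. The sample IR is made up for illustration (including the linalg.generic line), and count_assignments is a hypothetical helper; to use it on a real dump, read a file from the --debug-ir directory and pass its contents in.

```python
import re
from collections import Counter

# Matches attributes of the form: torq-executor = "nss"
EXECUTOR_ATTR = re.compile(r'torq-executor\s*=\s*"([^"]+)"')


def count_assignments(ir_text):
    """Count how many operations are assigned to each executor in IR text."""
    return Counter(EXECUTOR_ATTR.findall(ir_text))


sample_ir = '''
linalg.conv_2d_nchw_fchw {...} {torq-executor = "nss"}
linalg.generic {...} {torq-executor = "css"}
linalg.conv_2d_nchw_fchw {...} {torq-executor = "nss"}
'''
print(dict(count_assignments(sample_ir)))  # {'nss': 2, 'css': 1}
```

An empty result for a full-model dump would suggest that no assignment took place, which matches the symptom described for layer tests.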
5. Summary Table: When to Use What
| Scenario | Skip Mode? | Recompute Cache? | Action on JSON |
|---|---|---|---|
| First discovery | Yes | Yes | None (will be created) |
| Add more test data | Yes | No | None (append mode) |
| Re-test layer | No | Yes | Set |
| Force test specific executor | No | Yes | Set |
| Full model with new assignments | N/A | No | Edit |
Key Takeaway: --skip-mode + existing JSON with "status": "success" = skipped tests. Remove skip mode or modify JSON to actually re-run tests.
5. Auto-Converting FP32 Models to BF16
The TORQ NSS accelerator has limited FP32 support and requires BF16 (bfloat16) input for many operations. The CSS and Host executors generally support FP32. The torq-gen-config framework provides automatic FP32 to BF16 conversion with accuracy validation.
Why Convert to BF16?
| Scenario | Action Required |
|---|---|
| NSS executor fails with FP32 error | Convert to BF16 |
| Model weights are FP32 | May need conversion for NSS compatibility |
| Running on CSS/Host only | FP32 usually works |
Note: BF16 conversion may introduce minor numerical differences. Always validate accuracy after conversion.
What is BF16?
BF16 is a 16-bit floating point format with:
1 sign bit
8 exponent bits (same as FP32)
7 mantissa bits (vs 23 for FP32)
Key characteristics:
Same dynamic range as FP32 (no overflow issues)
~50% memory bandwidth reduction
Truncation conversion is fast (just drop lower 16 bits)
Required for NSS accelerator on TORQ hardware
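The truncation described above can be demonstrated with the standard struct module: a BF16 value is the upper 16 bits of the FP32 bit pattern. This is a plain-Python illustration with hypothetical helper names, not the conversion script itself. Note that it truncates, as described here, whereas some converters round to nearest instead.

```python
import struct


def fp32_to_bf16_bits(value):
    """Return the 16-bit BF16 pattern obtained by dropping the lower 16 FP32 bits."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return bits >> 16


def bf16_bits_to_fp32(bits16):
    """Expand a BF16 bit pattern back to a Python float by zero-filling the low bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


def truncate_to_bf16(value):
    """Round-trip FP32 -> BF16 -> FP32 to see the precision that survives."""
    return bf16_bits_to_fp32(fp32_to_bf16_bits(value))


print(hex(fp32_to_bf16_bits(1.0)))  # 0x3f80
print(truncate_to_bf16(3.14159))    # 3.140625
```

Only 7 mantissa bits survive, so 3.14159 becomes 3.140625, while the exponent (and therefore the dynamic range) is untouched.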
The Conversion Process
When --auto-convert-bf16 is enabled:
Weight Conversion: All FP32 weights/biases are converted to BF16
Type Annotation: Input/output/intermediate tensors are marked as BF16
Accuracy Validation: Errors are computed for each tensor
Accuracy Evaluation Method
The conversion accuracy is evaluated using bit-truncation comparison (see scripts/convert_onnx_to_bf16.py):
FP32 (32 bits) → BF16 (16 bits) → FP32 (for comparison)
Metrics computed per tensor:
max_error: Maximum absolute difference
mean_error: Mean absolute difference
rmse: Root mean square error
max_rel_error: Maximum relative error
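For reference, the four metrics can be computed as in the sketch below. It works on plain lists of floats and is not taken from scripts/convert_onnx_to_bf16.py; the function name and the eps guard against division by zero in the relative error are assumptions.

```python
import math


def conversion_metrics(original, converted, eps=1e-12):
    """Compute per-tensor error metrics between original and round-tripped values."""
    if len(original) != len(converted) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    abs_err = [abs(a - b) for a, b in zip(original, converted)]
    # Relative error is measured against the original value (eps avoids dividing by zero).
    rel_err = [e / max(abs(a), eps) for e, a in zip(abs_err, original)]
    n = len(abs_err)
    return {
        "max_error": max(abs_err),
        "mean_error": sum(abs_err) / n,
        "rmse": math.sqrt(sum(e * e for e in abs_err) / n),
        "max_rel_error": max(rel_err),
    }


print(conversion_metrics([1.0, 2.0, 4.0], [1.0, 2.5, 3.0]))
```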
Interpretation guidelines:
| Max Error | Quality | Usability |
|---|---|---|
| < 0.01 | Excellent | Typical for BF16, safe for all use cases |
| < 0.1 | Good | Acceptable for most inference tasks |
| < 1.0 | Fair | May affect some sensitive layers |
| >= 1.0 | Poor | Significant accuracy loss, review needed |
Inference-Level Accuracy Check (Optional)
Beyond weight-level checks, the conversion script can compare end-to-end inference:
python scripts/convert_onnx_to_bf16.py model.onnx model_bf16.onnx --compare-inference --num-samples 10
This runs both models with random inputs and compares outputs:
Runs num_samples (default 5) random input comparisons
Uses ONNX Runtime for both FP32 and BF16 inference
Reports per-sample and aggregate error statistics
Using BF16 with torq-gen-config
Basic usage:
torq-gen-config discover --model model.onnx --auto-convert-bf16 --skip-mode
# Or using pytest directly
pytest tests/test_onnx_gen_config.py \
-v -k "_layer_" \
--model-path=./model.onnx \
--auto-convert-bf16 \
--skip-mode --recompute-cache
Key points:
The conversion happens automatically before layer extraction
Cache is invalidated when --auto-convert-bf16 changes (via versioned fixtures)
No manual pre-conversion needed - the framework handles everything
Batch Dimension Handling
The conversion script automatically fixes dynamic batch dimensions:
Converts symbolic dimensions (e.g., “batch”, “N”, “?”) to fixed size 1
Required for accurate inference comparison
Warning is printed for each modified input
Saving Converted Models
To save the BF16 model for external use:
torq-gen-config discover --model model.onnx --auto-convert-bf16 --save-bf16-model=/path/to/output.onnx
# Or using pytest directly
pytest ... --auto-convert-bf16 --save-bf16-model=/path/to/output.onnx
When to Use BF16 Conversion
Use BF16 when:
NSS executor reports FP32 is not supported
Running layer discovery with NSS executor enabled
Model has large weights (memory bandwidth constrained)
Avoid BF16 when:
Running on CSS/Host only (FP32 usually works)
Model has operations sensitive to numerical precision
Accuracy requirements are strict (< 0.01% error tolerance)
Note: If NSS fails with “FP32 not supported” or “data type not supported” errors, use --auto-convert-bf16. CSS and Host executors typically handle FP32 without conversion.
6. ONNX to MLIR Mapping Mechanism
This section explains how torq-gen-config maps ONNX operations to their corresponding line numbers in the MLIR generated by torch-mlir. This mapping is essential for the C++ compiler to assign the correct executor to each operation.
torch-mlir Import Guarantees
From the torch-mlir architecture documentation:
“The torch dialect is almost entirely in 1:1 correspondence with the JIT IR – this allows the importer to be extremely small”
The ONNX to torch-mlir import process provides these key guarantees:
Sequential Import: torch-mlir’s onnx_importer.py iterates through ONNX nodes sequentially:
def import_all(self, func=True):
    """Imports all nodes topologically."""
    for node in self._gi.graph_proto.node:  # Sequential iteration
        self.import_node(node)  # One ONNX node → one MLIR op
No Fusion During Import: Each ONNX node becomes exactly one torch.operator in MLIR (except for special handlers like Constant nodes)
Topological Order: Both ONNX and torch-mlir rely on topological ordering:
“ONNX requires that graphs be sorted topologically and free of cycles, so we don’t take any special steps to order them for dominance.”
Position-Based Matching
The mapping uses position (index) as the matching key:
| Source | What We Track |
|---|---|
| ONNX | |
| MLIR | Line number of each |
Why Position-Based is Reliable:
Deterministic: Both use topological ordering
Verifiable: Can check that op types match at each position
Simple: No complex heuristics or fuzzy matching
Example Mapping:
ONNX Node[0]: Conv → MLIR Line 42:12
ONNX Node[1]: Relu → MLIR Line 43:12
ONNX Node[2]: Conv → MLIR Line 44:12
The JSON stores this mapping:
{
  "ops": {
    "Conv_output_0": {
      "recommended_executor": "nss",
      "_node_index": 0,          // Position in ONNX
      "mlir_location": "42:12"   // Line in MLIR
    }
  }
}
Verification
torq-gen-config automatically verifies the mapping during test generation:
Count check: ONNX and MLIR have the same number of non-Constant ops
Type check: Op types match at each position
Warning output if verification fails
You can manually verify any model:
python scripts/verify_onnx_import_order.py --model-path=./model.onnx
If you see warnings like COUNT MISMATCH or OP TYPE MISMATCHES during discovery, the torch-mlir import behavior may have changed.
7. Command Reference
torq-gen-config CLI
The recommended way to interact with the discovery system.
discover — Run executor discovery
| Option | Description |
|---|---|
| | Path to ONNX model (required) |
| | Directory for generated JSON (default: current directory) |
| | Path to |
| | Stop after first success per layer |
| | Comma-separated list to skip (e.g., |
| | Convert FP32 model to BF16 |
| | Save converted BF16 model to path |
| | Start op name for subgraph |
| | End op name for subgraph |
| | Collect runtime timing data |
| | Number of runtime runs for timing average |
| | Recommend fastest executor based on timing |
| | Detect duplicate layers and copy results |
| | Redirect discovery output to log file |
# Basic discovery
torq-gen-config discover --model model.onnx
# With skip mode and BF16 conversion
torq-gen-config discover --model model.onnx --skip-mode --auto-convert-bf16
# Timing-based recommendation
torq-gen-config discover --model model.onnx --collect-timing --timing-runs=5 --recommend-by-timing
# Pass extra pytest flags (use '--' before flags starting with '-')
torq-gen-config discover --model model.onnx --skip-mode -- -s -v --tb=short
run — Run full model test
| Option | Description |
|---|---|
| | Path to ONNX model (required) |
| | Directory where config JSON is located |
| | Path to |
| | Convert FP32 model to BF16 |
| | Dump IR directory for debugging (default: |
| | Force recompute cached fixtures |
| | Redirect output to log file |
# Run full model with discovered assignments
torq-gen-config run --model model.onnx
# With debug IR dump
torq-gen-config run --model model.onnx --debug-ir=tmp
# Pass extra pytest flags (use '--' before flags starting with '-')
torq-gen-config run --model model.onnx -- -s -v
Note: run accepts either the report JSON or the compiler JSON. If the report JSON exists, run regenerates the compiler JSON from it before compiling. If only the compiler JSON exists, the full model test uses it directly.
view — View executor config
| Argument | Description |
|---|---|
| | Path to report or compiler JSON (optional; |
| | Path to ONNX model (auto-resolves JSON from model name) |
| | Directory where config JSON is located (default: current directory) |
| | Optional layer ID for detailed view |
# Using --model shortcut
torq-gen-config view --model model.onnx --output-dir results/
# View summary from report JSON path
torq-gen-config view torq_gen_config_model.json
# View details for one layer
torq-gen-config view torq_gen_config_model.json Conv_conv1_1
# View compiler JSON (auto-detected)
torq-gen-config view torq_gen_config_model_compiler.json
# View layer details with --model
torq-gen-config view --model model.onnx --output-dir results/ Conv_conv1_1
edit — Edit executor assignments
| Option | Description |
|---|---|
| | Path to report JSON (optional; |
| | Path to ONNX model (auto-resolves JSON from model name) |
| | Directory where config JSON is located |
| | Layer ID to edit. Supports exact, substring, fnmatch ( |
| | Set recommended executor ( |
| | Set |
| | Set |
| | List available layers and exit |
# Edit single layer
torq-gen-config edit --model model.onnx --layer Conv_0 --executor nss
# Batch edit all Conv layers
torq-gen-config edit --model model.onnx --layer "Conv_*" --executor nss
# Edit every layer
torq-gen-config edit --model model.onnx --layer ALL --executor css
# Update tolerance
torq-gen-config edit --model model.onnx --layer Conv_0 --tolerance-avg 0.1
# List layers
torq-gen-config edit --model model.onnx --list
torq-gen-config edit --model model.onnx --list conv
Advanced: raw pytest options
For advanced use cases (e.g., single-layer re-testing, custom pytest flags), you can invoke pytest directly. The torq-gen-config commands above are the recommended approach for normal workflows.
| Option | Description |
|---|---|
| | Path to ONNX model |
| | Run layer discovery |
| | Run full model test |
| | Stop after first success per layer |
| | Force recompute (ignore cache) |
| | Dump IR for debugging |
| | Skip specific executors |
| | Convert FP32 to BF16 |
| | Subgraph start |
| | Subgraph end |
| | Collect compile and runtime timing data |
| | Number of runtime runs for timing average (default: 1) |
| | Recommend fastest executor based on timing data |
| | Redirect all output to log file (pytest name; torq-gen-config uses |
| | Detect duplicate layers and copy results |
# Layer discovery with skip mode
pytest ... --model-path=model.onnx -k "_layer_" --skip-mode
# Full model with debug output
pytest ... --model-path=model.onnx -k "_full_model" --debug-ir=tmp
# Subgraph debugging
pytest ... --model-path=model.onnx --subgraph-from=StartOp --subgraph-to=EndOp
# Skip crashing executors
pytest ... --model-path=model.onnx --skip-executors=nss -k "_layer_"
# Timing-based executor recommendation
pytest ... --model-path=model.onnx -k "_layer_" --collect-timing --timing-runs=5 --recommend-by-timing
# Redirect output to log file
pytest ... --model-path=model.onnx -k "_layer_" -v -s \
--gen-config-log-file=discovery.log
# Skip duplicate layers
pytest ... --model-path=model.onnx -k "_layer_" --dedup-layers --skip-mode