Adding a new compute optimization

This section explains how to add a new optimization that rewrites a given linalg-on-tensors pattern into a more efficient version of the pattern. The rewrite transforms a set of linalg operations to a combination of torq_hl kernels and/or linalg operations that can be efficiently executed on the hardware.

To implement such an optimization it is recommended to follow two steps:

1. Add a test case for the pattern in test/testdata/linalg_ops. To create the test case you can start from an existing MLIR file in the same directory, look for MLIR files in the third_party/iree directory, or run the compiler on an input model as follows:

   ```
   $ {BUILD_DIR}/third_party/iree/tools/torq-compile --compile-to=input mymodel.mlir -o mymodel_linalg.mlir
   ```

   Make sure to edit the file so that func.func and func.return operations are used instead of the util dialect variants.

2. Implement and register a new rewrite pattern in the LinalgToTorqHLConversionPass:

   1. Implement a new class in compiler/torq/Conversions/LinalgToTorqHL/Patterns.cpp.
   2. Register the class in the populateLinalgToTorqHLPrePatterns function in the same file.
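The manual edit mentioned in the first step (replacing the util dialect variants with func.func and func.return) can be partly automated. The following is a minimal sketch, assuming that the only util dialect operations in the generated file are util.func and util.return; the helper name to_func_dialect is hypothetical, and the result may still need manual cleanup.

```shell
# Hypothetical helper (not part of the repository): print the given MLIR file
# with util.func / util.return rewritten to func.func / func.return.
# Assumption: no other util dialect operations need to be converted.
to_func_dialect() {
  sed -e 's/util\.func/func.func/g' \
      -e 's/util\.return/func.return/g' \
      "$1"
}

# Usage (file names are examples):
#   to_func_dialect mymodel_linalg.mlir > mytest.mlir
```

The helper writes to stdout rather than editing the file in place, so the compiler output is kept for comparison.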

More information on how pattern rewriting works can be found in the MLIR documentation.

To test the pattern in isolation use the following command:

```
$ {BUILD_DIR}/third_party/iree/tools/iree-opt \
   --pass-pipeline='builtin.module(func.func(torq-linalg-to-torqhl-pre-conversion{enable-patterns=MyOptimizationPattern}))' \
   --debug-only=dialect-conversion \
   mytest.mlir
```

The command runs the conversion on the input test MLIR file and prints the log of the pattern application process (on stderr) and the resulting MLIR after application (on stdout). The --debug-only flag is only available when the tools are built with assertions enabled.
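When the log is long it can help to keep it apart from the resulting IR. The following is a minimal sketch, assuming that the debug log is written to stderr and the resulting IR to stdout, as is usual for LLVM-based tools. The function name run_pattern_isolated, the IREE_OPT variable and the output file names are hypothetical; the options are the ones used in the command above.

```shell
# Hypothetical wrapper (not part of the repository).
#   $1: pattern name, $2: input MLIR file, $3: output directory
# IREE_OPT may be set to override the tool location used in this section.
run_pattern_isolated() {
  mkdir -p "$3"
  "${IREE_OPT:-${BUILD_DIR}/third_party/iree/tools/iree-opt}" \
    --pass-pipeline="builtin.module(func.func(torq-linalg-to-torqhl-pre-conversion{enable-patterns=$1}))" \
    --debug-only=dialect-conversion \
    "$2" >"$3/result.mlir" 2>"$3/conversion.log"
}

# Usage (names are examples):
#   run_pattern_isolated MyOptimizationPattern mytest.mlir tmp/isolated
```

After the call, conversion.log holds the pattern application log and result.mlir holds the converted IR.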

Note that other patterns in the same optimization pass may apply before and after the new pattern. To debug the pattern application process with all patterns of the pass enabled, use the following command:

```
$ {BUILD_DIR}/third_party/iree/tools/iree-opt \
   --pass-pipeline='builtin.module(func.func(torq-linalg-to-torqhl-pre-conversion))' \
   --debug-only=dialect-conversion \
   mytest.mlir
```

To dump the state of the IR at the moment the pattern is applied, print the enclosing operation at the beginning of the matchAndRewrite function, for example by calling dump() on the parent function of the matched operation.

The debug log shows the order in which patterns are applied. To change the order in which patterns that match the same operation are tried, change the benefit argument passed to the pattern's constructor: patterns with a higher benefit are tried first.

The compiler will further process the output of this pass with other passes. To test the pattern within the whole compiler pipeline use the following command:

```
$ rm -rf tmp/ir && mkdir -p tmp/ir && pytest tests/test_linalg.py -k "test_with_torq[mytest.mlir]" -s \
   --extra-torq-compiler-options="--mlir-print-ir-before=torq-linalg-to-torqhl-pre-conversion --mlir-print-ir-after=torq-linalg-to-torqhl-pre-conversion --mlir-print-ir-tree-dir=$(pwd)/tmp/ir"
```

The compiler creates two dump files in tmp/ir: one with the IR before the conversion and one with the IR after it.
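To see what the conversion changed, the two dumps can be compared directly. The following is a minimal sketch, assuming that the dump directory contains exactly two files. The function name diff_ir_dumps is hypothetical, and the files are compared in lexicographic path order, which is an assumption about which file is the "before" dump; swap the arguments of diff if the output reads backwards.

```shell
# Hypothetical helper (not part of the repository): diff the two IR dumps
# found under the given directory (default: tmp/ir).
diff_ir_dumps() {
  dir="${1:-tmp/ir}"
  files="$(find "$dir" -type f | sort)"
  count="$(printf '%s\n' "$files" | grep -c . || true)"
  if [ "$count" -ne 2 ]; then
    # Unexpected layout: list what was found and let the user pick manually.
    echo "expected 2 dump files in $dir, found $count:" >&2
    printf '%s\n' "$files" >&2
    return 2
  fi
  first="$(printf '%s\n' "$files" | sed -n 1p)"
  second="$(printf '%s\n' "$files" | sed -n 2p)"
  # diff exits with status 1 when the files differ, which is the normal case.
  diff -u "$first" "$second"
}

# Usage:
#   diff_ir_dumps tmp/ir
```

Lines prefixed with "-" come from the first file and lines prefixed with "+" from the second.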