# Benchmark Runner
This script benchmarks **forward/backward performance** of several operations (`rms`, `add_rms`, `poly`, `mul_poly`).
Results can be saved as **CSV files** or **plots**.
> **Note**
>
> To run the benchmarks, add the build that matches your Torch version and CUDA/ROCm variant (a subdirectory of the `build` directory) to `PYTHONPATH`.
>
> **Example:**  
>
> ```bash
> export PYTHONPATH=$PYTHONPATH:/activation/build/torch27-cxx11-cu128-x86_64-linux
> ```
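
The same effect can be had from inside Python by putting the build directory on `sys.path` before the benchmarked package is imported. The helper below is a minimal sketch; `use_build` is a hypothetical name and not part of this repository. It prepends the directory, so it takes priority over other entries, whereas the `export` above appends it.

```python
import sys
from pathlib import Path


def use_build(build_dir):
    """Put a build directory at the front of sys.path (hypothetical helper)."""
    path = str(Path(build_dir))
    # Avoid adding the same directory twice on repeated calls.
    if path not in sys.path:
        sys.path.insert(0, path)
    return path
```

For example, `use_build("/activation/build/torch27-cxx11-cu128-x86_64-linux")` mirrors the `export` line in the note.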
## Usage
```bash
python main.py --case <case> [--plot] [--save-path <dir>]
```
- `--case` (required): one of `rms`, `add_rms`, `poly`, `mul_poly`
- `--plot`: save plots instead of CSVs
- `--save-path`: output directory (default: `./configs/`)
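
The options above map naturally onto an `argparse` parser. The following is an illustrative sketch of that interface, assuming standard `argparse` conventions; it is not the actual contents of `main.py`, and `build_parser` and `CASES` are hypothetical names.

```python
import argparse

# The four benchmark cases listed in this README.
CASES = ("rms", "add_rms", "poly", "mul_poly")


def build_parser():
    """Sketch of the documented CLI; the real main.py may differ."""
    parser = argparse.ArgumentParser(description="Benchmark runner (sketch)")
    parser.add_argument("--case", required=True, choices=CASES,
                        help="operation to benchmark")
    parser.add_argument("--plot", action="store_true",
                        help="save plots instead of CSVs")
    parser.add_argument("--save-path", default="./configs/",
                        help="output directory")
    return parser
```

With this sketch, `build_parser().parse_args(["--case", "poly", "--plot"])` yields `case="poly"`, `plot=True`, and the default `save_path`.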
## Examples
```bash
python main.py --case add_rms --save-path ./results/
python main.py --case poly --plot --save-path ./plots/
```
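
To benchmark every case in one go, the documented command can be driven from a small wrapper. This is a sketch under the assumption that `main.py` is in the current directory and accepts exactly the flags listed under Usage; `build_commands` and `run_all` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys

# The four benchmark cases listed in this README.
CASES = ("rms", "add_rms", "poly", "mul_poly")


def build_commands(save_path, plot=False):
    """Return one argv list per case, using the documented flags only."""
    commands = []
    for case in CASES:
        cmd = [sys.executable, "main.py", "--case", case,
               "--save-path", save_path]
        if plot:
            cmd.append("--plot")
        commands.append(cmd)
    return commands


def run_all(save_path, plot=False):
    """Run the cases one after another; stop at the first failure."""
    for cmd in build_commands(save_path, plot):
        subprocess.run(cmd, check=True)
```

Calling `run_all("./results/")` would run the four cases sequentially and write CSVs to `./results/`.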
## Output
- CSV: `<case>-fwd-perf.csv`, `<case>-bwd-perf.csv`
- Plots: `plot_<case>-fwd-perf.png`, `plot_<case>-bwd-perf.png`
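
A script that post-processes the results may want to compute these paths up front. The sketch below assumes that each file name is prefixed with the case name and is written directly into the `--save-path` directory; both points are assumptions, and `expected_outputs` is a hypothetical helper.

```python
from pathlib import Path


def expected_outputs(case, save_path="./configs/", plot=False):
    """Return the assumed forward and backward output paths for one case."""
    # Assumption: plots use a "plot_" prefix and .png, CSVs have no prefix.
    prefix, ext = ("plot_", "png") if plot else ("", "csv")
    return [
        Path(save_path) / f"{prefix}{case}-{direction}-perf.{ext}"
        for direction in ("fwd", "bwd")
    ]
```

For instance, `expected_outputs("add_rms", "./results/")` gives the two CSV paths for the first command under Examples.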