---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-7B-Instruct
---

# AutoL2S-7B

This is the official model repository for **AutoL2S-7B**, a model fine-tuned for efficient reasoning based on [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/tree/main).

## 💡 Overview

Auto Long-Short Reasoning (AutoL2S) is a dynamic, model-agnostic framework that lets LLMs automatically switch between short and long reasoning paths based on the complexity of the input question. AutoL2S is trained on data annotated with our proposed method, which pairs questions with both long and short CoT paths and a special \<EASY\> token (\<specialLong\> in the implementation), so that the model itself learns to decide when longer reasoning is necessary and when shorter reasoning suffices. The \<EASY\> token indicates when the model can skip generating lengthy CoT reasoning. This annotation strategy enhances the LLM's ability to generate shorter CoT reasoning paths with improved quality after training.

This repository contains:

- Model weights
- Configuration files
- Necessary scripts in the `examples/` directory

<p align="left">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/66f9bb2dd5575ad6914756ce/dVpIjeIaU8Hv1M5z5VWYS.png" width="40%" style="display:inline-block; margin-right: 10px;" />
  <img src="https://cdn-uploads.huggingface.co/production/uploads/66f9bb2dd5575ad6914756ce/qxHTE-ZGTpxVjmkIX6Fk-.png" width="40%" style="display:inline-block;" />
</p>
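
At inference time, the routing amounts to a two-pass scheme: the model first answers normally but is stopped if it emits the special token, and if the token appears, generation continues along the long-reasoning path. Below is a minimal sketch of this idea using plain vLLM calls (the helper name `route_generate` and the continue-from-prefix details are illustrative assumptions; `examples/prefixLLM.py` contains the actual implementation):

```python
from vllm import LLM, SamplingParams

def route_generate(llm: LLM, prompt: str, max_tokens: int = 32768, temp: float = 0.7) -> str:
    # Pass 1: sample normally, but stop as soon as the routing token appears.
    route_params = SamplingParams(
        max_tokens=max_tokens, temperature=temp,
        stop=["<specialLong>"], include_stop_str_in_output=True,
    )
    text = llm.generate([prompt], route_params)[0].outputs[0].text
    if not text.endswith("<specialLong>"):
        return text  # Easy question: the short CoT answer is already complete.
    # Pass 2: the model signaled that long reasoning is needed, so continue
    # generating from the emitted prefix with no stop condition.
    think_params = SamplingParams(max_tokens=max_tokens, temperature=temp)
    return text + llm.generate([prompt + text], think_params)[0].outputs[0].text
```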

---
## 🧩 Dependencies
We recommend using the model with [vLLM](https://github.com/vllm-project/vllm).  
The code has been tested with:

```
vLLM == 0.6.2
```
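
To pin this version (assuming a standard `pip` environment):

```bash
pip install vllm==0.6.2
```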

---
## 🚀 How to Use

Run the inference example:

```bash
cd examples
python run_inference.py
```

Alternatively, **download `examples/prefixLLM.py` and `examples/template.py` from this repository and place them in your working directory**, then run:

```python
from vllm import SamplingParams
from prefixLLM import PrefixLLM
from template import SYSTEM_PROMPT, SHORT_TRIGGER

llm = PrefixLLM(model="amandaa/AutoL2S-7b")
max_tokens, temp = 32768, 0.7

# Routing pass: stop as soon as the model emits <specialLong>, which signals
# that this question needs the long-reasoning path.
sampling_params_route = SamplingParams(
    max_tokens=max_tokens,
    temperature=temp,
    stop=["<specialLong>"],
    include_stop_str_in_output=True,
)
# "Force think" pass: unconstrained sampling, used only when long reasoning is triggered.
sampling_params_force_think = SamplingParams(max_tokens=max_tokens, temperature=temp)

question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates.  Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question},
]
responses = llm.route_chat(
    messages=messages,
    sampling_params_route=sampling_params_route,
    sampling_params_force_think=sampling_params_force_think,
    use_tqdm=True,
    trigger_word=SHORT_TRIGGER,
)

print(SHORT_TRIGGER + responses[0].outputs[0].text)
```
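
Here `route_chat` performs the two-pass routing sketched in the overview: the first request uses `sampling_params_route` to stop at `<specialLong>`, and only when that token is emitted does it issue the second, unconstrained request with `sampling_params_force_think`.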

---


## 🔍 Citation

If you use this model in your work, please consider citing:

```bibtex
@article{luo2025autol2s,
  title={AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models},
  author={Luo, Feng and Chuang, Yu-Neng and Wang, Guanchu and Le, Hoang Anh Duy and Zhong, Shaochen and Liu, Hongyi and Yuan, Jiayi and Sui, Yang and Braverman, Vladimir and Chaudhary, Vipin and others},
  journal={arXiv preprint arXiv:2505.22662},
  year={2025}
}
```