|
|
--- |
|
|
license: apache-2.0 |
|
|
base_model: |
|
|
- Qwen/Qwen2.5-7B-Instruct |
|
|
--- |
|
|
|
|
|
# AutoL2S-7B |
|
|
|
|
|
This is the official model repository for **AutoL2S-7B**, a model fine-tuned for efficient reasoning based on [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/tree/main). |
|
|
|
|
|
## 💡 Overview
|
|
|
|
|
AutoL2S automatically switches between short and long reasoning paths based on the complexity of the input question.

Auto Long-Short Reasoning (AutoL2S) is a dynamic, model-agnostic framework that enables LLMs to compress their generated reasoning paths according to the complexity of the reasoning question. AutoL2S establishes a learned paradigm in which the LLM itself decides when longer reasoning is necessary and when shorter reasoning suffices. The model is trained on data annotated with our proposed method, which includes both long and short CoT paths and a special \<EASY\> token (\<specialLong\> in the implementation); the \<EASY\> token indicates when the model can skip generating a lengthy CoT reasoning path. This annotation strategy enhances the LLMs' ability to generate shorter CoT reasoning paths with improved quality after training.
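
For intuition only, an annotated training pair under this scheme might look like the sketch below; the field names and example texts are illustrative assumptions, not the actual AutoL2S training data:

```python
# Purely illustrative sketch of the annotation scheme described above; the
# field names and texts are assumptions, not the real AutoL2S data format.
annotated_examples = [
    {
        "question": "What is 15% of 40?",
        # Easy case: the <EASY> marker teaches the model that a short CoT
        # path suffices here.
        "response": "<EASY> 15% of 40 is 0.15 * 40 = 6.",
    },
    {
        "question": "Find all real solutions of x^4 - 5x^2 + 4 = 0.",
        # Hard case: keep the full long CoT path so the model learns when
        # extended reasoning is worth the extra tokens.
        "response": "Let y = x^2; then y^2 - 5y + 4 = (y - 1)(y - 4) = 0, "
                    "so x is one of -2, -1, 1, 2.",
    },
]
```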
|
|
|
|
|
This repository contains: |
|
|
|
|
|
- Model weights |
|
|
- Configuration files |
|
|
- Inference scripts in the `examples/` directory
|
|
|
|
|
<p align="left"> |
|
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/66f9bb2dd5575ad6914756ce/dVpIjeIaU8Hv1M5z5VWYS.png" width="40%" style="display:inline-block; margin-right: 10px;" /> |
|
|
<img src="https://cdn-uploads.huggingface.co/production/uploads/66f9bb2dd5575ad6914756ce/qxHTE-ZGTpxVjmkIX6Fk-.png" width="40%" style="display:inline-block;" /> |
|
|
</p> |
|
|
|
|
|
--- |
|
|
## 🧩 Dependencies
|
|
We recommend using the model with [vLLM](https://github.com/vllm-project/vllm). |
|
|
The code has been tested with: |
|
|
|
|
|
``` |
|
|
vLLM == 0.6.2 |
|
|
``` |
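
If vLLM is not installed yet, the tested version can be obtained with pip:

```bash
pip install vllm==0.6.2
```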
|
|
|
|
|
--- |
|
|
## 🚀 How to Use
|
|
|
|
|
Run the inference example: |
|
|
|
|
|
```bash |
|
|
cd examples |
|
|
python run_inference.py |
|
|
``` |
|
|
|
|
|
Alternatively, **download `examples/prefixLLM.py` and `examples/template.py` from this repository and place them in your working directory**, then run the following:
|
|
|
|
|
```python |
|
|
from vllm import SamplingParams
from prefixLLM import PrefixLLM
from template import SYSTEM_PROMPT, SHORT_TRIGGER

llm = PrefixLLM(model="amandaa/AutoL2S-7b")
max_tokens, temp = 32768, 0.7

# Routing pass: stop generation as soon as the model emits the special
# <specialLong> token, which signals that this question needs long reasoning.
sampling_params_route = SamplingParams(
    max_tokens=max_tokens, temperature=temp,
    stop=["<specialLong>"], include_stop_str_in_output=True,
)
# Forced-think pass: used when the routing pass requests the long CoT path.
sampling_params_force_think = SamplingParams(max_tokens=max_tokens, temperature=temp)

question = "Convert the point $(0,3)$ in rectangular coordinates to polar coordinates. Enter your answer in the form $(r,\\theta),$ where $r > 0$ and $0 \\le \\theta < 2 \\pi.$"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question},
]

responses = llm.route_chat(
    messages=messages, sampling_params_route=sampling_params_route,
    sampling_params_force_think=sampling_params_force_think,
    use_tqdm=True, trigger_word=SHORT_TRIGGER,
)

# The short-reasoning trigger is prefilled rather than generated, so prepend
# it when printing the final response.
print(SHORT_TRIGGER + responses[0].outputs[0].text)
|
|
``` |
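
For intuition, the sketch below shows the two-stage routing we would expect `route_chat` to perform, inferred from the parameter names above; the actual logic lives in `examples/prefixLLM.py`, and the trigger word is prefilled into the assistant turn to bias the model toward the short path:

```python
# Illustrative sketch of the two-stage routing behind route_chat, inferred
# from the parameter names above; see examples/prefixLLM.py for the real code.
def route_chat_sketch(llm, messages, params_route, params_force_think):
    # Stage 1 (routing): generate with the short-path prefix; generation
    # stops early if the model emits <specialLong> (the stop string above).
    first = llm.chat(messages, params_route)
    text = first[0].outputs[0].text
    if not text.endswith("<specialLong>"):
        return first  # the model judged the question easy; short CoT suffices
    # Stage 2 (forced thinking): the model requested long reasoning, so
    # regenerate without the stop string to produce the full CoT path.
    return llm.chat(messages, params_force_think)
```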
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
## 📖 Citation
|
|
|
|
|
If you use this model in your work, please consider citing: |
|
|
|
|
|
```bibtex |
|
|
@article{luo2025autol2s, |
|
|
title={AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models}, |
|
|
author={Luo, Feng and Chuang, Yu-Neng and Wang, Guanchu and Le, Hoang Anh Duy and Zhong, Shaochen and Liu, Hongyi and Yuan, Jiayi and Sui, Yang and Braverman, Vladimir and Chaudhary, Vipin and others}, |
|
|
journal={arXiv preprint arXiv:2505.22662}, |
|
|
year={2025} |
|
|
} |
|
|
``` |