---
license: apache-2.0
base_model:
- Qwen/Qwen2.5-32B-Instruct
---

# LIMO: Less Is More for Reasoning 🚀

This is the **updated version (v2)** of the LIMO model, corresponding to the latest paper version as of July 30, 2025.

## Model Information

| Model | Backbone | Size |
|-------|----------|------|
| LIMO-v2 | [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | 32B |

## Previous Version

If you need the original LIMO model (corresponding to the initial paper version), you can access it at:

- **LIMO v1**: [`GAIR/LIMO`](https://huggingface.co/GAIR/LIMO)

## Quick Start

Our model is fine-tuned on [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) and is compatible with most mainstream inference frameworks, including [HF Transformers](https://github.com/huggingface/transformers), [vLLM](https://github.com/vllm-project/vllm), and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).

### Using HF Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Initialize model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "GAIR/LIMO-v2",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO-v2", trust_remote_code=True)

# Prepare input messages
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is the result of 1+1?"}
]

# Format input using chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize input
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=32768,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

# Decode and print the response, excluding the prompt tokens
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
```

### Using vLLM

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

# Initialize the model
llm = LLM(
    model="GAIR/LIMO-v2",
    tensor_parallel_size=4,  # adjust based on available GPUs
    trust_remote_code=True,
    swap_space=60,
    gpu_memory_utilization=0.96,
)

# Prepare input messages
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is the result of 1+1?"}
]

# Set up the tokenizer and apply the chat template
tokenizer = AutoTokenizer.from_pretrained("GAIR/LIMO-v2", trust_remote_code=True)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Configure generation parameters
sampling_params = SamplingParams(
    temperature=0.7,
    max_tokens=32768,
    top_p=0.95,
)

# Generate response
output = llm.generate(text, sampling_params)
print(output[0].outputs[0].text)
```

## Citation

```bibtex
@misc{ye2025limoreasoning,
      title={LIMO: Less is More for Reasoning},
      author={Yixin Ye and Zhen Huang and Yang Xiao and Ethan Chern and Shijie Xia and Pengfei Liu},
      year={2025},
      eprint={2502.03387},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.03387},
}
```

For more details and training code, please visit our [GitHub repository](https://github.com/GAIR-NLP/LIMO).
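
## Tip: Extracting the Final Answer

Both snippets above instruct the model (via the system prompt) to place its final answer inside `\boxed{}`. If you only need that answer rather than the full reasoning trace, a small post-processing step can pull it out. The following is a minimal sketch; the `extract_boxed_answer` helper is illustrative and not part of the official LIMO or Qwen tooling.

```python
def extract_boxed_answer(text: str) -> str | None:
    """Return the contents of the last \\boxed{...} in `text`, or None if absent.

    Illustrative helper (not an official API). Handles nested braces,
    e.g. \\boxed{\\frac{1}{2}}, by tracking brace depth.
    """
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    chars = []
    while i < len(text) and depth > 0:
        c = text[i]
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
        if depth > 0:
            chars.append(c)
        i += 1
    return "".join(chars) if depth == 0 else None

# Example: `response` is the decoded string from either snippet above
# print(extract_boxed_answer(response))  # e.g. "2" for the 1+1 prompt
```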