Larp
Larp is a collection of instruction-tuned generative models designed for practical applications requiring intelligent planning and tool utilization. The models have been trained using Supervised Fine-Tuning (SFT) with a focus on three key aspects:
(1) **Three-Stage Conversational Flow for Plan Generation Assistance**: The training data and learning process were designed to support a structured three-stage conversational flow: plan proposal, user intent clarification through clarification requests, and plan generation. This approach ensures the model can effectively understand user requirements and develop appropriate action plans.
(2) **Robust JSON Output and Tool-Use for Commercial Service Integration**: The models have been trained to generate robust, well-formed JSON output and to execute tool-use operations reliably, making them well suited for integration with commercial services and applications that require structured data formats and automated task execution (a minimal sketch follows this list).
(3) **Balanced Knowledge Preservation**: While specializing in planning and tool utilization, the models retain the inherent knowledge and reasoning capabilities of the base model. To achieve this, they were trained on diverse tool-use and reasoning task data alongside the specialized planning data.
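To make aspect (2) concrete, below is a minimal, hypothetical sketch of the kind of JSON tool call such a model might emit during the plan-generation stage, and how a caller could parse it. The tool name (`search_flights`) and its argument fields are illustrative assumptions, not a documented schema.

```python
import json

# Hypothetical raw model output for stage 3 (plan generation): a tool call
# expressed as JSON. The tool name and argument fields are assumptions for
# illustration, not the model's documented schema.
raw_output = (
    '{"name": "search_flights", '
    '"arguments": {"origin": "ICN", "destination": "CJU", "date": "2025-06-10"}}'
)

# Because the model is trained for robust JSON output, the caller can parse
# the string directly and dispatch on the tool name.
call = json.loads(raw_output)
assert {"name", "arguments"} <= call.keys()
print(call["name"], call["arguments"])
```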
Model Overview
This model is a fine-tuned version of Qwen/Qwen2.5-32B.
Quickstart
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "aistrategyndev/Larp-Qwen32B-250916"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
```
Processing Long Texts
The model natively supports a context length of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the YaRN method.
YaRN is currently supported by several inference frameworks, e.g., transformers and llama.cpp for local use, and vllm and sglang for deployment. In general, there are two approaches to enabling YaRN with supported frameworks:
Modifying the model files: In the config.json file, add the rope_scaling fields:

```json
{
  ...,
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```
For llama.cpp, you need to regenerate the GGUF file after the modification.
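With transformers, the same override can also be applied in code at load time instead of editing config.json. A minimal sketch mirroring the snippet above (it reuses `model_name` from the Quickstart):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Apply the YaRN override at load time; the values mirror the config.json
# snippet above.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 131072  # raise the usable context window

model = AutoModelForCausalLM.from_pretrained(
    model_name, config=config, torch_dtype="auto", device_map="auto"
)
```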
Passing command line arguments:
For vllm, you can use

```shell
vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
```

For sglang, you can use

```shell
python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
```

For llama-server from llama.cpp, you can use

```shell
llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768
```
Performance
The following table compares performance on planning tasks against GPT-4o.
| Planning Tasks | GPT-4o | Larp-Qwen14b-250916 | Larp-Qwen32b-250916 |
|---|---|---|---|
| Aster (final pass rate) | 94% | 98% | 99% |
| Travel Planner (final pass rate) | 0.56% | 5.00% | 9.44% |
Aster is a test set that evaluates results generated through simulation of the three-stage workflow (plan proposal, clarification, and plan generation), operating in a tool-use manner. Travel Planner is a dataset that evaluates results by matching them against ground-truth data generated from the constraints in users' travel requests.
The following table compares performance on reasoning tasks against the base models.
| Tasks | Qwen3-14B | Qwen2.5-32B | Larp-Qwen14b-250916 | Larp-Qwen32b-250916 |
|---|---|---|---|---|
| aime24 | 80% | 20% | 26% | 40% |
| aime24_sky | 76% | 20% | 23% | 33% |
| math500 | 94% | 72% | 81% | 87% |
| gpqa_diamond | 60% | 38% | 47% | 56% |
Contact