Larp
Larp is a collection of instruction-tuned generative models designed for practical applications requiring intelligent planning and tool utilization. The models have been trained using Supervised Fine-Tuning (SFT) with a focus on three key aspects:
(1) **Three-Stage Conversational Flow for Plan Generation Assistance**: The training data and learning process were designed to support a structured three-stage conversational flow: plan proposal, user intent clarification through clarification requests, and plan generation. This approach ensures the model can effectively understand user requirements and develop appropriate action plans.
(2) **Robust JSON Output and Tool-Use for Commercial Service Integration**: The models have been trained to generate robust, well-formed JSON output and to execute tool-use operations reliably, making them well suited for integration with commercial services and applications that require structured data formats and automated task execution (a minimal sketch follows this list).
(3) **Balanced Knowledge Preservation**: While specializing in planning and tool utilization, the models retain the inherent knowledge and reasoning capabilities of the base model. To achieve this, they were trained on diverse tool-use and reasoning task data alongside the specialized planning data.
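To make aspect (2) concrete, below is a minimal, hypothetical sketch of the kind of JSON tool call such a model might emit during the plan-generation stage, and how a caller could parse it. The tool name (`search_flights`) and its argument fields are illustrative assumptions, not a documented schema.

```python
import json

# Hypothetical raw model output for stage 3 (plan generation): a tool call
# expressed as JSON. The tool name and argument fields are assumptions for
# illustration, not the model's documented schema.
raw_output = (
    '{"name": "search_flights", '
    '"arguments": {"origin": "ICN", "destination": "CJU", "date": "2025-06-10"}}'
)

# Because the model is trained for robust JSON output, the caller can parse
# the string directly and dispatch on the tool name.
call = json.loads(raw_output)
assert {"name", "arguments"} <= call.keys()
print(call["name"], call["arguments"])
```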
Model Overview
This model is a fine-tuned version of Qwen/Qwen2.5-32B.
Quickstart
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "aistrategyndev/Larp-Qwen32B-250916"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
```
Processing Long Texts
The model natively supports a context length of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the YaRN method.
YaRN is currently supported by several inference frameworks, e.g., transformers and llama.cpp for local use, and vllm and sglang for deployment. In general, there are two approaches to enabling YaRN with supported frameworks:
Modifying the model files: In the config.json file, add the rope_scaling fields:

```json
{
  ...,
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```
For llama.cpp, you need to regenerate the GGUF file after the modification.
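With transformers, the same override can also be applied in code at load time instead of editing config.json. A minimal sketch mirroring the snippet above (it reuses `model_name` from the Quickstart):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Apply the YaRN override at load time; the values mirror the config.json
# snippet above.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 131072  # raise the usable context window

model = AutoModelForCausalLM.from_pretrained(
    model_name, config=config, torch_dtype="auto", device_map="auto"
)
```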
Passing command line arguments:
For vllm, you can use

```shell
vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072
```

For sglang, you can use

```shell
python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'
```

For llama-server from llama.cpp, you can use

```shell
llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768
```
Performance
The following table compares performance on planning tasks against GPT-4o.
| Planning Tasks | GPT-4o | Larp-Qwen14b-250916 | Larp-Qwen32b-250916 |
|---|---|---|---|
| Aster (final pass rate) | 94% | 98% | 99% |
| Travel Planner (final pass rate) | 0.56% | 5.00% | 9.44% |
Aster is a test set that evaluates results generated through simulation of the three-stage workflow (plan proposal, clarification, and plan generation), operating in a tool-use manner. Travel Planner is a dataset that evaluates results by matching them against ground-truth data generated from the constraints in users' travel requests.
The following table compares performance on reasoning tasks against the base models.
| Tasks | Qwen3-14B | Qwen2.5-32B | Larp-Qwen14b-250916 | Larp-Qwen32b-250916 |
|---|---|---|---|---|
| aime24 | 80% | 20% | 26% | 40% |
| aime24_sky | 76% | 20% | 23% | 33% |
| math500 | 94% | 72% | 81% | 87% |
| gpqa_diamond | 60% | 38% | 47% | 56% |
Contact