
Larp

Larp is a collection of instruction-tuned generative models designed for practical applications requiring intelligent planning and tool utilization. The models have been trained using Supervised Fine-Tuning (SFT) with a focus on three key aspects:

(1) Three-Stage Conversational Flow for Plan Generation Assistance: The training data and learning process were carefully designed to support a structured three-stage conversational flow consisting of plan proposal, clarification of user intent through clarification requests, and plan generation. This approach ensures the model can effectively understand user requirements and develop appropriate action plans; a sketch of the flow follows point (3) below.

(2) Robust JSON Output and Tool-Use for Commercial Service Integration: The models have been trained to be highly robust in generating JSON output in a variety of formats and in executing tool-use operations, making them well suited for integration with commercial services and applications that require structured data and automated task execution. The sketch below also illustrates this kind of structured output.

(3) Balanced Knowledge Preservation: While specializing in planning and tool utilization, the models retain the inherent knowledge and reasoning capabilities of the base model. To achieve this, diverse tool-use and reasoning task data were included alongside the specialized training data.
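
For illustration only, the following sketch shows what such a three-stage exchange might look like as a chat-messages list, ending in a structured JSON plan. The turn contents, action names, and output schema are hypothetical; this card does not document the exact formats used in training.

# Hypothetical illustration of the three-stage flow; contents and schema are not from the model card.
messages = [
    # Stage 1: plan proposal - the user states a goal and the model proposes a high-level plan.
    {"role": "user", "content": "Organize a two-day team offsite in Busan."},
    {"role": "assistant", "content": "Draft plan: 1) fix dates and venue, 2) book transport, 3) schedule sessions and dinner. Shall I refine this?"},
    # Stage 2: clarification - the model asks questions to pin down user intent.
    {"role": "user", "content": "Yes, please make it concrete."},
    {"role": "assistant", "content": "Two questions first: how many attendees, and what is the total budget?"},
    {"role": "user", "content": "12 attendees, 3,000,000 KRW total."},
    # Stage 3: plan generation - the model emits the final plan as structured JSON
    # (or as tool calls against booking/search APIs in a tool-use setup).
    {"role": "assistant", "content": '{"plan": [{"step": 1, "action": "reserve_venue", "args": {"city": "Busan", "capacity": 12}}, {"step": 2, "action": "book_transport", "args": {"date": "TBD"}}]}'},
]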

Model Overview

This model is a fine-tuned version of Qwen/Qwen2.5-32B.

Quickstart

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "aistrategyndev/Larp-Qwen32

Processing Long Texts

The model natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the YaRN method.

YaRN is currently supported by several inference frameworks, e.g., transformers and llama.cpp for local use, vllm and sglang for deployment. In general, there are two approaches to enabling YaRN for supported frameworks:

Modifying the model files: In the config.json file, add the rope_scaling fields:

{
    ...,
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768
    }
}

For llama.cpp, you need to regenerate the GGUF file after the modification.
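
For transformers, the same fields can also be set at load time instead of editing config.json. A minimal sketch, assuming the standard AutoConfig override path:

from transformers import AutoConfig, AutoModelForCausalLM

model_name = "aistrategyndev/Larp-Qwen32B-250916"
config = AutoConfig.from_pretrained(model_name)
# Same values as the config.json snippet above.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained(model_name, config=config, torch_dtype="auto", device_map="auto")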

Passing command line arguments:

For vllm, you can use

vllm serve ... --rope-scaling '{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}' --max-model-len 131072  

For sglang, you can use

python -m sglang.launch_server ... --json-model-override-args '{"rope_scaling":{"rope_type":"yarn","factor":4.0,"original_max_position_embeddings":32768}}'

For llama-server from llama.cpp, you can use

llama-server ... --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768

Performance

The following table compares performance on planning tasks with GPT-4o.

| Planning Tasks | gpt-4o | Larp-Qwen14b-250916 | Larp-Qwen32b-250916 |
|---|---|---|---|
| Aster (final pass rate) | 94% | 98% | 99% |
| Travel Planner (final pass rate) | 0.56% | 5.00% | 9.44% |

Aster is a test set that evaluates results generated through simulation using a three-stage workflow (plan proposal, clarification, and plan generation) that operates in a tool-use manner. Travel Planner is a dataset that evaluates results by matching them against ground-truth data created from the constraints of the user's travel request.

The following table compares performance on reasoning tasks with the base models.

| Tasks | Qwen3-14B | Qwen2.5-32B | Larp-Qwen14b-250916 | Larp-Qwen32b-250916 |
|---|---|---|---|---|
| aime24 | 80% | 20% | 26% | 40% |
| aime24_sky | 76% | 20% | 23% | 33% |
| math500 | 94% | 72% | 81% | 87% |
| gpqa_diamond | 60% | 38% | 47% | 56% |

Contact

[email protected]
