# AIREV Qwen-0.8B-AgentJSON
World's first sub-1B parameter model with functional tool calling capability.
Built by AIREV for the OnDemand Agentic AI Platform.
## Key Results
| Metric | Score |
|---|---|
| JSON Validity (Easy queries) | 92% |
| Correct Plugin Selection | 94% |
| Exact Plugin ID Match (Easy) | 81% |
| Production Composite Score | 75.6% |
| Parameters | 752M |
| Quantized Size | ~400MB |
| Edge Inference Speed | ~30 tok/s |
## What This Model Does
Generates structured JSON execution plans for tool/plugin orchestration. Given a user request and available tools, it produces a valid JSON object specifying which tools to call, with what parameters, and in what order.
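The exact output schema is platform-specific, but an illustrative plan of this kind can be sketched as follows. Field names such as `plugin_id`, `params`, and `depends_on` are assumptions based on the metrics reported below, not the official OnDemand schema:

```python
import json

# Illustrative two-step execution plan (field names are assumptions,
# not the official OnDemand schema): the second tool depends on the first.
plan_text = """
{
  "steps": [
    {"id": 1, "plugin_id": "web-search", "params": {"query": "latest AI news"}, "depends_on": []},
    {"id": 2, "plugin_id": "summarizer", "params": {"max_words": 100}, "depends_on": [1]}
  ]
}
"""

plan = json.loads(plan_text)  # the plan must parse as valid JSON
for step in plan["steps"]:
    print(step["plugin_id"], step["depends_on"])
```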
## Evaluation Results

### Production Plugin Eval (Real OnDemand Plugins, 50 samples)
| Metric | Model | Base Qwen 0.8B | Improvement (pp) |
|---|---|---|---|
| Valid JSON | 94.0% | 18.0% | +76.0 |
| Correct Plugin IDs | 44.0% | 0.0% | +44.0 |
| Params Correct Type | 94.0% | 0.0% | +94.0 |
| Param Keys Match | 66.0% | 0.0% | +66.0 |
| Real Production IDs | 94.0% | 0.0% | +94.0 |
| Dependencies Present | 88.0% | 0.0% | +88.0 |
| Composite | 75.6% | 4.8% | +70.8 |
### By Query Complexity
| Difficulty | JSON Valid | Real Plugin IDs | Exact Match |
|---|---|---|---|
| Easy (1 tool) | 92% | 92% | 81% |
| Medium (2-3 tools) | 96% | 96% | 4% |
## Training Pipeline
This model was trained using a novel multi-stage approach developed by AIREV:
### Stage 1: Supervised Fine-Tuning (SFT)
- Base model: Qwen 3.5-0.8B
- 47K+ curated samples with reasoning traces and structured JSON outputs
- Real production plugin schemas from the OnDemand platform (2,176 plugins)
- LLM-evaluated data quality filtering (score >= 8/10)
- Full fine-tune (not LoRA) — research shows full FT outperforms at sub-1B scale
### Stage 2: Progressive Curriculum GRPO
- Group Relative Policy Optimization with a novel 4-phase progressive reward curriculum
- The reward progressively increases in difficulty across training:
  - Phase 1: JSON structural validity
  - Phase 2: Required field presence
  - Phase 3: Correct tool selection (prompt-grounded)
  - Phase 4: Parameter quality and completeness
- Innovation: Noise injection for zero-variance groups prevents common GRPO failure modes
- Prompt-aware reward function verifies selected tools against available options in the input
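A minimal sketch of the phased-reward idea described above. The actual reward weights, phase schedule, plan schema, and noise scale are internal to AIREV's pipeline and assumed here; each phase simply layers a stricter check on top of the previous ones, and a small noise term breaks ties when every completion in a group earns the same reward:

```python
import json
import random

def phased_reward(completion: str, available_tools: set, phase: int) -> float:
    """Score one completion under a progressive curriculum (weights assumed)."""
    try:
        plan = json.loads(completion)          # Phase 1: structural validity
    except json.JSONDecodeError:
        return 0.0
    score = 1.0
    steps = plan.get("steps")
    if phase >= 2:                             # Phase 2: required fields present
        if not isinstance(steps, list) or not all("plugin_id" in s for s in steps):
            return score
        score += 1.0
    if phase >= 3:                             # Phase 3: prompt-grounded tool selection
        if not all(s["plugin_id"] in available_tools for s in steps):
            return score
        score += 1.0
    if phase >= 4:                             # Phase 4: parameter quality (non-empty dicts)
        if all(isinstance(s.get("params"), dict) and s["params"] for s in steps):
            score += 1.0
    return score

def group_rewards(completions, tools, phase, noise=1e-3):
    """Reward one GRPO group; inject noise when variance is zero so the
    group-relative advantage stays defined (the failure mode noted above)."""
    rewards = [phased_reward(c, tools, phase) for c in completions]
    if len(set(rewards)) == 1:                 # zero-variance group
        rewards = [r + random.uniform(-noise, noise) for r in rewards]
    return rewards
```

The prompt-grounding lives in the `available_tools` argument: the reward only credits tool selections that actually appeared in the model's input.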
## Why This Matters
- Sub-1B tool calling is novel — to our knowledge, no published model under 3B demonstrates functional tool calling
- Progressive Curriculum GRPO — a new approach enabling skill stacking in small models
- Edge deployment viable — 400MB quantized, 30 tok/s on Snapdragon 8 Elite, <3s response time
## Model Specs
| Parameter | Value |
|---|---|
| Base Model | Qwen 3.5-0.8B (752M params) |
| Training Data | 47,400 samples (real production plugins) |
| SFT Epochs | 3 |
| GRPO Steps | 1,250 |
| Precision | bf16 |
| Hardware | NVIDIA H100 80GB |
| Attention | SDPA |
## Intended Use
- Primary: On-device agentic AI for tool orchestration on edge devices (AR glasses, mobile, IoT)
- Platform: Built for the OnDemand Agentic AI Platform (3,000+ tools)
- Best for: 1-2 step tool calling queries
- Not recommended for: Complex multi-step workflows (5+ tools)
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model in bf16 (matches the training precision)
model = AutoModelForCausalLM.from_pretrained(
    "airev-ae/Qwen-0.8B-AgentJSON", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("airev-ae/Qwen-0.8B-AgentJSON")

messages = [
    {"role": "system", "content": "You are an AI agent orchestrator. Generate a JSON execution plan."},
    {"role": "user", "content": "Search for the latest AI news"},
]

# Apply the chat template, then sample at the recommended temperature
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, temperature=0.8, do_sample=True)

# Decode only the newly generated tokens (the JSON plan)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Limitations
- Best accuracy on 1-2 tool queries; degrades on 3+ tool orchestration
- Trained on OnDemand plugin schema format
- Recommended: use with JSON validation at inference for production
- Temperature 0.7-0.8 produces best results
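The JSON-validation recommendation above can be implemented as a simple parse-and-retry wrapper. Here `generate_plan` is a hypothetical callable standing in for the generate-and-decode steps from the Usage section, and the retry count is an assumption:

```python
import json

def plan_with_retry(generate_plan, query, max_retries=3):
    """Call the model until it returns parseable JSON, or give up.

    `generate_plan` is any callable returning the model's raw text output
    (e.g. the decode step from the Usage section above).
    """
    for _ in range(max_retries):
        raw = generate_plan(query)
        try:
            return json.loads(raw)   # valid plan: hand it to the orchestrator
        except json.JSONDecodeError:
            continue                 # resample; temperature 0.7-0.8 keeps outputs diverse
    return None                      # caller should fall back or surface an error
```

Because sampling is stochastic at temperature 0.7-0.8, a failed parse is often recovered on the next attempt.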
## Citation

```bibtex
@misc{airev2026agentjson,
  title={AIREV Qwen-0.8B-AgentJSON: Sub-1B Tool Calling via Progressive Curriculum GRPO},
  author={AIREV FZ-LLC},
  year={2026},
  url={https://huggingface.co/airev-ae/Qwen-0.8B-AgentJSON}
}
```
Built by AIREV | OnDemand Platform | Abu Dhabi, UAE
Trained with Progressive Curriculum GRPO — a novel approach for sub-1B structured output generation.