# AIREV Qwen-0.8B-AgentJSON
World's first sub-1B parameter model with functional tool calling capability.
Built by AIREV for the OnDemand Agentic AI Platform.
## Key Results
| Metric | Score |
|---|---|
| JSON Validity (Easy queries) | 92% |
| Correct Plugin Selection | 94% |
| Exact Plugin ID Match (Easy) | 81% |
| Production Composite Score | 75.6% |
| Parameters | 752M |
| Quantized Size | ~400MB |
| Edge Inference Speed | ~30 tok/s |
## What This Model Does
Generates structured JSON execution plans for tool/plugin orchestration. Given a user request and available tools, it produces a valid JSON object specifying which tools to call, with what parameters, and in what order.
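The exact output schema is platform-specific, but an illustrative plan of this kind can be sketched as follows. Field names such as `plugin_id`, `params`, and `depends_on` are assumptions based on the metrics reported below, not the official OnDemand schema:

```python
import json

# Illustrative two-step execution plan (field names are assumptions,
# not the official OnDemand schema): the second tool depends on the first.
plan_text = """
{
  "steps": [
    {"id": 1, "plugin_id": "web-search", "params": {"query": "latest AI news"}, "depends_on": []},
    {"id": 2, "plugin_id": "summarizer", "params": {"max_words": 100}, "depends_on": [1]}
  ]
}
"""

plan = json.loads(plan_text)  # the plan must parse as valid JSON
for step in plan["steps"]:
    print(step["plugin_id"], step["depends_on"])
```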
## Evaluation Results

### Production Plugin Eval (Real OnDemand Plugins, 50 samples)
| Metric | Model | Base Qwen 0.8B | Improvement (pp) |
|---|---|---|---|
| Valid JSON | 94.0% | 18.0% | +76.0 |
| Correct Plugin IDs | 44.0% | 0.0% | +44.0 |
| Params Correct Type | 94.0% | 0.0% | +94.0 |
| Param Keys Match | 66.0% | 0.0% | +66.0 |
| Real Production IDs | 94.0% | 0.0% | +94.0 |
| Dependencies Present | 88.0% | 0.0% | +88.0 |
| Composite | 75.6% | 4.8% | +70.8 |
### By Query Complexity
| Difficulty | JSON Valid | Real Plugin IDs | Exact Match |
|---|---|---|---|
| Easy (1 tool) | 92% | 92% | 81% |
| Medium (2-3 tools) | 96% | 96% | 4% |
## Training Pipeline
This model was trained using a novel multi-stage approach developed by AIREV:
### Stage 1: Supervised Fine-Tuning (SFT)
- Base model: Qwen 3.5-0.8B
- 47K+ curated samples with reasoning traces and structured JSON outputs
- Real production plugin schemas from the OnDemand platform (2,176 plugins)
- LLM-evaluated data quality filtering (score >= 8/10)
- Full fine-tune (not LoRA) — research shows full FT outperforms at sub-1B scale
### Stage 2: Progressive Curriculum GRPO
- Group Relative Policy Optimization with a novel 4-phase progressive reward curriculum
- The reward progressively increases in difficulty across training:
  - Phase 1: JSON structural validity
  - Phase 2: Required field presence
  - Phase 3: Correct tool selection (prompt-grounded)
  - Phase 4: Parameter quality and completeness
- Innovation: Noise injection for zero-variance groups prevents common GRPO failure modes
- Prompt-aware reward function verifies selected tools against available options in the input
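A minimal sketch of the phased-reward idea described above. The actual reward weights, phase schedule, plan schema, and noise scale are internal to AIREV's pipeline and assumed here; each phase simply layers a stricter check on top of the previous ones, and a small noise term breaks ties when every completion in a group earns the same reward:

```python
import json
import random

def phased_reward(completion: str, available_tools: set, phase: int) -> float:
    """Score one completion under a progressive curriculum (weights assumed)."""
    try:
        plan = json.loads(completion)          # Phase 1: structural validity
    except json.JSONDecodeError:
        return 0.0
    score = 1.0
    steps = plan.get("steps")
    if phase >= 2:                             # Phase 2: required fields present
        if not isinstance(steps, list) or not all("plugin_id" in s for s in steps):
            return score
        score += 1.0
    if phase >= 3:                             # Phase 3: prompt-grounded tool selection
        if not all(s["plugin_id"] in available_tools for s in steps):
            return score
        score += 1.0
    if phase >= 4:                             # Phase 4: parameter quality (non-empty dicts)
        if all(isinstance(s.get("params"), dict) and s["params"] for s in steps):
            score += 1.0
    return score

def group_rewards(completions, tools, phase, noise=1e-3):
    """Reward one GRPO group; inject noise when variance is zero so the
    group-relative advantage stays defined (the failure mode noted above)."""
    rewards = [phased_reward(c, tools, phase) for c in completions]
    if len(set(rewards)) == 1:                 # zero-variance group
        rewards = [r + random.uniform(-noise, noise) for r in rewards]
    return rewards
```

The prompt-grounding lives in the `available_tools` argument: the reward only credits tool selections that actually appeared in the model's input.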
## Why This Matters
- Sub-1B tool calling is novel — to our knowledge, no published model under 3B demonstrates functional tool calling
- Progressive Curriculum GRPO — a new approach enabling skill stacking in small models
- Edge deployment viable — 400MB quantized, 30 tok/s on Snapdragon 8 Elite, <3s response time
## Model Specs
| Parameter | Value |
|---|---|
| Base Model | Qwen 3.5-0.8B (752M params) |
| Training Data | 47,400 samples (real production plugins) |
| SFT Epochs | 3 |
| GRPO Steps | 1,250 |
| Precision | bf16 |
| Hardware | NVIDIA H100 80GB |
| Attention | SDPA |
## Intended Use
- Primary: On-device agentic AI for tool orchestration on edge devices (AR glasses, mobile, IoT)
- Platform: Built for the OnDemand Agentic AI Platform (3,000+ tools)
- Best for: 1-2 step tool calling queries
- Not recommended for: Complex multi-step workflows (5+ tools)
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model in bf16 (matches the training precision)
model = AutoModelForCausalLM.from_pretrained(
    "airev-ae/Qwen-0.8B-AgentJSON", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("airev-ae/Qwen-0.8B-AgentJSON")

messages = [
    {"role": "system", "content": "You are an AI agent orchestrator. Generate a JSON execution plan."},
    {"role": "user", "content": "Search for the latest AI news"},
]

# Apply the chat template, then sample at the recommended temperature
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, temperature=0.8, do_sample=True)

# Decode only the newly generated tokens (the JSON plan)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Limitations
- Best accuracy on 1-2 tool queries; degrades on 3+ tool orchestration
- Trained on OnDemand plugin schema format
- Recommended: use with JSON validation at inference for production
- Temperature 0.7-0.8 produces best results
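The JSON-validation recommendation above can be implemented as a simple parse-and-retry wrapper. Here `generate_plan` is a hypothetical callable standing in for the generate-and-decode steps from the Usage section, and the retry count is an assumption:

```python
import json

def plan_with_retry(generate_plan, query, max_retries=3):
    """Call the model until it returns parseable JSON, or give up.

    `generate_plan` is any callable returning the model's raw text output
    (e.g. the decode step from the Usage section above).
    """
    for _ in range(max_retries):
        raw = generate_plan(query)
        try:
            return json.loads(raw)   # valid plan: hand it to the orchestrator
        except json.JSONDecodeError:
            continue                 # resample; temperature 0.7-0.8 keeps outputs diverse
    return None                      # caller should fall back or surface an error
```

Because sampling is stochastic at temperature 0.7-0.8, a failed parse is often recovered on the next attempt.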
## Citation

```bibtex
@misc{airev2026agentjson,
  title={AIREV Qwen-0.8B-AgentJSON: Sub-1B Tool Calling via Progressive Curriculum GRPO},
  author={AIREV FZ-LLC},
  year={2026},
  url={https://huggingface.co/airev-ae/Qwen-0.8B-AgentJSON}
}
```
Built by AIREV | OnDemand Platform | Abu Dhabi, UAE
Trained with Progressive Curriculum GRPO — a novel approach for sub-1B structured output generation.