AIREV Qwen-0.8B-AgentJSON

World's first sub-1B parameter model with functional tool calling capability.

Built by AIREV for the OnDemand Agentic AI Platform.

Key Results

| Metric | Score |
|---|---|
| JSON Validity (easy queries) | 92% |
| Correct Plugin Selection | 94% |
| Exact Plugin ID Match (easy) | 81% |
| Production Composite Score | 75.6% |
| Parameters | 752M |
| Quantized Size | ~400MB |
| Edge Inference Speed | ~30 tok/s |

What This Model Does

Generates structured JSON execution plans for tool/plugin orchestration. Given a user request and available tools, it produces a valid JSON object specifying which tools to call, with what parameters, and in what order.
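For illustration, a plan of this shape can be parsed and walked with nothing but the standard library. The field names below (`plan`, `plugin_id`, `params`, `depends_on`) are assumptions for the example, not the model's documented schema:

```python
import json

# Hypothetical execution plan for "Search for the latest AI news, then summarize".
# Field names here are illustrative assumptions, not the official OnDemand schema.
raw_output = """
{
  "plan": [
    {"step": 1, "plugin_id": "web_search", "params": {"query": "latest AI news"}, "depends_on": []},
    {"step": 2, "plugin_id": "summarizer", "params": {"text": "$step1.results"}, "depends_on": [1]}
  ]
}
"""

plan = json.loads(raw_output)
for step in plan["plan"]:
    print(step["step"], step["plugin_id"], step["depends_on"])
```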

Evaluation Results

Production Plugin Eval (Real OnDemand Plugins, 50 samples)

| Metric | This Model | Base Qwen 0.8B | Improvement |
|---|---|---|---|
| Valid JSON | 94.0% | 18.0% | +76.0 pts |
| Correct Plugin IDs | 44.0% | 0.0% | +44.0 pts |
| Params Correct Type | 94.0% | 0.0% | +94.0 pts |
| Param Keys Match | 66.0% | 0.0% | +66.0 pts |
| Real Production IDs | 94.0% | 0.0% | +94.0 pts |
| Dependencies Present | 88.0% | 0.0% | +88.0 pts |
| Composite | 75.6% | 4.8% | +70.8 pts |

By Query Complexity

| Difficulty | JSON Valid | Real Plugin IDs | Exact Match |
|---|---|---|---|
| Easy (1 tool) | 92% | 92% | 81% |
| Medium (2-3 tools) | 96% | 96% | 4% |

Training Pipeline

This model was trained using a novel multi-stage approach developed by AIREV:

Stage 1: Supervised Fine-Tuning (SFT)

  • Base model: Qwen 3.5-0.8B
  • 47K+ curated samples with reasoning traces and structured JSON outputs
  • Real production plugin schemas from the OnDemand platform (2,176 plugins)
  • LLM-evaluated data quality filtering (score >= 8/10)
  • Full fine-tune (not LoRA) — research shows full FT outperforms at sub-1B scale
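The quality-filtering step above amounts to a threshold over per-sample judge scores. A minimal sketch, assuming a hypothetical `llm_score` field on each sample (the actual sample structure is not documented here):

```python
# Keep only samples an LLM judge scored >= 8/10.
# Sketch only: the `llm_score` field and sample layout are assumptions.
MIN_SCORE = 8

samples = [
    {"prompt": "Book a flight to Dubai", "plan": "{...}", "llm_score": 9},
    {"prompt": "What's 2+2?", "plan": "{...}", "llm_score": 5},
    {"prompt": "Search the latest AI news", "plan": "{...}", "llm_score": 8},
]

curated = [s for s in samples if s["llm_score"] >= MIN_SCORE]
print(len(curated))  # 2 of the 3 samples survive the filter
```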

Stage 2: Progressive Curriculum GRPO

  • Group Relative Policy Optimization with a novel 4-phase progressive reward curriculum
  • The reward criteria become progressively stricter across training:
    • Phase 1: JSON structural validity
    • Phase 2: Required field presence
    • Phase 3: Correct tool selection (prompt-grounded)
    • Phase 4: Parameter quality and completeness
  • Innovation: Noise injection for zero-variance groups prevents common GRPO failure modes
  • Prompt-aware reward function verifies selected tools against available options in the input
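The four phases above can be sketched as one phase-gated reward function: each later phase adds a stricter criterion on top of the earlier ones, and tool selection is checked against the tools actually offered in the prompt. This is an illustrative reconstruction under assumed field names and equal weights, not the exact reward used in training:

```python
import json

def curriculum_reward(output: str, available_tools: set, phase: int) -> float:
    """Sketch of a phase-gated GRPO reward: later phases add stricter checks.
    Field names ("plan", "plugin_id", "params") and the 0.25 weights are assumptions."""
    try:
        plan = json.loads(output)  # Phase 1: JSON structural validity
    except json.JSONDecodeError:
        return 0.0
    reward = 0.25

    steps = plan.get("plan")
    if phase >= 2:  # Phase 2: required fields present on every step
        if not isinstance(steps, list) or not all(
            {"plugin_id", "params"} <= set(s) for s in (steps or [])
        ):
            return reward
        reward += 0.25

    if phase >= 3:  # Phase 3: tool selection grounded in the prompt's options
        if not all(s["plugin_id"] in available_tools for s in steps):
            return reward
        reward += 0.25

    if phase >= 4:  # Phase 4: parameter quality (non-empty param dicts)
        if all(isinstance(s["params"], dict) and s["params"] for s in steps):
            reward += 0.25
    return reward
```

A plan that passes every phase earns the full reward; a plan naming a tool absent from the prompt is capped at the Phase 2 level, which is what makes the reward prompt-aware.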

Why This Matters

  1. Sub-1B tool calling is novel: to our knowledge, no published model under 3B parameters demonstrates functional tool calling
  2. Progressive Curriculum GRPO — a new approach enabling skill stacking in small models
  3. Edge deployment viable — 400MB quantized, 30 tok/s on Snapdragon 8 Elite, <3s response time

Model Specs

| Parameter | Value |
|---|---|
| Base Model | Qwen 3.5-0.8B (752M params) |
| Training Data | 47,400 samples (real production plugins) |
| SFT Epochs | 3 |
| GRPO Steps | 1,250 |
| Precision | bf16 |
| Hardware | NVIDIA H100 80GB |
| Attention | SDPA |

Intended Use

  • Primary: On-device agentic AI for tool orchestration on edge devices (AR glasses, mobile, IoT)
  • Platform: Built for the OnDemand Agentic AI Platform (3,000+ tools)
  • Best for: 1-2 step tool calling queries
  • Not recommended for: Complex multi-step workflows (5+ tools)

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "airev-ae/Qwen-0.8B-AgentJSON", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("airev-ae/Qwen-0.8B-AgentJSON")

messages = [
    {"role": "system", "content": "You are an AI agent orchestrator. Generate a JSON execution plan."},
    {"role": "user", "content": "Search for the latest AI news"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=512, temperature=0.8, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Limitations

  • Best accuracy on 1-2 tool queries; performance degrades on 3+ tool orchestration
  • Trained specifically on the OnDemand plugin schema format; other tool schemas may require adaptation
  • Recommended: validate generated JSON at inference time in production
  • Temperature in the 0.7-0.8 range produces the best results
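The validation recommendation above can be implemented as a parse-and-retry wrapper around any generation call. A minimal sketch, where `generate_fn` is a placeholder for a sampling call into the model from the Usage section:

```python
import json

def generate_valid_plan(generate_fn, max_retries: int = 3):
    """Retry generation until the output parses as JSON (sketch).
    `generate_fn` is a placeholder for a sampling call to the model."""
    last_output = None
    for _ in range(max_retries):
        last_output = generate_fn()
        try:
            return json.loads(last_output)
        except json.JSONDecodeError:
            continue  # re-sample; temperature > 0 gives a different completion
    raise ValueError(f"No valid JSON after {max_retries} attempts: {last_output!r}")

# Usage with a stub that fails once, then succeeds:
attempts = iter(["not json", '{"plan": []}'])
plan = generate_valid_plan(lambda: next(attempts))
print(plan)  # {'plan': []}
```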

Citation

@misc{airev2026agentjson,
  title={AIREV Qwen-0.8B-AgentJSON: Sub-1B Tool Calling via Progressive Curriculum GRPO},
  author={AIREV FZ-LLC},
  year={2026},
  url={https://huggingface.co/airev-ae/Qwen-0.8B-AgentJSON}
}

Built by AIREV | OnDemand Platform | Abu Dhabi, UAE

Trained with Progressive Curriculum GRPO — a novel approach for sub-1B structured output generation.
