# Granite 4.0 H-Micro - Tool Calling Fine-tuned
## Model Summary
This is a tool-calling (function-calling) fine-tune of IBM's Granite 4.0 H-Micro model, trained on the Toucan-1.5M dataset. The model has been optimized to understand tool declarations and generate structured function calls for agentic workflows.
**Key Features:**
- 🛠️ Tool Calling Support: Trained to invoke functions with proper parameter formatting
- 🎯 Multi-turn Conversations: Handles complex dialogues with tool results integration
- 🚀 Efficient Training: LoRA fine-tuning with Unsloth for 2x faster training
- 💬 Multilingual: Supports English and French tool calling scenarios
- ⚡ Lightweight: Based on Granite 4.0 H-Micro (~3B parameters)
## Model Details
### Model Description
This model extends IBM's Granite 4.0 H-Micro with advanced tool calling capabilities through supervised fine-tuning on the Toucan-1.5M dataset's SFT subset. It can:
- Parse tool/function declarations in system prompts
- Understand user requests requiring external tool usage
- Generate properly formatted function calls with correct parameters
- Integrate tool results into conversational responses
- Handle multi-step agentic workflows
- **Developed by:** Shumatsurontek
- **Model type:** Causal language model (decoder-only)
- **Language(s):** English, French
- **License:** Apache 2.0
- **Finetuned from:** unsloth/granite-4.0-h-micro
- **Base Architecture:** IBM Granite 4.0 dense hybrid (Mamba-2 SSM + attention)
### Model Sources
- Repository: Unsloth GitHub
- Base Model: IBM Granite 4.0
- Training Dataset: Agent-Ark/Toucan-1.5M
- Training Notebook: Granite 4.0 Tool Calling
## Intended Uses
### Direct Use
This model is designed for agentic AI applications that require function calling capabilities:
```python
from unsloth import FastLanguageModel
import json
import torch

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Shumatsurontek/granite-4.0-h-micro-Toucan-120k",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Enable inference mode
FastLanguageModel.for_inference(model)

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

# Create messages with the tool declaration in the system turn
messages = [
    {
        "role": "system",
        "content": f"<|im_system|>tool_declare<|im_middle|>{json.dumps(tools)}<|im_end|>",
    },
    {
        "role": "user",
        "content": "What's the weather in Paris? I prefer Celsius.",
    },
]

# Generate response
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # return a dict so **inputs works with generate()
    return_tensors="pt",
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
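The decoded response should contain a JSON function call for your application to execute. Below is a minimal extraction-and-validation sketch; the regex and the expected `name`/`arguments` shape are assumptions based on the examples in this card, not a guaranteed output contract:

```python
import json
import re

def extract_tool_call(text: str) -> dict | None:
    """Pull the first JSON object that looks like a function call out of model output."""
    match = re.search(r"\{.*\}", text, re.DOTALL)  # naive: grabs the outermost braces
    if not match:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # A well-formed call has a tool name and a dict of arguments
    if isinstance(call, dict) and "name" in call and isinstance(call.get("arguments"), dict):
        return call
    return None

call = extract_tool_call(response)
if call:
    print(f"Would invoke {call['name']} with {call['arguments']}")
```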
### Downstream Use Cases
- AI Agents: Build autonomous agents that can use tools and APIs
- Customer Support Bots: Create support assistants that can query systems and databases
- Task Automation: Develop workflow automation with LLM-driven tool orchestration
- Research Assistants: Build agents that can search, retrieve, and process information
- DevOps Bots: Create intelligent bots that can interact with infrastructure APIs
### Out-of-Scope Use
- Critical Decision Making: Not suitable for medical, legal, or financial decisions without human oversight
- Production Deployments: This is a research/experimental model trained for only 60 steps
- High-stakes Applications: Additional safety measures and full training required
- Malicious Tool Usage: Should not be used to generate harmful API calls or exploits
## Training Details
### Training Data
- **Dataset:** Agent-Ark/Toucan-1.5M
- **Subset:** SFT (supervised fine-tuning)
- **Size:** ~119,000 examples
- **Quality:** High-quality curated tool-calling conversations
The Toucan-1.5M dataset contains diverse tool calling scenarios including:
- Web search and information retrieval
- Weather and location services
- Email and communication tools
- Mathematical calculations
- Financial data queries
- Multi-step agentic workflows
**Data Format:**

```json
{
  "messages": [
    {"role": "system", "content": "tool declarations..."},
    {"role": "user", "content": "user query"},
    {"role": "assistant", "content": "response", "function_call": {...}},
    {"role": "function", "content": "tool result", "name": "tool_name"}
  ]
}
```
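A data-loading sketch using the Hugging Face `datasets` library, reusing the tokenizer loaded earlier. The split name and the `messages` column follow the format above, but verify both against the Toucan-1.5M dataset card before relying on them:

```python
from datasets import load_dataset

# Split/column names are assumptions; check the Agent-Ark/Toucan-1.5M dataset card.
dataset = load_dataset("Agent-Ark/Toucan-1.5M", split="train")

def to_text(example):
    # Render each conversation with the model's chat template into a single
    # training string, as expected by SFT trainers.
    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}

dataset = dataset.map(to_text)
```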
### Training Procedure
- **Framework:** Unsloth + TRL (Transformer Reinforcement Learning)
- **Method:** Supervised fine-tuning (SFT) with LoRA
- **Hardware:** Tesla T4 GPU (16 GB VRAM) on Google Colab

#### LoRA Configuration
```json
{
  "r": 32,
  "lora_alpha": 32,
  "lora_dropout": 0.0,
  "target_modules": [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
    "shared_mlp.input_linear", "shared_mlp.output_linear"
  ],
  "bias": "none",
  "use_rslora": false
}
```
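Applying that configuration with Unsloth might look like the following sketch; `get_peft_model` is Unsloth's LoRA entry point, and the gradient-checkpointing and seed arguments mirror the tables below:

```python
# A sketch, assuming `model` comes from FastLanguageModel.from_pretrained as above.
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "shared_mlp.input_linear", "shared_mlp.output_linear",
    ],
    bias="none",
    use_rslora=False,
    use_gradient_checkpointing="unsloth",  # the "30% less VRAM" option in the table below
    random_state=3407,
)
```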
#### Training Hyperparameters
| Parameter | Value |
|---|---|
| Batch Size (per device) | 2 |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 8 |
| Learning Rate | 2e-4 |
| Warmup Steps | 5 |
| Max Steps | 60 |
| Optimizer | AdamW 8-bit |
| Weight Decay | 0.01 |
| LR Scheduler | Linear |
| Max Sequence Length | 2048 |
| Gradient Checkpointing | Unsloth (30% less VRAM) |
| Mixed Precision | FP16 |
| Seed | 3407 |
**Training Regime:** FP16 mixed precision with the 8-bit AdamW optimizer
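A trainer-setup sketch mirroring the table, in the style of recent Unsloth notebooks. TRL argument names shift between versions, so treat this as illustrative; the maximum sequence length is set at model load time, as in the snippet above:

```python
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size of 8
        learning_rate=2e-4,
        warmup_steps=5,
        max_steps=60,                   # demo run; set num_train_epochs=1 for a full pass
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        fp16=True,
        seed=3407,
        output_dir="outputs",
    ),
)
trainer.train()
```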
#### Training Efficiency
- Training Time: ~15.9 minutes (954 seconds)
- GPU Memory Usage: 10.42 GB peak (70.7% of T4)
- Memory for Training: 4.36 GB
- Trainable Parameters: 1,703,936 (0.05% of total)
- Speed Improvement: 2x faster with Unsloth optimizations
**Special Technique:** Train-on-completions masking, which computes the loss only on assistant responses, improving fine-tune quality.
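Unsloth exposes this masking as `train_on_responses_only`. A sketch, assuming the Granite role markers shown in the Chat Template section below; verify them against `tokenizer.chat_template` before training:

```python
from unsloth.chat_templates import train_on_responses_only

# Mask everything except assistant turns so loss is only computed on completions.
# The role markers below are assumptions taken from the chat template later in
# this card, not verified against the tokenizer.
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_of_role|>user<|end_of_role|>",
    response_part="<|start_of_role|>assistant<|end_of_role|>",
)
```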
## Performance & Evaluation
### Example Outputs
**Scenario 1: Weather Service Debugging**
**User:** Ma page météo reste bloquée sur "Chargement…" depuis hier soir. *(My weather page has been stuck on "Loading…" since last night.)*

**Agent:** D'accord, je vais vérifier le statut du service météo. *(Okay, I'll check the weather service status.)*

**Function Call:**

```json
{
  "name": "get_weather_status",
  "arguments": {
    "location": "current",
    "service": "MeteoAPI"
  }
}
```
**Scenario 2: Billing Issue Resolution**

**User:** J'ai été débité deux fois pour mon abonnement Pro ce mois-ci. *(I was charged twice for my Pro subscription this month.)*

**Agent:** Je vais vérifier vos factures récentes. *(I'll check your recent invoices.)*

**Function Call:**

```json
{
  "name": "get_invoice",
  "arguments": {
    "user_id": "U3421",
    "month": "2025-10"
  }
}
```
### Known Limitations
- Training Duration: Trained for only 60 steps (experimental/demo purposes)
- Dataset Coverage: Limited to SFT subset (~119k examples)
- Tool Format: Optimized for specific JSON function call format
- Multilingual: Primarily English with some French support
- Context Length: Limited to 2048 tokens
**Recommendations for Production:**
- Train for a full epoch (`num_train_epochs=1`)
- Use larger subset or full dataset
- Implement evaluation metrics and validation set
- Add safety guardrails for tool execution
- Test thoroughly on your specific use case
## Bias, Risks, and Limitations
### Potential Biases
- Dataset Bias: Inherits biases from Toucan-1.5M and base Granite model
- Language Bias: Primarily trained on English, limited French support
- Domain Bias: Tool calling scenarios may not generalize to all domains
### Risks
- Hallucination: May generate plausible but incorrect function calls
- Security: Could generate malicious API calls if not properly constrained
- Over-reliance: Users should validate tool calls before execution
- Data Leakage: May inadvertently expose training data patterns
### Recommendations
- ✅ Validate All Tool Calls: Parse and verify function calls before execution (a minimal guard is sketched after this list)
- ✅ Implement Guardrails: Add safety checks and allowlists for tools
- ✅ Human-in-the-Loop: Use human oversight for critical operations
- ✅ Monitor Usage: Track and log all tool invocations
- ✅ Test Thoroughly: Validate on your specific use cases before deployment
- ✅ Update Documentation: Maintain clear documentation of available tools
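As a concrete instance of the first two recommendations, here is a minimal allowlist guard to run before executing anything the model emits; the tool names and checks are illustrative assumptions:

```python
ALLOWED_TOOLS = {"get_weather", "get_invoice"}  # illustrative allowlist

def guard_tool_call(call: dict) -> dict:
    """Reject calls to unknown tools or with malformed arguments before execution."""
    if call.get("name") not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool {call.get('name')!r} is not allowlisted")
    if not isinstance(call.get("arguments"), dict):
        raise ValueError("Tool arguments must be a JSON object")
    return call
```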
## Technical Specifications
### Model Architecture
- **Base:** IBM Granite 4.0 H-Micro
- **Type:** Dense hybrid (Mamba-2 SSM + multi-head attention)
- **Parameters:** ~3B (base model)
- **Trainable Parameters:** 1.7M (LoRA adapters)
- **Context Length:** 2048 tokens (fine-tuning sequence length)
- **Vocabulary Size:** 49,152
### Compute Infrastructure
#### Hardware
- GPU: NVIDIA Tesla T4 (16GB VRAM)
- Platform: Google Colab (Free Tier)
- Compute Region: US (Google Cloud)
#### Software
- Framework: PyTorch 2.8.0
- Training Library: Unsloth + TRL 0.22.2
- Transformers: 4.55.4
- CUDA: 12.6
- Mamba SSM: 2.2.5
- Causal Conv1D: 1.5.2
### Carbon Footprint
**Estimated CO2 Emissions:** < 0.01 kg CO2eq, based on ~16 minutes of T4 GPU usage
- Hardware: NVIDIA Tesla T4
- Hours Used: 0.27 hours
- Cloud Provider: Google Cloud Platform
- Compute Region: us-central1
## Chat Template
The model uses Granite 4.0's chat template:
```text
<|start_of_role|>system<|end_of_role|>
<|im_system|>tool_declare<|im_middle|>[tool definitions]<|im_end|>
<|end_of_text|>
<|start_of_role|>user<|end_of_role|>
User query here
<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
Assistant response with function calls
<|end_of_text|>
```
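A quick way to confirm that the tokenizer's template matches this layout is to render a dummy conversation:

```python
# Render a one-turn conversation without tokenizing to inspect the raw template.
rendered = tokenizer.apply_chat_template(
    [{"role": "user", "content": "ping"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(rendered)  # should show the <|start_of_role|>...<|end_of_role|> structure above
```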
## Citation
```bibtex
@misc{granite-toucan-2025,
  author       = {Shumatsurontek},
  title        = {Granite 4.0 H-Micro Fine-tuned for Tool Calling},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Shumatsurontek/granite-4.0-h-micro-Toucan-120k}}
}

@misc{toucan2025,
  title        = {Toucan-1.5M: A Dataset for Training and Evaluating Tool-Calling Agents},
  author       = {Agent-Ark},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M}}
}

@software{unsloth2024,
  title  = {Unsloth: 2x Faster LLM Training},
  author = {Unsloth Team},
  year   = {2024},
  url    = {https://github.com/unslothai/unsloth}
}

@misc{granite2025,
  title        = {Granite 4.0: IBM's Open Foundation Models},
  author       = {IBM Research},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ibm-granite/granite-4.0-h-micro}}
}
```
## Acknowledgments
- IBM Research for the Granite 4.0 base model
- Agent-Ark for the Toucan-1.5M dataset
- Unsloth Team for the efficient fine-tuning framework
- Google Colab for providing free GPU resources
## License
This model inherits the Apache 2.0 license from the base Granite 4.0 model.
## Model Card Contact
For questions, issues, or collaboration:
- HuggingFace: @Shumatsurontek
- Model Repository: granite-4.0-h-micro-Toucan-120k
Built with ❤️ using Unsloth | Powered by IBM Granite