# Granite 4.0 H-Micro - Tool Calling Fine-tuned
## Model Summary
This is a tool-calling (function-calling) fine-tune of IBM's Granite 4.0 H-Micro model, trained on the Toucan-1.5M dataset. The model has been optimized to understand tool declarations and generate structured function calls for agentic workflows.
**Key Features:**
- 🛠️ Tool Calling Support: Trained to invoke functions with proper parameter formatting
- 🎯 Multi-turn Conversations: Handles complex dialogues with tool results integration
- 🚀 Efficient Training: LoRA fine-tuning with Unsloth for 2x faster training
- 💬 Multilingual: Supports English and French tool calling scenarios
- ⚡ Lightweight: Based on Granite 4.0 H-Micro (~3B parameters)
## Model Details
### Model Description
This model extends IBM's Granite 4.0 H-Micro with advanced tool calling capabilities through supervised fine-tuning on the Toucan-1.5M dataset's SFT subset. It can:
- Parse tool/function declarations in system prompts
- Understand user requests requiring external tool usage
- Generate properly formatted function calls with correct parameters
- Integrate tool results into conversational responses
- Handle multi-step agentic workflows
- **Developed by:** Shumatsurontek
- **Model type:** Causal language model (decoder-only)
- **Language(s):** English, French
- **License:** Apache 2.0
- **Finetuned from:** unsloth/granite-4.0-h-micro
- **Base Architecture:** IBM Granite 4.0 dense hybrid (Mamba-2 SSM + attention)
### Model Sources
- Repository: Unsloth GitHub
- Base Model: IBM Granite 4.0
- Training Dataset: Agent-Ark/Toucan-1.5M
- Training Notebook: Granite 4.0 Tool Calling
## Intended Uses
### Direct Use
This model is designed for agentic AI applications that require function calling capabilities:
```python
from unsloth import FastLanguageModel
import json
import torch

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Shumatsurontek/granite-4.0-h-micro-Toucan-120k",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Enable inference mode
FastLanguageModel.for_inference(model)

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

# Create messages with the tool declaration in the system turn
messages = [
    {
        "role": "system",
        "content": f"<|im_system|>tool_declare<|im_middle|>{json.dumps(tools)}<|im_end|>",
    },
    {
        "role": "user",
        "content": "What's the weather in Paris? I prefer Celsius.",
    },
]

# Generate response
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # return a dict so **inputs works with generate()
    return_tensors="pt",
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
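The decoded response should contain a JSON function call for your application to execute. Below is a minimal extraction-and-validation sketch; the regex and the expected `name`/`arguments` shape are assumptions based on the examples in this card, not a guaranteed output contract:

```python
import json
import re

def extract_tool_call(text: str) -> dict | None:
    """Pull the first JSON object that looks like a function call out of model output."""
    match = re.search(r"\{.*\}", text, re.DOTALL)  # naive: grabs the outermost braces
    if not match:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # A well-formed call has a tool name and a dict of arguments
    if isinstance(call, dict) and "name" in call and isinstance(call.get("arguments"), dict):
        return call
    return None

call = extract_tool_call(response)
if call:
    print(f"Would invoke {call['name']} with {call['arguments']}")
```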
### Downstream Use Cases
- AI Agents: Build autonomous agents that can use tools and APIs
- Customer Support Bots: Create support assistants that can query systems and databases
- Task Automation: Develop workflow automation with LLM-driven tool orchestration
- Research Assistants: Build agents that can search, retrieve, and process information
- DevOps Bots: Create intelligent bots that can interact with infrastructure APIs
### Out-of-Scope Use
- Critical Decision Making: Not suitable for medical, legal, or financial decisions without human oversight
- Production Deployments: This is a research/experimental model trained for only 60 steps
- High-stakes Applications: Additional safety measures and full training required
- Malicious Tool Usage: Should not be used to generate harmful API calls or exploits
## Training Details
### Training Data
- **Dataset:** Agent-Ark/Toucan-1.5M
- **Subset:** SFT (supervised fine-tuning)
- **Size:** ~119,000 examples
- **Quality:** High-quality curated tool-calling conversations
The Toucan-1.5M dataset contains diverse tool calling scenarios including:
- Web search and information retrieval
- Weather and location services
- Email and communication tools
- Mathematical calculations
- Financial data queries
- Multi-step agentic workflows
**Data Format:**

```json
{
  "messages": [
    {"role": "system", "content": "tool declarations..."},
    {"role": "user", "content": "user query"},
    {"role": "assistant", "content": "response", "function_call": {...}},
    {"role": "function", "content": "tool result", "name": "tool_name"}
  ]
}
```
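A data-loading sketch using the Hugging Face `datasets` library, reusing the tokenizer loaded earlier. The split name and the `messages` column follow the format above, but verify both against the Toucan-1.5M dataset card before relying on them:

```python
from datasets import load_dataset

# Split/column names are assumptions; check the Agent-Ark/Toucan-1.5M dataset card.
dataset = load_dataset("Agent-Ark/Toucan-1.5M", split="train")

def to_text(example):
    # Render each conversation with the model's chat template into a single
    # training string, as expected by SFT trainers.
    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}

dataset = dataset.map(to_text)
```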
### Training Procedure
- **Framework:** Unsloth + TRL (Transformer Reinforcement Learning)
- **Method:** Supervised fine-tuning (SFT) with LoRA
- **Hardware:** Tesla T4 GPU (16 GB VRAM) on Google Colab

#### LoRA Configuration
```json
{
  "r": 32,
  "lora_alpha": 32,
  "lora_dropout": 0.0,
  "target_modules": [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
    "shared_mlp.input_linear", "shared_mlp.output_linear"
  ],
  "bias": "none",
  "use_rslora": false
}
```
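Applying that configuration with Unsloth might look like the following sketch; `get_peft_model` is Unsloth's LoRA entry point, and the gradient-checkpointing and seed arguments mirror the tables below:

```python
# A sketch, assuming `model` comes from FastLanguageModel.from_pretrained as above.
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "shared_mlp.input_linear", "shared_mlp.output_linear",
    ],
    bias="none",
    use_rslora=False,
    use_gradient_checkpointing="unsloth",  # the "30% less VRAM" option in the table below
    random_state=3407,
)
```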
#### Training Hyperparameters
| Parameter | Value |
|---|---|
| Batch Size (per device) | 2 |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 8 |
| Learning Rate | 2e-4 |
| Warmup Steps | 5 |
| Max Steps | 60 |
| Optimizer | AdamW 8-bit |
| Weight Decay | 0.01 |
| LR Scheduler | Linear |
| Max Sequence Length | 2048 |
| Gradient Checkpointing | Unsloth (30% less VRAM) |
| Mixed Precision | FP16 |
| Seed | 3407 |
**Training Regime:** FP16 mixed precision with the 8-bit AdamW optimizer
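A trainer-setup sketch mirroring the table, in the style of recent Unsloth notebooks. TRL argument names shift between versions, so treat this as illustrative; the maximum sequence length is set at model load time, as in the snippet above:

```python
from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size of 8
        learning_rate=2e-4,
        warmup_steps=5,
        max_steps=60,                   # demo run; set num_train_epochs=1 for a full pass
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        fp16=True,
        seed=3407,
        output_dir="outputs",
    ),
)
trainer.train()
```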
#### Training Efficiency
- Training Time: ~15.9 minutes (954 seconds)
- GPU Memory Usage: 10.42 GB peak (70.7% of T4)
- Memory for Training: 4.36 GB
- Trainable Parameters: 1,703,936 (0.05% of total)
- Speed Improvement: 2x faster with Unsloth optimizations
**Special Technique:** Train-on-completions masking, which computes the loss only on assistant responses, improving fine-tune quality.
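Unsloth exposes this masking as `train_on_responses_only`. A sketch, assuming the Granite role markers shown in the Chat Template section below; verify them against `tokenizer.chat_template` before training:

```python
from unsloth.chat_templates import train_on_responses_only

# Mask everything except assistant turns so loss is only computed on completions.
# The role markers below are assumptions taken from the chat template later in
# this card, not verified against the tokenizer.
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_of_role|>user<|end_of_role|>",
    response_part="<|start_of_role|>assistant<|end_of_role|>",
)
```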
## Performance & Evaluation
### Example Outputs
**Scenario 1: Weather Service Debugging**
**User:** Ma page météo reste bloquée sur "Chargement…" depuis hier soir. *(My weather page has been stuck on "Loading…" since last night.)*

**Agent:** D'accord, je vais vérifier le statut du service météo. *(Okay, I'll check the weather service status.)*

**Function Call:**

```json
{
  "name": "get_weather_status",
  "arguments": {
    "location": "current",
    "service": "MeteoAPI"
  }
}
```
**Scenario 2: Billing Issue Resolution**

**User:** J'ai été débité deux fois pour mon abonnement Pro ce mois-ci. *(I was charged twice for my Pro subscription this month.)*

**Agent:** Je vais vérifier vos factures récentes. *(I'll check your recent invoices.)*

**Function Call:**

```json
{
  "name": "get_invoice",
  "arguments": {
    "user_id": "U3421",
    "month": "2025-10"
  }
}
```
### Known Limitations
- Training Duration: Trained for only 60 steps (experimental/demo purposes)
- Dataset Coverage: Limited to SFT subset (~119k examples)
- Tool Format: Optimized for specific JSON function call format
- Multilingual: Primarily English with some French support
- Context Length: Limited to 2048 tokens
**Recommendations for Production:**
- Train for a full epoch (`num_train_epochs=1`)
- Use larger subset or full dataset
- Implement evaluation metrics and validation set
- Add safety guardrails for tool execution
- Test thoroughly on your specific use case
## Bias, Risks, and Limitations
### Potential Biases
- Dataset Bias: Inherits biases from Toucan-1.5M and base Granite model
- Language Bias: Primarily trained on English, limited French support
- Domain Bias: Tool calling scenarios may not generalize to all domains
### Risks
- Hallucination: May generate plausible but incorrect function calls
- Security: Could generate malicious API calls if not properly constrained
- Over-reliance: Users should validate tool calls before execution
- Data Leakage: May inadvertently expose training data patterns
### Recommendations
- ✅ Validate All Tool Calls: Parse and verify function calls before execution (a minimal guard is sketched after this list)
- ✅ Implement Guardrails: Add safety checks and allowlists for tools
- ✅ Human-in-the-Loop: Use human oversight for critical operations
- ✅ Monitor Usage: Track and log all tool invocations
- ✅ Test Thoroughly: Validate on your specific use cases before deployment
- ✅ Update Documentation: Maintain clear documentation of available tools
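As a concrete instance of the first two recommendations, here is a minimal allowlist guard to run before executing anything the model emits; the tool names and checks are illustrative assumptions:

```python
ALLOWED_TOOLS = {"get_weather", "get_invoice"}  # illustrative allowlist

def guard_tool_call(call: dict) -> dict:
    """Reject calls to unknown tools or with malformed arguments before execution."""
    if call.get("name") not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool {call.get('name')!r} is not allowlisted")
    if not isinstance(call.get("arguments"), dict):
        raise ValueError("Tool arguments must be a JSON object")
    return call
```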
## Technical Specifications
### Model Architecture
- **Base:** IBM Granite 4.0 H-Micro
- **Type:** Dense hybrid (Mamba-2 SSM + multi-head attention)
- **Parameters:** ~3B (base model)
- **Trainable Parameters:** 1.7M (LoRA adapters)
- **Context Length:** 2048 tokens (fine-tuning sequence length)
- **Vocabulary Size:** 49,152
### Compute Infrastructure
#### Hardware
- GPU: NVIDIA Tesla T4 (16GB VRAM)
- Platform: Google Colab (Free Tier)
- Compute Region: US (Google Cloud)
#### Software
- Framework: PyTorch 2.8.0
- Training Library: Unsloth + TRL 0.22.2
- Transformers: 4.55.4
- CUDA: 12.6
- Mamba SSM: 2.2.5
- Causal Conv1D: 1.5.2
### Carbon Footprint
**Estimated CO2 Emissions:** < 0.01 kg CO2eq, based on ~16 minutes of T4 GPU usage
- Hardware: NVIDIA Tesla T4
- Hours Used: 0.27 hours
- Cloud Provider: Google Cloud Platform
- Compute Region: us-central1
## Chat Template
The model uses Granite 4.0's chat template:
```text
<|start_of_role|>system<|end_of_role|>
<|im_system|>tool_declare<|im_middle|>[tool definitions]<|im_end|>
<|end_of_text|>
<|start_of_role|>user<|end_of_role|>
User query here
<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>
Assistant response with function calls
<|end_of_text|>
```
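A quick way to confirm that the tokenizer's template matches this layout is to render a dummy conversation:

```python
# Render a one-turn conversation without tokenizing to inspect the raw template.
rendered = tokenizer.apply_chat_template(
    [{"role": "user", "content": "ping"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(rendered)  # should show the <|start_of_role|>...<|end_of_role|> structure above
```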
## Citation
```bibtex
@misc{granite-toucan-2025,
  author       = {Shumatsurontek},
  title        = {Granite 4.0 H-Micro Fine-tuned for Tool Calling},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Shumatsurontek/granite-4.0-h-micro-Toucan-120k}}
}

@misc{toucan2025,
  title        = {Toucan-1.5M: A Dataset for Training and Evaluating Tool-Calling Agents},
  author       = {Agent-Ark},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M}}
}

@software{unsloth2024,
  title  = {Unsloth: 2x Faster LLM Training},
  author = {Unsloth Team},
  year   = {2024},
  url    = {https://github.com/unslothai/unsloth}
}

@misc{granite2025,
  title        = {Granite 4.0: IBM's Open Foundation Models},
  author       = {IBM Research},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/ibm-granite/granite-4.0-h-micro}}
}
```
## Acknowledgments
- IBM Research for the Granite 4.0 base model
- Agent-Ark for the Toucan-1.5M dataset
- Unsloth Team for the efficient fine-tuning framework
- Google Colab for providing free GPU resources
## License
This model inherits the Apache 2.0 license from the base Granite 4.0 model.
## Model Card Contact
For questions, issues, or collaboration:
- HuggingFace: @Shumatsurontek
- Model Repository: granite-4.0-h-micro-Toucan-120k
Built with ❤️ using Unsloth | Powered by IBM Granite