Granite 4.0 H-Micro - Tool Calling Fine-tuned


Model Summary

This is a tool calling / function calling fine-tuned version of IBM's Granite 4.0 H-Micro model, trained on the high-quality Toucan-1.5M dataset. The model has been optimized to understand tool declarations and generate structured function calls for agentic workflows.

Key Features:

  • 🛠️ Tool Calling Support: Trained to invoke functions with proper parameter formatting
  • 🎯 Multi-turn Conversations: Handles complex dialogues with tool results integration
  • 🚀 Efficient Training: LoRA fine-tuning with Unsloth for 2x faster training
  • 💬 Multilingual: Supports English and French tool calling scenarios
  • ⚡ Lightweight: Based on Granite 4.0 H-Micro (~3B parameters)

Model Details

Model Description

This model extends IBM's Granite 4.0 H-Micro with advanced tool calling capabilities through supervised fine-tuning on the Toucan-1.5M dataset's SFT subset. It can:

  • Parse tool/function declarations in system prompts
  • Understand user requests requiring external tool usage
  • Generate properly formatted function calls with correct parameters
  • Integrate tool results into conversational responses
  • Handle multi-step agentic workflows

Developed by: Shumatsurontek
Model type: Causal Language Model (Decoder-only)
Language(s): English, French
License: Apache 2.0
Finetuned from: unsloth/granite-4.0-h-micro
Base Architecture: IBM Granite 4.0 hybrid (Mamba-2 SSM + Attention)

Model Sources

Intended Uses

Direct Use

This model is designed for agentic AI applications that require function calling capabilities:

from unsloth import FastLanguageModel
import json
import torch

# Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Shumatsurontek/granite-4.0-h-micro-Toucan-120k",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Enable inference mode
FastLanguageModel.for_inference(model)

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    }
]

# Create message with tool declaration
messages = [
    {
        "role": "system",
        "content": f"<|im_system|>tool_declare<|im_middle|>{json.dumps(tools)}<|im_end|>"
    },
    {
        "role": "user",
        "content": "What's the weather in Paris? I prefer Celsius."
    }
]

# Generate response
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
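
The example above stops at the first assistant turn. To integrate a tool result (as described under Key Features), a follow-up turn can be appended and generation rerun. The sketch below is illustrative only: the "function" role and result fields mirror the Data Format section further down, depending on the tokenizer's chat template the tool-result role may need to be "tool" instead, and the weather payload is made up.

# Simulated tool result (in practice, call your own weather API here)
tool_result = {"location": "Paris", "temperature": 18, "units": "celsius"}

messages += [
    {
        # The function call the model just produced, echoed back as an assistant turn
        "role": "assistant",
        "content": '{"name": "get_weather", "arguments": {"location": "Paris", "units": "celsius"}}'
    },
    {
        # Tool-result turn, following the training data format (may be "tool" in some templates)
        "role": "function",
        "name": "get_weather",
        "content": json.dumps(tool_result)
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))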

Downstream Use Cases

  • AI Agents: Build autonomous agents that can use tools and APIs
  • Customer Support Bots: Create support assistants that can query systems and databases
  • Task Automation: Develop workflow automation with LLM-driven tool orchestration
  • Research Assistants: Build agents that can search, retrieve, and process information
  • DevOps Bots: Create intelligent bots that can interact with infrastructure APIs

Out-of-Scope Use

  • Critical Decision Making: Not suitable for medical, legal, or financial decisions without human oversight
  • Production Deployments: This is a research/experimental model trained for only 60 steps
  • High-stakes Applications: Additional safety measures and full training required
  • Malicious Tool Usage: Should not be used to generate harmful API calls or exploits

Training Details

Training Data

Dataset: Agent-Ark/Toucan-1.5M
Subset: SFT (Supervised Fine-Tuning)
Size: ~119,000 examples
Quality: High-quality curated tool calling conversations

The Toucan-1.5M dataset contains diverse tool calling scenarios including:

  • Web search and information retrieval
  • Weather and location services
  • Email and communication tools
  • Mathematical calculations
  • Financial data queries
  • Multi-step agentic workflows

Data Format:

{
  "messages": [
    {"role": "system", "content": "tool declarations..."},
    {"role": "user", "content": "user query"},
    {"role": "assistant", "content": "response", "function_call": {...}},
    {"role": "function", "content": "tool result", "name": "tool_name"}
  ]
}
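
As a rough guide to reproducing the data preparation, the sketch below loads the dataset and renders each conversation with the model's chat template. The "SFT" config name and the "messages" column follow the description above but are otherwise assumptions; real preprocessing may also need to map the function_call / function fields into whatever structure the template expects.

from datasets import load_dataset

# "SFT" config name and "messages" column are assumptions based on the
# description above; check the dataset card for the actual schema.
dataset = load_dataset("Agent-Ark/Toucan-1.5M", "SFT", split="train")

def to_text(example):
    # Render each conversation with the model's chat template so training
    # sees the same token layout as inference.
    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}

dataset = dataset.map(to_text)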

Training Procedure

Framework: Unsloth + TRL (Transformer Reinforcement Learning)
Method: Supervised Fine-Tuning (SFT) with LoRA
Hardware: Tesla T4 GPU (16GB VRAM) on Google Colab

LoRA Configuration

{
  "r": 32,
  "lora_alpha": 32,
  "lora_dropout": 0.0,
  "target_modules": [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
    "shared_mlp.input_linear", "shared_mlp.output_linear"
  ],
  "bias": "none",
  "use_rslora": false
}
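
For reference, this configuration corresponds roughly to the following Unsloth get_peft_model call. The gradient-checkpointing and random-state values are taken from the training setup described below; treat this as an illustrative sketch rather than the exact training script.

from unsloth import FastLanguageModel

model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "shared_mlp.input_linear", "shared_mlp.output_linear",
    ],
    bias="none",
    use_rslora=False,
    use_gradient_checkpointing="unsloth",  # the "30% less VRAM" option noted below
    random_state=3407,
)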

Training Hyperparameters

Batch Size (per device): 2
Gradient Accumulation Steps: 4
Effective Batch Size: 8
Learning Rate: 2e-4
Warmup Steps: 5
Max Steps: 60
Optimizer: AdamW 8-bit
Weight Decay: 0.01
LR Scheduler: Linear
Max Sequence Length: 2048
Gradient Checkpointing: Unsloth (30% less VRAM)
Mixed Precision: FP16
Seed: 3407

Training Regime: FP16 mixed precision with 8-bit AdamW optimizer
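
Expressed as TRL training arguments, the table above corresponds approximately to the sketch below. Argument names vary slightly across TRL releases (e.g. tokenizer= vs processing_class=), and the dataset/text field assumes the data-preparation sketch earlier in this card.

from trl import SFTConfig, SFTTrainer

args = SFTConfig(
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size 8
    warmup_steps=5,
    max_steps=60,
    learning_rate=2e-4,
    fp16=True,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,
    dataset_text_field="text",       # column produced by the data sketch above
    output_dir="outputs",
)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,      # tokenizer= on older TRL releases
    train_dataset=dataset,
    args=args,
)
trainer.train()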

Training Efficiency

  • Training Time: ~15.9 minutes (954 seconds)
  • GPU Memory Usage: 10.42 GB peak (70.7% of T4)
  • Memory for Training: 4.36 GB
  • Trainable Parameters: 1,703,936 (0.05% of total)
  • Speed Improvement: 2x faster with Unsloth optimizations

Special Technique: Train-on-completions masking to only compute loss on assistant responses, improving fine-tune quality.
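
Unsloth exposes this masking as a helper; a sketch is shown below. The role-marker strings are taken from the chat template shown later in this card and should be verified against the tokenizer's actual template.

from unsloth.chat_templates import train_on_responses_only

# Compute loss only on assistant completions; verify the marker strings
# against the tokenizer's actual chat template.
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_of_role|>user<|end_of_role|>",
    response_part="<|start_of_role|>assistant<|end_of_role|>",
)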

Performance & Evaluation

Example Outputs

Scenario 1: Weather Service Debugging

User: Ma page météo reste bloquée sur "Chargement…" depuis hier soir. (My weather page has been stuck on "Loading…" since last night.)
Agent: D'accord, je vais vérifier le statut du service météo. (Alright, I'll check the weather service status.)

Function Call:
{
  "name": "get_weather_status",
  "arguments": {
    "location": "current",
    "service": "MeteoAPI"
  }
}

Scenario 2: Billing Issue Resolution

User: J'ai été débité deux fois pour mon abonnement Pro ce mois-ci. (I was charged twice for my Pro subscription this month.)
Agent: Je vais vérifier vos factures récentes. (I'll check your recent invoices.)

Function Call:
{
  "name": "get_invoice",
  "arguments": {
    "user_id": "U3421",
    "month": "2025-10"
  }
}

Known Limitations

  • Training Duration: Trained for only 60 steps (experimental/demo purposes)
  • Dataset Coverage: Limited to SFT subset (~119k examples)
  • Tool Format: Optimized for specific JSON function call format
  • Multilingual: Primarily English with some French support
  • Context Length: Limited to 2048 tokens

Recommendations for Production:

  • Train for full epoch (num_train_epochs=1)
  • Use larger subset or full dataset
  • Implement evaluation metrics and validation set
  • Add safety guardrails for tool execution
  • Test thoroughly on your specific use case

Bias, Risks, and Limitations

Potential Biases

  • Dataset Bias: Inherits biases from Toucan-1.5M and base Granite model
  • Language Bias: Primarily trained on English, limited French support
  • Domain Bias: Tool calling scenarios may not generalize to all domains

Risks

  • Hallucination: May generate plausible but incorrect function calls
  • Security: Could generate malicious API calls if not properly constrained
  • Over-reliance: Users should validate tool calls before execution
  • Data Leakage: May inadvertently expose training data patterns

Recommendations

  • Validate All Tool Calls: Parse and verify function calls before execution
  • Implement Guardrails: Add safety checks and allowlists for tools
  • Human-in-the-Loop: Use human oversight for critical operations
  • Monitor Usage: Track and log all tool invocations
  • Test Thoroughly: Validate on your specific use cases before deployment
  • Update Documentation: Maintain clear documentation of available tools
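
As a concrete illustration of the first two recommendations above, a minimal validation/allowlist wrapper might look like the sketch below (tool names and fields are hypothetical; adapt them to your own tool schema).

import json

ALLOWED_TOOLS = {"get_weather", "get_invoice"}  # hypothetical allowlist

def validate_call(raw: str) -> dict:
    # Parse a generated function call and refuse anything malformed or
    # not explicitly allowlisted, before any execution happens.
    call = json.loads(raw)
    if call.get("name") not in ALLOWED_TOOLS:
        raise ValueError(f"Tool {call.get('name')!r} is not allowlisted")
    if not isinstance(call.get("arguments"), dict):
        raise ValueError("Arguments must be a JSON object")
    return call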

Technical Specifications

Model Architecture

Base: IBM Granite 4.0 H-Micro
Type: Hybrid (Mamba-2 SSM + Multi-Head Attention)
Parameters: ~3B (base model)
Trainable Parameters: 1.7M (LoRA adapters)
Context Length: 2048 tokens (fine-tuning sequence length)
Vocabulary Size: 49,152

Compute Infrastructure

Hardware

  • GPU: NVIDIA Tesla T4 (16GB VRAM)
  • Platform: Google Colab (Free Tier)
  • Compute Region: US (Google Cloud)

Software

  • Framework: PyTorch 2.8.0
  • Training Library: Unsloth + TRL 0.22.2
  • Transformers: 4.55.4
  • CUDA: 12.6
  • Mamba SSM: 2.2.5
  • Causal Conv1D: 1.5.2

Carbon Footprint

Estimated CO2 Emissions: < 0.01 kg CO2eq
Based on ~16 minutes of T4 GPU usage

  • Hardware: NVIDIA Tesla T4
  • Hours Used: 0.27 hours
  • Cloud Provider: Google Cloud Platform
  • Compute Region: us-central1
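
The figure above is a back-of-the-envelope estimate; a sketch of the arithmetic, assuming the T4's 70 W TDP and a generic ~0.4 kg CO2eq/kWh grid intensity, is shown below.

# Assumed values: 70 W TDP for the T4, ~0.4 kg CO2eq/kWh grid intensity
gpu_power_kw = 0.070
hours = 0.27
carbon_intensity = 0.4
print(round(gpu_power_kw * hours * carbon_intensity, 4))  # ~0.0076 kg CO2eq, i.e. < 0.01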

Chat Template

The model uses Granite 4.0's chat template:

<|start_of_role|>system<|end_of_role|>
<|im_system|>tool_declare<|im_middle|>[tool definitions]<|im_end|>
<|end_of_text|>

<|start_of_role|>user<|end_of_role|>
User query here
<|end_of_text|>

<|start_of_role|>assistant<|end_of_role|>
Assistant response with function calls
<|end_of_text|>
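
To check how messages and tool declarations are actually rendered into this template, the prompt string can be inspected without tokenizing (reusing the messages list from the usage example above):

# Render the prompt string (no tokenization) to see where the tool
# declaration and role markers land.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
print(prompt)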

Citation

@misc{granite-toucan-2025,
  author = {Shumatsurontek},
  title = {Granite 4.0 H-Micro Fine-tuned for Tool Calling},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Shumatsurontek/granite-4.0-h-micro-Toucan-120k}}
}

@misc{toucan2024,
  title={Toucan-1.5M: A Dataset for Training and Evaluating Tool-Calling Agents},
  author={Agent-Ark},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M}}
}

@software{unsloth2024,
  title={Unsloth: 2x Faster LLM Training},
  author={Unsloth Team},
  year={2024},
  url={https://github.com/unslothai/unsloth}
}

@misc{granite2024,
  title={Granite 4.0: IBM's Open Foundation Models},
  author={IBM Research},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/ibm-granite/granite-4.0-h-micro}}
}

Acknowledgments

  • IBM Research for the Granite 4.0 base model
  • Agent-Ark for the Toucan-1.5M dataset
  • Unsloth Team for the efficient fine-tuning framework
  • Google Colab for providing free GPU resources

License

This model inherits the Apache 2.0 license from the base Granite 4.0 model.

Model Card Contact

For questions, issues, or collaboration:


Built with ❤️ using Unsloth | Powered by IBM Granite
