---
library_name: transformers
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
pipeline_tag: text-generation
tags:
- liquid
- lfm2
- edge
base_model: LiquidAI/LFM2-1.2B
---

# LFM2-1.2B-Tool GGUF Models

## Model Generation Details

This model was generated using [llama.cpp](https://github.com/ggerganov/llama.cpp) at commit [`c8dedc99`](https://github.com/ggerganov/llama.cpp/commit/c8dedc9999eccf7821a9fe5b29f10e8d075e2217).
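Once you have downloaded one of the GGUF files from this repository, a minimal way to sanity-check it is through the `llama-cpp-python` bindings. This is only a sketch: the repo id and quantization filename below are assumptions, so substitute whichever `.gguf` file you actually pulled, and it relies on the chat template stored in the GGUF metadata.

```python
from llama_cpp import Llama

# Load a GGUF file from the Hugging Face Hub (repo id and quant filename are
# illustrative assumptions; pick any of the provided .gguf files).
llm = Llama.from_pretrained(
    repo_id="Mungert/LFM2-1.2B-Tool-GGUF",  # assumed repo id
    filename="*Q4_K_M.gguf",                # assumed quantization variant
    n_ctx=4096,
)

# The chat template embedded in the GGUF metadata handles the LFM2 special tokens.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello! What can you do?"}],
    temperature=0.0,   # greedy decoding, as recommended below
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

See the "How to run" section below for the official model links.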
---

# LFM2-1.2B-Tool

Based on [LFM2-1.2B](https://huggingface.co/LiquidAI/LFM2-1.2B), LFM2-1.2B-Tool is designed for **concise and precise tool calling**. The key challenge was designing a non-thinking model that outperforms similarly sized thinking models for tool use.

**Use cases**:
- Mobile and edge devices requiring instant API calls, database queries, or system integrations without cloud dependency.
- Real-time assistants in cars, IoT devices, or customer support, where response latency is critical.
- Resource-constrained environments like embedded systems or battery-powered devices needing efficient tool execution.

You can find more information about other task-specific models in this [blog post](https://www.liquid.ai/blog/introducing-liquid-nanos-frontier-grade-performance-on-everyday-devices).

## 📄 Model details

**Generation parameters**: We recommend greedy decoding (`temperature=0`).

**System prompt**: The system prompt must provide all the available tools.

**Supported languages**: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.

![68d41b9699b7e1fafd645300_Model Library-Prompt + Answer](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/dc2-lsJ5PwoA8039z-wW_.png)

**Tool use**: It consists of four main steps:
1. **Function definition**: LFM2 takes JSON function definitions as input (JSON objects between `<|tool_list_start|>` and `<|tool_list_end|>` special tokens), usually in the system prompt.
2. **Function call**: LFM2 writes Pythonic function calls (a Python list between `<|tool_call_start|>` and `<|tool_call_end|>` special tokens) as the assistant answer.
3. **Function execution**: The function call is executed and the result is returned (a string between `<|tool_response_start|>` and `<|tool_response_end|>` special tokens) as a "tool" role message.
4. **Final answer**: LFM2 interprets the outcome of the function call to address the original user prompt in plain text.

Here is a simple example of a conversation using tool use:

```
<|startoftext|><|im_start|>system
List of tools: <|tool_list_start|>[{"name": "get_candidate_status", "description": "Retrieves the current status of a candidate in the recruitment process", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}}, "required": ["candidate_id"]}}]<|tool_list_end|><|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
<|im_start|>tool
<|tool_response_start|>{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}<|tool_response_end|><|im_end|>
<|im_start|>assistant
The candidate with ID 12345 is currently in the "Interview Scheduled" stage for the position of Clinical Research Associate, with an interview date set for 2023-11-20.<|im_end|>
```

> [!WARNING]
> ⚠️ The model supports both single-turn and multi-turn conversations.
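The same exchange can be reproduced programmatically. The sketch below uses 🤗 Transformers and assumes the original (non-GGUF) checkpoint is published as `LiquidAI/LFM2-1.2B-Tool` and that its chat template accepts a `tools` argument; the tool definition simply mirrors the JSON schema from the conversation above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B-Tool"  # assumed Hugging Face repo id for the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# JSON tool definition, rendered by the chat template between the
# <|tool_list_start|> and <|tool_list_end|> tokens in the system prompt.
tools = [{
    "name": "get_candidate_status",
    "description": "Retrieves the current status of a candidate in the recruitment process",
    "parameters": {
        "type": "object",
        "properties": {
            "candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}
        },
        "required": ["candidate_id"],
    },
}]

messages = [{"role": "user", "content": "What is the current status of candidate ID 12345?"}]

# Build the prompt and generate the Pythonic tool call with greedy decoding.
input_ids = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, do_sample=False, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=False))
# Expected shape of the answer:
# <|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>...
```

After the call is executed, its result would be appended as a "tool" message (step 3) and the model queried again for the plain-text final answer (step 4).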
## 📈 Performance

For edge inference, latency is a crucial factor in delivering a seamless and satisfactory user experience. Consequently, while test-time compute generally improves accuracy, it ultimately compromises the user experience due to increased waiting times for function calls. Therefore, the goal was to develop a tool calling model that is competitive with thinking models, yet operates without any internal chain-of-thought process.

![image](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/gdPh7juYTsxc7wpD5iYS_.png)

We evaluated each model on a proprietary benchmark that was specifically designed to prevent data contamination. The benchmark ensures that performance metrics reflect genuine tool-calling capabilities rather than memorized patterns from training data.

## 🏃 How to run

- Hugging Face: [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)
- llama.cpp: [LFM2-350M-Extract-GGUF](https://huggingface.co/LiquidAI/LFM2-350M-Extract-GGUF)
- LEAP: [LEAP model library](https://leap.liquid.ai/models?model=lfm2-350M-extract)

## 📬 Contact

If you are interested in custom solutions with edge deployment, please contact [our sales team](https://www.liquid.ai/contact).

---

# 🚀 If you find these models useful

Help me test my **AI-Powered Quantum Network Monitor Assistant** with **quantum-ready security checks**:

👉 [Quantum Network Monitor](https://readyforquantum.com/?assistant=open&utm_source=huggingface&utm_medium=referral&utm_campaign=huggingface_repo_readme)

The full open-source code for the Quantum Network Monitor service is available at my GitHub repos (repos with NetworkMonitor in the name): [Source Code Quantum Network Monitor](https://github.com/Mungert69). You will also find the code I use to quantize the models in [GGUFModelBuilder](https://github.com/Mungert69/GGUFModelBuilder), if you want to do it yourself.

💬 **How to test**: Choose an **AI assistant type**:
- `TurboLLM` (GPT-4.1-mini)
- `HugLLM` (Hugging Face open-source models)
- `TestLLM` (Experimental CPU-only)

### **What I'm Testing**

I'm pushing the limits of **small open-source models for AI network monitoring**, specifically:
- **Function calling** against live network services
- **How small can a model go** while still handling:
  - Automated **Nmap security scans**
  - **Quantum-readiness checks**
  - **Network Monitoring tasks**

🟡 **TestLLM** – Current experimental model (llama.cpp on 2 CPU threads on a Hugging Face docker space):
- ✅ **Zero-configuration setup**
- ⏳ ~30s load time (slow inference but **no API costs**). No token limit, as the cost is low.
- 🔧 **Help wanted!** If you're into **edge-device AI**, let's collaborate!

### **Other Assistants**

🟢 **TurboLLM** – Uses **gpt-4.1-mini**:
- It performs very well, but unfortunately OpenAI charges per token, so token usage is limited.
- **Create custom cmd processors to run .net code on Quantum Network Monitor Agents**
- **Real-time network diagnostics and monitoring**
- **Security Audits**
- **Penetration testing** (Nmap/Metasploit)

🔵 **HugLLM** – Latest open-source models:
- 🌐 Runs on the Hugging Face Inference API. Performs pretty well using the latest models hosted on Novita.

### 💡 **Example commands you could test**:
1. `"Give me info on my website's SSL certificate"`
2. `"Check if my server is using quantum safe encryption for communication"`
3. `"Run a comprehensive security audit on my server"`
4. `"Create a cmd processor to .. (whatever you want)"`

Note that you need to install a [Quantum Network Monitor Agent](https://readyforquantum.com/Download/?utm_source=huggingface&utm_medium=referral&utm_campaign=huggingface_repo_readme) to run the .net code. This is a very flexible and powerful feature. Use with caution!
### Final Word

I fund the servers used to create these model files, run the Quantum Network Monitor service, and pay for inference from Novita and OpenAI, all out of my own pocket. All the code behind the model creation and the Quantum Network Monitor project is [open source](https://github.com/Mungert69). Feel free to use whatever you find helpful.

If you appreciate the work, please consider [buying me a coffee](https://www.buymeacoffee.com/mahadeva) ☕. Your support helps cover service costs and allows me to raise token limits for everyone.

I'm also open to job opportunities or sponsorship.

Thank you! 😊