---
library_name: transformers
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
pipeline_tag: text-generation
tags:
- liquid
- lfm2
- edge
base_model: LiquidAI/LFM2-1.2B
---

# LFM2-1.2B-Tool GGUF Models

## Model Generation Details

This model was generated using [llama.cpp](https://github.com/ggerganov/llama.cpp) at commit [`c8dedc99`](https://github.com/ggerganov/llama.cpp/commit/c8dedc9999eccf7821a9fe5b29f10e8d075e2217).
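Once you have downloaded one of the GGUF files from this repository, a minimal way to sanity-check it is through the `llama-cpp-python` bindings. This is only a sketch: the repo id and quantization filename below are assumptions, so substitute whichever `.gguf` file you actually pulled, and it relies on the chat template stored in the GGUF metadata.

```python
from llama_cpp import Llama

# Load a GGUF file from the Hugging Face Hub (repo id and quant filename are
# illustrative assumptions; pick any of the provided .gguf files).
llm = Llama.from_pretrained(
    repo_id="Mungert/LFM2-1.2B-Tool-GGUF",  # assumed repo id
    filename="*Q4_K_M.gguf",                # assumed quantization variant
    n_ctx=4096,
)

# The chat template embedded in the GGUF metadata handles the LFM2 special tokens.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello! What can you do?"}],
    temperature=0.0,   # greedy decoding, as recommended below
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

See the "How to run" section below for the official model links.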
---

# LFM2-1.2B-Tool

Based on [LFM2-1.2B](https://huggingface.co/LiquidAI/LFM2-1.2B), LFM2-1.2B-Tool is designed for **concise and precise tool calling**. The key challenge was designing a non-thinking model that outperforms similarly sized thinking models for tool use.

**Use cases**:
- Mobile and edge devices requiring instant API calls, database queries, or system integrations without cloud dependency.
- Real-time assistants in cars, IoT devices, or customer support, where response latency is critical.
- Resource-constrained environments like embedded systems or battery-powered devices needing efficient tool execution.

You can find more information about other task-specific models in this [blog post](https://www.liquid.ai/blog/introducing-liquid-nanos-frontier-grade-performance-on-everyday-devices).

## 📄 Model details

**Generation parameters**: We recommend greedy decoding (`temperature=0`).

**System prompt**: The system prompt must provide all the available tools.

**Supported languages**: English, Arabic, Chinese, French, German, Japanese, Korean, Portuguese, and Spanish.

![68d41b9699b7e1fafd645300_Model Library-Prompt + Answer](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/dc2-lsJ5PwoA8039z-wW_.png)

**Tool use**: It consists of four main steps:
1. **Function definition**: LFM2 takes JSON function definitions as input (JSON objects between `<|tool_list_start|>` and `<|tool_list_end|>` special tokens), usually in the system prompt.
2. **Function call**: LFM2 writes Pythonic function calls (a Python list between `<|tool_call_start|>` and `<|tool_call_end|>` special tokens) as the assistant answer.
3. **Function execution**: The function call is executed and the result is returned (a string between `<|tool_response_start|>` and `<|tool_response_end|>` special tokens) as a "tool" role message.
4. **Final answer**: LFM2 interprets the outcome of the function call to address the original user prompt in plain text.

Here is a simple example of a conversation using tool use:

```
<|startoftext|><|im_start|>system
List of tools: <|tool_list_start|>[{"name": "get_candidate_status", "description": "Retrieves the current status of a candidate in the recruitment process", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}}, "required": ["candidate_id"]}}]<|tool_list_end|><|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
<|im_start|>tool
<|tool_response_start|>{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}<|tool_response_end|><|im_end|>
<|im_start|>assistant
The candidate with ID 12345 is currently in the "Interview Scheduled" stage for the position of Clinical Research Associate, with an interview date set for 2023-11-20.<|im_end|>
```

> [!WARNING]
> ⚠️ The model supports both single-turn and multi-turn conversations.
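The same exchange can be reproduced programmatically. The sketch below uses 🤗 Transformers and assumes the original (non-GGUF) checkpoint is published as `LiquidAI/LFM2-1.2B-Tool` and that its chat template accepts a `tools` argument; the tool definition simply mirrors the JSON schema from the conversation above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B-Tool"  # assumed Hugging Face repo id for the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# JSON tool definition, rendered by the chat template between the
# <|tool_list_start|> and <|tool_list_end|> tokens in the system prompt.
tools = [{
    "name": "get_candidate_status",
    "description": "Retrieves the current status of a candidate in the recruitment process",
    "parameters": {
        "type": "object",
        "properties": {
            "candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}
        },
        "required": ["candidate_id"],
    },
}]

messages = [{"role": "user", "content": "What is the current status of candidate ID 12345?"}]

# Build the prompt and generate the Pythonic tool call with greedy decoding.
input_ids = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, do_sample=False, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=False))
# Expected shape of the answer:
# <|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>...
```

After the call is executed, its result would be appended as a "tool" message (step 3) and the model queried again for the plain-text final answer (step 4).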
## 📈 Performance

For edge inference, latency is a crucial factor in delivering a seamless and satisfactory user experience. Consequently, while test-time compute generally improves accuracy, it ultimately compromises the user experience due to increased waiting times for function calls. Therefore, the goal was to develop a tool calling model that is competitive with thinking models, yet operates without any internal chain-of-thought process.

![image](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/gdPh7juYTsxc7wpD5iYS_.png)

We evaluated each model on a proprietary benchmark that was specifically designed to prevent data contamination. The benchmark ensures that performance metrics reflect genuine tool-calling capabilities rather than memorized patterns from training data.

## 🏃 How to run

- Hugging Face: [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)
- llama.cpp: [LFM2-350M-Extract-GGUF](https://huggingface.co/LiquidAI/LFM2-350M-Extract-GGUF)
- LEAP: [LEAP model library](https://leap.liquid.ai/models?model=lfm2-350M-extract)

## 📬 Contact

If you are interested in custom solutions with edge deployment, please contact [our sales team](https://www.liquid.ai/contact).

---

# 🚀 If you find these models useful

Help me test my **AI-Powered Quantum Network Monitor Assistant** with **quantum-ready security checks**:

👉 [Quantum Network Monitor](https://readyforquantum.com/?assistant=open&utm_source=huggingface&utm_medium=referral&utm_campaign=huggingface_repo_readme)

The full open-source code for the Quantum Network Monitor service is available at my GitHub repos (repos with NetworkMonitor in the name): [Source Code Quantum Network Monitor](https://github.com/Mungert69). You will also find the code I use to quantize the models in [GGUFModelBuilder](https://github.com/Mungert69/GGUFModelBuilder), if you want to do it yourself.

💬 **How to test**: Choose an **AI assistant type**:
- `TurboLLM` (GPT-4.1-mini)
- `HugLLM` (Hugging Face open-source models)
- `TestLLM` (Experimental CPU-only)

### **What I'm Testing**

I'm pushing the limits of **small open-source models for AI network monitoring**, specifically:
- **Function calling** against live network services
- **How small can a model go** while still handling:
  - Automated **Nmap security scans**
  - **Quantum-readiness checks**
  - **Network Monitoring tasks**

🟡 **TestLLM** – Current experimental model (llama.cpp on 2 CPU threads on a Hugging Face docker space):
- ✅ **Zero-configuration setup**
- ⏳ ~30s load time (slow inference but **no API costs**). No token limit, as the cost is low.
- 🔧 **Help wanted!** If you're into **edge-device AI**, let's collaborate!

### **Other Assistants**

🟢 **TurboLLM** – Uses **gpt-4.1-mini**:
- It performs very well, but unfortunately OpenAI charges per token, so token usage is limited.
- **Create custom cmd processors to run .net code on Quantum Network Monitor Agents**
- **Real-time network diagnostics and monitoring**
- **Security Audits**
- **Penetration testing** (Nmap/Metasploit)

🔵 **HugLLM** – Latest open-source models:
- 🌐 Runs on the Hugging Face Inference API. Performs pretty well using the latest models hosted on Novita.

### 💡 **Example commands you could test**:
1. `"Give me info on my website's SSL certificate"`
2. `"Check if my server is using quantum safe encryption for communication"`
3. `"Run a comprehensive security audit on my server"`
4. `"Create a cmd processor to .. (whatever you want)"`

Note that you need to install a [Quantum Network Monitor Agent](https://readyforquantum.com/Download/?utm_source=huggingface&utm_medium=referral&utm_campaign=huggingface_repo_readme) to run the .net code. This is a very flexible and powerful feature. Use with caution!
### Final Word

I fund the servers used to create these model files, run the Quantum Network Monitor service, and pay for inference from Novita and OpenAI, all out of my own pocket. All the code behind the model creation and the Quantum Network Monitor project is [open source](https://github.com/Mungert69). Feel free to use whatever you find helpful.

If you appreciate the work, please consider [buying me a coffee](https://www.buymeacoffee.com/mahadeva) ☕. Your support helps cover service costs and allows me to raise token limits for everyone.

I'm also open to job opportunities or sponsorship.

Thank you! 😊