---
base_model:
- Qwen/Qwen3-VL-2B-Thinking
---

# Qwen3-VL-2B-Thinking

Currently, only [NexaSDK](https://github.com/NexaAI/nexa-sdk) supports this GGUF.

## Quickstart

- Download [NexaSDK](https://github.com/NexaAI/nexa-sdk) with one click.
- Run one line of code in your terminal:

```
nexa infer NexaAI/Qwen3-VL-2B-Thinking-GGUF
```
## Model Description

**Qwen3-VL-2B-Thinking** is a 2-billion-parameter multimodal model from the Qwen3-VL family, optimized for **explicit reasoning and step-by-step visual understanding**.
It builds upon Qwen3-VL-2B with additional "thinking" supervision, allowing the model to **explain its reasoning process** across both text and images, which makes it well suited for research, education, and agentic applications that require transparent decision traces.
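If you want to run the original (non-GGUF) checkpoint and inspect the reasoning behavior directly, a minimal sketch using the Hugging Face `transformers` API is shown below. It assumes a `transformers` release with Qwen3-VL support; the image URL and prompt are placeholders.

```python
# Minimal sketch, assuming your installed transformers version supports Qwen3-VL.
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-2B-Thinking"  # upstream (non-GGUF) checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto")

# One user turn containing an image and a question (the URL is a placeholder).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chart.png"},
            {"type": "text", "text": "What trend does this chart show? Explain step by step."},
        ],
    }
]

# Render the chat template, tokenize text and image together, and generate.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens; the reply includes the model's
# thinking trace before the final answer.
print(processor.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```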
## Features

- **Visual reasoning**: Performs detailed, interpretable reasoning across images, diagrams, and UI elements.
- **Step-by-step thought traces**: Generates intermediate reasoning steps for transparency and debugging (see the sketch after this list).
- **Multimodal understanding**: Supports text, images, and video inputs with consistent logical grounding.
- **Compact yet capable**: 2B parameters, optimized for low-latency inference and on-device deployment.
- **Instruction-tuned**: Enhanced alignment for "think-aloud" question answering and visual problem solving.
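Qwen3-style thinking models usually wrap their intermediate reasoning in `<think>...</think>` markers ahead of the final answer; whether this GGUF build surfaces the same markers depends on its chat template, so treat the delimiter below as an assumption. A small helper can then separate the trace from the answer for logging or debugging:

```python
# Sketch: split a "thinking" reply into (trace, answer).
# Assumption: reasoning is wrapped in <think>...</think>, as in other Qwen3 thinking models.
def split_thinking(reply: str) -> tuple[str, str]:
    start, end = "<think>", "</think>"
    if start in reply and end in reply:
        trace = reply.split(start, 1)[1].split(end, 1)[0].strip()
        answer = reply.split(end, 1)[1].strip()
        return trace, answer
    return "", reply.strip()  # no markers found: treat the whole reply as the answer

trace, answer = split_thinking(
    "<think>The x-axis is years and the bars grow left to right.</think>Sales trend upward."
)
print("TRACE:", trace)
print("ANSWER:", answer)
```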
## Use Cases

- Visual question answering with reasoning chains
- Step-by-step image or chart analysis for education and tutoring
- Debuggable AI agents and reasoning assistants
- Research on interpretable multimodal reasoning
- On-device transparent AI inference for visual domains
## Inputs and Outputs

**Inputs**
- Text prompts or questions
- Images, diagrams, or UI screenshots
- Optional multi-turn reasoning chains (see the sketch below)

**Outputs**
- Natural language answers with explicit thought steps
- Detailed reasoning traces combining visual and textual logic
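For multi-turn use, a reasoning chain is passed as an ordered list of chat messages. The sketch below uses the common OpenAI-style message layout as an illustration; the exact schema is an assumption and depends on the runtime serving the model.

```python
# Sketch of a multi-turn, multimodal conversation. The message keys follow the
# common chat-message convention and are an assumption, not a NexaSDK-specific schema.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/ui-screenshot.png"},  # placeholder
            {"type": "text", "text": "Which button submits the form?"},
        ],
    },
    {"role": "assistant", "content": "The blue 'Save' button in the footer submits the form."},
    # Follow-up turns can ask the model to expose its reasoning about earlier answers.
    {"role": "user", "content": "Walk me through how you ruled out the other buttons."},
]
```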
## License

This model is released under the **Apache 2.0 License**.
Refer to the official Hugging Face page for license details and usage terms.
## References

- [Qwen3-VL-2B-Thinking on Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-2B-Thinking)
- [Qwen3 Technical Report (arXiv)](https://arxiv.org/abs/2505.09388)
- [Qwen GitHub Repository](https://github.com/QwenLM)