---
base_model:
- Qwen/Qwen3-VL-2B-Thinking
---
# Qwen3-VL-2B-Thinking

Currently, only [NexaSDK](https://github.com/NexaAI/nexa-sdk) supports this GGUF.

## Quickstart
- Download [NexaSDK](https://github.com/NexaAI/nexa-sdk) with one click
- Run one line of code in your terminal:
```
nexa infer NexaAI/Qwen3-VL-2B-Thinking-GGUF
```

## Model Description
**Qwen3-VL-2B-Thinking** is a 2-billion-parameter multimodal model from the Qwen3-VL family, optimized for **explicit reasoning and step-by-step visual understanding**.
It builds upon Qwen3-VL-2B with additional “thinking” supervision, allowing the model to **explain its reasoning process** across both text and images, making it ideal for research, education, and agentic applications requiring transparent decision traces.

## Features
- **Visual reasoning**: Performs detailed, interpretable reasoning across images, diagrams, and UI elements.
- **Step-by-step thought traces**: Generates intermediate reasoning steps for transparency and debugging.
- **Multimodal understanding**: Supports text, images, and video inputs with consistent logical grounding.
- **Compact yet capable**: 2B parameters, optimized for low-latency inference and on-device deployment.
- **Instruction-tuned**: Enhanced alignment for “think-aloud” question answering and visual problem solving.
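
To make the multimodal input concrete, here is a minimal sketch of how a text-plus-image request to a vision-language model is commonly packaged. The OpenAI-style chat message schema below is an assumption for illustration, not NexaSDK's documented API; the function and field names are hypothetical.

```python
# Hypothetical request builder for a vision-language "thinking" model.
# The message schema follows the common OpenAI-compatible chat convention;
# this is an assumption, not NexaSDK's documented API.

def build_vl_request(question: str, image_url: str) -> dict:
    """Pack a text question and an image reference into one user turn."""
    return {
        "model": "NexaAI/Qwen3-VL-2B-Thinking-GGUF",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_vl_request("What does this chart show?", "file:///tmp/chart.png")
print(request["messages"][0]["content"][0]["text"])  # What does this chart show?
```

Whatever runtime you use, the essential shape is the same: one user turn carrying both the question and the image reference, so the model can ground its reasoning in both.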

## Use Cases
- Visual question answering with reasoning chains
- Step-by-step image or chart analysis for education and tutoring
- Debuggable AI agents and reasoning assistants
- Research on interpretable multimodal reasoning
- On-device transparent AI inference for visual domains

## Inputs and Outputs
**Inputs**
- Text prompts or questions
- Images, diagrams, or UI screenshots
- Optional multi-turn reasoning chains

**Outputs**
- Natural language answers with explicit thought steps
- Detailed reasoning traces combining visual and textual logic
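
Models in the Qwen3 thinking family conventionally emit their reasoning trace wrapped in `<think>...</think>` tags ahead of the final answer. A minimal sketch for separating the trace from the answer, assuming that tag convention holds for this model's output:

```python
import re

def split_thinking(output: str) -> tuple[str, str]:
    """Split a model response into (reasoning_trace, final_answer).

    Assumes the Qwen3-family convention of wrapping reasoning in
    <think>...</think> tags before the visible answer.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if not match:
        # No trace found: treat the whole output as the answer.
        return "", output.strip()
    trace = match.group(1).strip()
    answer = output[match.end():].strip()
    return trace, answer

trace, answer = split_thinking(
    "<think>The chart's bars grow left to right.</think>Sales increase over time."
)
print(answer)  # Sales increase over time.
```

This split is useful for debuggable agents: log or display the trace separately while passing only the final answer downstream.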

## License
This model is released under the **Apache 2.0 License**.
Refer to the official Hugging Face page for license details and usage terms.

## References
- [Qwen3-VL-2B-Thinking on Hugging Face](https://huggingface.co/Qwen/Qwen3-VL-2B-Thinking)
- [Qwen2 Technical Report (arXiv)](https://arxiv.org/abs/2407.10671)
- [Qwen GitHub Repository](https://github.com/QwenLM)