	---
	pipeline_tag: text-generation
	tags:
	- NPU
	---
	# Granite-4.0-Micro
	Run Granite-4.0-Micro, optimized for Qualcomm NPUs, with [NexaSDK](https://sdk.nexa.ai).

	## Quickstart

	1. Install NexaSDK and create a free account at [sdk.nexa.ai](https://sdk.nexa.ai)
	2. Activate your device with your access token:

	```bash
	nexa config set license '<access_token>'
	```
	3. Run the model on a Qualcomm NPU in one line:

	```bash
	nexa infer NexaAI/Granite-4-Micro-NPU
	```
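
The quickstart steps can be wrapped in a small script. This is a minimal sketch, not part of NexaSDK: it assumes the access token is stored in an environment variable named `NEXA_ACCESS_TOKEN` (a hypothetical name chosen for this example), and the helper functions are illustrative. It uses only the two `nexa` commands from the steps above and stops early if the CLI or the token is missing.

```shell
#!/usr/bin/env bash
# Sketch: guard the quickstart commands behind basic precondition checks.
# NEXA_ACCESS_TOKEN is a hypothetical variable name used only in this example.

# Returns 0 when the given token string is non-empty, 1 otherwise.
have_token() {
  [ -n "${1:-}" ]
}

# Returns 0 when the nexa CLI is available on PATH, 1 otherwise.
have_nexa() {
  command -v nexa >/dev/null 2>&1
}

# Runs the activation and inference commands from the quickstart.
run_quickstart() {
  if ! have_nexa; then
    echo "nexa CLI not found; install NexaSDK from https://sdk.nexa.ai" >&2
    return 1
  fi
  if ! have_token "${NEXA_ACCESS_TOKEN:-}"; then
    echo "NEXA_ACCESS_TOKEN is empty; set it to your access token" >&2
    return 1
  fi
  # Same commands as in steps 2 and 3 of the quickstart.
  nexa config set license "$NEXA_ACCESS_TOKEN" &&
    nexa infer NexaAI/Granite-4-Micro-NPU
}
```

After sourcing the script, call `run_quickstart`. Keeping the token in an environment variable avoids pasting it into shell history.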

	## Model Description
	Granite-4.0-Micro is a 3B-parameter instruction-tuned model in the Granite 4.0 family, developed by IBM.
	It is built for long-context reasoning (up to 128K tokens), efficient inference, and enterprise capabilities such as tool calling and retrieval-augmented generation. Its compact size makes it suitable for both experimentation and production workloads across general NLP tasks.

	## Features
	- Compact transformer architecture: 3B parameters with GQA, RoPE, SwiGLU, and RMSNorm layers.
	- Instruction-following & tool calling: Tuned with supervised finetuning, alignment (RLHF), and model merging for robust enterprise tasks.
	- Multilingual support: Covers 12+ languages including English, German, Spanish, French, Japanese, Korean, Arabic, and Chinese.
	- Extended context window: Supports sequences up to 128K tokens for long-form reasoning.
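
For a first check of whether a document fits in the 128K-token window, a byte-count heuristic can help. The sketch below assumes about 4 bytes per token, a common rule of thumb for English text and not a property of the Granite tokenizer, and it assumes 128K means 131,072 tokens. Both constants and both function names are illustrative; use the model's tokenizer when an exact count matters.

```shell
#!/usr/bin/env bash
# Sketch: rough check of whether a text file fits in a 128K-token context.
# Assumptions (not from the model card): ~4 bytes per token, 128K = 131072.

BYTES_PER_TOKEN=4
CONTEXT_TOKENS=131072

# Prints an estimated token count for a byte count, rounding up.
estimate_tokens() {
  echo $(( ($1 + BYTES_PER_TOKEN - 1) / BYTES_PER_TOKEN ))
}

# Returns 0 when the file's estimated token count fits in the window.
fits_in_context() {
  local bytes
  bytes=$(wc -c < "$1")
  [ "$(estimate_tokens "$bytes")" -le "$CONTEXT_TOKENS" ]
}
```

For example, `fits_in_context report.txt && echo "likely fits"`, where `report.txt` is a placeholder file name. The estimate ignores the tokens used by the prompt template and the generated output, so leave headroom.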

	## Use Cases
	- Conversational AI and virtual assistants.
	- Enterprise applications needing tool/API calling and structured outputs.
	- Long-document summarization, classification, and extraction.
	- Retrieval-augmented generation (RAG) for knowledge-intensive workflows.
	- Lightweight coding assistants and multilingual dialog systems.

	## Inputs and Outputs
	- Input: Natural language text prompts, chat conversations, or tool-augmented requests.
	- Output: Natural language responses, including answers, explanations, summaries, structured JSON for function calls, or code snippets.

	## License
	This model is released under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license.
	Non-commercial use, modification, and redistribution are permitted with attribution.
	For commercial licensing, please contact [email protected].