---
base_model: unsloth/gemma-3n-E4B-it
tags:
- text-generation-inference
- transformers
- unsloth
- gemma3n
- ocr
- document-understanding
- multilingual
- vision-language
license: apache-2.0
language:
- en
- kn
- hi
- mr
- sa
datasets:
- Nayana-cognitivelab/SectionOCR-SFT
library_name: transformers
pipeline_tag: image-text-to-text
---

# Nayana SectionOCR - Advanced Multilingual OCR Model

**Developed by:** [CognitiveLab](https://nayana.cognitivelab.in/)
**License:** Apache 2.0
**Base Model:** unsloth/gemma-3n-E4B-it
**Architecture:** Gemma 3n (4B parameters)

## Model Overview

Nayana SectionOCR is a multilingual vision-language model fine-tuned for Optical Character Recognition (OCR) and Document Visual Question Answering (Document VQA). Built on the Gemma 3n architecture, it extracts and understands text in complex visual documents across multiple languages.

## Supported Languages

- **English** (en) - Primary language
- **Kannada** (kn) - Indian regional language
- **Hindi** (hi) - Indian national language
- **Marathi** (mr) - Indian regional language
- **Sanskrit** (sa) - Classical language

Support for 17 additional languages is coming soon. The sketch below shows how the ISO codes above map to the language names used in the prompts under Basic Usage.
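
A minimal sketch; the `LANGUAGE_NAMES` dict and `ocr_prompt` helper are illustrative names, not part of the model's API:

```python
# Map the ISO codes from the model card metadata to the language
# names expected in the OCR prompt (see Basic Usage below).
LANGUAGE_NAMES = {
    "en": "English",
    "kn": "Kannada",
    "hi": "Hindi",
    "mr": "Marathi",
    "sa": "Sanskrit",
}

def ocr_prompt(lang_code: str) -> str:
    """Build the text-extraction prompt for a supported language code."""
    return f"Extract the text from this image in {LANGUAGE_NAMES[lang_code]}"
```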

## Key Features

- **Multilingual OCR**: Accurate text extraction in 5 languages
- **Document Understanding**: Advanced layout and structure comprehension
- **Fast Inference**: Optimized for real-time applications
- **High Accuracy**: Fine-tuned on diverse document datasets
- **Easy Integration**: Compatible with Transformers and Modal deployment

## Model Specifications

| Parameter | Value |
|-----------|-------|
| Model Size | 4B parameters |
| Context Length | 32K tokens |
| Image Resolution | Flexible (optimized for documents) |
| Precision | BFloat16 |
| Framework | Transformers + Unsloth |
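
As a rough sizing estimate (an assumption, not from the model card): 4B parameters at BFloat16 (2 bytes each) come to about 4e9 × 2 bytes ≈ 8 GB for the weights alone, so budget additional GPU memory beyond that for activations and the KV cache.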

## Quick Start

### Installation

```bash
pip install transformers torch pillow unsloth
```

### Basic Usage

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

# Load model and processor
model_id = "Nayana-cognitivelab/NayanaSectionOCR"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# System prompt
system_prompt = (
    "You are Nayana, an advanced AI assistant developed by CognitiveLab. "
    "You specialize in vision-based tasks, particularly Optical Character "
    "Recognition (OCR) and Document Visual Question Answering (Document VQA). "
    "You are highly accurate, fast, and reliable when working with complex "
    "visual documents. Most importantly, you are multilingual, capable of "
    "understanding and processing documents in a wide range of languages "
    "with precision."
)

# Load and process image
image = Image.open("your_document.jpg")
language = "English"  # or "Kannada", "Hindi", "Marathi", "Sanskrit"
user_prompt = f"Extract the text from this image in {language}"

# Prepare messages
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": system_prompt}]
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": user_prompt},
            {"type": "image", "image": image}
        ]
    }
]

# Apply chat template and move the inputs to the model's device
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

# Generate response
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=1.0,
        top_p=0.95,
        top_k=64,
        do_sample=True
    )

# Decode only the newly generated tokens
response = processor.tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True
)
print(response)
```
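
For repeated calls, the steps above can be folded into a small helper. This is a sketch that reuses `processor`, `model`, and `system_prompt` from the snippet above; `extract_text` is an illustrative name, and greedy decoding is an assumption (the sampling settings above also work):

```python
def extract_text(image_path: str, language: str = "English") -> str:
    """Run the full pipeline above on one image file and return the text."""
    image = Image.open(image_path)
    messages = [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {"role": "user", "content": [
            {"type": "text", "text": f"Extract the text from this image in {language}"},
            {"type": "image", "image": image},
        ]},
    ]
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    with torch.inference_mode():
        # Greedy decoding (an assumption): deterministic output is often
        # preferable for OCR.
        outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
    return processor.tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Example usage:
# print(extract_text("your_document.jpg", language="Hindi"))
```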

This model was trained **2x faster** with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
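
Since the checkpoint was trained with Unsloth, it can presumably also be loaded through Unsloth's fast inference path. A minimal sketch, assuming this checkpoint is compatible with Unsloth's `FastVisionModel` loader:

```python
from unsloth import FastVisionModel

# Load via Unsloth's fast path (assumption: this checkpoint works with
# the vision loader); 4-bit loading shrinks the memory footprint.
model, tokenizer = FastVisionModel.from_pretrained(
    "Nayana-cognitivelab/NayanaSectionOCR",
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)  # switch to inference mode
```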

## Citation

```bibtex
@misc{nayana_sectionocr_2024,
  title={Nayana SectionOCR: Multilingual Document Understanding with Gemma 3n},
  author={CognitiveLab},
  year={2024},
  url={https://huggingface.co/Nayana-cognitivelab/SectionOCR_SFT_v3_half_en_kn_hi_sa_mr_7250}
}
```