---
base_model: unsloth/gemma-3n-E4B-it
tags:
- text-generation-inference
- transformers
- unsloth
- gemma3n
- ocr
- document-understanding
- multilingual
- vision-language
license: apache-2.0
language:
- en
- kn
- hi
- mr
- sa
datasets:
- Nayana-cognitivelab/SectionOCR-SFT
library_name: transformers
pipeline_tag: image-text-to-text
---
# Nayana SectionOCR - Advanced Multilingual OCR Model

- **Developed by:** CognitiveLab
- **License:** Apache 2.0
- **Base Model:** unsloth/gemma-3n-E4B-it
- **Architecture:** Gemma 3n (4B parameters)
## Model Overview
Nayana SectionOCR is an advanced multilingual vision-language model specifically fine-tuned for Optical Character Recognition (OCR) and Document Visual Question Answering (Document VQA) tasks. Built on the powerful Gemma 3n architecture, this model excels at understanding and extracting text from complex visual documents across multiple languages.
## Supported Languages
- English (en) - Primary language
- Kannada (kn) - Indian regional language
- Hindi (hi) - Official language of India
- Marathi (mr) - Indian regional language
- Sanskrit (sa) - Classical language
Support for 17 additional languages is coming soon.
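
Prompts refer to languages by their full English names rather than the ISO codes above (see the Quick Start below). A minimal sketch of a helper that maps codes to prompt-ready names; the `LANGUAGE_NAMES` dict and `build_ocr_prompt` function are illustrative, not part of the model's API:

```python
# Illustrative mapping from the ISO 639-1 codes listed above to the
# full language names used in the OCR prompt.
LANGUAGE_NAMES = {
    "en": "English",
    "kn": "Kannada",
    "hi": "Hindi",
    "mr": "Marathi",
    "sa": "Sanskrit",
}

def build_ocr_prompt(lang_code: str) -> str:
    """Build the extraction prompt used in the Quick Start example."""
    return f"Extract the text from this image in {LANGUAGE_NAMES[lang_code]}"

print(build_ocr_prompt("kn"))  # Extract the text from this image in Kannada
```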
## Key Features
- Multilingual OCR: Accurate text extraction in 5 languages
- Document Understanding: Advanced layout and structure comprehension
- Fast Inference: Optimized for real-time applications
- High Accuracy: Fine-tuned on diverse document datasets
- Easy Integration: Compatible with Transformers and Modal deployment
## Model Specifications
| Parameter | Value |
|---|---|
| Model Size | 4B parameters |
| Context Length | 32K tokens |
| Image Resolution | Flexible (optimized for documents) |
| Precision | BFloat16 |
| Framework | Transformers + Unsloth |
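
At BFloat16 precision, the 4B parameters alone occupy roughly 8 GB of GPU memory before activations and KV cache. If that is too tight, a quantized load via bitsandbytes may be worth trying; whether 4-bit quantization is compatible with the Gemma 3n vision stack is an untested assumption here, so fall back to the BFloat16 load shown in the Quick Start if it fails:

```python
import torch
from transformers import AutoModelForImageTextToText, BitsAndBytesConfig

# Untested assumption: bitsandbytes 4-bit loading works with Gemma 3n's
# vision tower. If loading fails, use the plain bfloat16 load instead.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForImageTextToText.from_pretrained(
    "Nayana-cognitivelab/NayanaSectionOCR",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```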
## Quick Start
### Installation

```bash
pip install transformers torch pillow unsloth
```
### Basic Usage
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

# Load model and processor
model_id = "Nayana-cognitivelab/NayanaSectionOCR"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# System prompt
system_prompt = "You are Nayana, an advanced AI assistant developed by CognitiveLab. You specialize in vision-based tasks, particularly Optical Character Recognition (OCR) and Document Visual Question Answering (Document VQA). You are highly accurate, fast, and reliable when working with complex visual documents. Most importantly, you are multilingual, capable of understanding and processing documents in a wide range of languages with precision."

# Load the document image
image = Image.open("your_document.jpg")
language = "English"  # or "Kannada", "Hindi", "Marathi", "Sanskrit"
user_prompt = f"Extract the text from this image in {language}"

# Prepare messages
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": system_prompt}],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": user_prompt},
            {"type": "image", "image": image},
        ],
    },
]

# Apply the chat template and move the tensors to the model's device
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Generate response
with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        temperature=1.0,
        top_p=0.95,
        top_k=64,
        do_sample=True,
    )

# Decode only the newly generated tokens
response = processor.tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(response)
```
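
The same pipeline handles Document VQA: keep everything above and swap the extraction prompt for a question about the document. A minimal sketch, reusing `model`, `processor`, `system_prompt`, and `image` from the Basic Usage example (the question text is illustrative):

```python
# Document VQA: ask a question instead of requesting a transcription.
question = "What is the total amount shown on this invoice?"  # illustrative

vqa_messages = [
    {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
    {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image", "image": image},
        ],
    },
]

vqa_inputs = processor.apply_chat_template(
    vqa_messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    vqa_outputs = model.generate(**vqa_inputs, max_new_tokens=256)

answer = processor.tokenizer.decode(
    vqa_outputs[0][vqa_inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(answer)
```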
This model was trained 2x faster with Unsloth and Hugging Face's TRL library.
## Citation
```bibtex
@misc{nayana_sectionocr_2024,
  title={Nayana SectionOCR: Multilingual Document Understanding with Gemma 3n},
  author={CognitiveLab},
  year={2024},
  url={https://huggingface.co/Nayana-cognitivelab/SectionOCR_SFT_v3_half_en_kn_hi_sa_mr_7250}
}
```
