Model Card for Qwen3-VL-8B-old-church-slavonic

This model is a fine-tuned version of Qwen/Qwen3-VL-8B-Instruct for transcribing medieval Old Church Slavonic manuscripts from images. It has been trained using TRL on the medieval-data/church-slavonic-region dataset.

Model Description

This vision-language model specializes in transcribing text from images of medieval Old Church Slavonic manuscripts. Given an image of manuscript text, the model generates the corresponding transcription.

Quick start

from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
from peft import PeftModel
from PIL import Image

# Load model and processor
base_model = "Qwen/Qwen3-VL-8B-Instruct"
adapter_model = "wjbmattingly/Qwen3-VL-8B-old-church-slavonic"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype="auto",
    device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_model)
processor = AutoProcessor.from_pretrained(base_model)

# Load your image
image = Image.open("path/to/your/manuscript_image.jpg")

# Prepare the message
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Transcribe the text shown in this image."},
        ],
    },
]

# Generate transcription
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
transcription = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]

print(transcription)
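
For deployment, the LoRA adapter can optionally be merged into the base weights so inference no longer requires peft at runtime. A minimal sketch using peft's merge_and_unload; the save path is illustrative:

# Optional: fold the adapter weights into the base model for standalone inference.
# merge_and_unload() returns a plain model with the LoRA deltas merged in.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("qwen3-vl-8b-ocs-merged")   # illustrative path
processor.save_pretrained("qwen3-vl-8b-ocs-merged")

The merged checkpoint can then be loaded directly with Qwen3VLForConditionalGeneration.from_pretrained, skipping the PeftModel step above.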

Use Cases

This model is designed for:

  • Transcribing medieval Old Church Slavonic manuscripts
  • Digitizing historical manuscripts
  • Supporting historical research and archival work
  • Optical Character Recognition (OCR) for specialized historical texts

Training procedure

This model was fine-tuned using Supervised Fine-Tuning (SFT) with LoRA adapters on the Qwen3-VL-8B-Instruct base model. The exact settings are listed under Training Configuration below, followed by a configuration sketch.

Training Data

The model was trained on medieval-data/church-slavonic-region, a dataset containing images of medieval Old Church Slavonic manuscripts with corresponding text transcriptions.
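
To inspect the data before training, the standard datasets API can be used. A minimal sketch; the "train" split name and the column layout are assumptions, so check the dataset card for the actual schema:

from datasets import load_dataset

# Load the training split; "train" is assumed to exist.
ds = load_dataset("medieval-data/church-slavonic-region", split="train")

print(ds)      # features (columns) and row count
print(ds[0])   # one example: an image plus its ground-truth transcription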

Training Configuration

  • Base Model: Qwen/Qwen3-VL-8B-Instruct
  • Training Method: Supervised Fine-Tuning (SFT) with LoRA
  • LoRA Configuration:
    • Rank (r): 16
    • Alpha: 32
    • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
    • Dropout: 0.1
  • Training Arguments:
    • Epochs: 3
    • Batch size per device: 2
    • Gradient accumulation steps: 4
    • Learning rate: 5e-05
    • Optimizer: AdamW
    • Mixed precision: FP16
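
The settings above map onto TRL and peft roughly as follows. This is a minimal sketch rather than the exact training script: the chat-formatting and collation of image-text pairs are elided, and output_dir is illustrative.

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# LoRA configuration matching the values listed above
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Training arguments matching the values listed above
# (TRL's default optimizer is AdamW)
training_args = SFTConfig(
    output_dir="qwen3-vl-8b-old-church-slavonic",  # illustrative
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    fp16=True,
)

dataset = load_dataset("medieval-data/church-slavonic-region", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3-VL-8B-Instruct",
    args=training_args,
    train_dataset=dataset,   # assumes examples are formatted as chat messages
    peft_config=peft_config,
)
trainer.train()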

Framework versions

  • TRL: 0.23.0
  • Transformers: 4.57.1
  • Pytorch: 2.8.0
  • Datasets: 4.1.1
  • Tokenizers: 0.22.1

Citations

If you use this model, please cite the base model and training framework:

Qwen3-VL

@article{Qwen3-VL,
  title={Qwen3-VL: Large Vision Language Models Pretrained on Massive Data},
  author={Qwen Team},
  journal={arXiv preprint},
  year={2024}
}

TRL (Transformer Reinforcement Learning)

@misc{vonwerra2022trl,
        title        = {{TRL: Transformer Reinforcement Learning}},
        author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
        year         = 2020,
        journal      = {GitHub repository},
        publisher    = {GitHub},
        howpublished = {\url{https://github.com/huggingface/trl}}
}