Model Card for Qwen3-VL-8B-old-church-slavonic
This model is a fine-tuned version of Qwen/Qwen3-VL-8B-Instruct for transcribing medieval Old Church Slavonic manuscripts from images. It has been trained using TRL on the medieval-data/church-slavonic-region dataset.
Model Description
This vision-language model specializes in transcribing text from images of medieval Old Church Slavonic manuscripts. Given an image of manuscript text, the model generates the corresponding transcription.
Quick start
```python
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
from peft import PeftModel
from PIL import Image

# Load the base model and attach the fine-tuned LoRA adapter
base_model = "Qwen/Qwen3-VL-8B-Instruct"
adapter_model = "wjbmattingly/Qwen3-VL-8B-old-church-slavonic"

model = Qwen3VLForConditionalGeneration.from_pretrained(
    base_model,
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_model)
processor = AutoProcessor.from_pretrained(base_model)

# Load your manuscript image
image = Image.open("path/to/your/manuscript_image.jpg")

# Prepare the chat message pairing the image with the transcription instruction
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Transcribe the text shown in this image."},
        ],
    },
]

# Generate the transcription
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
transcription = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(transcription)
```
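For repeated inference over many pages, you can optionally merge the LoRA weights into the base model so that generation no longer routes through the adapter. A minimal sketch using PEFT's `merge_and_unload`; the output directory name is just a placeholder:

```python
# Fold the LoRA adapter weights into the base model and save the result.
# "qwen3-vl-8b-ocs-merged" is a hypothetical output directory.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("qwen3-vl-8b-ocs-merged")
processor.save_pretrained("qwen3-vl-8b-ocs-merged")
```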
Use Cases
This model is designed for:
- Transcribing medieval Old Church Slavonic manuscripts
- Digitizing historical manuscripts (a batch-processing sketch follows this list)
- Supporting historical research and archival work
- Optical Character Recognition (OCR) for specialized historical texts
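For the digitization use case, a common pattern is to run the Quick start pipeline over a directory of page images and collect the results. The sketch below assumes the `model` and `processor` from the Quick start are already loaded; the directory and output file names are placeholders:

```python
import json
from pathlib import Path
from PIL import Image

def transcribe(image):
    # Build the same chat message used in the Quick start and decode the output.
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": "Transcribe the text shown in this image."},
            ],
        },
    ]
    inputs = processor.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    generated_ids = model.generate(**inputs, max_new_tokens=256)
    trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]

# "manuscript_pages/" and "transcriptions.json" are hypothetical paths.
results = {}
for path in sorted(Path("manuscript_pages").glob("*.jpg")):
    results[path.name] = transcribe(Image.open(path))

Path("transcriptions.json").write_text(
    json.dumps(results, ensure_ascii=False, indent=2), encoding="utf-8"
)
```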
Training procedure
This model was fine-tuned using Supervised Fine-Tuning (SFT) with LoRA adapters on the Qwen3-VL-8B-Instruct base model.
Training Data
The model was trained on medieval-data/church-slavonic-region, a dataset containing images of medieval Old Church Slavonic manuscripts with corresponding text transcriptions.
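To inspect the training data, the dataset can be loaded directly from the Hub with the `datasets` library. A minimal sketch; the split name and the `image`/`text` column names are assumptions about the dataset schema, not confirmed field names:

```python
from datasets import load_dataset

# Load the manuscript-region dataset from the Hugging Face Hub.
dataset = load_dataset("medieval-data/church-slavonic-region", split="train")

# Inspect one example; "image" and "text" are assumed column names.
example = dataset[0]
print(example.keys())
print(example.get("text"))  # transcription string, if the column is named "text"
```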
Training Configuration
- Base Model: Qwen/Qwen3-VL-8B-Instruct
- Training Method: Supervised Fine-Tuning (SFT) with LoRA
- LoRA Configuration:
- Rank (r): 16
- Alpha: 32
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Dropout: 0.1
- Training Arguments:
- Epochs: 3
- Batch size per device: 2
- Gradient accumulation steps: 4
- Learning rate: 5e-05
- Optimizer: AdamW
- Mixed precision: FP16
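The hyperparameters above map onto a PEFT `LoraConfig` and a TRL `SFTConfig` roughly as follows. This is an illustrative reconstruction rather than the exact training script; in particular, the image-text collation needed for a vision-language dataset is omitted, and the output directory is a placeholder:

```python
from peft import LoraConfig
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
from trl import SFTConfig, SFTTrainer

base = "Qwen/Qwen3-VL-8B-Instruct"
model = Qwen3VLForConditionalGeneration.from_pretrained(base, torch_dtype="auto")
processor = AutoProcessor.from_pretrained(base)

# LoRA adapter settings matching the configuration listed above.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Training arguments matching the list above; "qwen3-vl-ocs-sft" is a
# hypothetical output directory.
training_args = SFTConfig(
    output_dir="qwen3-vl-ocs-sft",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    fp16=True,
)

# NOTE: a processor-based data collator for image-text pairs is dataset-specific
# and omitted here; see the TRL vision-language SFT examples for a full setup.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,          # loaded in the Training Data section above
    processing_class=processor,
    peft_config=peft_config,
)
trainer.train()
```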
Framework versions
- TRL: 0.23.0
- Transformers: 4.57.1
- PyTorch: 2.8.0
- Datasets: 4.1.1
- Tokenizers: 0.22.1
Citations
If you use this model, please cite the base model and training framework:
Qwen3-VL
```bibtex
@article{Qwen3-VL,
  title={Qwen3-VL: Large Vision Language Models Pretrained on Massive Data},
  author={Qwen Team},
  journal={arXiv preprint},
  year={2024}
}
```
TRL (Transformer Reinforcement Learning)
```bibtex
@misc{vonwerra2022trl,
  title = {{TRL: Transformer Reinforcement Learning}},
  author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
  year = 2020,
  journal = {GitHub repository},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/huggingface/trl}}
}
```