πŸ“œ TypoRef YOLOv8 Historical Document Detection

Author: Martin Badrous

This repository packages an industrial research project on detecting decorative elements in historical documents. It provides a clear, reproducible pipeline built with YOLOv8 for local training and a ready‑to‑deploy Gradio demo for inference. The aim is to automatically find lettrines, illustrations, bandeaux, and vignettes in scanned pages from 16th–18th century printed works. Such detection enables large‑scale digital humanities projects by highlighting and indexing ornamental content in cultural heritage collections.


🧾 Overview

The TypoRef dataset comprises high‑resolution scans of printed books from the TypoRef corpus. Annotators labeled four types of graphical elements: lettrine (decorative initials), illustration (engraved images), bandeau (horizontal bands), and vignette (small ornaments). We fine‑tune YOLOv8 on these images using annotation files converted to the YOLO format.

The training script in this repository reuses the Ultralytics YOLOv8 API, exposing command‑line parameters for the data path, model backbone, image size, batch size, epoch count, augmentation hyper‑parameters, and deterministic seeding. Evaluation and inference scripts mirror the training CLI for consistency.
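
For orientation, the core of such a script built on the Ultralytics API could look as follows (a sketch of the approach, not the literal contents of src/train.py; defaults are illustrative):

# Sketch of a training entry point on top of the Ultralytics API.
# Flag names mirror the CLI documented below; defaults are illustrative.
import argparse
from ultralytics import YOLO

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", default="configs/ornaments.yaml")
    parser.add_argument("--model", default="yolov8s.pt")
    parser.add_argument("--imgsz", type=int, default=1024)
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--batch", type=int, default=8)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--project", default="runs/typoref")
    parser.add_argument("--name", default="yolov8s_typoref")
    args = parser.parse_args()

    model = YOLO(args.model)  # load a pretrained checkpoint
    model.train(data=args.data, imgsz=args.imgsz, epochs=args.epochs,
                batch=args.batch, seed=args.seed,
                project=args.project, name=args.name)

if __name__ == "__main__":
    main()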

Once trained, the model reaches mAP β‰ˆ 0.95 on held‑out validation pages (COCO‑style mAP@50–95 averaged across the four classes). Inference runs in real time on consumer GPUs (see the performance table below), making the model suitable for production pipelines.


πŸ—ƒοΈ Dataset

The dataset used to train this model originates from the TypoRef collection of historical prints. Each page was scanned at 300–600 dpi and annotated with bounding boxes around ornaments. Labels and images must be organised into a YOLO dataset structure. A sample dataset configuration (configs/ornaments.yaml) is provided and expects the following folder structure relative to the file:

dataset_yolo/
β”œβ”€β”€ train/
β”‚   β”œβ”€β”€ images/
β”‚   └── labels/
β”œβ”€β”€ val/
β”‚   β”œβ”€β”€ images/
β”‚   └── labels/
└── test/
    β”œβ”€β”€ images/
    └── labels/

If you start from VIA annotation JSON files, use src/dataset_tools/convert_via_to_yolo.py to convert them to YOLO text labels. Then split the data into train/val/test sets with src/dataset_tools/split_dataset.py.
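
The heart of that conversion is a coordinate change: VIA stores rectangles as pixel (x, y, width, height), while YOLO labels hold one line per box with a class index and centre/size values normalised to [0, 1]. A minimal sketch of the geometric core (the helper name is illustrative; the actual script also parses the VIA JSON):

# VIA rectangles are pixel (x, y, w, h); YOLO labels are
# "class cx cy w h" with all coordinates normalised to [0, 1].
def via_rect_to_yolo(x, y, w, h, img_w, img_h, class_id):
    cx = (x + w / 2) / img_w   # box centre, normalised
    cy = (y + h / 2) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"

# Example: a 200x100 px lettrine at (50, 80) on a 1000x1500 px page
print(via_rect_to_yolo(50, 80, 200, 100, 1000, 1500, 0))
# -> 0 0.150000 0.086667 0.200000 0.066667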


πŸ› οΈ Training

Install the dependencies and run the training script:

python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Train YOLOv8 on the TypoRef dataset
python src/train.py \
  --data configs/ornaments.yaml \
  --model yolov8s.pt \
  --imgsz 1024 \
  --epochs 100 \
  --batch 8 \
  --project runs/typoref \
  --name yolov8s_typoref

Checkpoints and logs will be saved under runs/typoref/.


πŸ” Inference

To perform inference on a folder of images using a trained model:

python src/infer.py \
  --weights runs/typoref/yolov8s_typoref/weights/best.pt \
  --source path/to/page_images \
  --imgsz 1024 \
  --conf 0.25 \
  --save_txt --save_conf

The predictions (bounding boxes and labels) will be written to runs/predict/. You can visualise them using the example Gradio app or the provided scripts.
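
The same predictions can be obtained programmatically. A sketch using the Ultralytics Python API with the parameters above:

# Programmatic equivalent of the CLI call above.
from ultralytics import YOLO

model = YOLO("runs/typoref/yolov8s_typoref/weights/best.pt")
results = model.predict(
    source="path/to/page_images",
    imgsz=1024,
    conf=0.25,
    save_txt=True,   # write YOLO-format label files
    save_conf=True,  # append a confidence score to each line
)
for r in results:
    print(r.path, r.boxes.xyxy.tolist())  # boxes in absolute pixels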


🧠 Model Architecture & Training Details

  • Backbone: YOLOv8 (choose from yolov8n.pt, yolov8s.pt, etc.)
  • Input size: 1024Γ—1024 pixels
  • Batch size: 8
  • Epochs: 100
  • Optimisation: SGD with momentum, weight decay, and a learning‑rate schedule defined in configs/hyp_augment.yaml
  • Augmentations: Horizontal flips, scale jittering, colour jitter, mosaic, and mixup
  • Metrics: mAP@50–95 β‰ˆ 0.95 on validation set

The training pipeline is deterministic when --seed is set. See configs/hyp_augment.yaml for the full list of augmentation hyper‑parameters.
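
As an indication of what that file contains, Ultralytics hyper‑parameters use names such as the following (values here are illustrative, not the repository's settings):

# Illustrative optimiser/augmentation hyper-parameters in the
# Ultralytics naming scheme; configs/hyp_augment.yaml holds the real ones.
lr0: 0.01            # initial learning rate (SGD)
momentum: 0.937
weight_decay: 0.0005
fliplr: 0.5          # horizontal flip probability
scale: 0.5           # scale jittering range
hsv_h: 0.015         # colour jitter: hue
hsv_s: 0.7           # colour jitter: saturation
hsv_v: 0.4           # colour jitter: value
mosaic: 1.0          # mosaic augmentation probability
mixup: 0.1           # mixup probability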


πŸ“Š Performance Metrics

Metric            Value
mAP@50–95         0.95
Precision         0.94
Recall            0.93
FPS (RTX 3060)    > 60

These numbers are indicative for the typographical ornament detection task and may vary with dataset size and augmentation settings.
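
To recompute these metrics on your own split, the Ultralytics validation API can be called directly (a sketch, assuming the checkpoint path from the training example):

# Re-computing validation metrics with the Ultralytics API.
from ultralytics import YOLO

model = YOLO("runs/typoref/yolov8s_typoref/weights/best.pt")
metrics = model.val(data="configs/ornaments.yaml", imgsz=1024)
print(f"mAP@50-95: {metrics.box.map:.3f}")   # COCO-style average
print(f"mAP@50:    {metrics.box.map50:.3f}")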


πŸ“Έ Example Output

An example historical page showing a detected ornament.



πŸŽ›οΈ Demo Application

A Gradio demo is included in app.py. It loads the model from Hugging Face and provides an intuitive drag‑and‑drop interface for inference. To run the demo locally:

python app.py

The model identifier in app.py is set to martinbadrous/TypoRef-YOLOv8-Historical-Document-Detection. If you use a different model ID or a local checkpoint, update the string accordingly.
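
A stripped‑down version of such an app might look like this (illustrative; app.py is the canonical implementation, and the checkpoint filename best.pt is an assumption):

# Minimal Gradio wrapper around the detector (illustrative sketch).
import gradio as gr
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Download the checkpoint from the Hub; "best.pt" is an assumed filename.
weights = hf_hub_download(
    repo_id="martinbadrous/TypoRef-YOLOv8-Historical-Document-Detection",
    filename="best.pt",
)
model = YOLO(weights)

def detect(image):
    result = model.predict(image, imgsz=1024, conf=0.25)[0]
    return result.plot()[..., ::-1]  # plot() returns BGR; flip to RGB

demo = gr.Interface(fn=detect, inputs=gr.Image(type="pil"), outputs=gr.Image())
demo.launch()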


πŸ“– Citation

If you use this repository or the model in your research, please cite it as follows:

@misc{badrous2025typoref,
  author       = {Martin Badrous},
  title        = {TypoRef YOLOv8 Historical Document Detection},
  year         = {2025},
  howpublished = {Hugging Face repository},
  url          = {https://huggingface.co/martinbadrous/TypoRef-YOLOv8-Historical-Document-Detection}
}

πŸ‘€ Contact

For questions or collaboration requests, feel free to email [email protected].


πŸͺͺ License

This project is released under the MIT License. See the LICENSE file for details.
