---
language: en
license: mit
tags:
- computer-vision
- object-detection
- yolov8
- document-analysis
- heritage-ai
- pytorch
pipeline_tag: object-detection
model-index:
- name: TypoRef YOLOv8 Historical Document Detection
  results:
  - task:
      type: object-detection
      name: Object Detection
    dataset:
      name: TypoRef Historical Prints
      type: document-images
    metrics:
    - name: mAP
      type: map
      value: 0.95
---

# 📜 TypoRef YOLOv8 Historical Document Detection

**Author:** Martin Badrous

This repository packages an industrial research project on detecting decorative elements in historical documents. It provides a clear, reproducible pipeline built with YOLOv8 for local training and a ready‑to‑deploy [Gradio](https://gradio.app) demo for inference.

The aim is to automatically find **lettrines**, **illustrations**, **bandeaux**, and **vignettes** in scanned pages from 16th–18th century printed works. Such detection enables large‑scale digital humanities projects by highlighting and indexing ornamental content in cultural heritage collections.

---

## 🧾 Overview

The **TypoRef dataset** comprises high‑resolution scans of printed books from the TypoRef corpus. Annotators labelled four types of graphical elements: `lettrine` (decorative initials), `illustration` (engraved images), `bandeau` (horizontal bands), and `vignette` (small ornaments). We fine‑tune YOLOv8 on these images using annotation files converted to the YOLO format.

The training script in this repository reuses the **Ultralytics YOLOv8 API**, exposing command‑line parameters for the data path, model backbone, image size, batch size, epoch count, augmentation hyper‑parameters, and deterministic seeding. The evaluation and inference scripts mirror the training CLI for consistency.

Once trained, the model achieves **mAP ≈ 0.95** on held‑out validation pages (computed with the COCO AP metric averaged across classes). Inference runs in real time on consumer GPUs, making it suitable for production pipelines.

---

## 🗃️ Dataset

The dataset used to train this model originates from the TypoRef collection of historical prints. Each page was scanned at 300–600 dpi and annotated with bounding boxes around ornaments.

Labels and images must be organised into a **YOLO dataset structure**. A sample dataset configuration (`configs/ornaments.yaml`) is provided and expects the following folder structure relative to the file:

```text
dataset_yolo/
├── train/
│   ├── images/
│   └── labels/
├── val/
│   ├── images/
│   └── labels/
└── test/
    ├── images/
    └── labels/
```

If you start from VIA annotation JSON files, use `src/dataset_tools/convert_via_to_yolo.py` to convert them to YOLO text labels, then split the data into train/val/test sets with `src/dataset_tools/split_dataset.py`.

---

## 🛠️ Training

Install the dependencies and run the training script:

```bash
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Train YOLOv8 on the TypoRef dataset
python src/train.py \
  --data configs/ornaments.yaml \
  --model yolov8s.pt \
  --imgsz 1024 \
  --epochs 100 \
  --batch 8 \
  --project runs/typoref \
  --name yolov8s_typoref
```

Checkpoints and logs will be saved under `runs/typoref/`.

---

## 🔍 Inference

To perform inference on a folder of images using a trained model:

```bash
python src/infer.py \
  --weights runs/typoref/yolov8s_typoref/weights/best.pt \
  --source path/to/page_images \
  --imgsz 1024 \
  --conf 0.25 \
  --save_txt --save_conf
```

The predictions (bounding boxes and labels) will be written to `runs/predict/`.
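If you prefer calling the model from Python rather than the CLI, the snippet below is a minimal sketch using the Ultralytics `YOLO` API. The checkpoint path mirrors the training command above, and `path/to/page_images` is a placeholder for your own scans:

```python
from ultralytics import YOLO

# Load the fine-tuned checkpoint produced by the training run above.
model = YOLO("runs/typoref/yolov8s_typoref/weights/best.pt")

# Predict on a folder of page scans with the same settings as the CLI example.
results = model.predict(source="path/to/page_images", imgsz=1024, conf=0.25)

# Print one line per detected ornament.
for result in results:
    for box, cls, conf in zip(result.boxes.xyxy, result.boxes.cls, result.boxes.conf):
        label = model.names[int(cls)]  # e.g. lettrine, illustration, bandeau, vignette
        print(f"{label} ({conf:.2f}): {box.tolist()}")
```

Each `result` also exposes `result.plot()`, which returns the page with boxes drawn on it if you want an annotated image rather than raw coordinates.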
You can visualise these predictions using the example Gradio app or the provided scripts.

---

## 🧠 Model Architecture & Training Details

- **Backbone:** YOLOv8 (choose from `yolov8n.pt`, `yolov8s.pt`, etc.)
- **Input size:** 1024×1024 pixels
- **Batch size:** 8
- **Epochs:** 100
- **Optimisation:** SGD with momentum and weight decay; the learning‑rate schedule is provided in `configs/hyp_augment.yaml`
- **Augmentations:** Horizontal flips, scale jittering, colour jitter, mosaic, and mixup
- **Metrics:** mAP@50–95 ≈ 0.95 on the validation set

The training pipeline is deterministic when `--seed` is set. See `configs/hyp_augment.yaml` for the full list of augmentation hyper‑parameters.

---

## 📊 Performance Metrics

| Metric | Value |
|--------|------:|
| mAP@50–95 | 0.95 |
| Precision | 0.94 |
| Recall | 0.93 |
| FPS (RTX 3060) | > 60 |

These numbers are indicative of the typographical ornament detection task and may vary depending on dataset size and augmentation settings.

---

## 📸 Example Output

An example historical page showing a detected ornament:

![Detected ornament example](dia.jpg)

---

## 🎛️ Demo Application

A Gradio demo is included in `app.py`. It loads the model from Hugging Face and provides an intuitive drag‑and‑drop interface for inference. To run the demo locally:

```bash
python app.py
```

The model identifier in `app.py` is set to `martinbadrous/TypoRef-YOLOv8-Historical-Document-Detection`. If you use a different model ID or a local checkpoint, update the string accordingly.

---

## 📖 Citation

If you use this repository or the model in your research, please cite it as follows:

```bibtex
@misc{badrous2025typoref,
  author       = {Martin Badrous},
  title        = {TypoRef YOLOv8 Historical Document Detection},
  year         = {2025},
  howpublished = {Hugging Face repository},
  url          = {https://huggingface.co/martinbadrous/TypoRef-YOLOv8-Historical-Document-Detection}
}
```

---

## 👤 Contact

For questions or collaboration requests, feel free to email **martin.badrous@gmail.com**.

---

## 🪪 License

This project is released under the MIT License. See the [LICENSE](LICENSE) file for details.