layoutlmv3-receipt-invoice

LayoutLMv3 model initialized for receipt and invoice field extraction.

Model Status

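⚠️ This is an initialized base model that has not yet been fine-tuned on custom data. The token-classification head for the custom labels is newly initialized, so predictions are not meaningful until the model is trained.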

Base Model: microsoft/layoutlmv3-base
Status: Ready for deployment and fine-tuning
Custom Labels: Configured for receipt/invoice field extraction

Intended Use

This model is configured to extract the following fields from receipts and invoices:

Supported Fields

[ "O", "B-MerchantName", "I-MerchantName", "B-MerchantAddress", "I-MerchantAddress", "B-TransactionDate", "I-TransactionDate", "B-Currency", "I-Currency", "B-Total", "I-Total", "B-TotalTax", "I-TotalTax", "B-InvoiceNumber", "I-InvoiceNumber", "B-Subtotal", "I-Subtotal", "B-LineItems", "I-LineItems" ]
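The label list follows a regular pattern: one "O" label plus a B-/I- pair for each of the nine fields. The sketch below rebuilds the list and the id2label / label2id mappings that a token-classification config typically carries. The helper name build_label_maps is hypothetical, and the assumption that label IDs follow the list order should be checked against the config.json in this repository.

```python
# Field names taken from the label list above.
FIELDS = [
    "MerchantName", "MerchantAddress", "TransactionDate", "Currency",
    "Total", "TotalTax", "InvoiceNumber", "Subtotal", "LineItems",
]


def build_label_maps(fields):
    """Return (labels, id2label, label2id) for an IOB scheme over `fields`.

    Assumption: IDs are assigned in list order, with "O" as ID 0.
    """
    labels = ["O"]
    for field in fields:
        labels.extend([f"B-{field}", f"I-{field}"])
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return labels, id2label, label2id


LABELS, ID2LABEL, LABEL2ID = build_label_maps(FIELDS)
print(len(LABELS), LABELS[:3])
```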

Training Status

This repository contains:

✅ Base LayoutLMv3 architecture
✅ Custom label configuration for receipts/invoices
⏳ Not yet fine-tuned - using pre-trained weights from microsoft/layoutlmv3-base

Training the Model

To fine-tune this model on your custom data:

# On RunPod GPU pod or local machine with GPU
python main.py --mode train --push-to-hub --version v1.0

This will:

Train on your labeled receipt/invoice data
Update this repository with fine-tuned weights
Tag the trained version (e.g., v1.0, v1.1, etc.)

Usage

Local Inference

import torch
from PIL import Image
from transformers import LayoutLMv3ForTokenClassification, LayoutLMv3Processor

REPO_ID = "mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt"

# Load model and processor (apply_ocr=False: you supply the OCR words and boxes)
model = LayoutLMv3ForTokenClassification.from_pretrained(REPO_ID)
processor = LayoutLMv3Processor.from_pretrained(REPO_ID, apply_ocr=False)
model.eval()

# Prepare inputs (you need OCR results: words and bounding boxes in pixels)
image = Image.open("receipt.jpg").convert("RGB")
words = ["STORE", "NAME", "Total:", "$10.99"]
boxes = [[10, 10, 100, 30], [110, 10, 200, 30], [10, 50, 80, 70], [90, 50, 150, 70]]

# Normalize boxes to the 0-1000 range that LayoutLMv3 expects
width, height = image.size
normalized_boxes = [[int(1000*x0/width), int(1000*y0/height),
                      int(1000*x1/width), int(1000*y1/height)] for x0,y0,x1,y1 in boxes]

encoding = processor(image, words, boxes=normalized_boxes, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**encoding)

# One predicted label ID per token (subwords and special tokens included)
predictions = outputs.logits.argmax(-1).squeeze(0).tolist()
predicted_labels = [model.config.id2label[p] for p in predictions]
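The box normalization step above can be factored into a small helper. This is a minimal sketch: the function name normalize_box is hypothetical, and the clamping to 0-1000 is an added safeguard (not part of the snippet above) for OCR boxes that extend slightly past the image edge.

```python
def normalize_box(box, width, height):
    """Scale a pixel box [x0, y0, x1, y1] to the 0-1000 range.

    Values are clamped so that boxes reaching outside the image
    still produce coordinates within 0-1000.
    """
    def scale(value, size):
        return max(0, min(1000, int(1000 * value / size)))

    x0, y0, x1, y1 = box
    return [scale(x0, width), scale(y0, height),
            scale(x1, width), scale(y1, height)]


# Example with a 200x100 pixel image.
print(normalize_box([10, 10, 100, 30], 200, 100))
```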

RunPod Serverless Deployment

This model is designed for deployment on RunPod Serverless:

Build and push Docker image:

cd deployment/runpod/LayoutLMv3
python deploy.py --action deploy

Create RunPod endpoint:
- Docker Image: registry.hf.space/your-username/layoutlmv3-inference:latest
- Environment Variables:
  - HF_REPO_ID=mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt
  - HF_TOKEN=<your-token>
  - MODEL_VERSION=main (or specific version tag after training)
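As an illustration of how these environment variables might be consumed inside the worker, the sketch below turns them into keyword arguments for from_pretrained. The function runpod_model_settings is hypothetical and not taken from this repository's handler; it assumes a transformers version that accepts the revision and token keyword arguments.

```python
import os


def runpod_model_settings(env=None):
    """Build from_pretrained keyword arguments from the endpoint environment.

    Hypothetical helper: the actual handler in this repository may differ.
    """
    env = os.environ if env is None else env
    repo_id = env.get("HF_REPO_ID")
    if not repo_id:
        raise ValueError("HF_REPO_ID must be set")
    return {
        "pretrained_model_name_or_path": repo_id,
        # Falls back to the main branch when no version tag is given.
        "revision": env.get("MODEL_VERSION", "main"),
        # None means anonymous access, which only works for public repos.
        "token": env.get("HF_TOKEN") or None,
    }


settings = runpod_model_settings({
    "HF_REPO_ID": "mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt",
    "MODEL_VERSION": "v1.0",
})
# In the worker, this could then be passed on as:
# LayoutLMv3ForTokenClassification.from_pretrained(**settings)
print(settings["revision"])
```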

Model Architecture

Base: microsoft/layoutlmv3-base
Task: Token Classification
Input: Image + Words + Bounding Boxes
Output: Field labels (IOB tagging scheme)
Number of Labels: 19

Label Schema

The model uses IOB (Inside-Outside-Beginning) tagging:

O: Outside any field
B-FieldName: Beginning of a field
I-FieldName: Inside/continuation of a field

Example

Text:        ["Total:", "$", "10", ".", "99"]
Labels:      ["O", "B-Total", "I-Total", "I-Total", "I-Total"]
Extracted:   Total: "$ 10 . 99"
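Turning word-level IOB labels into field values is a simple grouping pass. The sketch below is one possible way to do it, not the repository's own post-processing: the function extract_fields is hypothetical, it expects one label per word (so subword predictions must be aligned to words first), and it leniently treats a stray I- tag without a preceding B- tag as the start of a new span.

```python
def extract_fields(words, labels):
    """Group word-level IOB labels into {field_name: [span_text, ...]}."""
    fields = {}
    current = None  # (field_name, [words]) for the span being built

    for word, label in zip(words, labels):
        prefix, _, name = label.partition("-")
        # Continue the open span when an I- tag matches its field.
        if prefix == "I" and current is not None and current[0] == name:
            current[1].append(word)
            continue
        # Anything else closes the open span, if there is one.
        if current is not None:
            fields.setdefault(current[0], []).append(" ".join(current[1]))
            current = None
        # B- starts a new span; a stray I- is treated the same way.
        if prefix in ("B", "I"):
            current = (name, [word])

    if current is not None:
        fields.setdefault(current[0], []).append(" ".join(current[1]))
    return fields


words = ["ACME", "Store", "Total:", "$", "10", ".", "99"]
labels = ["B-MerchantName", "I-MerchantName", "O",
          "B-Total", "I-Total", "I-Total", "I-Total"]
print(extract_fields(words, labels))
```

Each field maps to a list because a document can contain several spans of the same type, for example multiple LineItems.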

Version History

Version	Date	Description	Status
main	2025-11-13	Initialized with base model + custom labels	Base (not trained)

After training, versions will be tagged (v1.0, v1.1, etc.).

Training Configuration

When training is performed, the following configuration will be used:

{
  "model_name": "microsoft/layoutlmv3-base",
  "learning_rate": 5e-05,
  "batch_size": 4,
  "num_epochs": 20,
  "warmup_steps": 500,
  "max_length": 512,
  "validation_split": 0.2,
  "random_seed": 42,
  "gradient_accumulation_steps": 2,
  "eval_steps": 100,
  "save_steps": 500,
  "logging_steps": 50
}
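To put these values in context, the sketch below works out the effective batch size and an approximate number of optimizer steps. The dataset size of 1,000 labeled documents is purely an assumption for illustration, as is single-GPU training; the exact step count also depends on how the training loop handles a final partial batch.

```python
import math

# Values copied from the configuration above.
CONFIG = {
    "batch_size": 4,
    "num_epochs": 20,
    "warmup_steps": 500,
    "validation_split": 0.2,
    "gradient_accumulation_steps": 2,
}


def schedule_summary(num_examples, cfg):
    """Approximate the training schedule for a dataset of num_examples."""
    val_examples = round(num_examples * cfg["validation_split"])
    train_examples = num_examples - val_examples
    # Gradients from several small batches are accumulated per update.
    effective_batch = cfg["batch_size"] * cfg["gradient_accumulation_steps"]
    steps_per_epoch = math.ceil(train_examples / effective_batch)
    total_steps = steps_per_epoch * cfg["num_epochs"]
    return {
        "train_examples": train_examples,
        "effective_batch_size": effective_batch,
        "optimizer_steps_per_epoch": steps_per_epoch,
        "total_optimizer_steps": total_steps,
        "warmup_fraction": cfg["warmup_steps"] / total_steps,
    }


# Assumed dataset size, for illustration only.
print(schedule_summary(1000, CONFIG))
```

With a much smaller dataset, 500 warmup steps can cover most or all of training, so warmup_steps may need to be reduced accordingly.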

Citation

@misc{layoutlmv3-receipt-invoice,
  author = {MK Digital GmbH},
  title = {LayoutLMv3 Receipt/Invoice Field Extraction},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/mkdigitalgmbh/runpo-LayoutLM3-Invoice-Receipt}}
}

@article{huang2022layoutlmv3,
  title={LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking},
  author={Huang, Yupan and Lv, Tengchao and Cui, Lei and Lu, Yutong and Wei, Furu},
  journal={arXiv preprint arXiv:2204.08387},
  year={2022}
}

License

Apache 2.0

Contact

For questions or issues, please open an issue in the repository.
