# ML-Danbooru ONNX Models

## Summary
This repository provides ONNX-optimized builds of the ML-Danbooru image tagging models, originally developed by 7eu7d7. ML-Danbooru is a deep learning system for automated tagging of anime-style images, using modern transformer-based architectures to classify images against thousands of Danbooru-style tags. The models here have been converted to ONNX format for faster inference and cross-platform compatibility.

The core architecture is CAFormer, a hybrid design that combines convolutional stages for local feature extraction with attention (transformer) stages that provide a global receptive field. This lets the models capture both fine-grained detail and global context in anime artwork. The repository includes several model variants trained with different configurations and for different numbers of steps, so users can trade inference speed against accuracy.

The models are accurate at recognizing common anime attributes: hair and eye colors, clothing types and accessories, character poses, backgrounds, and compositional elements, with confidence scores for clearly visible features typically above the default 0.7 threshold. Inference supports batch processing, and images of varying aspect ratios are handled by resizing strategies (optionally aspect-preserving) that balance visual fidelity against computational cost.
## Usage
The models in this repository are designed to be used with the dghs-imgutils library, which provides a comprehensive interface for image tagging tasks.
### Installation

```shell
pip install dghs-imgutils
```
### Basic Usage
```python
from imgutils.tagging import get_mldanbooru_tags

# Tag an image with default settings
tags = get_mldanbooru_tags('your_image.jpg')
print(tags)

# Tag with custom threshold and settings
tags_custom = get_mldanbooru_tags(
    'your_image.jpg',
    threshold=0.5,
    size=448,
    keep_ratio=True,
    drop_overlap=True,
    use_real_name=False,
)
print(tags_custom)
```
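`get_mldanbooru_tags` returns a mapping from tag name to confidence score. A short sketch of typical post-processing — keeping only strong tags and building a caption string. The scores below are illustrative placeholders, not real model output:

```python
# Illustrative output of get_mldanbooru_tags: a dict of tag -> confidence.
# These values are made up for demonstration; real scores come from the model.
tags = {
    '1girl': 0.998,
    'long_hair': 0.95,
    'blue_eyes': 0.91,
    'smile': 0.74,
    'outdoors': 0.71,
}

# Keep only high-confidence tags and sort them, strongest first.
selected = sorted(
    ((name, score) for name, score in tags.items() if score >= 0.9),
    key=lambda item: item[1],
    reverse=True,
)
print(selected)  # [('1girl', 0.998), ('long_hair', 0.95), ('blue_eyes', 0.91)]

# Join tag names into a comma-separated caption string, a common
# downstream format for training-data captions.
caption = ', '.join(name.replace('_', ' ') for name, _ in selected)
print(caption)  # 1girl, long hair, blue eyes
```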
## Model Variants
This repository contains multiple ML-Danbooru model variants:
- `ml_caformer_m36_dec-5-97527.onnx`: primary model, CAFormer-M36 architecture
- `ml_caformer_m36_dec-3-80000.onnx`: earlier CAFormer-M36 checkpoint
- `TResnet-D-FLq_ema_2-40000.onnx`: TResNet-D based variant
- `TResnet-D-FLq_ema_4-10000.onnx`: TResNet-D checkpoint with fewer training steps
- `TResnet-D-FLq_ema_6-10000.onnx`: additional TResNet-D checkpoint
- `TResnet-D-FLq_ema_6-30000.onnx`: TResNet-D checkpoint with extended training
- `caformer_m36-3-80000.onnx`: base CAFormer-M36 model
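The checkpoint filenames appear to follow a `{architecture}{-|_}{epoch}-{step}.onnx` convention. A small helper that splits a filename into those parts — note the convention itself is inferred from the names listed above, not documented metadata:

```python
import re

# Hypothetical pattern: '{arch}' followed by '-' or '_', then epoch and step.
# Inferred from the filenames listed above, not from documented metadata.
_VARIANT_RE = re.compile(r'^(.*)[-_](\d+)-(\d+)\.onnx$')

def parse_variant(filename: str):
    """Split a checkpoint filename into (architecture, epoch, step)."""
    m = _VARIANT_RE.match(filename)
    if m is None:
        raise ValueError(f'unrecognized checkpoint name: {filename!r}')
    arch, epoch, step = m.groups()
    return arch, int(epoch), int(step)

print(parse_variant('ml_caformer_m36_dec-5-97527.onnx'))
# ('ml_caformer_m36_dec', 5, 97527)
print(parse_variant('TResnet-D-FLq_ema_6-30000.onnx'))
# ('TResnet-D-FLq_ema', 6, 30000)
```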
## Tag Information
The repository includes comprehensive tag information:
- `classes.json`: contains 1,527 simplified tag names for common anime attributes
- `tags.csv`: complete tag database with 12,547 entries, including:
  - original tag names
  - root forms for morphological variations
  - part-of-speech classifications
  - usage frequency counts
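A sketch of loading `tags.csv` with the standard `csv` module. The column names below (`name`, `root`, `pos`, `count`) are hypothetical stand-ins for the four fields described above — check the file's actual header row before relying on them. An inline sample is used so the snippet is self-contained:

```python
import csv
import io

# Inline stand-in for tags.csv. Column names and rows are hypothetical;
# inspect the real file's header before relying on them.
sample = """name,root,pos,count
long_hair,long hair,noun,120000
smiling,smile,verb,90000
blue_eyes,blue eye,noun,85000
"""

with io.StringIO(sample) as f:
    rows = list(csv.DictReader(f))

# Build a lookup from tag name to usage frequency.
frequency = {row['name']: int(row['count']) for row in rows}
print(frequency['long_hair'])  # 120000

# Tags sorted by frequency, most common first.
by_count = sorted(rows, key=lambda r: int(r['count']), reverse=True)
print([r['name'] for r in by_count])
```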
## Performance Characteristics
- Input Size: Default 448x448 pixels (configurable)
- Tag Count: 12,547 possible tags
- Threshold: Default 0.7 (configurable)
- Supported Tags: Character attributes, clothing, accessories, backgrounds, compositions
- Architecture: Caformer-M36 and TResnet variants
- Format: ONNX for optimized inference
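The threshold turns per-tag scores into a discrete tag set. A minimal sketch of that step in pure Python, assuming the model emits one raw logit per tag that is mapped through a sigmoid (if the ONNX graph already applies the sigmoid, skip that step); the logits and tag names below are illustrative, while a real output covers all 12,547 tags:

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw logit to a (0, 1) confidence score."""
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative per-tag logits; a real model emits one logit per tag.
logits = {'1girl': 6.2, 'long_hair': 2.9, 'smile': 1.0, 'outdoors': -0.4}

# Apply the default 0.7 threshold to the sigmoid scores.
threshold = 0.7
scores = {tag: sigmoid(v) for tag, v in logits.items()}
selected = {tag: round(s, 3) for tag, s in scores.items() if s >= threshold}
print(selected)  # 'outdoors' (score ~0.40) falls below the threshold
```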
## Model Architecture Details
The ML-Danbooru models use modern hybrid architectures:
- CAFormer-M36: a MetaFormer-style design combining convolutional stages for local feature extraction with attention (transformer) stages for global context
- TResNet-D: a GPU-throughput-optimized ResNet derivative, here trained with focal loss optimization (the `FLq` variants)
- ONNX export: models are exported with optimized operators for fast inference across hardware platforms and runtimes
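The `size` and `keep_ratio` options shown in the usage example above control the input geometry. A sketch of the target-dimension arithmetic — this is an assumption modeled on the parameter names, not a copy of the library's internal preprocessing code:

```python
def target_size(width: int, height: int, size: int = 448, keep_ratio: bool = False):
    """Compute a resize target for a (width, height) input image.

    Hypothetical sketch: with keep_ratio=False both sides are forced to
    `size`; with keep_ratio=True the shorter side is scaled to `size` and
    the longer side keeps the original aspect ratio. This mirrors the
    parameter names in the usage example, not the library's actual code.
    """
    if not keep_ratio:
        return size, size
    scale = size / min(width, height)
    return round(width * scale), round(height * scale)

print(target_size(1920, 1080))                  # (448, 448)
print(target_size(1920, 1080, keep_ratio=True)) # (796, 448)
```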
## Citation
```bibtex
@misc{deepghs_ml_danbooru_onnx,
    title = {{ML-Danbooru ONNX Models: Optimized Anime Image Tagging}},
    author = {7eu7d7 and DeepGHS Contributors},
    howpublished = {\url{https://huggingface.co/deepghs/ml-danbooru-onnx}},
    year = {2023},
    note = {ONNX-optimized implementations of ML-Danbooru models for efficient anime image tagging with transformer-based architectures},
    abstract = {This repository provides ONNX-optimized implementations of the ML-Danbooru image tagging models, originally developed by 7eu7d7. ML-Danbooru is a deep learning system for automated tagging of anime-style images, leveraging modern transformer-based architectures to achieve high-precision classification across thousands of Danbooru-style tags. The models employ CAFormer architectures, which combine convolutional stages for local feature extraction with attention stages that provide a global receptive field, enabling effective capture of both fine-grained details and global contextual information in anime artwork.},
    keywords = {image-classification, anime, tagging, danbooru, transformer, onnx}
}
```