LanguageBind-MLP Model
Model Description
This is a fine-tuned LanguageBind model for detecting machine-generated content across multiple modalities (text, image, and audio). The model is part of the RU-AI project, which introduces a large multimodal dataset for AI-generated content detection.
This model leverages LanguageBind's multi-modal semantic alignment capabilities to identify whether content is human-generated or machine-generated across different modalities.
Model Details
- Model Type: Multi-modal classification model based on LanguageBind
- Architecture: LanguageBind with an MLP classifier head (sketched below)
- Paper: RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
- GitHub Repository: ZhihaoZhang97/RU-AI
- Accepted at: WWW'25 Resource Track
- Modalities Supported: Text, Image, and Audio
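The classifier is a small MLP applied on top of LanguageBind embeddings. Below is a minimal sketch of that design, assuming a 768-dimensional embedding, a 256-unit hidden layer, and a binary human-vs-machine output; the actual layer sizes are defined in the repository.

import torch
import torch.nn as nn

class MLPClassifier(nn.Module):
    # Binary human-vs-machine head over LanguageBind embeddings.
    # Layer sizes here are assumptions; the real ones live in the repository.
    def __init__(self, embed_dim=768, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # class 0 = human, class 1 = machine
        )

    def forward(self, embeddings):
        # embeddings: (batch, embed_dim) from any LanguageBind encoder;
        # the shared embedding space lets one head serve all three modalities.
        return self.mlp(embeddings)

head = MLPClassifier()
logits = head(torch.randn(4, 768))  # random stand-in for real embeddings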
Intended Use
This model is designed for detecting AI-generated content in:
- Text: Identifying AI-written articles, essays, responses, and general text
- Images: Detecting images generated by models such as Stable Diffusion and DALL-E
- Audio: Identifying synthetic speech from TTS models
Use Cases
- Content moderation and authenticity verification
- Academic integrity checking
- Media forensics and fact-checking
- Research on AI-generated content detection
Training Data
The model was trained on the RU-AI dataset, which includes:
- 245,895 real/human-generated samples
- 1,229,475 machine-generated samples
- Multiple data sources: COCO, Flickr8k, Places dataset
- AI-generated content from various models:
  - Images: Stable Diffusion (v1.5, v6.0, XL v3.0, AbsoluteReality, EpicRealism)
  - Audio: EfficientSpeech, StyleTTS2, VITS, XTTS2, YourTTS
  - Text: Various LLM-generated captions and descriptions
The dataset is publicly available on Zenodo.
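With five machine-generated samples for every human-generated one, training has to contend with class imbalance. One common remedy is inverse-frequency class weighting; the sketch below illustrates it on the dataset's counts (an assumption for illustration; the paper's actual training recipe may differ).

import torch
import torch.nn as nn

n_real, n_machine = 245_895, 1_229_475
total = n_real + n_machine

# Inverse-frequency weights: the rarer human class counts ~5x as much.
weights = torch.tensor([total / n_real, total / n_machine])
criterion = nn.CrossEntropyLoss(weight=weights / weights.sum())
print(weights / weights.sum())  # tensor([0.8333, 0.1667])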
Requirements
Hardware
- NVIDIA GPU with at least 16GB VRAM (RTX 3090 24GB or higher recommended)
- At least 500GB disk space for the full dataset
Software
- Python >= 3.8
- PyTorch >= 1.13.1
- CUDA >= 11.6
Installation
# Clone the repository
git clone https://github.com/ZhihaoZhang97/RU-AI.git
cd RU-AI
# Create virtual environment
conda create -n ruai python=3.8
conda activate ruai
# Install dependencies
pip3 install -r requirements.txt
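After installing, a quick sanity check with plain PyTorch calls confirms that your environment meets the version and GPU requirements above:

import torch

print(torch.__version__)          # expect >= 1.13.1
print(torch.version.cuda)         # expect >= 11.6
print(torch.cuda.is_available())  # True if the GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))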
Usage
Model Inference
# See infer_languagebind_model.py in the GitHub repository
python infer_languagebind_model.py
Before running inference, you need to:
- Download the dataset or prepare your own data
- Update the data paths in infer_languagebind_model.py: image_data_paths, audio_data_paths, and text_data (illustrated below)
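A hypothetical illustration of those three variables, assuming they hold lists of file paths and raw strings; the script itself is the authoritative reference for the expected format:

# Hypothetical values; consult infer_languagebind_model.py for the real format.
image_data_paths = ["./data/images/real_0001.jpg", "./data/images/generated_0001.jpg"]
audio_data_paths = ["./data/audio/real_0001.wav", "./data/audio/generated_0001.wav"]
text_data = ["A human-written caption.", "A machine-generated caption."]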
Quick Start with Sample Data
# Download Flickr8k sample data
python ./download_flickr.py
# Or download the full dataset (157GB compressed, 500GB uncompressed)
python ./download_all.py
Model Performance
The model detects AI-generated content across multiple modalities with a single classifier, leveraging LanguageBind's language-based semantic alignment, which maps text, image, and audio into a shared embedding space.
For detailed performance metrics and evaluation results, please refer to the paper.
Limitations
- The model's performance depends on the quality and diversity of training data
- May not generalize well to AI models or techniques not represented in the training set
- Detection accuracy may vary across different modalities
- Requires significant computational resources for inference
Ethical Considerations
This model is intended for research and legitimate content verification purposes. Users should:
- Consider privacy implications when analyzing user-generated content
- Be aware of potential biases in training data
- Use the model responsibly; do not use it to censor content without human oversight
- Understand that detection is probabilistic and may produce false positives/negatives
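On the last point, a minimal sketch of score-based handling, assuming the model emits two-class logits (the threshold and names are illustrative, not part of the released code):

import torch
import torch.nn.functional as F

logits = torch.tensor([[0.3, 1.1]])          # stand-in two-class model output
p_machine = F.softmax(logits, dim=-1)[0, 1]  # probability of "machine-generated"

# Surface the score and route uncertain cases to a human reviewer
# rather than acting automatically on a hard label.
if p_machine > 0.9:
    print(f"Likely machine-generated (p={p_machine:.2f}); flag for review")
else:
    print(f"Inconclusive (p={p_machine:.2f}); treat as unverified")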
Citation
If you use this model in your research, please cite:
@misc{huang2024ruai,
      title={RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection},
      author={Liting Huang and Zhihao Zhang and Yiran Zhang and Xiyue Zhou and Shoujin Wang},
      year={2024},
      eprint={2406.04906},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Acknowledgments
This work builds upon:
- LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
- ImageBind: One Embedding Space To Bind Them All
We appreciate the open-source community for the datasets and models that made this work possible.
License
Please refer to the GitHub repository for license information.
Contact
For questions and issues:
- Open an issue on the GitHub repository
- Refer to the paper for contact information of the authors