DiffSeg30k: A Multi-Turn Diffusion Editing Benchmark for Localized AIGC Detection
Abstract
DiffSeg30k, a dataset of 30k diffusion-edited images, supports fine-grained detection of AI-generated content through semantic segmentation.
Diffusion-based editing enables realistic modification of local image regions, making AI-generated content harder to detect. Existing AIGC detection benchmarks focus on classifying entire images, overlooking the localization of diffusion-based edits. We introduce DiffSeg30k, a publicly available dataset of 30k diffusion-edited images with pixel-level annotations, designed to support fine-grained detection. DiffSeg30k features: 1) In-the-wild images: we collect images or image prompts from COCO to reflect real-world content diversity; 2) Diverse diffusion models: local edits are performed with eight state-of-the-art (SOTA) diffusion models; 3) Multi-turn editing: each image undergoes up to three sequential edits to mimic real-world editing workflows; and 4) Realistic editing scenarios: a vision-language model (VLM)-based pipeline automatically identifies meaningful regions and generates context-aware prompts covering object additions, removals, and attribute changes. DiffSeg30k recasts AIGC detection from binary classification as semantic segmentation, enabling simultaneous localization of edits and identification of the editing models. We benchmark three baseline segmentation approaches, revealing significant challenges in the segmentation task, particularly robustness to image distortions. Experiments also show that segmentation models, despite being trained for pixel-level localization, serve as highly reliable whole-image classifiers of diffusion edits, outperforming established forgery classifiers while showing strong potential for cross-generator generalization. We believe DiffSeg30k will advance research in fine-grained localization of AI-generated content by demonstrating both the promise and the limitations of segmentation-based methods. DiffSeg30k is released at: https://huggingface.co/datasets/Chaos2629/Diffseg30k
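To make the task setup concrete, below is a minimal sketch of loading the dataset and deriving both targets the abstract describes: pixel-level localization/attribution from the edit mask, and an aggregated whole-image classification score. The split name, field names ("image", "mask"), and mask encoding (0 = authentic pixel, k > 0 = pixel edited by diffusion model k) are assumptions for illustration; consult the dataset card for the actual schema.

```python
# A minimal sketch, assuming a label-map mask schema -- not the official loader.
import numpy as np
from datasets import load_dataset

# Repo ID is from the paper; the "train" split name is an assumption.
ds = load_dataset("Chaos2629/Diffseg30k", split="train")

sample = ds[0]
mask = np.array(sample["mask"])  # assumed per-pixel label map

# Localization target: which pixels were diffusion-edited (any non-zero label).
edited = mask > 0

# Whole-image classification: the paper finds segmentation models double as
# strong image-level detectors; a simple aggregation is the edited-pixel
# fraction, thresholded to a binary "contains a diffusion edit" decision.
edited_fraction = edited.mean()
is_edited = edited_fraction > 0.0  # any edited pixel flags the image

# Attribution target: which of the eight diffusion models produced the edits.
editing_models = np.unique(mask[edited])
print(edited_fraction, is_edited, editing_models)
```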
Community
30k edited images from 8 diffusion models, up to 3-turn edits, with pixel-accurate masks for fine-grained AIGC localization + attribution.
Dataset: https://huggingface.co/datasets/Chaos2629/Diffseg30k
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- ManipShield: A Unified Framework for Image Manipulation Detection, Localization and Explanation (2025)
- UniAIDet: A Unified and Universal Benchmark for AI-Generated Image Content Detection and Localization (2025)
- Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline (2025)
- Data Factory with Minimal Human Effort Using VLMs (2025)
- OmniDFA: A Unified Framework for Open Set Synthesis Image Detection and Few-Shot Attribution (2025)
- Mirage: Unveiling Hidden Artifacts in Synthetic Images with Large Vision-Language Models (2025)
- EditTrack: Detecting and Attributing AI-assisted Image Editing (2025)