neikos00's picture
Create README.md
0b09a4b verified
|
raw
history blame
1.06 kB
metadata
library_name: transformers
license: mit
tags:
  - vision
  - image-segmentation
  - pytorch

EoMT

PyTorch

EoMT (Encoder-only Mask Transformer) is a Vision Transformer (ViT) architecture designed for high-quality and efficient image segmentation. It was introduced in the CVPR 2025 highlight paper:
Your ViT is Secretly an Image Segmentation Model
by Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans, Narges Norouzi, Giuseppe Averta, Bastian Leibe, Gijs Dubbelman, and Daan de Geus.

Key Insight: Given sufficient scale and pretraining, a plain ViT along with additional few params can perform segmentation without the need for task-specific decoders or pixel fusion modules. The same model backbone supports semantic, instance, and panoptic segmentation with different post-processing 🤗

The original implementation can be found in this repository