# AMOE: Agglomerative Mixture-of-Experts Vision Foundation Model

AMOE is a Mixture-of-Experts (MoE) vision encoder distilled from DINOv3 and SigLIP2 teachers, supporting multi-resolution image understanding.
## Installation

```bash
pip install torch transformers einops pillow
```
## Quick Start

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoImageProcessor

# Load model and processor
model_id = "tiiuae/amoe"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).to("cuda", dtype=torch.bfloat16)
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)

# Preprocess image
image = Image.open("image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt").to("cuda")
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)

# Inference
with torch.no_grad():
    outputs = model(**inputs)

# Access specialized features
# Options: 'amoe' (768d), 'siglip2' (1152d), 'dinov3' (1024d)
patch_features = outputs["patch_features"]["amoe"]         # (Batch, Tokens, 768)
summary_features = outputs["summary_features"]["siglip2"]  # (Batch, 1152)
```
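The patch features can be used directly for dense tasks. As a minimal sketch (assuming only the `(Batch, Tokens, 768)` shape shown above, and assuming the patch tokens form a square grid, which depends on the model's patch size and the processor's output resolution), here is how one might build a cosine-similarity heatmap between a reference patch and all other patches:

```python
import math
import torch.nn.functional as F

# Unit-normalize patch embeddings so dot products become cosine similarities
patches = F.normalize(patch_features.float(), dim=-1)      # (B, N, 768)

# Pick a reference patch (here: the first patch token, chosen arbitrarily)
ref = patches[:, 0:1, :]                                    # (B, 1, 768)

# Cosine similarity between the reference patch and every patch
similarity = (patches @ ref.transpose(1, 2)).squeeze(-1)    # (B, N)

# Reshape to a 2D heatmap, assuming N is a perfect square (an assumption)
grid = int(math.isqrt(similarity.shape[1]))
heatmap = similarity[:, : grid * grid].reshape(-1, grid, grid)
```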
## Citation

If you use AMoE in your research, please cite:

```bibtex
@article{chaybouti2025amoe,
  title={AMOE: Agglomerative Mixture-of-Experts Vision Foundation Models},
  author={Chaybouti, Sofian and Narayan, Sanath and Dahou, Yasser and Le Khac, Phuc H. and Singh, Ankit and Huynh, Ngoc Dung and Para, Wamiq Reyaz and Kuehne, Hilde and Hacid, Hakim},
  journal={arXiv preprint arXiv:2512.20157},
  year={2025}
}
```