Abstract
AION-1, a family of large-scale multimodal foundation models, integrates diverse astronomical data using tokenization and transformer-based modeling, achieving strong performance across various downstream tasks.
While foundation models have shown promise across a variety of fields, astronomy still lacks a unified framework for joint modeling across its highly diverse data modalities. In this paper, we present AION-1, a family of large-scale multimodal foundation models for astronomy. AION-1 integrates heterogeneous imaging, spectroscopic, and scalar data using a two-stage architecture: modality-specific tokenization followed by transformer-based masked modeling of cross-modal token sequences. The model is pretrained on five large-scale surveys: Legacy Survey, Hyper Suprime-Cam (HSC), Sloan Digital Sky Survey (SDSS), Dark Energy Spectroscopic Instrument (DESI), and Gaia. Together these span more than 200 million observations of stars, galaxies, and quasars. With a single frozen encoder, AION-1 achieves strong results on a broad suite of downstream tasks, including galaxy and stellar property estimation, galaxy morphology classification, similarity-based retrieval, galaxy image segmentation, and spectral super-resolution. We release AION-1 model variants ranging from 300M to 3.1B parameters. Beyond astronomy, AION-1 provides a scalable blueprint for multimodal scientific foundation models that can seamlessly integrate noisy, instrument-specific observations. All code, tokenizers, pretrained weights, and a lightweight evaluation suite are released under an open-source license.
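The two-stage design described above (modality-specific tokenization, then masked modeling over a concatenated cross-modal token sequence) can be illustrated with a minimal sketch. All names, codebook sizes, and token counts below are hypothetical stand-ins for illustration, not AION-1's actual tokenizers or configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality codebook sizes (illustrative only).
CODEBOOK = {"image": 1024, "spectrum": 512, "scalar": 256}
MASK_ID = -1  # sentinel for masked positions

def tokenize(modality, n_tokens):
    """Stand-in for a modality-specific tokenizer: maps an observation
    to a sequence of discrete token ids from that modality's codebook."""
    return rng.integers(0, CODEBOOK[modality], size=n_tokens)

def build_masked_sequence(obs, mask_frac=0.4):
    """Concatenate per-modality tokens into one cross-modal sequence and
    mask a random subset. Returns (masked inputs, modality tags,
    original targets, boolean mask of positions to predict)."""
    tokens, tags = [], []
    for modality, n in obs.items():
        tokens.append(tokenize(modality, n))
        tags += [modality] * n
    targets = np.concatenate(tokens)
    mask = rng.random(targets.size) < mask_frac
    inputs = np.where(mask, MASK_ID, targets)
    return inputs, tags, targets, mask

# One "observation" with 16 image, 8 spectrum, and 4 scalar tokens.
inputs, tags, targets, mask = build_masked_sequence(
    {"image": 16, "spectrum": 8, "scalar": 4}
)
# A transformer would consume `inputs` (plus modality embeddings derived
# from `tags`) and be trained to reconstruct `targets` at masked positions.
```

Because every modality is reduced to tokens from a shared sequence, the same masked-prediction objective covers any subset of observed modalities; the transformer itself never needs instrument-specific branches.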
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Universal Spectral Tokenization via Self-Supervised Panchromatic Representation Learning (2025)
- The Platonic Universe: Do Foundation Models See the Same Sky? (2025)
- Multi-modal Foundation Model for Cosmological Simulation Data (2025)
- Simulation-Based Pretraining and Domain Adaptation for Astronomical Time Series with Minimal Labeled Data (2025)
- ASTROCO: Self-Supervised Conformer-Style Transformers for Light-Curve Embeddings (2025)
- StarEmbed: Benchmarking Time Series Foundation Models on Astronomical Observations of Variable Stars (2025)
- AppleCiDEr II: SpectraNet -- A Deep Learning Network for Spectroscopic Data (2025)