MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
Paper: 2509.25531
Formerly known as MDEL, we have renamed ourselves after Aurora-M, the model we deployed. Visit us here: https://huggingface.co/aurora-m