The ChemPile is a curated, multimodal chemistry dataset with over 77 billion tokens. For more information, visit https://chempile.lamalab.org/.