ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models • Paper • 2505.12534 • Published May 18 • 3
ChemPile • Collection • The ChemPile is a dataset with over 77 billion curated multimodal tokens about chemistry. For more information, visit https://chempile.lamalab.org/. • 8 items • Updated 28 days ago • 16
ChemBench-Collection • Collection • Datasets, Spaces and Results related to ChemBench • 4 items • Updated Oct 3 • 4
MaCBench-collection • Collection • Dataset, Spaces, Results related to MaCBench • 7 items • Updated Sep 10 • 3
MatText: Do Language Models Need More than Text & Scale for Materials Modeling? • Paper • 2406.17295 • Published Jun 25, 2024 • 1
Probing the limitations of multimodal language models for chemistry and materials research • Paper • 2411.16955 • Published Nov 25, 2024 • 1
Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry • Paper • 2411.15221 • Published Nov 20, 2024 • 32