Gemstone-384x36

Gemstone-384x36 is part of the Gemstone Suite of Models, a set of models trained with varying widths and depths.

Training

We train using litgpt and AxoNN on AMD MI250X GPUs on Frontier at Oak Ridge National Laboratory, with a global batch size of 2048.

Data

Training and validation data are taken from non-overlapping subsets of dolma; as such, this is a base model, not an instruction-tuned model.
This model is trained for 350 billion tokens; we upload checkpoints every 2 billion tokens (477 steps).
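
As a quick consistency check, the 477-step checkpoint interval follows from the global batch size above if one assumes a sequence length of 2048 tokens (an assumption; the sequence length is not stated on this card):

```python
# Sketch: relate the 2-billion-token checkpoint interval to 477 steps.
# The sequence length of 2048 is an assumption, not stated on this card.
global_batch_size = 2048                       # sequences per step (see Training)
seq_len = 2048                                 # assumed tokens per sequence
tokens_per_step = global_batch_size * seq_len  # 4,194,304 tokens per step
print(round(2_000_000_000 / tokens_per_step))  # -> 477 steps per checkpoint
```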

Using Gemstone-384x36

The Gemstones are based on the gemma-2b architecture and use modeling_gemma.py to run with the transformers library.
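
For illustration, here is a minimal loading sketch using the transformers library; the repo id and the revision naming below are assumptions, not confirmed by this card:

```python
# Minimal sketch: load Gemstone-384x36 with the transformers library.
# The repo id and the "step_*" revision naming are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tomg-group-umd/Gemstone-384x36"  # hypothetical Hub repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # final checkpoint

# Intermediate checkpoints (uploaded every 2 billion tokens) would typically
# be selected via a revision, if the repo exposes them as branches, e.g.:
# model = AutoModelForCausalLM.from_pretrained(repo_id, revision="step_00000477")  # hypothetical

inputs = tokenizer("Scaling laws describe", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```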

Licence

This model is released under the apache-2.0 licence.

Contact

Please feel free to contact us with any questions, or open a discussion thread.

Citation

@article{mcleish2024gemstones,
    title={Gemstones: A Model Suite for Multi-Faceted Scaling Laws}, 
    author={Sean McLeish and John Kirchenbauer and David Yu Miller and Siddharth Singh and Abhinav Bhatele and Micah Goldblum and Ashwinee Panda and Tom Goldstein},
    journal={arXiv preprint arXiv:2502.},
    year={2025}
}