Model Card for Marin 32B

This is the model card for the Marin 32B base model. The Marin Project is a collaborative effort to develop open-source foundation models.

Datasets

Datasets used in Marin 32B Base

Marin 32B Base was trained in multiple phases that reused our 8B recipe and introduced new high-quality cooldown data, including Markdownified versions of StackExchange, Wikipedia, and arXiv, as well as the Datashop Science QA dataset.

The Markdownified datasets are licensed under the original licenses of the individual documents. Please refer to StackExchange, Wikipedia, and arXiv for more information.

The Datashop Science QA dataset is licensed under CC-BY-SA 4.0.

Checkpoints

Base Model Checkpoints

Main Page: marin-community/marin-32b-base

The main branch currently points to the mantis revision.
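
To pin a particular cooldown instead of tracking main, you can pass a revision name to from_pretrained. A minimal sketch (the revision tag is assumed to match the names published in the repository):

from transformers import AutoModelForCausalLM

# Load the "mantis" revision explicitly rather than whatever "main" currently points to.
model = AutoModelForCausalLM.from_pretrained(
    "marin-community/marin-32b-base",
    revision="mantis",
)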

Installation

Marin 32B follows a Llama-style transformer architecture with QK-Norm attention (matching the Qwen3 32B backbone) and works out-of-the-box with the Hugging Face Transformers library and other libraries that support Llama/Qwen-style causal language models.

We use the stanford-crfm/marin-tokenizer tokenizer.
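
A minimal setup sketch (assumes you install transformers, plus accelerate if you use device_map as in the inference example below; the exact package set is an assumption, not something this card pins):

# Assumed prerequisites: pip install transformers accelerate torch
from transformers import AutoTokenizer

# The bundled tokenizer can also be loaded directly from its own repository.
tokenizer = AutoTokenizer.from_pretrained("stanford-crfm/marin-tokenizer")
print(tokenizer.vocab_size)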

Inference

You can use Marin 32B with the standard Hugging Face Transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (device_map="auto" requires the accelerate package).
marin = AutoModelForCausalLM.from_pretrained("marin-community/marin-32b-base", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("marin-community/marin-32b-base")

# Tokenize a prompt and move it onto the model's device.
message = ["The Marin wind is"]
inputs = tokenizer(message, return_tensors="pt", return_token_type_ids=False).to(marin.device)

# Sample a continuation and decode it back to text.
response = marin.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

Model Description

  • Developed by: The Marin community team.
  • Model type: Transformer-style autoregressive language model.
  • Knowledge Cutoff: To the best of our knowledge, the base model has no data from later than July 2024.
  • Language(s) (NLP): English
  • License: The code and model are released under Apache 2.0.
  • Contact: dlwh at stanford.edu

Model Sources

Evaluation

We evaluate with EleutherAI's LM Eval Harness defaults across a standard suite. Numbers may differ from those reported in other model cards or produced by other evaluation harnesses because of prompt and formatting differences. “Average” is a simple mean over the tasks shown.

| Model | Average | AGI Eval LSAT-AR | ARC Easy | ARC Challenge | BoolQ | CommonSense QA | COPA | HellaSwag | lambada_openai | OpenBookQA | PIQA | WinoGrande | WSC | MMLU | GPQA | BBH | MMLU Pro | HumanEval | GSM8K | MATH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Marin 32B (Mantis) | 65.2 | 24.8 | 88.0 | 65.7 | 89.4 | 82.8 | 93.0 | 86.9 | 77.2 | 46.4 | 85.9 | 79.3 | 79.5 | 74.7 | 34.0 | 59.6 | 45.1 | 42.7 | 69.1 | 15.3 |
| Marin 32B (Bison) | 63.0 | 23.4 | 87.8 | 65.8 | 88.9 | 82.3 | 94.0 | 86.6 | 77.4 | 46.6 | 86.1 | 78.6 | 82.4 | 72.9 | 32.1 | 55.2 | 41.9 | 29.3 | 54.7 | 10.4 |
| OLMo 2 32B Base | 63.2 | 22.6 | 85.9 | 61.9 | 83.0 | 78.6 | 93.0 | 85.9 | 78.3 | 47.2 | 83.1 | 78.9 | 86.8 | 71.9 | 32.2 | 56.1 | 42.0 | 23.8 | 76.4 | 12.7 |
| Qwen 2.5 32B Base | 68.1 | 30.4 | 80.8 | 55.9 | 87.7 | 88.5 | 87.0 | 84.1 | 77.6 | 44.4 | 82.4 | 75.7 | 81.0 | 80.8 | 39.0 | 67.4 | 57.9 | 48.8 | 89.3 | 36.3 |
| Gemma 3 27B PT | 65.1 | 22.2 | 88.2 | 65.4 | 87.1 | 73.4 | 93.0 | 83.0 | 78.1 | 45.0 | 84.1 | 79.0 | 91.9 | 75.3 | 35.7 | 61.4 | 49.4 | 17.6 | 82.0 | 25.8 |
| NVIDIA Nemotron Nano 12B v2 Base | 68.6 | 28.7 | 83.6 | 60.6 | 84.8 | 76.1 | 85.0 | 81.4 | 72.9 | 45.8 | 82.8 | 74.4 | 85.4 | 77.9 | 36.6 | 62.0 | 53.1 | 59.2 | 84.1 | 68.3 |

The Mantis cooldown improves coding (HumanEval) and math (GSM8K, MATH) performance dramatically compared with the earlier Bison cooldown while maintaining competitive accuracy across general-language benchmarks.
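
For rough reproduction, a sketch of invoking EleutherAI's harness through its Python API follows; the task names and arguments shown are illustrative assumptions, not our exact configuration:

import lm_eval

# Run a few harness tasks against the base model with default prompts.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=marin-community/marin-32b-base",
    tasks=["arc_easy", "arc_challenge", "hellaswag"],
)
print(results["results"])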

Model Details

Please see our technical retrospective for more details on the pretraining process.

Architecture Details

  • Architecture: Qwen3-style 32B with QK-Norm attention
  • Hidden size: 5120
  • Feedforward size: 27648
  • Number of layers: 64
  • Number of attention heads: 40
  • Number of KV heads: 8
  • Sequence length: 4096
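
These values are also visible in the published checkpoint's configuration. A quick sketch for checking them (field names assume the standard Llama/Qwen-style config keys used by Transformers):

from transformers import AutoConfig

# Inspect the architecture hyperparameters from the hosted config.
config = AutoConfig.from_pretrained("marin-community/marin-32b-base")
print(config.hidden_size, config.intermediate_size)
print(config.num_hidden_layers, config.num_attention_heads, config.num_key_value_heads)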

Tokenizer Details

Marin 32B uses the stanford-crfm/marin-tokenizer. It has the same vocabulary as Llama 3 but bundles a chat template into the base tokenizer for convenience.
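
Because the chat template ships with the base tokenizer, conversational prompts can be formatted without a separate chat repository. A minimal sketch (the message content is illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stanford-crfm/marin-tokenizer")

# Render a conversation with the bundled chat template into a plain prompt string.
messages = [{"role": "user", "content": "Summarize the Marin Project in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)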

Training Phases

Bias, Risks, and Limitations

Like any base language model, or any fine-tuned model without safety filtering, Marin 32B can be prompted to generate harmful or sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so users should weigh these risks for their use case. Additionally, many statements from Marin, as from any LLM, can be inaccurate, so outputs should be verified.

Marin 32B has not undergone safety tuning or evaluation. We strongly recommend using this model with caution; in particular, it is not intended for fully autonomous use.

Model Card Contact

For errors in this model card, please open an issue in this repository. For technical inquiries, please contact dlwh at stanford.edu.

Acknowledgements

The compute for this model was generously provided by Google's TPU Research Cloud.
