Clarification on `rope_theta` in the checkpoints
Hi OLMo team and community,
First, thank you for openly sharing the OLMo-2-0325-32B model and the accompanying technical report. Your transparency is invaluable to the research community.
In the report, you mention increasing RoPE theta from 10,000 to 500,000. When I inspect the model_config.json of the stage1-step1000-tokens9B checkpoint, I see:
"rope_theta": 500000
Could you clarify whether this checkpoint was indeed trained with rope_theta=500,000? If so, is it safe to run inference directly with this configuration, or are there any additional considerations we should keep in mind?
Thanks again for your outstanding open-source contribution. I look forward to your guidance!
Best regards,
Minjun Kim
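For readers who want to repeat the check described in the question, here is a minimal sketch that parses a config file's JSON text and reads the `rope_theta` field. The helper name `read_rope_theta` and the inline sample string are hypothetical stand-ins; only the `rope_theta` key and its value of 500000 come from the post above.

```python
import json


def read_rope_theta(config_text: str) -> float:
    """Return the rope_theta value stored in a JSON config string."""
    config = json.loads(config_text)
    if "rope_theta" not in config:
        raise KeyError("config has no 'rope_theta' field")
    return float(config["rope_theta"])


# Hypothetical stand-in for the contents of the checkpoint's config file;
# a real file would contain many more fields.
sample_config = '{"rope_theta": 500000}'

theta = read_rope_theta(sample_config)
print(f"rope_theta = {theta:,.0f}")
```

To use it on a downloaded checkpoint, read the file's text first and pass it in; the file path depends on where the checkpoint was saved.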
Hello, thanks for the kind words and the question!
Can I ask where you're seeing the mention of increasing RoPE theta from 10,000 to 500,000? I'm wondering if you're reading it from the RoPE theta section of this paper, https://arxiv.org/pdf/2501.00656, which says (page 6):
RoPE theta = 5e5: We increase the RoPE to 500,000 from 10,000. This approach increases the resolution of positional encoding, matching Grattafiori et al. (2024).
This section explains how the OLMo2 models differ from the previous iteration, where rope_theta was 10,000. For OLMo2, rope_theta should be 500,000. And yup, it should be safe to run inference with this config!
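To illustrate what the change in theta does, here is a rough sketch that compares the rotation wavelengths under the two values. It assumes the standard RoPE frequency formula, inv_freq_i = theta^(-2i/d), and a head dimension of 128 chosen purely for illustration; neither is taken from the OLMo 2 code or config.

```python
import math


def rope_wavelengths(theta: float, head_dim: int) -> list[float]:
    """Wavelength, in token positions, of each RoPE frequency pair.

    Assumes the standard formula inv_freq_i = theta ** (-2i / head_dim)
    for i = 0 .. head_dim/2 - 1, so wavelength_i = 2*pi / inv_freq_i.
    """
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    return [
        2 * math.pi * theta ** (2 * i / head_dim)
        for i in range(head_dim // 2)
    ]


HEAD_DIM = 128  # assumed for illustration, not read from the model config

old = rope_wavelengths(10_000.0, HEAD_DIM)    # theta of the previous iteration
new = rope_wavelengths(500_000.0, HEAD_DIM)   # theta used for OLMo2

print(f"shortest wavelength (same for both): {old[0]:.2f} positions")
print(f"longest wavelength, theta=10,000:  {old[-1]:,.0f} positions")
print(f"longest wavelength, theta=500,000: {new[-1]:,.0f} positions")
```

Under these assumptions the fastest-rotating pair is unchanged, while every slower pair gets a longer wavelength with the larger theta, so positions that are far apart map to more distinct rotation angles.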
Hi OLMo team,
Thanks so much for the quick clarification!
I realize I’d misunderstood the note about RoPE theta. I initially thought theta started at 10,000 during early training and was later increased to 500,000 within OLMo-2. Your explanation makes it clear that 500,000 is the setting for OLMo-2, and the 10,000 figure refers to earlier iterations. Got it.
Appreciate the help and the great work you’re doing.
Thank you!