Llama3.1-deep-o1
- A robust merge of DeepSeek R1 distilled and O1-style long chain-of-thought (CoT) large language models (LLMs).
- It generates long, coherent solutions and excels at problem-solving tasks among 8-billion-parameter models.
Model Overview
- Supports long reasoning and non-reasoning modes.
- In reasoning mode, it generates its thought process before giving the final answer.
- Suitable for creating solution outlines and analyzing problems, or as a foundation model for further fine-tuning and merging.
To enable the long CoT reasoning mode, use a system prompt such as:
Explain your reasoning step-by-step using <think>...</think>, then give the final answer inside <response>...</response>.
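A minimal inference sketch with Hugging Face `transformers` is shown below; the generation settings (token budget, sampling temperature) are illustrative assumptions rather than recommended values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/Llama3.1-deep-o1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {
        "role": "system",
        "content": (
            "Explain your reasoning step-by-step using <think>...</think>, "
            "then give the final answer inside <response>...</response>."
        ),
    },
    {"role": "user", "content": "Integrate x^2 e^x dx."},
]

# Apply the Llama 3.1 chat template and generate.
# max_new_tokens and sampling settings are illustrative; long CoT outputs
# may need a generous token budget.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(
    input_ids, max_new_tokens=2048, do_sample=True, temperature=0.6
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The reasoning trace appears inside the `<think>...</think>` tags and the final answer inside `<response>...</response>`; omitting the system prompt should yield the non-reasoning mode.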
Examples to Try:
- Write the equations for glycolysis and pyruvate oxidation.
- Calculate net ATP formation from glucose metabolism (excluding the electron transport chain).
- Integrate x^2 e^x dx.
- Prove that the complete bipartite graph K_{3,3} isn't planar.
- Derive a formula for the critical angle between two media with refractive indices n_1 and n_2.
- Compare steam vs. diesel engines including their capabilities and historical significance.
Limitations
This model is experimental. While it provides coherent, expert-like responses, users should verify its outputs for accuracy, especially in calculations and logical reasoning tasks.
- It is not optimized for conversational tasks but performs well in single-turn question answering.
- Inconsistent formatting for mathematical equations and LaTeX code.
- May produce inaccurate facts, calculation errors, or reasoning mistakes.
- Struggles with multi-turn conversations and alignment with user intent.
Model Details
The model was created using the following Mergekit YAML configuration:
```yaml
models:
  - model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - model: Skywork/Skywork-o1-Open-Llama-3.1-8B
  - model: SimpleBerry/LLaMA-O1-Supervised-1129
  - model: NousResearch/DeepHermes-3-Llama-3-8B-Preview
  - model: O1-OPEN/OpenO1-LLama-8B-v0.1
  - model: nvidia/Llama-3.1-Nemotron-Nano-8B-v1
merge_method: karcher
tokenizer:
  source: meta-llama/Llama-3.1-8B-Instruct
dtype: bfloat16
```
The merged model was then fine-tuned for 1 epoch on a 10K-example subset of agentlans/train-of-thought using LLaMA Factory, with the following settings (a configuration sketch follows this list):
- LoRA: rank 8, alpha 16, dropout 0.5, with rsLoRA enabled
- Sequence packing and NEFTune (noise alpha 5)
- Liger kernel acceleration
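For reference, a LLaMA Factory SFT configuration matching these settings might look roughly like the sketch below. The key names follow recent LLaMA Factory releases; the `dataset` entry assumes the train-of-thought subset has been registered in `dataset_info.json`, and values not stated above (cutoff length, batch size, learning rate) are illustrative placeholders, not the actual training hyperparameters.

```yaml
### model
model_name_or_path: path/to/merged-karcher-model  # the mergekit output above

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_rank: 8
lora_alpha: 16
lora_dropout: 0.5
use_rslora: true

### dataset
dataset: train_of_thought_10k   # assumes the subset is registered in dataset_info.json
template: llama3
cutoff_len: 4096                # illustrative
packing: true
neftune_noise_alpha: 5

### train
num_train_epochs: 1.0
per_device_train_batch_size: 1  # illustrative
learning_rate: 1.0e-4           # illustrative
enable_liger_kernel: true
output_dir: saves/llama3.1-deep-o1
```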
License
Llama 3.1 license