Llama3.1-deep-o1
- A robust merge of DeepSeek R1 distilled and O1-style long chain-of-thought (CoT) large language models (LLMs).
- It generates long, coherent solutions and excels at problem-solving tasks among 8-billion-parameter models.
Model Overview
- Supports long reasoning and non-reasoning modes.
- In reasoning mode, it generates its thought process before giving the final answer.
- Suitable for creating solution outlines and analyzing problems, or as a foundation model for further fine-tuning and merging.
To enable the long CoT reasoning mode, use a system prompt such as:
Explain your reasoning step-by-step using <think>...</think>, then give the final answer inside <response>...</response>.
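A minimal inference sketch with Hugging Face `transformers` is shown below; the generation settings (token budget, sampling temperature) are illustrative assumptions rather than recommended values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "agentlans/Llama3.1-deep-o1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {
        "role": "system",
        "content": (
            "Explain your reasoning step-by-step using <think>...</think>, "
            "then give the final answer inside <response>...</response>."
        ),
    },
    {"role": "user", "content": "Integrate x^2 e^x dx."},
]

# Apply the Llama 3.1 chat template and generate.
# max_new_tokens and sampling settings are illustrative; long CoT outputs
# may need a generous token budget.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(
    input_ids, max_new_tokens=2048, do_sample=True, temperature=0.6
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The reasoning trace appears inside the `<think>...</think>` tags and the final answer inside `<response>...</response>`; omitting the system prompt should yield the non-reasoning mode.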
Examples to Try:
- Write the equations for glycolysis and pyruvate oxidation.
- Calculate net ATP formation from glucose metabolism (excluding the electron transport chain).
- Integrate x^2 e^x dx.
- Prove that the complete bipartite graph K_{3,3} isn't planar.
- Derive a formula for the critical angle between two media with refractive indices n_1 and n_2.
- Compare steam vs. diesel engines including their capabilities and historical significance.
Limitations
This model is experimental. While it provides coherent, expert-like responses, users should verify its outputs for accuracy, especially in calculations and logical reasoning tasks.
- It is not optimized for conversational tasks but performs well in single-turn question answering.
- Inconsistent formatting for mathematical equations and LaTeX code.
- May produce inaccurate facts, calculation errors, or reasoning mistakes.
- Struggles with multi-turn conversations and alignment with user intent.
Model Details
The model was created using the following Mergekit YAML configuration:
```yaml
models:
  - model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  - model: Skywork/Skywork-o1-Open-Llama-3.1-8B
  - model: SimpleBerry/LLaMA-O1-Supervised-1129
  - model: NousResearch/DeepHermes-3-Llama-3-8B-Preview
  - model: O1-OPEN/OpenO1-LLama-8B-v0.1
  - model: nvidia/Llama-3.1-Nemotron-Nano-8B-v1
merge_method: karcher
tokenizer:
  source: meta-llama/Llama-3.1-8B-Instruct
dtype: bfloat16
```
The merged model was then fine-tuned for 1 epoch on a 10K-example subset of agentlans/train-of-thought using LLaMA Factory, with the following settings (a configuration sketch follows this list):
- LoRA: rank 8, alpha 16, dropout 0.5, with rsLoRA enabled
- Sequence packing and NEFTune (noise alpha 5)
- Liger kernel acceleration
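For reference, a LLaMA Factory SFT configuration matching these settings might look roughly like the sketch below. The key names follow recent LLaMA Factory releases; the `dataset` entry assumes the train-of-thought subset has been registered in `dataset_info.json`, and values not stated above (cutoff length, batch size, learning rate) are illustrative placeholders, not the actual training hyperparameters.

```yaml
### model
model_name_or_path: path/to/merged-karcher-model  # the mergekit output above

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
lora_rank: 8
lora_alpha: 16
lora_dropout: 0.5
use_rslora: true

### dataset
dataset: train_of_thought_10k   # assumes the subset is registered in dataset_info.json
template: llama3
cutoff_len: 4096                # illustrative
packing: true
neftune_noise_alpha: 5

### train
num_train_epochs: 1.0
per_device_train_batch_size: 1  # illustrative
learning_rate: 1.0e-4           # illustrative
enable_liger_kernel: true
output_dir: saves/llama3.1-deep-o1
```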
License
Llama 3.1 license