For full details, check out the DR Tulu paper.

DR Tulu 8B - MLX 4-bit Quantized

This is DR Tulu 8B converted to MLX 4-bit quantized format for efficient inference on Apple Silicon hardware. This variant is optimized for speed with minimal memory requirements while preserving DR-Tulu's signature reasoning capabilities.

MLX Model Variants - Complete Collection

Choose the best variant for your hardware and performance needs:

| Model | Precision | Model Size | Bits/Weight | Memory Usage | Performance | Repository |
|-------|-----------|------------|-------------|--------------|-------------|------------|
| DR-Tulu-8B-MLX-4bit | 4-bit quantized | 4.3GB | 4.500 | 4.9GB | 78.2 tok/s | Plurigrid/DR-Tulu-8B-MLX-4bit |
| DR-Tulu-8B-MLX-6bit | 6-bit quantized | 6.2GB | 6.500 | 6.9GB | 60.7 tok/s | Plurigrid/DR-Tulu-8B-MLX-6bit |
| DR-Tulu-8B-MLX-8bit | 8-bit quantized | 8.1GB | 8.500 | 8.8GB | 59.8 tok/s | Plurigrid/DR-Tulu-8B-MLX-8bit |
| DR-Tulu-8B-MLX-bf16 | bfloat16 (full) | 15.3GB | ~16.000 | 16.4GB | 35.0 tok/s | Plurigrid/DR-Tulu-8B-MLX-bf16 |

Why Choose 4-bit?

  • Optimized Performance: 78.2 tokens/sec (2.2x faster than bf16)
  • Minimal Memory: 4.9GB RAM usage (3.4x less than bf16)
  • Device Compatibility: Runs on 8GB+ Apple Silicon devices
  • Preserved Reasoning: Full DR-Tulu <think> capabilities intact

Quick Start

Command Line Interface

```bash
# Interactive chat (recommended)
uvx --from mlx-lm mlx_lm.chat --model Plurigrid/DR-Tulu-8B-MLX-4bit

# Generate text
uvx --from mlx-lm mlx_lm.generate --model Plurigrid/DR-Tulu-8B-MLX-4bit --prompt "What is category theory?" --max-tokens 500
```

Python API

```python
from mlx_lm import load, generate

# Load the 4-bit quantized model
model, tokenizer = load("Plurigrid/DR-Tulu-8B-MLX-4bit")

prompt = "Explain quantum computing step by step."

# Apply chat template if available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# Generate response with DR-Tulu reasoning
response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)
```

Installation

```bash
# Install MLX-LM
pip install mlx-lm
# or with uv
uv add mlx-lm
```

Hardware Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| Platform | Apple Silicon (M1/M2/M3/M4/M5) | M1 Pro/Max or newer |
| Memory | 8GB unified memory | 16GB+ unified memory |
| Storage | 5GB free space | 10GB+ free space |
| OS | macOS 12+ | macOS 14+ (Sonoma) |

Tested Configuration: Mac Studio M1 Ultra (20-core CPU, 128GB unified memory), macOS Sequoia 15.2

Technical Specifications

4-bit Quantization Details:

  • Quantization Method: MLX native affine quantization
  • Effective Bits: 4.500 bits per weight
  • Group Size: 128 (default)
  • Conversion Command: mlx_lm.convert --quantize --q-bits 4
  • Quality Preservation: Excellent (maintains reasoning patterns)
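
As a back-of-envelope check, the effective bits-per-weight figure roughly predicts the on-disk size. This is a sketch; the ~8.2B parameter count assumed for the Qwen3-8B base is an approximation, not a figure from this card:

```python
# Rough on-disk size estimate from effective bits per weight.
# ~8.2e9 parameters is an assumed figure for the Qwen3-8B base model.
params = 8.2e9
bits_per_weight = 4.5  # 4-bit weights plus per-group scale/bias overhead

size_bytes = params * bits_per_weight / 8
size_gib = size_bytes / 2**30

print(f"~{size_gib:.1f} GiB")  # ~4.3 GiB, close to the reported model size
```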

Performance Metrics:

  • Inference Speed: 78.2 tokens/second
  • Memory Efficiency: 4.9GB peak usage
  • Model Loading: ~3-5 seconds
  • Quality: Preserves DR-Tulu's signature <think> reasoning

About DR Tulu

The underlying model is the RL checkpoint of DR Tulu, an open deep research agent trained on top of rl-research/DR-Tulu-SFT-8B.

Key Capabilities:

  • Step-by-step reasoning with visible <think> tags
  • Research-grade analysis and problem-solving
  • Tool-use and multi-turn conversations
  • Mathematical and scientific reasoning
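
Since the model emits its reasoning inside <think> tags, downstream code often wants to separate the reasoning trace from the final answer. A minimal sketch; the helper name and the sample string are illustrative, not part of any DR Tulu API:

```python
import re

def split_think(text: str) -> tuple[str, str]:
    """Split model output into (reasoning, answer) around <think>...</think>."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Illustrative output shape, not a real model transcript:
sample = "<think>2 + 2 is basic arithmetic.</think>The answer is 4."
reasoning, answer = split_think(sample)
print(answer)  # The answer is 4.
```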

For more details on DR Tulu, please read our paper!

Evaluation Results

Results from the original DR-Tulu-8B model (quality preserved in 4-bit variant):

| Benchmark | SQAv2 | HealthBench | ResearchQA | DeepResearch Bench | SimpleQA | 2Wiki | WebWalker | Average |
|-----------|-------|-------------|------------|--------------------|----------|-------|-----------|---------|
| DR-Tulu-8B | 86.7 | 43.7 | 71.1 | 41.8 | 80.1 | 68.0 | 39.1 | 61.5 |
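
The Average column is the unweighted mean over the seven benchmarks; a quick check with the reported scores:

```python
# Benchmark scores for DR-Tulu-8B from the table above.
scores = {
    "SQAv2": 86.7, "HealthBench": 43.7, "ResearchQA": 71.1,
    "DeepResearch Bench": 41.8, "SimpleQA": 80.1,
    "2Wiki": 68.0, "WebWalker": 39.1,
}
average = sum(scores.values()) / len(scores)
print(round(average, 1))  # 61.5
```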

Advanced Usage

Multi-turn Conversation

```python
messages = [
    {"role": "user", "content": "What is category theory?"},
    {"role": "assistant", "content": "Category theory is a mathematical framework..."},
    {"role": "user", "content": "How does it apply to computer science?"}
]

if tokenizer.chat_template is not None:
    formatted_prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    response = generate(model, tokenizer, prompt=formatted_prompt, max_tokens=1000)
    print(response)
```
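
For longer sessions it helps to keep the history in one place and re-apply the chat template on every turn. A minimal sketch of that loop; `ask` is a hypothetical helper, and the generation call is injected so the pattern runs independent of the model:

```python
def ask(history, user_message, generate_fn):
    """Append a user turn, generate a reply, and record it in the history."""
    history.append({"role": "user", "content": user_message})
    reply = generate_fn(history)  # e.g. apply chat template + mlx_lm generate
    history.append({"role": "assistant", "content": reply})
    return reply

# Stubbed generator so the pattern runs without loading the model:
echo = lambda history: f"({len(history)} messages so far)"
history = []
ask(history, "What is category theory?", echo)
ask(history, "How does it apply to computer science?", echo)
print(len(history))  # 4: two user turns and two assistant replies
```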

Research-Style Analysis

research_prompt = """
Analyze the relationship between quantum mechanics and information theory.
Think step by step and provide a comprehensive analysis.
"""

response = generate(model, tokenizer, prompt=research_prompt, max_tokens=1500, verbose=True)

License & Usage

This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's Responsible Use Guidelines.

4-bit Specific Considerations:

  • Optimized for Apple Silicon hardware only
  • Excellent quality preservation with 4.500 bits per weight
  • Fastest inference in the MLX model series
  • Ideal for real-time applications and resource-constrained environments

Citation

```bibtex
@article{drtulu,
  title  = {{DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research}},
  author = {Rulin Shao and Akari Asai and Shannon Shen and Hamish Ivison and Varsha Kishore and Jingming Zhuo and Xinran Zhao and Molly Park and Sam Finlayson and David Sontag and Tyler Murray and Sewon Min and Pradeep Dasigi and Luca Soldaini and Faeze Brahman and Scott Yih and Sherry Tongshuang Wu and Luke Zettlemoyer and Yoon Kim and Hanna Hajishirzi and Pang Wei Koh},
  year   = {2025},
}
```

Conversion Details

  • Conversion Date: November 22, 2024
  • Converter: MLX community via Plurigrid
  • Command: `uvx --from mlx-lm mlx_lm.convert --hf-path rl-research/DR-Tulu-8B --mlx-path ./DR-Tulu-8B-4bit --quantize --q-bits 4`
  • Framework Version: mlx-lm latest (November 2024)
  • Validation: Tested with 1069-token generation maintaining quality