---
license: apache-2.0
base_model: rl-research/DR-Tulu-8B
library_name: mlx
tags:
- mlx
- research
- reasoning
- tool-use
---

> For full information, check out the DR Tulu paper [here](https://allenai.org/papers/drtulu).

[![DR Tulu logo](https://huggingface.co/rl-research/DR-Tulu-SFT-8B/resolve/main/dr_tulu_logo.png)](https://huggingface.co/rl-research/DR-Tulu-SFT-8B/resolve/main/dr%5Ftulu%5Flogo.png)

# DR Tulu 8B - MLX

This is **DR Tulu 8B converted to MLX format** for efficient inference on Apple Silicon hardware.

## MLX Model Variants

All variants are optimized for Apple Silicon, with different memory/performance trade-offs:

| Model | Precision | Model Size | Bits/Weight | Memory Usage | Performance | Download |
|-------|-----------|------------|-------------|--------------|-------------|----------|
| **DR-Tulu-8B-MLX-4bit** | 4-bit quantized | ~4.3GB | 4.500 | Lower | 78.2 tok/s | [🤗 HF](https://huggingface.co/Plurigrid/DR-Tulu-8B-MLX-4bit) |
| **DR-Tulu-8B-MLX-6bit** | 6-bit quantized | ~6.2GB | 6.500 | Medium | 60.7 tok/s | [🤗 HF](https://huggingface.co/Plurigrid/DR-Tulu-8B-MLX-6bit) |
| **DR-Tulu-8B-MLX-8bit** | 8-bit quantized | ~8.1GB | 8.500 | Medium-High | 59.8 tok/s | [🤗 HF](https://huggingface.co/Plurigrid/DR-Tulu-8B-MLX-8bit) |
| **DR-Tulu-8B-MLX-bf16** | bfloat16 (full) | ~15.3GB | 16.000 | High | 35.0 tok/s | [🤗 HF](https://huggingface.co/Plurigrid/DR-Tulu-8B-MLX-bf16) |

**🔥 Key Features:**

- **Original Model:** [rl-research/DR-Tulu-8B](https://huggingface.co/rl-research/DR-Tulu-8B)
- **Hardware Optimized:** Apple Silicon (M1/M2/M3/M4/M5)
- **Conversion Framework:** [mlx-lm](https://github.com/ml-explore/mlx-lm)
- **Research-Grade Choice:** **bf16** provides maximum quality at full precision
- **All variants maintain core research reasoning capabilities**

**🔥 MLX Conversion Details:**

* **Original Model:** [rl-research/DR-Tulu-8B](https://huggingface.co/rl-research/DR-Tulu-8B)
* **Conversion:** MLX format with bfloat16 precision (research-grade full precision)
* **Model Size:** ~15.3GB (down from 16.4GB original)
* **Hardware Used:** Mac Studio with Apple M1 Ultra (20-core, 128GB unified memory)
* **Conversion Framework:** [mlx-lm](https://github.com/ml-explore/mlx-lm)
* **Performance:** ~35 tokens/sec at ~16.4GB memory usage (bf16)

## Hardware Requirements

| Variant | Minimum RAM | Recommended RAM | Storage |
|---------|-------------|-----------------|---------|
| 4bit | 8GB | 16GB | 5GB |
| 6bit | 16GB | 24GB | 7GB |
| 8bit | 16GB | 32GB | 9GB |
| bf16 | 24GB | 32GB+ | 16GB |

**Tested Hardware:** Mac Studio with Apple M1 Ultra (20-core, 128GB unified memory)

## MLX Quick Start

### Command Line Interface

Install and run with uvx:

```bash
# Generate text (replace {VARIANT} with 4bit, 6bit, 8bit, or bf16)
uvx --from mlx-lm mlx_lm.generate --model Plurigrid/DR-Tulu-8B-MLX-{VARIANT} --prompt "What is category theory and how does it apply to computer science?" --max-tokens 200

# Interactive chat
uvx --from mlx-lm mlx_lm.chat --model Plurigrid/DR-Tulu-8B-MLX-{VARIANT}
```
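### OpenAI-Compatible Server (optional)

mlx-lm also ships a small HTTP server that exposes an OpenAI-compatible chat endpoint, which can be handy for pointing existing clients at the model. The sketch below is illustrative rather than part of the original conversion workflow; it assumes the same `{VARIANT}` naming as above, and the port is an arbitrary choice:

```bash
# Serve the model locally (replace {VARIANT} with 4bit, 6bit, 8bit, or bf16)
uvx --from mlx-lm mlx_lm.server --model Plurigrid/DR-Tulu-8B-MLX-{VARIANT} --port 8080

# Query it from another shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is category theory?"}], "max_tokens": 200}'
```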
### Python API

```python
from mlx_lm import load, generate

# Load model (replace {VARIANT} with 4bit, 6bit, 8bit, or bf16)
model, tokenizer = load("Plurigrid/DR-Tulu-8B-MLX-{VARIANT}")

prompt = "What is category theory and how does it apply to computer science?"

# Apply chat template if available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# Generate response
response = generate(model, tokenizer, prompt=prompt, verbose=True)
print(response)
```

**Installation for Python API:**

```bash
pip install mlx-lm
# or with uv
uv add mlx-lm
```

**Advanced Usage:**

```python
# For research tasks, prompt for step-by-step reasoning
prompt = "Analyze the relationship between category theory and functional programming. Think step by step."

# Multi-turn conversation
messages = [
    {"role": "user", "content": "What is category theory?"},
    {"role": "assistant", "content": "Category theory is a mathematical framework..."},
    {"role": "user", "content": "How does it apply to computer science?"}
]

if tokenizer.chat_template is not None:
    formatted_prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    response = generate(model, tokenizer, prompt=formatted_prompt, max_tokens=500)
    print(response)
```
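**Streaming Generation:**

For interactive use it is often nicer to print tokens as they are produced rather than waiting for the full completion. mlx-lm exposes a `stream_generate` helper for this; the sketch below reuses the `model` and `tokenizer` loaded above and assumes a recent mlx-lm where each yielded chunk carries a `.text` field (older releases yielded plain strings):

```python
from mlx_lm import stream_generate

# Reuses `model` and `tokenizer` from the Python API example above
messages = [{"role": "user", "content": "Summarize the idea behind deep research agents."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Print tokens as they arrive
for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=300):
    print(chunk.text, end="", flush=True)
print()
```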
## About DR Tulu

This is the RL checkpoint of DR Tulu, an open deep research agent trained on top of [rl-research/DR-Tulu-SFT-8B](https://huggingface.co/rl-research/DR-Tulu-SFT-8B). The model was RL-trained on [this dataset](https://huggingface.co/datasets/rl-research/dr-tulu-rl-data).

For more details on DR Tulu, please **read our [paper](https://allenai.org/papers/drtulu)**!

## Inference and Usage

**Note:** The original model was trained for tool use with the dr-agent-lib framework. This MLX version provides general inference capabilities optimized for Apple Silicon. For the full tool-use functionality, see [our GitHub](https://github.com/rlresearch/dr-tulu) or check out our [demo](https://dr-tulu.github.io/)!

## Evaluation Results

Results from the original DR-Tulu-8B model:

| Benchmark | SQAv2 | HealthBench | ResearchQA | DeepResearch Bench | SimpleQA | 2Wiki | WebWalker | Average |
|-----------|-------|-------------|------------|--------------------|----------|-------|-----------|---------|
| [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) (naive RAG) | 40.4 | 16.5 | 56.1 | 33.3 | 52.6 | 18.9 | 8.8 | 32.4 |
| [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) (our search pipeline) | 57.2 | 5.9 | 46.3 | 18.2 | 70.5 | 44.0 | 27.9 | 38.6 |
| [DR-Tulu-SFT-8B](https://huggingface.co/rl-research/DR-Tulu-SFT-8B) | 72.3 | 38.1 | 68.5 | 39.0 | 75.5 | 66.5 | 31.9 | 56.0 |
| [DR-Tulu-8B](https://huggingface.co/rl-research/DR-Tulu-8B) (**original**) | **86.7** | **43.7** | **71.1** | **41.8** | **80.1** | **68.0** | **39.1** | **61.5** |

For more baselines, explanations of this table, and analysis of results, check out the [DR Tulu paper](https://allenai.org/papers/drtulu)!

## Intended uses & limitations

This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with [Ai2's Responsible Use Guidelines](https://allenai.org/responsible-use).

**MLX-specific considerations:**

* Optimized for Apple Silicon hardware only
* **bf16 maintains full model quality** - the research-grade, full-precision choice
* The quantized variants trade a small amount of quality for lower memory use and faster decoding
* Core reasoning capabilities are preserved across all variants

## Training

The script used to train the original model can be found [here](https://github.com/rlresearch/dr-tulu/blob/rl/rl/open-instruct/train%5Fdr%5Ftulu.sh). For hyperparameter details, check out the [DR Tulu paper](https://allenai.org/papers/drtulu).

## Links

* 📝 [DR Tulu Paper](https://allenai.org/papers/drtulu)
* ⚙️ [DR Tulu demo](https://dr-tulu.github.io/)
* 💻 [DR Tulu code](https://github.com/rlresearch/dr-tulu)
* 🤖 [DR Tulu collection](https://huggingface.co/collections/rl-research/dr-tulu)
* 🚀 [Original model](https://huggingface.co/rl-research/DR-Tulu-8B)
* ⚡ [MLX framework](https://github.com/ml-explore/mlx)

## Citation

```bibtex
@article{drtulu,
  title  = {{DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research}},
  author = {Rulin Shao and Akari Asai and Shannon Shen and Hamish Ivison and Varsha Kishore and Jingming Zhuo and Xinran Zhao and Molly Park and Sam Finlayson and David Sontag and Tyler Murray and Sewon Min and Pradeep Dasigi and Luca Soldaini and Faeze Brahman and Scott Yih and Sherry Tongshuang Wu and Luke Zettlemoyer and Yoon Kim and Hanna Hajishirzi and Pang Wei Koh},
  year   = {2025},
}
```

## Conversion Details

* **Date:** November 22, 2025
* **Converter:** MLX community
* **Command:** `uvx --from mlx-lm mlx_lm.convert --hf-path rl-research/DR-Tulu-8B --mlx-path ./DR-Tulu-8B-bf16`
* **Precision:** bfloat16 (full-precision MLX conversion)
* **Hardware:** Mac Studio, Apple M1 Ultra (20-core CPU, 128GB unified memory)
* **OS:** macOS Sequoia 15.2 (Darwin 25.2.0)
* **Framework Version:** mlx-lm latest (November 2025)
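**Reproducing the quantized variants (sketch):**

The quantized variants listed above can be produced with the same conversion tool. This is a hedged sketch rather than the exact commands used for the published repos; the output path is illustrative and `--q-bits` mirrors the variant names:

```bash
# 4-bit variant; swap --q-bits to 6 or 8 for the other quantized variants
uvx --from mlx-lm mlx_lm.convert --hf-path rl-research/DR-Tulu-8B \
  --mlx-path ./DR-Tulu-8B-MLX-4bit -q --q-bits 4
```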