Deventhedude's Collections: Finetune data
Two Minds Better Than One: Collaborative Reward Modeling for LLM Alignment
Paper
• 2505.10597
• Published
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values
Paper
• 2504.05535
• Published
• 44
nvidia/Nemotron-RL-instruction_following
Preview
• Updated
• 247
• 11
nvidia/Nemotron-RL-knowledge-web_search-mcqa
Viewer
• Updated
• 2.93k • 148
• 13
nvidia/Nemotron-RL-agent-workplace_assistant
Viewer
• Updated
• 1.8k • 346
• 17
nvidia/Nemotron-RL-instruction_following-structured_outputs
Viewer
• Updated
• 9.95k • 193
• 35
nvidia/Nemotron-RL-knowledge-mcqa
Viewer
• Updated
• 686k • 258
• 9
nvidia/Nemotron-RL-math-OpenMathReasoning
Viewer
• Updated
• 113k • 150
• 14
nvidia/Nemotron-RL-knowledge-openqa
Viewer
• Updated
• 136k • 101
• 9
nvidia/Nemotron-RL-math-advanced_calculations
Viewer
• Updated
• 6k • 63
• 9
nvidia/Nemotron-AIQ-Agentic-Safety-Dataset-1.0
Viewer
• Updated
• 10.8k • 783
• 14
nvidia/Nemotron-VLM-Dataset-v2
Viewer
• Updated
• 4.58M • 5.35k
• 86
google/code_x_glue_cc_code_completion_token
Viewer
• Updated
• 178k • 857
• 9
google/code_x_glue_cc_cloze_testing_all
Viewer
• Updated
• 176k • 117
• 6
google/code_x_glue_cc_clone_detection_big_clone_bench
Viewer
• Updated
• 1.73M • 637
• 21
google/code_x_glue_ct_code_to_text
Viewer
• Updated
• 1.01M • 1.98k
• 79
google/code_x_glue_tc_nl_code_search_adv
Viewer
• Updated
• 281k • 197
• 11
TeichAI/claude-sonnet-4.5-high-reasoning-250x
Viewer
• Updated
• 247 • 702
• 36
Idea2Plan: Exploring AI-Powered Research Planning
Paper
• 2510.24891
• Published
TGPR: Tree-Guided Policy Refinement for Robust Self-Debugging of LLMs
Paper
• 2510.06878
• Published
• 1
FML-bench: A Benchmark for Automatic ML Research Agents Highlighting the Importance of Exploration Breadth
Paper
• 2510.10472
• Published
• 9
Scientific Algorithm Discovery by Augmenting AlphaEvolve with Deep Research
Paper
• 2510.06056
• Published
• 6
RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback
Paper
• 2510.06186
• Published
AlphaResearch: Accelerating New Algorithm Discovery with Language Models
Paper
• 2511.08522
• Published
• 18
open-thoughts/OpenThoughts3-1.2M
Viewer
• Updated
• 1.2M • 9.07k
• 212
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
Paper
• 2310.19923
• Published
• 14
rl-research/dr-tulu-sft-data
Viewer
• Updated
• 13.1k • 317
• 27
miromind-ai/MiroVerse-v0.1
Viewer
• Updated
• 228k • 341
• 228
nvidia/Llama-Nemotron-Post-Training-Dataset
Viewer
• Updated
• 3.91M • 2.88k
• 644
nick007x/github-code-2025
Viewer
• Updated
• 147M • 5.23k
• 116
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper
• 2508.06471
• Published
• 206
natolambert/GeneralThought-430K-filtered
Viewer
• Updated
• 338k • 185
• 35
RJT1990/GeneralThoughtArchive
Viewer
• Updated
• 431k • 1.09k
• 72
open-thoughts/OpenThoughts-114k
Viewer
• Updated
• 228k • 114k
• 813
PrimeIntellect/SYNTHETIC-1
Viewer
• Updated
• 1.99M • 372
• 61
PrimeIntellect/synthetic-code-understanding
Viewer
• Updated
• 60.6k • 44
• 19
PrimeIntellect/INTELLECT-3-SFT
Viewer
• Updated
• 6.98M • 515
• 2
openbmb/InfLLM-V2-data-5B
Viewer
• Updated
• 7.19M • 474
• 32
kenhktsui/open-react-retrieval-multi-neg-result-new-kw
Viewer
• Updated
• 25.2k • 11
• 3
alwaysfurther/tiny-agent-with-tools
Viewer
• Updated
• 27 • 22
TuringEnterprises/Turing-Open-Reasoning
Viewer
• Updated
• 50 • 275
• 187
TeichAI/claude-4.5-opus-high-reasoning-250x
Viewer
• Updated
• 250 • 3.21k
• 328
PrimeIntellect/INTELLECT-3-RL
Viewer
• Updated
• 70.7k • 2.32k
• 6
PrimeIntellect/Reverse-Text-RL
Viewer
• Updated
• 1k • 6.43k
• 2
PrimeIntellect/Reverse-Text-SFT
Viewer
• Updated
• 1k • 898
• 3
PrimeIntellect/SYNTHETIC-2-Base-Code
Viewer
• Updated
• 57.3k • 31
PrimeIntellect/SYNTHETIC-2-Base-Math
Viewer
• Updated
• 105k • 22
• 1
PrimeIntellect/SYNTHETIC-2-Base
Viewer
• Updated
• 465k • 27
• 9
PrimeIntellect/SYNTHETIC-2-Base-General-Reasoning
Viewer
• Updated
• 165k • 23
• 1
PrimeIntellect/SYNTHETIC-2-SFT-verified
Viewer
• Updated
• 105k • 202
• 8
PrimeIntellect/SYNTHETIC-2-Base-Answer-Critique
Viewer
• Updated
• 50k • 9
• 2
PrimeIntellect/SYNTHETIC-2-Base-Instruction-Following
Viewer
• Updated
• 87.5k • 8
PrimeIntellect/SYNTHETIC-2
Viewer
• Updated
• 51.6k • 200
• 14
PrimeIntellect/LiveCodeBench-v5
Viewer
• Updated
• 279 • 89
arcee-ai/bfcl_v4_web_search
Viewer
• Updated
• 100 • 86
• 2
arcee-ai/general-dpo-datasets
Viewer
• Updated
• 91.6k • 194
arcee-ai/synthetic-data-gen
Viewer
• Updated
• 999k • 6
• 2
arcee-ai/reasoning-sharegpt
Viewer
• Updated
• 29.9k • 40
• 23
arcee-ai/infini-instruct-top-500k
Viewer
• Updated
• 500k • 49
• 6
arcee-ai/cleaned-mlabonne-distilabel-truthy-dpo-v0.1-filtered
Viewer
• Updated
• 663 • 6
glaiveai/glaive-function-calling-v2
Viewer
• Updated
• 113k • 9.52k
• 495
Salesforce/xlam-function-calling-60k
Viewer
• Updated
• 60k • 6.29k
• 581
HuggingFaceFW/fineweb-edu
Viewer
• Updated
• 3.5B • 223k
• 992
Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch
Paper
• 2512.02395
• Published
• 49
MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning
Paper
• 2510.08567
• Published
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
Paper
• 2511.19773
• Published
• 10
ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use
Paper
• 2510.27363
• Published
• 23
Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries
Paper
• 2511.00710
• Published
• 5
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
Paper
• 2510.01623
• Published
• 12
DeepEyesV2: Toward Agentic Multimodal Model
Paper
• 2511.05271
• Published
• 45
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search
Paper
• 2510.12801
• Published
• 13
DeepAgent: A General Reasoning Agent with Scalable Toolsets
Paper
• 2510.21618
• Published
• 101
Open Multimodal Retrieval-Augmented Factual Image Generation
Paper
• 2510.22521
• Published
• 31
smolagents/android-control
Viewer
• Updated
• 15.3k • 830
• 13
smolagents/guiact-web-single
Viewer
• Updated
• 13.3k • 50
• 1
smolagents/hermes-function-calling-v1-formatted-code-agent
Viewer
• Updated
• 9k • 19
• 1
smolagents/aguvis-stage-1
Viewer
• Updated
• 459k • 8.21k
• 16
smolagents/aguvis-stage-2
Viewer
• Updated
• 784k • 6.07k
• 26
beyoru/ToolCall_synthetic_qwen3
Viewer
• Updated
• 60k • 25
• 10
rogue-security/mcp-tool-use-quality-benchmark
Viewer
• Updated
• 5k • 17
• 3
mlx-community/hermes-reasoning-tool-use
Viewer
• Updated
• 51k • 38
• 5
TeichAI/gemini-3-pro-preview-high-reasoning-1000x
Viewer
• Updated
• 1.02k • 703
• 74
allenai/Dolci-Instruct-SFT-Tool-Use
Viewer
• Updated
• 228k • 323
• 13
nvidia/Nemotron-Content-Safety-Reasoning-Dataset
Preview
• Updated
• 161
• 7
ai-safety-institute/AgentHarm
Viewer
• Updated
• 468 • 5.59k
• 50
rootsautomation/ScreenSpot
Viewer
• Updated
• 1.27k • 1.77k
• 44