MerlinLi's Collections
synthetic-data
Updated May 11, 2025
HuggingFaceTB/cosmopedia • Viewer • Updated Aug 12, 2024 • 31.1M • 12.6k • 681
HuggingFaceTB/cosmopedia-20k • Viewer • Updated Feb 23, 2024 • 20k • 83 • 2
Open-Orca/SlimOrca-Dedup • Viewer • Updated May 19, 2025 • 363k • 6.86k • 91
abacusai/SystemChat • Viewer • Updated Mar 4, 2024 • 7.02k • 548 • 135
allenai/WildChat-nontoxic • Viewer • Updated May 6, 2024 • 530k • 240 • 26
instruction-pretrain/instruction-synthesizer • Text Generation • 7B • Updated Mar 2 • 21 • 79
argilla/FinePersonas-v0.1 • Viewer • Updated Dec 11, 2024 • 42.1M • 9.39k • 409
opencsg/chinese-cosmopedia • Preview • Updated Jan 15, 2025 • 837 • 76
TxT360: Trillion Extracted Text 📖 • Running • 133 • Explore the TxT360 LLM pre-training dataset
open-r1/OpenR1-Math-220k • Viewer • Updated Feb 18, 2025 • 450k • 14.2k • 720
opencsg/chinese-fineweb-edu • Viewer • Updated Dec 12, 2025 • 84.6M • 24.2k • 110
BAAI/CCI2-Data • Viewer • Updated Dec 17, 2024 • 179M • 470 • 57