Whaaaaa damn, that's really good!
alkinun
AtAndDev
AI & ML interests
LLMs, Alignment, Merging, Unsloth, DPO, SFT, ORPO, SPIN...

Recent Activity
updated a model about 4 hours ago: AtAndDev/UVOX-Magpie-xLAM-Supernova-8B
published a model about 4 hours ago: AtAndDev/UVOX-Magpie-xLAM-Supernova-8B
updated a model about 5 hours ago: AtAndDev/Magpie-xLAM-Supernova-8B

Organizations
replied to mrfakename's post
1 day ago
reacted to mrfakename's post with 🔥
1 day ago
Post
Trained a model for emotion-controllable TTS based on MiMo Audio on LAION's dataset.
Still very early, and it does have an issue with hallucinating, but the results seem pretty good so far given how early it is in the training run.
Will probably kick off a new run later with some settings tweaked.
Put up a demo here: mrfakename/EmoAct-MiMo
(Turn 🔊 on to hear audio samples)
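To try the demo programmatically, a minimal sketch with gradio_client; the Space's endpoint names and parameters aren't given in the post, so list them before calling anything:

from gradio_client import Client

# Connect to the public demo Space
client = Client("mrfakename/EmoAct-MiMo")
# Print the real endpoint names and their parameters before calling predict()
client.view_api()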
reacted to sourceoftruthdata's post with ❤️🤗
1 day ago
Post
What a fantastic community!
reacted to AdinaY's post with 🔥
1 day ago
Post
Glyph 🔥 a framework that scales context length by compressing text into images and processing them with vision–language models, released by Z.ai.
Paper: https://huggingface.co/papers/2510.17800
Model: https://huggingface.co/zai-org/Glyph
✨ Compresses long sequences visually to bypass token limits
✨ Reduces computational and memory costs
✨ Preserves meaning through multimodal encoding
✨ Built on GLM-4.1V-9B-Base
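The core idea is easy to prototype: render the long context as an image and hand that to the VLM in place of raw tokens. A minimal sketch with PIL; the rendering details (font, line width, layout) are illustrative assumptions, not Z.ai's actual pipeline:

from PIL import Image, ImageDraw

def text_to_image(text, width=1024, line_chars=100, line_height=16):
    # Naive fixed-width wrap; Glyph's real renderer is more sophisticated
    lines = [text[i:i + line_chars] for i in range(0, len(text), line_chars)]
    img = Image.new("RGB", (width, line_height * len(lines) + 16), "white")
    draw = ImageDraw.Draw(img)
    for row, line in enumerate(lines):
        draw.text((8, 8 + line_height * row), line, fill="black")
    return img

# The rendered page would then be fed to the vision-language model
# (e.g. zai-org/Glyph) instead of thousands of text tokens.
page = text_to_image("some very long context " * 200)
page.save("context_page.png")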
reacted to s3nh's post with 🔥
10 days ago
Post
EduHelp with more empathy, based on a model fine-tuned on psychotherapeutic preferences, just landed.
Beck-8B as the base model, 13,000 steps on an educational dataset.
Time to go further and build more 🥰
s3nh/EduHelp_Beck_8B
Thanks to @basilic_ai for computations <3
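A minimal sketch for loading the fine-tune, assuming the repo ships standard causal-LM weights (check the model card):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed to be a standard causal-LM checkpoint
tokenizer = AutoTokenizer.from_pretrained("s3nh/EduHelp_Beck_8B")
model = AutoModelForCausalLM.from_pretrained("s3nh/EduHelp_Beck_8B")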
replied to MonsterMMORPG's post
about 2 months ago
This comment has been hidden
replied to MonsterMMORPG's post
about 2 months ago
This comment has been hidden
reacted to merve's post with 👍❤️
2 months ago
Post
First vision language model built off openai/gpt-oss-20b just dropped! 🔥
InternVL3.5 comes with 32 models 🤯 pre-trained, fine-tuned, and aligned in various sizes: OpenGVLab/internvl35-68ac87bd52ebe953485927fb
Comes with gpt-oss or Qwen3 for the LLM part ⤵️
Also, I don't think anyone will achieve AGI, as we don't know what AGI is. I think we will just see incremental performance increases, not an "unlock" that creates AGI.
In my view, it should be open: if I can achieve AGI, someday someone else will too. So there's no need to slow things down like the EU does. Just let things happen, accelerate, and decentralize.
reacted to prithivMLmods's post with 👀👍
3 months ago
Post
On the verge of releasing Poseidon-Reasoning-5M, a dataset built to excel in general thought processes, mathematics, and science across a diverse mixture of domains, I'm also dropping the Gargantua-R1-Compact dataset, a collection of over six million high-quality reasoning QA pair traces. 🤗🚀
✦ Gargantua-R1-Compact : prithivMLmods/Gargantua-R1-Compact

from datasets import load_dataset

# Full six-million-trace training split
dataset = load_dataset("prithivMLmods/Gargantua-R1-Compact", split="train")

Additionally, I'm adding the mini version of Gargantua, the Gargantua-R1-Wee : prithivMLmods/Gargantua-R1-Wee

from datasets import load_dataset

# Smaller companion training split
dataset = load_dataset("prithivMLmods/Gargantua-R1-Wee", split="train")

The composition spans:
- 73.93% core mathematical reasoning involving problems, proofs, and computational challenges
- 12.11% diverse scientific domains such as physics, chemistry, biology, and interdisciplinary topics
- 11.35% competitive coding covering algorithms and data structures
- 1.37% academic science focusing on research-level methodology
- 0.95% creative and analytical reasoning through logic puzzles and problem-solving tasks
- 0.25% specialized technical areas like MLOps, LLMs, diffusion models, and CUDA
- 0.06% data from graphs and charts converted into structured JSON formats
Designed with both rich contextual depth and formal structural clarity, Gargantua-R1-Compact is an optimal resource for advancing research in symbolic reasoning, interpretability, and high-precision question answering in mathematical domains.
✦ Collection : prithivMLmods/gargantua-r1-mod-6896bfd7834e82b89ad2b38b
To learn more, visit the dataset card of each dataset.
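Given the size of the full split, streaming is a reasonable alternative to downloading everything up front; a minimal sketch:

from datasets import load_dataset

# Iterate lazily instead of materializing all ~6M rows on disk
stream = load_dataset("prithivMLmods/Gargantua-R1-Compact", split="train", streaming=True)
first_row = next(iter(stream))
print(first_row)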
reacted to fdaudens's post with 👍🚀
3 months ago
Post
OpenAI's GPT-OSS has sparked ~400 new models on Hugging Face and racked up 5M downloads in less than a week, already outpacing DeepSeek R1's first-week numbers.
For comparison: when R1 launched, I tracked 550 derivatives (across 8 base models) in a week, with ~3M downloads. GPT-OSS is ahead on adoption and engagement.
It's also the most-liked release of any major LLM this summer. The 20B and 120B versions quickly shot past Kimi K2, GLM 4.5, and others in likes.
Most-downloaded GPT-OSS models include LM Studio and Unsloth AI versions:
1️⃣ openai/gpt-oss-20b - 2.0M
2️⃣ lmstudio-community/gpt-oss-20b-MLX-8bit - 750K
3️⃣ openai/gpt-oss-120b - 430K
4️⃣ unsloth/gpt-oss-20b-GGUF - 380K
5️⃣ lmstudio-community/gpt-oss-20b-GGUF - 330K
The 20B version is clearly finding its audience, showing the power of smaller, faster, more memory- and energy-efficient models. (These numbers don't include calls to the models via inference providers, so the real usage is likely even bigger, especially for the 120B version.)
Open-weight models let anyone build on top. Empower the builders, and innovation takes off. 🚀
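Download counts like these can be pulled live from the Hub; a minimal sketch with huggingface_hub:

from huggingface_hub import HfApi

api = HfApi()
# .downloads reflects the rolling download count shown on the model page
info = api.model_info("openai/gpt-oss-20b")
print(info.downloads)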
reacted to ovi054's post with 🔥
3 months ago
Post
WAN 2.2 Text to Image ⚡
ovi054/wan2-2-text-to-image
We all know that WAN 2.2 A14B is a video model, but it turns out this video model can also produce great image results with incredible prompt adherence! The image output is sharp, detailed, and sticks to the prompt better than most.
👉 Try it now: ovi054/wan2-2-text-to-image
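The trick is to ask the text-to-video pipeline for a single frame. A hedged sketch with Diffusers; the repo id and WanPipeline compatibility with the 2.2 A14B weights are assumptions, so check the model card first:

import torch
from diffusers import WanPipeline

# Assumed Diffusers-format repo id for the Wan 2.2 A14B text-to-video weights
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# num_frames=1 turns the video sampler into a de facto image generator
out = pipe(prompt="a sharp, detailed photo of a lighthouse at dusk", num_frames=1)
image = out.frames[0][0]  # first (and only) frame of the first video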