The models come in Thinking and Instruct versions and use a new architecture, giving them roughly 10x faster inference than Qwen3-32B. Step-by-step guide: https://docs.unsloth.ai/models/qwen3-next
I run 20 AI coding agents locally on my desktop workstation at 400+ tokens/sec with MiniMax-M2. It's a drop-in replacement for Sonnet in Cursor, Claude Code, Droid, Kilo, and Cline. Throughput peaks at 11k tok/s input and 433 tok/s output, and it can generate 1B+ tok/m, all with a 196k context window. I've been running it for 6 days now with this config.
Today, peak performance held steady at 490.2 tokens/sec across 48 concurrent clients with MiniMax M2.
Running large language models efficiently is more than just raw GPU power. The latest guide breaks down the essential math to determine if your LLM workload is compute-bound or memory-bound.
We apply these principles to a real-world example: Qwen's 32B parameter model on the new NVIDIA RTX PRO 6000 Blackwell Edition.
In this guide, you will learn how to:
- Calculate your GPU's operational intensity (Ops:Byte ratio)
- Determine your model's arithmetic intensity
- Identify whether your workload is memory-bound or compute-bound
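The steps above can be sketched in a few lines. The spec numbers below are round placeholder assumptions for illustration only, not measured values for the RTX PRO 6000 Blackwell; substitute your GPU's actual datasheet figures.

```python
# Assumed, round-number specs for illustration -- replace with your GPU's
# datasheet values (peak FP16 throughput and memory bandwidth).
peak_flops = 500e12       # FLOP/s (assumed)
mem_bandwidth = 1.8e12    # bytes/s (assumed)

# Ops:Byte ratio: how many FLOPs the GPU can perform per byte it moves.
ops_byte_ratio = peak_flops / mem_bandwidth

# Arithmetic intensity of batch-1 decoding for an N-parameter model:
# each generated token costs ~2N FLOPs while streaming all N weights.
n_params = 32e9           # Qwen 32B
bytes_per_param = 2       # fp16/bf16 weights
arithmetic_intensity = (2 * n_params) / (bytes_per_param * n_params)

# If the model's intensity is below the GPU's Ops:Byte ratio,
# the workload is memory-bound; above it, compute-bound.
print(f"Ops:Byte ratio      = {ops_byte_ratio:.0f} FLOPs/byte")
print(f"Arithmetic intensity = {arithmetic_intensity:.1f} FLOPs/byte")
print("memory-bound" if arithmetic_intensity < ops_byte_ratio else "compute-bound")
```

With these placeholder numbers, single-stream decoding is deeply memory-bound, which is why batching many concurrent requests is the standard way to raise effective arithmetic intensity.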
Join us on site October 22-23 to see how Arm empowers developers to build and deploy AI applications with ease using PyTorch and ExecuTorch. Learn about the latest AI technologies from Arm and our ecosystem while expanding your professional network alongside like-minded AI engineers.
a senior engineer at google just dropped a free 400-page book, shared as a google doc for review: agentic design patterns.
the table of contents looks like everything you need to know about agents + code:
> advanced prompt techniques
> multi-agent patterns
> tool use and MCP
> you name it
This is what efficient AI looks like: Gemma 3n just dropped - a natively multimodal model that runs entirely on your device. No cloud. No API calls.
- Text, image, audio, and video, all handled locally
- Only needs ~2GB of GPU memory to run
- First sub-10B model to hit 1300+ Elo
- Plug-and-play with Hugging Face, MLX, llama.cpp, and more
Plus: Multilingual out of the box (140+ languages), fine-tune in a free Colab notebook.
We just implemented Andrej Karpathy's "third paradigm" for LLM learning!
System Prompt Learning (SPL) enables LLMs to automatically learn problem-solving strategies from experience, rather than relying on static prompts.
How it works: your LLM builds a database of effective strategies, selects the best ones for each problem, and refines them over time based on success rates.
The best part? All strategies are human-readable and the system gets progressively better at problem types you use frequently.
Key benefits:
- Cumulative learning over time
- Transparent, inspectable strategies
- Works with any OpenAI-compatible API
- Simple integration: just add the "spl-" prefix to your model name
Built as an open-source plugin in optillm. After 500 queries, our system developed 129 strategies and refined 97 of them!
This feels like a genuine step toward AI that learns from experience while staying completely interpretable.
- Skywork-OR1-Math-7B > optimized for math reasoning
- Skywork-OR1-7B-preview > excels in math & coding
- Skywork-OR1-32B-preview > matches DeepSeek-R1 on math (AIME24/25) and coding (LiveCodeBench)
Released under the Apache 2.0 license. Final version coming in 2 weeks!
High-Resolution Ghibli Style Image Generator: Introducing FLUX Ghibli LoRA. Hello everyone! Today I'm excited to present a special LoRA model for FLUX.1 Dev. It leverages a LoRA trained on high-resolution Ghibli images to easily create beautiful Ghibli-style images with stunning detail!
- Trained on high-resolution Ghibli images: unlike other LoRAs, this one is trained on high-resolution images, delivering sharper and more beautiful results
- Powered by FLUX.1 Dev: utilizing the latest FLUX model for faster generation and superior quality
- User-friendly interface: an intuitive UI that allows anyone to create Ghibli-style images with ease
- Diverse creative possibilities: express various themes in Ghibli style, from futuristic worlds to fantasy elements
Sample Images
- Include "Ghibli style" in your prompts
- Try combining nature, fantasy elements, futuristic elements, and warm emotions
- Add the "[trigger]" tag at the end for better results
Getting Started
1. Enter your prompt (e.g., "Ghibli style sky whale transport ship...")
2. Adjust image size and generation settings
3. Click the "Generate" button
In just seconds, your beautiful Ghibli-style image will be created!
As we always use Transformers, it's helpful to understand RoPE (Rotary Position Embedding). Since token order matters, RoPE encodes it by rotating token embeddings based on their position, so the model knows which token comes first, second, and so on.
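A minimal NumPy sketch of that rotation, for intuition. Real implementations apply this to query/key heads inside attention and often pair dimensions differently; this is an illustrative simplification.

```python
import numpy as np


def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per (x1, x2) pair, decaying geometrically,
    # so low dims rotate fast (local order) and high dims slowly (long range).
    freqs = base ** (-np.arange(half) / half)        # (half,)
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation of each pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)


x = np.random.default_rng(0).standard_normal((4, 8))
y = rope(x)
```

Two properties fall out directly: position 0 is left unrotated (all angles are zero), and rotations preserve each token's norm, so RoPE injects order without changing embedding magnitudes.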
Here are 8 types of RoPE that can be implemented in different cases:
4. Multimodal RoPE (MRoPE) -> Qwen2.5-VL Technical Report (2502.13923)
Decomposes the positional embedding into 3 components (temporal, height, and width) so that positional features are aligned across modalities: text, images, and videos.
8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
We open-sourced the pruna package, which can be installed with pip install pruna :) It lets you easily compress and evaluate AI models, including transformers and diffusers.
With open-sourcing, people can now inspect and contribute to the code. Beyond the code, we provide a detailed README, tutorials, benchmarks, and documentation to make compression, evaluation, and saving/loading/serving of AI models transparent.
Happy to share it with you, and always interested in your feedback :)