The models come in Thinking and Instruct versions and use a new architecture, giving them roughly 10x faster inference than Qwen3-32B. Step-by-step guide: https://docs.unsloth.ai/models/qwen3-next
I run 20 AI coding agents locally on my desktop workstation at 400+ tokens/sec with MiniMax-M2. It's a drop-in replacement for Sonnet in Cursor, Claude Code, Droid, Kilo, and Cline. Throughput peaks at 11k tok/s input and 433 tok/s output, and it can generate 1B+ tok/m, all with a 196k context window. I've been running it for 6 days now with this config.
Today, peak performance held steady at 490.2 tokens/sec across 48 concurrent clients with MiniMax M2.
Running large language models efficiently is more than just raw GPU power. The latest guide breaks down the essential math to determine if your LLM workload is compute-bound or memory-bound.
We apply these principles to a real-world example: Qwen's 32B parameter model on the new NVIDIA RTX PRO 6000 Blackwell Edition.
In this guide, you will learn how to:
- Calculate your GPU's operational intensity (Ops:Byte ratio)
- Determine your model's arithmetic intensity
- Identify whether your workload is memory-bound or compute-bound
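The steps above can be sketched in a few lines. The spec numbers below are round placeholder assumptions for illustration only, not measured values for the RTX PRO 6000 Blackwell; substitute your GPU's actual datasheet figures.

```python
# Assumed, round-number specs for illustration -- replace with your GPU's
# datasheet values (peak FP16 throughput and memory bandwidth).
peak_flops = 500e12       # FLOP/s (assumed)
mem_bandwidth = 1.8e12    # bytes/s (assumed)

# Ops:Byte ratio: how many FLOPs the GPU can perform per byte it moves.
ops_byte_ratio = peak_flops / mem_bandwidth

# Arithmetic intensity of batch-1 decoding for an N-parameter model:
# each generated token costs ~2N FLOPs while streaming all N weights.
n_params = 32e9           # Qwen 32B
bytes_per_param = 2       # fp16/bf16 weights
arithmetic_intensity = (2 * n_params) / (bytes_per_param * n_params)

# If the model's intensity is below the GPU's Ops:Byte ratio,
# the workload is memory-bound; above it, compute-bound.
print(f"Ops:Byte ratio      = {ops_byte_ratio:.0f} FLOPs/byte")
print(f"Arithmetic intensity = {arithmetic_intensity:.1f} FLOPs/byte")
print("memory-bound" if arithmetic_intensity < ops_byte_ratio else "compute-bound")
```

With these placeholder numbers, single-stream decoding is deeply memory-bound, which is why batching many concurrent requests is the standard way to raise effective arithmetic intensity.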
Join us on site October 22-23 to see how Arm empowers developers to build and deploy AI applications with ease using PyTorch and ExecuTorch. Learn about the latest AI technologies from Arm and our ecosystem while expanding your professional network alongside like-minded AI engineers.
a senior engineer at google just dropped a free 400-page book, shared as a google doc for review: agentic design patterns.
the table of contents looks like everything you need to know about agents + code:
> advanced prompt techniques
> multi-agent patterns
> tool use and MCP
> you name it
This is what efficient AI looks like: Gemma 3n just dropped - a natively multimodal model that runs entirely on your device. No cloud. No API calls.
- Text, image, audio, and video, all handled locally
- Only needs ~2GB of GPU memory to run
- First sub-10B model to hit 1300+ Elo
- Plug-and-play with Hugging Face, MLX, llama.cpp, and more
Plus: Multilingual out of the box (140+ languages), fine-tune in a free Colab notebook.
We just implemented Andrej Karpathy's "third paradigm" for LLM learning!
System Prompt Learning (SPL) enables LLMs to automatically learn problem-solving strategies from experience, rather than relying on static prompts.
How it works: your LLM builds a database of effective strategies, selects the best ones for each problem, and refines them over time based on success rates.
The best part? All strategies are human-readable and the system gets progressively better at problem types you use frequently.
Key benefits:
- Cumulative learning over time
- Transparent, inspectable strategies
- Works with any OpenAI-compatible API
- Simple integration: just add the "spl-" prefix to your model name
Built as an open-source plugin in optillm. After 500 queries, our system developed 129 strategies and refined 97 of them!
This feels like a genuine step toward AI that learns from experience while staying completely interpretable.
- Skywork-OR1-Math-7B > optimized for math reasoning
- Skywork-OR1-7B-preview > excels in math & coding
- Skywork-OR1-32B-preview > matches DeepSeek-R1 on math (AIME24/25) and coding (LiveCodeBench)
Released under the Apache 2.0 license. Final version coming in 2 weeks!
High-Resolution Ghibli Style Image Generator: Introducing FLUX Ghibli LoRA. Hello everyone! Today I'm excited to present a special LoRA model for FLUX.1 Dev. It leverages a LoRA trained on high-resolution Ghibli images to easily create beautiful Ghibli-style images with stunning detail!
- Trained on high-resolution Ghibli images: unlike other LoRAs, this one is trained on high-resolution images, delivering sharper and more beautiful results
- Powered by FLUX.1 Dev: utilizing the latest FLUX model for faster generation and superior quality
- User-friendly interface: an intuitive UI that allows anyone to create Ghibli-style images with ease
- Diverse creative possibilities: express various themes in Ghibli style, from futuristic worlds to fantasy elements
Sample Images
- Include "Ghibli style" in your prompts
- Try combining nature, fantasy elements, futuristic elements, and warm emotions
- Add the "[trigger]" tag at the end for better results
Getting Started
1. Enter your prompt (e.g., "Ghibli style sky whale transport ship...")
2. Adjust image size and generation settings
3. Click the "Generate" button
In just seconds, your beautiful Ghibli-style image will be created!
As we always use Transformers, it's helpful to understand RoPE (Rotary Position Embedding). Since token order matters, RoPE encodes it by rotating token embeddings based on their position, so the model knows which token comes first, second, and so on.
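A minimal NumPy sketch of that rotation, for intuition. Real implementations apply this to query/key heads inside attention and often pair dimensions differently; this is an illustrative simplification.

```python
import numpy as np


def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embedding to x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per (x1, x2) pair, decaying geometrically,
    # so low dims rotate fast (local order) and high dims slowly (long range).
    freqs = base ** (-np.arange(half) / half)        # (half,)
    angles = np.outer(np.arange(seq_len), freqs)     # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation of each pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)


x = np.random.default_rng(0).standard_normal((4, 8))
y = rope(x)
```

Two properties fall out directly: position 0 is left unrotated (all angles are zero), and rotations preserve each token's norm, so RoPE injects order without changing embedding magnitudes.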
Here are 8 types of RoPE that can be implemented in different cases:
4. Multimodal RoPE (MRoPE) -> Qwen2.5-VL Technical Report (2502.13923)
Decomposes the positional embedding into 3 components (temporal, height, and width) so that positional features are aligned across modalities: text, images, and videos.
8. XPos (Extrapolatable Position Embedding) -> https://huggingface.co/papers/2212.10
Introduces an exponential decay factor into the rotation matrix, improving stability on long sequences.
We open-sourced the pruna package, which can be installed with pip install pruna :) It lets you easily compress and evaluate AI models, including transformers and diffusers.
With open-sourcing, people can now inspect and contribute to the code. Beyond the code, we provide a detailed README, tutorials, benchmarks, and documentation to make compression, evaluation, and saving/loading/serving of AI models transparent.
Happy to share it with you, and always interested in your feedback :)