Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
mitkox 
posted an update 8 days ago
Post
3004
I run 20 AI coding agents locally on my desktop workstation at 400+ tokens/sec with MiniMax-M2. It’s a Sonnet drop-in replacement in my Cursor, Claude Code, Droid, Kilo and Cline peak at 11k tok/sec input and 433 tok/s output, can generate 1B+ tok/m.All with 196k context window. I'm running it for 6 days now with this config.

Today max performance was stable at 490.2 tokens/sec across 48 concurrent clients and MiniMax M2.

Z8 Fury G5, Xeon 3455, 4xA6K. Aibrix 0.5.0, vLLM 0.11.2,

WHAT GPU ARE YOU USING?

·

4x A6000: 200GB VRAM for around 20 000$

glad you are rich

Cool stuff