No lightning attention?

#8
by djuna

Looks like it's normal GQA attention?

What about speculative decoding; are there any models you recommend as the verifier to increase throughput?

@adsfdgfhgjhk11 From the config, this model has 3 MTP (multi-token prediction) modules.
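A quick way to check both points (GQA and MTP) is to inspect the model's `config.json`. This is only a sketch: the key names and values below are assumptions for illustration, not the actual MiniMax-M2 config.

```python
import json

# Hypothetical excerpt of a config.json; the key names and values
# (especially "mtp_num_modules") are assumptions, not the real config.
config = json.loads("""
{
  "model_type": "minimax_m2",
  "num_attention_heads": 48,
  "num_key_value_heads": 8,
  "mtp_num_modules": 3
}
""")

# GQA means fewer key/value heads than query heads
is_gqa = config["num_key_value_heads"] < config["num_attention_heads"]
print("GQA:", is_gqa)
print("MTP modules:", config.get("mtp_num_modules", 0))
```

With real weights you could do the same via `transformers.AutoConfig.from_pretrained(...)` and look at the resulting config object.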

I see. I have a Mac Studio though, so no MTP support :/
