No lightning attention?
#8, opened by djuna
Looks like it's just normal GQA attention?
What about speculative decoding? Are there any models you'd recommend as the draft model to increase throughput?
I see. I have a Mac Studio though, so no support for MTP :/