No lightning attention?

by djuna - opened 11 days ago

djuna

11 days ago

Looks like it's normal GQA attention?

11 days ago

What about speculative decoding? Are there any models you recommend as the draft model to increase throughput?

djuna

11 days ago

@adsfdgfhgjhk11 From the config, this model has 3 MTP modules.

11 days ago

I see. I have a Mac Studio though, so no support for MTP :/
