MoE version with the same performance as this 32B dense

#37

by rtzurtz - opened Jul 15

Hello Qwen team,
an MoE version with roughly 100B total parameters (give or take) and the same performance as this 32B dense model would be a really good size for many users. The rule of thumb that an MoE performs like a dense model of about sqrt(total parameters * active parameters) suggests a size in that range. The 235B MoE is unfortunately too big.
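To make the arithmetic concrete, here is a minimal sketch of that rule of thumb. It assumes the geometric-mean heuristic holds exactly, which is only a rough community estimate and not something confirmed by Qwen; the function names and the candidate total sizes are hypothetical and chosen for illustration.

```python
import math


def dense_equivalent(total_b: float, active_b: float) -> float:
    """Rough dense-equivalent size (in billions) of an MoE under the
    sqrt(total * active) heuristic. This is an assumption, not a law."""
    return math.sqrt(total_b * active_b)


def active_for_target(total_b: float, dense_target_b: float) -> float:
    """Active parameters (in billions) an MoE with `total_b` total
    parameters would need to match `dense_target_b` under the heuristic."""
    return dense_target_b ** 2 / total_b


# Hypothetical total sizes around the ~100B mentioned in the post.
for total_b in (80, 100, 120):
    active_b = active_for_target(total_b, 32)
    print(f"{total_b}B total -> ~{active_b:.1f}B active for a 32B dense equivalent")
```

Under this assumption, a 100B-total MoE would need about 10B active parameters to land at a 32B dense equivalent, so the "± some" in the total size trades off directly against how many parameters are active per token.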
