MoE version with the same performance as this 32B dense

#37
by rtzurtz - opened

Hello team Qwen,
An MoE version with the same performance as this 32B dense model, at roughly 100B total parameters (give or take), would be a really good size for many users. The rule of thumb sqrt(total parameters * active parameters) suggests about that size. The 235B MoE is unfortunately too big.
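
To make the size estimate concrete, here is a minimal sketch of that rule of thumb. It treats sqrt(total * active) as a rough "dense-equivalent" size, which is only a community heuristic and not something confirmed by the Qwen team. The function names and the 80B/100B/120B candidate sizes are my own illustration.

```python
import math


def dense_equivalent(total_b: float, active_b: float) -> float:
    """Heuristic dense-equivalent size (in billions): sqrt(total * active)."""
    return math.sqrt(total_b * active_b)


def active_needed(total_b: float, target_dense_b: float) -> float:
    """Active parameters (in billions) at which the heuristic equals the target."""
    return target_dense_b ** 2 / total_b


if __name__ == "__main__":
    # Hypothetical total sizes around the ~100B mentioned above.
    for total in (80, 100, 120):
        active = active_needed(total, 32)
        print(f"{total}B total -> ~{active:.1f}B active for a 32B dense equivalent")
```

Under this heuristic, a 100B-total MoE would need about 10B active parameters to land at a 32B dense equivalent.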
