I find Qwen3 Next exceptional, but too big.

#7
by ZeroWw - opened

Please create a 32B or even a 14B model! It would be great!

Qwen3 Next is exceptional partly because of its size. While parameter count isn't a 1:1 proxy for capability, there is a strong, well-studied link between the two. You could remove half of the experts from the model and attempt to heal the remaining weights, but you'd end up with something of roughly half the capability, depending on what you removed, how you removed it, and what you measured as capability. You could get more precise by profiling activations to discover which experts fire most in your use cases, then prune the ones you don't "need" (see the sketch below). But you'd be giving up generalization in that case too.
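As a rough illustration of that activation-profiling idea, here's a minimal toy sketch. Everything in it is an assumption for demonstration: the expert count, the top-k routing, and the fake `router_logits` stand in for real forward passes over a calibration set; none of it reflects the actual Qwen3 Next internals or API.

```python
# Toy sketch: tally which experts an MoE router selects on calibration data,
# then flag the least-used ones as pruning candidates. All names and shapes
# here are hypothetical, not the real Qwen3 Next implementation.
from collections import Counter

import torch

NUM_EXPERTS = 8   # toy value; real MoE layers have far more experts
TOP_K = 2         # experts routed per token

def profile_expert_usage(router_logits: torch.Tensor, counts: Counter) -> None:
    """Count how often each expert is among the top-k routed per token."""
    top_k = router_logits.topk(TOP_K, dim=-1).indices  # shape: (tokens, TOP_K)
    for expert_id in top_k.flatten().tolist():
        counts[expert_id] += 1

# Start every expert at zero so never-selected experts still show up.
counts = Counter({e: 0 for e in range(NUM_EXPERTS)})

# Simulated router outputs stand in for real forward passes over a calibration set.
for _ in range(100):                       # 100 calibration batches
    logits = torch.randn(32, NUM_EXPERTS)  # (tokens, experts) per batch
    profile_expert_usage(logits, counts)

# The least-fired experts on *your* data are the pruning candidates -- but
# whatever they handled (often rarer inputs) is exactly what you give up.
ranked = [expert for expert, _ in counts.most_common()]
least_used = ranked[::-1][:NUM_EXPERTS // 2]
print("candidate experts to prune:", sorted(least_used))
```

The catch is visible in the sketch itself: usage counts only reflect the calibration data you fed in, so experts that look idle there may be the ones carrying the long tail of everything else.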

A model "stores" information and behaviors/capabilities across one shared space of weights. Remove part of that space and you remove whatever knowledge or capability lived there. Other areas of the network may be able to compensate, but you lose specificity.

Information only compresses so far; there are limits. For an LLM, size matters, at least with current technology and architectures. What we really need is a major architecture shift and/or a boost in hardware capability and power efficiency.
