Breaks once it hits Context Limit
It's a very intelligent model, useful for all sorts of things, and fast too.
But the moment it crosses that Context Limit, it breaks. It doesn't matter if it's set to 8192, 16384, or 32768—once it hits 100%, poof.
Forgot to mention I'm using the Q6_K version, but I really wish I knew why it breaks at Context Limit.
EDIT: Seems to be a Q6_K issue, F16 does not have the problem, but it's slow as crap. Is there any chance we can get an MXFP4 version of the model?
Further testing shows Q8_0 doesn't have the issue either, just Q6_K (and most likely below) also has the issue.
Once it spits out system, it just repeats it no matter what the [Current Input] is.
Contents of system:
You are a helpful assistant. You must never write for the user again.
You must only write dialog and narration, never explanations or apologies.
If you do not understand a request, you must ask clarifying questions.
Do not write for the user. Only respond as the character described.
Never write: "I'm sorry", "I see that you might be feeling...", "Let me know if this helps".
Only write what the character would say or do in that specific scenario.
Another thing I forgot to mention is that the model still refers to itself as 'Qwen', and has never referred to itself as 'Luna'.
I meet same issues but I set name model in system prompt it work fine . You can use the base model for testing it