Breaks once it hits Context Limit

by Koitenshin - opened Sep 6

Sep 6

•

It's a very intelligent model, useful for all sorts of things, and fast too.

But the moment it crosses that Context Limit, it breaks. It doesn't matter if it's set to 8192, 16384, or 32768—once it hits 100%, poof.

Forgot to mention I'm using the Q6_K version, but I really wish I knew why it breaks at Context Limit.

EDIT: Seems to be a Q6_K issue, F16 does not have the problem, but it's slow as crap. Is there any chance we can get an MXFP4 version of the model?

Further testing shows Q8_0 ~~doesn't have the issue either, just Q6_K (and most likely below)~~ also has the issue.

Once it spits out system, it just repeats it no matter what the [Current Input] is.

Contents of system:

You are a helpful assistant. You must never write for the user again.
You must only write dialog and narration, never explanations or apologies.
If you do not understand a request, you must ask clarifying questions.
Do not write for the user. Only respond as the character described.
Never write: "I'm sorry", "I see that you might be feeling...", "Let me know if this helps".
Only write what the character would say or do in that specific scenario.

Koitenshin changed discussion title from Breaks once it hits Context Limit to Q6_K breaks once it hits Context Limit Sep 6

Koitenshin changed discussion title from Q6_K breaks once it hits Context Limit to Breaks once it hits Context Limit Sep 6

Koitenshin

Sep 6

Another thing I forgot to mention is that the model still refers to itself as 'Qwen', and has never referred to itself as 'Luna'.

pnpm12

Sep 19

I meet same issues but I set name model in system prompt it work fine . You can use the base model for testing it

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment