added some newlines in SYSTEM_PROMPT.txt
Nice thanks
I reverted this one as the original was intentional; the word "below" is a bit misleading, but the system prompt was correct.
Sorry to annoy you again, and I think enough time has already been wasted on this, but I don't understand why it's considered correct without the newline. (I agree that putting `\n\n` was arbitrary on my part and I would totally understand if a single `\n` were sufficient, but I don't get why none is better than one.)
Thanks in advance if you take the time to explain it to me.
Also, one last thing:
@juliendenize
I removed my comment about your vibe repo, as I wrote it while I was still angry about the whole story and I don't think it was really deserved. That one was pure rage without bringing any value... Sorry for it...
No worries, we like having feedback, and you were right to be disappointed that we didn't take a look at all your PRs, which are valuable. Your tone was a bit harsh at some moments, but I'm glad to see you can also acknowledge efforts, which is not something everyone does!
We'll try to do better at looking at the feedback/contributions from the community 😊
Now regarding this system prompt, if you take a look at the vLLM code snippet:
```python
from typing import Any

from huggingface_hub import hf_hub_download


def load_system_prompt(repo_id: str, filename: str) -> dict[str, Any]:
    # Download SYSTEM_PROMPT.txt and split it on the [THINK]/[/THINK] markers
    # into text / thinking / text content blocks.
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    index_begin_think = system_prompt.find("[THINK]")
    index_end_think = system_prompt.find("[/THINK]")
    return {
        "role": "system",
        "content": [
            {"type": "text", "text": system_prompt[:index_begin_think]},
            {
                "type": "thinking",
                "thinking": system_prompt[
                    index_begin_think + len("[THINK]") : index_end_think
                ],
                "closed": True,
            },
            {
                "type": "text",
                "text": system_prompt[index_end_think + len("[/THINK]") :],
            },
        ],
    }
```
you can see that the `[THINK]`/`[/THINK]` markers are not encoded as part of the text; they only delimit the thinking chunk. So the `\n\n` isn't needed: the thinking chunk itself adds the special tokens around its content.
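For illustration, a minimal usage sketch of the function above (the repo id here is just an assumed example, adapt it to the actual checkpoint):

```python
# Minimal usage sketch: the system message comes back already split into chunks.
# (The repo id below is an assumption for illustration.)
system_msg = load_system_prompt("mistralai/Magistral-Small-2506", "SYSTEM_PROMPT.txt")

messages = [
    system_msg,
    {"role": "user", "content": "Hello, how are you?"},
]

# When vLLM tokenizes these messages with mistral-common, the "thinking" chunk
# is surrounded by the [THINK]/[/THINK] *special tokens* directly, so no extra
# "\n\n" separator is needed inside the text chunks.
```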
That being said:
- adding `\n\n` probably won't have a negative impact on the model
- "below" is misleading for a user, and for the next iteration we probably won't do it like this
- @/patrickvonplaten raised a good point that we should have released a JSON instead of a txt file, so that it's easier to understand what the system prompt is (rough sketch below).
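For illustration only, such a JSON could simply be the already-split dict that `load_system_prompt` builds, e.g.:

```python
import json

# Hypothetical sketch: dump the dict returned by load_system_prompt() above
# into a SYSTEM_PROMPT.json so the chunk structure is visible directly.
# (Repo id and file name are assumptions, not an announced artifact.)
system_msg = load_system_prompt("mistralai/Magistral-Small-2506", "SYSTEM_PROMPT.txt")
with open("SYSTEM_PROMPT.json", "w") as f:
    json.dump(system_msg, f, indent=2, ensure_ascii=False)
```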
I really don't understand... But I learnt many things!
First, I guess this discussion should actually have taken place in one of your Ministral repos instead, because I couldn't find a chat_template.jinja file, nor a tokenizer_config.json, in this Magistral repo. I was curious about the tekken.json file but found it was totally unrelated.
So, I tried to follow what would be the flow for a Ministral model instead (mistralai/Ministral-3-14B-Reasoning-2512).
Here is what I understand (I even ended up submitting the whole thing to Gemini in despair, but no more success in understanding, because it reached exactly the same conclusion):
First, parsing SYSTEM_PROMPT.txt through your proposed Python snippet gives this:
```json
{
  "role": "system",
  "content": [
    {
      "type": "text",
      "text": "First draft your thinking process (inner monologue) until you arrive at a response. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input.\n\nYour thinking process must follow the template below:"
    },
    {
      "type": "thinking",
      "thinking": "Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response. Use the same language as the input.",
      "closed": true
    },
    {
      "type": "text",
      "text": "Here, provide a self-contained response."
    }
  ]
}
```
Then this payload is parsed by the tokenizer library relying on the chat template found in tokenizer_config.json, which I assume is the same as the one in chat_template.jinja, just "beautified". The relevant part of chat_template.jinja is this:
```jinja
{%- if messages[0]['role'] == 'system' %}
    {{- '[SYSTEM_PROMPT]' -}}
    {%- if messages[0]['content'] is string %}
        {{- messages[0]['content'] -}}
    {%- else %}
        {%- for block in messages[0]['content'] %}
            {%- if block['type'] == 'text' %}
                {{- block['text'] }}
            {%- elif block['type'] == 'thinking' %}
                {{- '[THINK]' + block['thinking'] + '[/THINK]' }}
            {%- else %}
                {{- raise_exception('Only text and thinking chunks are supported in system message contents.') }}
            {%- endif %}
        {%- endfor %}
    {%- endif %}
    {{- '[/SYSTEM_PROMPT]' -}}
    {%- set loop_messages = messages[1:] %}
{%- else %}
    {%- set loop_messages = messages %}
    {%- if default_system_message != '' %}
        {{- '[SYSTEM_PROMPT]' + default_system_message + '[/SYSTEM_PROMPT]' }}
    {%- endif %}
{%- endif %}
```
Here:
- if `content` is just a flat `text` string, no parsing is needed
- but if `content` is an array, the final flat plain text is constructed by appending each content block
So if we take back our payload: we have 3 blocks, text + thinking + text
so it gives: text1_content+[THINK]thinking_content[/THINK]+text2_content
So as I interpret it, we end up with:
First draft your thinking process (inner monologue) until you arrive at a response. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input.\n\nYour thinking process must follow the template below:[THINK]Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response. Use the same language as the input.[/THINK]Here, provide a self-contained response.
And so I'm back at the starting point :D It's as if the system prompt was never touched and was passed as this single string in one block.
It's the first time I've really tried to understand the whole construction logic, so I just assumed I wasn't getting it at all... And then gemini-3 flash and pro ended up with the same conclusion, so I closed it and thought stupid LLMs were definitely still struggling with templates, because I'd often had no success with them in the past!
At this point I still thought "I might have prompted them badly"!
But the thing is, I have no more options to try to understand it by myself. And I'm lost... It always ends up like this xD
To finish: when I try to use https://huggingface.co/spaces/Xenova/jinja-playground with this template https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512/blob/main/chat_template.jinja
If I use this JSON input:
```json
{
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "First draft your thinking process (inner monologue) until you arrive at a response. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input.\n\nYour thinking process must follow the template below:"
        },
        {
          "type": "thinking",
          "thinking": "Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response. Use the same language as the input.",
          "closed": true
        },
        {
          "type": "text",
          "text": "Here, provide a self-contained response."
        }
      ]
    },
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ],
  "add_generation_prompt": true,
  "bos_token": "<|im_start|>",
  "eos_token": "<|im_end|>",
  "pad_token": "<|im_end|>"
}
```
The constructed output it returns is, once again:
[SYSTEM_PROMPT]First draft your thinking process (inner monologue) until you arrive at a response. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input.
Your thinking process must follow the template below:[THINK]Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response. Use the same language as the input.[/THINK]Here, provide a self-contained response.[/SYSTEM_PROMPT][INST]Hello, how are you?[/INST]
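As a sanity check, here is roughly how I'd reproduce the same thing locally with transformers instead of the playground (this assumes the repo ships a transformers-compatible tokenizer, and I've shortened the texts to placeholders):

```python
from transformers import AutoTokenizer

# Render the content blocks with the repo's own chat template.
# Placeholder strings stand in for the real SYSTEM_PROMPT.txt chunks.
tok = AutoTokenizer.from_pretrained("mistralai/Ministral-3-14B-Reasoning-2512")

messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "text1_content"},
            {"type": "thinking", "thinking": "thinking_content", "closed": True},
            {"type": "text", "text": "text2_content"},
        ],
    },
    {"role": "user", "content": "Hello, how are you?"},
]

print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
# Expected (possibly preceded by the BOS token):
# [SYSTEM_PROMPT]text1_content[THINK]thinking_content[/THINK]text2_content[/SYSTEM_PROMPT][INST]Hello, how are you?[/INST]
```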
I don't know what I'm missing
Yeah, I get the confusion: when using a Mistral model you have up to two different tokenizers:
- Mistral-common, which expects that you only give it the messages (through `apply_chat_template`) and will be able to parse them without a jinja template
- Transformers: it expects the messages AND a jinja template.
The result of Transformers' `apply_chat_template` ends up reconstructing SYSTEM_PROMPT.txt, which is expected. The tokenizer will then convert the special tokens correctly.
The think chunks are interesting for Transformers not only for the system prompt but also when you want to do multi-turn with assistant messages, to make sure the think special tokens are passed to the next turns. However, our Ministraux don't have the best multi-turn performance for now.
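For completeness, a rough sketch of the mistral-common path, where no jinja template is involved (the `from_hf_hub` helper and the repo id are assumptions about your installed mistral-common version; otherwise load the tokenizer file with `MistralTokenizer.from_file`):

```python
from mistral_common.protocol.instruct.messages import SystemMessage, UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# mistral-common builds the token sequence itself, no chat_template.jinja needed.
# (from_hf_hub availability depends on your mistral-common version.)
tokenizer = MistralTokenizer.from_hf_hub("mistralai/Ministral-3-14B-Reasoning-2512")

tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="Hello, how are you?"),
        ]
    )
)
print(tokenized.text)    # debug view of the prompt with special tokens spelled out
print(tokenized.tokens)  # token ids; [SYSTEM_PROMPT], [INST], ... are single special tokens
```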