---
license: apache-2.0
language:
- en
- ko
base_model:
- Motif-Technologies/Motif-2-12.7B-Base
tags:
- text-generation-inference
- conversational
- custom_code
- text-generation
- Motif
---

Last update: 1 Nov. 2025

# Introduction

We are pleased to announce **Motif-2-12.7B-Instruct**, a 12.7-billion-parameter language model. This model is a **supervised fine-tuning (SFT)** variant of our base model: https://huggingface.co/Motif-Technologies/Motif-2-12.7B-Base. Detailed information, including a technical report, will be released later.

# Evaluation

*The results of Qwen3 and Gemma 3 are sourced directly from their technical reports.*

|Benchmark|Evaluation setting|Motif-2-12.7B|Qwen2.5-72B|Qwen3-14B|Qwen3-14B|Qwen3-32B|Qwen3-32B|Qwen3-30B-A3B|Qwen3-30B-A3B|Gemma-3-12B|Gemma-3-27B|
|---|---|---|---|---|---|---|---|---|---|---|---|
|||Instruct|Instruct|Non-thinking|Thinking|Non-thinking|Thinking|Non-thinking|Thinking|Instruct|Instruct|
|MMLU|0-shot|86.11|-|-|-|-|-|-|-|71.9|76.9|
|MMLU-Redux|-|90.02|86.8|82|88.6|85.7|90.9|84.1|89.5|-|-|
|BBH|0-shot|85.78|-|-|-|-|-|-|-|85.7|87.6|
|GPQA-Diamond|0-shot, CoT|63.6|49|54.8|64|54.6|68.4|54.8|65.8|40.9|42.4|
|GSM8K|0-shot, CoT|96.13|-|-|-|-|-|-|-|94.4|95.9|
|MATH|0-shot|97|-|-|-|-|-|-|-|83.8|89|
|MBPP|3-shot|91|-|-|-|-|-|-|-|73|74.4|
|LiveBench 2024-11-25|-|33.8|51.4|59.6|71.3|59.8|74.9|59.4|74.3|-|-|
|IFEval|strict prompt|75.78|84.1|84.8|85.4|83.2|85|83.7|86.5|-|-|
|IFEval|0-shot|76.52|-|-|-|-|-|-|-|88.9|90.4|
|MATH-500|-|96.8|83.6|90|96.8|88.6|97.2|89.8|98|-|-|
|AIME24|-|72.3|18.9|31.7|79.3|31|81.4|32.8|80.4|-|-|
|AIME25|-|63.6|15|23.3|70.4|20.2|72.9|21.6|70.9|-|-|
|ZebraLogic|-|69.5|26.6|33|88.5|29.2|88.8|33.2|89.5|-|-|
|BFCL v3|-|55.34|63.4|61.5|70.4|63|70.3|58.6|69.1|-|-|
|LiveCodeBench v5|-|50.03|30.7|29|63.5|31.3|65.7|29.8|62.6|-|-|
|LiveCodeBench|0-shot, CoT|61.66|-|-|-|-|-|-|-|32|39|
|HumanEval|0-shot|93.2|-|-|-|-|-|-|-|85.4|87.8|

## Averages and improvements on the corresponding benchmark scores

### vs. Gemma 3

||Motif-2-12.7B|Gemma-3-12B|Gemma-3-27B|
|---|---|---|---|
||Instruct|Instruct|Instruct|
|Average|83.44|72.89|75.93|
|Improvement||+14.48%|+9.89%|

### vs. Qwen3

||Motif-2-12.7B|Qwen2.5-72B|Qwen3-14B|Qwen3-14B|Qwen3-32B|Qwen3-32B|Qwen3-30B-A3B|Qwen3-30B-A3B|
|---|---|---|---|---|---|---|---|---|
||Instruct|Instruct|Non-thinking|Thinking|Non-thinking|Thinking|Non-thinking|Thinking|
|Average|67.08|50.95|54.97|77.82|54.66|79.55|54.78|78.66|
|Improvement||+31.65%|+22.02%|-13.80%|+22.72%|-15.68%|+22.45%|-14.73%|

## How to use in transformers

To use this model, first install the Hugging Face [kernels](https://github.com/huggingface/kernels) package, as shown below.
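A minimal install, assuming the package is published on PyPI under the name `kernels` as in the linked repository, and that PyTorch with CUDA is already set up (the example below loads the model with `.cuda()` and flash attention):

```bash
pip install kernels
```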
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Motif-Technologies/Motif-2-12.7B-Instruct",
    trust_remote_code=True,
    _attn_implementation="flash_attention_2",
    dtype=torch.bfloat16,  # currently supports bf16 only, for efficiency
).cuda()

tokenizer = AutoTokenizer.from_pretrained(
    "Motif-Technologies/Motif-2-12.7B-Instruct",
    trust_remote_code=True,
)

query = "What is the capital city of South Korea?"

input_ids = tokenizer.apply_chat_template(
    [
        {'role': 'system', 'content': 'You are a helpful assistant.'},
        {'role': 'user', 'content': query},
    ],
    add_generation_prompt=True,
    enable_thinking=False,  # or True
    return_tensors='pt',
).cuda()

output = model.generate(input_ids, max_new_tokens=1024, pad_token_id=tokenizer.eos_token_id)
output = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=False)
print(output)
```

### Outputs

```
# With enable_thinking=True, the model is FORCED to think.

Okay, the user is asking for the capital city of South Korea. Let me think. I know that South Korea's capital is Seoul. But wait, I should double-check to make sure I'm not mixing it up with other countries. For example, North Korea's capital is Pyongyang. So yes, South Korea's capital is definitely Seoul. I should just provide that as the answer.

The capital city of South Korea is **Seoul**. <|endofturn|><|endoftext|>

# With enable_thinking=False, the model chooses whether or not to think. In this example, thinking is not worth it.

The capital city of South Korea is Seoul. <|endofturn|><|endoftext|>
```

## How to use in vLLM

The [PR](https://github.com/vllm-project/vllm/pull/27396) adding support for the Motif model to the official vLLM package is currently under review. In the meantime, to use our model with vLLM, please use the following container [image](https://github.com/motiftechnologies/vllm/pkgs/container/vllm). Our model supports a sequence length of up to 32K tokens.

```bash
# run the vLLM API server
# (--data-parallel-size takes the number of model replicas; 1 here as a placeholder, adjust to your setup)
VLLM_ATTENTION_BACKEND="DIFFERENTIAL_FLASH_ATTN" vllm serve Motif-Technologies/Motif-2-12.7B-Instruct --trust-remote-code --data-parallel-size 1

# send a request with curl
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital city of South Korea?"}
        ],
        "temperature": 0.6,
        "skip_special_tokens": false,
        "chat_template_kwargs": {
            "enable_thinking": true
        }
    }'
```
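The served endpoint is OpenAI-compatible, so it can also be queried from Python. Below is a minimal sketch using the official `openai` client; it assumes the `vllm serve` command above is running at `localhost:8000` (the default), and forwards the vLLM-specific options (`skip_special_tokens`, `chat_template_kwargs`) through the client's `extra_body` parameter.

```python
# Minimal sketch: query the vLLM server started above via its
# OpenAI-compatible API. Assumes `pip install openai` and a server
# running at localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM does not check the key by default

response = client.chat.completions.create(
    model="Motif-Technologies/Motif-2-12.7B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital city of South Korea?"},
    ],
    temperature=0.6,
    # Options that are not part of the OpenAI API are forwarded in
    # extra_body, mirroring the curl request above.
    extra_body={
        "skip_special_tokens": False,
        "chat_template_kwargs": {"enable_thinking": True},
    },
)
print(response.choices[0].message.content)
```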