Quantization made by Richard Erkhov.
mistral-orpo-beta - AWQ
- Model creator: https://huggingface.co/kaist-ai/
- Original model: https://huggingface.co/kaist-ai/mistral-orpo-beta/
Original model description:
language:
- en
license: mit
base_model:
- mistralai/Mistral-7B-v0.1
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
pipeline_tag: text-generation
model-index:
- name: Mistral-ORPO-β
results:
AI2 Reasoning Challenge (25-Shot)
- task:
type: text-generation
name: Text Generation
dataset:
name: AI2 Reasoning Challenge (25-Shot)
type: ai2_arc
config: ARC-Challenge
split: test
args:
num_few_shot: 25
metrics:
- type: acc_norm
  name: normalized accuracy
  value: 61.18
  source:
    name: Open LLM Leaderboard
    url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kaist-ai%2Fmistral-orpo-beta
HellaSwag (10-shot)
- task:
type: text-generation
name: Text Generation
dataset:
name: HellaSwag (10-Shot)
type: hellaswag
split: validation
args:
num_few_shot: 10
metrics:
- type: acc_norm
  name: normalized accuracy
  value: 84.03
  source:
    name: Open LLM Leaderboard
    url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kaist-ai%2Fmistral-orpo-beta
TruthfulQA (0-shot)
- task:
type: text-generation
name: Text Generation
dataset:
name: TruthfulQA (0-shot)
type: truthful_qa
config: multiple_choice
split: validation
args:
num_few_shot: 0
metrics:
- type: mc2
  value: 47.69
  source:
    name: Open LLM Leaderboard
    url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kaist-ai%2Fmistral-orpo-beta
GSM8k (5-shot)
- task:
type: text-generation
name: Text Generation
dataset:
name: GSM8k (5-shot)
type: gsm8k
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
  name: accuracy
  value: 39.8
  source:
    name: Open LLM Leaderboard
    url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kaist-ai%2Fmistral-orpo-beta
MMLU (5-Shot)
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU (5-Shot)
type: cais/mmlu
config: all
split: test
args:
num_few_shot: 5
metrics:
- type: acc
  name: accuracy
  value: 63.26
  source:
    name: Open LLM Leaderboard
    url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kaist-ai%2Fmistral-orpo-beta
Winogrande (5-shot)
- task:
type: text-generation
name: Text Generation
dataset:
name: Winogrande (5-shot)
type: winogrande
config: winogrande_xl
split: validation
args:
num_few_shot: 5
metrics:
- type: acc
  name: accuracy
  value: 79.24
  source:
    name: Open LLM Leaderboard
    url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kaist-ai%2Fmistral-orpo-beta
- task:
type: text-generation
dataset:
name: AlpacaEval 1
type: AlpacaEval
metrics:
- type: AlpacaEval 1.0
  value: 91.16%
  name: Win Rate
  source:
    url: https://tatsu-lab.github.io/alpaca_eval/
    name: Leaderboard
- task:
type: text-generation
dataset:
name: AlpacaEval 2
type: AlpacaEval
metrics:
- type: AlpacaEval 2.0
  value: 12.57%
  name: Win Rate
  source:
    url: https://tatsu-lab.github.io/alpaca_eval/
    name: Leaderboard
- task:
type: text-generation
dataset:
name: MT-Bench
type: MT-Bench
metrics:
- type: MT-Bench
  value: 7.322
  name: Score
  source:
    url: https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/
    name: self-reported
- task:
type: text-generation
name: Text Generation
dataset:
name: AI2 Reasoning Challenge (25-Shot)
type: ai2_arc
config: ARC-Challenge
split: test
args:
num_few_shot: 25
metrics:
Mistral-ORPO-β (7B)
Mistral-ORPO is a version of mistralai/Mistral-7B-v0.1 fine-tuned with odds ratio preference optimization (ORPO). With ORPO, the model learns the preference directly, without a supervised fine-tuning warmup phase. Mistral-ORPO-β is fine-tuned exclusively on the 61k instances of argilla/ultrafeedback-binarized-preferences-cleaned, the cleaned version of UltraFeedback by Argilla.
- GitHub Repository: https://github.com/xfactlab/orpo
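To make the idea of learning the preference without a separate warmup phase more concrete, here is a minimal sketch of an ORPO-style objective for a single preference pair, following the formulation in the cited paper: a negative log-likelihood term on the chosen response plus a weighted odds-ratio penalty. It works on plain floats rather than model outputs. The function names, the use of length-averaged log-probabilities as inputs, and the `lam` default are illustrative assumptions, not the settings used to train this model; see the repository above for the actual implementation.

```python
import math


def log_odds(avg_logprob: float) -> float:
    """Return log(p / (1 - p)) given log p, for p strictly between 0 and 1."""
    # log1p(-exp(log p)) equals log(1 - p)
    return avg_logprob - math.log1p(-math.exp(avg_logprob))


def orpo_loss(chosen_logp: float, rejected_logp: float,
              nll_chosen: float, lam: float = 0.1) -> float:
    """ORPO-style loss for one (chosen, rejected) pair.

    chosen_logp / rejected_logp: length-averaged log-probabilities of the
        two responses under the model (assumed inputs).
    nll_chosen: negative log-likelihood of the chosen response (SFT term).
    lam: weight of the odds-ratio term (placeholder value, an assumption).
    """
    log_odds_ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return nll_chosen + lam * odds_ratio_term
```

The odds-ratio term shrinks as the model assigns higher odds to the chosen response than to the rejected one, so a single loss both fits the chosen response and pushes the two apart.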
Model Performance
1) AlpacaEval & MT-Bench
| Model Name | Size | Align | MT-Bench | AlpacaEval 1.0 | AlpacaEval 2.0 |
|---|---|---|---|---|---|
| Mistral-ORPO-α | 7B | ORPO | 7.23 | 87.92 | 11.33 |
| Mistral-ORPO-β | 7B | ORPO | 7.32 | 91.41 | 12.20 |
| Zephyr β | 7B | DPO | 7.34 | 90.60 | 10.99 |
| TULU-2-DPO | 13B | DPO | 7.00 | 89.5 | 10.12 |
| Llama-2-Chat | 7B | RLHF | 6.27 | 71.37 | 4.96 |
| Llama-2-Chat | 13B | RLHF | 6.65 | 81.09 | 7.70 |
2) IFEval
| Model Type | Prompt-Strict | Prompt-Loose | Inst-Strict | Inst-Loose |
|---|---|---|---|---|
| Mistral-ORPO-α | 0.5009 | 0.5083 | 0.5995 | 0.6163 |
| Mistral-ORPO-β | 0.5287 | 0.5564 | 0.6355 | 0.6619 |
MT-Bench by Category
Inference
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the original model and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("kaist-ai/mistral-orpo-beta")
tokenizer = AutoTokenizer.from_pretrained("kaist-ai/mistral-orpo-beta")

# Apply chat template
query = [{'role': 'user', 'content': 'Hi! How are you doing?'}]
prompt = tokenizer.apply_chat_template(query, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors='pt')

# Generation with specific configurations
output = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7
)

# batch_decode returns a list with one decoded string per generated sequence
response = tokenizer.batch_decode(output)

# Example output (sampling is enabled, so the exact text varies between runs):
#<|user|>
#Hi! How are you doing?</s>
#<|assistant|>
#I'm doing well, thank you! How are you?</s>
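The example output above suggests the prompt layout that the chat template produces: each turn is wrapped as a role tag, the content, and an end-of-sequence marker, followed by an open assistant tag. The sketch below rebuilds that layout by hand for illustration only. The helper `format_chat` is hypothetical, and the exact newline placement is an assumption inferred from the output shown above; `tokenizer.apply_chat_template` remains the authoritative source.

```python
def format_chat(messages, add_generation_prompt=True, eos="</s>"):
    """Hypothetical helper: render messages in the <|role|> layout seen above."""
    parts = []
    for message in messages:
        # One turn: role tag on its own line, then the content and the EOS marker
        parts.append(f"<|{message['role']}|>\n{message['content']}{eos}\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here
        parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = format_chat([{"role": "user", "content": "Hi! How are you doing?"}])
print(prompt)
```

Comparing this string with the result of `tokenizer.apply_chat_template(..., tokenize=False)` is a quick way to check whether a hand-built prompt matches what the model was trained on.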
Citation
@misc{hong2024orpo,
title={ORPO: Monolithic Preference Optimization without Reference Model},
author={Jiwoo Hong and Noah Lee and James Thorne},
year={2024},
eprint={2403.07691},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
