# GPT4 x Alpaca

As a base model we used: alpaca-13b

Finetuned on GPT4's responses, for 3 epochs.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

| Metric              | Value |
|---------------------|-------|
| Avg.                | 46.78 |
| ARC (25-shot)       | 52.82 |
| HellaSwag (10-shot) | 79.59 |
| MMLU (5-shot)       | 48.19 |
| TruthfulQA (0-shot) | 48.88 |
| Winogrande (5-shot) | 70.17 |
| GSM8K (5-shot)      | 2.81  |
| DROP (3-shot)       | 24.99 |
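
The "Avg." row appears to be the unweighted mean of the seven benchmark scores. The sketch below recomputes it from the values in the table as a sanity check. The equal-weight formula is an assumption, not something this card states, and the names `scores` and `leaderboard_average` are made up for illustration.

```python
# Benchmark scores copied from the table above.
scores = {
    "ARC (25-shot)": 52.82,
    "HellaSwag (10-shot)": 79.59,
    "MMLU (5-shot)": 48.19,
    "TruthfulQA (0-shot)": 48.88,
    "Winogrande (5-shot)": 70.17,
    "GSM8K (5-shot)": 2.81,
    "DROP (3-shot)": 24.99,
}


def leaderboard_average(benchmark_scores):
    """Return the unweighted mean of the scores (assumed averaging rule)."""
    return sum(benchmark_scores.values()) / len(benchmark_scores)


average = leaderboard_average(scores)
print(f"Avg. = {average:.2f}")  # prints "Avg. = 46.78"
```

The result matches the reported 46.78, which is consistent with an equal-weight average. The very low GSM8K score pulls the mean down noticeably.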