# GPT4 x Alpaca

We used alpaca-13b as the base model.

Fine-tuned on GPT-4 responses for 3 epochs.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

| Metric                | Value |
|-----------------------|-------|
| Avg.                  | 46.78 |
| ARC (25-shot)         | 52.82 |
| HellaSwag (10-shot)   | 79.59 |
| MMLU (5-shot)         | 48.19 |
| TruthfulQA (0-shot)   | 48.88 |
| Winogrande (5-shot)   | 70.17 |
| GSM8K (5-shot)        | 2.81  |
| DROP (3-shot)         | 24.99 |
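The numbers in this table are consistent with the Avg. row being the unweighted arithmetic mean of the seven benchmark scores. That reading is an assumption and is not stated in this card. The minimal sketch below recomputes the mean from the listed values. `leaderboard_average` is a hypothetical helper written for illustration. It is not part of any leaderboard tooling.

```python
# Scores copied from the Open LLM Leaderboard table in this model card.
scores = {
    "ARC (25-shot)": 52.82,
    "HellaSwag (10-shot)": 79.59,
    "MMLU (5-shot)": 48.19,
    "TruthfulQA (0-shot)": 48.88,
    "Winogrande (5-shot)": 70.17,
    "GSM8K (5-shot)": 2.81,
    "DROP (3-shot)": 24.99,
}


def leaderboard_average(values):
    """Hypothetical helper: unweighted arithmetic mean, rounded to two decimals.

    Assumes every benchmark contributes equally to Avg.
    """
    return round(sum(values) / len(values), 2)


avg = leaderboard_average(list(scores.values()))
print(avg)  # 46.78, matching the Avg. row
```

Under this assumption, the very low GSM8K score (2.81) pulls the average down by roughly 6.3 points compared with the mean of the other six benchmarks.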