cliang1453 committed (verified)
Commit 4bf6e2c
1 Parent(s): bd3840a

Update README.md

Files changed (1): README.md (+10 -12)
README.md CHANGED
@@ -2,6 +2,8 @@
 license: mit
 language:
 - en
+context_length:
+- 4k
 base_model:
 - microsoft/GRIN-MoE
 - microsoft/Phi-3.5-MoE-instruct
@@ -9,9 +11,11 @@ pipeline_tag: text-generation
 ---
 ## Model Summary
 
-The Phi-mini-MoE is a 7.6B total parameters with 2.4B activated parameters, lightweight, state-of-the-art open Mixture of Expert (MoE) model compressed and distilled from Phi-3.5-MoE using [SlimMoE](http:\\link.to.slimmoe). The training process utilizes Phi-3 datasets that includes both synthetic data and the filtered publicly available websites data with a focus on high-quality and reasoning dense properties. The model has underwent a post-training process that incorporates both supervised fine-tuning and direct preference optimization for the instruction following and safety measures. The model belongs to the SlimMoE series, where a smaller version model Phi-tiny-MoE with 3.8B total parameters and 1.1B activated parameters is available.
+Phi-mini-MoE is a lightweight Mixture of Experts (MoE) model with 7.6B total parameters and 2.4B activated parameters. It is compressed and distilled from the base model shared by [Phi-3.5-MoE](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct) and [GRIN-MoE](https://huggingface.co/microsoft/GRIN-MoE) using the [SlimMoE](http://link.to.slimmoe) approach, then post-trained via supervised fine-tuning and direct preference optimization for instruction following and safety. The model is trained on Phi-3 synthetic data and filtered public documents, with a focus on high-quality, reasoning-dense content. It is part of the SlimMoE series, which includes a smaller variant, [Phi-tiny-MoE](https://huggingface.co/microsoft/Phi-tiny-MoE-instruct), with 3.8B total and 1.1B activated parameters.
 
-📖 [SlimMoE Paper](http:\\link.to.slimmoe) <br>
+
+References: <br>
+📖 [SlimMoE](http://link.to.slimmoe) <br>
 📖 [Phi-3 Technical Report](https://arxiv.org/abs/2404.14219) <br>
 📖 [GRIN-MoE](https://arxiv.org/abs/2409.12136) <br>
 
@@ -19,13 +23,7 @@ The Phi-mini-MoE is a 7.6B total parameters with 2.4B activated parameters, ligh
 
 ### Primary Use Cases
 
-The model is intended for commercial and research use in English. The model provides uses for general purpose AI systems and applications which require:
-
-1) Memory/compute constrained environments
-2) Latency bound scenarios
-3) Strong reasoning (especially math and logic)
-
-Our model is designed to accelerate research on language and multimodal models, for use as a building block for generative AI powered features.
+The model is intended for commercial and research use in English. The model provides uses for general purpose AI systems and applications which require memory/compute constrained environments and latency bound scenarios.
 
 ### Use Case Considerations
 
@@ -90,12 +88,12 @@ print(output[0]['generated_text'])
 
 ## Benchmarks
 
-To understand the capabilities, we compare Phi-mini-MoE with a set of models over a variety of benchmarks using lm eval.
+To understand the capabilities, we compare Phi-mini-MoE with a set of models over a variety of benchmarks using lm eval. Detailed evaluation settings can be found in the SlimMoE paper.
 
 | Model | # Total param | # Act. param | MMLU | MMLU pro | BBH | Arc-C (chat) | Human-eval | GSM8K | MT-bench |
 |----------------------|---------------|--------------|-------|----------|-------|---------------|-------------|--------|----------|
 | **MoE Models** |||||||||||
-| Phi 3.5-MoE | 42B | 6.6B | 78.36 | 59.38 | 63.93 | 91.38 | 81.70 | 87.87 | 8.34 |
+| Phi-3.5-MoE | 42B | 6.6B | 78.36 | 59.38 | 63.93 | 91.38 | 81.70 | 87.87 | 8.34 |
 | Qwen 1.5 MoE | 14B | 2.7B | 60.73 | 26.49 | 42.65 | 67.24 | 46.30 | 53.07 | 6.55 |
 | DeepSeek V2 Lite | 16B | 2.4B | 56.69 | 17.89 | 36.30 | 61.09 | 54.40 | 63.23 | 6.82 |
 | OL-MoE | 7B | 1.3B | 54.27 | 20.87 | 38.00 | 55.63 | 37.80 | 71.49 | 6.60 |
@@ -110,7 +108,7 @@ To understand the capabilities, we compare Phi-mini-MoE with a set of models ove
 | Qwen 2.5 3B | 3B | 3B | 65.06 | 41.00 | 46.61 | 80.20 | 73.80 | 76.57 | 7.60 |
 | Gemma 3 1B | 1B | 1B | 40.80 | 14.70 | 34.80 | 37.46 | 41.50 | 41.77 | 6.67 |
 | LLaMA 3.2 1B | 1B | 1B | 46.30 | 18.67 | 35.18 | 49.91 | 35.40 | 44.96 | 5.23 |
-| **Our (SlimMoE) Models** |||||||||||
+| **SlimMoE Models** |||||||||||
 | Phi-mini-MoE | 7.6B | 2.4B | 70.68 | 49.68 | 55.27 | 84.91 | 73.80 | 84.89 | 7.59 |
 | Phi-tiny-MoE | 3.8B | 1.1B | 60.83 | 36.34 | 45.58 | 76.37 | 58.50 | 78.47 | 7.05 |
 
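Note on the updated Benchmarks section: the scores are reported via lm eval (presumably the lm-evaluation-harness), with detailed settings deferred to the SlimMoE paper. The sketch below shows how a subset of such numbers might be reproduced with the harness's Python API; the repository id `microsoft/Phi-mini-MoE-instruct`, the task list, and the few-shot counts are illustrative assumptions, not values taken from this commit.

```python
# Minimal sketch (assumptions noted below), not the exact evaluation setup from the commit.
# Assumes the model is published as "microsoft/Phi-mini-MoE-instruct" and loads through
# the standard Hugging Face backend of lm-evaluation-harness; task names and few-shot
# counts are placeholders chosen for illustration.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args=(
        "pretrained=microsoft/Phi-mini-MoE-instruct,"  # assumed repo id
        "dtype=bfloat16,trust_remote_code=True"
    ),
    tasks=["mmlu", "gsm8k"],  # a subset of the benchmarks in the table above
    num_fewshot=5,            # illustrative; see the SlimMoE paper for actual settings
    batch_size=8,
)

# Per-task metrics live under the "results" key of the returned dict.
for task, metrics in results["results"].items():
    print(task, metrics)
```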