Update README.md
README.md CHANGED

@@ -110,12 +110,12 @@ Evaluation results on the ``Needle In A Haystack`` (NIAH) tests.  DeepSeek-V2 pe
 | Benchmark | Domain         | QWen1.5 72B Chat | Mixtral 8x22B | LLaMA3 70B Instruct | DeepSeek V1 Chat (SFT) | DeepSeek V2 Chat(SFT) | DeepSeek V2 Chat(RL) |
 |:-----------:|:----------------:|:------------------:|:---------------:|:---------------------:|:-------------:|:-----------------------:|:----------------------:|
 | **MMLU**      | English        | 76.2             | 77.8          | 80.3                | 71.1        | 78.4                 | 77.8                 |
-| **BBH**       | English        | 65.9             | 78.4          |
+| **BBH**       | English        | 65.9             | 78.4          | 80.1                | 71.7        | 81.3                 | 79.7                 |
 | **C-Eval**    | Chinese        | 82.2             | 60.0          | 67.9                | 65.2        | 80.9                 | 78.0                 |
 | **CMMLU**     | Chinese        | 82.9             | 61.0          | 70.7                | 67.8        | 82.4                 | 81.6                 |
 | **HumanEval** | Code           | 68.9             | 75.0          | 76.2                | 73.8        | 76.8                 | 81.1                 |
-| **MBPP**      | Code           |
-|   **LiveCodeBench  (
+| **MBPP**      | Code           | 52.2             | 64.4          | 69.8                | 61.4        | 70.4                 | 72.0                 |
+|   **LiveCodeBench  (0901-0401)**     | Code           | 18.8             | 25.0          | 30.5                | 18.3        | 28.7                 | 32.5                 |
 | **GSM8K**     | Math           | 81.9             | 87.9          | 93.2                | 84.1        | 90.8                 | 92.2                 |
 | **Math**      | Math           | 40.6             | 49.8          | 48.5                | 32.6        | 52.7                 | 53.9                 |
 

