Update README.md

Primary Task: OpenBookQA

Why OpenBookQA is the Strength:

- qx6 achieves 0.432 on OpenBookQA, the highest score of any model in this comparison.
- This is a 0.012 improvement over the baseline (bf16 at 0.420) and 0.002 better than qm68 (0.430).
- This is a significant advantage for knowledge-based reasoning tasks.

Secondary Strengths:

BoolQ

- qx6 scores 0.881, the highest among all quantized models.
- This indicates exceptionally strong performance on boolean (yes/no) reasoning questions.

Arc_Challenge

- qx6 scores 0.422, equal to the baseline (bf16 at 0.422).
- It matches the full-precision model on these challenging questions (for how such accuracies are typically measured, see the sketch below).
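
The README does not say which harness or settings produced these accuracies. Purely as an illustration, the sketch below shows one common way to collect comparable zero-shot accuracy numbers with EleutherAI's lm-evaluation-harness; the model id is a placeholder, and the backend, task list, and zero-shot setting are assumptions rather than something this card specifies.

```python
# Hypothetical reproduction sketch -- this card does not state which harness or
# settings produced its numbers, so everything below is an assumption.
import lm_eval  # EleutherAI lm-evaluation-harness

MODEL_ID = "your-org/your-qx6-checkpoint"  # placeholder, not a real repo id

results = lm_eval.simple_evaluate(
    model="hf",                           # standard Hugging Face backend
    model_args=f"pretrained={MODEL_ID}",
    tasks=[
        "openbookqa", "boolq", "arc_challenge",
        "arc_easy", "piqa", "hellaswag", "winogrande",
    ],
    num_fewshot=0,   # whether the card's numbers are zero-shot is not stated
    batch_size=8,
)

# Per-task metrics (accuracy, normalized accuracy, ...) live under results["results"].
for task, metrics in sorted(results["results"].items()):
    print(task, metrics)
```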

Task Suitability Analysis:

Best Suited Tasks:

- OpenBookQA - Strongest performer
- BoolQ - Highest among quantized models
- Arc_Challenge - Matches the bf16 baseline
- PIQA - 0.724 (very good performance)

Other Tasks Where qx6 Performs Well:

- HellaSwag - 0.546 (solid performance)
- Arc_Easy - 0.532 (decent performance)
- Winogrande - 0.576 (strongest among quantized models for this task)
- General reasoning - very balanced performance across most tasks

Limitations:

- Weakest on Arc_Easy compared to some other variants (0.532 vs 0.537 for bf16; see the sketch below).
- Slightly below baseline on some metrics due to its 6-bit quantization strategy.
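
To keep the comparisons quoted above in one place, here is a minimal sketch that tabulates the bf16 and qx6 scores stated in this README (only the tasks where both values are given) and recomputes the deltas.

```python
# Scores exactly as quoted in this README (bf16 baseline vs. the qx6 quant).
scores = {
    #  task:          (bf16,  qx6)
    "openbookqa":    (0.420, 0.432),
    "arc_challenge": (0.422, 0.422),
    "arc_easy":      (0.537, 0.532),
}

for task, (bf16, qx6) in scores.items():
    print(f"{task:>13}  bf16={bf16:.3f}  qx6={qx6:.3f}  delta={qx6 - bf16:+.3f}")
```

Running this reproduces the +0.012 OpenBookQA gain, the Arc_Challenge tie, and the -0.005 Arc_Easy gap discussed above.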

Recommendation:

Use qx6 when knowledge-based reasoning and boolean logic are critical, particularly for applications involving:

- Educational assessment systems
- Knowledge-intensive question answering
- Tasks requiring both factual knowledge and logical reasoning
- Scenarios where OpenBookQA performance is the primary concern

The model excels at combining factual recall (OpenBookQA) with logical reasoning (BoolQ), making it ideal for applications like educational AI, research assistants, and knowledge-based question-answering systems. Its ability to match the baseline performance on Arc_Challenge while excelling in OpenBookQA makes it particularly valuable for tasks requiring both broad knowledge and logical processing capabilities.
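
If you want to try the qx6 variant for this kind of knowledge-plus-reasoning workload, a minimal loading sketch follows. It assumes the checkpoint is an MLX quantization usable with the mlx-lm package (suggested by the qx6/qm68/bf16 naming, but not stated explicitly here), and the repository id is a placeholder rather than an actual path.

```python
# Minimal usage sketch, assuming an MLX quantized checkpoint served via mlx-lm.
# The repo id is a placeholder -- substitute the actual qx6 repository.
from mlx_lm import load, generate

model, tokenizer = load("your-org/your-model-qx6")  # hypothetical repo id

# An OpenBookQA-style factual question, the kind of input this variant is
# recommended for above.
prompt = "Which gas do plants take in from the air to perform photosynthesis?"
answer = generate(model, tokenizer, prompt=prompt, max_tokens=64)
print(answer)
```

For instruction-tuned checkpoints you would normally wrap the question with the tokenizer's chat template first; that is a usage choice, not something this card prescribes.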