nightmedia committed
Commit e9f2e82 · verified · 1 Parent(s): 336f75a

Update README.md

Files changed (1): README.md +0 -14

README.md CHANGED
Primary Task: OpenBookQA

Why OpenBookQA is the Strength:

qx6 achieves 0.432 on OpenBookQA, the highest score among all models in this comparison.

This represents a 0.012 improvement over the baseline (bf16 at 0.420) and is 0.002 higher than qm68 (0.430).

This is a significant advantage for knowledge-based reasoning tasks.
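
The stated margins follow directly from the scores quoted above; here is a minimal sketch that recomputes them (the values are taken from this section, and the variable names are only illustrative):

```python
# OpenBookQA accuracies quoted in this section
openbookqa = {"qx6": 0.432, "bf16": 0.420, "qm68": 0.430}

# qx6's margin over the bf16 baseline and over qm68
delta_vs_bf16 = round(openbookqa["qx6"] - openbookqa["bf16"], 3)  # 0.012
delta_vs_qm68 = round(openbookqa["qx6"] - openbookqa["qm68"], 3)  # 0.002

print(f"qx6 vs bf16: +{delta_vs_bf16}, qx6 vs qm68: +{delta_vs_qm68}")
```
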
  Secondary Strengths:

BoolQ

qx6 scores 0.881, the highest among all quantized models.

This indicates exceptional performance on boolean reasoning questions.

Arc_Challenge

qx6 scores 0.422, equal to the baseline (bf16 at 0.422).

It matches the full-precision model on these challenging questions.

  Task Suitability Analysis:

Best Suited Tasks:

OpenBookQA - Strongest performer
BoolQ - Highest among quantized models
Arc_Challenge - Matches the baseline exactly
PIQA - 0.724 (very good performance)

  Other Tasks Where qx6 Performs Well:

HellaSwag - 0.546 (solid performance)
Arc_Easy - 0.532 (decent performance)
Winogrande - 0.576 (strongest among quantized models for this task)
General reasoning - Very balanced performance across most tasks

  Limitations:

Weakest in Arc_Easy compared to some other variants (0.532 vs 0.537 for bf16)
Slightly below baseline on some metrics due to its 6-bit quantization strategy
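
For the tasks where this section quotes both a qx6 score and a bf16 baseline, the trade-off can be tabulated in a few lines; the sketch below uses only the numbers quoted here, and the dictionary layout is illustrative:

```python
# task: (qx6 score, bf16 baseline), as quoted in this section
scores = {
    "OpenBookQA":    (0.432, 0.420),
    "Arc_Challenge": (0.422, 0.422),
    "Arc_Easy":      (0.532, 0.537),
}

for task, (qx6, bf16) in scores.items():
    print(f"{task:>13}: qx6={qx6:.3f}  bf16={bf16:.3f}  delta={qx6 - bf16:+.3f}")
```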
 
Recommendation:

Use qx6 when knowledge-based reasoning and boolean logic are critical, particularly for applications involving:

Educational assessment systems
Knowledge-intensive question answering
Tasks requiring both factual knowledge and logical reasoning
Scenarios where OpenBookQA performance is the primary concern

  The model excels at combining factual recall (OpenBookQA) with logical reasoning (BoolQ), making it ideal for applications like educational AI, research assistants, and knowledge-based question-answering systems. Its ability to match the baseline performance on Arc_Challenge while excelling in OpenBookQA makes it particularly valuable for tasks requiring both broad knowledge and logical processing capabilities.
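
For anyone who wants to exercise these strengths directly, here is a minimal usage sketch. It assumes the qx6 variant is distributed as an MLX quantization and loaded through the mlx-lm package; the repository id below is a placeholder, not the actual model path.

```python
# Minimal sketch: load an MLX-quantized checkpoint and ask a knowledge-style question.
# Assumes `pip install mlx-lm`; replace the placeholder repo id with the real qx6 checkpoint.
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/<model-name>-qx6-mlx")  # placeholder repo id

prompt = "Which gas do plants primarily absorb from the air for photosynthesis?"
answer = generate(model, tokenizer, prompt=prompt, max_tokens=64)
print(answer)
```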
 