Update README.md

Primary Task: OpenBookQA

Why OpenBookQA is the Strength:

- qx6 achieves 0.432 on OpenBookQA, the highest score of any model in this comparison.
- This is a 0.012 improvement over the baseline (bf16 at 0.420) and 0.002 better than qm68 (0.430).
- This is a significant advantage for knowledge-based reasoning tasks.

Secondary Strengths:

BoolQ

- qx6 scores 0.881, the highest among all quantized models.
- This indicates exceptionally strong performance on boolean (yes/no) reasoning questions.

Arc_Challenge

- qx6 scores 0.422, equal to the baseline (bf16 at 0.422).
- It matches the full-precision model on these challenging questions (for how such accuracies are typically measured, see the sketch below).
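
The README does not say which harness or settings produced these accuracies. Purely as an illustration, the sketch below shows one common way to collect comparable zero-shot accuracy numbers with EleutherAI's lm-evaluation-harness; the model id is a placeholder, and the backend, task list, and zero-shot setting are assumptions rather than something this card specifies.

```python
# Hypothetical reproduction sketch -- this card does not state which harness or
# settings produced its numbers, so everything below is an assumption.
import lm_eval  # EleutherAI lm-evaluation-harness

MODEL_ID = "your-org/your-qx6-checkpoint"  # placeholder, not a real repo id

results = lm_eval.simple_evaluate(
    model="hf",                           # standard Hugging Face backend
    model_args=f"pretrained={MODEL_ID}",
    tasks=[
        "openbookqa", "boolq", "arc_challenge",
        "arc_easy", "piqa", "hellaswag", "winogrande",
    ],
    num_fewshot=0,   # whether the card's numbers are zero-shot is not stated
    batch_size=8,
)

# Per-task metrics (accuracy, normalized accuracy, ...) live under results["results"].
for task, metrics in sorted(results["results"].items()):
    print(task, metrics)
```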

Task Suitability Analysis:

Best Suited Tasks:

- OpenBookQA - Strongest performer
- BoolQ - Highest among quantized models
- Arc_Challenge - Matches the bf16 baseline
- PIQA - 0.724 (very good performance)

Other Tasks Where qx6 Performs Well:

- HellaSwag - 0.546 (solid performance)
- Arc_Easy - 0.532 (decent performance)
- Winogrande - 0.576 (strongest among quantized models for this task)
- General reasoning - very balanced performance across most tasks

Limitations:

- Weakest on Arc_Easy compared to some other variants (0.532 vs 0.537 for bf16; see the sketch below).
- Slightly below baseline on some metrics due to its 6-bit quantization strategy.
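
To keep the comparisons quoted above in one place, here is a minimal sketch that tabulates the bf16 and qx6 scores stated in this README (only the tasks where both values are given) and recomputes the deltas.

```python
# Scores exactly as quoted in this README (bf16 baseline vs. the qx6 quant).
scores = {
    #  task:          (bf16,  qx6)
    "openbookqa":    (0.420, 0.432),
    "arc_challenge": (0.422, 0.422),
    "arc_easy":      (0.537, 0.532),
}

for task, (bf16, qx6) in scores.items():
    print(f"{task:>13}  bf16={bf16:.3f}  qx6={qx6:.3f}  delta={qx6 - bf16:+.3f}")
```

Running this reproduces the +0.012 OpenBookQA gain, the Arc_Challenge tie, and the -0.005 Arc_Easy gap discussed above.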

Recommendation:

Use qx6 when knowledge-based reasoning and boolean logic are critical, particularly for applications involving:

- Educational assessment systems
- Knowledge-intensive question answering
- Tasks requiring both factual knowledge and logical reasoning
- Scenarios where OpenBookQA performance is the primary concern

The model excels at combining factual recall (OpenBookQA) with logical reasoning (BoolQ), making it ideal for applications like educational AI, research assistants, and knowledge-based question-answering systems. Its ability to match the baseline performance on Arc_Challenge while excelling in OpenBookQA makes it particularly valuable for tasks requiring both broad knowledge and logical processing capabilities.
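
If you want to try the qx6 variant for this kind of knowledge-plus-reasoning workload, a minimal loading sketch follows. It assumes the checkpoint is an MLX quantization usable with the mlx-lm package (suggested by the qx6/qm68/bf16 naming, but not stated explicitly here), and the repository id is a placeholder rather than an actual path.

```python
# Minimal usage sketch, assuming an MLX quantized checkpoint served via mlx-lm.
# The repo id is a placeholder -- substitute the actual qx6 repository.
from mlx_lm import load, generate

model, tokenizer = load("your-org/your-model-qx6")  # hypothetical repo id

# An OpenBookQA-style factual question, the kind of input this variant is
# recommended for above.
prompt = "Which gas do plants take in from the air to perform photosynthesis?"
answer = generate(model, tokenizer, prompt=prompt, max_tokens=64)
print(answer)
```

For instruction-tuned checkpoints you would normally wrap the question with the tokenizer's chat template first; that is a usage choice, not something this card prescribes.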