## 2. Evaluation Results
We evaluate DMind-1 and DMind-1-mini using the [DMind Benchmark](https://huggingface.co/datasets/DMindAI/DMind_Benchmark), a domain-specific evaluation suite designed to assess large language models in the Web3 context. The benchmark includes 1,917 expert-reviewed questions across nine core domain categories, and it features both multiple-choice and open-ended tasks to measure factual knowledge, contextual reasoning, and related capabilities.
To complement accuracy metrics, we conducted a **cost-performance analysis** by comparing benchmark scores against publicly available input token prices across 24 leading LLMs. In this evaluation:
- **DMind-1** achieved the highest Web3 score while maintaining one of the lowest input token costs among top-tier models such as Grok 3 and Claude 3.5 Sonnet.
- **DMind-1-mini** ranked second, retaining over 95% of DMind-1's performance while offering lower latency and compute requirements.
Both models are uniquely positioned in the most favorable region of the score vs. price curve, delivering state-of-the-art Web3 reasoning at significantly lower cost. This balance of quality and efficiency makes the DMind models highly competitive for both research and production use.
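The "most favorable region of the score vs. price curve" claim can be read as Pareto optimality over (score, input price) pairs: a model is in that region if no other model scores at least as high for a strictly lower price. A minimal sketch of that check, using placeholder numbers (none of these scores or prices are taken from the actual benchmark results):

```python
# Pareto check over (Web3 score, USD per 1M input tokens).
# All figures below are illustrative placeholders, not real results.
MODELS = {
    "DMind-1":      {"score": 88.0, "price": 0.40},
    "DMind-1-mini": {"score": 84.0, "price": 0.20},
    "Model A":      {"score": 85.0, "price": 3.00},
    "Model B":      {"score": 82.0, "price": 5.00},
}

def pareto_frontier(models):
    """Return model names not dominated by any other model.

    A model is dominated if some other model scores at least as high
    while charging a strictly lower input token price.
    """
    frontier = []
    for name, m in models.items():
        dominated = any(
            other["score"] >= m["score"] and other["price"] < m["price"]
            for other_name, other in models.items()
            if other_name != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

print(pareto_frontier(MODELS))
```

With these placeholder values, only the two DMind entries survive the domination check, which is what "uniquely positioned in the most favorable region" asserts about the real data.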
## 3. Use Cases