hanhainebula committed on
Commit
4b39f29
·
verified ·
1 Parent(s): d50d230

Upload folder using huggingface_hub

Files changed (28)
  1. README.md +2 -21
  2. imgs/bright-performance.png +2 -2
  3. search_results/examples/EVAL/eval_results.json +134 -74
  4. search_results/examples/aops-examples.json +1 -1
  5. search_results/examples/biology-examples.json +2 -2
  6. search_results/examples/earth_science-examples.json +2 -2
  7. search_results/examples/economics-examples.json +2 -2
  8. search_results/examples/leetcode-examples.json +2 -2
  9. search_results/examples/pony-examples.json +2 -2
  10. search_results/examples/psychology-examples.json +2 -2
  11. search_results/examples/robotics-examples.json +2 -2
  12. search_results/examples/stackoverflow-examples.json +2 -2
  13. search_results/examples/sustainable_living-examples.json +2 -2
  14. search_results/examples/theoremqa_questions-examples.json +2 -2
  15. search_results/examples/theoremqa_theorems-examples.json +1 -1
  16. search_results/gpt4_reason/EVAL/eval_results.json +130 -70
  17. search_results/gpt4_reason/aops-gpt4_reason.json +1 -1
  18. search_results/gpt4_reason/biology-gpt4_reason.json +2 -2
  19. search_results/gpt4_reason/earth_science-gpt4_reason.json +2 -2
  20. search_results/gpt4_reason/economics-gpt4_reason.json +2 -2
  21. search_results/gpt4_reason/leetcode-gpt4_reason.json +2 -2
  22. search_results/gpt4_reason/pony-gpt4_reason.json +2 -2
  23. search_results/gpt4_reason/psychology-gpt4_reason.json +2 -2
  24. search_results/gpt4_reason/robotics-gpt4_reason.json +2 -2
  25. search_results/gpt4_reason/stackoverflow-gpt4_reason.json +2 -2
  26. search_results/gpt4_reason/sustainable_living-gpt4_reason.json +2 -2
  27. search_results/gpt4_reason/theoremqa_questions-gpt4_reason.json +2 -2
  28. search_results/gpt4_reason/theoremqa_theorems-gpt4_reason.json +1 -1
README.md CHANGED
@@ -13,7 +13,7 @@ license: apache-2.0
13
 
14
  For more details please refer to our Github: [BGE-Reasoner](https://github.com/FlagOpen/FlagEmbedding/tree/master/research/BGE_Reasoner).
15
 
16
- **BGE-Reasoner-Embed-Qwen3-8B-0923** is an embedding model trained for reasoning-intensive retrieval tasks, based on [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B). It achieves an nDCG@10 of 37.2 on the [BRIGHT](https://brightbenchmark.github.io/) benchmark with original query, demonstrating its strong capability in reasoning-intensive retrieval tasks.
17
 
18
  The search results on BRIGHT are available [here](https://huggingface.co/BAAI/bge-reasoner-embed-qwen3-8b-0923/tree/main/search_results).
19
 
@@ -130,29 +130,10 @@ print(scores.cpu().tolist())
130
 
131
  ## Evaluation
132
 
133
- BGE-Reasoner-Embed-Qwen3-8B-0923 exhibits strong performance in reasoning-intensive retrieval tasks, as demonstrated by its results (nDCG@10 = 37.2 using original query) on the BRIGHT benchmark.
134
 
135
  <img src="./imgs/bright-performance.png" alt="BRIGHT Performance" style="zoom:200%;" />
136
 
137
- Note:
138
- - "**Avg - ALL**" refers to the average performance across **all 12 datasets** in the BRIGHT benchmark.
139
- - "**Avg - SE**" refers to the average performance across the **7 datasets in the StackExchange subset** of the BRIGHT benchmark.
140
- - "**Avg - CD**" refers to the average performance across the **2 datasets in the Coding subset** of the BRIGHT benchmark.
141
- - "**Avg - MT**" refers to the average performance across the **3 datasets in the Theorem-based subset** of the BRIGHT benchmark.
142
-
143
- > Sources of Results:
144
- >
145
- > [1] https://arxiv.org/pdf/2407.12883
146
- >
147
- > [2] https://arxiv.org/pdf/2504.20595
148
- >
149
- > [3] https://github.com/Debrup-61/RaDeR
150
- >
151
- > [4] https://seed1-5-embedding.github.io
152
- >
153
- > [5] https://arxiv.org/pdf/2508.07995
154
- >
155
- > *: results evaluated with our script
156
 
157
  ## Citation
158
 
 
13
 
14
  For more details please refer to our Github: [BGE-Reasoner](https://github.com/FlagOpen/FlagEmbedding/tree/master/research/BGE_Reasoner).
15
 
16
+ **BGE-Reasoner-Embed-Qwen3-8B-0923** is an embedding model trained for reasoning-intensive retrieval tasks, based on [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B). It achieves an nDCG@10 of 37.1 on the [BRIGHT](https://brightbenchmark.github.io/) benchmark with original query, demonstrating its strong capability in reasoning-intensive retrieval tasks.
17
 
18
  The search results on BRIGHT are available [here](https://huggingface.co/BAAI/bge-reasoner-embed-qwen3-8b-0923/tree/main/search_results).
19
 
 
130
 
131
  ## Evaluation
132
 
133
+ BGE-Reasoner-Embed-Qwen3-8B-0923 exhibits strong performance in reasoning-intensive retrieval tasks, as demonstrated by its results (nDCG@10 = 37.1 using original query) on the BRIGHT benchmark.
134
 
135
  <img src="./imgs/bright-performance.png" alt="BRIGHT Performance" style="zoom:200%;" />
136
 
 
137
 
138
  ## Citation
139
 
imgs/bright-performance.png CHANGED

Git LFS Details

  • SHA256: a99db7e24959989ecdcb8b0f00c8a2e8517ec5fe51d860e4880950682a9af566
  • Pointer size: 131 Bytes
  • Size of remote file: 126 kB

Git LFS Details

  • SHA256: 7c53cefc13bd0ffef36357bddddd34a26d29ba7e289c0e3076aebf16fad0f575
  • Pointer size: 131 Bytes
  • Size of remote file: 126 kB
search_results/examples/EVAL/eval_results.json CHANGED
@@ -1,146 +1,206 @@
1
  {
2
- "biology-examples": {
3
- "ndcg_at_1": 0.50485,
4
- "ndcg_at_10": 0.54407,
5
- "map_at_1": 0.16222,
6
- "map_at_10": 0.43293,
7
- "recall_at_1": 0.16222,
8
- "recall_at_10": 0.63582,
9
- "precision_at_1": 0.50485,
10
- "precision_at_10": 0.22524,
11
- "mrr_at_1": 0.49515,
12
- "mrr_at_10": 0.61066
 
 
 
 
 
13
  },
14
- "theoremqa_theorems-examples": {
15
- "ndcg_at_1": 0.34211,
16
- "ndcg_at_10": 0.47592,
17
- "map_at_1": 0.18499,
18
- "map_at_10": 0.39117,
19
- "recall_at_1": 0.18499,
20
- "recall_at_10": 0.65116,
21
- "precision_at_1": 0.34211,
22
- "precision_at_10": 0.11974,
23
- "mrr_at_1": 0.34211,
24
- "mrr_at_10": 0.45854
 
 
 
 
 
25
  },
26
- "psychology-examples": {
27
- "ndcg_at_1": 0.36634,
28
- "ndcg_at_10": 0.45155,
29
- "map_at_1": 0.16264,
30
- "map_at_10": 0.33572,
31
- "recall_at_1": 0.16264,
32
- "recall_at_10": 0.52951,
33
- "precision_at_1": 0.36634,
34
- "precision_at_10": 0.19406,
35
- "mrr_at_1": 0.36634,
36
- "mrr_at_10": 0.47711
 
 
 
 
 
37
  },
38
  "robotics-examples": {
39
  "ndcg_at_1": 0.28713,
40
  "ndcg_at_10": 0.31993,
 
41
  "map_at_1": 0.14029,
42
  "map_at_10": 0.23822,
 
43
  "recall_at_1": 0.14029,
44
  "recall_at_10": 0.37368,
 
45
  "precision_at_1": 0.28713,
46
  "precision_at_10": 0.11386,
 
47
  "mrr_at_1": 0.28713,
48
- "mrr_at_10": 0.37628
 
49
  },
50
  "aops-examples": {
51
  "ndcg_at_1": 0.13514,
52
  "ndcg_at_10": 0.13305,
 
53
  "map_at_1": 0.03062,
54
  "map_at_10": 0.07937,
 
55
  "recall_at_1": 0.03062,
56
  "recall_at_10": 0.1598,
 
57
  "precision_at_1": 0.13514,
58
  "precision_at_10": 0.07207,
 
59
  "mrr_at_1": 0.13514,
60
- "mrr_at_10": 0.20256
 
61
  },
62
  "sustainable_living-examples": {
63
  "ndcg_at_1": 0.35185,
64
  "ndcg_at_10": 0.37341,
 
65
  "map_at_1": 0.13027,
66
  "map_at_10": 0.27505,
 
67
  "recall_at_1": 0.13027,
68
  "recall_at_10": 0.45011,
 
69
  "precision_at_1": 0.35185,
70
  "precision_at_10": 0.16019,
 
71
  "mrr_at_1": 0.35185,
72
- "mrr_at_10": 0.43959
 
73
  },
74
  "leetcode-examples": {
75
  "ndcg_at_1": 0.28169,
76
  "ndcg_at_10": 0.32309,
 
77
  "map_at_1": 0.17535,
78
  "map_at_10": 0.25267,
 
79
  "recall_at_1": 0.17535,
80
  "recall_at_10": 0.41808,
 
81
  "precision_at_1": 0.28169,
82
  "precision_at_10": 0.07254,
 
83
  "mrr_at_1": 0.28169,
84
- "mrr_at_10": 0.37478
85
- },
86
- "earth_science-examples": {
87
- "ndcg_at_1": 0.57759,
88
- "ndcg_at_10": 0.55426,
89
- "map_at_1": 0.23269,
90
- "map_at_10": 0.44959,
91
- "recall_at_1": 0.23269,
92
- "recall_at_10": 0.58342,
93
- "precision_at_1": 0.57759,
94
- "precision_at_10": 0.2181,
95
- "mrr_at_1": 0.57759,
96
- "mrr_at_10": 0.67135
97
  },
98
  "economics-examples": {
99
  "ndcg_at_1": 0.29126,
100
  "ndcg_at_10": 0.33832,
 
101
  "map_at_1": 0.13804,
102
  "map_at_10": 0.23934,
 
103
  "recall_at_1": 0.13804,
104
  "recall_at_10": 0.35798,
 
105
  "precision_at_1": 0.29126,
106
  "precision_at_10": 0.15049,
 
107
  "mrr_at_1": 0.29126,
108
- "mrr_at_10": 0.37666
109
- },
110
- "theoremqa_questions-examples": {
111
- "ndcg_at_1": 0.39691,
112
- "ndcg_at_10": 0.4124,
113
- "map_at_1": 0.22809,
114
- "map_at_10": 0.37578,
115
- "recall_at_1": 0.22809,
116
- "recall_at_10": 0.45814,
117
- "precision_at_1": 0.39691,
118
- "precision_at_10": 0.09639,
119
- "mrr_at_1": 0.39691,
120
- "mrr_at_10": 0.43798
121
  },
122
  "stackoverflow-examples": {
123
  "ndcg_at_1": 0.30769,
124
  "ndcg_at_10": 0.34329,
 
125
  "map_at_1": 0.12812,
126
  "map_at_10": 0.26281,
 
127
  "recall_at_1": 0.12812,
128
  "recall_at_10": 0.43316,
 
129
  "precision_at_1": 0.30769,
130
  "precision_at_10": 0.12222,
 
131
  "mrr_at_1": 0.2906,
132
- "mrr_at_10": 0.3824
 
133
  },
134
- "pony-examples": {
135
- "ndcg_at_1": 0.24107,
136
- "ndcg_at_10": 0.1903,
137
- "map_at_1": 0.0161,
138
- "map_at_10": 0.05666,
139
- "recall_at_1": 0.0161,
140
- "recall_at_10": 0.09687,
141
- "precision_at_1": 0.24107,
142
- "precision_at_10": 0.17054,
143
- "mrr_at_1": 0.24107,
144
- "mrr_at_10": 0.37504
145
  }
146
  }
 
1
  {
2
+ "earth_science-examples": {
3
+ "ndcg_at_1": 0.57759,
4
+ "ndcg_at_10": 0.55426,
5
+ "ndcg_at_100": 0.64815,
6
+ "map_at_1": 0.23269,
7
+ "map_at_10": 0.44959,
8
+ "map_at_100": 0.49319,
9
+ "recall_at_1": 0.23269,
10
+ "recall_at_10": 0.58342,
11
+ "recall_at_100": 0.87067,
12
+ "precision_at_1": 0.57759,
13
+ "precision_at_10": 0.2181,
14
+ "precision_at_100": 0.03966,
15
+ "mrr_at_1": 0.57759,
16
+ "mrr_at_10": 0.67135,
17
+ "mrr_at_100": 0.67713
18
  },
19
+ "theoremqa_questions-examples": {
20
+ "ndcg_at_1": 0.39691,
21
+ "ndcg_at_10": 0.4124,
22
+ "ndcg_at_100": 0.45757,
23
+ "map_at_1": 0.22809,
24
+ "map_at_10": 0.37578,
25
+ "map_at_100": 0.38535,
26
+ "recall_at_1": 0.22809,
27
+ "recall_at_10": 0.45814,
28
+ "recall_at_100": 0.64633,
29
+ "precision_at_1": 0.39691,
30
+ "precision_at_10": 0.09639,
31
+ "precision_at_100": 0.01309,
32
+ "mrr_at_1": 0.39691,
33
+ "mrr_at_10": 0.43798,
34
+ "mrr_at_100": 0.44688
35
  },
36
+ "pony-examples": {
37
+ "ndcg_at_1": 0.24107,
38
+ "ndcg_at_10": 0.18695,
39
+ "ndcg_at_100": 0.27913,
40
+ "map_at_1": 0.0161,
41
+ "map_at_10": 0.05544,
42
+ "map_at_100": 0.09322,
43
+ "recall_at_1": 0.0161,
44
+ "recall_at_10": 0.09518,
45
+ "recall_at_100": 0.38397,
46
+ "precision_at_1": 0.24107,
47
+ "precision_at_10": 0.16696,
48
+ "precision_at_100": 0.07187,
49
+ "mrr_at_1": 0.24107,
50
+ "mrr_at_10": 0.37267,
51
+ "mrr_at_100": 0.38838
52
  },
53
  "robotics-examples": {
54
  "ndcg_at_1": 0.28713,
55
  "ndcg_at_10": 0.31993,
56
+ "ndcg_at_100": 0.40945,
57
  "map_at_1": 0.14029,
58
  "map_at_10": 0.23822,
59
+ "map_at_100": 0.26881,
60
  "recall_at_1": 0.14029,
61
  "recall_at_10": 0.37368,
62
+ "recall_at_100": 0.708,
63
  "precision_at_1": 0.28713,
64
  "precision_at_10": 0.11386,
65
+ "precision_at_100": 0.0298,
66
  "mrr_at_1": 0.28713,
67
+ "mrr_at_10": 0.37628,
68
+ "mrr_at_100": 0.38883
69
  },
70
  "aops-examples": {
71
  "ndcg_at_1": 0.13514,
72
  "ndcg_at_10": 0.13305,
73
+ "ndcg_at_100": 0.21311,
74
  "map_at_1": 0.03062,
75
  "map_at_10": 0.07937,
76
+ "map_at_100": 0.10056,
77
  "recall_at_1": 0.03062,
78
  "recall_at_10": 0.1598,
79
+ "recall_at_100": 0.39924,
80
  "precision_at_1": 0.13514,
81
  "precision_at_10": 0.07207,
82
+ "precision_at_100": 0.01946,
83
  "mrr_at_1": 0.13514,
84
+ "mrr_at_10": 0.20256,
85
+ "mrr_at_100": 0.21432
86
  },
87
  "sustainable_living-examples": {
88
  "ndcg_at_1": 0.35185,
89
  "ndcg_at_10": 0.37341,
90
+ "ndcg_at_100": 0.48528,
91
  "map_at_1": 0.13027,
92
  "map_at_10": 0.27505,
93
+ "map_at_100": 0.32648,
94
  "recall_at_1": 0.13027,
95
  "recall_at_10": 0.45011,
96
+ "recall_at_100": 0.80612,
97
  "precision_at_1": 0.35185,
98
  "precision_at_10": 0.16019,
99
+ "precision_at_100": 0.0375,
100
  "mrr_at_1": 0.35185,
101
+ "mrr_at_10": 0.43959,
102
+ "mrr_at_100": 0.4535
103
  },
104
  "leetcode-examples": {
105
  "ndcg_at_1": 0.28169,
106
  "ndcg_at_10": 0.32309,
107
+ "ndcg_at_100": 0.38938,
108
  "map_at_1": 0.17535,
109
  "map_at_10": 0.25267,
110
+ "map_at_100": 0.26771,
111
  "recall_at_1": 0.17535,
112
  "recall_at_10": 0.41808,
113
+ "recall_at_100": 0.69519,
114
  "precision_at_1": 0.28169,
115
  "precision_at_10": 0.07254,
116
+ "precision_at_100": 0.01254,
117
  "mrr_at_1": 0.28169,
118
+ "mrr_at_10": 0.37478,
119
+ "mrr_at_100": 0.38255
120
  },
121
  "economics-examples": {
122
  "ndcg_at_1": 0.29126,
123
  "ndcg_at_10": 0.33832,
124
+ "ndcg_at_100": 0.43577,
125
  "map_at_1": 0.13804,
126
  "map_at_10": 0.23934,
127
+ "map_at_100": 0.29841,
128
  "recall_at_1": 0.13804,
129
  "recall_at_10": 0.35798,
130
+ "recall_at_100": 0.72009,
131
  "precision_at_1": 0.29126,
132
  "precision_at_10": 0.15049,
133
+ "precision_at_100": 0.04738,
134
  "mrr_at_1": 0.29126,
135
+ "mrr_at_10": 0.37666,
136
+ "mrr_at_100": 0.38917
137
  },
138
  "stackoverflow-examples": {
139
  "ndcg_at_1": 0.30769,
140
  "ndcg_at_10": 0.34329,
141
+ "ndcg_at_100": 0.44943,
142
  "map_at_1": 0.12812,
143
  "map_at_10": 0.26281,
144
+ "map_at_100": 0.30158,
145
  "recall_at_1": 0.12812,
146
  "recall_at_10": 0.43316,
147
+ "recall_at_100": 0.77889,
148
  "precision_at_1": 0.30769,
149
  "precision_at_10": 0.12222,
150
+ "precision_at_100": 0.03265,
151
  "mrr_at_1": 0.2906,
152
+ "mrr_at_10": 0.3824,
153
+ "mrr_at_100": 0.39268
154
  },
155
+ "biology-examples": {
156
+ "ndcg_at_1": 0.50485,
157
+ "ndcg_at_10": 0.54407,
158
+ "ndcg_at_100": 0.62536,
159
+ "map_at_1": 0.16222,
160
+ "map_at_10": 0.43293,
161
+ "map_at_100": 0.46687,
162
+ "recall_at_1": 0.16222,
163
+ "recall_at_10": 0.63582,
164
+ "recall_at_100": 0.91484,
165
+ "precision_at_1": 0.50485,
166
+ "precision_at_10": 0.22524,
167
+ "precision_at_100": 0.03301,
168
+ "mrr_at_1": 0.49515,
169
+ "mrr_at_10": 0.61066,
170
+ "mrr_at_100": 0.61562
171
+ },
172
+ "theoremqa_theorems-examples": {
173
+ "ndcg_at_1": 0.34211,
174
+ "ndcg_at_10": 0.47592,
175
+ "ndcg_at_100": 0.54161,
176
+ "map_at_1": 0.18499,
177
+ "map_at_10": 0.39117,
178
+ "map_at_100": 0.41019,
179
+ "recall_at_1": 0.18499,
180
+ "recall_at_10": 0.65116,
181
+ "recall_at_100": 0.89583,
182
+ "precision_at_1": 0.34211,
183
+ "precision_at_10": 0.11974,
184
+ "precision_at_100": 0.01763,
185
+ "mrr_at_1": 0.34211,
186
+ "mrr_at_10": 0.45854,
187
+ "mrr_at_100": 0.46713
188
+ },
189
+ "psychology-examples": {
190
+ "ndcg_at_1": 0.36634,
191
+ "ndcg_at_10": 0.45155,
192
+ "ndcg_at_100": 0.52282,
193
+ "map_at_1": 0.16264,
194
+ "map_at_10": 0.33572,
195
+ "map_at_100": 0.38264,
196
+ "recall_at_1": 0.16264,
197
+ "recall_at_10": 0.52951,
198
+ "recall_at_100": 0.8142,
199
+ "precision_at_1": 0.36634,
200
+ "precision_at_10": 0.19406,
201
+ "precision_at_100": 0.04168,
202
+ "mrr_at_1": 0.36634,
203
+ "mrr_at_10": 0.47711,
204
+ "mrr_at_100": 0.48678
205
  }
206
  }
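
As a sanity check on the headline number, the README's nDCG@10 of 37.1 appears to be the unweighted mean of the twelve per-dataset `ndcg_at_10` values in the updated `search_results/examples/EVAL/eval_results.json`. The sketch below assumes that macro-averaging convention, which the commit itself does not state. `mean_metric` is a hypothetical helper, and the values are copied from the new side of the diff above.

```python
# Per-dataset ndcg_at_10 values copied from the updated eval_results.json
# (examples split, i.e. original queries).
NDCG_AT_10 = {
    "earth_science": 0.55426,
    "theoremqa_questions": 0.4124,
    "pony": 0.18695,
    "robotics": 0.31993,
    "aops": 0.13305,
    "sustainable_living": 0.37341,
    "leetcode": 0.32309,
    "economics": 0.33832,
    "stackoverflow": 0.34329,
    "biology": 0.54407,
    "theoremqa_theorems": 0.47592,
    "psychology": 0.45155,
}


def mean_metric(scores):
    """Unweighted (macro) mean over datasets.

    Assumed, not confirmed, to be the convention behind the README figure.
    """
    return sum(scores.values()) / len(scores)


new_avg = mean_metric(NDCG_AT_10)

# Same computation with the previous pony value (0.1903) from the old side of the diff.
old_avg = mean_metric({**NDCG_AT_10, "pony": 0.1903})

print(f"new: {100 * new_avg:.1f}, old: {100 * old_avg:.1f}")
```

In this file, `pony-examples` is the only dataset whose `ndcg_at_10` differs between the two sides of the diff (0.1903 to 0.18695). Under the averaging assumption above, that change alone accounts for the README moving from 37.2 to 37.1.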
search_results/examples/aops-examples.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "eval_name": "bright_short",
3
- "model_name": "model_name",
4
  "reranker_name": "NoReranker",
5
  "split": "examples",
6
  "dataset_name": "aops",
 
1
  {
2
  "eval_name": "bright_short",
3
+ "model_name": "bge-reasoner-embed-qwen3-8b-0923",
4
  "reranker_name": "NoReranker",
5
  "split": "examples",
6
  "dataset_name": "aops",
search_results/examples/biology-examples.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:73ff1cf7ff2e2aaaf8c4589982ae31198a58744ee5aba0393a10a4c4cb040f1c
3
- size 16553902
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:928f1da50e99b17ac672c010c980b311507b0b8e93428514d56ad0a10892cefd
3
+ size 16553924
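
The large result files in this commit are stored as Git LFS pointer files: three `key value` lines giving the spec version, the object id, and the size in bytes. A minimal sketch of reading one follows. `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling, and the pointer text is the new side of `biology-examples.json` shown above.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict, converting size to int."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])
    return fields


# New-side pointer for search_results/examples/biology-examples.json.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:928f1da50e99b17ac672c010c980b311507b0b8e93428514d56ad0a10892cefd
size 16553924
"""
OLD_SIZE = 16553902  # size recorded on the old side of the same diff

new = parse_lfs_pointer(NEW_POINTER)
size_delta = new["size"] - OLD_SIZE

# Length difference between the real model name and the "model_name" placeholder
# seen in the non-LFS files of this commit.
name_delta = len("bge-reasoner-embed-qwen3-8b-0923") - len("model_name")

print(size_delta, name_delta)
```

Both values come out to 22 bytes. That is consistent with the `model_name` field being the only change in this file, as in the non-LFS `aops` and `theoremqa_theorems` files. The pointer alone cannot confirm it.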
search_results/examples/earth_science-examples.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a3c02bb3fd382207d49a63897dd34ec396c9e1175e35921fd3c1b95fcf5620f2
3
- size 18149199
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a00edb97fd961347afda6259846fe9b684174f275b915adea3eec6eea93033b0
3
+ size 18149221
search_results/examples/economics-examples.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:ca54dce3b507334fd5fb1752bc7979ce1ffc19529339f04e3b02bc3aeea77e65
3
- size 16602998
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d6a9cd4068cbcbba2532bd3fe2092907dbc82ca9fb3c114d70189cc0950df6d5
3
+ size 16603020
search_results/examples/leetcode-examples.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:5605b0a3e4529214669438e02af8060bdc07cd168872742e0b13b0f16422b0c0
3
- size 18343889
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fd47cf17fcc232b564d218738592d804194428b91433b0f6a5893561f48e04d2
3
+ size 18343911
search_results/examples/pony-examples.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:2df5caf0f85432924ec0848713162cce488c6f63f63c2da36907c4aab1981c2c
3
- size 14633844
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ff7599e7f23965bf0ed6a5fbc08e6150a93cd1591860882bd4be60a9f02dc147
3
+ size 14638419
search_results/examples/psychology-examples.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:27abb7eeb6ef1fd323c8d0edd4693711ccb2f3ab475ad7cb852d79486ec64c15
3
- size 15426536
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0f434872fcdbb50f7dc83d31ba9764da9d86bd6007301a9495912b185175d737
3
+ size 15426558
search_results/examples/robotics-examples.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:ccaa49fe930eef8a75a4f95775041d720fd98f590ee4a2e43639d85060f0f841
3
- size 14420936
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:de79b06df718828c14911206d08913ee23982cbb366506adc948b996cd47fcc4
3
+ size 14420958
search_results/examples/stackoverflow-examples.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:e8583c0c7ad3583159cdbfe1af10f6d7d02867d1d966a649517158809a0cdcdf
3
- size 19083216
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:90839303d3929463e570827fa26b2d96d51d38bc69327bc2a1ee9f61b61299f4
3
+ size 19083238
search_results/examples/sustainable_living-examples.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:424f15ba7d04f4e36a09940fe5ca4c8e1186904cb8cf828341121911aa4a2f74
3
- size 17535702
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5f9da842cc39fdfa47065b4fd5378ca6beed0fe51293d11cb2390e2edcbf2fbb
3
+ size 17535724
search_results/examples/theoremqa_questions-examples.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:11ef6db56d097e389508bece9f359d2e5cac0a5d9c5966343fa0d20475366e84
3
- size 14691873
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2af96c492ee0666553820a5057d2ca20c36a3125794897099e5c6e13f40c3575
3
+ size 14691895
search_results/examples/theoremqa_theorems-examples.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "eval_name": "bright_short",
3
- "model_name": "model_name",
4
  "reranker_name": "NoReranker",
5
  "split": "examples",
6
  "dataset_name": "theoremqa_theorems",
 
1
  {
2
  "eval_name": "bright_short",
3
+ "model_name": "bge-reasoner-embed-qwen3-8b-0923",
4
  "reranker_name": "NoReranker",
5
  "split": "examples",
6
  "dataset_name": "theoremqa_theorems",
search_results/gpt4_reason/EVAL/eval_results.json CHANGED
@@ -1,146 +1,206 @@
1
  {
2
  "biology-gpt4_reason": {
3
  "ndcg_at_1": 0.58252,
4
  "ndcg_at_10": 0.6238,
 
5
  "map_at_1": 0.20161,
6
  "map_at_10": 0.52321,
 
7
  "recall_at_1": 0.20161,
8
  "recall_at_10": 0.70649,
 
9
  "precision_at_1": 0.58252,
10
  "precision_at_10": 0.24757,
 
11
  "mrr_at_1": 0.57282,
12
- "mrr_at_10": 0.67721
13
- },
14
- "stackoverflow-gpt4_reason": {
15
- "ndcg_at_1": 0.33333,
16
- "ndcg_at_10": 0.39248,
17
- "map_at_1": 0.15138,
18
- "map_at_10": 0.3128,
19
- "recall_at_1": 0.15138,
20
- "recall_at_10": 0.48351,
21
- "precision_at_1": 0.33333,
22
- "precision_at_10": 0.13761,
23
- "mrr_at_1": 0.34188,
24
- "mrr_at_10": 0.43379
25
  },
26
  "sustainable_living-gpt4_reason": {
27
  "ndcg_at_1": 0.36111,
28
  "ndcg_at_10": 0.40345,
 
29
  "map_at_1": 0.15859,
30
  "map_at_10": 0.3077,
 
31
  "recall_at_1": 0.15859,
32
  "recall_at_10": 0.47684,
 
33
  "precision_at_1": 0.36111,
34
  "precision_at_10": 0.16944,
 
35
  "mrr_at_1": 0.36111,
36
- "mrr_at_10": 0.46219
 
37
  },
38
  "leetcode-gpt4_reason": {
39
  "ndcg_at_1": 0.23239,
40
  "ndcg_at_10": 0.28348,
 
41
  "map_at_1": 0.14894,
42
  "map_at_10": 0.21761,
 
43
  "recall_at_1": 0.14894,
44
  "recall_at_10": 0.37946,
 
45
  "precision_at_1": 0.23239,
46
  "precision_at_10": 0.0662,
 
47
  "mrr_at_1": 0.23239,
48
- "mrr_at_10": 0.31958
 
49
  },
50
  "pony-gpt4_reason": {
51
  "ndcg_at_1": 0.4375,
52
- "ndcg_at_10": 0.31629,
 
53
  "map_at_1": 0.02577,
54
- "map_at_10": 0.09796,
 
55
  "recall_at_1": 0.02577,
56
- "recall_at_10": 0.1572,
 
57
  "precision_at_1": 0.4375,
58
- "precision_at_10": 0.27321,
 
59
  "mrr_at_1": 0.4375,
60
- "mrr_at_10": 0.58011
 
61
  },
62
  "aops-gpt4_reason": {
63
  "ndcg_at_1": 0.0991,
64
  "ndcg_at_10": 0.12337,
 
65
  "map_at_1": 0.02301,
66
  "map_at_10": 0.07287,
 
67
  "recall_at_1": 0.02301,
68
  "recall_at_10": 0.15546,
 
69
  "precision_at_1": 0.0991,
70
  "precision_at_10": 0.07748,
 
71
  "mrr_at_1": 0.0991,
72
- "mrr_at_10": 0.17159
 
73
  },
74
  "theoremqa_questions-gpt4_reason": {
75
  "ndcg_at_1": 0.39175,
76
  "ndcg_at_10": 0.39407,
 
77
  "map_at_1": 0.2268,
78
  "map_at_10": 0.36037,
 
79
  "recall_at_1": 0.2268,
80
  "recall_at_10": 0.42649,
 
81
  "precision_at_1": 0.39175,
82
  "precision_at_10": 0.08918,
 
83
  "mrr_at_1": 0.39175,
84
- "mrr_at_10": 0.42989
85
- },
86
- "theoremqa_theorems-gpt4_reason": {
87
- "ndcg_at_1": 0.31579,
88
- "ndcg_at_10": 0.41518,
89
- "map_at_1": 0.18061,
90
- "map_at_10": 0.34361,
91
- "recall_at_1": 0.18061,
92
- "recall_at_10": 0.55138,
93
- "precision_at_1": 0.31579,
94
- "precision_at_10": 0.10526,
95
- "mrr_at_1": 0.31579,
96
- "mrr_at_10": 0.41206
97
- },
98
- "earth_science-gpt4_reason": {
99
- "ndcg_at_1": 0.68966,
100
- "ndcg_at_10": 0.62277,
101
- "map_at_1": 0.27438,
102
- "map_at_10": 0.51581,
103
- "recall_at_1": 0.27438,
104
- "recall_at_10": 0.62818,
105
- "precision_at_1": 0.68966,
106
- "precision_at_10": 0.23966,
107
- "mrr_at_1": 0.68966,
108
- "mrr_at_10": 0.75591
109
- },
110
- "economics-gpt4_reason": {
111
- "ndcg_at_1": 0.25243,
112
- "ndcg_at_10": 0.35251,
113
- "map_at_1": 0.1225,
114
- "map_at_10": 0.24774,
115
- "recall_at_1": 0.1225,
116
- "recall_at_10": 0.39583,
117
- "precision_at_1": 0.25243,
118
- "precision_at_10": 0.16214,
119
- "mrr_at_1": 0.25243,
120
- "mrr_at_10": 0.36646
121
  },
122
  "psychology-gpt4_reason": {
123
  "ndcg_at_1": 0.43564,
124
  "ndcg_at_10": 0.49823,
 
125
  "map_at_1": 0.19875,
126
  "map_at_10": 0.37395,
 
127
  "recall_at_1": 0.19875,
128
  "recall_at_10": 0.56675,
 
129
  "precision_at_1": 0.43564,
130
  "precision_at_10": 0.20396,
 
131
  "mrr_at_1": 0.43564,
132
- "mrr_at_10": 0.55344
 
133
  },
134
- "robotics-gpt4_reason": {
135
- "ndcg_at_1": 0.30693,
136
- "ndcg_at_10": 0.3438,
137
- "map_at_1": 0.13674,
138
- "map_at_10": 0.25764,
139
- "recall_at_1": 0.13674,
140
- "recall_at_10": 0.39831,
141
- "precision_at_1": 0.30693,
142
- "precision_at_10": 0.12277,
143
- "mrr_at_1": 0.30693,
144
- "mrr_at_10": 0.40114
145
  }
146
  }
 
1
  {
2
+ "earth_science-gpt4_reason": {
3
+ "ndcg_at_1": 0.68966,
4
+ "ndcg_at_10": 0.62277,
5
+ "ndcg_at_100": 0.70079,
6
+ "map_at_1": 0.27438,
7
+ "map_at_10": 0.51581,
8
+ "map_at_100": 0.55688,
9
+ "recall_at_1": 0.27438,
10
+ "recall_at_10": 0.62818,
11
+ "recall_at_100": 0.87189,
12
+ "precision_at_1": 0.68966,
13
+ "precision_at_10": 0.23966,
14
+ "precision_at_100": 0.04017,
15
+ "mrr_at_1": 0.68966,
16
+ "mrr_at_10": 0.75591,
17
+ "mrr_at_100": 0.76114
18
+ },
19
+ "economics-gpt4_reason": {
20
+ "ndcg_at_1": 0.25243,
21
+ "ndcg_at_10": 0.35251,
22
+ "ndcg_at_100": 0.4404,
23
+ "map_at_1": 0.1225,
24
+ "map_at_10": 0.24774,
25
+ "map_at_100": 0.30356,
26
+ "recall_at_1": 0.1225,
27
+ "recall_at_10": 0.39583,
28
+ "recall_at_100": 0.7242,
29
+ "precision_at_1": 0.25243,
30
+ "precision_at_10": 0.16214,
31
+ "precision_at_100": 0.0467,
32
+ "mrr_at_1": 0.25243,
33
+ "mrr_at_10": 0.36646,
34
+ "mrr_at_100": 0.37617
35
+ },
36
+ "robotics-gpt4_reason": {
37
+ "ndcg_at_1": 0.30693,
38
+ "ndcg_at_10": 0.3438,
39
+ "ndcg_at_100": 0.42992,
40
+ "map_at_1": 0.13674,
41
+ "map_at_10": 0.25764,
42
+ "map_at_100": 0.29106,
43
+ "recall_at_1": 0.13674,
44
+ "recall_at_10": 0.39831,
45
+ "recall_at_100": 0.71796,
46
+ "precision_at_1": 0.30693,
47
+ "precision_at_10": 0.12277,
48
+ "precision_at_100": 0.02941,
49
+ "mrr_at_1": 0.30693,
50
+ "mrr_at_10": 0.40114,
51
+ "mrr_at_100": 0.41358
52
+ },
53
  "biology-gpt4_reason": {
54
  "ndcg_at_1": 0.58252,
55
  "ndcg_at_10": 0.6238,
56
+ "ndcg_at_100": 0.69335,
57
  "map_at_1": 0.20161,
58
  "map_at_10": 0.52321,
59
+ "map_at_100": 0.55406,
60
  "recall_at_1": 0.20161,
61
  "recall_at_10": 0.70649,
62
+ "recall_at_100": 0.94461,
63
  "precision_at_1": 0.58252,
64
  "precision_at_10": 0.24757,
65
+ "precision_at_100": 0.03408,
66
  "mrr_at_1": 0.57282,
67
+ "mrr_at_10": 0.67721,
68
+ "mrr_at_100": 0.68104
69
  },
70
  "sustainable_living-gpt4_reason": {
71
  "ndcg_at_1": 0.36111,
72
  "ndcg_at_10": 0.40345,
73
+ "ndcg_at_100": 0.50126,
74
  "map_at_1": 0.15859,
75
  "map_at_10": 0.3077,
76
+ "map_at_100": 0.35186,
77
  "recall_at_1": 0.15859,
78
  "recall_at_10": 0.47684,
79
+ "recall_at_100": 0.80474,
80
  "precision_at_1": 0.36111,
81
  "precision_at_10": 0.16944,
82
+ "precision_at_100": 0.03602,
83
  "mrr_at_1": 0.36111,
84
+ "mrr_at_10": 0.46219,
85
+ "mrr_at_100": 0.47221
86
  },
87
  "leetcode-gpt4_reason": {
88
  "ndcg_at_1": 0.23239,
89
  "ndcg_at_10": 0.28348,
90
+ "ndcg_at_100": 0.35507,
91
  "map_at_1": 0.14894,
92
  "map_at_10": 0.21761,
93
+ "map_at_100": 0.23384,
94
  "recall_at_1": 0.14894,
95
  "recall_at_10": 0.37946,
96
+ "recall_at_100": 0.67911,
97
  "precision_at_1": 0.23239,
98
  "precision_at_10": 0.0662,
99
+ "precision_at_100": 0.01218,
100
  "mrr_at_1": 0.23239,
101
+ "mrr_at_10": 0.31958,
102
+ "mrr_at_100": 0.32962
103
  },
104
  "pony-gpt4_reason": {
105
  "ndcg_at_1": 0.4375,
106
+ "ndcg_at_10": 0.31554,
107
+ "ndcg_at_100": 0.40062,
108
  "map_at_1": 0.02577,
109
+ "map_at_10": 0.0977,
110
+ "map_at_100": 0.16595,
111
  "recall_at_1": 0.02577,
112
+ "recall_at_10": 0.15626,
113
+ "recall_at_100": 0.51074,
114
  "precision_at_1": 0.4375,
115
+ "precision_at_10": 0.27143,
116
+ "precision_at_100": 0.09634,
117
  "mrr_at_1": 0.4375,
118
+ "mrr_at_10": 0.58209,
119
+ "mrr_at_100": 0.58798
120
  },
121
  "aops-gpt4_reason": {
122
  "ndcg_at_1": 0.0991,
123
  "ndcg_at_10": 0.12337,
124
+ "ndcg_at_100": 0.19981,
125
  "map_at_1": 0.02301,
126
  "map_at_10": 0.07287,
127
+ "map_at_100": 0.09278,
128
  "recall_at_1": 0.02301,
129
  "recall_at_10": 0.15546,
130
+ "recall_at_100": 0.38814,
131
  "precision_at_1": 0.0991,
132
  "precision_at_10": 0.07748,
133
+ "precision_at_100": 0.01865,
134
  "mrr_at_1": 0.0991,
135
+ "mrr_at_10": 0.17159,
136
+ "mrr_at_100": 0.18433
137
  },
138
  "theoremqa_questions-gpt4_reason": {
139
  "ndcg_at_1": 0.39175,
140
  "ndcg_at_10": 0.39407,
141
+ "ndcg_at_100": 0.44046,
142
  "map_at_1": 0.2268,
143
  "map_at_10": 0.36037,
144
+ "map_at_100": 0.37029,
145
  "recall_at_1": 0.2268,
146
  "recall_at_10": 0.42649,
147
+ "recall_at_100": 0.61392,
148
  "precision_at_1": 0.39175,
149
  "precision_at_10": 0.08918,
150
+ "precision_at_100": 0.01273,
151
  "mrr_at_1": 0.39175,
152
+ "mrr_at_10": 0.42989,
153
+ "mrr_at_100": 0.43888
154
  },
155
  "psychology-gpt4_reason": {
156
  "ndcg_at_1": 0.43564,
157
  "ndcg_at_10": 0.49823,
158
+ "ndcg_at_100": 0.56403,
159
  "map_at_1": 0.19875,
160
  "map_at_10": 0.37395,
161
+ "map_at_100": 0.41725,
162
  "recall_at_1": 0.19875,
163
  "recall_at_10": 0.56675,
164
+ "recall_at_100": 0.83745,
165
  "precision_at_1": 0.43564,
166
  "precision_at_10": 0.20396,
167
+ "precision_at_100": 0.04386,
168
  "mrr_at_1": 0.43564,
169
+ "mrr_at_10": 0.55344,
170
+ "mrr_at_100": 0.56089
171
  },
172
+ "stackoverflow-gpt4_reason": {
173
+ "ndcg_at_1": 0.33333,
174
+ "ndcg_at_10": 0.39248,
175
+ "ndcg_at_100": 0.49454,
176
+ "map_at_1": 0.15138,
177
+ "map_at_10": 0.3128,
178
+ "map_at_100": 0.3534,
179
+ "recall_at_1": 0.15138,
180
+ "recall_at_10": 0.48351,
181
+ "recall_at_100": 0.81653,
182
+ "precision_at_1": 0.33333,
183
+ "precision_at_10": 0.13761,
184
+ "precision_at_100": 0.03299,
185
+ "mrr_at_1": 0.34188,
186
+ "mrr_at_10": 0.43379,
187
+ "mrr_at_100": 0.44454
188
+ },
189
+ "theoremqa_theorems-gpt4_reason": {
190
+ "ndcg_at_1": 0.31579,
191
+ "ndcg_at_10": 0.41518,
192
+ "ndcg_at_100": 0.50262,
193
+ "map_at_1": 0.18061,
194
+ "map_at_10": 0.34361,
195
+ "map_at_100": 0.37053,
196
+ "recall_at_1": 0.18061,
197
+ "recall_at_10": 0.55138,
198
+ "recall_at_100": 0.86952,
199
+ "precision_at_1": 0.31579,
200
+ "precision_at_10": 0.10526,
201
+ "precision_at_100": 0.01724,
202
+ "mrr_at_1": 0.31579,
203
+ "mrr_at_10": 0.41206,
204
+ "mrr_at_100": 0.42468
205
  }
206
  }
search_results/gpt4_reason/aops-gpt4_reason.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "eval_name": "bright_short",
3
- "model_name": "model_name",
4
  "reranker_name": "NoReranker",
5
  "split": "gpt4_reason",
6
  "dataset_name": "aops",
 
1
  {
2
  "eval_name": "bright_short",
3
+ "model_name": "bge-reasoner-embed-qwen3-8b-0923",
4
  "reranker_name": "NoReranker",
5
  "split": "gpt4_reason",
6
  "dataset_name": "aops",
search_results/gpt4_reason/biology-gpt4_reason.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:a314b2948bd2a48f44d4d949645051600cd1cba4e600dc9b89c31753570e4c8a
3
- size 16518223
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ca5bc5385b40a26e4dae800a1e3ef5da67292abc64e25e1cd5f93087609b6551
3
+ size 16518245
search_results/gpt4_reason/earth_science-gpt4_reason.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1f2fa207b74ef2808c22bbed1d086941cd177cd442c26b7ea8cdd4991bce3411
3
- size 18093660
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e327e4d348e68ba0f82c378b47201730922e77f6ab801a7edbb7f35a2162b39c
3
+ size 18093682
search_results/gpt4_reason/economics-gpt4_reason.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:41293661ab3cec60d7620fe33a4689ccbd0f5b67ee5f8dbf5ba589fb0f70fb0f
3
- size 16558285
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a04232d66f4b8ebdbe04605a31e58220cf882687ac175b709b53d7e68dc852c4
3
+ size 16558307
search_results/gpt4_reason/leetcode-gpt4_reason.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:4520f31f78b5d0543af77eee8c7f29ddfe7d3c17960778ca654893490fac36d9
3
- size 18454399
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4173c80bc28c902b3185acb767ca736293420b414d0d635c6c4356df0f40f41e
3
+ size 18454421
search_results/gpt4_reason/pony-gpt4_reason.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:f9f79f2d325a64c1362fcf7cc7707b6ddb3adfbd164c246bb70dcac003d7765f
3
- size 14625227
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:535d52285a03f186f6a2cec113d561f159dc31f33a718624c677b7766c324489
3
+ size 14626574
search_results/gpt4_reason/psychology-gpt4_reason.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:1347e9db837913ab175f6b5daee4b8e1e3babb69ba3a43396a5b49cec2655d0d
3
- size 15322577
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6934acd47feaf0d4660f27dd6698974b1e39bc3b33aa366e642bbdaa0e98ef80
3
+ size 15322599
search_results/gpt4_reason/robotics-gpt4_reason.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:6fd310df6a895d2d566674614b1976af7a85248ceca02e583d7c2b61bcde49d2
3
- size 14451325
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:da4db05bc8a4a6373f0c55ba14e9eeece3183309a5686a7c21d2c4524f28754a
3
+ size 14451347
search_results/gpt4_reason/stackoverflow-gpt4_reason.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:9f98cedc37011e2e1900a716e3330342562bf028dfd6ea6fd056a7c1b44fb293
3
- size 19155033
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:597ef4f451d4a8fed29f02e19a3ba9f48a5bbeb831e2b730542c006998dbd3ed
3
+ size 19155055
search_results/gpt4_reason/sustainable_living-gpt4_reason.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:ef85268f068dccb8f806d7f6d5deeb748e9d72f7ca82cd092ee3c93b4b9d52fd
3
- size 17382433
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6834741e9fb06e733e8709f48df1f4765a6cd7a3f17908f4a9b3505f5373066d
3
+ size 17382455
search_results/gpt4_reason/theoremqa_questions-gpt4_reason.json CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:96c0abc7644d10c4a13532f2420a6a3ef9d952917ef422e7cb0658b199f783c2
3
- size 14333959
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ff71cae336466a1d4fbedac67a9f278e6b45c1f322d5c13e0d692ec57af8742c
3
+ size 14333981
search_results/gpt4_reason/theoremqa_theorems-gpt4_reason.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
  "eval_name": "bright_short",
3
- "model_name": "model_name",
4
  "reranker_name": "NoReranker",
5
  "split": "gpt4_reason",
6
  "dataset_name": "theoremqa_theorems",
 
1
  {
2
  "eval_name": "bright_short",
3
+ "model_name": "bge-reasoner-embed-qwen3-8b-0923",
4
  "reranker_name": "NoReranker",
5
  "split": "gpt4_reason",
6
  "dataset_name": "theoremqa_theorems",