Update app.py
app.py
CHANGED
```diff
@@ -171,18 +171,18 @@ with tab1:
     # st.markdown('<div class="title">Leaderboard</div>', unsafe_allow_html=True)
     st.markdown('<div class="tab-content">', unsafe_allow_html=True)

-    st.markdown('Metrics Explanation')
-    st.markdown(
+    st.markdown('# Metrics Explanation')
+    st.markdown("""
         <div class="metric">
         <br/>
         <p style="font-size:16px;">
-        <strong> Factual Precision </strong> measures the
+        <strong> Factual Precision </strong> measures the ratio of supported units divided by all units averaged over model responses. <strong> Hallucination Score </strong> quantifies the incorrect or inconclusive contents within a model response, as described in the paper. We also provide statistics on the average number of units labelled as unsupported (<strong>Avg. # Unsupported</strong>), the average number of units labelled as undecidable (<strong>Avg. # Undecided</strong>), the average length of the response in terms of the number of tokens, and the average verifiable units existing in the model responses (<strong>Avg. # Units</strong>).
         </p>
         <p style="font-size:16px;">
         π for closed LLMs; π for open-weights LLMs; π¨ for newly added models"
         </p>
         </div>
-
+        """,
         unsafe_allow_html=True
     )
```
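The metric text added in this change can be read as simple per-response averages. Below is a hypothetical sketch of that reading, not the leaderboard's actual implementation: it assumes each model response has already been decomposed into units labelled "supported", "unsupported", or "undecided", and the helper names (`factual_precision`, `avg_count`) are made up for this illustration.

```python
# Hypothetical sketch of the metrics described above; function names and
# label strings are assumptions, not the leaderboard's real code.
from statistics import mean

def factual_precision(responses):
    # Per response: (# supported units) / (# all units), then averaged over responses.
    ratios = [
        sum(1 for u in units if u == "supported") / len(units)
        for units in responses
        if units  # skip responses with no verifiable units
    ]
    return mean(ratios) if ratios else 0.0

def avg_count(responses, label):
    # Average number of units with a given label per response,
    # e.g. "Avg. # Unsupported" or "Avg. # Undecided" in the table.
    return mean(sum(1 for u in units if u == label) for units in responses)

# Toy example: three responses decomposed into labelled units.
responses = [
    ["supported", "supported", "unsupported"],
    ["supported", "undecided"],
    ["supported", "supported", "supported", "unsupported"],
]
print(round(factual_precision(responses), 3))         # 0.639
print(round(avg_count(responses, "unsupported"), 3))  # 0.667
print(round(avg_count(responses, "undecided"), 3))    # 0.333
```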