nielsr (HF Staff) committed · verified
Commit b58d96e · 1 parent: faffe15

Improve model card: Add pipeline tag, library, abstract, and authors


This PR enhances the model card for `Shiyunee/Honest-Llama3-8B-Instruct` by:
- Adding `pipeline_tag: text-generation` to improve discoverability on the Hub, as the model extends a text generation LLM for QA tasks.
- Adding `library_name: transformers` to enable the automated "how to use" widget, as the model is compatible with the `transformers` and `peft` libraries.
- Adding the paper's abstract to provide more context about the model.
- Including the author list with Hugging Face usernames for proper attribution.
- Adding a prominent link to the main GitHub repository at the top of the model card for easy access to the code.
- Retaining the existing arXiv paper link, as per instructions.
- Preserving the existing code snippets in the "Inference" section without modification or additions to ensure accuracy and adherence to the original implementation.

Files changed (1): README.md (+59 −6)

README.md CHANGED
@@ -1,13 +1,33 @@
---
- license: apache-2.0
- language:
- - en
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
+ language:
+ - en
+ license: apache-2.0
+ pipeline_tag: text-generation
+ library_name: transformers
---

- # Introduction
-
- This is the official repo of the paper [Annotation-Efficient Universal Honesty Alignment](https://arxiv.org/abs/2510.17509)
+ # Annotation-Efficient Universal Honesty Alignment
+
+ This model card is for the paper [Annotation-Efficient Universal Honesty Alignment](https://arxiv.org/abs/2510.17509).
+ For code and further details, visit the official [GitHub repository](https://github.com/Trustworthy-Information-Access/Annotation-Efficient-Universal-Honesty-Alignment).
+
+ ## Abstract
+
+ Honesty alignment, the ability of large language models (LLMs) to recognize their knowledge boundaries and express calibrated confidence, is essential for trustworthy deployment. Existing methods either rely on training-free confidence estimation (e.g., token probabilities, self-consistency) or training-based calibration with correctness annotations. While effective, achieving universal honesty alignment with training-based calibration requires costly, large-scale labeling. To support annotation-efficient training, we introduce Elicitation-Then-Calibration (EliCal), a two-stage framework that first elicits internal confidence using inexpensive self-consistency supervision, then calibrates this confidence with a small set of correctness annotations. To support a large-scale study, we release HonestyBench, a benchmark covering ten free-form QA datasets with 560k training and 70k evaluation instances annotated with correctness and self-consistency signals. Experiments show that EliCal achieves near-optimal alignment with only 1k correctness annotations (0.18% of full supervision) and better alignment performance on unseen MMLU tasks than the calibration-only baseline, offering a scalable solution toward universal honesty alignment in LLMs.
+
+ ## Authors
+
+ - Shiyu Ni (Shiyunee)
+ - Keping Bi
+ - Jiafeng Guo
+ - Minghao Tang (StudentTang)
+ - Jingtong Wu
+ - Zengxin Han
+ - Xueqi Cheng
+
+ # Introduction

This repository provides modules that extend **Llama3-8B-Instruct** with the ability to generate accurate confidence scores *before* response generation, indicating how likely the model is to answer a given question correctly across tasks. We offer two types of modules—**LoRA + Linear Head** and **Linear Head**—along with model parameters under three training settings:

@@ -131,4 +151,37 @@ base_model = AutoModel.from_pretrained(args.model_path)

/mlp
...
- ```
+ ```
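The Inference section of the card is elided by this diff (only its closing lines appear above, ending at the fence). For orientation, here is a generic sketch of how a **LoRA + Linear Head** confidence module of this kind can be assembled with `transformers` and `peft`. Every path and name below is an illustrative assumption, not the repository's actual API; the card's own Inference snippets show the real usage.

```python
# A generic sketch (not the repository's actual loading code) of a
# "LoRA + Linear Head" confidence module, assuming `transformers` and `peft`.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base = AutoModel.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Hypothetical adapter path: a LoRA adapter trained to elicit internal confidence.
model = PeftModel.from_pretrained(base, "Shiyunee/Honest-Llama3-8B-Instruct")

# The linear head maps the final token's hidden state to a scalar confidence.
# Here it is randomly initialized; the repository ships trained head weights.
head = torch.nn.Linear(model.config.hidden_size, 1)

question = "Who wrote The Brothers Karamazov?"
inputs = tokenizer(question, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state[:, -1]  # shape: (1, hidden_size)
    confidence = torch.sigmoid(head(hidden)).item()    # predicted P(correct)
print(f"confidence before answering: {confidence:.2f}")
```

The key design point, as the introduction above states, is that the confidence is read from the hidden state of the question alone, *before* any answer tokens are generated.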
+
+ # Evaluation
+
+ After training and confidence-score prediction are complete, use the following command to perform score aggregation and evaluation.
+
+ ```sh
+ bash run_eval.sh
+ ```
+
+ This will compute **AUROC**, **ECE**, and **Alignment** simultaneously and save the results into three Excel files.
+ Each Excel file contains one row per **training data size** evaluated (**12 rows** in our setup), ordered from top to bottom as specified in your evaluation input. Each file has **5 columns**, which from left to right represent the following (a loading sketch follows the table):
+
+ | Column | Meaning |
+ | -------- | ----------------------------------- |
+ | N-Prob | Normalized probability baseline |
+ | Cons-Sem | Consistency with semantic agreement |
+ | Eli-Only | Elicitation-only model |
+ | Cal-Only | Calibration-only model |
+ | EliCal | Full EliCal method |
+
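Since the table fixes the column layout, the Excel files can be read directly with `pandas`. A minimal sketch, assuming `pandas` plus `openpyxl` are installed; the file name `auroc.xlsx` is a hypothetical placeholder for whatever `run_eval.sh` actually writes:

```python
# Hypothetical loading sketch for the Excel files written by run_eval.sh.
# The file name and the absence of a header row are assumptions; adjust both
# to match the actual output on disk.
import pandas as pd

cols = ["N-Prob", "Cons-Sem", "Eli-Only", "Cal-Only", "EliCal"]
auroc = pd.read_excel("auroc.xlsx", header=None, names=cols)  # requires openpyxl

# One row per training-data size, top to bottom in the order of the eval input.
print(auroc.round(3))
```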
+ If a script name ends in `_mlp`, only a classification head is added to the model, without LoRA; this is not the main focus of the paper, but an ablation study.
+
+ ## Results
+
+ Experiments show that **EliCal achieves near-optimal alignment with only 1k correctness annotations** (∼0.18% of full supervision) and better alignment performance on unseen MMLU tasks than the calibration-only baseline, offering a scalable solution toward universal honesty alignment in LLMs.
+
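For intuition about the reported numbers, here is a self-contained sketch of how AUROC and a binned ECE relate per-question confidences to correctness labels. This is illustrative only, not the repository's evaluation code, and the paper's Alignment metric is not reproduced here.

```python
# Illustrative only: how AUROC and ECE connect confidence to correctness.
# See run_eval.sh for the repository's actual evaluation pipeline.
import numpy as np
from sklearn.metrics import roc_auc_score

conf = np.array([0.9, 0.8, 0.3, 0.6, 0.2])  # predicted confidence per question
correct = np.array([1, 1, 0, 1, 0])         # 1 if the model answered correctly

auroc = roc_auc_score(correct, conf)        # ranking quality of the confidences

def ece(conf, correct, n_bins=10):
    """Expected Calibration Error: per-bin gap between confidence and accuracy."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            total += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return total

print(f"AUROC={auroc:.3f}, ECE={ece(conf, correct):.3f}")
```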
+ > We **provide all the plotting functions** used in the paper in `honesty_alignment/draw.py`.
+ >
+ > For more details, please refer to our paper.
+
+ ## Citation
+
+ If you find our repository useful, please consider giving it a star 🚀✨, and please cite the paper if you find our work helpful.
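The card stops short of a BibTeX entry. Below is a plausible entry assembled only from the title, author list, and arXiv ID given above; the citation key and year are assumptions, so prefer the official entry from the arXiv page:

```bibtex
@misc{ni2025annotation,
  title         = {Annotation-Efficient Universal Honesty Alignment},
  author        = {Shiyu Ni and Keping Bi and Jiafeng Guo and Minghao Tang and Jingtong Wu and Zengxin Han and Xueqi Cheng},
  year          = {2025},
  eprint        = {2510.17509},
  archivePrefix = {arXiv},
  url           = {https://arxiv.org/abs/2510.17509}
}
```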