Improve model card: Add pipeline tag, library, abstract, and authors
This PR enhances the model card for `Shiyunee/Honest-Llama3-8B-Instruct` by:
- Adding `pipeline_tag: text-generation` to improve discoverability on the Hub, as the model extends a text generation LLM for QA tasks.
- Adding `library_name: transformers` to enable the automated "how to use" widget, as the model is compatible with the `transformers` and `peft` libraries.
- Adding the paper's abstract to provide more context about the model.
- Including the author list with Hugging Face usernames for proper attribution.
- Adding a prominent link to the main GitHub repository at the top of the model card for easy access to the code.
- Retaining the existing arXiv paper link as per instructions.
- Preserving the existing code snippets in the "Inference" section without modification or additions, to ensure accuracy and adherence to the original implementation.
@@ -1,13 +1,33 @@

Before (lines 1-13):

---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
---
# Introduction

This repository provides modules that extend **Llama3-8B-Instruct** with the ability to generate accurate confidence scores *before* response generation, indicating how likely the model is to answer a given question correctly across tasks. We offer two types of modules—**LoRA + Linear Head** and **Linear Head**—along with model parameters under three training settings:

@@ -131,4 +151,37 @@ base_model = AutoModel.from_pretrained(args.model_path)

Before (lines 131-134):

/mlp
...
```
After (lines 1-33):

---
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---
# Annotation-Efficient Universal Honesty Alignment

This model card is for the paper [Annotation-Efficient Universal Honesty Alignment](https://arxiv.org/abs/2510.17509).
For code and further details, visit the official [GitHub repository](https://github.com/Trustworthy-Information-Access/Annotation-Efficient-Universal-Honesty-Alignment).

## Abstract

Honesty alignment, the ability of large language models (LLMs) to recognize their knowledge boundaries and express calibrated confidence, is essential for trustworthy deployment. Existing methods either rely on training-free confidence estimation (e.g., token probabilities, self-consistency) or training-based calibration with correctness annotations. While effective, achieving universal honesty alignment with training-based calibration requires costly, large-scale labeling. To support annotation-efficient training, we introduce Elicitation-Then-Calibration (EliCal), a two-stage framework that first elicits internal confidence using inexpensive self-consistency supervision, then calibrates this confidence with a small set of correctness annotations. To support a large-scale study, we release HonestyBench, a benchmark covering ten free-form QA datasets with 560k training and 70k evaluation instances annotated with correctness and self-consistency signals. Experiments show that EliCal achieves near-optimal alignment with only 1k correctness annotations (0.18% of full supervision) and better alignment performance on unseen MMLU tasks than the calibration-only baseline, offering a scalable solution toward universal honesty alignment in LLMs.
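
Concretely, the abstract describes two training signals applied in sequence: a large elicitation stage supervised only by self-consistency, followed by a small calibration stage supervised by correctness labels. The sketch below is a schematic illustration of that recipe only; the linear head, MSE loss, and random toy data are placeholders, not the paper's architecture, objective, or data pipeline (see the GitHub repository for the real training code).

```python
# Schematic sketch of the two-stage EliCal idea as summarized in the abstract.
# The head, loss, and toy data are illustrative placeholders, NOT the paper's implementation.
import torch

def fit_confidence_head(head, features, targets, epochs=5, lr=1e-3):
    """Regress a scalar confidence in [0, 1] toward the given supervision signal."""
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(epochs):
        pred = torch.sigmoid(head(features)).squeeze(-1)
        loss = torch.nn.functional.mse_loss(pred, targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return head

hidden_size = 16                      # toy dimension; the real features are LLM hidden states
head = torch.nn.Linear(hidden_size, 1)

# Stage 1 (elicitation): many questions supervised only by self-consistency scores,
# e.g. the agreement rate among several sampled answers; no correctness labels needed.
features_large = torch.randn(1000, hidden_size)
self_consistency_scores = torch.rand(1000)
head = fit_confidence_head(head, features_large, self_consistency_scores)

# Stage 2 (calibration): a small correctness-annotated set (the "1k annotations" regime).
features_small = torch.randn(64, hidden_size)
correctness_labels = torch.randint(0, 2, (64,)).float()
head = fit_confidence_head(head, features_small, correctness_labels, epochs=20)
```

The point of the ordering is that the expensive correctness annotations are only needed for the second, much smaller stage.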
## Authors

- Shiyu Ni (Shiyunee)
- Keping Bi
- Jiafeng Guo
- Minghao Tang (StudentTang)
- Jingtong Wu
- Zengxin Han
- Xueqi Cheng

# Introduction

This repository provides modules that extend **Llama3-8B-Instruct** with the ability to generate accurate confidence scores *before* response generation, indicating how likely the model is to answer a given question correctly across tasks. We offer two types of modules—**LoRA + Linear Head** and **Linear Head**—along with model parameters under three training settings:
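
The three settings themselves are listed further down the model card, in the part of the file this diff does not touch. As a rough illustration of what such a module does, the sketch below wires a `peft`-format LoRA adapter and a separate linear head onto the base model and reads off a confidence score before any answer is generated. It is a minimal sketch, not code from this repository: the adapter id and the head weights file are hypothetical, and the snippets in the model card's Inference section remain the reference loading code.

```python
# Minimal sketch, assuming a peft-format LoRA adapter plus a separate linear head.
# The adapter id and the head weights file are hypothetical placeholders; the official
# loading code is in the Inference section of this model card and the GitHub repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")

# "LoRA + Linear Head" variant: attach the released LoRA adapter.
# For the "Linear Head"-only (_mlp) variant, skip this step and use `base` directly.
model = PeftModel.from_pretrained(base, "Shiyunee/Honest-Llama3-8B-Instruct")  # placeholder adapter id

# Linear head mapping the last hidden state to a scalar confidence score.
conf_head = torch.nn.Linear(base.config.hidden_size, 1).to(base.device, torch.bfloat16)
# conf_head.load_state_dict(torch.load("linear_head.pt"))  # placeholder file name

question = "Who wrote 'The Master and Margarita'?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(base.device)

with torch.no_grad():
    hidden = model(input_ids, output_hidden_states=True).hidden_states[-1][:, -1, :]
    confidence = torch.sigmoid(conf_head(hidden)).item()  # predicted before generating an answer

print(f"Estimated probability of answering correctly: {confidence:.3f}")
```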
After (lines 151-187):

/mlp
...
```

# Evaluation

After the training and confidence score prediction are completed, you can use the following command to perform score aggregation and evaluation.

```sh
bash run_eval.sh
```
This will compute **AUROC**, **ECE**, and **Alignment** simultaneously and save the results into three Excel files.
Each Excel file contains **12 rows** (the exact number depends on how many training data sizes are evaluated). From top to bottom, each row corresponds to a different **amount of labeled data used during training**, in the same order as specified in your evaluation input. Each file has **5 columns**, which from left to right represent:

| Column   | Meaning                             |
| -------- | ----------------------------------- |
| N-Prob   | Normalized probability baseline     |
| Cons-Sem | Consistency with semantic agreement |
| Eli-Only | Elicitation-only model              |
| Cal-Only | Calibration-only model              |
| EliCal   | Full EliCal method                  |
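
AUROC and ECE are standard metrics over pairs of predicted confidence and answer correctness; the sketch below shows one common way to compute them. It is illustrative only: the Alignment metric is defined in the paper and is not reproduced here, and `run_eval.sh` remains the authoritative implementation.

```python
# Illustrative sketch of AUROC and a standard binned ECE over (confidence, correctness) pairs.
# run_eval.sh is the authoritative implementation; the paper's Alignment metric is not shown.
import numpy as np
from sklearn.metrics import roc_auc_score

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average |bin accuracy - bin confidence|, weighted by the fraction of samples per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return ece

# Toy example: pre-generation confidences vs. whether the produced answer was judged correct.
conf = [0.92, 0.81, 0.33, 0.64, 0.12, 0.55]
corr = [1, 1, 0, 1, 0, 0]
print("AUROC:", roc_auc_score(corr, conf))
print("ECE:  ", expected_calibration_error(conf, corr))
```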
If you see `_mlp` at the end of a script name, it means that only a classification head is added to the model without using LoRA. This is not the main focus of the paper, but simply an ablation study.

## Results

Experiments show that **EliCal achieves near-optimal alignment with only 1k correctness annotations** (∼0.18% of full supervision) and better alignment performance on unseen MMLU tasks than the calibration-only baseline, offering a scalable solution toward universal honesty alignment in LLMs.

> We **provide all the plotting functions** used in the paper in `honesty_alignment/draw.py`.
>
> For more details, please refer to our paper.

## Citation

If you find our repository useful, please consider giving it a star 🚀✨. Please cite the paper if you find our work helpful.