Improve model card: Add pipeline tag, library, abstract, and authors
This PR enhances the model card for `Shiyunee/Honest-Llama3-8B-Instruct` by:
- Adding `pipeline_tag: text-generation` to improve discoverability on the Hub, as the model extends a text generation LLM for QA tasks.
- Adding `library_name: transformers` to enable the automated "how to use" widget, as the model is compatible with the `transformers` and `peft` libraries.
- Adding the paper's abstract to provide more context about the model.
- Including the author list with Hugging Face usernames for proper attribution.
- Adding a prominent link to the main GitHub repository at the top of the model card for easy access to the code.
- Retaining the existing arXiv paper link as per instructions.
- Preserving the existing code snippets in the "Inference" section without modification or additions, to ensure accuracy and adherence to the original implementation.
@@ -1,13 +1,33 @@

Before (lines 1-13):

---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
---
# Introduction

This repository provides modules that extend **Llama3-8B-Instruct** with the ability to generate accurate confidence scores *before* response generation, indicating how likely the model is to answer a given question correctly across tasks. We offer two types of modules—**LoRA + Linear Head** and **Linear Head**—along with model parameters under three training settings:

@@ -131,4 +151,37 @@ base_model = AutoModel.from_pretrained(args.model_path)

Before (lines 131-134):

/mlp
...
```
After (lines 1-33):

---
base_model:
- meta-llama/Meta-Llama-3-8B-Instruct
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---
# Annotation-Efficient Universal Honesty Alignment

This model card is for the paper [Annotation-Efficient Universal Honesty Alignment](https://arxiv.org/abs/2510.17509).
For code and further details, visit the official [GitHub repository](https://github.com/Trustworthy-Information-Access/Annotation-Efficient-Universal-Honesty-Alignment).

## Abstract

Honesty alignment, the ability of large language models (LLMs) to recognize their knowledge boundaries and express calibrated confidence, is essential for trustworthy deployment. Existing methods either rely on training-free confidence estimation (e.g., token probabilities, self-consistency) or training-based calibration with correctness annotations. While effective, achieving universal honesty alignment with training-based calibration requires costly, large-scale labeling. To support annotation-efficient training, we introduce Elicitation-Then-Calibration (EliCal), a two-stage framework that first elicits internal confidence using inexpensive self-consistency supervision, then calibrates this confidence with a small set of correctness annotations. To support a large-scale study, we release HonestyBench, a benchmark covering ten free-form QA datasets with 560k training and 70k evaluation instances annotated with correctness and self-consistency signals. Experiments show that EliCal achieves near-optimal alignment with only 1k correctness annotations (0.18% of full supervision) and better alignment performance on unseen MMLU tasks than the calibration-only baseline, offering a scalable solution toward universal honesty alignment in LLMs.
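
Concretely, the abstract describes two training signals applied in sequence: a large elicitation stage supervised only by self-consistency, followed by a small calibration stage supervised by correctness labels. The sketch below is a schematic illustration of that recipe only; the linear head, MSE loss, and random toy data are placeholders, not the paper's architecture, objective, or data pipeline (see the GitHub repository for the real training code).

```python
# Schematic sketch of the two-stage EliCal idea as summarized in the abstract.
# The head, loss, and toy data are illustrative placeholders, NOT the paper's implementation.
import torch

def fit_confidence_head(head, features, targets, epochs=5, lr=1e-3):
    """Regress a scalar confidence in [0, 1] toward the given supervision signal."""
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(epochs):
        pred = torch.sigmoid(head(features)).squeeze(-1)
        loss = torch.nn.functional.mse_loss(pred, targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return head

hidden_size = 16                      # toy dimension; the real features are LLM hidden states
head = torch.nn.Linear(hidden_size, 1)

# Stage 1 (elicitation): many questions supervised only by self-consistency scores,
# e.g. the agreement rate among several sampled answers; no correctness labels needed.
features_large = torch.randn(1000, hidden_size)
self_consistency_scores = torch.rand(1000)
head = fit_confidence_head(head, features_large, self_consistency_scores)

# Stage 2 (calibration): a small correctness-annotated set (the "1k annotations" regime).
features_small = torch.randn(64, hidden_size)
correctness_labels = torch.randint(0, 2, (64,)).float()
head = fit_confidence_head(head, features_small, correctness_labels, epochs=20)
```

The point of the ordering is that the expensive correctness annotations are only needed for the second, much smaller stage.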
## Authors

- Shiyu Ni (Shiyunee)
- Keping Bi
- Jiafeng Guo
- Minghao Tang (StudentTang)
- Jingtong Wu
- Zengxin Han
- Xueqi Cheng

# Introduction

This repository provides modules that extend **Llama3-8B-Instruct** with the ability to generate accurate confidence scores *before* response generation, indicating how likely the model is to answer a given question correctly across tasks. We offer two types of modules—**LoRA + Linear Head** and **Linear Head**—along with model parameters under three training settings:
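
The three settings themselves are listed further down the model card, in the part of the file this diff does not touch. As a rough illustration of what such a module does, the sketch below wires a `peft`-format LoRA adapter and a separate linear head onto the base model and reads off a confidence score before any answer is generated. It is a minimal sketch, not code from this repository: the adapter id and the head weights file are hypothetical, and the snippets in the model card's Inference section remain the reference loading code.

```python
# Minimal sketch, assuming a peft-format LoRA adapter plus a separate linear head.
# The adapter id and the head weights file are hypothetical placeholders; the official
# loading code is in the Inference section of this model card and the GitHub repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")

# "LoRA + Linear Head" variant: attach the released LoRA adapter.
# For the "Linear Head"-only (_mlp) variant, skip this step and use `base` directly.
model = PeftModel.from_pretrained(base, "Shiyunee/Honest-Llama3-8B-Instruct")  # placeholder adapter id

# Linear head mapping the last hidden state to a scalar confidence score.
conf_head = torch.nn.Linear(base.config.hidden_size, 1).to(base.device, torch.bfloat16)
# conf_head.load_state_dict(torch.load("linear_head.pt"))  # placeholder file name

question = "Who wrote 'The Master and Margarita'?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(base.device)

with torch.no_grad():
    hidden = model(input_ids, output_hidden_states=True).hidden_states[-1][:, -1, :]
    confidence = torch.sigmoid(conf_head(hidden)).item()  # predicted before generating an answer

print(f"Estimated probability of answering correctly: {confidence:.3f}")
```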
After (lines 151-187):

/mlp
...
```

# Evaluation

After the training and confidence score prediction are completed, you can use the following command to perform score aggregation and evaluation.

```sh
bash run_eval.sh
```
This will compute **AUROC**, **ECE**, and **Alignment** simultaneously and save the results into three Excel files.
Each Excel file contains **12 rows** (the exact number depends on how many training data sizes are evaluated). From top to bottom, each row corresponds to a different **amount of labeled data used during training**, in the same order as specified in your evaluation input. Each file has **5 columns**, which from left to right represent:

| Column   | Meaning                             |
| -------- | ----------------------------------- |
| N-Prob   | Normalized probability baseline     |
| Cons-Sem | Consistency with semantic agreement |
| Eli-Only | Elicitation-only model              |
| Cal-Only | Calibration-only model              |
| EliCal   | Full EliCal method                  |
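
AUROC and ECE are standard metrics over pairs of predicted confidence and answer correctness; the sketch below shows one common way to compute them. It is illustrative only: the Alignment metric is defined in the paper and is not reproduced here, and `run_eval.sh` remains the authoritative implementation.

```python
# Illustrative sketch of AUROC and a standard binned ECE over (confidence, correctness) pairs.
# run_eval.sh is the authoritative implementation; the paper's Alignment metric is not shown.
import numpy as np
from sklearn.metrics import roc_auc_score

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average |bin accuracy - bin confidence|, weighted by the fraction of samples per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - confidences[in_bin].mean())
    return ece

# Toy example: pre-generation confidences vs. whether the produced answer was judged correct.
conf = [0.92, 0.81, 0.33, 0.64, 0.12, 0.55]
corr = [1, 1, 0, 1, 0, 0]
print("AUROC:", roc_auc_score(corr, conf))
print("ECE:  ", expected_calibration_error(conf, corr))
```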
If you see `_mlp` at the end of a script name, it means that only a classification head is added to the model without using LoRA. This is not the main focus of the paper, but simply an ablation study.

## Results

Experiments show that **EliCal achieves near-optimal alignment with only 1k correctness annotations** (∼0.18% of full supervision) and better alignment performance on unseen MMLU tasks than the calibration-only baseline, offering a scalable solution toward universal honesty alignment in LLMs.

> We **provide all the plotting functions** used in the paper in `honesty_alignment/draw.py`.
>
> For more details, please refer to our paper.

## Citation

If you find our repository useful, please consider giving it a star 🚀✨. Please cite the paper if you find our work helpful.