Update pipeline tag and add project page link (#1)
- Update pipeline tag and add project page link (f888dc22ee3eaac095c470f5c41310e7e7d68dee)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED
@@ -1,16 +1,16 @@
 ---
+base_model:
+- google/t5-v1_1-xxl
+language:
+- en
+library_name: transformers
 license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - language-modeling
 - causal-lm
 - bias-analysis
 - cognitive-bias
-language:
-- en
-base_model:
-- google/t5-v1_1-xxl
-pipeline_tag: text2text-generation
-library_name: transformers
 ---
 
 # Model Card for T5-Flan
@@ -23,12 +23,13 @@ This 🤗 Transformers model was finetuned using LoRA adapters for the arXiv pap
 We study whether cognitive biases in LLMs emerge from pretraining, instruction tuning, or training randomness.
 This is one of 3 identical versions trained with different random seeds.
 
--
--
--
--
--
--
+- **Model type**: Causal decoder-based transformer
+- **Language(s)**: English
+- **License**: Apache 2.0
+- **Finetuned from**: `google/t5-v1_1-xxl`
+- **Paper**: https://arxiv.org/abs/2507.07186
+- **Repository**: https://github.com/itay1itzhak/planted-in-pretraining
+- **Project Page**: https://itay1itzhak.github.io/planted-in-pretraining/
 
 ## Uses
 
@@ -53,26 +54,26 @@ print(tokenizer.decode(outputs[0]))
 
 ## Training Details
 
--
--
--
--
--
--
--
+- Finetuning method: LoRA (high-rank, rank ∈ [64, 512])
+- Instruction data: Flan (350K)
+- Seeds: 3 per setting to evaluate randomness effects
+- Batch size: 128 (OLMo) / 64 (T5)
+- Learning rate: 1e-6 to 1e-3
+- Steps: ~5.5k (OLMo) / ~16k (T5)
+- Mixed precision: fp16 (OLMo) / bf16 (T5)
 
 ## Evaluation
 
--
--
--
+- Evaluated on 32 cognitive biases from Itzhak et al. (2024) and Malberg et al. (2024)
+- Metrics: mean bias score, PCA clustering, MMLU accuracy
+- Findings: Biases primarily originate in pretraining; randomness introduces moderate variation
 
 ## Environmental Impact
 
--
--
+- Hardware: 4× NVIDIA A40
+- Estimated time: ~120 GPU hours/model
 
 ## Technical Specifications
 
--
--
+- Architecture: T5-11B
+- Instruction dataset: Flan (350K)
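
For context on the updated metadata: the card names `google/t5-v1_1-xxl` as the base model and states the checkpoint was finetuned with LoRA adapters. Below is a minimal loading sketch, assuming the published weights are a PEFT adapter applied on top of the base model; if the weights are already merged, loading the model repo directly with `AutoModelForSeq2SeqLM` is enough. The repo id below is a placeholder, not this model's actual Hub id.

```python
# Sketch: load the T5-v1.1-XXL base model and attach this checkpoint's LoRA adapter.
# ADAPTER_REPO is a placeholder -- substitute the actual Hub id of this model.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel

BASE_REPO = "google/t5-v1_1-xxl"
ADAPTER_REPO = "<this-model-repo-id>"  # placeholder, not a real id

tokenizer = AutoTokenizer.from_pretrained(BASE_REPO)
base = AutoModelForSeq2SeqLM.from_pretrained(
    BASE_REPO, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)  # attach the LoRA adapter

inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```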
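The Training Details block is compact, so here is a rough illustration of how those hyperparameters map onto a PEFT/Transformers setup for the T5 variant. The card only states rank ∈ [64, 512], batch size 64, ~16k steps, bf16, and a 1e-6 to 1e-3 learning-rate range; the alpha, dropout, and target modules below are illustrative assumptions, not values from the paper.

```python
# Illustrative high-rank LoRA setup for the T5 run described in the card.
# Values marked "assumed" are not stated on the card.
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-xxl")

lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=256,                      # card: rank somewhere in [64, 512]
    lora_alpha=512,             # assumed
    lora_dropout=0.05,          # assumed
    target_modules=["q", "v"],  # assumed: T5 attention projections
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()

args = Seq2SeqTrainingArguments(
    output_dir="t5-flan-lora",
    per_device_train_batch_size=8,  # with accumulation for an effective batch of 64
    gradient_accumulation_steps=8,
    learning_rate=1e-4,             # card gives a 1e-6 to 1e-3 range
    max_steps=16_000,               # card: ~16k steps for T5
    bf16=True,                      # card: bf16 for T5
)
# A Seq2SeqTrainer would then be constructed with `model`, `args`, and the Flan data.
```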
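On the Evaluation bullets: the exact bias-score definitions come from the cited papers, but the aggregation the card mentions (a mean bias score plus PCA clustering of per-bias vectors across models and seeds) can be summarized with a small sketch. The random matrix below stands in for real per-bias scores and is not data from the paper.

```python
# Sketch of the aggregation implied by the Evaluation bullets:
# each model/seed yields a 32-dimensional vector of per-bias scores,
# which is averaged into a mean bias score and projected to 2-D with PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Placeholder scores: 6 models (e.g. 2 settings x 3 seeds) x 32 biases.
bias_scores = rng.uniform(-1.0, 1.0, size=(6, 32))

mean_bias = bias_scores.mean(axis=1)                      # one scalar per model
coords = PCA(n_components=2).fit_transform(bias_scores)   # 2-D layout for clustering plots

for i, (m, (x, y)) in enumerate(zip(mean_bias, coords)):
    print(f"model {i}: mean bias {m:+.3f}, PCA ({x:+.2f}, {y:+.2f})")
```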