Update pipeline tag and add project page link (#1)
- Update pipeline tag and add project page link (f888dc22ee3eaac095c470f5c41310e7e7d68dee)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED
@@ -1,16 +1,16 @@
 ---
+base_model:
+- google/t5-v1_1-xxl
+language:
+- en
+library_name: transformers
 license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - language-modeling
 - causal-lm
 - bias-analysis
 - cognitive-bias
-language:
-- en
-base_model:
-- google/t5-v1_1-xxl
-pipeline_tag: text2text-generation
-library_name: transformers
 ---
 
 # Model Card for T5-Flan
@@ -23,12 +23,13 @@ This 🤗 Transformers model was finetuned using LoRA adapters for the arXiv pap
 We study whether cognitive biases in LLMs emerge from pretraining, instruction tuning, or training randomness.
 This is one of 3 identical versions trained with different random seeds.
 
--
--
--
--
--
--
+- **Model type**: Causal decoder-based transformer
+- **Language(s)**: English
+- **License**: Apache 2.0
+- **Finetuned from**: `google/t5-v1_1-xxl`
+- **Paper**: https://arxiv.org/abs/2507.07186
+- **Repository**: https://github.com/itay1itzhak/planted-in-pretraining
+- **Project Page**: https://itay1itzhak.github.io/planted-in-pretraining/
 
 ## Uses
 
@@ -53,26 +54,26 @@ print(tokenizer.decode(outputs[0]))
 
 ## Training Details
 
--
--
--
--
--
--
--
+- Finetuning method: LoRA (high-rank, rank ∈ [64, 512])
+- Instruction data: Flan (350K)
+- Seeds: 3 per setting to evaluate randomness effects
+- Batch size: 128 (OLMo) / 64 (T5)
+- Learning rate: 1e-6 to 1e-3
+- Steps: ~5.5k (OLMo) / ~16k (T5)
+- Mixed precision: fp16 (OLMo) / bf16 (T5)
 
 ## Evaluation
 
--
--
--
+- Evaluated on 32 cognitive biases from Itzhak et al. (2024) and Malberg et al. (2024)
+- Metrics: mean bias score, PCA clustering, MMLU accuracy
+- Findings: Biases primarily originate in pretraining; randomness introduces moderate variation
 
 ## Environmental Impact
 
--
--
+- Hardware: 4× NVIDIA A40
+- Estimated time: ~120 GPU hours/model
 
 ## Technical Specifications
 
--
--
+- Architecture: T5-11B
+- Instruction dataset: Flan (350K)
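
For context on the updated metadata: the card names `google/t5-v1_1-xxl` as the base model and states the checkpoint was finetuned with LoRA adapters. Below is a minimal loading sketch, assuming the published weights are a PEFT adapter applied on top of the base model; if the weights are already merged, loading the model repo directly with `AutoModelForSeq2SeqLM` is enough. The repo id below is a placeholder, not this model's actual Hub id.

```python
# Sketch: load the T5-v1.1-XXL base model and attach this checkpoint's LoRA adapter.
# ADAPTER_REPO is a placeholder -- substitute the actual Hub id of this model.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel

BASE_REPO = "google/t5-v1_1-xxl"
ADAPTER_REPO = "<this-model-repo-id>"  # placeholder, not a real id

tokenizer = AutoTokenizer.from_pretrained(BASE_REPO)
base = AutoModelForSeq2SeqLM.from_pretrained(
    BASE_REPO, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)  # attach the LoRA adapter

inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```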
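The Training Details block is compact, so here is a rough illustration of how those hyperparameters map onto a PEFT/Transformers setup for the T5 variant. The card only states rank ∈ [64, 512], batch size 64, ~16k steps, bf16, and a 1e-6 to 1e-3 learning-rate range; the alpha, dropout, and target modules below are illustrative assumptions, not values from the paper.

```python
# Illustrative high-rank LoRA setup for the T5 run described in the card.
# Values marked "assumed" are not stated on the card.
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-xxl")

lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=256,                      # card: rank somewhere in [64, 512]
    lora_alpha=512,             # assumed
    lora_dropout=0.05,          # assumed
    target_modules=["q", "v"],  # assumed: T5 attention projections
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()

args = Seq2SeqTrainingArguments(
    output_dir="t5-flan-lora",
    per_device_train_batch_size=8,  # with accumulation for an effective batch of 64
    gradient_accumulation_steps=8,
    learning_rate=1e-4,             # card gives a 1e-6 to 1e-3 range
    max_steps=16_000,               # card: ~16k steps for T5
    bf16=True,                      # card: bf16 for T5
)
# A Seq2SeqTrainer would then be constructed with `model`, `args`, and the Flan data.
```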
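On the Evaluation bullets: the exact bias-score definitions come from the cited papers, but the aggregation the card mentions (a mean bias score plus PCA clustering of per-bias vectors across models and seeds) can be summarized with a small sketch. The random matrix below stands in for real per-bias scores and is not data from the paper.

```python
# Sketch of the aggregation implied by the Evaluation bullets:
# each model/seed yields a 32-dimensional vector of per-bias scores,
# which is averaged into a mean bias score and projected to 2-D with PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Placeholder scores: 6 models (e.g. 2 settings x 3 seeds) x 32 biases.
bias_scores = rng.uniform(-1.0, 1.0, size=(6, 32))

mean_bias = bias_scores.mean(axis=1)                      # one scalar per model
coords = PCA(n_components=2).fit_transform(bias_scores)   # 2-D layout for clustering plots

for i, (m, (x, y)) in enumerate(zip(mean_bias, coords)):
    print(f"model {i}: mean bias {m:+.3f}, PCA ({x:+.2f}, {y:+.2f})")
```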