---
license: apache-2.0
datasets:
  - EleutherAI/the_pile_deduplicated
language:
  - en
---

Pythia-6.9B Deduped 4K is a Pythia-6.9B Deduped model fine-tuned with a 4096-token context length. Training resumed from the original model's step 143,000 checkpoint and continued on The Pile v1 Deduped (threshold=0.87). This particular model is from a checkpoint captured at step 175,500, after an extra 134,217,728,000 tokens of training.
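
The model can be loaded with the standard Hugging Face `transformers` causal-LM API, as with other Pythia checkpoints. The snippet below is a minimal sketch; the repository id is a placeholder and should be replaced with the actual hosted model name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id; substitute the real repo path for this model.
model_name = "pythia-6.9b-deduped-4k"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The fine-tuned context window is 4096 tokens, up from Pythia's original 2048.
inputs = tokenizer("The Pile is a large, diverse dataset", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```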

Note: Sequence length warmup was not used when extending the context from 2048 to 4096 but, in hindsight, should have been applied.