Add missing metadata

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -1,5 +1,7 @@
 ---
 license: mit
+library_name: fasttext
+pipeline_tag: text-classification
 ---
 
 This repository contains a fastText pretraining data filter targeting the LAMBADA task, as discussed in the paper [Improving Pretraining Data Using Perplexity Correlations](https://arxiv.org/abs/2409.05816). This filter selects high-quality pretraining data based on correlations between LLM perplexity and downstream benchmark performance.
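For context on what the `library_name: fasttext` and `pipeline_tag: text-classification` tags imply, here is a minimal sketch of how a fastText classifier could be applied as a document filter. It is not taken from the repository: the helper names `load_filter` and `keep_documents`, the label `__label__include`, and the 0.5 threshold are all assumptions made for illustration, so check the actual label names and model file before relying on it. Only `fasttext.load_model` and `model.predict` are calls from the fastText Python bindings.

```python
from typing import Iterable, List


def load_filter(path: str):
    """Load a fastText model from a local file (the path is caller-supplied)."""
    import fasttext  # imported lazily so the filtering logic works without it

    return fasttext.load_model(path)


def keep_documents(
    model,
    docs: Iterable[str],
    keep_label: str = "__label__include",  # assumed label name, verify on the model
    threshold: float = 0.5,                # assumed cutoff, not from the source
) -> List[str]:
    """Return the documents whose top predicted label is `keep_label`."""
    kept = []
    for doc in docs:
        # fastText's predict() rejects newlines, so collapse all whitespace first.
        text = " ".join(doc.split())
        labels, probs = model.predict(text, k=1)
        if labels and labels[0] == keep_label and float(probs[0]) >= threshold:
            kept.append(doc)  # keep the original, unmodified document
    return kept
```

Usage would look like `keep_documents(load_filter(path_to_model_file), docs)`, where `path_to_model_file` points at the model file downloaded from this repository. Passing the model in as an argument keeps the filtering logic testable with any object that exposes a compatible `predict` method.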