Commit 5d71a07 · 1 Parent(s): 7d73e03
Update README.md
README.md
CHANGED
|
@@ -6,6 +6,36 @@ https://github.com/martiansideofthemoon/rankgen
| 6 |
|
| 7 |
RankGen is a suite of encoder models (100M-1.2B parameters) which map prefixes and generations from any pretrained English language model to a shared vector space. RankGen can be used to rerank multiple full-length samples from an LM, and it can also be incorporated as a scoring function into beam search to significantly improve generation quality (0.85 vs 0.77 MAUVE, 75% preference according to human annotators who are English writers). RankGen can also be used like a dense retriever, and achieves state-of-the-art performance on [literary retrieval](https://relic.cs.umass.edu/leaderboard.html).
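To make the reranking idea concrete, here is a minimal sketch. It assumes the prefix and candidate embeddings have already been computed, and that each candidate is scored by its dot product with the prefix vector. The `rerank` helper and the three-dimensional vectors are hypothetical placeholders for illustration, not part of the RankGen API and not real model outputs.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rerank(prefix_vec: Sequence[float],
           candidate_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return candidate indices sorted from highest to lowest score.

    Assumption: score = dot(prefix, candidate) in the shared vector space.
    """
    scores = [dot(prefix_vec, c) for c in candidate_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Placeholder vectors standing in for encoder outputs (hypothetical values).
prefix_vec = [0.2, 0.9, -0.1]
candidate_vecs = [
    [0.1, 0.1, 0.8],   # candidate 0
    [0.3, 0.8, 0.0],   # candidate 1
    [-0.5, 0.2, 0.1],  # candidate 2
]

order = rerank(prefix_vec, candidate_vecs)
print(order)  # indices of candidates, best first
```

With real embeddings, the first index in `order` would be the sample to keep as the reranked output.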
|
| 8 |
|
| 9 |
+
## Setup
|
| 10 |
+
|
| 11 |
+
**Requirements** (`pip` will install these dependencies for you)
|
| 12 |
+
|
| 13 |
+
Python 3.7+, `torch` (CUDA recommended), `transformers`
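As a quick sanity check of the requirements above, the following sketch reports whether the running interpreter is Python 3.7+ and whether `torch` and `transformers` are importable. The `check_environment` helper is a hypothetical name introduced here for illustration. It only uses the standard library and does not check for CUDA.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "transformers"), min_python=(3, 7)):
    """Return a dict mapping each requirement to True (satisfied) or False."""
    report = {"python": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


report = check_environment()
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

Any entry reported as `missing` should be resolved before running the installation test script.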
|
| 14 |
+
|
| 15 |
+
**Installation**
|
| 16 |
+
|
| 17 |
+
```
|
| 18 |
+
python3.7 -m virtualenv rankgen-venv
|
| 19 |
+
source rankgen-venv/bin/activate
|
| 20 |
+
pip install rankgen
|
| 21 |
+
```
|
| 22 |
+
|
| 23 |
+
Download the data [here](https://drive.google.com/drive/folders/1DRG2ess7fK3apfB-6KoHb_azMuHbsIv4?usp=sharing) and place the folder in the root directory. Alternatively, use `gdown` as shown below:
|
| 24 |
+
|
| 25 |
+
```
|
| 26 |
+
gdown --folder https://drive.google.com/drive/folders/1DRG2ess7fK3apfB-6KoHb_azMuHbsIv4
|
| 27 |
+
```
|
| 28 |
+
|
| 29 |
+
Run the test script to make sure the RankGen checkpoint has loaded correctly:
|
| 30 |
+
|
| 31 |
+
```
|
| 32 |
+
python -m rankgen.test_rankgen_encoder --model_path kalpeshk2011/rankgen-t5-base-all
|
| 33 |
+
|
| 34 |
+
### Expected output
|
| 35 |
+
0.0009239262409127233
|
| 36 |
+
0.0011521980725477804
|
| 37 |
+
```
|
| 38 |
+
|
| 39 |
## Using RankGen
|
| 40 |
|
| 41 |
Loading RankGen is simple using the HuggingFace APIs (see Method-2 below), but we suggest using [`RankGenEncoder`](https://github.com/martiansideofthemoon/rankgen/blob/master/rankgen/rankgen_encoder.py), which is a small wrapper around the HuggingFace APIs for correctly preprocessing data and doing tokenization automatically. You can either download [our repository](https://github.com/martiansideofthemoon/rankgen) and install the API, or copy the implementation from [below](#rankgenencoder-implementation).
|