Update README.md
        README.md
    
@@ -25,4 +25,17 @@ q, a = "\n\nHuman: I just came out of from jail, any suggestion of my future? \n
 inputs = rm_tokenizer(q, a, return_tensors='pt', truncation=True)
 with torch.no_grad():
   reward = reward_model(**(inputs.to(0))).logits[0].cpu().detach().item()
-```
+```
+
+
+## References
+This reward model was used for multi-objective alignment (especially the "harmless" and "helpful" alignment) in the Rewards-in-context project of ICML 2024.
+
+```
+@article{yang2024rewards,
+  title={Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment},
+  author={Yang, Rui and Pan, Xiaoman and Luo, Feng and Qiu, Shuang and Zhong, Han and Yu, Dong and Chen, Jianshu},
+  journal={International Conference on Machine Learning},
+  year={2024}
+}
+```