fix notebook link
README.md
CHANGED
|
@@ -43,17 +43,17 @@ From a *practical* point of view, [Failspy](https://huggingface.co/failspy) show
|
|
| 43 |
|
| 44 |
Inspired by Failspy's work, I adapted the approach to the rap use case.
|
| 45 |
|
| 46 |
-
📓 [Notebook: Steer Llama to respond with a rap style](
|
| 47 |
|
| 48 |
👣 Steps
|
| 49 |
1. Load the Llama-3-8B-Instruct model.
|
| 50 |
2. Load 1024 examples from Alpaca (instruction dataset).
|
| 51 |
3. Prepare a system prompt to make the model act like a rapper.
|
| 52 |
4. Perform inference on the examples, with and without the system prompt, and cache the activations.
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
|
| 58 |
## 🚧 Limitations of this approach
|
| 59 |
(Maybe a trivial observation)
|
|
|
|
| 43 |
|
| 44 |
Inspired by Failspy's work, I adapted the approach to the rap use case.
|
| 45 |
|
| 46 |
+
📓 [Notebook: Steer Llama to respond with a rap style](steer_llama_to_rap_style.ipynb)
|
| 47 |
|
| 48 |
👣 Steps
|
| 49 |
1. Load the Llama-3-8B-Instruct model.
|
| 50 |
2. Load 1024 examples from Alpaca (instruction dataset).
|
| 51 |
3. Prepare a system prompt to make the model act like a rapper.
|
| 52 |
4. Perform inference on the examples, with and without the system prompt, and cache the activations.
|
| 53 |
+
5. Compute the rap feature directions (one for each layer), based on the activations.
|
| 54 |
+
6. Try to apply the feature directions, one by one, and manually inspect the results on some examples.
|
| 55 |
+
7. Select the best-performing feature direction.
|
| 56 |
+
8. Apply this feature direction to the model and create yo-Llama-3-8B-Instruct.
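
Steps 5 to 8 can be sketched on toy data. This is a minimal illustration, not the notebook's code: it assumes each layer's direction is the normalized difference between the mean activations with and without the system prompt, and that the direction is applied by adding it to the hidden states. The names `feature_directions`, `steer`, and `strength` are hypothetical, and random arrays stand in for the cached Llama activations.

```python
import numpy as np


def feature_directions(acts_with, acts_without):
    """Return one unit-length direction per layer.

    Both inputs have shape (n_layers, n_examples, d_model).
    Assumption: the direction is the difference of the mean activations.
    """
    diff = acts_with.mean(axis=1) - acts_without.mean(axis=1)  # (n_layers, d_model)
    norms = np.linalg.norm(diff, axis=-1, keepdims=True)
    return diff / np.clip(norms, 1e-8, None)  # avoid division by zero


def steer(hidden, direction, strength=1.0):
    """Shift hidden states of shape (..., d_model) along a unit direction."""
    return hidden + strength * direction


# Toy stand-ins for cached activations (not real model outputs).
rng = np.random.default_rng(0)
n_layers, n_examples, d_model = 4, 16, 8
base = rng.normal(size=(n_layers, n_examples, d_model))  # without system prompt
shift = np.zeros(d_model)
shift[0] = 3.0                                           # synthetic "rap" offset
rap = base + shift                                       # with system prompt

dirs = feature_directions(rap, base)                     # step 5: one per layer
steered = steer(base[2], dirs[2], strength=3.0)          # steps 6 to 8, layer 2
```

In practice the candidate layer and the strength would be chosen by inspecting generations, as in steps 6 and 7.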
|
| 57 |
|
| 58 |
## 🚧 Limitations of this approach
|
| 59 |
(Maybe a trivial observation)
|