The implementation of the paper *Differential Layer Skipping in Large Language Models*.
### Model Description
DiffSkip-Llama-3-8B-Instruct is an enhanced version of the Llama-3-8B-Instruct model, incorporating the Differential Layer Skipping (DiffSkip) method to enable dynamic Feed-Forward Network (FFN) skipping during text generation. This approach leverages the self-attention input-output difference as a routing signal, allowing tokens to bypass FFN blocks based on computational needs.
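The routing idea described above can be sketched as follows. This is a minimal NumPy illustration, not the released implementation: the per-token L2-norm signal, the fixed `threshold`, and the `diffskip_ffn` helper are all assumptions made for clarity; the paper's actual gating function and learned parameters may differ.

```python
import numpy as np

def diffskip_ffn(hidden, attn_in, attn_out, ffn, threshold=0.5):
    """Hypothetical sketch of DiffSkip-style FFN routing.

    Tokens whose self-attention input-output difference is small are
    assumed to need little further processing, so their FFN call is
    skipped; the others go through the usual residual FFN path.
    """
    # Routing signal: per-token L2 norm of the attention input-output difference.
    diff = np.linalg.norm(attn_out - attn_in, axis=-1)   # shape: (seq_len,)
    route = diff > threshold                             # tokens that run the FFN

    out = hidden.copy()
    if route.any():
        # Residual FFN update applied only to routed tokens.
        out[route] = hidden[route] + ffn(hidden[route])
    return out, route

# Toy usage: token 0 has a large attention delta, so only it runs the FFN.
hidden = np.ones((4, 8))
attn_in = np.zeros((4, 8))
attn_out = np.zeros((4, 8))
attn_out[0] = 1.0
out, route = diffskip_ffn(hidden, attn_in, attn_out, ffn=lambda x: x)
```

In this toy run `route` is `[True, False, False, False]`: only the token with a large attention delta pays the FFN cost, which is the compute saving the method targets.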