Update README.md with new model card content
README.md CHANGED
|
---
library_name: keras-hub
---
### Model Overview
⚠️ T5 is currently only available via the `keras-hub-nightly` package. Use `pip install keras-hub-nightly` to try this model.

T5 encoder-decoder backbone model.

T5 is an LLM pretrained on a mix of unsupervised and supervised tasks,
where each task is converted to a sequence-to-sequence format.
T5 works well on a variety of tasks out-of-the-box by prepending
various prefixes to the input sequence, e.g., for translation:
`"translate English to German: ..."`, for summarization:
`"summarize: ..."`.

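To make the text-to-text format concrete, here is a small illustrative sketch of what prefixed inputs and their plain-text targets look like (the example pairs are adapted from the T5 paper and are not produced by this checkpoint):

```python
# Text-to-text format: every task is an (input string -> target string) pair.
# Prefixes and example pairs adapted from the T5 paper (illustrative only).
translate_input = "translate English to German: That is good."
translate_target = "Das ist gut."

summarize_input = "summarize: state authorities dispatched emergency crews tuesday to survey the damage ..."
summarize_target = "six people hospitalized after a storm in attala county."
```
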
T5 was introduced in
[Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683).

The default constructor gives a fully customizable, randomly initialized T5
model with any number of layers, heads, and embedding dimensions. To load
preset architectures and weights, use the `from_preset` constructor.

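A minimal usage sketch is shown below. The preset name and the encoder/decoder input keys follow the conventions used by other keras-hub seq2seq backbones and are assumptions here; check `keras_hub.models.T5Backbone.presets` for the preset names available in your installed version.

```python
import numpy as np
import keras_hub

# Load a preset architecture and weights (the preset name is an assumption;
# list the available names with `keras_hub.models.T5Backbone.presets`).
backbone = keras_hub.models.T5Backbone.from_preset("t5_base_multi")

# T5 is an encoder-decoder model, so the backbone expects token ids and
# padding masks for both the encoder and the decoder. In practice these come
# from the matching T5 tokenizer applied to a prefixed string such as
# "translate English to German: ..."; dummy ids are used here.
input_data = {
    "encoder_token_ids": np.ones((1, 12), dtype="int32"),
    "encoder_padding_mask": np.ones((1, 12), dtype="int32"),
    "decoder_token_ids": np.ones((1, 8), dtype="int32"),
    "decoder_padding_mask": np.ones((1, 8), dtype="int32"),
}
outputs = backbone(input_data)  # per-token hidden states, not generated text
```
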
Disclaimer: Pre-trained models are provided on an "as is" basis, without
warranties or conditions of any kind.

__Arguments__

- __vocabulary_size__: int. The size of the token vocabulary.
- __num_layers__: int. The number of Transformer layers.
- __num_heads__: int. The number of attention heads for each Transformer.
    The hidden size must be divisible by the number of attention heads.
- __hidden_dim__: int. The hidden size of the Transformer layers.
- __intermediate_dim__: int. The output dimension of the first Dense layer in
    a two-layer feedforward network for each Transformer layer.
- __key_value_dim__: int. The dimension of each head of the key/value
    projections in the multi-head attention layers. Defaults to
    `hidden_dim / num_heads`.
- __dropout__: float. Dropout probability for the Transformer layers.
- __activation__: activation function (or activation string name). The
    activation to be used in the inner dense blocks of the
    Transformer layers. Defaults to `"relu"`.
- __use_gated_activation__: boolean. Whether to use activation gating in
    the inner dense blocks of the Transformer layers.
    The original T5 architecture didn't use gating, but more
    recent versions do. Defaults to `True`.
- __layer_norm_epsilon__: float. Epsilon factor to be used in the
    layer normalization layers in the Transformer layers.
- __tie_embedding_weights__: boolean. If `True`, the weights of the token
    embedding and the weights projecting language model outputs from
    `hidden_dim`
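
As a sketch of the default constructor described above, the following builds a small, randomly initialized T5 backbone; the dimension values are arbitrary and chosen only for illustration.

```python
import keras_hub

# A small, randomly initialized T5 backbone (argument values are illustrative).
backbone = keras_hub.models.T5Backbone(
    vocabulary_size=32128,   # size of the token vocabulary
    num_layers=4,            # number of Transformer layers
    num_heads=4,             # hidden_dim must be divisible by num_heads
    hidden_dim=256,          # hidden size of the Transformer layers
    intermediate_dim=512,    # first Dense layer of the feedforward network
    dropout=0.1,             # dropout probability for the Transformer layers
)
backbone.summary()
```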