Update README.md with new model card content
README.md CHANGED
|
---
library_name: keras-hub
---
### Model Overview
⚠️ T5 is currently only available via the `keras-hub-nightly` package. Use `pip install keras-hub-nightly` to try this model.

T5 encoder-decoder backbone model.

T5 is an LLM pretrained on a mix of unsupervised and supervised tasks,
where each task is converted to a sequence-to-sequence format.
T5 works well on a variety of tasks out-of-the-box by prepending
various prefixes to the input sequence, e.g., for translation:
`"translate English to German: ..."`, for summarization:
`"summarize: ..."`.

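To make the text-to-text format concrete, here is a small illustrative sketch of what prefixed inputs and their plain-text targets look like (the example pairs are adapted from the T5 paper and are not produced by this checkpoint):

```python
# Text-to-text format: every task is an (input string -> target string) pair.
# Prefixes and example pairs adapted from the T5 paper (illustrative only).
translate_input = "translate English to German: That is good."
translate_target = "Das ist gut."

summarize_input = "summarize: state authorities dispatched emergency crews tuesday to survey the damage ..."
summarize_target = "six people hospitalized after a storm in attala county."
```
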
T5 was introduced in
[Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683).

The default constructor gives a fully customizable, randomly initialized T5
model with any number of layers, heads, and embedding dimensions. To load
preset architectures and weights, use the `from_preset` constructor.

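A minimal usage sketch is shown below. The preset name and the encoder/decoder input keys follow the conventions used by other keras-hub seq2seq backbones and are assumptions here; check `keras_hub.models.T5Backbone.presets` for the preset names available in your installed version.

```python
import numpy as np
import keras_hub

# Load a preset architecture and weights (the preset name is an assumption;
# list the available names with `keras_hub.models.T5Backbone.presets`).
backbone = keras_hub.models.T5Backbone.from_preset("t5_base_multi")

# T5 is an encoder-decoder model, so the backbone expects token ids and
# padding masks for both the encoder and the decoder. In practice these come
# from the matching T5 tokenizer applied to a prefixed string such as
# "translate English to German: ..."; dummy ids are used here.
input_data = {
    "encoder_token_ids": np.ones((1, 12), dtype="int32"),
    "encoder_padding_mask": np.ones((1, 12), dtype="int32"),
    "decoder_token_ids": np.ones((1, 8), dtype="int32"),
    "decoder_padding_mask": np.ones((1, 8), dtype="int32"),
}
outputs = backbone(input_data)  # per-token hidden states, not generated text
```
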
Disclaimer: Pre-trained models are provided on an "as is" basis, without
warranties or conditions of any kind.

__Arguments__

- __vocabulary_size__: int. The size of the token vocabulary.
- __num_layers__: int. The number of Transformer layers.
- __num_heads__: int. The number of attention heads for each Transformer.
    The hidden size must be divisible by the number of attention heads.
- __hidden_dim__: int. The hidden size of the Transformer layers.
- __intermediate_dim__: int. The output dimension of the first Dense layer in
    a two-layer feedforward network for each Transformer layer.
- __key_value_dim__: int. The dimension of each head of the key/value
    projections in the multi-head attention layers. Defaults to
    `hidden_dim / num_heads`.
- __dropout__: float. Dropout probability for the Transformer layers.
- __activation__: activation function (or activation string name). The
    activation to be used in the inner dense blocks of the
    Transformer layers. Defaults to `"relu"`.
- __use_gated_activation__: boolean. Whether to use activation gating in
    the inner dense blocks of the Transformer layers.
    The original T5 architecture didn't use gating, but more
    recent versions do. Defaults to `True`.
- __layer_norm_epsilon__: float. Epsilon factor to be used in the
    layer normalization layers in the Transformer layers.
- __tie_embedding_weights__: boolean. If `True`, the weights of the token
    embedding and the weights projecting language model outputs from
    `hidden_dim`
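
As a sketch of the default constructor described above, the following builds a small, randomly initialized T5 backbone; the dimension values are arbitrary and chosen only for illustration.

```python
import keras_hub

# A small, randomly initialized T5 backbone (argument values are illustrative).
backbone = keras_hub.models.T5Backbone(
    vocabulary_size=32128,   # size of the token vocabulary
    num_layers=4,            # number of Transformer layers
    num_heads=4,             # hidden_dim must be divisible by num_heads
    hidden_dim=256,          # hidden size of the Transformer layers
    intermediate_dim=512,    # first Dense layer of the feedforward network
    dropout=0.1,             # dropout probability for the Transformer layers
)
backbone.summary()
```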