Merge branch 'main' of https://huggingface.co/bertin-project/bertin-roberta-base-spanish into main
README.md CHANGED
@@ -10,8 +10,8 @@ widget:
 ---
 
 - [Version beta](https://huggingface.co/bertin-project/bertin-roberta-base-spanish/tree/beta): July 15th, 2021
-- [Version 
-
+- [Version v1](https://huggingface.co/bertin-project/bertin-roberta-base-spanish/tree/v1): July 26th, 2021
+- [Version v1-512](https://huggingface.co/bertin-project/bertin-roberta-base-spanish/tree/v1-512): July 26th, 2021
 
 # BERTIN
 
@@ -252,7 +252,7 @@ In addition to the tasks above, we also trained the [`beta`](https://huggingface
 
 Results for PAWS-X seem surprising given the large differences in performance. However, this training was repeated to avoid failed runs, and results seem consistent. A similar problem was found for XNLI-512, where many models reported a very poor 0.3333 accuracy on a first run (and even a second, in the case of BSC-BNE). This suggests training is somewhat unstable for some datasets under these conditions. Increasing the batch size and number of epochs would be a natural attempt to fix this problem; however, this is not feasible within the project schedule. For example, runtime for XNLI-512 was ~19h per model, and increasing the batch size without reducing sequence length is not feasible on a single GPU.
 
-We are also releasing the fine-tuned models for `Gaussian`-512 and making it our version 
+We are also releasing the fine-tuned models for `Gaussian`-512, making version v1 (128 sequence length) our default since it experimentally shows better performance on the fill-mask task, while also releasing the 512 sequence length version (v1-512) for fine-tuning.
 
 - POS: [`bertin-project/bertin-base-pos-conll2002-es`](https://huggingface.co/bertin-project/bertin-base-pos-conll2002-es/)
 - NER: [`bertin-project/bertin-base-ner-conll2002-es`](https://huggingface.co/bertin-project/bertin-base-ner-conll2002-es/)
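Since the two new versions live as tags of the model repo (linked above), either checkpoint can be pinned when loading. A minimal sketch, assuming the standard `transformers` `from_pretrained` API and the `v1`/`v1-512` tag names from the diff:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "bertin-project/bertin-roberta-base-spanish"

# Default release: v1, trained with 128 sequence length.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision="v1")
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID, revision="v1")

# 512 sequence length release (v1-512), intended for fine-tuning on longer inputs.
model_512 = AutoModelForMaskedLM.from_pretrained(MODEL_ID, revision="v1-512")
```

Omitting `revision` fetches the repo's default branch, which after this commit corresponds to v1.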
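The fine-tuned POS and NER checkpoints listed in the diff can likewise be used directly through a token-classification pipeline. A sketch, assuming only the model IDs above; the example sentence is illustrative:

```python
from transformers import pipeline

# Token-classification pipelines over the fine-tuned checkpoints named in the diff.
pos = pipeline("token-classification", model="bertin-project/bertin-base-pos-conll2002-es")
ner = pipeline("token-classification", model="bertin-project/bertin-base-ner-conll2002-es")

print(pos("El Museo del Prado está en Madrid."))
print(ner("El Museo del Prado está en Madrid."))
```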

