Update README.md
README.md CHANGED
@@ -9,6 +9,7 @@ datasets:
 - allenai/c4
 language:
 - ja
+library_name: transformers
 ---
 
 # What’s this?
@@ -51,6 +52,10 @@ model = AutoModelForTokenClassification.from_pretrained(model_name)
 
 The original DeBERTa V3 is notable for being trained with a large vocabulary, but the embedding layer's parameter count then becomes excessively large (for the [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) model, the embedding layer accounts for 54% of all parameters), so this model adopts a smaller vocabulary.
 
+Note that of the three models, `xsmall`, `base`, and `large`, the first two use a tokenizer trained with the unigram algorithm, while only the `large` model uses one trained with the BPE algorithm.
+There is no deep reason for this: only the `large` tokenizer was trained separately in order to increase its vocabulary size, and for some reason training with the unigram algorithm did not succeed.
+Prioritizing completion of the model over investigating the cause, we switched to the BPE algorithm.
+
 ---
 The tokenizer is trained using [the method introduced by Kudo](https://qiita.com/taku910/items/fbaeab4684665952d5a9).
 
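The unigram/BPE distinction in the added lines above corresponds to the `model_type` option of a SentencePiece trainer (the linked Kudo article is about SentencePiece). The following is a minimal sketch, not the authors' actual script: the corpus path and output prefix are made-up placeholders, and the 32,000 vocabulary size is taken from the "Vocabulary size" entry later in this diff, so it may not apply to every model size.

```python
import sentencepiece as spm

# Minimal SentencePiece training sketch. Only vocab_size and the
# unigram-vs-BPE choice reflect the README; everything else is a placeholder.
spm.SentencePieceTrainer.train(
    input="ja_corpus.txt",          # hypothetical plain-text corpus
    model_prefix="deberta_ja_spm",  # hypothetical output prefix (.model / .vocab)
    vocab_size=32000,
    model_type="unigram",           # xsmall/base; per the note, large used "bpe"
)
```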
@@ -62,6 +67,10 @@ Key points include:
 
 Although the original DeBERTa V3 is characterized by a large vocabulary size, which can result in a significant increase in the number of parameters in the embedding layer (for the [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) model, the embedding layer accounts for 54% of the total), this model adopts a smaller vocabulary size to address this.
 
+Note that, among the three models (xsmall, base, and large), the first two were trained using the unigram algorithm, while only the large model was trained using the BPE algorithm.
+The reason for this is simple: the large model's tokenizer was trained independently to increase its vocabulary size, and for some reason training with the unigram algorithm was not successful.
+Thus, prioritizing the completion of the model over investigating the cause, we switched to the BPE algorithm.
+
 # Data
 | Dataset Name | Notes | File Size (with metadata) | Factor |
 | ------------- | ----- | ------------------------- | ---------- |
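The 54% figure quoted above can be sanity-checked directly. Here is a minimal sketch assuming `transformers` and PyTorch are installed; it counts only the token-embedding matrix via `get_input_embeddings()`, so the printed share may differ slightly from however the authors counted the embedding layer.

```python
from transformers import AutoModel

# Compare the word-embedding parameter count of the reference model
# against its total parameter count.
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")
total_params = sum(p.numel() for p in model.parameters())
embedding_params = sum(p.numel() for p in model.get_input_embeddings().parameters())
print(f"Embedding share: {embedding_params / total_params:.1%}")
```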
@@ -83,6 +92,7 @@ Although the original DeBERTa V3 is characterized by a large vocabulary size, wh
 - Training steps: 1,000,000
 - Warmup steps: 100,000
 - Precision: Mixed (fp16)
+- Vocabulary size: 32,000
 
 # Evaluation
 | Model | #params | JSTS | JNLI | JSQuAD | JCQA |
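For readers who prefer the hyperparameters above in code form, the following is a rough, hypothetical mapping onto `transformers.TrainingArguments`. The README does not include the actual pretraining script, so only `max_steps`, `warmup_steps`, and `fp16` mirror listed values; the output directory is a placeholder.

```python
from transformers import TrainingArguments

# Illustrative only: restates the listed hyperparameters as TrainingArguments.
training_args = TrainingArguments(
    output_dir="deberta-v3-japanese",  # placeholder path, not from the README
    max_steps=1_000_000,               # Training steps: 1,000,000
    warmup_steps=100_000,              # Warmup steps: 100,000
    fp16=True,                         # Precision: Mixed (fp16)
)
```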