Update README.md
README.md CHANGED
@@ -9,6 +9,7 @@ datasets:
 - allenai/c4
 language:
 - ja
+library_name: transformers
 ---
 
 # What’s this?
@@ -51,6 +52,10 @@ model = AutoModelForTokenClassification.from_pretrained(model_name)
 
 The original DeBERTa V3 is notable for being trained with a large vocabulary, but the embedding layer's parameter count then becomes excessively large (for the [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) model, the embedding layer accounts for 54% of all parameters), so this model adopts a smaller vocabulary.
 
+Note that of the three models, `xsmall`, `base`, and `large`, the first two use a tokenizer trained with the unigram algorithm, while only the `large` model uses one trained with the BPE algorithm.
+There is no deep reason for this: only the `large` tokenizer was trained separately in order to increase its vocabulary size, and for some reason training with the unigram algorithm did not succeed.
+Prioritizing completion of the model over investigating the cause, we switched to the BPE algorithm.
+
 ---
 The tokenizer is trained using [the method introduced by Kudo](https://qiita.com/taku910/items/fbaeab4684665952d5a9).
 
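The unigram/BPE distinction in the added lines above corresponds to the `model_type` option of a SentencePiece trainer (the linked Kudo article is about SentencePiece). The following is a minimal sketch, not the authors' actual script: the corpus path and output prefix are made-up placeholders, and the 32,000 vocabulary size is taken from the "Vocabulary size" entry later in this diff, so it may not apply to every model size.

```python
import sentencepiece as spm

# Minimal SentencePiece training sketch. Only vocab_size and the
# unigram-vs-BPE choice reflect the README; everything else is a placeholder.
spm.SentencePieceTrainer.train(
    input="ja_corpus.txt",          # hypothetical plain-text corpus
    model_prefix="deberta_ja_spm",  # hypothetical output prefix (.model / .vocab)
    vocab_size=32000,
    model_type="unigram",           # xsmall/base; per the note, large used "bpe"
)
```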
@@ -62,6 +67,10 @@ Key points include:
 
 Although the original DeBERTa V3 is characterized by a large vocabulary size, which can result in a significant increase in the number of parameters in the embedding layer (for the [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) model, the embedding layer accounts for 54% of the total), this model adopts a smaller vocabulary size to address this.
 
+Note that, among the three models (xsmall, base, and large), the first two were trained using the unigram algorithm, while only the large model was trained using the BPE algorithm.
+The reason for this is simple: the large model's tokenizer was trained independently to increase its vocabulary size, and for some reason training with the unigram algorithm was not successful.
+Thus, prioritizing the completion of the model over investigating the cause, we switched to the BPE algorithm.
+
 # Data
 | Dataset Name | Notes | File Size (with metadata) | Factor |
 | ------------- | ----- | ------------------------- | ---------- |
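The 54% figure quoted above can be sanity-checked directly. Here is a minimal sketch assuming `transformers` and PyTorch are installed; it counts only the token-embedding matrix via `get_input_embeddings()`, so the printed share may differ slightly from however the authors counted the embedding layer.

```python
from transformers import AutoModel

# Compare the word-embedding parameter count of the reference model
# against its total parameter count.
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")
total_params = sum(p.numel() for p in model.parameters())
embedding_params = sum(p.numel() for p in model.get_input_embeddings().parameters())
print(f"Embedding share: {embedding_params / total_params:.1%}")
```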
@@ -83,6 +92,7 @@ Although the original DeBERTa V3 is characterized by a large vocabulary size, wh
 - Training steps: 1,000,000
 - Warmup steps: 100,000
 - Precision: Mixed (fp16)
+- Vocabulary size: 32,000
 
 # Evaluation
 | Model | #params | JSTS | JNLI | JSQuAD | JCQA |
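For readers who prefer the hyperparameters above in code form, the following is a rough, hypothetical mapping onto `transformers.TrainingArguments`. The README does not include the actual pretraining script, so only `max_steps`, `warmup_steps`, and `fp16` mirror listed values; the output directory is a placeholder.

```python
from transformers import TrainingArguments

# Illustrative only: restates the listed hyperparameters as TrainingArguments.
training_args = TrainingArguments(
    output_dir="deberta-v3-japanese",  # placeholder path, not from the README
    max_steps=1_000_000,               # Training steps: 1,000,000
    warmup_steps=100_000,              # Warmup steps: 100,000
    fp16=True,                         # Precision: Mixed (fp16)
)
```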