mange committed · verified
Commit 9932546 · 1 Parent(s): cfe44bb

Update README.md

Files changed (1): README.md (+3, −0)
README.md CHANGED
@@ -18,6 +18,8 @@ base_model: answerdotai/ModernBERT-large
 ## Overview
 This checkpoint continues the pre-training of [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large) on Scandinavian text, extending the model’s knowledge with ~1.2 trillion additional masked-language-model (MLM) tokens drawn from [The Nordic Pile](https://arxiv.org/pdf/2303.17183) and [SWEb](https://arxiv.org/pdf/2410.04456) while preserving the original 8k token context window.
 
+This is a **research artefact** and is only intended for **research purposes**.
+
 Our tokenizer is trained from scratch on a subset of 11 985 103 472 tokens.
 
 The training is done in one stage with 8192 tokens per sample for the whole run.
@@ -63,6 +65,7 @@ See training details [here](https://github.com/timpal0l/ModernBERT/blob/main/tra
 Train lr-StableAdamW/group1: 0.0000
 ```
 ## Intended Use
+This is a **research artefact** and is only intended for **research purposes**.
 * Fill-mask inference, embedding extraction and fine-tuning for Scandinavian downstream NLP tasks (classification, NER, QA, etc.).
 * Drop-in replacement for BERT-style encoders (omit `token_type_ids`).
 ## Fill-mask
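
For the fill-mask inference this README covers, a minimal sketch using the Transformers pipeline. The `<org>/<model>` repo id is a placeholder, since this commit does not name the published model repository:

```python
from transformers import pipeline

# Hypothetical repo id -- replace with the actual model repository.
fill = pipeline("fill-mask", model="<org>/<model>")

# Use the tokenizer's own mask token rather than hard-coding [MASK].
masked = f"Huvudstaden i Sverige är {fill.tokenizer.mask_token}."
for pred in fill(masked):
    print(pred["token_str"], round(pred["score"], 3))
```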
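
Likewise, for the "drop-in replacement for BERT-style encoders" bullet, a sketch of embedding extraction that passes only `input_ids` and `attention_mask` (no `token_type_ids`), under the same hypothetical repo id:

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "<org>/<model>"  # hypothetical repo id
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

batch = tok(["Ett exempel på en svensk mening."], return_tensors="pt")
with torch.no_grad():
    # BERT-style call, but without token_type_ids.
    out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])

# Mean-pool the final hidden states into one sentence embedding.
mask = batch["attention_mask"].unsqueeze(-1).float()
emb = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(emb.shape)  # (1, hidden_size)
```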