Update README.md
README.md
CHANGED
@@ -37,7 +37,7 @@ tags:
 ---
 
 ## Preprocessing & class imbalance
-Sentences were **lowercased** (no stemming/lemmatization) and tokenized with the base tokenizer from [`bilalzafar/
+Sentences were **lowercased** (no stemming/lemmatization) and tokenized with the base tokenizer from [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) using **max\_length=320** with truncation and **dynamic padding** via `DataCollatorWithPadding`. To address imbalance, training used *Focal Loss (γ=1.0)* with **class weights** computed from the *train* split (`class_weight="balanced"`) applied in the loss, plus a *WeightedRandomSampler* with √(inverse-frequency) *per-sample weights*.
 
 ---
 
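The imbalance handling described in the changed line can be illustrated with a small pure-Python sketch of the arithmetic. It is not the author's training code. It assumes that "balanced" class weights follow the scikit-learn formula `n_samples / (n_classes * count_c)`, that the √(inverse-frequency) per-sample weight is `1 / sqrt(count_c)`, and that the class weight multiplies the focal term. The function names and the toy label list are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Class weights as n_samples / (n_classes * count_c).

    This is the formula behind scikit-learn's class_weight="balanced";
    using it here is an assumption about how the README's weights were derived.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def sqrt_inverse_frequency_sample_weights(labels):
    """One weight per sample: 1 / sqrt(count of that sample's class) (assumed form)."""
    counts = Counter(labels)
    return [1.0 / math.sqrt(counts[y]) for y in labels]


def focal_loss(probs, target, class_weights, gamma=1.0):
    """Class-weighted focal loss for one example: -w_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = probs[target]
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical toy train split: 6 examples of class 0, 2 of class 1.
train_labels = [0, 0, 0, 0, 0, 0, 1, 1]

class_weights = balanced_class_weights(train_labels)
sample_weights = sqrt_inverse_frequency_sample_weights(train_labels)

# Loss for a minority-class example predicted with probability 0.7, gamma = 1.0.
loss = focal_loss([0.3, 0.7], target=1, class_weights=class_weights, gamma=1.0)
```

In a PyTorch pipeline, per-sample weights like `sample_weights` would typically be passed to `torch.utils.data.WeightedRandomSampler`, while the class weights and γ stay inside the loss. Taking the square root softens the resampling, which may be why it is combined with class weighting in the loss rather than using the full inverse frequency in both places.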