metadata
license: cc
language:
  - nl
metrics:
  - accuracy
base_model:
  - GroNLP/bert-base-dutch-cased
pipeline_tag: text-classification
tags:
  - dialect
  - low-resource languages

Model Description

Meertje is a dialect classifier developed at the Meertens Institute to distinguish dialect material from Standard Dutch. The model was trained on a subcorpus of the Dialect Novel Corpus containing linguistic material from Drenthe.

Intended Use

Isolating dialect material from Dutch texts containing both dialect and Standard Dutch.
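A minimal sketch of this use case, assuming the model is loaded through the `transformers` text-classification pipeline. This card does not state the repository ID or the label names, so `model_id` and `dialect_label` are placeholders to be filled in from the model page and its config; `split_by_label` is a hypothetical helper, not part of the released scripts.

```python
from typing import Callable, Dict, List, Tuple


def load_classifier(model_id: str):
    """Build a text-classification pipeline for the given Hub repository ID.

    `model_id` is a placeholder: this card does not state the repository ID.
    """
    # Imported lazily so the helper below can be used without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def split_by_label(
    sentences: List[str],
    classify: Callable[[List[str]], List[Dict]],
    dialect_label: str,
) -> Tuple[List[str], List[str]]:
    """Split sentences into (dialect, standard) by predicted label.

    `classify` must return one dict with a "label" key per input sentence,
    which is what a text-classification pipeline returns for a list of strings.
    `dialect_label` is an assumption: check the model config for the real name.
    """
    predictions = classify(sentences)
    dialect: List[str] = []
    standard: List[str] = []
    for sentence, prediction in zip(sentences, predictions):
        if prediction["label"] == dialect_label:
            dialect.append(sentence)
        else:
            standard.append(sentence)
    return dialect, standard
```

With the hosted model this would be called roughly as `split_by_label(sentences, load_classifier(model_id), dialect_label)`. Keeping the classifier as an argument makes it easy to swap in a stub for testing.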

Training Data

Sentences containing Drents versus sentences in Standard Dutch. The balanced train/dev/test splits contain 2122/730/730 sentences, respectively.

Evaluation

| Material | F1 (weighted avg) | Support |
|---|---|---|
| Test set (Drents) | 0.95 | 730 |
| Drents | 0.95 | 7362 |
| Gronings | 0.94 | 605 |
| Twents | 0.98 | 1496 |
| Zeeuws-Vlaams | 0.90 | 3231 |
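For readers unfamiliar with the metric, the following is a small sketch of how a support-weighted average F1 is commonly computed: the per-class F1 scores are averaged with each class weighted by its number of true instances. It is an illustration only, not the evaluation code used for the figures above, and `weighted_f1` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average per-class F1, weighting each class by its support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp  # true instances of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total
    return score
```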

Further Resources

Background article on the Meertje project (in Dutch or English)

Finetuning script (Colab Notebook)

Usage script (Colab Notebook)