---
license: cc
language:
- nl
metrics:
- accuracy
base_model:
- GroNLP/bert-base-dutch-cased
pipeline_tag: text-classification
tags:
- dialect
- low-resource languages
---

# Model Description

Meertje is a dialect classifier developed at the Meertens Institute to distinguish dialect material from Standard Dutch. It is fine-tuned from GroNLP/bert-base-dutch-cased and was trained on the Dialect Novel Corpus, using a subcorpus of linguistic material from Drenthe (Drents).

## Intended Use

Isolating dialect material from Dutch texts that contain both dialect and Standard Dutch.

## Training Data

Drents sentences and Standard Dutch sentences. The train/dev/test splits (2122/730/730) are balanced.

## Evaluation

| Material          | F1 (weighted avg) | Support |
| ----------------- | ----------------- | ------- |
| Test set (Drents) | 0.95              | 730     |
| Drents            | 0.95              | 7362    |
| Gronings          | 0.94              | 605     |
| Twents            | 0.98              | 1496    |
| Zeeuws-Vlaams     | 0.90              | 3231    |

## Further Resources

- Background article on the Meertje project ([NL](https://meertens.knaw.nl/2026/03/05/kan-een-computer-dialect-herkennen/) or [Eng](https://www.the-low-countries.com/article/can-a-computer-recognise-dialect/))
- [Finetuning script](https://colab.research.google.com/drive/1XYuAeNQQrCZMvq24mdMRPNeay-MNhPrz?usp=sharing) (Colab Notebook)
- [Usage script](https://colab.research.google.com/drive/16LyvCtORY4NBo4PpN-Xwo_lP0zH2hwOI?usp=sharing) (Colab Notebook)
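
The sketch below shows one possible way to apply the intended use (separating dialect sentences from Standard Dutch ones) with the Hugging Face `transformers` text-classification pipeline. It makes several assumptions. The model id is not given here, so `load_classifier` takes a placeholder id or local path. The helper names `load_classifier` and `split_dialect` are hypothetical. The label `"LABEL_1"` is only assumed to mean "dialect"; check the model's `id2label` config before relying on it. The linked Usage script is the authoritative reference and may work differently.

```python
from typing import Callable, Iterable, List, Tuple

# ASSUMPTION: the real label names come from the model's config (id2label);
# "LABEL_1" standing for dialect is a guess, not confirmed by this model card.
DIALECT_LABEL = "LABEL_1"


def load_classifier(model_id: str) -> Callable[[List[str]], List[dict]]:
    """Return a text-classification pipeline for `model_id` (hypothetical id or local path).

    Requires the `transformers` package; it is imported here so that
    `split_dialect` can be used (and tested) without it.
    """
    from transformers import pipeline

    # Called on a list of strings, the pipeline returns one
    # {"label": ..., "score": ...} dict per input string.
    return pipeline("text-classification", model=model_id)


def split_dialect(
    sentences: Iterable[str],
    classify: Callable[[List[str]], List[dict]],
    dialect_label: str = DIALECT_LABEL,
) -> Tuple[List[str], List[str]]:
    """Partition sentences into (dialect, standard) using each sentence's top label."""
    sentences = list(sentences)
    predictions = classify(sentences)
    dialect: List[str] = []
    standard: List[str] = []
    for sentence, pred in zip(sentences, predictions):
        # Sentences whose top label is not the dialect label count as Standard Dutch.
        (dialect if pred["label"] == dialect_label else standard).append(sentence)
    return dialect, standard
```

Typical use would be `classify = load_classifier("<model-id-or-path>")` followed by `dialect, standard = split_dialect(sentences, classify)`. The classifier is passed in as a plain callable so that the splitting logic stays independent of how the model is loaded.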