---
license: cc
language:
- nl
metrics:
- accuracy
base_model:
- GroNLP/bert-base-dutch-cased
pipeline_tag: text-classification
tags:
- dialect
- low-resource languages
---

# Model Description

Meertje is a dialect classifier developed at the Meertens Institute to distinguish dialect material from Standard Dutch.
The model is a fine-tune of `GroNLP/bert-base-dutch-cased`, trained on a subcorpus of the Dialect Novel Corpus containing linguistic material from Drenthe.


## Intended Use

The model is meant for isolating dialect material in Dutch texts that contain both dialect and Standard Dutch.
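As a rough illustration of this use case, the sketch below filters a list of sentences down to those the classifier marks as dialect. It is a minimal sketch, not the project's own usage script: the repository id `your-namespace/meertje` is a hypothetical placeholder, and the assumption that `LABEL_1` is the dialect class should be checked against `id2label` in the model's config.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical placeholder: replace with this model's actual repository id.
MODEL_ID = "your-namespace/meertje"
# Assumption: LABEL_1 is the dialect class; verify against the model config.
DIALECT_LABEL = "LABEL_1"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (imports transformers lazily)."""
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def isolate_dialect(
    sentences: Iterable[str],
    classify: Callable[[List[str]], List[Dict]],
    dialect_label: str = DIALECT_LABEL,
) -> List[str]:
    """Return only the sentences whose predicted label is the dialect label."""
    batch = list(sentences)
    predictions = classify(batch)  # one {"label": ..., "score": ...} per sentence
    return [s for s, p in zip(batch, predictions) if p["label"] == dialect_label]
```

With the real repository id filled in, `isolate_dialect(sentences, load_classifier())` would return the subset of `sentences` predicted to be dialect. Passing the classifier in as an argument keeps the filtering logic independent of how the model is loaded.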


## Training Data

Sentences in Drents versus sentences in Standard Dutch. The balanced train/dev/test splits contain 2122/730/730 sentences.

## Evaluation

| Material          | F1 (weighted avg) | Support |
| ----------------- | ----------------- | ------- |
| Test set (Drents) | 0.95              | 730     |
| Drents            | 0.95              | 7362    |
| Gronings          | 0.94              | 605     |
| Twents            | 0.98              | 1496    |
| Zeeuws-Vlaams     | 0.90              | 3231    |
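
For readers unfamiliar with the metric, a weighted-average F1 takes the F1 score of each class and averages them, weighting each class by its support (the number of true instances). The following is a minimal pure-Python sketch of that computation; the function name and the toy labels are illustrative assumptions and are not taken from the project's evaluation code or data.

```python
from collections import Counter
from typing import List


def weighted_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Support-weighted average of the per-class F1 scores."""
    support = Counter(y_true)  # true instances per class
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * count / total  # weight by the class's share of the data
    return score


# Toy labels for illustration only.
gold = ["dialect", "dialect", "standard", "standard"]
pred = ["dialect", "standard", "standard", "standard"]
print(round(weighted_f1(gold, pred), 3))
```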


## Further Resources

Background article on the Meertje project ([NL](https://meertens.knaw.nl/2026/03/05/kan-een-computer-dialect-herkennen/) or [EN](https://www.the-low-countries.com/article/can-a-computer-recognise-dialect/))

[Finetuning script](https://colab.research.google.com/drive/1XYuAeNQQrCZMvq24mdMRPNeay-MNhPrz?usp=sharing) (Colab Notebook)

[Usage script](https://colab.research.google.com/drive/16LyvCtORY4NBo4PpN-Xwo_lP0zH2hwOI?usp=sharing) (Colab Notebook)