---
license: apache-2.0
language:
- el
pipeline_tag: token-classification
library_name: stanza
tags:
- Greek dialect
---
# 🏛️ UD East Cretan Dialect Treebank
The **Cretan dialect** is a variety of **Modern Greek** spoken primarily on the island of **Crete** and by the **Cretan diaspora**, including communities relocated to **Hamidieh in Syria** and, following the 1923 population exchange, to **Western Asia Minor**. The dialect has been shaped by the island's long isolation and by successive periods of **Arab, Venetian, and Turkish** rule, which have given it distinct phonological, morphological, and lexical characteristics.

Cretan is divided into **western and eastern subgroups**. The boundary between them roughly follows the border between the prefectures of **Rethymno** and **Heraklion**. The **eastern group (East Cretan)**, which this treebank covers, is more homogeneous, while the western group shows more variation. Unlike many Modern Greek dialects, **Cretan remains actively spoken** and serves as the main means of communication in much of the island.

---
## 🗣️ Dataset Summary
This model was trained on the **6th round of the East Cretan dataset**, which includes:
- **180 training sentences** (2,976 tokens)
- **60 development sentences** (1,129 tokens)
- **30 test sentences** (523 tokens)

Annotations follow the **Universal Dependencies v2 schema** for the lemma, morphological, and syntactic layers.

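
If you want to re-derive sentence and token counts like those above from CoNLL-U files, a minimal sketch follows. It assumes that "tokens" means syntactic words (rows whose ID is a plain integer) and therefore skips multiword-token ranges such as `1-2` and empty nodes such as `5.1`; the card does not state which convention was used. `count_conllu` is a hypothetical helper, not part of Stanza, and the toy input is an illustrative annotation, not data from the treebank.

```python
from __future__ import annotations

from typing import Iterable


def count_conllu(lines: Iterable[str]) -> tuple[int, int]:
    """Return (sentences, words) for CoNLL-U lines.

    Assumption: words are rows whose ID is a plain integer; multiword-token
    ranges ("1-2") and empty nodes ("5.1") are not counted.
    """
    sentences = 0
    words = 0
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence, if one is open.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # comment lines such as "# sent_id = ..."
        in_sentence = True
        token_id = line.split("\t", 1)[0]
        if token_id.isdigit():
            words += 1
    if in_sentence:  # last sentence without a trailing blank line
        sentences += 1
    return sentences, words


# Toy input (hypothetical annotation, not taken from the treebank).
sample = (
    "# sent_id = toy-1\n"
    "1\tΗ\tο\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tγάτα\tγάτα\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# sent_id = toy-2\n"
    "1-2\tστο\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "1\tσ\tσε\tADP\t_\t_\t3\tcase\t_\t_\n"
    "2\tτο\tο\tDET\t_\t_\t3\tdet\t_\t_\n"
    "3\tσπίτι\tσπίτι\tNOUN\t_\t_\t0\troot\t_\t_\n"
)

print(count_conllu(sample.splitlines()))  # -> (2, 5)
```

With a real split file, pass the open file object instead, e.g. `count_conllu(open(path, encoding="utf-8"))`; the actual file names are not given in this card.
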
---
## 📊 Model Performance
| **Metric** | **Score (%)** |
|------------|----------------:|
| UPOS | 92.90 |
| XPOS | 89.45 |
| UFeats | 85.60 |
| AllTags | 77.48 |
| Lemmas | 88.44 |
| UAS | 85.40 |
| LAS | 78.30 |
| CLAS | 72.76 |
| MLAS | 57.09 |
| BLEX | 61.57 |
| ELAS | 0.00 |
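| EULAS | 0.00 |

ELAS and EULAS score enhanced dependency graphs, which this model does not predict, so their 0.00 values do not reflect parsing errors.
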
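
The metric names in the table match those of the CoNLL 2018 UD shared-task evaluation, where each score is an F1 over aligned words. When predicted words line up one-to-one with gold words (gold tokenization), UPOS, UAS, and LAS reduce to plain accuracy, and CLAS is LAS restricted to content-word relations. The sketch below illustrates only that simplified case: `Word` and `score` are hypothetical names, the content-relation set is assumed to match the one in the official evaluation script, and the tokenization alignment and the MLAS/BLEX computations are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass

# Universal relations treated as "content" for CLAS; assumed to match the
# set used by the CoNLL 2018 shared-task evaluation script.
CONTENT_DEPRELS = {
    "nsubj", "obj", "iobj", "csubj", "ccomp", "xcomp", "obl", "vocative",
    "expl", "dislocated", "advcl", "advmod", "discourse", "nmod", "appos",
    "nummod", "acl", "amod", "conj", "fixed", "flat", "compound", "list",
    "parataxis", "orphan", "goeswith", "reparandum", "root", "dep",
}


@dataclass
class Word:
    upos: str
    head: int  # index of the head word, 0 for the root
    deprel: str

    @property
    def udeprel(self) -> str:
        # Subtypes such as "obl:arg" are compared on their universal part.
        return self.deprel.split(":")[0]


def score(gold: list[Word], pred: list[Word]) -> dict[str, float]:
    """UPOS, UAS, LAS and CLAS in percent, assuming gold tokenization."""
    if not gold or len(gold) != len(pred):
        raise ValueError("expects one predicted word per gold word")
    pairs = list(zip(gold, pred))
    n = len(pairs)
    upos = sum(g.upos == p.upos for g, p in pairs)
    uas = sum(g.head == p.head for g, p in pairs)
    las = sum(g.head == p.head and g.udeprel == p.udeprel for g, p in pairs)
    # CLAS is an F1: recall over gold content words, precision over
    # predicted content words.
    gold_content = sum(g.udeprel in CONTENT_DEPRELS for g in gold)
    pred_content = sum(p.udeprel in CONTENT_DEPRELS for p in pred)
    clas_correct = sum(
        g.udeprel in CONTENT_DEPRELS and g.head == p.head and g.udeprel == p.udeprel
        for g, p in pairs
    )
    denom = gold_content + pred_content
    clas = 2 * clas_correct / denom if denom else 0.0
    return {
        "UPOS": 100 * upos / n,
        "UAS": 100 * uas / n,
        "LAS": 100 * las / n,
        "CLAS": 100 * clas,
    }


# Toy three-word sentence: the parser mislabels nsubj as obj.
gold = [Word("DET", 2, "det"), Word("NOUN", 3, "nsubj"), Word("VERB", 0, "root")]
pred = [Word("DET", 2, "det"), Word("NOUN", 3, "obj"), Word("VERB", 0, "root")]
print(score(gold, pred))
# -> {'UPOS': 100.0, 'UAS': 100.0, 'LAS': 66.66666666666667, 'CLAS': 50.0}
```

Because function words and punctuation are excluded from CLAS, each content-word error weighs more there than in LAS: in the toy case a single mislabel lowers LAS to 66.67 but CLAS to 50.00.
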
---
## Citation
To cite this work or read more about the training pipeline, see:

Socrates Vakirtzian, Vivian Stamou, Yannis Kazos, Stella Markantonatou. (2025). **Dialectal treebanks and their relation with the standard variety: The case of East Cretan and Standard Modern Greek.** The Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), Tallinn, Estonia, March 2–5, 2025.