README: RobBERT-based NER Model for Location Entity Extraction

Overview

This Named Entity Recognition (NER) model extracts location-related entities from Flemish Dutch administrative documents. It is based on the RobBERT language model and has been fine-tuned to identify the location entity types that commonly appear in municipal decisions and agenda items.

Model Details

Base Model: RobBERT (a Dutch language model based on RoBERTa)
Framework: spaCy v3.x
Language: Dutch (Flemish variant)
Task: Named Entity Recognition (Location Extraction)
Domain: Municipal administrative documents

Training Data

The model was trained on data sourced from Lokaal Beslist Vlaanderen, which contains:

Agendapunten (Agenda items): Items discussed in municipal council meetings
Besluiten (Decisions): Official municipal decisions and resolutions
Language: Flemish Dutch administrative language
Coverage: Various Flemish municipalities and administrative regions

The training corpus consists of real-world municipal documents, providing the model with authentic examples of how location entities appear in official Flemish administrative contexts.

Usage

```python
import spacy

# Load the trained model
nlp = spacy.load("path/to/model-best")

# Process text
text = "De werken aan de Korenmarkt 15-17, 9000 Gent worden uitgevoerd."
doc = nlp(text)

# Extract entities
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")

# Output:
# Korenmarkt -> STREET
# 15-17 -> HOUSENUMBERS
# 9000 -> POSTCODE
# Gent -> CITY
```
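
For downstream use it is often handier to keep character offsets alongside each label. The following is a minimal sketch that relies only on the standard spaCy entity attributes (`text`, `label_`, `start_char`, `end_char`); the helper name `ents_to_records` is hypothetical and not part of the model package.

```python
def ents_to_records(doc):
    """Convert the entities of a spaCy Doc into plain dictionaries."""
    return [
        {
            "text": ent.text,
            "label": ent.label_,
            "start": ent.start_char,  # character offset into doc.text
            "end": ent.end_char,      # exclusive end offset
        }
        for ent in doc.ents
    ]
```

With a loaded pipeline `nlp`, `ents_to_records(nlp(text))` returns one dictionary per entity, which can be serialized to JSON as-is.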

Entity Classes

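The entity classes below are numbered 1 to 16: nine location-related types, followed by temporal, person and organization, financial, and other types.

Location Entities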

1. `CITY`

Description: City, town, or municipality names
Examples: "Gent", "Antwerpen", "Brugge", "Nazareth-De Pinte"

2. `STREET`

Description: Street names without house numbers
Examples: "Korenmarkt", "Graslei", "Sint-Baafsplein"

3. `ROAD`

Description: Roads, highways, and major thoroughfares
Examples: "E40", "R4", "Gentsesteenweg"

4. `INTERSECTION`

Description: Street intersections and crossings
Examples: "kruispunt Korenmarkt met Graslei", "hoek van de Veldstraat"

5. `HOUSENUMBERS`

Description: House numbers, number ranges, and apartment specifications
Examples: "15", "23-27", "100 bus 2", "1 tot en met 5"
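
A `HOUSENUMBERS` span usually needs normalizing before it can be matched against an address register. The sketch below is one possible post-processing step, not something shipped with the model: the helper `parse_housenumbers` is hypothetical, and it only covers the formats shown in the examples above (a single number, a range written with "-" or "tot en met", and an optional "bus" box).

```python
import re

# Hypothetical pattern: number, optional range end, optional "bus" (box) part.
_HOUSENUMBERS = re.compile(
    r"\s*(\d+)\s*(?:(?:-|tot en met)\s*(\d+))?\s*(?:bus\s*(\w+))?\s*",
    re.IGNORECASE,
)

def parse_housenumbers(span_text):
    """Return first/last number and optional box, or None if unrecognized."""
    match = _HOUSENUMBERS.fullmatch(span_text)
    if match is None:
        return None  # formats outside this sketch, e.g. "15A" or lists
    first, last, box = match.groups()
    return {
        "first": int(first),
        "last": int(last) if last else int(first),
        "box": box,
    }
```

Returning `None` for unknown formats keeps unexpected spans visible for manual review instead of silently guessing.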

6. `POSTCODE`

Description: Belgian postal codes
Examples: "9000", "2000", "8000"
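
Because Belgian postal codes are four-digit numbers that do not start with 0, a predicted `POSTCODE` span can be sanity-checked cheaply. This is a format check only, sketched under the assumption that you want to filter obvious false positives; the function name is hypothetical and it does not verify that a code is actually assigned to a municipality.

```python
import re

_POSTCODE = re.compile(r"[1-9]\d{3}")

def is_plausible_belgian_postcode(text):
    """True if text is a four-digit number in the 1000-9999 range."""
    return _POSTCODE.fullmatch(text.strip()) is not None
```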

7. `PROVINCE`

Description: Belgian provinces and regions
Examples: "Oost-Vlaanderen", "West-Vlaanderen", "Antwerpen"

8. `DOMAIN`

Description: Specific areas, domains, or districts within municipalities
Examples: "industriezone", "woongebied", "natuurdomein"

9. `BUILDING`

Description: Specific buildings, structures, or facilities
Examples: "stadhuis", "gemeenschapscentrum", "bibliotheek", "sporthal"

Temporal Entities

10. `DATE`

Description: Dates, periods, and temporal references
Examples: "15 maart 2024", "volgende maand", "dit jaar"

11. `TIME`

Description: Specific times and time periods
Examples: "14:30", "namiddag", "tijdens kantooruren"

People and Organizations

12. `PERSON`

Description: Names of individuals (politicians, officials, citizens)
Examples: "burgemeester Mathias De Clercq", "schepen van mobiliteit"

13. `ORG`

Description: Organizations, institutions, and companies
Examples: "De Lijn", "Aquafin", "Farys", "politiezone"

Financial and Quantitative Entities

14. `MONEY`

Description: Monetary amounts and financial values
Examples: "€50.000", "125.000 euro", "budget van 2 miljoen"
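
`MONEY` spans follow Dutch number formatting, where "." separates thousands and "," marks decimals. The sketch below shows one way such spans could be turned into numbers; `parse_euro_amount` is a hypothetical helper, it assumes all amounts are in euro, and the only multiplier word it handles is "miljoen".

```python
import re

# Either a thousands-grouped number ("125.000", "1.250,75") or a plain one ("2", "3,5").
_NUMBER = re.compile(r"\d{1,3}(?:\.\d{3})+(?:,\d+)?|\d+(?:,\d+)?")

def parse_euro_amount(text):
    """Parse a Dutch-formatted amount into a float, or return None."""
    match = _NUMBER.search(text)
    if match is None:
        return None
    # Drop thousands separators, then switch the decimal comma to a point.
    value = float(match.group(0).replace(".", "").replace(",", "."))
    if re.search(r"\bmiljoen\b", text, re.IGNORECASE):
        value *= 1_000_000
    return value
```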

15. `PERCENT`

Description: Percentages and proportional values
Examples: "15%", "50 procent", "driekwart"

Other Entities

16. `PRODUCT`

Description: Products, services, or specific items
Examples: "fietsenstalling", "parkeerautomaat", "LED-verlichting"
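
When only some of these classes matter for a task, the labels can be grouped the same way this section groups them. The mapping below simply restates the section headings; the names `LABEL_GROUPS`, `group_of`, and `filter_by_group` are hypothetical, and the input is assumed to be `(text, label)` pairs such as `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
# Label groups as listed in this README.
LABEL_GROUPS = {
    "location": {
        "CITY", "STREET", "ROAD", "INTERSECTION", "HOUSENUMBERS",
        "POSTCODE", "PROVINCE", "DOMAIN", "BUILDING",
    },
    "temporal": {"DATE", "TIME"},
    "people_and_organizations": {"PERSON", "ORG"},
    "financial_and_quantitative": {"MONEY", "PERCENT"},
    "other": {"PRODUCT"},
}

def group_of(label):
    """Return the group name for a label, or None for an unknown label."""
    for group, labels in LABEL_GROUPS.items():
        if label in labels:
            return group
    return None

def filter_by_group(entities, group):
    """Keep the (text, label) pairs whose label belongs to the given group."""
    wanted = LABEL_GROUPS[group]
    return [(text, label) for text, label in entities if label in wanted]
```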

Applications

This NER model is particularly useful for:

Document Processing: Automated extraction of locations from municipal documents
Address Parsing: Breaking down complex address information into components
Geographic Information Systems: Populating GIS databases from text sources
Administrative Analysis: Understanding spatial references in policy documents
Data Mining: Large-scale extraction of location data from municipal archives
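
For the address-parsing use case, the individual entities still have to be combined into address records. The following is a rough sketch of one simple heuristic, not the author's method: it walks `(text, label)` pairs in document order and starts a new address whenever a field that is already filled appears again. The names `ADDRESS_FIELDS` and `assemble_addresses` are hypothetical.

```python
# Labels that contribute to an address, mapped to output field names.
ADDRESS_FIELDS = {
    "STREET": "street",
    "HOUSENUMBERS": "housenumbers",
    "POSTCODE": "postcode",
    "CITY": "city",
}

def assemble_addresses(entities):
    """Group (text, label) pairs, in document order, into address dicts."""
    addresses, current = [], {}
    for text, label in entities:
        field = ADDRESS_FIELDS.get(label)
        if field is None:
            continue  # ignore labels that are not address components
        if field in current:
            # The field repeats, so treat this as the start of a new address.
            addresses.append(current)
            current = {}
        current[field] = text
    if current:
        addresses.append(current)
    return addresses
```

This heuristic ignores how far apart the entities are in the text, so real documents will likely need an additional distance or sentence-boundary check.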

Technical Specifications

Input: Raw Dutch text (particularly Flemish administrative language)
Output: Entity spans with their classification labels, available through doc.ents
Processing: Real-time entity extraction with spaCy pipeline
Integration: Compatible with spaCy ecosystem and downstream NLP tasks

Limitations

Language: Optimized specifically for Flemish Dutch; may not perform well on other Dutch variants
Domain: Best performance on administrative/municipal text; may need fine-tuning for other domains
Geographic Scope: Trained primarily on Belgian location data; international locations may not be well-recognized

Model Files

The complete model package includes:

config.cfg: spaCy pipeline configuration
meta.json: Model metadata and information
vocab/: Vocabulary and vector data
ner/: Named Entity Recognition component
tokenizer/: Text tokenization component
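
Before loading, it can help to confirm that a downloaded model directory is complete. The check below is a small sketch that uses only the entry names from the list above; the actual package may contain additional files, and `missing_model_entries` is a hypothetical helper.

```python
from pathlib import Path

# Names taken from the file list above; the real package may hold more.
EXPECTED_ENTRIES = ["config.cfg", "meta.json", "vocab", "ner", "tokenizer"]

def missing_model_entries(model_dir):
    """Return the expected files or directories absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty result means every listed entry is present; it does not guarantee that the files themselves are valid.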

Model Tree

Model: svercoutere/RoBERTa-NER-BE-Loc
Base model: pdelobelle/robbert-v2-dutch-base
