README: RobBERT-based NER Model for Location Entity Extraction
Overview
This Named Entity Recognition (NER) model extracts location-related entities from Flemish Dutch administrative documents. It is based on the RobBERT language model and has been fine-tuned to identify the location entity types commonly found in municipal decisions and agenda items.
Model Details
- Base Model: RobBERT (a Dutch adaptation of RoBERTa)
- Framework: spaCy v3.x
- Language: Dutch (Flemish variant)
- Task: Named Entity Recognition (Location Extraction)
- Domain: Municipal administrative documents
Training Data
The model was trained on data sourced from Lokaal Beslist Vlaanderen, with the following characteristics:
- Agendapunten (Agenda items): Items discussed in municipal council meetings
- Besluiten (Decisions): Official municipal decisions and resolutions
- Language: Flemish Dutch administrative language
- Coverage: Various Flemish municipalities and administrative regions
The training corpus consists of real-world municipal documents, providing the model with authentic examples of how location entities appear in official Flemish administrative contexts.
Usage
import spacy
# Load the trained model
nlp = spacy.load("path/to/model-best")
# Process text
text = "De werken aan de Korenmarkt 15-17, 9000 Gent worden uitgevoerd."
doc = nlp(text)
# Extract entities
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")
# Output:
# Korenmarkt -> STREET
# 15-17 -> HOUSENUMBERS
# 9000 -> POSTCODE
# Gent -> CITY
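The loop above prints entities one at a time. For address handling it is often more convenient to collect them by label. The following is a minimal sketch of that idea, not part of the model package: `group_entities` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for `doc.ents` so the snippet runs without the trained model.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities(ents):
    """Group entity texts by label, keeping document order within each label.

    `ents` is any iterable of objects exposing `.text` and `.label_`,
    such as the `doc.ents` tuple produced by a spaCy pipeline.
    """
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in entities mirroring the example output above (assumption:
# the real model returns these spans for the example sentence).
example_ents = [
    SimpleNamespace(text="Korenmarkt", label_="STREET"),
    SimpleNamespace(text="15-17", label_="HOUSENUMBERS"),
    SimpleNamespace(text="9000", label_="POSTCODE"),
    SimpleNamespace(text="Gent", label_="CITY"),
]

components = group_entities(example_ents)
print(components)
```

With a loaded pipeline, the same helper could be called as `group_entities(doc.ents)`.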
Entity Classes
The model recognizes the following location-related entity types:
1. CITY
- Description: City, town, or municipality names
- Examples: "Gent", "Antwerpen", "Brugge", "Nazareth-De Pinte"
2. STREET
- Description: Street names without house numbers
- Examples: "Korenmarkt", "Graslei", "Sint-Baafsplein"
3. ROAD
- Description: Roads, highways, and major thoroughfares
- Examples: "E40", "R4", "Gentsesteenweg"
4. INTERSECTION
- Description: Street intersections and crossings
- Examples: "kruispunt Korenmarkt met Graslei", "hoek van de Veldstraat"
5. HOUSENUMBERS
- Description: House numbers, number ranges, and apartment specifications
- Examples: "15", "23-27", "100 bus 2", "1 tot en met 5"
6. POSTCODE
- Description: Belgian postal codes
- Examples: "9000", "2000", "8000"
7. PROVINCE
- Description: Belgian provinces and regions
- Examples: "Oost-Vlaanderen", "West-Vlaanderen", "Antwerpen"
8. DOMAIN
- Description: Specific areas, domains, or districts within municipalities
- Examples: "industriezone", "woongebied", "natuurdomein"
9. BUILDING
- Description: Specific buildings, structures, or facilities
- Examples: "stadhuis", "gemeenschapscentrum", "bibliotheek", "sporthal"
Temporal Entities
10. DATE
- Description: Dates, periods, and temporal references
- Examples: "15 maart 2024", "volgende maand", "dit jaar"
11. TIME
- Description: Specific times and time periods
- Examples: "14:30", "namiddag", "tijdens kantooruren"
People and Organizations
12. PERSON
- Description: Names of individuals (politicians, officials, citizens)
- Examples: "burgemeester Mathias De Clercq", "schepen van mobiliteit"
13. ORG
- Description: Organizations, institutions, and companies
- Examples: "De Lijn", "Aquafin", "Farys", "politiezone"
Financial and Quantitative Entities
14. MONEY
- Description: Monetary amounts and financial values
- Examples: "€50.000", "125.000 euro", "budget van 2 miljoen"
15. PERCENT
- Description: Percentages and proportional values
- Examples: "15%", "50 procent", "driekwart"
Other Entities
16. PRODUCT
- Description: Products, services, or specific items
- Examples: "fietsenstalling", "parkeerautomaat", "LED-verlichting"
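When only location information is needed downstream, the entity list can be narrowed to the location labels. The sketch below assumes the label strings are exactly those listed above (CITY through BUILDING); `LOCATION_LABELS` and `location_entities` are hypothetical names introduced for illustration, and which labels a given model checkpoint actually emits should be verified against its `meta.json`.

```python
from types import SimpleNamespace

# Location labels as named in the list above (assumption: the trained
# pipeline uses these exact strings).
LOCATION_LABELS = frozenset({
    "CITY", "STREET", "ROAD", "INTERSECTION", "HOUSENUMBERS",
    "POSTCODE", "PROVINCE", "DOMAIN", "BUILDING",
})


def location_entities(ents):
    """Return (text, label) pairs for entities whose label is a location label."""
    return [(ent.text, ent.label_) for ent in ents if ent.label_ in LOCATION_LABELS]


# Stand-in for doc.ents so the sketch runs without the trained model.
mixed_ents = [
    SimpleNamespace(text="Gent", label_="CITY"),
    SimpleNamespace(text="15 maart 2024", label_="DATE"),
    SimpleNamespace(text="stadhuis", label_="BUILDING"),
    SimpleNamespace(text="De Lijn", label_="ORG"),
]

only_locations = location_entities(mixed_ents)
print(only_locations)
```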
Applications
This NER model is particularly useful for:
- Document Processing: Automated extraction of locations from municipal documents
- Address Parsing: Breaking down complex address information into components
- Geographic Information Systems: Populating GIS databases from text sources
- Administrative Analysis: Understanding spatial references in policy documents
- Data Mining: Large-scale extraction of location data from municipal archives
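For address parsing and GIS use cases, it can help to sanity-check extracted values before storing them. As one example, Belgian postal codes are four-digit numbers that do not start with zero, so a POSTCODE entity can be checked with a simple pattern. This is a hedged sketch: `looks_like_belgian_postcode` is a hypothetical helper, and it checks only the format, not whether the code is actually assigned to a municipality.

```python
import re

# Four digits, first digit non-zero. This is a format check only.
_POSTCODE_RE = re.compile(r"[1-9]\d{3}")


def looks_like_belgian_postcode(text):
    """Return True if `text` has the shape of a Belgian postal code."""
    return _POSTCODE_RE.fullmatch(text.strip()) is not None


for candidate in ["9000", " 2000 ", "900", "0900", "90000"]:
    print(repr(candidate), looks_like_belgian_postcode(candidate))
```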
Technical Specifications
- Input: Raw Dutch text (particularly Flemish administrative language)
- Output: Entity spans with their classification labels
- Processing: Real-time entity extraction with spaCy pipeline
- Integration: Compatible with spaCy ecosystem and downstream NLP tasks
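Because the model is a regular spaCy pipeline, many documents can be processed with spaCy's `nlp.pipe`, which streams texts in batches. The sketch below shows one way to collect entities with character offsets; `extract_batch` is a hypothetical helper and the batch size of 32 is an arbitrary assumption, not a tuned value.

```python
def extract_batch(nlp, texts, batch_size=32):
    """Run a spaCy pipeline over `texts` and collect entities per document.

    Returns one list per input text, each holding
    (text, label, start_char, end_char) tuples.
    """
    results = []
    for doc in nlp.pipe(texts, batch_size=batch_size):
        results.append(
            [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]
        )
    return results
```

With the trained pipeline this could be called as `extract_batch(spacy.load("path/to/model-best"), texts)`, where `texts` is a list of strings.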
Limitations
- Language: Optimized specifically for Flemish Dutch; may not perform well on other Dutch variants
- Domain: Best performance on administrative/municipal text; may need fine-tuning for other domains
- Geographic Scope: Trained primarily on Belgian location data; international locations may not be well-recognized
Model Files
The complete model package includes:
- config.cfg: spaCy pipeline configuration
- meta.json: Model metadata and information
- vocab/: Vocabulary and vector data
- ner/: Named Entity Recognition component
- tokenizer/: Text tokenization component
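Before calling `spacy.load`, a quick check that the downloaded directory contains the entries listed above can make a failed load easier to diagnose. This is a minimal sketch: `EXPECTED_ENTRIES` simply mirrors the names in this README, `missing_model_entries` is a hypothetical helper, and an actual package may contain additional files.

```python
from pathlib import Path

# Entry names taken from the list above (assumption: they sit directly
# under the model directory).
EXPECTED_ENTRIES = ("config.cfg", "meta.json", "vocab", "ner", "tokenizer")


def missing_model_entries(model_dir):
    """Return the expected entries that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list from `missing_model_entries("path/to/model-best")` means every listed entry was found.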
Model tree for svercoutere/RoBERTa-NER-BE-Loc
Base model
pdelobelle/robbert-v2-dutch-base