README: RobBERT-based NER Model for Location Entity Extraction

Overview

This Named Entity Recognition (NER) model extracts location-related entities from Flemish Dutch administrative documents. It is based on the RobBERT language model and has been fine-tuned to identify the location entity types that commonly appear in municipal decisions and agenda items.

Model Details

  • Base Model: RobBERT (a Dutch adaptation of RoBERTa)
  • Framework: spaCy v3.x
  • Language: Dutch (Flemish variant)
  • Task: Named Entity Recognition (Location Extraction)
  • Domain: Municipal administrative documents

Training Data

The model was trained on data sourced from Lokaal Beslist Vlaanderen. The corpus has the following characteristics:

  • Agendapunten (Agenda items): Items discussed in municipal council meetings
  • Besluiten (Decisions): Official municipal decisions and resolutions
  • Language: Flemish Dutch administrative language
  • Coverage: Various Flemish municipalities and administrative regions

The training corpus consists of real-world municipal documents, providing the model with authentic examples of how location entities appear in official Flemish administrative contexts.

Usage

import spacy

# Load the trained model
nlp = spacy.load("path/to/model-best")

# Process text
text = "De werken aan de Korenmarkt 15-17, 9000 Gent worden uitgevoerd."
doc = nlp(text)

# Extract entities
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")

# Expected output:
# Korenmarkt -> STREET
# 15-17 -> HOUSENUMBERS
# 9000 -> POSTCODE
# Gent -> CITY
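
The labelled spans can be collected into a single address record. The following is a minimal sketch, assuming a text mentions at most one address. `build_address` and `ADDRESS_LABELS` are hypothetical names, not part of the model package. The helper takes plain (text, label) pairs so that it can be fed with `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
# Hypothetical helper: not shipped with the model.
ADDRESS_LABELS = ("STREET", "HOUSENUMBERS", "POSTCODE", "CITY")


def build_address(pairs):
    """Collect the first span of each address label into a dict.

    `pairs` is an iterable of (text, label) tuples, for example
    [(ent.text, ent.label_) for ent in doc.ents].
    """
    address = {}
    for text, label in pairs:
        # Keep only address components and ignore repeated labels.
        if label in ADDRESS_LABELS and label not in address:
            address[label] = text
    return address


# Pairs taken from the example output above.
pairs = [
    ("Korenmarkt", "STREET"),
    ("15-17", "HOUSENUMBERS"),
    ("9000", "POSTCODE"),
    ("Gent", "CITY"),
]
address = build_address(pairs)
print(address)
```

Texts that mention several addresses would need the spans grouped by position first, which this sketch does not attempt.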

Entity Classes

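The model recognizes the following entity types. The first nine are location-related; the remaining groups cover temporal, person and organization, financial, and other entities.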

1. CITY

  • Description: City, town, or municipality names
  • Examples: "Gent", "Antwerpen", "Brugge", "Nazareth-De Pinte"

2. STREET

  • Description: Street names without house numbers
  • Examples: "Korenmarkt", "Graslei", "Sint-Baafsplein"

3. ROAD

  • Description: Roads, highways, and major thoroughfares
  • Examples: "E40", "R4", "Gentsesteenweg"

4. INTERSECTION

  • Description: Street intersections and crossings
  • Examples: "kruispunt Korenmarkt met Graslei", "hoek van de Veldstraat"

5. HOUSENUMBERS

  • Description: House numbers, number ranges, and apartment specifications
  • Examples: "15", "23-27", "100 bus 2", "1 tot en met 5"

6. POSTCODE

  • Description: Belgian postal codes
  • Examples: "9000", "2000", "8000"

7. PROVINCE

  • Description: Belgian provinces and regions
  • Examples: "Oost-Vlaanderen", "West-Vlaanderen", "Antwerpen"

8. DOMAIN

  • Description: Specific areas, domains, or districts within municipalities
  • Examples: "industriezone", "woongebied", "natuurdomein"

9. BUILDING

  • Description: Specific buildings, structures, or facilities
  • Examples: "stadhuis", "gemeenschapscentrum", "bibliotheek", "sporthal"
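
HOUSENUMBERS and POSTCODE spans are often normalized before they are stored. The sketch below shows one possible way to do that with regular expressions. It assumes that Belgian postal codes are four digits that do not start with 0, and that house-number spans follow the patterns in the examples above. `is_valid_postcode` and `parse_housenumbers` are hypothetical helpers, not part of the model.

```python
import re

# Assumption: a Belgian postal code is four digits and does not start with 0.
POSTCODE_RE = re.compile(r"[1-9]\d{3}")

# Assumption: house-number spans look like the documented examples:
# a start number, an optional end ("23-27", "1 tot en met 5"),
# and an optional box number ("100 bus 2").
HOUSENUMBER_RE = re.compile(
    r"(?P<start>\d+)"
    r"(?:\s*(?:-|tot en met)\s*(?P<end>\d+))?"
    r"(?:\s+bus\s+(?P<box>\w+))?"
)


def is_valid_postcode(text):
    """Return True if `text` has the shape of a Belgian postal code."""
    return POSTCODE_RE.fullmatch(text.strip()) is not None


def parse_housenumbers(text):
    """Parse a HOUSENUMBERS span into start, end, and box, or return None."""
    match = HOUSENUMBER_RE.fullmatch(text.strip())
    if match is None:
        return None
    start = int(match.group("start"))
    # A single number is treated as a range of length one.
    end = int(match.group("end")) if match.group("end") else start
    return {"start": start, "end": end, "box": match.group("box")}


for span in ["15", "23-27", "100 bus 2", "1 tot en met 5"]:
    print(span, "->", parse_housenumbers(span))
```

Spans that do not fit these patterns (for example numbers with a letter suffix) return `None` and would need additional rules.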

Temporal Entities

10. DATE

  • Description: Dates, periods, and temporal references
  • Examples: "15 maart 2024", "volgende maand", "dit jaar"

11. TIME

  • Description: Specific times and time periods
  • Examples: "14:30", "namiddag", "tijdens kantooruren"

People and Organizations

12. PERSON

  • Description: Names of individuals (politicians, officials, citizens)
  • Examples: "burgemeester Mathias De Clercq", "schepen van mobiliteit"

13. ORG

  • Description: Organizations, institutions, and companies
  • Examples: "De Lijn", "Aquafin", "Farys", "politiezone"

Financial and Quantitative Entities

14. MONEY

  • Description: Monetary amounts and financial values
  • Examples: "€50.000", "125.000 euro", "budget van 2 miljoen"

15. PERCENT

  • Description: Percentages and proportional values
  • Examples: "15%", "50 procent", "driekwart"

Other Entities

16. PRODUCT

  • Description: Products, services, or specific items
  • Examples: "fietsenstalling", "parkeerautomaat", "LED-verlichting"
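
The entity types listed above fall into five groups. When only some groups are of interest, the labels can be filtered after prediction. This is a minimal sketch: `LABEL_GROUPS`, `group_of`, and `filter_by_group` are hypothetical names, and the label strings are copied from the list above on the assumption that the model emits them exactly as written.

```python
# Label groups as listed in this README (assumed to match the model's labels).
LABEL_GROUPS = {
    "location": {
        "CITY", "STREET", "ROAD", "INTERSECTION", "HOUSENUMBERS",
        "POSTCODE", "PROVINCE", "DOMAIN", "BUILDING",
    },
    "temporal": {"DATE", "TIME"},
    "people_and_organizations": {"PERSON", "ORG"},
    "financial_and_quantitative": {"MONEY", "PERCENT"},
    "other": {"PRODUCT"},
}


def group_of(label):
    """Return the group name for a label, or None for an unknown label."""
    for group, labels in LABEL_GROUPS.items():
        if label in labels:
            return group
    return None


def filter_by_group(pairs, group="location"):
    """Keep the (text, label) pairs whose label belongs to `group`."""
    wanted = LABEL_GROUPS[group]
    return [(text, label) for text, label in pairs if label in wanted]


pairs = [("Gent", "CITY"), ("De Lijn", "ORG"), ("15 maart 2024", "DATE")]
print(filter_by_group(pairs))
```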

Applications

This NER model is particularly useful for:

  1. Document Processing: Automated extraction of locations from municipal documents
  2. Address Parsing: Breaking down complex address information into components
  3. Geographic Information Systems: Populating GIS databases from text sources
  4. Administrative Analysis: Understanding spatial references in policy documents
  5. Data Mining: Large-scale extraction of location data from municipal archives
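
For large-scale extraction, spaCy's `nlp.pipe` processes texts in batches. The sketch below counts how often each location mention occurs across a collection of documents. `count_mentions` is a hypothetical helper, and the default labels and batch size are illustrative assumptions rather than recommended settings.

```python
from collections import Counter


def count_mentions(nlp, texts, labels=("CITY", "STREET"), batch_size=32):
    """Count how often each (label, text) pair occurs across `texts`.

    `nlp` is a loaded spaCy pipeline, e.g. spacy.load("path/to/model-best").
    """
    counts = Counter()
    # nlp.pipe streams the texts through the pipeline in batches.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        for ent in doc.ents:
            if ent.label_ in labels:
                counts[(ent.label_, ent.text)] += 1
    return counts
```

With a loaded pipeline, `count_mentions(nlp, texts).most_common(10)` would list the ten most frequent city and street mentions.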

Technical Specifications

  • Input: Raw Dutch text (particularly Flemish administrative language)
  • Output: Entity spans (text and character offsets), each with an entity type label
  • Processing: Entity extraction through a standard spaCy pipeline
  • Integration: Compatible with spaCy ecosystem and downstream NLP tasks
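
For downstream use, the entities of a processed document can be exported as plain records. This sketch relies only on standard spaCy span attributes (`text`, `label_`, `start_char`, `end_char`). The function names and the record layout are assumptions made for illustration.

```python
import json


def entities_to_records(doc):
    """Turn the entities of a spaCy Doc into JSON-serializable dicts."""
    return [
        {
            "text": ent.text,
            "label": ent.label_,
            # Character offsets into the original input text.
            "start": ent.start_char,
            "end": ent.end_char,
        }
        for ent in doc.ents
    ]


def entities_to_json(doc):
    """Serialize the entity records as a JSON string."""
    # ensure_ascii=False keeps non-ASCII characters readable in the output.
    return json.dumps(entities_to_records(doc), ensure_ascii=False)
```

Calling `entities_to_json(nlp(text))` on a loaded pipeline gives one JSON array per document, which can be written to a file or passed to another service.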

Limitations

  • Language: Optimized specifically for Flemish Dutch; may not perform well on other Dutch variants
  • Domain: Best performance on administrative/municipal text; may need fine-tuning for other domains
  • Geographic Scope: Trained primarily on Belgian location data; international locations may not be well-recognized

Model Files

The complete model package includes:

  • config.cfg: spaCy pipeline configuration
  • meta.json: Model metadata and information
  • vocab/: Vocabulary and vector data
  • ner/: Named Entity Recognition component
  • tokenizer/: Text tokenization component
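
Before loading, it can help to check that a downloaded model folder is complete. The sketch below tests for the entries listed in this section. `missing_model_entries` is a hypothetical helper, and the list is taken from this section as-is, so a real package may contain further entries.

```python
from pathlib import Path

# Entries listed in the "Model Files" section of this README.
EXPECTED_ENTRIES = ("config.cfg", "meta.json", "vocab", "ner", "tokenizer")


def missing_model_entries(model_dir):
    """Return the expected files or folders that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list from `missing_model_entries("path/to/model-best")` means all listed entries are present.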
