bert-finetuned-ner

This model is a Persian Named Entity Recognition (NER) model fine-tuned from HooshvareLab/bert-base-parsbert-uncased. It has been trained to identify entities such as persons, organizations, locations, and products in Persian text.

Model Description

bert-finetuned-ner performs token-level classification for Persian. The model uses ParsBERT, a BERT variant pretrained on a large Persian corpus, as the base model and is fine-tuned on the Amir13/wnut2017-persian dataset. It predicts an entity label for each token of the input text, supporting tasks such as text analysis, information extraction, and question-answering pipelines.
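
As a rough sketch, fine-tuning starts from the pretrained ParsBERT checkpoint with a freshly initialized token-classification head. The label count below assumes the 13-tag BIO scheme of WNUT-2017 (O plus B-/I- tags for six entity types) and is not stated elsewhere in this card:

from transformers import AutoTokenizer, AutoModelForTokenClassification

base_checkpoint = "HooshvareLab/bert-base-parsbert-uncased"

tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
# num_labels=13 is an assumption; adjust it to the dataset's actual tag set.
model = AutoModelForTokenClassification.from_pretrained(base_checkpoint, num_labels=13)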

Intended Uses & Limitations

Intended Uses

  • Named Entity Recognition (NER) in Persian text.
  • Information extraction for NLP pipelines in Persian language applications.
  • Academic research or industrial projects requiring entity tagging.

Limitations

  • Performance depends heavily on the coverage and quality of the training data. Entities not represented in the dataset may not be recognized.
  • The model may misclassify rare or out-of-vocabulary words.
  • For critical applications, manual verification of predictions is recommended.
  • Trained on formal text; performance on dialects or colloquial Persian may vary.

Training and Evaluation Data

  • Dataset: Amir13/wnut2017-persian
  • Entities annotated include: persons, organizations, locations, creative works, and products.
  • Tokenization is handled by the ParsBERT tokenizer (HooshvareLab/bert-base-parsbert-uncased); word-level tags are aligned to subword tokens, as sketched below.
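
A minimal sketch of loading the dataset and aligning word-level tags with ParsBERT subword tokens, assuming the dataset exposes the usual tokens and ner_tags columns:

from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("Amir13/wnut2017-persian")
tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/bert-base-parsbert-uncased")

def tokenize_and_align(examples):
    # Tokenize pre-split words so each subword can be mapped back to its word.
    tokenized = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(examples["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        labels, previous = [], None
        for word_id in word_ids:
            if word_id is None:
                labels.append(-100)           # special tokens: ignored by the loss
            elif word_id != previous:
                labels.append(tags[word_id])  # label only a word's first subword
            else:
                labels.append(-100)
            previous = word_id
        all_labels.append(labels)
    tokenized["labels"] = all_labels
    return tokenized

tokenized_dataset = dataset.map(tokenize_and_align, batched=True)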

Training Procedure

Training Hyperparameters

  • Learning rate: 2e-5
  • Train batch size: 8
  • Evaluation batch size: 8
  • Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-8)
  • Learning rate scheduler: linear
  • Number of epochs: 10
  • Seed: 42
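
Expressed as Hugging Face TrainingArguments, these settings correspond roughly to the sketch below; the output directory and per-epoch evaluation are assumptions, not part of this card. AdamW with the listed betas and epsilon is the Trainer default, so it needs no explicit optimizer setup.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-finetuned-ner",   # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=10,
    lr_scheduler_type="linear",
    adam_beta1=0.9,                    # AdamW betas/epsilon from the list above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    eval_strategy="epoch",             # assumption: evaluate once per epoch
)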

Training Environment

  • Transformers 4.56.1
  • PyTorch 2.8.0+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.0

Training Results

  • The model reached an average training loss of ~0.13 over the 10 training epochs.
  • Accuracy, precision, recall, and F1 should be computed on a held-out test set for a full evaluation; see the sketch below.
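
One way to run such an evaluation is with the seqeval library, which scores BIO-tagged sequences at the entity level. The label strings below are illustrative placeholders, not output from this model:

from seqeval.metrics import precision_score, recall_score, f1_score

# Gold and predicted BIO tag sequences for a held-out test set
# (placeholder values; in practice, y_pred comes from model predictions).
y_true = [["B-person", "O", "B-location", "I-location", "O"]]
y_pred = [["B-person", "O", "B-location", "O", "O"]]

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))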

How to Use

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned model and tokenizer from the Hugging Face Hub,
# or point model_path at a local checkpoint directory instead.
model_path = "MoAmini77/bert-finetuned-ner"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForTokenClassification.from_pretrained(model_path)

# "simple" aggregation merges subword pieces into whole-entity spans.
ner_pipeline = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

text = "سلام. در تهران زندگی میکنم."  # "Hello. I live in Tehran."
results = ner_pipeline(text)
print(results)
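
With aggregation_strategy="simple", subword pieces predicted with the same entity label are merged into whole spans, and each item in results is a dictionary with entity_group, score, word, start, and end keys. For the example above, تهران (Tehran) would be expected to surface as a location-type entity.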

Model Details

  • Model size: ~0.2B parameters
  • Weights format: Safetensors (F32)