# bert-finetuned-ner

This model is a Persian BERT model fine-tuned from HooshvareLab/bert-base-parsbert-uncased for Named Entity Recognition (NER). It has been trained to identify entities such as persons, organizations, locations, and products in Persian text.
## Model Description

bert-finetuned-ner performs token-level classification for Persian text. It uses ParsBERT, a BERT variant pretrained on a large Persian corpus, as its base model and is fine-tuned on the Amir13/wnut2017-persian dataset. The model predicts an entity label for each input token, supporting tasks such as text analysis, information extraction, and question-answering pipelines.
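The "How to Use" section below shows the high-level pipeline API; the sketch here spells out the per-token mechanics behind "predicts an entity label for each input token". The Hub id is assumed to be the repository this card belongs to, and the label names come from the checkpoint's own id2label mapping:

```python
# Minimal per-token prediction sketch: logits -> argmax -> id2label.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_id = "MoAmini77/bert-finetuned-ner"  # or a local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# Example sentence: "I live in Tehran."
inputs = tokenizer("در تهران زندگی میکنم.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0]          # best label id per token
labels = [model.config.id2label[i] for i in pred_ids.tolist()]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
print(list(zip(tokens, labels)))
```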
## Intended Uses & Limitations

### Intended Uses
- Named Entity Recognition (NER) in Persian text.
- Information extraction for NLP pipelines in Persian language applications.
- Academic research or industrial projects requiring entity tagging.
### Limitations
- Performance depends heavily on the coverage and quality of the training data. Entities not represented in the dataset may not be recognized.
- The model may misclassify rare or out-of-vocabulary words.
- For critical applications, manual verification of predictions is recommended; a confidence-filtering sketch follows this list.
- Trained on formal text; performance on dialects or colloquial Persian may vary.
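A minimal sketch of such a verification gate, assuming the dict shape returned by the Hugging Face "ner" pipeline (the helper name and the 0.80 threshold are illustrative, not part of the model):

```python
# Hypothetical helper: route low-confidence entity predictions to manual review.
def split_by_confidence(entities, threshold=0.80):  # threshold is an arbitrary example
    confident = [e for e in entities if e["score"] >= threshold]
    needs_review = [e for e in entities if e["score"] < threshold]
    return confident, needs_review

# Pipeline outputs (see "How to Use" below) carry a "score" per entity;
# this entity dict is a made-up example of that shape.
entities = [{"entity_group": "location", "word": "تهران", "score": 0.55}]
confident, needs_review = split_by_confidence(entities)
print(needs_review)  # the low-confidence entity, flagged for human review
```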
## Training and Evaluation Data

- Dataset: Amir13/wnut2017-persian
- Entities annotated include: persons, organizations, locations, creative works, and products.
- Tokenization is handled by the ParsBERT tokenizer (HooshvareLab/bert-base-parsbert-uncased).
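Because the tokenizer splits words into wordpieces, per-word entity labels have to be re-aligned to the token sequence during preprocessing. The sketch below shows the standard alignment recipe; it is an assumption about the preprocessing, since the card does not ship the actual training script:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HooshvareLab/bert-base-parsbert-uncased")

def tokenize_and_align_labels(words, word_label_ids):
    """Tokenize pre-split words and repeat each word's label id on its pieces."""
    enc = tokenizer(words, is_split_into_words=True, truncation=True)
    aligned = []
    for word_id in enc.word_ids():
        if word_id is None:          # special tokens such as [CLS] / [SEP]
            aligned.append(-100)     # -100 is ignored by the cross-entropy loss
        else:
            aligned.append(word_label_ids[word_id])
    enc["labels"] = aligned
    return enc
```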
## Training Procedure

### Training Hyperparameters
- Learning rate: 2e-5
- Train batch size: 8
- Evaluation batch size: 8
- Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-8)
- Learning rate scheduler: linear
- Number of epochs: 10
- Seed: 42
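Expressed as transformers TrainingArguments, these settings would look roughly as follows (a reconstruction from the list above; the output directory is an assumption):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-finetuned-ner",     # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=10,
    lr_scheduler_type="linear",
    adam_beta1=0.9,                      # AdamW betas
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```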
### Training Environment
- Framework: Transformers 4.56.1
- PyTorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.0
### Training Results

- The model achieved a training loss of ~0.13, averaged over all epochs.
- Accuracy, precision, recall, and F1 should be computed on a held-out test set for a full evaluation; a sketch using seqeval follows.
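A minimal sketch of such an evaluation with the seqeval library (the tag sequences below are placeholders, not results for this model):

```python
from seqeval.metrics import accuracy_score, classification_report, f1_score

# Placeholder gold and predicted tag sequences in IOB format.
y_true = [["O", "B-location", "I-location", "O"]]
y_pred = [["O", "B-location", "O", "O"]]

print("accuracy:", accuracy_score(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))
print(classification_report(y_true, y_pred))
```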
## How to Use

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Path to the fine-tuned checkpoint: a local directory or the Hub repository id.
model_path = "path_to_saved_model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForTokenClassification.from_pretrained(model_path)

ner_pipeline = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",  # merge subword pieces into entity spans
)

text = "سلام. در تهران زندگی میکنم."  # "Hello. I live in Tehran."
results = ner_pipeline(text)
print(results)
```
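With aggregation_strategy="simple", the pipeline merges subword pieces back into whole entities, so each item in results is a dict with entity_group, score, word, start, and end keys.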