---
license: mit
datasets:
- ljvmiranda921/tlunified-ner
language:
- tl
metrics:
- f1
tags:
- gliner
pipeline_tag: token-classification
model-index:
- name: tl_gliner_small
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      type: tlunified-ner
      name: TLUnified-NER
      split: test
      revision: 3f7dab9d232414ec6204f8d6934b9a35f90a254f
    metrics:
    - type: f1
      value: 0.854
      name: F1
---
					
						
# GLiNER (large) model finetuned on Tagalog data

This model was finetuned using the [GLiNER v2.5 suite](https://github.com/urchade/GLiNER) of models.
You can find and replicate the training pipeline on [GitHub](https://github.com/ljvmiranda921/calamanCy/tree/master/models/v0.1.0-gliner).
					
						
## Usage
					
						
```python
from gliner import GLiNER

# Load the finetuned Tagalog GLiNER model
model = GLiNER.from_pretrained("ljvmiranda921/tl_gliner_large")

# Sample text for entity prediction
# Reference: Leni Robredo’s speech at the 2022 UP College of Law recognition rites
text = """
Nagsimula ako sa Public Attorney’s Office, kung saan araw-araw, mula Lunes hanggang Biyernes, nasa loob ako ng iba’t ibang court room at tambak ang kaso.
Bawat Sabado, nasa BJMP ako para ihanda ang aking mga kliyente. Nahasa ako sa crim law at litigation. Pero kinalaunan, lumipat ako sa isang NGO,
‘yung Sentro ng Alternatibong Lingap Panligal. Sa SALIGAN talaga ako nahubog bilang abugado: imbes na tinatanggap na lang ang mga batas na kailangang
sundin, nagtatanong din kung ito ba ay tunay na instrumento para makapagbigay ng katarungan sa ordinaryong Pilipino. Imbes na maghintay ng mga kliyente
sa de-aircon na opisina, dinadayo namin ang mga malalayong komunidad. Kadalasan, naka-tsinelas, naka-t-shirt at maong, hinahanap namin ang mga komunidad,
tinatawid ang mga bundok, palayan, at mga ilog para tumungo sa mga lugar kung saan hirap ang mga batayang sektor na makakuha ng access to justice.
Naaalala ko pa noong naging lead lawyer ako para sa isang proyekto: sa loob ng mahigit dalawang taon, bumibiyahe ako buwan-buwan papunta sa malayong
isla ng Masbate, nagpa-paralegal training sa mga batayang sektor doon, ipinapaliwanag, itinituturo, at sinasanay sila sa mga batas na nagbibigay-proteksyon
sa mga karapatan nila.
"""

# Labels for entity prediction
# Most GLiNER models should work best when entity types are in lower case or title case
labels = ["person", "organization", "location"]

# Perform entity prediction; spans scoring below the threshold are discarded
entities = model.predict_entities(text, labels, threshold=0.5)

# Display predicted entities and their labels
for entity in entities:
    print(entity["text"], "=>", entity["label"])

# Sample output:
# Public Attorney’s Office => organization
# BJMP => organization
# Sentro ng Alternatibong Lingap Panligal => organization
# Masbate => location
```
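If you need the predictions in a more structured form, the sketch below groups them by label and applies a stricter score cut-off after the fact. It is only an illustration: `group_entities` and `min_score` are hypothetical names, not part of GLiNER, and the code assumes each prediction is a dict with `text`, `label`, and `score` keys. The sample list is a hand-written stand-in for the output of `model.predict_entities`, with made-up scores, so the snippet runs without downloading the model.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.0):
    """Group entity texts by label, dropping duplicates and low-scoring spans."""
    grouped = defaultdict(list)
    for entity in entities:
        # Treat a missing "score" as fully confident (an assumption of this sketch)
        if entity.get("score", 1.0) < min_score:
            continue
        texts = grouped[entity["label"]]
        if entity["text"] not in texts:
            texts.append(entity["text"])
    return dict(grouped)


# Hand-written stand-in for model.predict_entities(...) output;
# the scores below are invented for illustration only.
sample_entities = [
    {"text": "BJMP", "label": "organization", "score": 0.91},
    {"text": "Masbate", "label": "location", "score": 0.88},
    {"text": "BJMP", "label": "organization", "score": 0.79},
    {"text": "SALIGAN", "label": "organization", "score": 0.42},
]

print(group_entities(sample_entities, min_score=0.5))
```

In practice you would pass the `entities` list from the usage example instead of `sample_entities`. Filtering after prediction lets you try several cut-offs without rerunning the model, as long as the `threshold` passed to `predict_entities` is at or below the lowest cut-off you want to test.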
					
						
## Citation

Please cite the following papers when using these models:
					
						
```
@misc{zaratiana2023gliner,
    title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
    author={Urchade Zaratiana and Nadi Tomeh and Pierre Holat and Thierry Charnois},
    year={2023},
    eprint={2311.08526},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
					
						
```
@inproceedings{miranda-2023-calamancy,
    title = "calaman{C}y: A {T}agalog Natural Language Processing Toolkit",
    author = "Miranda, Lester James",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "https://aclanthology.org/2023.nlposs-1.1",
    pages = "1--7",
}
```
					
						
If you're using the NER dataset:

```
@inproceedings{miranda-2023-developing,
    title = "Developing a Named Entity Recognition Dataset for {T}agalog",
    author = "Miranda, Lester James",
    booktitle = "Proceedings of the First Workshop in South East Asian Language Processing",
    month = nov,
    year = "2023",
    address = "Nusa Dua, Bali, Indonesia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.sealp-1.2",
    doi = "10.18653/v1/2023.sealp-1.2",
    pages = "13--20",
}
```