---
license: mit
language:
- en
tags:
- finance
- ContextNER
- language models
datasets:
- him1411/EDGAR10-Q
metrics:
- rouge
---
|
|
| EDGAR-BART-Base |
| ============= |
|
|
BART base model fine-tuned on the [EDGAR10-Q dataset](https://huggingface.co/datasets/him1411/EDGAR10-Q).
|
|
You may also want to check out:
* Our paper: [CONTEXT-NER: Contextual Phrase Generation at Scale](https://arxiv.org/abs/2109.08079/)
* GitHub: [him1411/edgar10q-dataset](https://github.com/him1411/edgar10q-dataset)
|
|
|
|
|
|
| Direct Use |
| ============= |
|
|
This model can be used to generate text, which is useful for experimentation and for understanding its capabilities. **It should not be used directly in production or in work that may directly impact people.**
|
|
| How to Use |
| ============= |
|
|
You can load the model directly with Transformers instead of downloading it manually. The [bart-base model](https://huggingface.co/facebook/bart-base) is the backbone of our model. Here is how to load it in PyTorch:
|
|
| ```python |
| from transformers import AutoTokenizer, AutoModelForSeq2SeqLM |
| tokenizer = AutoTokenizer.from_pretrained("him1411/EDGAR-BART-Base") |
| model = AutoModelForSeq2SeqLM.from_pretrained("him1411/EDGAR-BART-Base") |
| ``` |
Or just clone the model repo:
| ``` |
| git lfs install |
| git clone https://huggingface.co/him1411/EDGAR-BART-Base |
| ``` |
|
|
| Inference Example |
| ============= |
|
|
Here, we provide an example for the ContextNER task, using a single instance from the EDGAR10-Q dataset.
|
|
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("him1411/EDGAR-BART-Base")
model = AutoModelForSeq2SeqLM.from_pretrained("him1411/EDGAR-BART-Base")
# Example instance from the EDGAR10-Q dataset
input_text = "14.5 years . The definite lived intangible assets related to the contracts and trade names had estimated weighted average useful lives of 5.9 years and 14.5 years, respectively, at acquisition."
inputs = tokenizer(input_text, return_tensors="pt")
# Ideal output for this input is 'Definite lived intangible assets weighted average remaining useful life'
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
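The metadata above lists ROUGE as the evaluation metric for this model. As a rough, self-contained illustration of how a generated phrase is compared against the gold phrase, here is a simplified unigram-overlap ROUGE-1 F1 sketch (for reporting real scores, use an official ROUGE implementation; this helper and its token handling are assumptions for illustration only):

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 based on whitespace unigram overlap (illustrative only)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Count of unigrams shared between prediction and reference (with multiplicity)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

reference = "Definite lived intangible assets weighted average remaining useful life"
prediction = "Definite lived intangible assets weighted average useful life"
print(round(rouge1_f1(prediction, reference), 3))  # 0.941: prediction drops one reference token
```

A near-miss prediction like the one above still scores highly, which is why phrase-generation tasks such as ContextNER are typically evaluated with overlap metrics like ROUGE rather than exact match.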
|
|
|
|
| BibTeX Entry and Citation Info |
| =============== |
If you use our model, please cite our paper:
|
|
| ```bibtex |
| @article{gupta2021context, |
| title={Context-NER: Contextual Phrase Generation at Scale}, |
| author={Gupta, Himanshu and Verma, Shreyas and Kumar, Tarun and Mishra, Swaroop and Agrawal, Tamanna and Badugu, Amogh and Bhatt, Himanshu Sharad}, |
| journal={arXiv preprint arXiv:2109.08079}, |
| year={2021} |
| } |
| ``` |