| # sbic-method2 | |
| An updated version of the Standard-Based Impact Classification (SBIC) method for analyzing CSR reports in accordance with the GRI framework. | |
| --- | |
| # **Multilabel Classification Step** | |
| This code classifies reports from their embeddings in three stages: a **cosine similarity** search for similar reports, a **K-Nearest Neighbor (KNN)** aggregation over the top matches, and a **sigmoid activation function** whose output is thresholded to assign labels. | |
| ## **Prerequisites** | |
| Ensure you have the following installed before running the script: | |
| - Python 3.8+ | |
| - Required Python libraries (install using the command below) | |
| ```bash | |
| pip install numpy pandas torch sentence-transformers scikit-learn | |
| ``` | |
| ## **Input Files** | |
| Before running the script, make sure you have the following input files in the working directory: | |
| 1. **Report Data Files**: | |
| - `embeddings_labeled.csv` | |
| - `embeddings_prediction.csv` | |
| 2. **Precomputed Embeddings**: | |
| - Labeled dataset: `embeddings_labeled.pkl` | |
| - Prediction dataset: `embeddings_prediction.pkl` | |
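Before a long run it can help to confirm that all four inputs are present. The following is a minimal sketch, not part of `script.py`: the helper names `missing_inputs` and `load_pickle` are hypothetical, and the sketch assumes only that the `.pkl` files were written with Python's `pickle` module (their internal layout is not documented here).

```python
import pickle
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = [
    "embeddings_labeled.csv",
    "embeddings_prediction.csv",
    "embeddings_labeled.pkl",
    "embeddings_prediction.pkl",
]


def missing_inputs(directory="."):
    """Return the required file names that are absent from `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


def load_pickle(path):
    """Load one pickled object (hypothetical helper).

    Only unpickle files from a source you trust: unpickling can run code.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    missing = missing_inputs(".")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found.")
```

An empty list from `missing_inputs` means the working directory is ready for `python script.py`.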
| ## **Running the Script** | |
| Run the script using the following command: | |
| ```bash | |
| python script.py | |
| ``` | |
| ## **Processing Steps** | |
| The script follows these main steps: | |
| 1. **Load Data & Pretrained Embeddings** | |
| 2. **Perform Cosine Similarity Search**: Finds the most relevant reports (sentences) using `semantic_search` from `sentence-transformers`. | |
| 3. **Apply K-Nearest Neighbor (KNN) Algorithm**: Selects top similar reports (sentences) and aggregates predictions. | |
| 4. **Use Sigmoid Activation for Classification**: Passes the aggregated scores through a sigmoid and applies a threshold to generate the final classification outputs. | |
| 5. **Save Results**: Exports `df_results_0_50k.csv` containing the processed data. | |
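Steps 2 to 4 can be illustrated on toy data. This is a sketch under stated assumptions, not the code in `script.py`: it uses plain NumPy in place of `semantic_search`, and the similarity-weighted vote, the value of `k`, the 0.5 threshold, and the function names `top_k_cosine` and `knn_multilabel` are all illustrative choices rather than the script's actual settings.

```python
import numpy as np


def top_k_cosine(query_emb, corpus_emb, k):
    """Return (indices, scores) of the k most cosine-similar corpus rows per query."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    sims = q @ c.T                              # shape (n_query, n_corpus)
    idx = np.argsort(-sims, axis=1)[:, :k]      # most similar first
    return idx, np.take_along_axis(sims, idx, axis=1)


def knn_multilabel(query_emb, corpus_emb, corpus_labels, k=2, threshold=0.5):
    """Similarity-weighted KNN vote followed by sigmoid and threshold.

    The voting rule is an assumption made for this sketch: each neighbor adds
    its similarity to a label's score if it carries the label and subtracts
    it otherwise.
    """
    idx, scores = top_k_cosine(query_emb, corpus_emb, k)
    signed = 2 * corpus_labels[idx] - 1         # map {0, 1} to {-1, +1}
    logits = np.einsum("qk,qkl->ql", scores, signed)
    probs = 1.0 / (1.0 + np.exp(-logits))       # sigmoid activation
    return probs, (probs >= threshold).astype(int)


# Toy data: four labeled embeddings with two possible labels each.
labeled_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labeled_y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
to_predict = np.array([[1.0, 0.05], [0.05, 1.0]])

probs, preds = knn_multilabel(to_predict, labeled_emb, labeled_y, k=2)
print(preds)
```

With this voting rule a label is assigned when the neighbors carrying it outweigh, by similarity, the neighbors that do not, because the sigmoid crosses 0.5 exactly at a logit of zero.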
| ## **Output File** | |
| The processed results will be saved in: | |
| - `df_results_0_50k.csv` | |
| ## **Execution Time** | |
| Execution time depends on the number of test samples and system resources. The script prints the total processing time upon completion. | |
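One common way to report total processing time is a wall-clock timer around the main work. The sketch below assumes nothing about how `script.py` measures time; the wrapper name `timed` and the message format are hypothetical.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn, print the elapsed wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"Total processing time: {elapsed:.2f} s")
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload; a real run would pass the classification function.
    timed(sum, range(1_000_000))
```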
| --- | |
| license: gpl-3.0 | |
| --- | |