# sbic-method2

An updated version of the Standard-Based Impact Classification (SBIC) method for analyzing CSR reports in accordance with the GRI framework.



---

# **Multilabel Classification Step**  

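This step classifies reports based on their embeddings. It finds similar reports (sentences) with **cosine similarity**, aggregates the labels of the closest ones with the **K-Nearest Neighbor (KNN) algorithm**, and applies a **sigmoid activation function** to produce the final classifications.  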

## **Prerequisites**  

Ensure you have the following installed before running the script:  

- Python 3.8+  
- Required Python libraries (install using the command below)  

```bash
pip install numpy pandas torch sentence-transformers scikit-learn
```

## **Input Files**  

Before running the script, make sure you have the following input files in the working directory:  

1. **Report Data Files**:  
   - `embeddings_labeled.csv`  
   - `embeddings_prediction.csv`  

2. **Precomputed Embeddings**:  
   - Labeled dataset: `embeddings_labeled.pkl`  
   - Dataset for prediction: `embeddings_prediction.pkl`
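
Before running the script, it can help to check that each CSV file and its embeddings file line up. The sketch below is only an illustration: `load_inputs` is a hypothetical helper, and it assumes each `.pkl` file holds one embedding per CSV row, which this README does not state.

```python
import pickle

import pandas as pd


def load_inputs(csv_path, pkl_path):
    """Load a CSV file and its pickled embeddings, then check that they align.

    Assumption (not confirmed by the README): the pickle contains a sequence
    with exactly one embedding per row of the CSV file.
    """
    df = pd.read_csv(csv_path)
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with open(pkl_path, "rb") as f:
        embeddings = pickle.load(f)
    if len(embeddings) != len(df):
        raise ValueError(
            f"{csv_path} has {len(df)} rows but {pkl_path} "
            f"has {len(embeddings)} embeddings"
        )
    return df, embeddings
```

For example, `load_inputs("embeddings_labeled.csv", "embeddings_labeled.pkl")` would return the labeled data frame together with its embeddings, or raise a `ValueError` if the two files disagree in length.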
      
## **Running the Script**  

Run the script using the following command:  

```bash
python script.py
```

## **Processing Steps**  

The script follows these main steps:  

1. **Load Data & Pretrained Embeddings**  
2. **Perform Cosine Similarity Search**: Finds the most relevant reports (sentences) using `semantic_search` from `sentence-transformers`.  
3. **Apply K-Nearest Neighbor (KNN) Algorithm**: Selects top similar reports (sentences) and aggregates predictions.  
4. **Use Sigmoid Activation for Classification**: Passes the aggregated scores through a sigmoid function and applies a threshold to generate the final classification outputs.  
5. **Save Results**: Exports `df_results_0_50k.csv` containing the processed data.  
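
Steps 2 to 4 can be sketched in NumPy as follows. This is a minimal illustration, not the code in the script: the helper names `cosine_top_k` and `knn_sigmoid_predict`, the similarity-weighted voting rule, the `scale` parameter, and the 0.5 threshold are all assumptions, and a plain NumPy top-k search stands in for `semantic_search`.

```python
import numpy as np


def cosine_top_k(query_emb, corpus_emb, k):
    """Return (indices, scores) of the k most cosine-similar corpus rows per query.

    Assumes no embedding is the zero vector (normalization would divide by zero).
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    sims = q @ c.T                               # shape: (n_query, n_corpus)
    idx = np.argsort(-sims, axis=1)[:, :k]       # highest similarity first
    scores = np.take_along_axis(sims, idx, axis=1)
    return idx, scores


def knn_sigmoid_predict(query_emb, corpus_emb, corpus_labels,
                        k=3, threshold=0.5, scale=1.0):
    """Hypothetical aggregation: similarity-weighted vote, sigmoid, threshold."""
    idx, scores = cosine_top_k(query_emb, corpus_emb, k)
    neighbor_labels = corpus_labels[idx]         # shape: (n_query, k, n_labels)
    # Map 0/1 labels to -1/+1 so that a tied vote gives sigmoid(0) = 0.5.
    signed = 2 * neighbor_labels - 1
    votes = (scores[:, :, None] * signed).sum(axis=1)
    probs = 1.0 / (1.0 + np.exp(-scale * votes))
    preds = (probs > threshold).astype(int)
    return preds, probs


# Toy data: four labeled embeddings with three possible labels each.
corpus_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
corpus_labels = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]])
query_emb = np.array([[1.0, 0.05], [0.05, 1.0]])

preds, probs = knn_sigmoid_predict(query_emb, corpus_emb, corpus_labels, k=3)
print(preds)
```

Mapping the neighbors' labels to -1/+1 before summing is one way to make the sigmoid output meaningful: a label supported by most of the nearest neighbors gets a probability above 0.5, and a label they mostly lack falls below it. Other aggregation rules (for example, averaging the 0/1 labels) would need a different threshold.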

## **Output File**  

The processed results will be saved in:  

- `df_results_0_50k.csv`  

## **Execution Time**  

Execution time depends on the number of test samples and system resources. The script prints the total processing time upon completion.  
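
A total processing time like this can be measured with `time.perf_counter` from the standard library. The wrapper below is a sketch: `timed` is a hypothetical helper, and the printed message is illustrative rather than the script's exact output.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn, print the elapsed wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"Total processing time: {elapsed:.2f} s")
    return result, elapsed


# Example with a stand-in workload instead of the real classification step.
total, seconds = timed(sum, range(1_000_000))
```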

---
license: gpl-3.0
---