# sbic-method2
An updated version of the Standard-Based Impact Classification (SBIC) method for analyzing corporate social responsibility (CSR) reports in accordance with the Global Reporting Initiative (GRI) framework.
---
# **Multilabel Classification Step**
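This code classifies report sentences from their embeddings. It retrieves similar labeled sentences with **cosine similarity**, aggregates the labels of the nearest matches with the **K-Nearest Neighbor (KNN) algorithm**, and applies a **Sigmoid activation function** with a threshold to produce the final multilabel predictions.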
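To illustrate the retrieval part of this pipeline, here is a minimal NumPy sketch that computes cosine similarity between query embeddings and labeled embeddings and keeps the top-k matches per query. It returns the same `corpus_id`/`score` dictionaries that `sentence_transformers.util.semantic_search` produces, but it is not the repository's code: the helper name `cosine_top_k` and the toy vectors are hypothetical, and it assumes non-zero 2-D float embeddings.

```python
import numpy as np


def cosine_top_k(queries: np.ndarray, corpus: np.ndarray, top_k: int = 5):
    """For each query row, return the top_k corpus rows ranked by cosine similarity.

    Hypothetical helper: output mimics sentence_transformers.util.semantic_search
    (one list of {"corpus_id", "score"} dicts per query). Assumes no zero vectors.
    """
    # L2-normalise rows so that a plain dot product equals cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = q @ c.T  # shape: (n_queries, n_corpus)

    k = min(top_k, corpus.shape[0])
    results = []
    for row in scores:
        # Sorting the negated scores gives indices in descending similarity order.
        best = np.argsort(-row)[:k]
        results.append([{"corpus_id": int(i), "score": float(row[i])} for i in best])
    return results


labeled = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])  # toy "labeled" embeddings
queries = np.array([[0.9, 0.1]])                           # toy "prediction" embedding
print(cosine_top_k(queries, labeled, top_k=2))  # corpus_id 0 first, then corpus_id 2
```

Normalizing once and taking a matrix product scores all query/corpus pairs in one step; for very large corpora, `semantic_search` additionally processes the data in chunks to bound memory use.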
## **Prerequisites**
Ensure you have the following installed before running the script:
- Python 3.8+
- Required Python libraries (install using the command below)
```bash
pip install numpy pandas torch sentence-transformers scikit-learn
```
## **Input Files**
Before running the script, make sure you have the following input files in the working directory:
1. **Report Data Files**:
   - Labeled dataset: `embeddings_labeled.csv`
   - Dataset for prediction: `embeddings_prediction.csv`
2. **Precomputed Embeddings**:
   - Labeled dataset: `embeddings_labeled.pkl`
   - Dataset for prediction: `embeddings_prediction.pkl`
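The README does not describe the internal layout of these files. The following sketch assumes that each `.pkl` file holds an array-like embedding matrix with one row per row of the matching `.csv` file, and checks that assumption while loading. The helper names `check_alignment` and `load_split` are hypothetical. Only unpickle files from a trusted source, because `pickle.load` can execute arbitrary code.

```python
import pickle

import numpy as np
import pandas as pd


def check_alignment(df: pd.DataFrame, embeddings) -> np.ndarray:
    """Convert embeddings to a float32 matrix and check one row per CSV row (assumed layout)."""
    emb = np.asarray(embeddings, dtype=np.float32)
    if emb.ndim != 2:
        raise ValueError(f"expected a 2-D embedding matrix, got shape {emb.shape}")
    if len(df) != emb.shape[0]:
        raise ValueError(f"{len(df)} CSV rows but {emb.shape[0]} embedding rows")
    return emb


def load_split(csv_path: str, pkl_path: str):
    """Load one split (labeled or prediction) and verify that its two files line up."""
    df = pd.read_csv(csv_path)
    with open(pkl_path, "rb") as f:
        embeddings = pickle.load(f)  # assumed: array-like of shape (n_rows, dim)
    return df, check_alignment(df, embeddings)
```

Usage, with the file names listed above: `labeled_df, labeled_emb = load_split("embeddings_labeled.csv", "embeddings_labeled.pkl")`. A row-count check like this catches a mismatched CSV/pickle pair before any similarity search runs.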
## **Running the Script**
Run the script using the following command:
```bash
python script.py
```
## **Processing Steps**
The script follows these main steps:
1. **Load Data & Pretrained Embeddings**
2. **Perform Cosine Similarity Search**: Retrieves the labeled report sentences most similar to each sentence to be classified, using `semantic_search` from `sentence-transformers`.
3. **Apply K-Nearest Neighbor (KNN) Algorithm**: Selects the top-k most similar labeled sentences and aggregates their labels.
4. **Use Sigmoid Activation for Classification**: Converts the aggregated scores into values between 0 and 1 and applies a threshold to produce the final multilabel outputs.
5. **Save Results**: Exports `df_results_0_50k.csv` containing the processed data.
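Steps 3 and 4 can be sketched as follows for a single query sentence. The README does not specify how neighbor labels are aggregated, so this uses an assumed, hypothetical rule: each neighbor adds its similarity score to the labels it carries and subtracts it from the labels it lacks, and the sigmoid of that summed vote is thresholded. The function names and the toy label matrix are illustrative only.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic sigmoid: maps any real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def knn_multilabel_predict(neighbor_ids, neighbor_scores, label_matrix, threshold=0.5):
    """Predict a 0/1 label vector for one query from its k nearest labeled neighbors.

    Assumed aggregation (hypothetical): similarity-weighted vote of +1 for labels a
    neighbor has and -1 for labels it lacks, passed through a sigmoid and thresholded.
    """
    labels = np.asarray(label_matrix, dtype=float)[neighbor_ids]  # (k, n_labels)
    sims = np.asarray(neighbor_scores, dtype=float)[:, None]      # (k, 1)
    votes = (sims * (2.0 * labels - 1.0)).sum(axis=0)             # (n_labels,)
    probs = sigmoid(votes)
    return (probs > threshold).astype(int), probs


# Toy label matrix: 3 labeled sentences x 3 labels (hypothetical data).
label_matrix = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]
preds, probs = knn_multilabel_predict([0, 2], [0.9, 0.8], label_matrix)
print(preds)  # [1 0 1]
```

The neighbor IDs and scores can be taken from the hits that `semantic_search` returns, e.g. `[h["corpus_id"] for h in hits]` and `[h["score"] for h in hits]`. With `threshold=0.5`, a label is predicted exactly when its similarity-weighted vote is positive; raising the threshold makes the classifier more conservative.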
## **Output File**
The processed results will be saved in:
- `df_results_0_50k.csv`
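The README does not document the columns of this file. As a hedged sketch, assuming one row per classified sentence and one 0/1 column per label, the predictions could be assembled and saved with pandas like this. The `text` column, the label names, and both helper functions are hypothetical.

```python
import numpy as np
import pandas as pd


def build_results_frame(texts, predictions, label_names):
    """Combine sentences and their 0/1 label predictions into one table (assumed layout)."""
    preds = np.asarray(predictions, dtype=int)
    if preds.shape != (len(texts), len(label_names)):
        raise ValueError("predictions must have shape (n_texts, n_labels)")
    df = pd.DataFrame(preds, columns=label_names)
    df.insert(0, "text", list(texts))  # put the sentence text in the first column
    return df


def save_results(df: pd.DataFrame, path: str = "df_results_0_50k.csv") -> None:
    """Write the results table to CSV without the pandas index column."""
    df.to_csv(path, index=False)


results = build_results_frame(
    ["Sentence A", "Sentence B"],
    [[1, 0], [0, 1]],
    ["label_1", "label_2"],  # hypothetical label names
)
print(results)
```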
## **Execution Time**
Execution time depends on the number of test samples and system resources. The script prints the total processing time upon completion.
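To time a run, or a single stage, in a similar way, a wall-clock wrapper based on `time.perf_counter` is enough. This is a minimal sketch; the helper name `timed` and the printed message are placeholders, not the script's actual output.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs), print the elapsed wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"Total processing time: {elapsed:.2f} s")  # placeholder message format
    return result, elapsed


total, seconds = timed(sum, range(1_000_000))
```

Wrapping individual stages (loading, similarity search, classification) this way shows which one dominates as the number of sentences grows.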
---
## **License**
This project is released under the GPL-3.0 license.