
sbic-method2

An updated version of the Standard-Based Impact Classification (SBIC) method for analyzing CSR reports in accordance with the GRI framework.



Multilabel Classification Step

This code classifies reports based on their embeddings. It runs a cosine-similarity search over the reports, applies a K-Nearest Neighbor (KNN) algorithm to the matches, and passes the result through a sigmoid activation function.

Prerequisites

Ensure you have the following installed before running the script:

  • Python 3.8+
  • Required Python libraries (install using the command below)
pip install numpy pandas torch sentence-transformers scikit-learn
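
Two of the packages above are imported under a different name than the one passed to pip: sentence-transformers is imported as sentence_transformers, and scikit-learn as sklearn. As a quick check before running the script, the sketch below reports which modules cannot be found in the current environment. The helper name missing_packages is hypothetical and not part of this repository.

```python
import importlib.util

# Import names for the pip packages listed above.
REQUIRED_MODULES = ["numpy", "pandas", "torch", "sentence_transformers", "sklearn"]

def missing_packages(module_names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```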

Input Files

Before running the script, make sure you have the following input files in the working directory:

  1. Report Data Files:

    • embeddings_labeled.csv
    • embeddings_prediction.csv
  2. Precomputed Embeddings:

    • Labeled dataset: embeddings_labeled.pkl
    • Dataset for prediction: embeddings_prediction.pkl
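
A mismatch between a CSV file and its embedding file is easy to miss until the script fails partway through. The sketch below is one way to check a pair of files before a long run. It assumes that each .pkl file holds one embedding per data row of the matching .csv file and that the CSV has a header line. This README states neither, so adjust the check if the files are laid out differently. The function names are hypothetical.

```python
import csv
import pickle

def count_csv_rows(csv_path):
    """Count data rows in a CSV file, excluding the header line."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header (assumed to be present)
        return sum(1 for _ in reader)

def check_pair(csv_path, pkl_path):
    """Return (n_rows, n_embeddings, counts_match) for one CSV/pickle pair."""
    with open(pkl_path, "rb") as f:
        embeddings = pickle.load(f)  # only unpickle files from a trusted source
    n_rows = count_csv_rows(csv_path)
    n_embeddings = len(embeddings)  # works for lists, NumPy arrays, and tensors
    return n_rows, n_embeddings, n_rows == n_embeddings
```

For example, check_pair("embeddings_labeled.csv", "embeddings_labeled.pkl") returns the two counts and whether they agree.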

Running the Script

Run the script using the following command:

python script.py

Processing Steps

The script follows these main steps:

  1. Load Data & Pretrained Embeddings
  2. Perform Cosine Similarity Search: Finds the most relevant reports (sentences) using semantic_search from sentence-transformers.
  3. Apply K-Nearest Neighbor (KNN) Algorithm: Selects the most similar reports (sentences) and aggregates their predictions.
  4. Use Sigmoid Activation for Classification: Applies a threshold to generate final classification outputs.
  5. Save Results: Exports df_results_0_50k.csv containing the processed data.
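
Steps 2 to 4 above can be illustrated with a minimal NumPy sketch. It is not the code in script.py. The cosine top-k function stands in for semantic_search, and the aggregation rule (a similarity-weighted vote over the neighbors' labels), the value of k, and the 0.5 threshold are all assumptions chosen for illustration. The sketch also assumes that the labels form a binary matrix with one column per class and that no embedding is a zero vector.

```python
import numpy as np

def cosine_top_k(query_emb, corpus_emb, k):
    """Return (indices, scores) of the k most cosine-similar corpus rows per query."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    sims = q @ c.T                              # shape: (n_query, n_corpus)
    idx = np.argsort(-sims, axis=1)[:, :k]      # k best matches per query
    return idx, np.take_along_axis(sims, idx, axis=1)

def knn_sigmoid_classify(query_emb, corpus_emb, corpus_labels, k=5, threshold=0.5):
    """Multilabel prediction from the labels of the k nearest labeled rows."""
    idx, scores = cosine_top_k(query_emb, corpus_emb, k)
    neighbor_labels = corpus_labels[idx]        # shape: (n_query, k, n_labels)
    # Assumed aggregation: each neighbor votes +1 or -1 per label,
    # weighted by its cosine similarity to the query.
    signed_votes = 2.0 * neighbor_labels - 1.0
    logits = (scores[..., None] * signed_votes).sum(axis=1)
    probs = 1.0 / (1.0 + np.exp(-logits))       # sigmoid activation
    preds = (probs >= threshold).astype(int)    # thresholded final labels
    return probs, preds
```

With this rule, a label is predicted when the similarity-weighted votes in its favor outweigh those against it, because a logit of 0 maps to a probability of exactly 0.5.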

Output File

The processed results will be saved in:

  • df_results_0_50k.csv

Execution Time

Execution time depends on the number of test samples and system resources. The script prints the total processing time upon completion.
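
To time an individual stage rather than the whole run, a small wrapper around time.perf_counter is one option. The helper name timed is hypothetical and is not taken from script.py.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

if __name__ == "__main__":
    total, seconds = timed(sum, range(1_000_000))
    print(f"Processed in {seconds:.3f} s")
```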


license: gpl-3.0