ia-nechaev/sbic-method2

Text Classification

	# sbic-method2

	An updated version of the Standard-Based Impact Classification (SBIC) method for analyzing CSR reports in accordance with the GRI framework.


	---

	# Multilabel Classification Step

	This code classifies reports from their embeddings. It runs a cosine-similarity search over the reports, applies the K-Nearest Neighbor (KNN) algorithm to the retrieved matches, and uses a sigmoid activation function to produce the final labels.

	## Prerequisites

	Ensure you have the following installed before running the script:

	- Python 3.8+
	- Required Python libraries (install using the command below)

	```bash
	pip install numpy pandas torch sentence-transformers scikit-learn
	```

	## Input Files

	Before running the script, make sure you have the following input files in the working directory:

	1. Report Data Files:
	   - `embeddings_labeled.csv`
	   - `embeddings_prediction.csv`

	2. Precomputed Embeddings:
	   - Labeled dataset: `embeddings_labeled.pkl`
	   - Dataset for prediction: `embeddings_prediction.pkl`
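
As a rough illustration of how the input files listed above might be read, the sketch below loads one CSV/pickle pair and checks that the row counts agree. The helper name `load_inputs` is hypothetical, and the assumption that each `.pkl` file holds a 2-D array with one embedding per CSV row is ours; the actual script may store the data differently.

```python
import pickle

import numpy as np
import pandas as pd


def load_inputs(csv_path, pkl_path):
    """Load one CSV/pickle pair and check that their row counts agree.

    Hypothetical helper: assumes the pickle holds a 2-D array-like with
    one embedding vector per CSV row.
    """
    df = pd.read_csv(csv_path)
    with open(pkl_path, "rb") as f:
        embeddings = np.asarray(pickle.load(f), dtype=np.float32)
    if embeddings.ndim != 2:
        raise ValueError(f"expected a 2-D embedding matrix, got shape {embeddings.shape}")
    if len(df) != embeddings.shape[0]:
        raise ValueError(
            f"{csv_path} has {len(df)} rows but {pkl_path} has "
            f"{embeddings.shape[0]} embeddings"
        )
    return df, embeddings
```

A call such as `load_inputs("embeddings_labeled.csv", "embeddings_labeled.pkl")` would then return the table and its embedding matrix together. Pickle files can execute code when loaded, so only open files from a source you trust.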

	## Running the Script

	Run the script using the following command:

	```bash
	python script.py
	```

	## Processing Steps

	The script follows these main steps:

	1. Load Data & Precomputed Embeddings: Reads the CSV files and pickled embeddings listed under Input Files.
	2. Perform Cosine Similarity Search: Finds the most relevant reports (sentences) using `semantic_search` from `sentence-transformers`.
	3. Apply K-Nearest Neighbor (KNN) Algorithm: Selects top similar reports (sentences) and aggregates predictions.
	4. Use Sigmoid Activation for Classification: Applies a threshold to generate final classification outputs.
	5. Save Results: Exports `df_results_0_50k.csv` containing the processed data.
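
The similarity search, KNN, and sigmoid steps above can be sketched in plain NumPy. This is a minimal sketch, not the script itself: the cosine top-k search is written by hand as a stand-in for `semantic_search`, and the way neighbor labels are combined (similarity-weighted votes passed through a sigmoid, then thresholded at 0.5) is an assumption, since the README does not spell out the aggregation rule. The function names, `k`, and `threshold` are all illustrative.

```python
import numpy as np


def cosine_top_k(queries, corpus, k):
    """Return (indices, scores) of the k most cosine-similar corpus rows per query."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = q @ c.T                              # shape (n_queries, n_corpus)
    idx = np.argsort(-sims, axis=1)[:, :k]      # best matches first
    return idx, np.take_along_axis(sims, idx, axis=1)


def knn_sigmoid_predict(queries, corpus, corpus_labels, k=5, threshold=0.5):
    """Multilabel prediction from similarity-weighted neighbor votes (assumed rule)."""
    idx, scores = cosine_top_k(queries, corpus, k)
    neighbor_labels = corpus_labels[idx]        # shape (n_queries, k, n_labels)
    # Map labels {0, 1} to votes {-1, +1} and weight each vote by its similarity.
    logits = np.einsum("qk,qkl->ql", scores, 2.0 * neighbor_labels - 1.0)
    probs = 1.0 / (1.0 + np.exp(-logits))       # sigmoid activation
    return (probs >= threshold).astype(int), probs


# Toy data: four labeled embeddings with two labels, and one query.
corpus = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
query = np.array([[1.0, 0.05]])

preds, probs = knn_sigmoid_predict(query, corpus, labels, k=2)
print(preds, probs)
```

With this rule, a label is predicted when the similarity-weighted votes for it outweigh the votes against it among the `k` nearest labeled reports. The sketch assumes no embedding is the zero vector, since normalization would otherwise divide by zero.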

	## Output File

	The processed results will be saved in:

	- `df_results_0_50k.csv`
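
The column layout of the output file is not documented here. As one plausible shape, the sketch below writes one row per report with a 0/1 column per label. The helper `save_results`, the `report_id` column, and the label names are all hypothetical; only the file name comes from this README.

```python
import numpy as np
import pandas as pd


def save_results(ids, predictions, label_names, path="df_results_0_50k.csv"):
    """Write one row per report with a 0/1 column for each label (assumed layout)."""
    df = pd.DataFrame(np.asarray(predictions), columns=list(label_names))
    df.insert(0, "report_id", list(ids))   # hypothetical identifier column
    df.to_csv(path, index=False)
    return df
```

Reading the file back with `pd.read_csv("df_results_0_50k.csv")` is a quick way to inspect the real columns once the script has run.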

	## Execution Time

	Execution time depends on the number of test samples and system resources. The script prints the total processing time upon completion.
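
For reference, timing a run usually amounts to wrapping the work in a wall-clock timer. The sketch below shows one common pattern with `time.perf_counter`; the helper `run_timed` and the message format are illustrative and may not match what the script prints.

```python
import time


def run_timed(fn, *args, **kwargs):
    """Call fn, print the elapsed wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"Total processing time: {elapsed:.2f} s")
    return result, elapsed
```

Timing a small subset of the prediction set first gives a rough estimate before committing to a full run.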

	---
	license: gpl-3.0
	---