archich committed on
Commit 172c327 · verified · 1 Parent(s): 0335191

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +80 -0
  2. config.json +23 -0
  3. label_encoders.pkl +3 -0
  4. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,80 @@
+ ---
+ language:
+ - en
+ - hi
+ license: apache-2.0
+ tags:
+ - hate-speech-detection
+ - reddit
+ - xlm-roberta
+ - hindi
+ - english
+ datasets:
+ - HASOC2019
+ metrics:
+ - accuracy
+ - f1
+ model-index:
+ - name: reddit-hate-speech-detector
+   results:
+   - task:
+       type: text-classification
+     metrics:
+     - type: accuracy
+       value: 0.8293
+     - type: f1
+       value: 0.8278
+ ---
+
+ # Reddit Hate Speech Detector (Hindi + English)
+
+ This model detects hate speech in Reddit comments written in Hindi or English.
+
+ ## Model Description
+
+ - **Base Model:** XLM-RoBERTa (`xlm-roberta-base`)
+ - **Languages:** Hindi, English
+ - **Task:** Multi-task classification (hate speech detection + type + target)
+ - **Accuracy:** 82.93%
+ - **F1 Score:** 0.8278
+
+ ## Intended Use
+
+ This model is designed for:
+ - Content moderation on Reddit
+ - Automated hate speech detection
+ - Research purposes
+
+ ⚠️ **Important:** This model should assist human moderators, not replace them.
+
+ ## Usage
+
+ ```python
+ import torch
+ from transformers import XLMRobertaTokenizer
+
+ # Load tokenizer
+ tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')
+
+ # Model loading needs the custom multi-task wrapper class
+ # (see the inference script, or the hedged sketch below)
+ ```
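+
+ The block above stops at the tokenizer because `pytorch_model.bin` is a plain PyTorch state dict and the multi-task wrapper class ships with the author's inference script rather than with this upload. The sketch below is a **hypothetical** reconstruction based only on the fields in `config.json` (three heads with 2 / 4 / 3 classes and dropout 0.2 on top of `xlm-roberta-base`); the head names and pooling are assumptions, so compare the missing/unexpected keys reported by `load_state_dict` against the real inference script before trusting its predictions.
+
+ ```python
+ import json
+
+ import torch
+ import torch.nn as nn
+ from transformers import XLMRobertaModel, XLMRobertaTokenizer
+
+
+ class MultiTaskHateSpeechModel(nn.Module):
+     """Hypothetical three-head classifier matching the fields in config.json."""
+
+     def __init__(self, base_model="xlm-roberta-base", dropout=0.2,
+                  num_task1=2, num_task2=4, num_task3=3):
+         super().__init__()
+         self.encoder = XLMRobertaModel.from_pretrained(base_model)
+         hidden = self.encoder.config.hidden_size
+         self.dropout = nn.Dropout(dropout)
+         self.task1_head = nn.Linear(hidden, num_task1)  # HOF / NOT
+         self.task2_head = nn.Linear(hidden, num_task2)  # HATE / NONE / OFFN / PRFN
+         self.task3_head = nn.Linear(hidden, num_task3)  # NONE / TIN / UNT
+
+     def forward(self, input_ids, attention_mask):
+         out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
+         pooled = self.dropout(out.last_hidden_state[:, 0])  # <s> (CLS) position
+         return (self.task1_head(pooled),
+                 self.task2_head(pooled),
+                 self.task3_head(pooled))
+
+
+ with open("config.json") as f:
+     cfg = json.load(f)
+
+ tokenizer = XLMRobertaTokenizer.from_pretrained(cfg["base_model"])
+ model = MultiTaskHateSpeechModel(
+     base_model=cfg["base_model"],
+     dropout=cfg["dropout"],
+     num_task1=cfg["num_task1_classes"],
+     num_task2=cfg["num_task2_classes"],
+     num_task3=cfg["num_task3_classes"],
+ )
+
+ # The checkpoint stores raw parameters; their names may differ from this
+ # sketch, hence strict=False. Inspect these lists before relying on outputs.
+ state = torch.load("pytorch_model.bin", map_location="cpu")
+ missing, unexpected = model.load_state_dict(state, strict=False)
+ model.eval()
+
+ # Classify one comment and map each head's argmax back to its label string.
+ enc = tokenizer("example reddit comment", return_tensors="pt",
+                 truncation=True, max_length=128)
+ with torch.no_grad():
+     logits1, logits2, logits3 = model(enc["input_ids"], enc["attention_mask"])
+
+ print(cfg["task_1_labels"][logits1.argmax(-1).item()],
+       cfg["task_2_labels"][logits2.argmax(-1).item()],
+       cfg["task_3_labels"][logits3.argmax(-1).item()])
+ ```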
+
+ ## Training Data
+
+ - HASOC 2019 Hindi Dataset
+ - HASOC 2019 English Dataset
+ - Combined training with class balancing (see the sketch below)
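+
+ The exact balancing scheme is not part of this upload; purely as an illustration (an assumed scheme, not the author's code), inverse-frequency class weights over the combined Hindi + English labels could be computed like this:
+
+ ```python
+ from collections import Counter
+
+ import torch
+
+
+ def class_weights(labels, classes):
+     """Inverse-frequency weights for torch.nn.CrossEntropyLoss (assumed scheme)."""
+     counts = Counter(labels)
+     return torch.tensor(
+         [len(labels) / (len(classes) * counts[c]) for c in classes],
+         dtype=torch.float,
+     )
+
+
+ # Hypothetical slice of the combined HASOC 2019 Hindi + English task-1 labels.
+ combined_task1 = ["HOF", "NOT", "NOT", "HOF", "NOT", "NOT"]
+ weights = class_weights(combined_task1, ["HOF", "NOT"])
+ loss_task1 = torch.nn.CrossEntropyLoss(weight=weights)
+ ```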
+
+ ## Limitations
+
+ - May reproduce biases present in the training data
+ - Requires surrounding context for accurate detection
+ - Cultural nuances may not be fully captured
+
+ ## Ethical Considerations
+
+ - Use the model transparently
+ - Allow users to appeal automated decisions
+ - Monitor outputs regularly for fairness
+ - Consider cultural context
config.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "model_type": "xlm-roberta",
+   "num_task1_classes": 2,
+   "num_task2_classes": 4,
+   "num_task3_classes": 3,
+   "dropout": 0.2,
+   "base_model": "xlm-roberta-base",
+   "task_1_labels": [
+     "HOF",
+     "NOT"
+   ],
+   "task_2_labels": [
+     "HATE",
+     "NONE",
+     "OFFN",
+     "PRFN"
+   ],
+   "task_3_labels": [
+     "NONE",
+     "TIN",
+     "UNT"
+   ]
+ }
label_encoders.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f7e5d3a775c7a57ce504c49d73c79d6398947bbf674fe79974fe6e661ad2190
+ size 398
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cc8d188d80793c4ef856d172c34f63ea79aff1f5a57abb2538b4e0a9b60932b0
+ size 1115855655