---
language: en
tags:
- text-classification
- multilabel-classification
- food
- climate-change
- sustainability
- veganism-&-vegetarianism
license: mit
---

# Veganism & Vegetarianism Classifier (DistilBERT)

This model classifies Reddit content about veganism and vegetarianism from climate change subreddits into seven non-exclusive themes.

## Model Details

- Model Type: DistilBERT
- Task: Multilabel text classification
- Sector: Veganism & Vegetarianism
- Base Model: DistilBERT base uncased
- Labels: 7
- Training Data: Sampled from 1,000 GPT-4o-mini-labeled Reddit posts from climate subreddits (2010-2023)

## Labels

The model predicts 7 labels simultaneously:

1. **Animal Welfare**: Cites animal suffering, cruelty, or ethics as motivation.
2. **Environmental Impact**: Links diet choice to climate change, land, water, or emissions.
3. **Health**: Claims physical health benefits or risks of eating less meat / going vegan.
4. **Lab Grown And Alt Proteins**: References cultivated meat, precision fermentation, insect protein or plant-based substitutes.
5. **Psychology And Identity**: Diet as part of personal identity, moral virtue signalling or tribal politics.
6. **Systemic Vs Individual Action**: Calls for policy, corporate reform or large-scale funding instead of just personal diet shifts.
7. **Taste And Convenience**: Talks about flavour, texture, cooking ease, availability of vegan options, or social convenience.


Note: Label order in predictions matches the order above.
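
Because the model outputs a plain score vector, label names have to be attached by position. A minimal sketch, assuming the label strings below match `checkpoint['label_names']` (they match the keys used in the Optimal Thresholds section, but in practice read them from the checkpoint); `LABELS` and `scores_to_dict` are illustrative names, not part of the released code:

```python
# Label order as listed in this card: index i of a prediction vector maps to LABELS[i].
# Assumption: these strings match checkpoint['label_names']; prefer reading them from there.
LABELS = [
    "Animal Welfare",
    "Environmental Impact",
    "Health",
    "Lab Grown And Alt Proteins",
    "Psychology And Identity",
    "Systemic Vs Individual Action",
    "Taste And Convenience",
]


def scores_to_dict(scores):
    """Map a length-7 score vector to {label: score}, preserving label order."""
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    return {label: float(s) for label, s in zip(LABELS, scores)}


example = scores_to_dict([0.1, 0.9, 0.2, 0.05, 0.3, 0.4, 0.6])
print(example["Environmental Impact"])  # prints 0.9
```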

## Usage

```python
import torch, sys, os, tempfile
from transformers import DistilBertTokenizer
from huggingface_hub import snapshot_download

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def print_sorted_label_scores(label_scores):
    # Sort label_scores dict by score descending
    sorted_items = sorted(label_scores.items(), key=lambda x: x[1], reverse=True)
    for label, score in sorted_items:
        print(f"  {label}: {score:.6f}")

# Model link and examples for this specific model
model_link = 'sanchow/veganism_and_vegetarianism-distilbert-classifier'
examples = [
    "Plant-based diets have a much lower carbon footprint than meat-heavy diets."
]

print(f"\n{'='*60}")
print("MODEL: VEGANISM & VEGETARIANISM SECTOR")
print(f"{'='*60}")

print(f"Downloading model: {model_link}")
with tempfile.TemporaryDirectory() as temp_dir:
    snapshot_download(
        repo_id=model_link,
        local_dir=temp_dir,
        local_dir_use_symlinks=False
    )
    model_class_path = os.path.join(temp_dir, 'model_class.py')
    if not os.path.exists(model_class_path):
        print("model_class.py not found in downloaded files")
        print(f"   Available files: {os.listdir(temp_dir)}")
    else:
        sys.path.insert(0, temp_dir)
        from model_class import MultilabelClassifier
        tokenizer = DistilBertTokenizer.from_pretrained(temp_dir)
        checkpoint = torch.load(os.path.join(temp_dir, 'model.pt'), map_location='cpu', weights_only=False)
        model = MultilabelClassifier(checkpoint['model_name'], len(checkpoint['label_names']))
        model.load_state_dict(checkpoint['model_state_dict'])
        model.to(device)
        model.eval()
        print("Model loaded successfully")
        print(f"   Labels: {checkpoint['label_names']}")
        print("\nVeganism & Vegetarianism classifier results:\n")
        for i, test_text in enumerate(examples):
            inputs = tokenizer(
                test_text, 
                return_tensors="pt", 
                truncation=True, 
                max_length=512,
                padding=True
            ).to(device)
            with torch.no_grad():
                outputs = model(**inputs)
                # If the model returns a tuple/list, assume its first element holds the scores
                scores = outputs[0] if isinstance(outputs, (tuple, list)) else outputs
                predictions = scores.cpu().numpy()
            label_scores = {label: float(score) for label, score in zip(checkpoint['label_names'], predictions[0])}
            print(f"Example {i+1}: '{test_text}'")
            print("Predictions (all label scores, highest first):")
            print_sorted_label_scores(label_scores)
            print("-" * 40)
```


## Performance

Best model performance:
- Micro Jaccard: 0.5584
- Macro Jaccard: 0.6710
- F1 Score: 0.8906
- Accuracy: 0.8906

Dataset: ~900 GPT-labeled samples per sector (600 train, 150 validation, 150 test)
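
The card does not say how the Jaccard scores were computed. A minimal sketch of one common definition for multilabel outputs, mirroring scikit-learn's `jaccard_score` with `average="micro"` and `average="macro"`: micro pools true positives, false positives, and false negatives across all labels before taking a single ratio, while macro computes one ratio per label and averages them, so rare labels count as much as common ones. `jaccard_scores` is a hypothetical helper, and treating 0/0 as 0 is an assumed convention:

```python
def jaccard_scores(y_true, y_pred):
    """Return (micro, macro) Jaccard for binary multilabel matrices given as lists of 0/1 rows."""
    n_labels = len(y_true[0])
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t and p:
                tp[j] += 1
            elif p:
                fp[j] += 1
            elif t:
                fn[j] += 1
    # Micro: pool counts across all labels, then take one ratio.
    denom = sum(tp) + sum(fp) + sum(fn)
    micro = sum(tp) / denom if denom else 0.0
    # Macro: one ratio per label, then average (0/0 treated as 0 here).
    per_label = [
        tp[j] / (tp[j] + fp[j] + fn[j]) if (tp[j] + fp[j] + fn[j]) else 0.0
        for j in range(n_labels)
    ]
    macro = sum(per_label) / n_labels
    return micro, macro


micro, macro = jaccard_scores([[1, 0], [1, 0]], [[1, 0], [1, 1]])
print(micro, macro)  # micro = 2/3, macro = 0.5
```

Here `y_pred` would be the thresholded 0/1 predictions, not the raw scores.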



## Optimal Thresholds
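
Each label is predicted when its score exceeds that label's tuned threshold rather than a fixed 0.5:

```python
# Continues the Usage example above: `checkpoint` and `predictions` are defined there.
optimal_thresholds = {'Animal Welfare': 0.48107979620047003, 'Environmental Impact': 0.45919171852850427, 'Health': 0.20115313966833437, 'Lab Grown And Alt Proteins': 0.3414601502146817, 'Psychology And Identity': 0.5246278637433214, 'Systemic Vs Individual Action': 0.37517437676211585, 'Taste And Convenience': 0.6635140143644325}
label_names = checkpoint['label_names']
for label, score in zip(label_names, predictions[0]):
    # Labels missing from the dict fall back to a default cutoff of 0.5
    threshold = optimal_thresholds.get(label, 0.5)
    if score > threshold:
        print(f"{label}: {score:.3f}")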
```


## Training

Trained on GPT-labeled Reddit data:
1. Data collection from climate subreddits
2. Keyword-based filtering for sector-specific content
3. GPT labeling for multilabel classification
4. 80/10/10 train/validation/test split
5. Fine-tuning with threshold optimization
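
The card does not document the threshold optimization step. A minimal sketch, assuming each label's cutoff is chosen independently to maximize F1 on the validation split; `best_threshold` and `tune_thresholds` are hypothetical helpers, and the fixed 0.05-0.95 grid is an illustrative choice. The unrounded values in the Optimal Thresholds section suggest the original search used a different candidate set.

```python
def best_threshold(scores, targets, grid=None):
    """Pick the cutoff in `grid` that maximizes F1 for a single label.

    scores: predicted scores for this label on validation data
    targets: matching 0/1 ground-truth labels
    """
    if grid is None:
        grid = [i / 100 for i in range(5, 96)]  # illustrative grid: 0.05 .. 0.95
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        # Same rule as the card: a label is positive when score > threshold
        tp = sum(1 for s, y in zip(scores, targets) if s > t and y)
        fp = sum(1 for s, y in zip(scores, targets) if s > t and not y)
        fn = sum(1 for s, y in zip(scores, targets) if s <= t and y)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:  # strict '>' keeps the lowest cutoff among ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


def tune_thresholds(score_rows, target_rows, label_names):
    """Tune one threshold per label; each row is a per-example vector in label order."""
    thresholds = {}
    for j, name in enumerate(label_names):
        col_scores = [row[j] for row in score_rows]
        col_targets = [row[j] for row in target_rows]
        thresholds[name], _ = best_threshold(col_scores, col_targets)
    return thresholds
```

Tuning on the validation split and reporting on the held-out test split keeps the threshold choice from leaking into the final metrics.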

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{veganism_and_vegetarianism_distilbert_classifier,
  title={Veganism & Vegetarianism Classifier for Climate Change Analysis},
  author={Sandeep Chowdhary},
  year={2025},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/echoboi/veganism_and_vegetarianism-distilbert-classifier}},
}
```

## Limitations

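- Trained only on English content from specific climate change subreddits; performance on other communities or languages may differ
- Training and evaluation labels were generated by GPT-4o-mini, so the reported metrics measure agreement with those labels and inherit their errors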