---
language: en
tags:
- text-classification
- multilabel-classification
- food
- climate-change
- sustainability
- veganism-&-vegetarianism
license: mit
---
# Veganism & Vegetarianism Classifier (DistilBERT)
This model classifies Reddit posts about veganism and vegetarianism from climate change subreddits, assigning each post any combination of seven topical labels.
## Model Details
- Model Type: DistilBERT
- Task: Multilabel text classification
- Sector: Veganism & Vegetarianism
- Base Model: DistilBERT base uncased
- Labels: 7
- Training Data: Sample from 1,000 GPT-4o-mini-labeled Reddit posts from climate subreddits (2010-2023)
## Labels
The model predicts 7 labels simultaneously:
1. **Animal Welfare**: Cites animal suffering, cruelty, or ethics as motivation.
2. **Environmental Impact**: Links diet choice to climate change, land, water, or emissions.
3. **Health**: Claims physical health benefits or risks of eating less meat / going vegan.
4. **Lab Grown And Alt Proteins**: References cultivated meat, precision fermentation, insect protein or plant-based substitutes.
5. **Psychology And Identity**: Diet as part of personal identity, moral virtue signalling or tribal politics.
6. **Systemic Vs Individual Action**: Calls for policy, corporate reform or large-scale funding instead of just personal diet shifts.
7. **Taste And Convenience**: Talks about flavour, texture, cooking ease, availability of vegan options, or social convenience.
Note: Label order in predictions matches the order above.
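Because predictions come back as a plain vector of seven scores, it can help to pair them with label names explicitly. The sketch below assumes the label strings match the headings above; `scores_to_dict` is a hypothetical helper, and the scores are made up for illustration rather than real model output.

```python
# Label names in the order listed above.
LABELS = [
    "Animal Welfare",
    "Environmental Impact",
    "Health",
    "Lab Grown And Alt Proteins",
    "Psychology And Identity",
    "Systemic Vs Individual Action",
    "Taste And Convenience",
]


def scores_to_dict(scores):
    """Pair one row of scores with LABELS (hypothetical helper)."""
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")
    return dict(zip(LABELS, scores))


# Made-up scores for illustration only, not real model output.
example = scores_to_dict([0.10, 0.92, 0.05, 0.20, 0.15, 0.40, 0.08])
top_label = max(example, key=example.get)
print(top_label)
```

In practice, prefer the `label_names` list stored in the checkpoint over a hard-coded list, since it is the order the model was trained with.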
## Usage
```python
import torch, sys, os, tempfile
from transformers import DistilBertTokenizer
from huggingface_hub import snapshot_download

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


def print_sorted_label_scores(label_scores):
    # Sort label_scores dict by score descending
    sorted_items = sorted(label_scores.items(), key=lambda x: x[1], reverse=True)
    for label, score in sorted_items:
        print(f"  {label}: {score:.6f}")


# Model link and examples for this specific model
model_link = 'sanchow/veganism_and_vegetarianism-distilbert-classifier'
examples = [
    "Plant-based diets have a much lower carbon footprint than meat-heavy diets."
]

print(f"\n{'='*60}")
print("MODEL: VEGANISM & VEGETARIANISM SECTOR")
print(f"{'='*60}")
print(f"Downloading model: {model_link}")

with tempfile.TemporaryDirectory() as temp_dir:
    snapshot_download(
        repo_id=model_link,
        local_dir=temp_dir,
        local_dir_use_symlinks=False
    )
    model_class_path = os.path.join(temp_dir, 'model_class.py')
    if not os.path.exists(model_class_path):
        print("model_class.py not found in downloaded files")
        print(f"  Available files: {os.listdir(temp_dir)}")
    else:
        # The custom model class ships with the repository, so make it importable
        sys.path.insert(0, temp_dir)
        from model_class import MultilabelClassifier

        tokenizer = DistilBertTokenizer.from_pretrained(temp_dir)
        # weights_only=False unpickles arbitrary objects; only load checkpoints you trust
        checkpoint = torch.load(os.path.join(temp_dir, 'model.pt'), map_location='cpu', weights_only=False)
        model = MultilabelClassifier(checkpoint['model_name'], len(checkpoint['label_names']))
        model.load_state_dict(checkpoint['model_state_dict'])
        model.to(device)
        model.eval()
        print("Model loaded successfully")
        print(f"  Labels: {checkpoint['label_names']}")

        print("\nVeganism & Vegetarianism classifier results:\n")
        for i, test_text in enumerate(examples):
            inputs = tokenizer(
                test_text,
                return_tensors="pt",
                truncation=True,
                max_length=512,
                padding=True
            ).to(device)
            with torch.no_grad():
                outputs = model(**inputs)
            # If the model returns a tuple/list, the scores are assumed to be its first element
            predictions = outputs[0].cpu().numpy() if isinstance(outputs, (tuple, list)) else outputs.cpu().numpy()
            label_scores = {label: float(score) for label, score in zip(checkpoint['label_names'], predictions[0])}
            print(f"Example {i+1}: '{test_text}'")
            print("Predictions (all label scores, highest first):")
            print_sorted_label_scores(label_scores)
            print("-" * 40)
```
## Performance
Best model performance:
- Micro Jaccard: 0.5584
- Macro Jaccard: 0.6710
- F1 Score: 0.8906
- Accuracy: 0.8906
Dataset: ~900 GPT-labeled samples per sector (600 train, 150 validation, 150 test)
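The card does not say which implementation produced the Jaccard scores, so the following is only a sketch of how micro and macro averaging differ for multilabel outputs. Micro Jaccard pools true positives, false positives and false negatives across all labels before dividing; macro Jaccard computes the ratio per label and then averages. The toy arrays are invented for illustration and have nothing to do with the reported numbers.

```python
def label_counts(y_true, y_pred, j):
    """Count TP, FP, FN for label column j over all samples."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        t, p = t_row[j], p_row[j]
        tp += int(t == 1 and p == 1)
        fp += int(t == 0 and p == 1)
        fn += int(t == 1 and p == 0)
    return tp, fp, fn


def micro_jaccard(y_true, y_pred):
    """Pool counts over every label, then compute TP / (TP + FP + FN)."""
    tp = fp = fn = 0
    for j in range(len(y_true[0])):
        a, b, c = label_counts(y_true, y_pred, j)
        tp, fp, fn = tp + a, fp + b, fn + c
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def macro_jaccard(y_true, y_pred):
    """Compute Jaccard per label, then take the unweighted mean."""
    per_label = []
    for j in range(len(y_true[0])):
        tp, fp, fn = label_counts(y_true, y_pred, j)
        denom = tp + fp + fn
        per_label.append(tp / denom if denom else 0.0)
    return sum(per_label) / len(per_label)


# Toy multi-hot data: 4 samples, 3 labels (illustrative only).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

micro = micro_jaccard(y_true, y_pred)
macro = macro_jaccard(y_true, y_pred)
print(f"micro={micro:.4f} macro={macro:.4f}")
```

The two averages can diverge in either direction: a label that is predicted perfectly pulls the macro score up regardless of how many samples carry it, whereas the micro score is dominated by the labels with the most positive predictions and targets.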
## Optimal Thresholds
```python
# `checkpoint` and `predictions` come from the Usage snippet above.
label_names = checkpoint['label_names']
optimal_thresholds = {
    'Animal Welfare': 0.48107979620047003,
    'Environmental Impact': 0.45919171852850427,
    'Health': 0.20115313966833437,
    'Lab Grown And Alt Proteins': 0.3414601502146817,
    'Psychology And Identity': 0.5246278637433214,
    'Systemic Vs Individual Action': 0.37517437676211585,
    'Taste And Convenience': 0.6635140143644325,
}
for label, score in zip(label_names, predictions[0]):
    threshold = optimal_thresholds.get(label, 0.5)
    if score > threshold:
        print(f"{label}: {score:.3f}")
```
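For experimenting without downloading the model, the same thresholding logic can be wrapped in a small function. This is a self-contained sketch: `apply_thresholds` is a hypothetical helper name, and the scores in `example_scores` are made up rather than produced by the classifier.

```python
OPTIMAL_THRESHOLDS = {
    'Animal Welfare': 0.48107979620047003,
    'Environmental Impact': 0.45919171852850427,
    'Health': 0.20115313966833437,
    'Lab Grown And Alt Proteins': 0.3414601502146817,
    'Psychology And Identity': 0.5246278637433214,
    'Systemic Vs Individual Action': 0.37517437676211585,
    'Taste And Convenience': 0.6635140143644325,
}


def apply_thresholds(label_scores, thresholds, default=0.5):
    """Return (label, score) pairs whose score exceeds the label's threshold,
    sorted by score descending. Labels missing from `thresholds` use `default`."""
    selected = [
        (label, score)
        for label, score in label_scores.items()
        if score > thresholds.get(label, default)
    ]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Made-up scores for illustration only.
example_scores = {
    'Animal Welfare': 0.30,
    'Environmental Impact': 0.91,
    'Health': 0.25,
    'Lab Grown And Alt Proteins': 0.10,
    'Psychology And Identity': 0.50,
    'Systemic Vs Individual Action': 0.38,
    'Taste And Convenience': 0.60,
}

predicted = apply_thresholds(example_scores, OPTIMAL_THRESHOLDS)
print([label for label, _ in predicted])
```

With these invented scores, Health is selected at 0.25 while Taste And Convenience is rejected at 0.60, which shows why a single global cutoff of 0.5 would give different predictions than the per-label thresholds.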
## Training
Trained on GPT-labeled Reddit data:
1. Data collection from climate subreddits
2. Keyword-based filtering for sector-specific content
3. GPT labeling for multilabel classification
4. 80/10/10 train/validation/test split
5. Fine-tuning with threshold optimization
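The card does not describe how the thresholds in step 5 were chosen. One common approach, shown here purely as an assumption, is a per-label grid search on validation scores that keeps the cutoff with the best F1. The function names, the 0.01 grid and the toy validation data are all hypothetical.

```python
def f1_at_threshold(scores, targets, threshold):
    """F1 for one label when predicting positive iff score > threshold."""
    tp = fp = fn = 0
    for score, target in zip(scores, targets):
        pred = score > threshold
        if pred and target:
            tp += 1
        elif pred and not target:
            fp += 1
        elif not pred and target:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(scores, targets, grid=None):
    """Return (threshold, f1) maximizing F1 over the grid.
    Ties keep the lowest threshold, since only strict improvements replace it."""
    grid = grid or [i / 100 for i in range(1, 100)]
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        f1 = f1_at_threshold(scores, targets, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy validation scores and gold labels for a single label (illustrative only).
val_scores = [0.15, 0.22, 0.35, 0.41, 0.58, 0.73]
val_targets = [0, 0, 1, 0, 1, 1]

threshold, f1 = best_threshold(val_scores, val_targets)
print(f"threshold={threshold:.2f} f1={f1:.3f}")
```

Running the search once per label on the validation split yields a dictionary like the one in the Optimal Thresholds section; the test split should stay untouched until the thresholds are fixed.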
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{veganism_and_vegetarianism_distilbert_classifier,
title={Veganism \& Vegetarianism Classifier for Climate Change Analysis},
author={Sandeep Chowdhary},
year={2025},
publisher={Hugging Face},
journal={Hugging Face Hub},
howpublished={\url{https://huggingface.co/echoboi/veganism_and_vegetarianism-distilbert-classifier}},
}
```
## Limitations
- Trained on data from specific climate change subreddits and limited to English content
- Performance depends on the quality of the GPT-generated labels used for training and evaluation