๐Ÿ” Mistral-PRCT

A LoRA fine-tuned Mistral-7B model for detecting Population Replacement Conspiracy Theory (PRCT) content across (at least) Portuguese Telegram and Italian news headlines.

**License:** MIT · **Base model:** mistralai/Mistral-7B-Instruct-v0.3

## Overview

Mistral-PRCT is a LoRA adapter fine-tuned on Portuguese Telegram messages for detecting Population Replacement Conspiracy Theories. The model demonstrates strong cross-domain generalization, achieving competitive performance on Italian news headlines despite being trained exclusively on informal social media discourse.

### Key Metrics

| Dataset | F1-Macro | F1-Binary | Accuracy |
|---|---|---|---|
| Telegram PT (in-domain) | 0.819 | 0.700 | 0.896 |
| News ITA (cross-domain) | 0.753 | 0.688 | 0.771 |

## Model Description

Mistral-PRCT is a LoRA (Low-Rank Adaptation) fine-tuned version of Mistral-7B-Instruct-v0.3, specifically adapted for detecting Population Replacement Conspiracy Theory (PRCT) content. The model was trained on Portuguese Telegram messages and demonstrates robust cross-domain transfer to Italian news headlines.

### What are PRCTs?

Population Replacement Conspiracy Theories are false narratives claiming deliberate orchestration of demographic substitution through immigration. Main variants include:

- The Great Replacement Theory (Renaud Camus)
- White Genocide
- Kalergi Plan
- Eurabia

These narratives are linked to extremist violence (Christchurch 2019, Utøya 2011) and pose serious threats to democratic discourse.

## Model Configuration

### Architecture

- Base Model: Mistral-7B-Instruct-v0.3 (7B parameters)
- Adapter Type: LoRA (Low-Rank Adaptation)
- LoRA Rank: 16
- LoRA Alpha: 32
- Target Modules: q_proj, v_proj, k_proj, o_proj
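
The training scripts are not bundled with this card; as a minimal sketch, the adapter settings above correspond to a PEFT configuration along these lines (the `bias` and `task_type` arguments are assumptions):

```python
from peft import LoraConfig

# Sketch of a LoRA configuration matching the adapter settings listed above.
lora_config = LoraConfig(
    r=16,                                    # LoRA rank
    lora_alpha=32,                           # LoRA alpha (scaling factor)
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05,                       # see Hyperparameters below
    bias="none",                             # assumption
    task_type="CAUSAL_LM",                   # assumption: causal LM fine-tuning
)
```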

### Label Mapping

- 0: Non-PRCT content
- 1: PRCT content (supports/mentions replacement narratives)
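
The prompt used in the Usage section below asks for a YES/NO answer; a small helper (an illustrative sketch, not part of the released code) can map that answer onto these labels:

```python
# Illustrative mapping between the generated YES/NO answer and the 0/1 labels.
LABELS = {0: "Non-PRCT content", 1: "PRCT content"}

def answer_to_label(answer: str) -> int:
    """Map a generated YES/NO answer to the label scheme above (assumes YES -> 1)."""
    return 1 if "YES" in answer.upper() else 0
```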

### Input Requirements

- Maximum sequence length: 2048 tokens
- Input type: Text (Portuguese, Italian, Spanish)
- Preprocessing: Standard Mistral tokenization
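
For example, the 2048-token limit can be enforced with the standard Mistral tokenizer as follows (a sketch; the released evaluation code may handle truncation differently):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

# Enforce the 2048-token maximum sequence length during preprocessing.
inputs = tokenizer(
    "O seu texto em português, italiano ou espanhol aqui",  # placeholder text
    truncation=True,
    max_length=2048,
    return_tensors="pt",
)
```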

## Intended Uses & Limitations

### ✅ Intended Uses

- AI-assisted content moderation (with human oversight)
- Research on conspiracy theory propagation
- Cross-domain and multilingual PRCT detection
- Analysis of informal social media discourse

โš ๏ธ Limitations

  • Training bias: Optimized for Portuguese Telegram messages
  • Cross-domain performance: 6.6pp F1-macro drop on formal news (expected)
  • Language coverage: Best on Portuguese, good on Italian, untested on other Romance languages
  • Inference cost: ~4.6s per sample (slower than zero-shot but higher accuracy)

**Important**: This model should be used as part of a broader content-moderation strategy, not as the sole decision-maker.

## Training Data

- Primary training: Portuguese Telegram messages (n=919)
- Domain: Informal social media discourse, conspiracy-oriented channels
- PRCT prevalence: 15.7%
- Annotation: Expert annotators (Krippendorff's α=0.58)
- Time period: 2020-2024

## Training Procedure

### Hyperparameters

- Learning rate: 2e-5
- Batch size: 4 (with gradient accumulation)
- Training steps: 600
- Optimizer: AdamW 8-bit
- LoRA dropout: 0.05
- Weight decay: 0.01
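
As a rough sketch (not the released training script), these settings map onto `transformers.TrainingArguments` roughly as follows; the output directory, gradient-accumulation steps, precision, and logging cadence are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./mistral-prct-lora",        # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,           # assumed; "with gradient accumulation"
    max_steps=600,
    optim="adamw_bnb_8bit",                  # AdamW 8-bit via bitsandbytes
    weight_decay=0.01,
    fp16=True,                               # assumed mixed precision
    logging_steps=50,                        # assumed
)
```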

### Hardware

- GPU: NVIDIA A100 40GB
- Training time: ~2 hours
- Framework: PyTorch + PEFT

## Results

### In-Domain Performance (Telegram PT)

| Metric | Score |
|---|---|
| Accuracy | 0.896 |
| Precision (Macro) | 0.797 |
| Recall (Macro) | 0.848 |
| F1-Macro | 0.819 |
| F1-Binary | 0.700 |
| Inference Time | 4.62 s/sample |

### Cross-Domain Performance (News ITA)

| Metric | Score |
|---|---|
| Accuracy | 0.771 |
| Precision (Macro) | 0.748 |
| Recall (Macro) | 0.786 |
| F1-Macro | 0.753 |
| F1-Binary | 0.688 |
| Inference Time | 4.50 s/sample |

**Key finding**: Fine-tuning on informal Portuguese Telegram discourse improves detection of implicit PRCT framing even in formal Italian news, demonstrating effective cross-domain transfer from social media to journalistic text.

## Usage

### Installation
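
The original installation snippet is not reproduced on this page; the dependencies implied by the examples below are roughly the following (versions unpinned; `accelerate` is assumed, as it is required for `device_map="auto"`):

```bash
pip install torch transformers peft accelerate
```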


### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model and tokenizer
base_model_name = "mistralai/Mistral-7B-Instruct-v0.3"
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "erikbranmarino/Mistral-PRCT")

# Prepare prompt
text = "Your Portuguese or Italian text here"
prompt = f"""Classify if the following text contains Population Replacement Conspiracy Theory (PRCT) content.

Text: {text}

Classification (YES/NO):"""

# Generate prediction
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=10,
    do_sample=False  # greedy (deterministic) decoding
)
prediction = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(prediction)
```
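
Note that `prediction` as decoded above contains the full prompt plus the generated answer; one way to keep only the newly generated tokens (the YES/NO answer) is:

```python
# Decode only the tokens generated after the prompt.
answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
```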

### Batch Processing Example

```python
def classify_batch(texts, model, tokenizer, batch_size=8):
    """Classify multiple texts efficiently in fixed-size batches."""
    # Mistral's tokenizer has no pad token by default; decoder-only models
    # also need left padding for batched generation.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "left"

    predictions = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        prompts = [f"Classify PRCT: {text}" for text in batch]
        inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
        for output in outputs:
            pred = tokenizer.decode(output, skip_special_tokens=True)
            predictions.append(pred)
    return predictions
```
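
For example, assuming `model` and `tokenizer` have been loaded as in the Basic Usage snippet (the sample texts below are placeholders):

```python
texts = [
    "Primeiro texto em português...",
    "Secondo testo in italiano...",
]
raw_answers = classify_batch(texts, model, tokenizer, batch_size=2)
# Map YES/NO answers onto the 0/1 label scheme from the Label Mapping section.
labels = [1 if "YES" in ans.upper() else 0 for ans in raw_answers]
print(labels)
```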

## Bias and Ethical Considerations

### Known Biases
- **Platform bias**: Optimized for Telegram-style informal discourse
- **Language bias**: Primarily Portuguese, with cross-lingual transfer to Italian
- **Temporal bias**: Training data from 2020-2024 may not capture evolving narratives

### Ethical Use
- โš ๏ธ **Not for automated censorship**: Requires human review
- โœ… **Research purposes**: Understanding conspiracy theory propagation
- โœ… **Content flagging**: Assisting moderators, not replacing them
- โŒ **Surveillance**: Not intended for monitoring individuals

We advocate for freedom of speech and constitutional rights. This tool should support informed moderation, not suppress legitimate discourse.

## Citation

```bibtex
@inproceedings{marino2025prct,
  title={Population Replacement Conspiracy Theories Detection on Telegram and News Headlines: benchmarking LLMs and BERT models in Portuguese and Italian},
  author={Marino, Erik Bran and Vieira, Renata and Ribeiro, Ana Sofia},
  booktitle={Proceedings of PROPOR 2026},
  year={2026}
}
```

## Model Card Authors

Erik Bran Marino (Universidade de Évora, HYBRIDS Project)

## Contact

- **Email**: [email protected]
- **Project**: MSCA HYBRIDS (Grant Agreement No. 101073351)
- **Institution**: Universidade de Évora, Portugal

## License

MIT License - Free for research and educational purposes.

---

**Developed as part of the HYBRIDS Marie Skłodowska-Curie Actions project**