---

language: 
- fr
- en
- de
- es
- it
license: mit
tags:
- text-classification
- commit-messages
- humor-detection
- eurobert
- lora
- git
- optuna-optimized
datasets:
- custom
metrics:
- accuracy
- f1
library_name: transformers
pipeline_tag: text-classification
model-index:
- name: eurobert-commit-humor
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: custom
      name: Git Commit Humor Detection
    metrics:
    - type: accuracy
      value: 85.3
      name: Global Accuracy
    - type: accuracy
      value: 82.9
      name: Funny Class Accuracy
---


# 🎭 EuroBERT Commit Humor Classifier (Optimized)

## 📋 Description

This model is an optimized version of EuroBERT, fine-tuned to detect humor in Git commit messages.
It was tuned with Optuna over several cycles of automatic dataset improvement.

## 🎯 Performance

- **Overall accuracy**: 85.3%
- **"funny" class accuracy**: 82.9%
- **"neutral" class accuracy**: 85.6%
- **Optimal threshold**: 0.35
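
The card reports an optimal threshold of 0.35 but does not say how it is applied. The sketch below assumes it is a cutoff on the probability of the "funny" class, with values at or above the cutoff labeled "funny". The function name `label_from_probability` and the inclusive comparison are illustrative assumptions, not part of the model's API.

```python
# Assumption: 0.35 is a cutoff on the "funny" class probability.
FUNNY_THRESHOLD = 0.35


def label_from_probability(p_funny: float, threshold: float = FUNNY_THRESHOLD) -> str:
    """Return "funny" when the funny-class probability reaches the threshold."""
    return "funny" if p_funny >= threshold else "neutral"


# Hypothetical probabilities, for illustration only.
for p in (0.20, 0.40, 0.85):
    print(f"{p:.2f} -> {label_from_probability(p)}")
```

With a cutoff below 0.5, a message can be labeled "funny" even when "neutral" has the higher probability, which trades some precision for better recall on the "funny" class.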

## 🚀 Usage

```python
from transformers import pipeline

# Load the model
classifier = pipeline(
    "text-classification",
    model="LBerthalon/eurobert-commit-humor",
    trust_remote_code=True,
)

# Predict (French commit message: "fix: gcc and I, it's complicated")
result = classifier("fix: gcc et moi c'est compliqué")
print(result)
# Example output: [{'label': 'funny', 'score': 0.85}]
```

## 🔧 Advanced usage

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("LBerthalon/eurobert-commit-humor", trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained("LBerthalon/eurobert-commit-humor", trust_remote_code=True)

# Prepare the input (French: "feat: add the feature that doesn't work")
text = "feat: ajout de la fonctionnalité qui marche pas"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Predict: softmax turns the logits into class probabilities
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Index 1 is the "funny" class, index 0 is the "neutral" class
print(f"Funny: {predictions[0][1]:.3f}")
print(f"Neutral: {predictions[0][0]:.3f}")
```
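
To make the softmax step above concrete without downloading the model, here is a minimal pure-Python sketch of the same computation for a single example. The logit values are hypothetical and the `[neutral, funny]` ordering follows the indexing used in the snippet above.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits in the order [neutral, funny].
probs = softmax([0.3, 1.9])
print(f"Neutral: {probs[0]:.3f}")
print(f"Funny: {probs[1]:.3f}")
```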

## 📊 Example Predictions

| Commit message | Prediction | Score |
|----------------|------------|-------|
| "fix: correction du bug" ("fix: bug fix") | neutral | 0.92 |
| "feat: ajout de la magie noire" ("feat: add black magic") | funny | 0.78 |
| "docs: mise à jour README" ("docs: update README") | neutral | 0.95 |
| "fix: ça marche sur ma machine" ("fix: works on my machine") | funny | 0.83 |

## 🛠️ Optimization

This model was optimized with:
- **Optuna** for Bayesian hyperparameter optimization
- **LoRA** (Low-Rank Adaptation) for efficient fine-tuning
- **Iterative improvement** of the dataset
- **5 automatic optimization cycles**
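
As a rough illustration of why LoRA makes fine-tuning cheaper: instead of updating a full `d_out x d_in` weight matrix, it trains two low-rank factors of shapes `d_out x r` and `r x d_in`. The sketch below only counts parameters. The dimensions and rank are hypothetical and are not the values used for this model, which the card does not state.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, for illustration only.
d_out, d_in, rank = 768, 768, 8

full = d_out * d_in
lora = lora_trainable_params(d_out, d_in, rank)
print(f"Full matrix: {full} parameters")
print(f"LoRA factors: {lora} parameters ({lora / full:.2%} of full)")
```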

## 📈 Architecture

- **Base model**: EuroBERT
- **Technique**: LoRA fine-tuning
- **Classes**: 2 (funny, neutral)
- **Supported languages**: French (primary), English, German, Spanish, Italian

## 🎓 Citation

```bibtex
@misc{eurobert-commit-humor-optimized,
  title={EuroBERT Commit Humor Classifier (Optimized)},
  author={LBerthalon},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/LBerthalon/eurobert-commit-humor}
}
```

## 📄 License

MIT License