Wav2Vec2 for Agricultural Speech Transcription (Senegal Project)

Project Context

This model was developed during a student research internship whose goal is to design a smart voice assistant for Senegalese farmers.
The overall project, called Agricultural Voice Assistant, combines automatic speech recognition (ASR) with text classification (theme, crop, and intent) to answer farmers' technical questions.
Farmers often ask their questions aloud, in French that is influenced by local languages (Wolof, Serer, etc.), for example:
> "Which millet variety is suitable for clay soil?"
The goal is to build a robust speech-to-text model adapted to these accents and to this vocabulary.
Note: This repository contains only the speech transcription model.
For the full pipeline (ASR + classifiers), see the dedicated repository:
👉 Kadidiatou131313/agri-assistant-pipeline

Model Description

This model is a fine-tuned version of `jonatasgrosman/wav2vec2-large-xlsr-53-french`, trained on a custom dataset of agricultural voice queries from Senegal.
At the end of training (epoch 25, step 1100), it achieves the following results on the evaluation set:
- Loss: 0.5955
- WER (Word Error Rate): 0.19
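
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, insertions) between the reference and the transcription, divided by the number of reference words, so 0.19 means roughly 19 word errors per 100 reference words. The sketch below is a minimal pure-Python illustration of that definition. The function name `word_error_rate` and the example sentences are illustrative assumptions, and this is not necessarily the implementation used to produce the scores above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only (not taken from the evaluation set)
reference = "quelle variété de mil est adaptée à un sol argileux"
hypothesis = "quelle variété de mil est adapté à un sol argileux"
wer = word_error_rate(reference, hypothesis)  # one substitution over ten words: 0.1
```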

How to Use

Example code to load the model and transcribe an audio file:

import librosa
import torch
from transformers import AutoModelForCTC, Wav2Vec2Processor
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and processor
model = AutoModelForCTC.from_pretrained("Kadidiatou131313/wav2vec2-fr-agriculture").to(DEVICE)
processor = Wav2Vec2Processor.from_pretrained("Kadidiatou131313/wav2vec2-fr-agriculture")

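def transcribe(audio_path: str) -> str:
    """Transcribe a single audio file and return the decoded text."""
    # Load audio as mono and resample to 16 kHz (the rate the model expects)
    wav, _ = librosa.load(audio_path, sr=16000, mono=True)

    # Convert the waveform into model inputs
    inputs = processor(wav, sampling_rate=16000, return_tensors="pt", padding=True)

    # Run the model without tracking gradients
    with torch.no_grad():
        logits = model(inputs.input_values.to(DEVICE)).logits

    # Greedy CTC decoding: keep the most likely token at each frame
    pred_ids = torch.argmax(logits, dim=-1)

    # Decode token ids to text
    text = processor.decode(pred_ids[0])
    return text.strip()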
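
The `argmax` followed by `processor.decode` in the function above amounts to greedy CTC decoding. As a rough illustration of what that decoding step does conceptually, the sketch below merges consecutive repeated ids, drops the blank token, and turns the word delimiter into a space. The toy vocabulary, the frame ids, and the helper name `greedy_ctc_decode` are all hypothetical; the `<pad>` blank and `|` delimiter follow the usual Wav2Vec2 convention but are assumed here rather than read from this model's tokenizer.

```python
from itertools import groupby


def greedy_ctc_decode(pred_ids, id_to_token, blank_token="<pad>", word_delimiter="|"):
    """Collapse a frame-level id sequence into text (greedy CTC decoding)."""
    # 1. Merge consecutive repeats of the same id
    collapsed = [token_id for token_id, _ in groupby(pred_ids)]
    # 2. Drop blank tokens and map the remaining ids to characters
    tokens = [id_to_token[i] for i in collapsed if id_to_token[i] != blank_token]
    # 3. Replace the word delimiter with a space
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary and per-frame predictions
toy_vocab = {0: "<pad>", 1: "|", 2: "m", 3: "i", 4: "l", 5: "s", 6: "o"}
frame_ids = [2, 2, 0, 3, 3, 4, 0, 1, 1, 5, 0, 6, 6, 4, 4, 0]
text = greedy_ctc_decode(frame_ids, toy_vocab)  # "mil sol"
```

A blank between two identical ids keeps both characters, which is how CTC represents doubled letters.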

Training Procedure

Hyperparameters:

Learning rate: 0.0003
Train batch size: 8
Eval batch size: 8
Seed: 42
Optimizer: AdamW (betas=(0.9,0.999), eps=1e-08)
Scheduler: linear
Warmup steps: 100
Epochs: 25
Mixed precision: Native AMP
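
To make the scheduler settings concrete, here is a small sketch of how the learning rate would evolve, assuming "linear" means the common linear warmup followed by linear decay to zero, and assuming the run ends at step 1100 (the last step reported in the training results). The function name `linear_schedule_lr` is illustrative, and the exact schedule used in training may differ in detail.

```python
PEAK_LR = 3e-4       # learning rate from the hyperparameters above
WARMUP_STEPS = 100   # warmup steps from the hyperparameters above
TOTAL_STEPS = 1100   # assumption: final step reported in the training results


def linear_schedule_lr(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# Learning rate at a few points in training
schedule = {step: linear_schedule_lr(step) for step in (0, 50, 100, 600, 1100)}
```

Under these assumptions the rate peaks at step 100 and is back to half the peak value by step 600.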

Training Results

Training Loss	Epoch	Step	Validation Loss	WER
3.9756	2.27	100	3.0001	1.0
3.1017	4.54	200	2.7959	1.0
1.0941	6.82	300	0.6419	0.4625
0.4260	9.09	400	0.4922	0.295
0.2826	11.36	500	0.4858	0.2375
0.2694	13.63	600	0.4439	0.215
0.1746	15.90	700	0.4631	0.2025
0.1464	18.18	800	0.5627	0.185
0.1219	20.45	900	0.5898	0.1875
0.0924	22.72	1000	0.6369	0.2025
0.0981	25.0	1100	0.5955	0.19
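
The table shows that the final evaluation is not the best one on either metric. The sketch below copies the rows from the table and finds the evaluation steps with the lowest WER and the lowest validation loss. The `EvalRow` type is an illustrative helper, and whether intermediate checkpoints were saved or published is not stated here.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    training_loss: float
    epoch: float
    step: int
    validation_loss: float
    wer: float


# Rows copied from the training results table
RESULTS = [
    EvalRow(3.9756, 2.27, 100, 3.0001, 1.0),
    EvalRow(3.1017, 4.54, 200, 2.7959, 1.0),
    EvalRow(1.0941, 6.82, 300, 0.6419, 0.4625),
    EvalRow(0.4260, 9.09, 400, 0.4922, 0.295),
    EvalRow(0.2826, 11.36, 500, 0.4858, 0.2375),
    EvalRow(0.2694, 13.63, 600, 0.4439, 0.215),
    EvalRow(0.1746, 15.90, 700, 0.4631, 0.2025),
    EvalRow(0.1464, 18.18, 800, 0.5627, 0.185),
    EvalRow(0.1219, 20.45, 900, 0.5898, 0.1875),
    EvalRow(0.0924, 22.72, 1000, 0.6369, 0.2025),
    EvalRow(0.0981, 25.0, 1100, 0.5955, 0.19),
]

best_by_wer = min(RESULTS, key=lambda row: row.wer)               # step 800, WER 0.185
best_by_loss = min(RESULTS, key=lambda row: row.validation_loss)  # step 600, loss 0.4439
final = RESULTS[-1]                                               # step 1100, WER 0.19
```

Validation loss rises after step 600 while training loss keeps falling, which is a common sign of overfitting, although WER continues to improve until step 800.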

Applications

Automatic transcription of farmers’ oral queries in French with Senegalese accents
Preprocessing step for downstream classification models (theme, crop, intention)
Contribution to an end-to-end agricultural voice assistant

Framework Versions

Transformers 4.53.2
PyTorch 2.6.0+cu124
Datasets 2.14.4
Tokenizers 0.21.2

Contact

Feel free to use this model in your projects. For questions, suggestions or collaborations:

LinkedIn: Kadidiatou Diallo
Email: kadidiatou.diallo.k@gmail.com

Model size: 0.3B params (Safetensors, tensor type F32)
