# Masked Diffusion Language Model - Bimodal Gaussian Schedule

Oral Presentation at BabyLM Workshop @ EMNLP 2025
This model is a Masked Diffusion Language Model (MDLM) trained with a Bimodal Gaussian noise schedule and frequency-informed masking for the BabyLM Challenge 2025.
## Model Details
- Model Type: Masked Diffusion Language Model
- Training Data: BabyLM corpus (100M words, strict track)
- Sequence Length: 512 tokens
- Noise Schedule: Bimodal Gaussian
- Masking Strategy: Frequency-informed with curriculum learning
- Tokenizer: BPE with 16,384 vocabulary size
## Training Approach

This model uses a diffusion-based training objective that combines the following (a code sketch of the schedule and masking follows the list):
- Bimodal Gaussian noise schedule
- Bidirectional context modeling
- Frequency-informed masking (prioritizing rare tokens)
- NELBO weighting with derivative softening (γ = 0.1)
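The sketch below is a minimal, hypothetical illustration of how a bimodal Gaussian masking schedule and frequency-informed masking could fit together; the mixture means, standard deviations, weights, and helper names are assumptions for illustration, not the exact values or code used to train this model.

```python
import torch

def sample_mask_rate(batch_size, means=(0.25, 0.75), stds=(0.1, 0.1), weight=0.5):
    """Sample per-sequence mask rates from a two-component (bimodal) Gaussian mixture.

    The means, stds, and mixture weight here are illustrative placeholders.
    """
    component = torch.bernoulli(torch.full((batch_size,), weight)).long()
    mean = torch.tensor(means)[component]
    std = torch.tensor(stds)[component]
    return torch.normal(mean, std).clamp(0.01, 0.99)

def frequency_informed_mask(token_ids, token_freqs, mask_rate, mask_id):
    """Mask tokens with probability biased toward rare tokens.

    token_ids:   (batch, seq_len) input ids
    token_freqs: (vocab_size,) corpus token frequencies, assumed precomputed
    mask_rate:   (batch,) target masking fraction per sequence
    """
    inv_freq = 1.0 / (token_freqs[token_ids] + 1e-6)               # rarer tokens get larger weight
    probs = inv_freq / inv_freq.sum(dim=-1, keepdim=True)          # normalize weights per sequence
    probs = probs * mask_rate.unsqueeze(-1) * token_ids.size(-1)   # rescale so roughly mask_rate of tokens are masked
    mask = torch.bernoulli(probs.clamp(0.0, 1.0)).bool()
    masked_ids = torch.where(mask, torch.full_like(token_ids, mask_id), token_ids)
    return masked_ids, mask
```

For example, `frequency_informed_mask(batch, freqs, sample_mask_rate(batch.size(0)), mask_id)` would produce a partially masked batch in which rare tokens are masked more often than frequent ones.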
## Performance
Performance on BabyLM Challenge zero-shot tasks:
| Task | Score |
|---|---|
| BLiMP | 78.2 |
| BLiMP Supplement | 73.6 |
| EWoK | 52.5 |
| COMPS | 56.6 |
| Entity Tracking | 39.7 |
## Usage
```python
from transformers import AutoTokenizer
import torch

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("despoinakk/diffusion_gaussian_babylm")

# Load the model (custom modeling code required)
# See: https://github.com/DespoinaKK/babylm-diffusion
```
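As a minimal, self-contained example of using the tokenizer on its own (the sentence is illustrative; running the diffusion model itself requires the custom code from the repository above):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("despoinakk/diffusion_gaussian_babylm")

# Encode a sentence with the 16,384-entry BPE vocabulary; sequences are capped at 512 tokens
encoded = tokenizer(
    "The child is reading a picture book.",
    return_tensors="pt",
    truncation=True,
    max_length=512,
)
print(encoded["input_ids"].shape)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"][0]))
```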
## Citation
If you use this model, please cite:
TBA
## Links
- Paper: [arXiv:2509.05056](https://arxiv.org/abs/2509.05056)
- Code: [GitHub Repository](https://github.com/DespoinaKK/babylm-diffusion)
- Cosine Schedule Model: [despoinakk/diffusion_cosine_babylm](https://huggingface.co/despoinakk/diffusion_cosine_babylm)
## Contact
- Despoina Kosmopoulou: [email protected]
- Efthymios Georgiou: [email protected]
## Acknowledgments
Based on work from: