Masked Diffusion Language Model - Bimodal Gaussian Schedule

🎤 Oral Presentation at the BabyLM Workshop @ EMNLP 2025

This model is a Masked Diffusion Language Model (MDLM) trained with a Bimodal Gaussian noise schedule and frequency-informed masking for the BabyLM Challenge 2025.

Model Details

  • Model Type: Masked Diffusion Language Model
  • Training Data: BabyLM corpus (100M words, strict track)
  • Sequence Length: 512 tokens
  • Noise Schedule: Bimodal Gaussian
  • Masking Strategy: Frequency-informed with curriculum learning
  • Tokenizer: BPE with 16,384 vocabulary size
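
These details can be sanity-checked directly against the released tokenizer. The following is a minimal sketch that assumes only the Hugging Face repo id shown in the Usage section below:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("despoinakk/diffusion_gaussian_babylm")

# Vocabulary size should line up with the 16,384 BPE entries listed above
# (the exact count may shift slightly depending on added special tokens).
print(len(tokenizer))

# Encode a sample, truncating to the 512-token training context.
ids = tokenizer("A quick sanity check.", truncation=True, max_length=512)["input_ids"]
print(len(ids))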

Training Approach

This model uses a diffusion-based training objective that combines the following components (illustrated in the sketch after this list):

  • Bimodal Gaussian noise schedule
  • Bidirectional context modeling
  • Frequency-informed masking (prioritizing rare tokens)
  • NELBO weighting with derivative softening (γ = 0.1)

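The first two ingredients can be illustrated with a short sketch. The exact schedule parameters are not reproduced here: all constants below (mixture means, standard deviations, weights, the dummy mask_id) and the helper names are illustrative assumptions, not the values used to train this model.

import torch

def sample_mask_rate(batch_size, means=(0.25, 0.75), stds=(0.1, 0.1), weights=(0.5, 0.5)):
    # Draw one masking rate per example from a two-component (bimodal)
    # Gaussian mixture, concentrating training on two noise regimes.
    comp = torch.multinomial(torch.tensor(weights), batch_size, replacement=True)
    mu = torch.tensor(means)[comp]
    sigma = torch.tensor(stds)[comp]
    return torch.normal(mu, sigma).clamp(0.01, 0.99)  # keep rates in a valid range

def frequency_informed_mask(input_ids, rate, token_freqs, mask_id):
    # Tilt per-token masking probability toward rare tokens: rarer tokens
    # get a larger weight and are therefore masked (and learned) more often.
    inv_freq = 1.0 / token_freqs[input_ids].clamp(min=1.0)
    probs = inv_freq / inv_freq.mean(dim=-1, keepdim=True)   # mean-normalize weights
    probs = (probs * rate.unsqueeze(-1)).clamp(0.0, 1.0)     # scale to the target rate
    mask = torch.bernoulli(probs).bool()
    return input_ids.masked_fill(mask, mask_id), mask

# Hypothetical usage with dummy data (shapes only):
# ids = torch.randint(0, 16384, (8, 512))
# freqs = torch.ones(16384)                  # real per-token corpus counts would go here
# noisy, mask = frequency_informed_mask(ids, sample_mask_rate(8), freqs, mask_id=4)
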
Performance

Performance on BabyLM Challenge zero-shot tasks:

Task               Score
BLiMP              78.2
BLiMP Supplement   73.6
EWoK               52.5
COMPS              56.6
Entity Tracking    39.7

Usage

from transformers import AutoTokenizer

# Load the tokenizer (BPE, 16,384-token vocabulary)
tokenizer = AutoTokenizer.from_pretrained("despoinakk/diffusion_gaussian_babylm")

# The model itself requires the custom modeling code from the repository:
# https://github.com/DespoinaKK/babylm-diffusion
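
The repository's own sampler is authoritative; the sketch below only illustrates the generic pattern of masked-diffusion decoding, where the most confident masked positions are committed over several steps. Here model is a placeholder for the custom class from the repository, assumed to return token logits of shape (batch, length, vocab):

import torch

@torch.no_grad()
def iterative_unmask(model, tokenizer, length=64, steps=8, device="cpu"):
    # Start from an all-[MASK] sequence and progressively commit the
    # highest-confidence predictions (illustrative only; the repo's
    # actual sampling procedure may differ).
    mask_id = tokenizer.mask_token_id  # assumes the tokenizer defines a mask token
    ids = torch.full((1, length), mask_id, dtype=torch.long, device=device)
    for step in range(steps):
        logits = model(ids).logits
        conf, pred = logits.softmax(-1).max(-1)
        still_masked = ids.eq(mask_id)
        # Target a growing fraction of unmasked positions each step.
        target = int(length * (step + 1) / steps)
        k = max(1, target - int((~still_masked).sum()))
        conf = conf.masked_fill(~still_masked, -1.0)  # only masked slots compete
        top = conf.topk(min(k, int(still_masked.sum())), dim=-1).indices
        ids[0, top[0]] = pred[0, top[0]]
    return tokenizer.decode(ids[0])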

Citation

If you use this model, please cite:

TBA
