t5-moe-xsum_hash

Model Description

A T5 model with Mixture-of-Experts (MoE) layers and hash routing, fine-tuned on the XSum dataset for abstractive summarization.

Architecture

This model uses Sparse Mixture of Experts with deterministic hash-based routing.

Key Features:

  • Hash-based routing (deterministic, no learned gating)
  • 8 expert networks per layer
  • Routing needs no gating computation and assigns the same experts on every run
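
As a rough illustration of the idea behind the features above, the sketch below maps each token id to a fixed set of experts without any learned gate. It is not the code used by this model: the function name `hash_route`, the choice of hashing on token ids, the `seed` parameter, and the way two experts are picked per token are all assumptions made for the example.

```python
import random


def hash_route(token_ids, num_experts=8, top_k=2, seed=0):
    """Return top_k distinct expert indices for each token id.

    Hypothetical helper: the assignment depends only on (seed, token id),
    so it involves no learned parameters and is the same on every call.
    """
    routes = []
    for token_id in token_ids:
        # Seed a private generator from the token id: same id -> same experts.
        rng = random.Random(seed * 1_000_003 + token_id)
        routes.append(rng.sample(range(num_experts), top_k))
    return routes


# The token id 101 appears twice and is sent to the same experts both times.
routes = hash_route([101, 7, 101, 42])
print(routes)
```

Because the mapping is a pure function of the token id, it can be computed once per vocabulary entry and stored as a lookup table instead of being recomputed per batch.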

Model Configuration

  • Base Model: google-t5/t5-small
  • Total Parameters: 236,667,392
  • Trainable Parameters: 236,667,392
  • Number of Experts: 8
  • Top-k: 2
  • Routing Strategy: hash
  • Load Balancing: Enabled
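
For readers who want to carry these settings around in code, the listed values could be collected in a small configuration object. The class name `MoEConfig` and its field names are hypothetical and chosen only for this sketch; the repository may organize its configuration differently.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class MoEConfig:
    """Hypothetical container mirroring the values listed in this card."""

    base_model: str = "google-t5/t5-small"
    num_experts: int = 8
    top_k: int = 2
    routing_strategy: str = "hash"
    load_balancing: bool = True

    def __post_init__(self):
        # Each token cannot be routed to more experts than exist.
        if not 1 <= self.top_k <= self.num_experts:
            raise ValueError("top_k must be between 1 and num_experts")


config = MoEConfig()
print(asdict(config))
```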

Training Data

The model was fine-tuned on the XSum dataset, which contains:

  • ~204k training examples
  • ~11k validation examples
  • ~11k test examples

Each example consists of a BBC news article and a one-sentence summary.

Usage

from transformers import T5Tokenizer

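# Load the tokenizer (replace YOUR_USERNAME with the Hub namespace that hosts the model)
tokenizer = T5Tokenizer.from_pretrained("YOUR_USERNAME/t5-moe-xsum_hash")

# Note: the MoE layers are not part of the stock T5 classes, so the
# architecture must be reconstructed with custom code before loading the weights.
# See the model repository for detailed loading instructions.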

Evaluation

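Evaluate generated summaries against the XSum reference summaries with standard ROUGE metrics (commonly reported as ROUGE-1, ROUGE-2, and ROUGE-L), and check their consistency with the source article using SummaC scores.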
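
To make the ROUGE part concrete, here is a deliberately simplified unigram-overlap score in the spirit of ROUGE-1 F1. It is a sketch only: it uses lowercased whitespace tokenization with no stemming, so its numbers will differ from those of an established ROUGE package, which should be used for any reported results. The function name `rouge1_f1` is an assumption of this example.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 over lowercased, whitespace-split tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as in the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat", "the cat sat")
print(round(score, 3))
```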

Training Procedure

The model was trained using:

  • AdamW optimizer with weight decay
  • Learning rate: 5e-5
  • Warmup steps: 500
  • Mixed precision (FP16) training
  • Gradient accumulation for larger effective batch size
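
The warmup and gradient-accumulation settings above can be illustrated with two small helpers. This is a sketch under assumptions: the card does not say what happens to the learning rate after warmup, so the example simply holds it constant, and the batch sizes in the usage lines are made-up values, not the ones used in training.

```python
def learning_rate(step, base_lr=5e-5, warmup_steps=500):
    """Linear warmup from 0 to base_lr, then constant (post-warmup shape is assumed)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr


def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Number of examples contributing to each optimizer update."""
    return per_device_batch * accumulation_steps * num_devices


# Halfway through warmup the rate is half of the peak value.
print(learning_rate(250), learning_rate(500))

# Hypothetical numbers: 8 examples per step, accumulated over 4 steps.
print(effective_batch_size(per_device_batch=8, accumulation_steps=4))
```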

Limitations

  • Trained only on English news articles
  • May not generalize well to other domains
  • MoE models require custom loading code

Citation

If you use this model, please cite the XSum dataset:

@inproceedings{narayan-etal-2018-dont,
    title = "Don{'}t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization",
    author = "Narayan, Shashi and Cohen, Shay B. and Lapata, Mirella",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    year = "2018",
}