---
title: LEMM Training Data & LoRA Storage
tags:
- music-generation
- audio
- lora
- training-data
- diffrhythm2
license: mit
---

# LEMM Dataset Storage

This dataset repository stores training data and LoRA adapters for **LEMM (Let Everyone Make Music)**, an AI music generation system.

## 🎯 Purpose

This repository serves as persistent storage for:
- **LoRA Adapters**: Fine-tuned adapter weights for the music generation model
- **Prepared Datasets**: Training data extracted from various music datasets
- **Cross-rebuild Persistence**: Data survives Hugging Face Space rebuilds

## 📁 Repository Structure

```
lemm-dataset/
├── loras/                    # LoRA adapter storage
│   ├── {lora_name}/         # Each LoRA in its own folder
│   │   ├── final_model.pt   # Trained LoRA weights
│   │   └── config.yaml      # Training configuration
│   └── ...
│
└── datasets/                 # Prepared training datasets
    ├── {dataset_key}/       # Each dataset in its own folder
    │   ├── train/           # Training samples
    │   ├── val/             # Validation samples
    │   └── metadata.json    # Dataset metadata
    └── ...
```
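
The layout above can be checked programmatically once a copy of the repository is on disk. The following is a minimal sketch, not LEMM's own code: `list_loras` is a hypothetical helper, and it assumes only the file names shown in the tree (`final_model.pt` and `config.yaml`).

```python
from pathlib import Path

# File names taken from the tree above; the helper itself is illustrative.
EXPECTED_LORA_FILES = ("final_model.pt", "config.yaml")


def list_loras(repo_root):
    """Map each LoRA folder name to the list of expected files it is missing."""
    loras_dir = Path(repo_root) / "loras"
    if not loras_dir.is_dir():
        return {}
    report = {}
    for lora_dir in sorted(p for p in loras_dir.iterdir() if p.is_dir()):
        missing = [name for name in EXPECTED_LORA_FILES
                   if not (lora_dir / name).is_file()]
        report[lora_dir.name] = missing
    return report
```

An empty list for a LoRA means both expected files are present.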

## 🔄 Automatic Sync

The LEMM Space automatically:
- **Downloads** all LoRAs and datasets on startup
- **Uploads** newly trained LoRAs after training completes
- **Uploads** newly prepared datasets after preparation
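
The upload step can be approximated with `HfApi.upload_folder` from `huggingface_hub`. This is a sketch under assumptions, not the Space's actual sync code: `upload_lora` is a hypothetical helper, the `loras/{lora_name}` target path is inferred from the repository structure, and the call only succeeds with a token that has write access to the repository.

```python
from pathlib import Path


def upload_lora(lora_dir, repo_id="Gamahea/lemm-dataset", api=None):
    """Upload one local LoRA folder to loras/{folder name} in the dataset repo."""
    lora_dir = Path(lora_dir)
    if api is None:
        # Imported lazily so the helper can be exercised with a stand-in client.
        from huggingface_hub import HfApi
        api = HfApi()
    path_in_repo = f"loras/{lora_dir.name}"
    api.upload_folder(
        folder_path=str(lora_dir),
        path_in_repo=path_in_repo,
        repo_id=repo_id,
        repo_type="dataset",
        commit_message=f"Add LoRA {lora_dir.name}",
    )
    return path_in_repo
```

Accepting the client as a parameter keeps the path logic testable without network access; a prepared dataset could be uploaded the same way with a `datasets/{dataset_key}` target.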

## 🔐 Access Control

- **Visibility**: Public (anyone can view)
- **Access Requests**: Enabled with automatic approval
- **Purpose**: Allows the LEMM Space to read and write data

## 🚀 Usage

### From LEMM Space
Data syncs automatically; no manual intervention is needed.

### From Your Own Code
```python
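from pathlib import Path

from huggingface_hub import snapshot_download

# Download a specific LoRA. snapshot_download returns the local snapshot root,
# so the LoRA files live under loras/your_lora_name/ inside it.
snapshot_path = snapshot_download(
    repo_id="Gamahea/lemm-dataset",
    repo_type="dataset",
    allow_patterns="loras/your_lora_name/*"
)
lora_path = Path(snapshot_path) / "loras" / "your_lora_name"

# Download all datasets; they sit under datasets/ in the snapshot root.
snapshot_path = snapshot_download(
    repo_id="Gamahea/lemm-dataset",
    repo_type="dataset",
    allow_patterns="datasets/*"
)
datasets_path = Path(snapshot_path) / "datasets"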
```
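
`snapshot_download` returns the local root of the snapshot, so files keep their repository-relative paths beneath it. As a minimal sketch, the hypothetical helper below summarizes one prepared dataset from such a snapshot. It assumes only the layout shown in the repository structure; the schema of `metadata.json` is not documented here, so the file is returned as parsed without interpreting any keys.

```python
import json
from pathlib import Path


def summarize_dataset(snapshot_root, dataset_key):
    """Count files per split and load metadata.json for one prepared dataset."""
    dataset_dir = Path(snapshot_root) / "datasets" / dataset_key
    summary = {}
    for split in ("train", "val"):
        split_dir = dataset_dir / split
        # Count files recursively; the per-sample file format is not assumed.
        summary[split] = (
            sum(1 for p in split_dir.rglob("*") if p.is_file())
            if split_dir.is_dir() else 0
        )
    metadata_file = dataset_dir / "metadata.json"
    summary["metadata"] = (
        json.loads(metadata_file.read_text(encoding="utf-8"))
        if metadata_file.is_file() else None
    )
    return summary
```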

## 📊 Supported Datasets

LEMM can prepare and train on:
- **GTZAN**: Music genre classification dataset
- **MusicCaps**: Google's music captioning dataset
- **Free Music Archive (FMA)**: Large-scale music dataset
- **Custom datasets**: Upload your own music collections

## 🎵 LoRA Training

LoRA (Low-Rank Adaptation) allows efficient fine-tuning of DiffRhythm2 for:
- Specific music styles
- Genre specialization
- Artist emulation
- Custom sound aesthetics
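
The efficiency comes from training a low-rank update instead of a full weight matrix: for a layer with a `d_out x d_in` weight, LoRA trains two factors of shapes `d_out x r` and `r x d_in`. The arithmetic below illustrates the saving. The dimensions and rank are illustrative assumptions, not DiffRhythm2's actual layer sizes or LEMM's training settings.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable parameter counts for one linear layer."""
    full = d_in * d_out            # updating the whole weight matrix
    lora = rank * (d_in + d_out)   # factors B (d_out x r) and A (r x d_in)
    return full, lora


full, lora = lora_param_counts(d_in=1024, d_out=1024, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.1%}")
# full: 1,048,576  lora: 16,384  ratio: 1.6%
```

The small adapter size is also why each LoRA can be stored and synced as its own folder.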

## 🛠️ Related Projects

- **LEMM Space**: [Gamahea/lemm-test-100](https://huggingface.co/spaces/Gamahea/lemm-test-100)
- **DiffRhythm2**: Advanced music generation with built-in vocals

## 📝 License

MIT License. Feel free to use and modify.

## 🤝 Contributing

This is a storage repository. To contribute to LEMM:
1. Visit the [LEMM Space](https://huggingface.co/spaces/Gamahea/lemm-test-100)
2. Train your own LoRAs
3. Share your results with the community

## ⚠️ Notes

- Data is organized for LEMM's automatic sync system
- Manual edits may be overwritten by Space operations
- Each LoRA/dataset includes configuration metadata
- Storage persists across Space rebuilds