---
title: LEMM Training Data & LoRA Storage
tags:
- music-generation
- audio
- lora
- training-data
- diffrhythm2
license: mit
---

# LEMM Dataset Storage

This dataset repository stores training data and LoRA adapters for **LEMM (Let Everyone Make Music)** - an advanced AI music generation system.

## 🎯 Purpose

This repository serves as persistent storage for:

- **LoRA Adapters**: Fine-tuned music generation models
- **Prepared Datasets**: Training data extracted from various music datasets
- **Cross-rebuild Persistence**: Data survives HuggingFace Space rebuilds

## 📁 Repository Structure

```
lemm-dataset/
├── loras/                    # LoRA adapter storage
│   ├── {lora_name}/          # Each LoRA in its own folder
│   │   ├── final_model.pt    # Trained LoRA weights
│   │   └── config.yaml       # Training configuration
│   └── ...
│
└── datasets/                 # Prepared training datasets
    ├── {dataset_key}/        # Each dataset in its own folder
    │   ├── train/            # Training samples
    │   ├── val/              # Validation samples
    │   └── metadata.json     # Dataset metadata
    └── ...
```

## 🔄 Automatic Sync

The LEMM Space automatically:

- **Downloads** all LoRAs and datasets on startup
- **Uploads** newly trained LoRAs after training completes
- **Uploads** newly prepared datasets after preparation

## 🔐 Access Control

- **Visibility**: Public (anyone can view)
- **Access Requests**: Enabled with automatic approval
- **Purpose**: Allows LEMM Space to read/write data

## 🚀 Usage

### From LEMM Space

Data syncs automatically - no manual intervention needed.

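
To see what the automatic sync has to work with, the layout described under Repository Structure can be summarized from a plain list of file paths. The following is a minimal sketch and not part of LEMM itself: `summarize_layout` is a hypothetical helper, and the example paths (including `jazz_v1` and the `.wav` sample name) are made up for illustration. A live listing could come from `huggingface_hub.list_repo_files`, which is not called here so the example runs offline.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_layout(repo_files):
    """Group repo-relative paths by LoRA name and dataset key (hypothetical helper)."""
    summary = {"loras": defaultdict(list), "datasets": defaultdict(list)}
    for path in repo_files:
        parts = PurePosixPath(path).parts
        # Expect at least <top>/<name>/<file>; skip anything else (e.g. README.md).
        if len(parts) < 3 or parts[0] not in summary:
            continue
        summary[parts[0]][parts[1]].append("/".join(parts[2:]))
    return {kind: dict(entries) for kind, entries in summary.items()}


# Illustrative paths only. A real listing might be obtained with:
#   list_repo_files("Gamahea/lemm-dataset", repo_type="dataset")
example_files = [
    "README.md",
    "loras/jazz_v1/final_model.pt",
    "loras/jazz_v1/config.yaml",
    "datasets/gtzan/metadata.json",
    "datasets/gtzan/train/sample_0001.wav",
]

layout = summarize_layout(example_files)
print(sorted(layout["loras"]))
print(layout["datasets"]["gtzan"])
```

Keeping the helper a pure function over path strings means it can be checked without network access or credentials.
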
### From Your Own Code

```python
from huggingface_hub import snapshot_download

# Download a specific LoRA (replace your_lora_name with the folder name under loras/)
lora_path = snapshot_download(
    repo_id="Gamahea/lemm-dataset",
    repo_type="dataset",
    allow_patterns="loras/your_lora_name/*"
)

# Download all datasets
datasets_path = snapshot_download(
    repo_id="Gamahea/lemm-dataset",
    repo_type="dataset",
    allow_patterns="datasets/*"
)
```

## 📊 Supported Datasets

LEMM can prepare and train on:

- **GTZAN**: Music genre classification dataset
- **MusicCaps**: Google's music captioning dataset
- **Free Music Archive (FMA)**: Large-scale music dataset
- **Custom datasets**: Upload your own music collections

## 🎵 LoRA Training

LoRA (Low-Rank Adaptation) allows efficient fine-tuning of DiffRhythm2 for:

- Specific music styles
- Genre specialization
- Artist emulation
- Custom sound aesthetics

## 🛠️ Related Projects

- **LEMM Space**: [Gamahea/lemm-test-100](https://huggingface.co/spaces/Gamahea/lemm-test-100)
- **DiffRhythm2**: Advanced music generation with built-in vocals

## 📝 License

MIT License - Feel free to use and modify

## 🤝 Contributing

This is a storage repository. To contribute to LEMM:

1. Visit the [LEMM Space](https://huggingface.co/spaces/Gamahea/lemm-test-100)
2. Train your own LoRAs
3. Share your results with the community

## ⚠️ Notes

- Data is organized for LEMM's automatic sync system
- Manual edits may be overwritten by Space operations
- Each LoRA/dataset includes configuration metadata
- Storage persists across Space rebuilds
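
Since each LoRA is expected to ship with its weights and configuration, a downloaded snapshot can be sanity-checked before use. The sketch below is an illustration under assumptions rather than LEMM's own logic: `find_complete_loras` and `EXPECTED_LORA_FILES` are hypothetical names, and the two file names are taken from the Repository Structure section. The demo builds a throwaway directory so it runs without network access.

```python
import tempfile
from pathlib import Path

# File names assumed from the Repository Structure section of this card.
EXPECTED_LORA_FILES = ("final_model.pt", "config.yaml")


def find_complete_loras(snapshot_root):
    """Return {lora_name: folder} for LoRA folders that contain every expected file."""
    loras_dir = Path(snapshot_root) / "loras"
    if not loras_dir.is_dir():
        return {}
    complete = {}
    for folder in sorted(loras_dir.iterdir()):
        has_all = all((folder / name).is_file() for name in EXPECTED_LORA_FILES)
        if folder.is_dir() and has_all:
            complete[folder.name] = folder
    return complete


# Demo on a temporary directory; with real data, pass the path that
# snapshot_download returned instead.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "loras" / "demo_lora"
    good.mkdir(parents=True)
    for name in EXPECTED_LORA_FILES:
        (good / name).touch()
    # A folder missing both files should be skipped.
    (Path(tmp) / "loras" / "partial_lora").mkdir()
    found = sorted(find_complete_loras(tmp))

print(found)
```

The check only confirms that the files exist. It does not validate the weights or parse `config.yaml`.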