Spaces: Gamahea/lemm-test-100
Gamahea committed 4 days ago

Commit

4a7ff7a

1 Parent(s): 8b1bcac

Add Dataset Card for lemm-dataset repo

- Created comprehensive README for Gamahea/lemm-dataset
- Explains purpose: storage for LEMM LoRAs and training data
- Documents repository structure (loras/ and datasets/)
- Describes automatic sync behavior
- Includes usage examples
- Notes gated access with automatic approval

Dataset Card uploaded to: https://huggingface.co/datasets/Gamahea/lemm-dataset

Files changed (3)

DATASET_README.md +115 -0
upload_dataset_readme.ps1 +35 -0
upload_dataset_readme.py +31 -0

DATASET_README.md ADDED

	@@ -0,0 +1,115 @@

+---
+title: LEMM Training Data & LoRA Storage
+tags:
+- music-generation
+- audio
+- lora
+- training-data
+- diffrhythm2
+license: mit
+---
+# LEMM Dataset Storage
+This dataset repository stores training data and LoRA adapters for **LEMM (Let Everyone Make Music)** - an advanced AI music generation system.
+## 🎯 Purpose
+This repository serves as persistent storage for:
+- **LoRA Adapters**: Fine-tuned music generation models
+- **Prepared Datasets**: Training data extracted from various music datasets
+- **Cross-rebuild Persistence**: Data survives HuggingFace Space rebuilds
+## 📁 Repository Structure
+```
+lemm-dataset/
+├── loras/                    # LoRA adapter storage
+│   ├── {lora_name}/         # Each LoRA in its own folder
+│   │   ├── final_model.pt   # Trained LoRA weights
+│   │   └── config.yaml      # Training configuration
+│   └── ...
+│
+└── datasets/                 # Prepared training datasets
+    ├── {dataset_key}/       # Each dataset in its own folder
+    │   ├── train/           # Training samples
+    │   ├── val/             # Validation samples
+    │   └── metadata.json    # Dataset metadata
+    └── ...
+```
+## 🔄 Automatic Sync
+The LEMM Space automatically:
+- **Downloads** all LoRAs and datasets on startup
+- **Uploads** newly trained LoRAs after training completes
+- **Uploads** newly prepared datasets after preparation
+## 🔐 Access Control
+- **Visibility**: Public (anyone can view)
+- **Access Requests**: Enabled with automatic approval
+- **Purpose**: Allows LEMM Space to read/write data
+## 🚀 Usage
+### From LEMM Space
+Data syncs automatically - no manual intervention needed.
+### From Your Own Code
+```python
+from huggingface_hub import snapshot_download
+# Download a specific LoRA (returns the local snapshot root, not the LoRA folder itself)
+lora_path = snapshot_download(
+    repo_id="Gamahea/lemm-dataset",
+    repo_type="dataset",
+    allow_patterns="loras/your_lora_name/*"
+)
+# Download all datasets
+datasets_path = snapshot_download(
+    repo_id="Gamahea/lemm-dataset",
+    repo_type="dataset",
+    allow_patterns="datasets/*"
+)
+```
+## 📊 Supported Datasets
+LEMM can prepare and train on:
+- **GTZAN**: Music genre classification dataset
+- **MusicCaps**: Google's music captioning dataset
+- **Free Music Archive (FMA)**: Large-scale music dataset
+- **Custom datasets**: Upload your own music collections
+## 🎵 LoRA Training
+LoRA (Low-Rank Adaptation) allows efficient fine-tuning of DiffRhythm2 for:
+- Specific music styles
+- Genre specialization
+- Artist emulation
+- Custom sound aesthetics
+## 🛠️ Related Projects
+- **LEMM Space**: [Gamahea/lemm-test-100](https://huggingface.co/spaces/Gamahea/lemm-test-100)
+- **DiffRhythm2**: Advanced music generation with built-in vocals
+## 📝 License
+MIT License - Feel free to use and modify
+## 🤝 Contributing
+This is a storage repository. To contribute to LEMM:
+1. Visit the [LEMM Space](https://huggingface.co/spaces/Gamahea/lemm-test-100)
+2. Train your own LoRAs
+3. Share your results with the community
+## ⚠️ Notes
+- Data is organized for LEMM's automatic sync system
+- Manual edits may be overwritten by Space operations
+- Each LoRA/dataset includes configuration metadata
+- Storage persists across Space rebuilds
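
Review note on the README above: the documented layout puts each LoRA in `loras/{lora_name}/` with `final_model.pt` and `config.yaml`. A minimal sketch of how a consumer might check a downloaded snapshot against that layout follows. The function name `list_complete_loras` is hypothetical and not part of LEMM. It assumes only the two file names shown in the README tree.

```python
from pathlib import Path


def list_complete_loras(snapshot_root: Path) -> list[str]:
    """Return names of LoRA folders that contain both documented files."""
    loras_dir = snapshot_root / "loras"
    if not loras_dir.is_dir():
        return []
    # File names taken from the README's repository structure diagram.
    required = ("final_model.pt", "config.yaml")
    return sorted(
        entry.name
        for entry in loras_dir.iterdir()
        if entry.is_dir() and all((entry / name).is_file() for name in required)
    )
```

Passing the path returned by `snapshot_download` as `snapshot_root` would list the LoRAs that arrived intact. Folders missing either file are skipped rather than reported.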

upload_dataset_readme.ps1 ADDED

	@@ -0,0 +1,35 @@

+# Upload Dataset README to HuggingFace
+# This uploads the Dataset Card to Gamahea/lemm-dataset
+Write-Host "📤 Uploading Dataset Card to Gamahea/lemm-dataset..." -ForegroundColor Cyan
+# Check if huggingface_hub is installed
+python -c "import huggingface_hub" 2>$null
+if ($LASTEXITCODE -ne 0) {
+    Write-Host "❌ huggingface_hub not installed. Installing..." -ForegroundColor Yellow
+    pip install huggingface-hub
+}
+# Upload README to dataset repo
+python -c @"
+from huggingface_hub import HfApi
+from pathlib import Path
+api = HfApi()
+# Upload README as Dataset Card
+try:
+    api.upload_file(
+        repo_id='Gamahea/lemm-dataset',
+        repo_type='dataset',
+        path_or_fileobj='DATASET_README.md',
+        path_in_repo='README.md',
+        commit_message='Add comprehensive Dataset Card documentation'
+    )
+    print('✅ Dataset Card uploaded successfully!')
+except Exception as e:
+    print(f'❌ Upload failed: {e}')
+    print('💡 Make sure you are logged in: huggingface-cli login')
+"@
+Write-Host "`n🔗 Finished. Verify the card at https://huggingface.co/datasets/Gamahea/lemm-dataset" -ForegroundColor Green
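
Review note on the script above: the inline Python catches every exception and then exits with status 0, so PowerShell has no signal that the upload failed. One way to surface the failure is to map the outcome to an exit code, sketched below. `run_upload` is a hypothetical helper, and the upload itself is passed in as a callable so the pattern can be shown without network access.

```python
import sys
from typing import Callable


def run_upload(upload: Callable[[], None]) -> int:
    """Run an upload callable and map its outcome to a process exit code."""
    try:
        upload()
    except Exception as exc:  # report any failure, then signal it to the caller
        print(f"Upload failed: {exc}", file=sys.stderr)
        return 1
    print("Dataset Card uploaded successfully.")
    return 0
```

If the inline script ended with `sys.exit(run_upload(...))`, the PowerShell side could test `$LASTEXITCODE` after the `python -c` call, as it already does for the import check.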

upload_dataset_readme.py ADDED

	@@ -0,0 +1,31 @@

+"""
+Upload Dataset README to HuggingFace lemm-dataset repo
+"""
+from huggingface_hub import HfApi
+from pathlib import Path
+def main():
+    api = HfApi()
+    print("📤 Uploading Dataset Card to Gamahea/lemm-dataset...")
+    try:
+        # Upload README as Dataset Card
+        api.upload_file(
+            repo_id='Gamahea/lemm-dataset',
+            repo_type='dataset',
+            path_or_fileobj='DATASET_README.md',
+            path_in_repo='README.md',
+            commit_message='Add comprehensive Dataset Card documentation'
+        )
+        print('✅ Dataset Card uploaded successfully!')
+        print('🔗 View at: https://huggingface.co/datasets/Gamahea/lemm-dataset')
+    except Exception as e:
+        print(f'❌ Upload failed: {e}')
+        print('💡 Make sure you are logged in: huggingface-cli login')
+        return 1
+    return 0
+if __name__ == '__main__':
+    raise SystemExit(main())
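
Review note on the script above: it uploads `DATASET_README.md` without checking that the card's YAML front matter is present. A minimal pre-upload check could look like the sketch below. `front_matter_keys` is a hypothetical helper, and it is deliberately not a YAML parser: it only recognises unindented `key:` lines between the opening and closing `---` delimiters.

```python
def front_matter_keys(card_text: str) -> set[str]:
    """Return the top-level keys of a card's leading YAML front matter."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()  # the card does not start with front matter
    keys: set[str] = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys  # closing delimiter found
        is_top_level = not line[:1].isspace() and not line.startswith("-")
        if line and is_top_level and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return set()  # front matter was never closed
```

Calling this on the card text before `api.upload_file` and confirming that `license` and `tags` are in the result would catch a card whose metadata block was truncated or dropped.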