Gamahea committed on
Commit 4a7ff7a · 1 Parent(s): 8b1bcac

Add Dataset Card for lemm-dataset repo

- Created comprehensive README for Gamahea/lemm-dataset
- Explains purpose: storage for LEMM LoRAs and training data
- Documents repository structure (loras/ and datasets/)
- Describes automatic sync behavior
- Includes usage examples
- Notes about gated access with automatic approval

Dataset Card uploaded to: https://huggingface.co/datasets/Gamahea/lemm-dataset

DATASET_README.md ADDED
@@ -0,0 +1,115 @@
+ ---
+ title: LEMM Training Data & LoRA Storage
+ tags:
+ - music-generation
+ - audio
+ - lora
+ - training-data
+ - diffrhythm2
+ license: mit
+ ---
+
+ # LEMM Dataset Storage
+
+ This dataset repository stores training data and LoRA adapters for **LEMM (Let Everyone Make Music)**, an AI music generation system.
+
+ ## 🎯 Purpose
+
+ This repository serves as persistent storage for:
+ - **LoRA Adapters**: Fine-tuned adapter weights for music generation
+ - **Prepared Datasets**: Training data extracted from various music datasets
+ - **Cross-rebuild Persistence**: Data survives HuggingFace Space rebuilds
+
+ ## 📁 Repository Structure
+
+ ```
+ lemm-dataset/
+ ├── loras/                    # LoRA adapter storage
+ │   ├── {lora_name}/          # Each LoRA in its own folder
+ │   │   ├── final_model.pt    # Trained LoRA weights
+ │   │   └── config.yaml       # Training configuration
+ │   └── ...
+ │
+ └── datasets/                 # Prepared training datasets
+     ├── {dataset_key}/        # Each dataset in its own folder
+     │   ├── train/            # Training samples
+     │   ├── val/              # Validation samples
+     │   └── metadata.json     # Dataset metadata
+     └── ...
+ ```
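As an illustration of the layout above, here is a minimal sketch that scans a locally downloaded copy of the repository and lists the LoRA folders containing both expected files. The function name `find_complete_loras` and the constant `LORA_FILES` are hypothetical helpers, not part of LEMM. Only the file names `final_model.pt` and `config.yaml` come from the tree.

```python
from pathlib import Path

# File names taken from the repository tree in the dataset card.
LORA_FILES = ("final_model.pt", "config.yaml")


def find_complete_loras(snapshot_root: str) -> list[str]:
    """Return the names of LoRA folders under loras/ that hold every expected file."""
    loras_dir = Path(snapshot_root) / "loras"
    if not loras_dir.is_dir():
        # Nothing downloaded yet, or the path is wrong.
        return []
    return sorted(
        entry.name
        for entry in loras_dir.iterdir()
        if entry.is_dir() and all((entry / name).is_file() for name in LORA_FILES)
    )
```

Calling `find_complete_loras(path)` on the folder returned by `snapshot_download` would give the LoRA names that are safe to load. Folders missing a weights or config file are skipped.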
+
+ ## 🔄 Automatic Sync
+
+ The LEMM Space automatically:
+ - **Downloads** all LoRAs and datasets on startup
+ - **Uploads** newly trained LoRAs after training completes
+ - **Uploads** newly prepared datasets after preparation
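The sync steps listed above could be approximated as follows. This is a sketch under stated assumptions, not the Space's actual code: the function names `repo_path_for_lora`, `download_all` and `upload_lora` are hypothetical, and the upload step assumes the caller is logged in with a token that has write access. `snapshot_download` and `HfApi.upload_folder` are real `huggingface_hub` calls and are imported lazily so the path helper can be used without the library installed.

```python
from pathlib import Path

REPO_ID = "Gamahea/lemm-dataset"  # repository named in the dataset card
REPO_TYPE = "dataset"


def repo_path_for_lora(lora_name: str) -> str:
    """Return the in-repo folder for a LoRA, following the loras/{lora_name}/ layout."""
    if not lora_name or "/" in lora_name:
        raise ValueError(f"invalid LoRA name: {lora_name!r}")
    return f"loras/{lora_name}"


def download_all(local_dir: str) -> str:
    """Startup step: fetch everything under loras/ and datasets/ into local_dir."""
    from huggingface_hub import snapshot_download  # lazy import; needs network access

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type=REPO_TYPE,
        allow_patterns=["loras/*", "datasets/*"],
        local_dir=local_dir,
    )


def upload_lora(lora_dir: str, lora_name: str) -> None:
    """Post-training step: push one local LoRA folder to loras/{lora_name}/."""
    from huggingface_hub import HfApi  # lazy import; upload needs a write token

    if not Path(lora_dir).is_dir():
        raise FileNotFoundError(lora_dir)
    HfApi().upload_folder(
        repo_id=REPO_ID,
        repo_type=REPO_TYPE,
        folder_path=lora_dir,
        path_in_repo=repo_path_for_lora(lora_name),
        commit_message=f"Upload LoRA {lora_name}",
    )
```

Keeping the path construction in one small function means the download patterns and the upload target cannot drift apart. Uploading a prepared dataset would follow the same shape with a `datasets/{dataset_key}` prefix.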
+
+ ## 🔐 Access Control
+
+ - **Visibility**: Public (anyone can view)
+ - **Access Requests**: Enabled with automatic approval
+ - **Purpose**: Allows the LEMM Space to read and write data
+
+ ## 🚀 Usage
+
+ ### From LEMM Space
+ Data syncs automatically; no manual intervention is needed.
+
+ ### From Your Own Code
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Download a specific LoRA (returns the local snapshot root; files are under loras/your_lora_name/)
+ lora_path = snapshot_download(
+     repo_id="Gamahea/lemm-dataset",
+     repo_type="dataset",
+     allow_patterns="loras/your_lora_name/*"
+ )
+
+ # Download all datasets (files are under datasets/ inside the returned folder)
+ datasets_path = snapshot_download(
+     repo_id="Gamahea/lemm-dataset",
+     repo_type="dataset",
+     allow_patterns="datasets/*"
+ )
+ ```
+
+ ## 📊 Supported Datasets
+
+ LEMM can prepare and train on:
+ - **GTZAN**: Music genre classification dataset
+ - **MusicCaps**: Google's music captioning dataset
+ - **Free Music Archive (FMA)**: Large-scale music dataset
+ - **Custom datasets**: Upload your own music collections
+
+ ## 🎵 LoRA Training
+
+ LoRA (Low-Rank Adaptation) allows efficient fine-tuning of DiffRhythm2 for:
+ - Specific music styles
+ - Genre specialization
+ - Artist emulation
+ - Custom sound aesthetics
+
+ ## 🛠️ Related Projects
+
+ - **LEMM Space**: [Gamahea/lemm-test-100](https://huggingface.co/spaces/Gamahea/lemm-test-100)
+ - **DiffRhythm2**: Advanced music generation with built-in vocals
+
+ ## 📝 License
+
+ MIT License. Feel free to use and modify.
+
+ ## 🤝 Contributing
+
+ This is a storage repository. To contribute to LEMM:
+ 1. Visit the [LEMM Space](https://huggingface.co/spaces/Gamahea/lemm-test-100)
+ 2. Train your own LoRAs
+ 3. Share your results with the community
+
+ ## ⚠️ Notes
+
+ - Data is organized for LEMM's automatic sync system
+ - Manual edits may be overwritten by Space operations
+ - Each LoRA/dataset includes configuration metadata
+ - Storage persists across Space rebuilds
upload_dataset_readme.ps1 ADDED
@@ -0,0 +1,35 @@
+ # Upload Dataset README to HuggingFace
+ # This uploads the Dataset Card to Gamahea/lemm-dataset
+
+ Write-Host "📤 Uploading Dataset Card to Gamahea/lemm-dataset..." -ForegroundColor Cyan
+
+ # Check if huggingface_hub is installed
+ python -c "import huggingface_hub" 2>$null
+ if ($LASTEXITCODE -ne 0) {
+     Write-Host "❌ huggingface_hub not installed. Installing..." -ForegroundColor Yellow
+     python -m pip install huggingface_hub
+ }
+
+ # Upload README to dataset repo
+ python -c @"
+ from huggingface_hub import HfApi
+
+ api = HfApi()
+
+ # Upload README as Dataset Card; exit non-zero on failure so PowerShell can detect it
+ try:
+     api.upload_file(
+         repo_id='Gamahea/lemm-dataset',
+         repo_type='dataset',
+         path_or_fileobj='DATASET_README.md',
+         path_in_repo='README.md',
+         commit_message='Add comprehensive Dataset Card documentation'
+     )
+     print('✅ Dataset Card uploaded successfully!')
+ except Exception as e:
+     print(f'❌ Upload failed: {e}')
+     print('💡 Make sure you are logged in: huggingface-cli login')
+     raise SystemExit(1)
+ "@
+
+ if ($LASTEXITCODE -eq 0) { Write-Host "`n✅ Done! Check https://huggingface.co/datasets/Gamahea/lemm-dataset" -ForegroundColor Green } else { exit 1 }
upload_dataset_readme.py ADDED
@@ -0,0 +1,31 @@
+ """
+ Upload Dataset README to HuggingFace lemm-dataset repo
+ """
+ import sys
+ from huggingface_hub import HfApi
+
+ def main():
+     api = HfApi()
+
+     print("📤 Uploading Dataset Card to Gamahea/lemm-dataset...")
+
+     try:
+         # Upload README as Dataset Card
+         api.upload_file(
+             repo_id='Gamahea/lemm-dataset',
+             repo_type='dataset',
+             path_or_fileobj='DATASET_README.md',
+             path_in_repo='README.md',
+             commit_message='Add comprehensive Dataset Card documentation'
+         )
+         print('✅ Dataset Card uploaded successfully!')
+         print('🔗 View at: https://huggingface.co/datasets/Gamahea/lemm-dataset')
+     except Exception as e:
+         print(f'❌ Upload failed: {e}')
+         print('💡 Make sure you are logged in: huggingface-cli login')
+         return 1
+
+     return 0
+
+ if __name__ == '__main__':
+     sys.exit(main())