Gamahea committed
Commit e1e9d05 · verified · 1 Parent(s): 5cc20bf

Add ZeroGPU authentication requirements

Files changed (1)
  1. README.md +79 -447
README.md CHANGED
@@ -1,447 +1,79 @@
- ---
- title: LEMM - Let Everyone Make Music
- emoji: 🎵
- colorFrom: purple
- colorTo: pink
- sdk: gradio
- sdk_version: "4.44.1"
- app_file: app.py
- pinned: false
- license: mit
- hf_oauth: true
- ---
-
- # LEMM - Let Everyone Make Music
-
- **Version 1.0.0 (Beta)**
-
- An advanced AI music generation system with **training capabilities**, built-in vocals, professional mastering, and audio enhancement. Powered by DiffRhythm2 with LoRA fine-tuning support.
-
- 🎵 **Live Demo**: [Try LEMM on HuggingFace Spaces](https://huggingface.co/spaces/Gamahea/lemm-test-100)
- 📦 **LoRA Collection**: [Browse Trained Models](https://huggingface.co/collections/Gamahea/lemm-100-pre-beta)
- 🏢 **Organization**: [lemm-ai on GitHub](https://github.com/lemm-ai)
-
- ---
-
- ## ✨ Key Features
-
- ### 🎵 Music Generation
- - **Text-to-Music**: Generate music from style descriptions
- - **Built-in Vocals**: DiffRhythm2 generates vocals directly with the music (no separate TTS)
- - **Style Consistency**: New clips inherit musical character from existing ones
- - **Flexible Duration**: 10-120 second clips
-
- ### 🎓 LoRA Training
- - **Custom Style Training**: Fine-tune on your own music datasets
- - **Public Datasets**: GTZAN, MusicCaps, and FMA support
- - **Continued Training**: Use existing LoRAs as base models
- - **Automatic Upload**: Trained LoRAs are uploaded to the HuggingFace Hub
-
- ### 🎚️ Professional Audio Tools
- - **Advanced Mastering**: 32 professional presets (Pop, Rock, Electronic, etc.)
- - **Custom EQ**: 8-band parametric equalizer
- - **Dynamics**: Compression and limiting controls
- - **Audio Enhancement**:
-   - Stem separation (Demucs)
-   - Noise reduction
-   - Super resolution (upscale to 48 kHz)
-
- ### 🎛️ DAW-Style Interface
- - **Horizontal Timeline**: Professional multi-track layout
- - **Visual Waveforms**: See your music as you build
- - **Track Management**: Add, remove, and rearrange clips
- - **Real-time Preview**: Play individual clips or the full timeline
-
- ---
-
- ## 🚀 Quick Start
-
- ### Option 1: HuggingFace Spaces (Recommended)
-
- Try LEMM instantly with zero setup:
-
- 👉 **[Launch LEMM Space](https://huggingface.co/spaces/Gamahea/lemm-test-100)**
-
- - No installation required
- - Free GPU access
- - Pre-loaded models
- - Immediate start
-
- ### Option 2: Local Installation
-
- **Prerequisites:**
- - Python 3.10 or 3.11
- - 16 GB+ RAM recommended
- - NVIDIA GPU recommended (CUDA 12.x); CPU also works
-
- **Installation:**
-
- ```bash
- # Clone the repository
- git clone https://github.com/lemm-ai/LEMM-1.0.0-ALPHA.git
- cd LEMM-1.0.0-ALPHA
-
- # Create virtual environment
- python -m venv .venv
-
- # Activate virtual environment
- # Windows:
- .\.venv\Scripts\activate
- # Linux/Mac:
- source .venv/bin/activate
-
- # Install dependencies
- pip install -r requirements.txt
-
- # Launch LEMM
- python app.py
- ```
-
- **Access at**: http://localhost:7860
-
- ---
-
- ## 📖 Usage Guide
-
- ### 1️⃣ Generate Your First Track
-
- 1. **Enter Music Prompt**: Describe the style
-    - Example: *"upbeat electronic dance music with heavy bass"*
- 2. **Add Lyrics** (optional): DiffRhythm2 will sing them
-    - Leave empty for an instrumental
- 3. **Set Duration**: 10-120 seconds (default: 30s)
- 4. **Generate**: Click "✨ Generate Music Clip"
- 5. **Preview**: Listen in the audio player
-
- ### 2️⃣ Build Your Composition
-
- 1. **Timeline Tab**: View all generated clips
- 2. **Waveform Preview**: Visual representation of each clip
- 3. **Add More**: Generate additional clips at different positions
- 4. **Style Consistency**: New clips automatically match the existing style
-
- ### 3️⃣ Master & Export
-
- 1. **Mastering Tab**:
-    - Choose a preset (Pop, Rock, EDM, etc.)
-    - Or customize: EQ, compression, limiting
- 2. **Enhancement** (optional):
-    - Stem separation
-    - Noise reduction
-    - Audio super resolution
- 3. **Export Tab**:
-    - Choose a format (WAV, MP3, FLAC)
-    - Download your finished track
-
- ### 4️⃣ Train Custom LoRAs
-
- 1. **Dataset Management Tab**:
-    - Select a public dataset (GTZAN, MusicCaps, FMA)
-    - Or upload your own music
-    - Download and prepare the dataset
- 2. **Training Configuration Tab**:
-    - Name your LoRA
-    - Set training parameters
-    - Choose a base LoRA (optional, for continued training)
-    - Start training
- 3. **Wait for Training**: Progress is shown in real time
- 4. **Auto-Upload**: The LoRA is uploaded to HuggingFace as a model
- 5. **Reuse**: Download and use it in future generations
-
- ---
-
- ## 🏗️ Architecture
-
- ### Core Technology
-
- **DiffRhythm2** (ASLP-lab)
- - State-of-the-art music generation with vocals
- - Continuous Flow Matching (CFM) diffusion
- - MuQ-MuLan style encoding for consistency
- - Native vocal generation (no separate TTS)
-
- **LoRA Fine-Tuning** (PEFT)
- - Low-Rank Adaptation for efficient training
- - Parameter-efficient fine-tuning
- - Custom style specialization
- - Continued training support
-
- ### System Components
-
- ```
- LEMM/
- ├── app.py                           # Main Gradio interface
- ├── backend/
- │   ├── services/
- │   │   ├── diffrhythm_service.py        # DiffRhythm2 integration
- │   │   ├── lora_training_service.py     # LoRA training
- │   │   ├── dataset_service.py           # Dataset management
- │   │   ├── mastering_service.py         # Audio mastering
- │   │   ├── stem_enhancement_service.py  # Audio enhancement
- │   │   ├── audio_upscale_service.py     # Super resolution
- │   │   ├── hf_storage_service.py        # HuggingFace uploads
- │   │   └── ...
- │   ├── routes/                      # API endpoints
- │   ├── models/                      # Data schemas
- │   └── config/                      # Configuration
- ├── models/
- │   ├── diffrhythm2/                 # Music generation model
- │   ├── loras/                       # Trained LoRA adapters
- │   └── ...
- ├── training_data/                   # Prepared datasets
- ├── outputs/                         # Generated music
- └── requirements.txt                 # Dependencies
- ```
-
- ### Key Dependencies
-
- - **torch**: 2.4.0+ (PyTorch)
- - **diffusers**: Diffusion models
- - **transformers**: 4.47.1 (HuggingFace)
- - **peft**: LoRA training
- - **gradio**: Web interface
- - **pedalboard**: Audio mastering
- - **demucs**: Stem separation
- - **huggingface-hub**: Model uploads
-
- ---
-
- ## 🎓 Training Your Own LoRAs
-
- ### Supported Datasets
-
- **Public Datasets:**
- - **GTZAN**: Music genre classification (1,000 tracks, 10 genres)
- - **MusicCaps**: Google's music captioning dataset
- - **FMA (Free Music Archive)**: Large-scale music collection
-
- **Custom Datasets:**
- - Upload your own music collections
- - Supports MP3, WAV, FLAC, and OGG
-
- ### Training Process
-
- 1. **Prepare Dataset**:
-    - Download or upload music
-    - Extract audio samples
-    - Split into train/validation sets
-
- 2. **Configure Training** (see the sketch after this list):
-    - **LoRA Rank**: 4-64 (higher = more expressive, slower)
-    - **Learning Rate**: 1e-4 to 1e-3
-    - **Batch Size**: 1-8 (depends on GPU memory)
-    - **Epochs**: 10-100 (depends on dataset size)
-    - **Base LoRA**: Optional; continue from an existing model
-
- 3. **Monitor Training**:
-    - Real-time loss graphs
-    - Validation metrics
-    - Progress percentage
-
- 4. **Upload & Share**:
-    - Automatic upload to the HuggingFace Hub
-    - Model ID: `Gamahea/lemm-lora-{your-name}`
-    - Add to the [LEMM Collection](https://huggingface.co/collections/Gamahea/lemm-100-pre-beta)
-
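- How these settings map onto PEFT is sketched below. This is a minimal, illustrative `LoraConfig` only; the `target_modules` names are hypothetical, and the real wiring lives in `lora_training_service.py`:
-
- ```python
- # Minimal sketch: mapping LEMM's training settings onto a PEFT LoraConfig.
- # target_modules is hypothetical; actual names depend on the DiffRhythm2 backbone.
- from peft import LoraConfig, get_peft_model
-
- lora_config = LoraConfig(
-     r=8,            # "LoRA Rank" setting (4-64)
-     lora_alpha=16,  # scaling factor, commonly 2x the rank
-     lora_dropout=0.05,
-     target_modules=["to_q", "to_k", "to_v"],  # hypothetical attention projections
- )
-
- # model = <DiffRhythm2 backbone, loaded elsewhere>
- # peft_model = get_peft_model(model, lora_config)
- # peft_model.print_trainable_parameters()  # LoRA trains only a small fraction of weights
- ```
-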
- ### Example: Training on GTZAN
-
- ```
- 1. Dataset Management → Select GTZAN → Download
- 2. Prepare Dataset → GTZAN → Prepare (800 train, 200 val)
- 3. Training Configuration:
-    - Name: "my_jazz_lora"
-    - Dataset: gtzan
-    - Epochs: 50
-    - LoRA Rank: 8
-    - Learning Rate: 1e-4
- 4. Start Training → Wait ~2-4 hours (GPU dependent)
- 5. ✅ Uploaded: Gamahea/lemm-lora-my-jazz-lora
- 6. Reuse in generation or continue training
- ```
-
- ---
-
- ## 🎨 LoRA Management
-
- ### Download from HuggingFace
-
- 1. Go to the **LoRA Management Tab**
- 2. Enter the model ID: `Gamahea/lemm-lora-{name}`
- 3. Click "Download from Hub"
- 4. Use it immediately in generation
-
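- The same download can be scripted outside the UI with `huggingface_hub`. A sketch, assuming the repo ID is whichever `lemm-lora-{name}` model you want and that the local path mirrors the Model Paths section below:
-
- ```python
- # Sketch: fetch a trained LoRA from the Hub into LEMM's local LoRA folder.
- from huggingface_hub import snapshot_download
-
- local_dir = snapshot_download(
-     repo_id="Gamahea/lemm-lora-my-jazz-lora",  # any lemm-lora-{name} repo
-     local_dir="models/loras/my-jazz-lora",     # where LEMM looks for adapters
- )
- print(f"LoRA downloaded to {local_dir}")
- ```
-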
- ### Browse Collection
-
- 👉 [LEMM LoRA Collection](https://huggingface.co/collections/Gamahea/lemm-100-pre-beta)
-
- Discover community-trained LoRAs:
- - Genre specialists (jazz, rock, electronic)
- - Style adaptations
- - Custom fine-tuned models
-
- ### Export/Import
-
- **Export:**
- - Download a trained LoRA as a ZIP
- - Share it with others
- - Back up your work
-
- **Import:**
- - Upload a LoRA ZIP file
- - Instantly available for use
- - Continue training from the checkpoint
-
- ---
-
- ## 🔧 Advanced Configuration
-
- ### GPU Acceleration
-
- **NVIDIA (Recommended):**
- ```bash
- # CUDA 12.x is detected automatically
- # No additional configuration needed
- ```
-
- **CPU Mode:**
- ```bash
- # Automatic fallback if no GPU is detected
- # Slower, but fully functional
- ```
-
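- The fallback itself amounts to a standard PyTorch device check, roughly:
-
- ```python
- # Sketch: how the CUDA-vs-CPU choice is typically made in PyTorch.
- import torch
-
- device = "cuda" if torch.cuda.is_available() else "cpu"
- print(f"Running on {device}")
- if device == "cuda":
-     print(torch.cuda.get_device_name(0))  # e.g. the installed NVIDIA GPU
- ```
-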
- ### Model Paths
-
- Models are downloaded to:
- - DiffRhythm2: `models/diffrhythm2/`
- - LoRAs: `models/loras/`
- - Training data: `training_data/`
-
- ### Environment Variables
-
- Create a `.env` file:
- ```env
- # HuggingFace token for uploads (optional)
- HF_TOKEN=hf_xxxxxxxxxxxxx
-
- # Gradio server port (default: 7860)
- GRADIO_SERVER_PORT=7860
-
- # Enable debug logging
- DEBUG=false
- ```
-
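- These are plain environment variables; reading them at startup might look like the sketch below (assuming `python-dotenv`, a common choice that may or may not be what `app.py` actually uses):
-
- ```python
- # Sketch: loading .env values; python-dotenv is an assumption here.
- import os
- from dotenv import load_dotenv
-
- load_dotenv()  # reads .env from the working directory
- hf_token = os.getenv("HF_TOKEN")                     # None if unset
- port = int(os.getenv("GRADIO_SERVER_PORT", "7860"))  # falls back to 7860
- debug = os.getenv("DEBUG", "false").lower() == "true"
- ```
-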
- ---
-
- ## 📊 Technical Specifications
-
- ### Generation
-
- - **Model**: DiffRhythm2 (CFM-based diffusion)
- - **Sampling**: 22,050 Hz (can upscale to 48 kHz; see the sketch below)
- - **Duration**: 10-120 seconds per clip
- - **Vocals**: Built-in (no separate TTS)
- - **Style Encoding**: MuQ-MuLan
-
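- Plain sample-rate conversion to 48 kHz looks like this torchaudio sketch; note that the AudioSR path under Audio Enhancement does learned super-resolution rather than simple interpolation:
-
- ```python
- # Sketch: naive 22.05 kHz -> 48 kHz resampling with torchaudio.
- # AudioSR reconstructs missing high frequencies; this only interpolates.
- import torchaudio
-
- waveform, sr = torchaudio.load("outputs/clip.wav")  # sr == 22050 for LEMM clips
- resampler = torchaudio.transforms.Resample(orig_freq=sr, new_freq=48000)
- torchaudio.save("outputs/clip_48k.wav", resampler(waveform), 48000)
- ```
-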
- ### Training
-
- - **Method**: LoRA (Low-Rank Adaptation)
- - **Rank**: 4-64 (configurable)
- - **Precision**: Mixed (FP16/FP32)
- - **Optimizer**: AdamW
- - **Scheduler**: Cosine annealing
-
- ### Audio Enhancement
-
- - **Stem Separation**: Demucs 4.0.1 (4-stem)
- - **Noise Reduction**: Spectral subtraction
- - **Super Resolution**: AudioSR (up to 48 kHz)
- - **Mastering**: Pedalboard (Spotify's audio library), LUFS-compliant output; see the sketch below
-
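- A Pedalboard mastering chain is just an ordered list of processors; here is a minimal sketch (the values are illustrative, not one of LEMM's 32 presets):
-
- ```python
- # Sketch: a simple compress-and-limit mastering chain with Pedalboard.
- import soundfile as sf
- from pedalboard import Pedalboard, Compressor, HighpassFilter, Limiter
-
- audio, sample_rate = sf.read("outputs/clip.wav")
- board = Pedalboard([
-     HighpassFilter(cutoff_frequency_hz=30),   # remove sub-bass rumble
-     Compressor(threshold_db=-18, ratio=2.5),  # gentle glue compression
-     Limiter(threshold_db=-1.0),               # catch peaks before export
- ])
- sf.write("outputs/clip_mastered.wav", board(audio, sample_rate), sample_rate)
- ```
-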
- ---
-
- ## 🤝 Contributing
-
- We welcome contributions! Here's how:
-
- ### Report Issues
-
- - [GitHub Issues](https://github.com/lemm-ai/LEMM-1.0.0-ALPHA/issues)
- - Include steps to reproduce, logs, and system info
-
- ### Share LoRAs
-
- 1. Train a custom LoRA in LEMM
- 2. Upload to HuggingFace (automatic)
- 3. Add it to the [Collection](https://huggingface.co/collections/Gamahea/lemm-100-pre-beta)
- 4. Share it with the community
-
- ### Development
-
- ```bash
- # Fork the repository, then clone your fork
- git clone https://github.com/YOUR-USERNAME/LEMM-1.0.0-ALPHA.git
-
- # Create a feature branch
- git checkout -b feature/your-feature
-
- # Make changes and commit
- git commit -am "Add your feature"
-
- # Push and create a PR
- git push origin feature/your-feature
- ```
-
- ---
-
- ## 📄 License
-
- **MIT License** - See the [LICENSE](LICENSE) file
-
- Free to use, modify, and distribute.
-
- ---
-
- ## 🙏 Acknowledgments
-
- ### Models & Technologies
-
- - **DiffRhythm2**: ASLP-lab for state-of-the-art music generation
- - **LoRA/PEFT**: HuggingFace for parameter-efficient fine-tuning
- - **Gradio**: For the beautiful web interface
- - **Demucs**: Meta AI for stem separation
- - **Pedalboard**: Spotify for professional audio processing
-
- ### Datasets
-
- - **GTZAN**: Music genre classification dataset
- - **MusicCaps**: Google's music captioning dataset
- - **FMA**: Free Music Archive community
-
- ---
-
- ## 📞 Support & Community
-
- - **Documentation**: [Full Docs](https://github.com/lemm-ai/LEMM-1.0.0-ALPHA/wiki)
- - **HuggingFace Space**: [Try Now](https://huggingface.co/spaces/Gamahea/lemm-test-100)
- - **LoRA Collection**: [Browse Models](https://huggingface.co/collections/Gamahea/lemm-100-pre-beta)
- - **Issues**: [GitHub Issues](https://github.com/lemm-ai/LEMM-1.0.0-ALPHA/issues)
-
- ---
-
- ## 🚀 What's Next
-
- **Planned Features:**
- - Multi-track composition tools
- - Real-time style transfer
- - Collaborative projects
- - Mobile app
- - VST plugin support
-
- **Join the Journey!**
-
- Built with ❤️ by the LEMM community
-
- ---
-
- **LEMM - Let Everyone Make Music** 🎵
 
+ ---
+ title: Music Generation Studio
+ emoji: 🎵
+ colorFrom: purple
+ colorTo: pink
+ sdk: gradio
+ sdk_version: 4.44.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ # 🎵 Music Generation Studio
+
+ Create AI-powered music with intelligent prompt analysis and context-aware generation using DiffRhythm2 and LyricMind AI.
+
+ **⚠️ Important:**
+ - This Space requires ZeroGPU to run (see the sketch after this list)
+ - **You must be logged in** to HuggingFace to use GPU features
+ - Free users get a daily ZeroGPU quota; check your usage at https://huggingface.co/settings/billing
+ - If you see quota errors while logged in, try duplicating this Space to your account
+
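+ For context, a ZeroGPU Space requests GPU time per call by decorating its heavy function with `@spaces.GPU`; the sketch below is illustrative, and the function name and body are assumptions rather than the actual app.py:
+
+ ```python
+ # Sketch: how a ZeroGPU Space typically reserves a GPU for one call.
+ import spaces
+ import torch
+
+ @spaces.GPU(duration=120)  # seconds of GPU time requested for this call
+ def generate_music(prompt: str, duration_s: int):
+     device = "cuda"  # a GPU is attached only inside the decorated call
+     # ... run DiffRhythm2 on `device` and return the audio ...
+     return torch.zeros(1)  # placeholder
+
+ # Anonymous or over-quota users hit a quota error at call time,
+ # which is why login (or duplicating the Space) is required.
+ ```
+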
+ ## Features
+
+ - **Intelligent Music Generation**: DiffRhythm2 model for high-quality music with vocals
+ - **Smart Lyrics Generation**: LyricMind AI for context-aware lyric creation
+ - **Prompt Analysis**: Automatically detects genre, BPM, and mood from your description (sketched after this list)
+ - **Flexible Vocal Modes**:
+   - Instrumental: Pure music without vocals
+   - User Lyrics: Provide your own lyrics
+   - Auto Lyrics: AI-generated lyrics based on the prompt
+ - **Timeline Management**: Build complete songs clip by clip
+ - **Export**: Download your creations in WAV, MP3, or FLAC formats
+
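+ The BPM part of prompt analysis can be as small as a regex, with keyword lookups for genre and mood; here is a sketch of the idea (the analyzer actually used by this Space may differ):
+
+ ```python
+ # Sketch: naive prompt analysis - regex for BPM, keyword matching for genre/mood.
+ import re
+
+ GENRES = {"rock", "jazz", "electronic", "pop", "hip hop", "classical"}
+ MOODS = {"energetic", "calm", "dark", "uplifting", "melancholic"}
+
+ def analyze_prompt(prompt: str) -> dict:
+     text = prompt.lower()
+     bpm = re.search(r"(\d{2,3})\s*bpm", text)
+     return {
+         "bpm": int(bpm.group(1)) if bpm else None,
+         "genre": next((g for g in GENRES if g in text), None),
+         "mood": next((m for m in MOODS if m in text), None),
+     }
+
+ print(analyze_prompt("energetic rock song with electric guitar at 140 BPM"))
+ # {'bpm': 140, 'genre': 'rock', 'mood': 'energetic'}
+ ```
+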
+ ## How to Use
+
+ 1. **Generate Music**:
+    - Enter a descriptive prompt (e.g., "energetic rock song with electric guitar at 140 BPM")
+    - Choose a vocal mode (Instrumental, User Lyrics, or Auto Lyrics)
+    - Set the duration (10-120 seconds)
+    - Click "Generate Music Clip"
+
+ 2. **Manage Timeline**:
+    - View all generated clips in the timeline
+    - Remove specific clips or clear all
+    - Clips are arranged sequentially
+
+ 3. **Export** (see the sketch after this list):
+    - Enter a filename
+    - Choose a format (WAV recommended for best quality)
+    - Download your complete song
+
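+ Export amounts to concatenating the timeline clips and writing the chosen container; a sketch for the lossless formats, with placeholder file names (MP3 would typically go through pydub/ffmpeg instead):
+
+ ```python
+ # Sketch: join timeline clips and export as WAV or FLAC with soundfile.
+ import numpy as np
+ import soundfile as sf
+
+ paths = ["clip_01.wav", "clip_02.wav"]        # placeholder clip files
+ clips, rates = zip(*(sf.read(p) for p in paths))
+ song = np.concatenate(clips)                  # sequential timeline
+ sf.write("my_song.wav", song, rates[0])       # WAV: lossless, largest files
+ sf.write("my_song.flac", song, rates[0])      # FLAC: lossless, compressed
+ ```
+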
+ ## Models
+
+ - **DiffRhythm2**: Music generation with integrated vocals ([ASLP-lab/DiffRhythm2](https://huggingface.co/ASLP-lab/DiffRhythm2))
+ - **MuQ-MuLan**: Music style encoding ([OpenMuQ/MuQ-MuLan-large](https://huggingface.co/OpenMuQ/MuQ-MuLan-large))
+
+ ## Performance
+
+ ⏱️ Generation time: ~2-4 minutes per 30-second clip on CPU (HuggingFace Spaces free tier)
+
+ 💡 Tip: Start with shorter durations (10-20 seconds) for faster results
+
+ ## Technical Details
+
+ - Built with Gradio and PyTorch
+ - Uses DiffRhythm2 for music generation with vocals
+ - Employs flow-matching techniques for high-quality audio synthesis (see the toy sketch after this list)
+ - Supports multiple languages for lyrics (English, Chinese, Japanese)
+
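+ For the curious, the flow-matching objective behind models like DiffRhythm2 is compact: sample a point on the straight path between noise and data, then regress the path's constant velocity. A toy sketch, not DiffRhythm2's actual code:
+
+ ```python
+ # Toy sketch of the flow-matching training objective.
+ import torch
+
+ def flow_matching_loss(model, x1):  # x1: batch of clean audio latents (B, D)
+     x0 = torch.randn_like(x1)       # noise endpoint
+     t = torch.rand(x1.shape[0], 1)  # random time in [0, 1]
+     xt = (1 - t) * x0 + t * x1      # point on the straight noise->data path
+     v_target = x1 - x0              # constant velocity of that path
+     v_pred = model(xt, t)           # network predicts the velocity field
+     return torch.mean((v_pred - v_target) ** 2)
+ ```
+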
+ ## Credits
+
+ - DiffRhythm2 by ASLP-lab
+ - MuQ-MuLan by OpenMuQ
+ - Application interface and integration by the Music Generation App Team
+
+ ## License
+
+ MIT License - See the LICENSE file for details