---
title: LEMM - Let Everyone Make Music
emoji: 🎵
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
hf_oauth: true
---
# LEMM - Let Everyone Make Music

**Version 1.0.0 (Beta)**
An advanced AI music generation system with training capabilities, built-in vocals, professional mastering, and audio enhancement. Powered by DiffRhythm2 with LoRA fine-tuning support.
🎵 Live Demo: Try LEMM on HuggingFace Spaces
📦 LoRA Collection: Browse Trained Models
🏢 Organization: lemm-ai on GitHub

## ✨ Key Features

### 🎵 Music Generation
- Text-to-Music: Generate music from style descriptions
- Built-in Vocals: DiffRhythm2 generates vocals directly with music (no separate TTS)
- Style Consistency: New clips inherit musical character from existing ones
- Flexible Duration: 10-120 second clips
### 🎓 LoRA Training
- Custom Style Training: Fine-tune on your own music datasets
- Public Datasets: GTZAN, MusicCaps, FMA support
- Continued Training: Use existing LoRAs as base models
- Automatic Upload: Trained LoRAs uploaded to HuggingFace Hub
### 🎛️ Professional Audio Tools
- Advanced Mastering: 32 professional presets (Pop, Rock, Electronic, etc.)
- Custom EQ: 8-band parametric equalizer
- Dynamics: Compression and limiting controls
- Audio Enhancement:
  - Stem separation (Demucs)
  - Noise reduction
  - Super resolution (upscale to 48 kHz)
### 🎚️ DAW-Style Interface
- Horizontal Timeline: Professional multi-track layout
- Visual Waveforms: See your music as you build
- Track Management: Add, remove, rearrange clips
- Real-time Preview: Play individual clips or full timeline
## 🚀 Quick Start

### Option 1: HuggingFace Spaces (Recommended)

Try LEMM instantly with zero setup:

🚀 Launch LEMM Space
- No installation required
- Free GPU access
- Pre-loaded models
- Immediate start
### Option 2: Local Installation
Prerequisites:
- Python 3.10 or 3.11
- 16GB+ RAM recommended
- NVIDIA GPU recommended (CUDA 12.x) or CPU
Installation:

```bash
# Clone the repository
git clone https://github.com/lemm-ai/LEMM-1.0.0-ALPHA.git
cd LEMM-1.0.0-ALPHA

# Create a virtual environment
python -m venv .venv

# Activate it
# Windows:
.\.venv\Scripts\activate
# Linux/macOS:
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Launch LEMM
python app.py
```
Access at: http://localhost:7860
## 📖 Usage Guide

### 1️⃣ Generate Your First Track
- **Enter Music Prompt**: Describe the style
  - Example: "upbeat electronic dance music with heavy bass"
- **Add Lyrics** (optional): DiffRhythm2 will sing them
  - Leave empty for an instrumental
- **Set Duration**: 10-120 seconds (default: 30s)
- **Generate**: Click "✨ Generate Music Clip"
- **Preview**: Listen in the audio player (a scripted alternative is sketched below)
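If you prefer to script generation rather than click through the UI, something like the following should work once the Space is running. This is a minimal sketch: the Space ID, endpoint name, and argument order are assumptions, so check the Space's "Use via API" panel for the real signature.

```python
# Hypothetical programmatic call to the LEMM Space via gradio_client.
from gradio_client import Client

client = Client("lemm-ai/LEMM")  # assumed Space ID
result = client.predict(
    "upbeat electronic dance music with heavy bass",  # music prompt
    "",   # lyrics; empty string for an instrumental clip
    30,   # duration in seconds
    api_name="/generate",  # assumed endpoint name
)
print(result)  # path to the generated audio file
```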
### 2️⃣ Build Your Composition
- Timeline Tab: View all generated clips
- Waveform Preview: Visual representation of each clip
- Add More: Generate additional clips at different positions
- Style Consistency: New clips automatically match existing style
### 3️⃣ Master & Export

- **Mastering Tab**:
  - Choose a preset (Pop, Rock, EDM, etc.)
  - Or customize: EQ, compression, limiting
- **Enhancement** (optional):
  - Stem separation
  - Noise reduction
  - Audio super resolution
- **Export Tab**:
  - Choose a format (WAV, MP3, FLAC)
  - Download your finished track
### 4️⃣ Train Custom LoRAs

- **Dataset Management Tab**:
  - Select a public dataset (GTZAN, MusicCaps, FMA)
  - Or upload your own music
  - Download and prepare the dataset
- **Training Configuration Tab**:
  - Name your LoRA
  - Set training parameters
  - Choose a base LoRA (optional, for continued training)
  - Start training
- **Wait for Training**: Progress is shown in real time
- **Auto-Upload**: The LoRA is uploaded to HuggingFace as a model
- **Reuse**: Download and use it in future generations
## 🏗️ Architecture

### Core Technology

**DiffRhythm2** (ASLP-lab)
- State-of-the-art music generation with vocals
- Continuous Flow Matching (CFM) diffusion
- MuQ-MuLan style encoding for consistency
- Native vocal generation (no separate TTS)
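For intuition, the training objective behind flow-matching models of this kind fits in a few lines. This is the generic CFM recipe, not LEMM's actual training code; `model`, its latent shapes, and the conditioning are placeholders.

```python
# Generic conditional flow matching loss (a sketch, not LEMM's implementation).
import torch
import torch.nn.functional as F

def cfm_loss(model, x1, cond):
    """x1: clean audio latents; cond: style/lyric conditioning."""
    x0 = torch.randn_like(x1)                      # noise endpoint
    t = torch.rand(x1.shape[0], device=x1.device)  # random time in [0, 1]
    t_ = t.view(-1, *([1] * (x1.dim() - 1)))
    xt = (1 - t_) * x0 + t_ * x1                   # point on the straight path
    target = x1 - x0                               # constant velocity of that path
    pred = model(xt, t, cond)                      # predicted velocity field
    return F.mse_loss(pred, target)
```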
**LoRA Fine-Tuning** (PEFT)
- Low-Rank Adaptation for efficient training
- Parameter-efficient fine-tuning
- Custom style specialization
- Continued training support
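In PEFT terms, attaching an adapter looks roughly like this. The target module names are placeholders; the real ones depend on DiffRhythm2's layer naming.

```python
# Sketch of wrapping a model with a LoRA adapter via PEFT.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                      # rank (4-64 in LEMM's UI)
    lora_alpha=16,                            # scaling factor
    target_modules=["to_q", "to_k", "to_v"],  # assumed attention projections
    lora_dropout=0.05,
)
model = get_peft_model(base_model, lora_config)  # base_model: the loaded backbone
model.print_trainable_parameters()               # only the low-rank matrices train
```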
### System Components

```
LEMM/
├── app.py                               # Main Gradio interface
├── backend/
│   ├── services/
│   │   ├── diffrhythm_service.py        # DiffRhythm2 integration
│   │   ├── lora_training_service.py     # LoRA training
│   │   ├── dataset_service.py           # Dataset management
│   │   ├── mastering_service.py         # Audio mastering
│   │   ├── stem_enhancement_service.py  # Audio enhancement
│   │   ├── audio_upscale_service.py     # Super resolution
│   │   ├── hf_storage_service.py        # HuggingFace uploads
│   │   └── ...
│   ├── routes/                          # API endpoints
│   ├── models/                          # Data schemas
│   └── config/                          # Configuration
├── models/
│   ├── diffrhythm2/                     # Music generation model
│   ├── loras/                           # Trained LoRA adapters
│   └── ...
├── training_data/                       # Prepared datasets
├── outputs/                             # Generated music
└── requirements.txt                     # Dependencies
```
### Key Dependencies
- torch: 2.4.0+ (PyTorch)
- diffusers: Diffusion models
- transformers: 4.47.1 (HuggingFace)
- peft: LoRA training
- gradio: Web interface
- pedalboard: Audio mastering
- demucs: Stem separation
- huggingface-hub: Model uploads
## 🎓 Training Your Own LoRAs

### Supported Datasets
Public Datasets:
- GTZAN: Music genre classification (1,000 tracks, 10 genres)
- MusicCaps: Google's music captioning dataset
- FMA (Free Music Archive): Large-scale music collection
Custom Datasets:
- Upload your own music collections
- Supports MP3, WAV, FLAC, OGG
### Training Process

1. **Prepare Dataset:**
   - Download or upload music
   - Extract audio samples
   - Split into train/validation sets
2. **Configure Training:**
   - LoRA Rank: 4-64 (higher = more expressive, slower)
   - Learning Rate: 1e-4 to 1e-3
   - Batch Size: 1-8 (depends on GPU memory)
   - Epochs: 10-100 (depends on dataset size)
   - Base LoRA: optional, to continue from an existing model
3. **Monitor Training:**
   - Real-time loss graphs
   - Validation metrics
   - Progress percentage
4. **Upload & Share:**
   - Automatic upload to HuggingFace Hub (a minimal sketch follows this list)
   - Model ID: `Gamahea/lemm-lora-{your-name}`
   - Add to the LEMM Collection
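The automatic upload step amounts to a folder push with `huggingface_hub`. A minimal sketch, with a hypothetical repo name:

```python
# Sketch of the kind of Hub upload LEMM performs after training.
from huggingface_hub import HfApi

api = HfApi()  # picks up the HF_TOKEN environment variable if set
repo_id = "Gamahea/lemm-lora-my-style"  # hypothetical model ID
api.create_repo(repo_id, repo_type="model", exist_ok=True)
api.upload_folder(folder_path="models/loras/my-style", repo_id=repo_id)
```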
### Example: Training on GTZAN

1. Dataset Management → Select GTZAN → Download
2. Prepare Dataset → GTZAN → Prepare (800 train, 200 val)
3. Training Configuration:
   - Name: "my_jazz_lora"
   - Dataset: gtzan
   - Epochs: 50
   - LoRA Rank: 8
   - Learning Rate: 1e-4
4. Start Training → wait ~2-4 hours (GPU dependent)
5. ✅ Uploaded: `Gamahea/lemm-lora-my-jazz-lora`
6. Reuse it in generation or continue training
## 🎨 LoRA Management

### Download from HuggingFace

- Go to the LoRA Management tab
- Enter the model ID: `Gamahea/lemm-lora-{name}`
- Click "Download from Hub"
- Use it immediately in generation (a download sketch follows below)
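Outside the UI, the same download is a single `snapshot_download` call. The repo ID below is hypothetical:

```python
# Sketch of fetching a LoRA from the Hub into LEMM's LoRA folder.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Gamahea/lemm-lora-my-style",  # hypothetical model ID
    local_dir="models/loras/my-style",     # LEMM's LoRA folder (see Model Paths)
)
print(local_dir)
```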
### Browse Collection

📚 LEMM LoRA Collection
Discover community-trained LoRAs:
- Genre specialists (jazz, rock, electronic)
- Style adaptations
- Custom fine-tuned models
### Export/Import

**Export:**
- Download a trained LoRA as a ZIP
- Share it with others
- Back up your work

**Import:**
- Upload a LoRA ZIP file
- Instantly available for use
- Continue training from the checkpoint
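Under the hood, both directions are plain ZIP handling. Assuming a LoRA is just a folder of adapter files under `models/loras/`, a minimal sketch with hypothetical paths:

```python
# Sketch of the ZIP export/import round trip.
import shutil
import zipfile

# Export: bundle a trained LoRA folder into a shareable archive
shutil.make_archive("my-style-lora", "zip", "models/loras/my-style")

# Import: unpack someone else's LoRA into the same folder layout
with zipfile.ZipFile("their-style-lora.zip") as zf:
    zf.extractall("models/loras/their-style")
```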
## 🔧 Advanced Configuration

### GPU Acceleration
**NVIDIA (recommended):** CUDA 12.x is detected automatically; no additional configuration is needed.

**CPU mode:** LEMM falls back to the CPU automatically if no GPU is detected. Slower, but fully functional. The sketch below shows the detection logic.
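The fallback boils down to a standard PyTorch device check:

```python
# Sketch of the GPU/CPU selection logic.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on {device}")  # generation is much slower on CPU
```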
### Model Paths

Models are downloaded to:

- DiffRhythm2: `models/diffrhythm2/`
- LoRAs: `models/loras/`
- Training data: `training_data/`
### Environment Variables

Create a `.env` file:

```bash
# HuggingFace token for uploads (optional)
HF_TOKEN=hf_xxxxxxxxxxxxx

# Gradio server port (default: 7860)
GRADIO_SERVER_PORT=7860

# Enable debug logging
DEBUG=false
```
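A minimal sketch of reading these settings at startup, assuming `python-dotenv` (the exact mechanism in `app.py` may differ):

```python
# Sketch of loading .env configuration.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory
hf_token = os.getenv("HF_TOKEN")                       # None disables uploads
port = int(os.getenv("GRADIO_SERVER_PORT", "7860"))
debug = os.getenv("DEBUG", "false").lower() == "true"
```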
## 📊 Technical Specifications

### Generation

- Model: DiffRhythm2 (CFM-based diffusion)
- Sample Rate: 22,050 Hz (upscalable to 48 kHz)
- Duration: 10-120 seconds per clip
- Vocals: Built-in (no separate TTS)
- Style Encoding: MuQ-MuLan
### Training
- Method: LoRA (Low-Rank Adaptation)
- Rank: 4-64 (configurable)
- Precision: Mixed (FP16/FP32)
- Optimizer: AdamW
- Scheduler: Cosine annealing
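Put together, that pairing looks like the standard PyTorch loop below. `model`, `dataloader`, and `compute_loss` are placeholders, and the epoch count is illustrative:

```python
# Sketch of the AdamW + cosine-annealing + mixed-precision training loop.
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
scaler = torch.cuda.amp.GradScaler()  # keeps FP16 gradients stable

for epoch in range(50):
    for batch in dataloader:
        optimizer.zero_grad()
        with torch.autocast("cuda", dtype=torch.float16):
            loss = compute_loss(model, batch)  # e.g. the CFM loss sketched earlier
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    scheduler.step()  # cosine decay of the learning rate, once per epoch
```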
### Audio Enhancement
- Stem Separation: Demucs 4.0.1 (4-stem)
- Noise Reduction: Spectral subtraction
- Super Resolution: AudioSR (up to 48kHz)
- Mastering: Pedalboard (Spotify LUFS-compliant)
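A mastering chain in Pedalboard is a list of effects applied in order. The parameters below are illustrative, not LEMM's actual presets:

```python
# Sketch of a simple Pedalboard mastering chain.
import soundfile as sf
from pedalboard import Pedalboard, Compressor, HighpassFilter, Limiter

audio, sr = sf.read("outputs/track.wav", dtype="float32")
board = Pedalboard([
    HighpassFilter(cutoff_frequency_hz=30),  # remove sub-bass rumble
    Compressor(threshold_db=-16, ratio=4),   # tame dynamics
    Limiter(threshold_db=-1),                # hard ceiling before export
])
mastered = board(audio, sr)
sf.write("outputs/track_mastered.wav", mastered, sr)
```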
## 🤝 Contributing

We welcome contributions! Here's how:

### Report Issues
- GitHub Issues
- Include: steps to reproduce, logs, system info
### Share LoRAs
- Train custom LoRA in LEMM
- Upload to HuggingFace (automatic)
- Add to Collection
- Share with community
### Development

```bash
# Fork the repository, then clone your fork
git clone https://github.com/YOUR-USERNAME/LEMM-1.0.0-ALPHA.git

# Create a feature branch
git checkout -b feature/your-feature

# Make changes and commit
git commit -am "Add your feature"

# Push and open a PR
git push origin feature/your-feature
```
## 📄 License
MIT License - See LICENSE file
Free to use, modify, and distribute.
## 🙏 Acknowledgments

### Models & Technologies
- DiffRhythm2: ASLP-lab for state-of-the-art music generation
- LoRA/PEFT: HuggingFace for parameter-efficient fine-tuning
- Gradio: For the beautiful web interface
- Demucs: Meta AI for stem separation
- Pedalboard: Spotify for professional audio processing
### Datasets
- GTZAN: Music genre classification dataset
- MusicCaps: Google's music captioning dataset
- FMA: Free Music Archive community
## 💬 Support & Community
- Documentation: Full Docs
- HuggingFace Space: Try Now
- LoRA Collection: Browse Models
- Issues: GitHub Issues
## 🔮 What's Next
Planned Features:
- Multi-track composition tools
- Real-time style transfer
- Collaborative projects
- Mobile app
- VST plugin support
**Join the Journey!**

Built with ❤️ by the LEMM community

**LEMM - Let Everyone Make Music** 🎵