---
title: LEMM - Let Everyone Make Music
emoji: 🎵
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.44.1
app_file: app.py
pinned: false
license: mit
hf_oauth: true
---

LEMM - Let Everyone Make Music

Version 1.0.0 (Beta)

An advanced AI music generation system with training capabilities, built-in vocals, professional mastering, and audio enhancement. Powered by DiffRhythm2 with LoRA fine-tuning support.

🎵 Live Demo: Try LEMM on HuggingFace Spaces
📦 LoRA Collection: Browse Trained Models
🏢 Organization: lemm-ai on GitHub


✨ Key Features

🎵 Music Generation

  • Text-to-Music: Generate music from style descriptions
  • Built-in Vocals: DiffRhythm2 generates vocals directly with music (no separate TTS)
  • Style Consistency: New clips inherit musical character from existing ones
  • Flexible Duration: 10-120 second clips

🎓 LoRA Training

  • Custom Style Training: Fine-tune on your own music datasets
  • Public Datasets: GTZAN, MusicCaps, FMA support
  • Continued Training: Use existing LoRAs as base models
  • Automatic Upload: Trained LoRAs uploaded to HuggingFace Hub

🎚️ Professional Audio Tools

  • Advanced Mastering: 32 professional presets (Pop, Rock, Electronic, etc.)
  • Custom EQ: 8-band parametric equalizer
  • Dynamics: Compression and limiting controls
  • Audio Enhancement:
    • Stem separation (Demucs)
    • Noise reduction
    • Super resolution (upscale to 48kHz)

πŸŽ›οΈ DAW-Style Interface

  • Horizontal Timeline: Professional multi-track layout
  • Visual Waveforms: See your music as you build
  • Track Management: Add, remove, rearrange clips
  • Real-time Preview: Play individual clips or full timeline

🚀 Quick Start

Option 1: HuggingFace Spaces (Recommended)

Try LEMM instantly with zero setup:

👉 Launch LEMM Space

  • No installation required
  • Free GPU access
  • Pre-loaded models
  • Immediate start

Option 2: Local Installation

Prerequisites:

  • Python 3.10 or 3.11
  • 16GB+ RAM recommended
  • NVIDIA GPU (CUDA 12.x) recommended; CPU-only mode supported

Installation:

# Clone the repository
git clone https://github.com/lemm-ai/LEMM-1.0.0-ALPHA.git
cd LEMM-1.0.0-ALPHA

# Create virtual environment
python -m venv .venv

# Activate virtual environment
# Windows:
.\.venv\Scripts\activate
# Linux/Mac:
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Launch LEMM
python app.py

Access at: http://localhost:7860


📖 Usage Guide

1️⃣ Generate Your First Track

  1. Enter Music Prompt: Describe the style
    • Example: "upbeat electronic dance music with heavy bass"
  2. Add Lyrics (optional): DiffRhythm2 will sing them
    • Leave empty for instrumental
  3. Set Duration: 10-120 seconds (default: 30s)
  4. Generate: Click "✨ Generate Music Clip"
  5. Preview: Listen in the audio player
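
If you prefer scripting over the UI, a Space like this can also be driven programmatically with gradio_client. The sketch below is an assumption for illustration: the endpoint name (/generate), the parameter order, and the Space ID are not documented here, so check the Space's "Use via API" panel for the actual signature.

from gradio_client import Client

# Connect to the Space (ID shown is illustrative)
client = Client("Gamahea/lemm-test-100")

# Hypothetical generation call; inspect "Use via API" for the real endpoint
result = client.predict(
    "upbeat electronic dance music with heavy bass",  # music prompt
    "",                                               # lyrics (empty = instrumental)
    30,                                               # duration in seconds
    api_name="/generate",
)
print(result)  # path to the generated audio file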

2️⃣ Build Your Composition

  1. Timeline Tab: View all generated clips
  2. Waveform Preview: Visual representation of each clip
  3. Add More: Generate additional clips at different positions
  4. Style Consistency: New clips automatically match existing style

3️⃣ Master & Export

  1. Mastering Tab:
    • Choose preset (Pop, Rock, EDM, etc.)
    • Or customize: EQ, compression, limiting
  2. Enhancement (optional):
    • Stem separation
    • Noise reduction
    • Audio super resolution
  3. Export Tab:
    • Choose format (WAV, MP3, FLAC)
    • Download your finished track

4️⃣ Train Custom LoRAs

  1. Dataset Management Tab:
    • Select public dataset (GTZAN, MusicCaps, FMA)
    • Or upload your own music
    • Download and prepare dataset
  2. Training Configuration Tab:
    • Name your LoRA
    • Set training parameters
    • Choose base LoRA (optional - for continued training)
    • Start training
  3. Wait for Training: Progress shown in real-time
  4. Auto-Upload: LoRA uploaded to HuggingFace as model
  5. Reuse: Download and use in future generations

πŸ—οΈ Architecture

Core Technology

DiffRhythm2 (ASLP-lab)

  • State-of-the-art music generation with vocals
  • Continuous Flow Matching (CFM) diffusion
  • MuQ-MuLan style encoding for consistency
  • Native vocal generation (no separate TTS)

LoRA Fine-Tuning (PEFT)

  • Low-Rank Adaptation for efficient training
  • Parameter-efficient fine-tuning
  • Custom style specialization
  • Continued training support
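
As a rough illustration of what a PEFT LoRA setup looks like, here is a minimal, runnable sketch. The toy module and the target module names are placeholders; the actual modules depend on the DiffRhythm2 architecture.

import torch.nn as nn
from peft import LoraConfig, get_peft_model

# Toy stand-in for an attention block -- illustration only
class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.to_q = nn.Linear(64, 64)
        self.to_k = nn.Linear(64, 64)
        self.to_v = nn.Linear(64, 64)

lora_config = LoraConfig(
    r=8,                                      # rank (LEMM exposes 4-64)
    lora_alpha=16,                            # scaling factor
    lora_dropout=0.05,
    target_modules=["to_q", "to_k", "to_v"],  # placeholder module names
)
model = get_peft_model(Block(), lora_config)
model.print_trainable_parameters()            # only the low-rank adapters train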

System Components

LEMM/
├── app.py                      # Main Gradio interface
├── backend/
│   ├── services/
│   │   ├── diffrhythm_service.py       # DiffRhythm2 integration
│   │   ├── lora_training_service.py    # LoRA training
│   │   ├── dataset_service.py          # Dataset management
│   │   ├── mastering_service.py        # Audio mastering
│   │   ├── stem_enhancement_service.py # Audio enhancement
│   │   ├── audio_upscale_service.py    # Super resolution
│   │   ├── hf_storage_service.py       # HuggingFace uploads
│   │   └── ...
│   ├── routes/                 # API endpoints
│   ├── models/                 # Data schemas
│   └── config/                 # Configuration
├── models/
│   ├── diffrhythm2/           # Music generation model
│   ├── loras/                 # Trained LoRA adapters
│   └── ...
├── training_data/             # Prepared datasets
├── outputs/                   # Generated music
└── requirements.txt           # Dependencies

Key Dependencies

  • torch: 2.4.0+ (PyTorch)
  • diffusers: Diffusion models
  • transformers: 4.47.1 (HuggingFace)
  • peft: LoRA training
  • gradio: Web interface
  • pedalboard: Audio mastering
  • demucs: Stem separation
  • huggingface-hub: Model uploads
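
A minimal requirements.txt consistent with the versions above might look like this; only torch, transformers, and gradio versions are stated in this README, so the unpinned entries are assumptions:

torch>=2.4.0
transformers==4.47.1
gradio==4.44.1
diffusers
peft
pedalboard
demucs
huggingface-hub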

🎓 Training Your Own LoRAs

Supported Datasets

Public Datasets:

  • GTZAN: Music genre classification (1,000 tracks, 10 genres)
  • MusicCaps: Google's music captioning dataset
  • FMA (Free Music Archive): Large-scale music collection

Custom Datasets:

  • Upload your own music collections
  • Supports MP3, WAV, FLAC, OGG

Training Process

  1. Prepare Dataset:

    • Download or upload music
    • Extract audio samples
    • Split into train/validation sets
  2. Configure Training:

    • LoRA Rank: 4-64 (higher = more expressive, slower)
    • Learning Rate: 1e-4 to 1e-3
    • Batch Size: 1-8 (depends on GPU memory)
    • Epochs: 10-100 (depends on dataset size)
    • Base LoRA: Optional - continue from existing model
  3. Monitor Training:

    • Real-time loss graphs
    • Validation metrics
    • Progress percentage
  4. Upload & Share:

    • Automatic upload to HuggingFace Hub
    • Model ID: Gamahea/lemm-lora-{your-name}
    • Add to LEMM Collection
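
The automatic upload step corresponds to a standard huggingface_hub flow, roughly like the sketch below; the local path and repo name are illustrative:

from huggingface_hub import HfApi

api = HfApi()  # authenticates via HF_TOKEN from the environment

repo_id = "Gamahea/lemm-lora-my-style"    # illustrative model ID
api.create_repo(repo_id, repo_type="model", exist_ok=True)

# Push the trained adapter directory to the Hub
api.upload_folder(
    folder_path="models/loras/my_style",  # local LoRA checkpoint
    repo_id=repo_id,
    repo_type="model",
)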

Example: Training on GTZAN

1. Dataset Management → Select GTZAN → Download
2. Prepare Dataset → GTZAN → Prepare (800 train, 200 val)
3. Training Configuration:
   - Name: "my_jazz_lora"
   - Dataset: gtzan
   - Epochs: 50
   - LoRA Rank: 8
   - Learning Rate: 1e-4
4. Start Training → Wait ~2-4 hours (GPU dependent)
5. ✅ Uploaded: Gamahea/lemm-lora-my-jazz-lora
6. Reuse in generation or continue training

🎨 LoRA Management

Download from HuggingFace

  1. Go to LoRA Management Tab
  2. Enter model ID: Gamahea/lemm-lora-{name}
  3. Click "Download from Hub"
  4. Use immediately in generation
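
Step 3 is equivalent to a snapshot_download call, if you'd rather fetch a LoRA from a script; the repo ID below reuses the training example above:

from huggingface_hub import snapshot_download

# Fetch a LoRA adapter into LEMM's local models directory
local_path = snapshot_download(
    repo_id="Gamahea/lemm-lora-my-jazz-lora",
    local_dir="models/loras/my-jazz-lora",
)
print(local_path)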

Browse Collection

👉 LEMM LoRA Collection

Discover community-trained LoRAs:

  • Genre specialists (jazz, rock, electronic)
  • Style adaptations
  • Custom fine-tuned models

Export/Import

Export:

  • Download trained LoRA as ZIP
  • Share with others
  • Backup your work

Import:

  • Upload LoRA ZIP file
  • Instantly available for use
  • Continue training from checkpoint

🔧 Advanced Configuration

GPU Acceleration

NVIDIA (Recommended):

# CUDA 12.x automatically detected
# No additional configuration needed

CPU Mode:

# Automatic fallback if no GPU detected
# Slower but fully functional
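
The automatic detection described above boils down to the usual PyTorch device check, roughly:

import torch

# Pick the GPU when CUDA is available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on {device}")
# Models and tensors are then moved with .to(device)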

Model Paths

Models downloaded to:

  • DiffRhythm2: models/diffrhythm2/
  • LoRAs: models/loras/
  • Training data: training_data/

Environment Variables

Create a .env file in the project root:

# HuggingFace token for uploads (optional)
HF_TOKEN=hf_xxxxxxxxxxxxx

# Gradio server port (default: 7860)
GRADIO_SERVER_PORT=7860

# Enable debug logging
DEBUG=false
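
LEMM's exact loading code isn't shown here, but reading these variables typically looks like the following, assuming python-dotenv handles the .env file:

import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the working directory

hf_token = os.getenv("HF_TOKEN")                     # None if unset
port = int(os.getenv("GRADIO_SERVER_PORT", "7860"))  # default 7860
debug = os.getenv("DEBUG", "false").lower() == "true"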

📊 Technical Specifications

Generation

  • Model: DiffRhythm2 (CFM-based diffusion)
  • Sample Rate: 22050 Hz (upscalable to 48 kHz)
  • Duration: 10-120 seconds per clip
  • Vocals: Built-in (no separate TTS)
  • Style Encoding: MuQ-MuLan

Training

  • Method: LoRA (Low-Rank Adaptation)
  • Rank: 4-64 (configurable)
  • Precision: Mixed (FP16/FP32)
  • Optimizer: AdamW
  • Scheduler: Cosine annealing
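
In PyTorch terms, the optimizer, scheduler, and precision combination above corresponds roughly to this training-loop skeleton; the model, data, and loss are toy placeholders:

import torch
from torch import nn

# Toy model; the mixed-precision path shown here requires a CUDA GPU
model = nn.Linear(64, 64).to("cuda")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
scaler = torch.amp.GradScaler("cuda")      # FP16 forward, FP32 master grads

for epoch in range(50):
    optimizer.zero_grad()
    x = torch.randn(8, 64, device="cuda")
    with torch.amp.autocast("cuda"):       # run the forward pass in FP16
        loss = model(x).pow(2).mean()      # placeholder loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    scheduler.step()                       # cosine-annealed learning rate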

Audio Enhancement

  • Stem Separation: Demucs 4.0.1 (4-stem)
  • Noise Reduction: Spectral subtraction
  • Super Resolution: AudioSR (up to 48kHz)
  • Mastering: Pedalboard (Spotify LUFS-compliant)
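
For a flavor of what a Pedalboard mastering chain looks like, here is a minimal sketch; the exact chain behind each LEMM preset isn't documented here, so the plugin choices and settings are illustrative:

from pedalboard import Pedalboard, Compressor, Gain, Limiter
from pedalboard.io import AudioFile

# Illustrative master bus: gentle compression, makeup gain, brickwall limiter
board = Pedalboard([
    Compressor(threshold_db=-18, ratio=2.5),
    Gain(gain_db=3.0),
    Limiter(threshold_db=-1.0),
])

# Read a generated clip, process it, and write the mastered result
with AudioFile("outputs/clip.wav") as f:
    audio = f.read(f.frames)
    sample_rate = f.samplerate

mastered = board(audio, sample_rate)

with AudioFile("outputs/clip_mastered.wav", "w", sample_rate, mastered.shape[0]) as f:
    f.write(mastered)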

🤝 Contributing

We welcome contributions! Here's how:

Report Issues

Found a bug or have a feature request? Open an issue on the lemm-ai GitHub repository.

Share LoRAs

  1. Train custom LoRA in LEMM
  2. Upload to HuggingFace (automatic)
  3. Add to Collection
  4. Share with community

Development

# Fork the repository
# Clone your fork
git clone https://github.com/YOUR-USERNAME/LEMM-1.0.0-ALPHA.git

# Create feature branch
git checkout -b feature/your-feature

# Make changes and commit
git commit -am "Add your feature"

# Push and create PR
git push origin feature/your-feature

📄 License

MIT License - See LICENSE file

Free to use, modify, and distribute.


πŸ™ Acknowledgments

Models & Technologies

  • DiffRhythm2: ASLP-lab for state-of-the-art music generation
  • LoRA/PEFT: HuggingFace for parameter-efficient fine-tuning
  • Gradio: For the beautiful web interface
  • Demucs: Meta AI for stem separation
  • Pedalboard: Spotify for professional audio processing

Datasets

  • GTZAN: Music genre classification dataset
  • MusicCaps: Google's music captioning dataset
  • FMA: Free Music Archive community

📞 Support & Community


🚀 What's Next

Planned Features:

  • Multi-track composition tools
  • Real-time style transfer
  • Collaborative projects
  • Mobile app
  • VST plugin support

Join the Journey!

Built with ❀️ by the LEMM community


LEMM - Let Everyone Make Music 🎵