Spaces: Running on Zero

Gamahea committed
Commit · aad9d66
1 Parent(s): 7d5476d

Deploy Music Generation Studio - 2025-12-12 16:01
Browse files

- .gitignore +9 -0
- README.md +68 -8
- app.py +487 -0
- backend/__init__.py +1 -0
- backend/app.py +80 -0
- backend/config/__init__.py +4 -0
- backend/config/settings.py +50 -0
- backend/routes/__init__.py +6 -0
- backend/routes/export.py +124 -0
- backend/routes/generation.py +191 -0
- backend/routes/mastering.py +185 -0
- backend/routes/timeline.py +137 -0
- backend/run.py +50 -0
- backend/services/__init__.py +12 -0
- backend/services/diffrhythm_service.py +397 -0
- backend/services/export_service.py +123 -0
- backend/services/fish_speech_service.py +148 -0
- backend/services/lyricmind_service.py +220 -0
- backend/services/mastering_service.py +641 -0
- backend/services/style_consistency_service.py +340 -0
- backend/services/timeline_service.py +186 -0
- backend/start_with_env.py +21 -0
- backend/utils/__init__.py +5 -0
- backend/utils/amd_gpu.py +96 -0
- backend/utils/logger.py +57 -0
- backend/utils/prompt_analyzer.py +291 -0
- backend/utils/validators.py +64 -0
- hf_config.py +30 -0
- packages.txt +3 -0
- pre_startup.sh +42 -0
- requirements.txt +46 -0
- setup_diffrhythm2_src.sh +24 -0
.gitignore
ADDED
@@ -0,0 +1,9 @@
__pycache__/
*.pyc
*.pyo
.Python
*.log
models/
outputs/
logs/
.env
README.md
CHANGED
@@ -1,14 +1,74 @@
 ---
-title:
+title: Music Generation Studio
-emoji:
+emoji: 🎵
-colorFrom:
+colorFrom: purple
-colorTo:
+colorTo: pink
 sdk: gradio
-sdk_version:
+sdk_version: 4.44.0
 app_file: app.py
 pinned: false
-license:
+license: mit
-
+python_version: 3.11
 ---
 
-
+# 🎵 Music Generation Studio
+
+Create AI-powered music with intelligent prompt analysis and context-aware generation using DiffRhythm2 and LyricMind AI.
+
+## Features
+
+- **Intelligent Music Generation**: DiffRhythm2 model for high-quality music with vocals
+- **Smart Lyrics Generation**: LyricMind AI for context-aware lyric creation
+- **Prompt Analysis**: Automatically detects genre, BPM, and mood from your description
+- **Flexible Vocal Modes**:
+  - Instrumental: Pure music without vocals
+  - User Lyrics: Provide your own lyrics
+  - Auto Lyrics: AI-generated lyrics based on prompt
+- **Timeline Management**: Build complete songs clip-by-clip
+- **Export**: Download your creations in WAV, MP3, or FLAC formats
+
+## How to Use
+
+1. **Generate Music**:
+   - Enter a descriptive prompt (e.g., "energetic rock song with electric guitar at 140 BPM")
+   - Choose vocal mode (Instrumental, User Lyrics, or Auto Lyrics)
+   - Set duration (10-120 seconds)
+   - Click "Generate Music Clip"
+
+2. **Manage Timeline**:
+   - View all generated clips in the timeline
+   - Remove specific clips or clear all
+   - Clips are arranged sequentially
+
+3. **Export**:
+   - Enter a filename
+   - Choose format (WAV recommended for best quality)
+   - Download your complete song
+
+## Models
+
+- **DiffRhythm2**: Music generation with integrated vocals ([ASLP-lab/DiffRhythm2](https://huggingface.co/ASLP-lab/DiffRhythm2))
+- **MuQ-MuLan**: Music style encoding ([OpenMuQ/MuQ-MuLan-large](https://huggingface.co/OpenMuQ/MuQ-MuLan-large))
+
+## Performance
+
+⏱️ Generation time: ~2-4 minutes per 30-second clip on CPU (HuggingFace Spaces free tier)
+
+💡 Tip: Start with shorter durations (10-20 seconds) for faster results
+
+## Technical Details
+
+- Built with Gradio and PyTorch
+- Uses DiffRhythm2 for music generation with vocals
+- Employs flow-matching techniques for high-quality audio synthesis
+- Supports multiple languages for lyrics (English, Chinese, Japanese)
+
+## Credits
+
+- DiffRhythm2 by ASLP-lab
+- MuQ-MuLan by OpenMuQ
+- Application interface and integration by Music Generation App Team
+
+## License
+
+MIT License - See LICENSE file for details
app.py
ADDED
@@ -0,0 +1,487 @@
"""
Music Generation Studio - HuggingFace Spaces Deployment
Main application file for Gradio interface
"""
import os
import sys
import gradio as gr
import logging
from pathlib import Path
import shutil
import subprocess

# Run DiffRhythm2 source setup if needed
setup_script = Path(__file__).parent / "setup_diffrhythm2_src.sh"
if setup_script.exists():
    try:
        subprocess.run(["bash", str(setup_script)], check=True)
    except Exception as e:
        print(f"Warning: Failed to run setup script: {e}")

# Configure environment for HuggingFace Spaces (espeak-ng paths, etc.)
import hf_config

# Setup paths for HuggingFace Spaces
SPACE_DIR = Path(__file__).parent
sys.path.insert(0, str(SPACE_DIR / 'backend'))

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Import services
try:
    from services.diffrhythm_service import DiffRhythmService
    from services.lyricmind_service import LyricMindService
    from services.timeline_service import TimelineService
    from services.export_service import ExportService
    from config.settings import Config
    from utils.prompt_analyzer import PromptAnalyzer
except ImportError as e:
    logger.error(f"Import error: {e}")
    raise

# Initialize configuration
config = Config()

# Create necessary directories
os.makedirs("outputs", exist_ok=True)
os.makedirs("outputs/music", exist_ok=True)
os.makedirs("outputs/mixed", exist_ok=True)
os.makedirs("models", exist_ok=True)
os.makedirs("logs", exist_ok=True)

# Initialize services
timeline_service = TimelineService()
export_service = ExportService()

# Lazy-load AI services (heavy models)
diffrhythm_service = None
lyricmind_service = None

def get_diffrhythm_service():
    """Lazy load DiffRhythm service"""
    global diffrhythm_service
    if diffrhythm_service is None:
        logger.info("Loading DiffRhythm2 model...")
        diffrhythm_service = DiffRhythmService(model_path=config.DIFFRHYTHM_MODEL_PATH)
        logger.info("DiffRhythm2 model loaded")
    return diffrhythm_service

def get_lyricmind_service():
    """Lazy load LyricMind service"""
    global lyricmind_service
    if lyricmind_service is None:
        logger.info("Loading LyricMind model...")
        lyricmind_service = LyricMindService(model_path=config.LYRICMIND_MODEL_PATH)
        logger.info("LyricMind model loaded")
    return lyricmind_service

def generate_lyrics(prompt: str, duration: int, progress=gr.Progress()):
    """Generate lyrics from prompt using analysis"""
    try:
        if not prompt or not prompt.strip():
            return "❌ Please enter a prompt"

        progress(0, desc="🔍 Analyzing prompt...")
        logger.info(f"Generating lyrics for: {prompt}")

        # Analyze prompt
        analysis = PromptAnalyzer.analyze(prompt)
        genre = analysis.get('genres', ['general'])[0] if analysis.get('genres') else 'general'
        mood = analysis.get('mood', 'unknown')

        logger.info(f"Analysis - Genre: {genre}, Mood: {mood}")

        progress(0.3, desc=f"✍️ Generating {genre} lyrics...")

        service = get_lyricmind_service()
        lyrics = service.generate(
            prompt=prompt,
            duration=duration,
            prompt_analysis=analysis
        )

        progress(1.0, desc="✅ Lyrics generated!")
        return lyrics

    except Exception as e:
        logger.error(f"Error generating lyrics: {e}", exc_info=True)
        return f"❌ Error: {str(e)}"

def generate_music(prompt: str, lyrics: str, lyrics_mode: str, duration: int, position: str, progress=gr.Progress()):
    """Generate music clip and add to timeline"""
    try:
        if not prompt or not prompt.strip():
            return "❌ Please enter a music prompt", get_timeline_display(), None

        # Estimate time (CPU on HF Spaces)
        est_time = int(duration * 4)  # Conservative estimate for CPU

        progress(0, desc=f"🔍 Analyzing prompt... (Est. {est_time}s)")
        logger.info(f"Generating music: {prompt}, mode={lyrics_mode}, duration={duration}s")

        # Analyze prompt
        analysis = PromptAnalyzer.analyze(prompt)
        genre = analysis.get('genres', ['general'])[0] if analysis.get('genres') else 'general'
        bpm = analysis.get('bpm', 120)
        mood = analysis.get('mood', 'neutral')

        logger.info(f"Analysis - Genre: {genre}, BPM: {bpm}, Mood: {mood}")

        # Determine lyrics based on mode
        lyrics_to_use = None

        if lyrics_mode == "Instrumental":
            logger.info("Generating instrumental (no vocals)")
            progress(0.1, desc=f"🎹 Preparing instrumental generation... ({est_time}s)")

        elif lyrics_mode == "User Lyrics":
            if not lyrics or not lyrics.strip():
                return "❌ Please enter lyrics or switch mode", get_timeline_display(), None
            lyrics_to_use = lyrics.strip()
            logger.info("Using user-provided lyrics")
            progress(0.1, desc=f"🎤 Preparing vocal generation... ({est_time}s)")

        elif lyrics_mode == "Auto Lyrics":
            if lyrics and lyrics.strip():
                lyrics_to_use = lyrics.strip()
                logger.info("Using existing lyrics from textbox")
                progress(0.1, desc=f"🎤 Using provided lyrics... ({est_time}s)")
            else:
                progress(0.1, desc="✍️ Generating lyrics...")
                logger.info("Auto-generating lyrics...")
                lyric_service = get_lyricmind_service()
                lyrics_to_use = lyric_service.generate(
                    prompt=prompt,
                    duration=duration,
                    prompt_analysis=analysis
                )
                logger.info(f"Generated {len(lyrics_to_use)} characters of lyrics")
                progress(0.25, desc=f"🎵 Lyrics ready, generating music... ({est_time}s)")

        # Generate music
        progress(0.3, desc=f"🎼 Generating {genre} at {bpm} BPM... ({est_time}s)")
        service = get_diffrhythm_service()

        final_path = service.generate(
            prompt=prompt,
            duration=duration,
            lyrics=lyrics_to_use
        )

        # Add to timeline
        progress(0.9, desc="📋 Adding to timeline...")
        clip_id = os.path.basename(final_path).split('.')[0]

        from models.schemas import ClipPosition
        clip_info = timeline_service.add_clip(
            clip_id=clip_id,
            file_path=final_path,
            duration=float(duration),
            position=ClipPosition(position)
        )

        logger.info(f"Music added to timeline at position {clip_info['timeline_position']}")

        # Build status message
        progress(1.0, desc="✅ Complete!")
        status_msg = f"✅ Music generated successfully!\n"
        status_msg += f"🎸 Genre: {genre} | 🥁 BPM: {bpm} | 😊 Mood: {mood}\n"
        status_msg += f"🎤 Mode: {lyrics_mode} | 📍 Position: {position}\n"

        if lyrics_mode == "Auto Lyrics" and lyrics_to_use and not lyrics:
            status_msg += "✍️ (Lyrics auto-generated)"

        return status_msg, get_timeline_display(), final_path

    except Exception as e:
        logger.error(f"Error generating music: {e}", exc_info=True)
        return f"❌ Error: {str(e)}", get_timeline_display(), None

def get_timeline_display():
    """Get timeline clips as formatted text"""
    clips = timeline_service.get_all_clips()

    if not clips:
        return "📝 Timeline is empty. Generate clips to get started!"

    total_duration = timeline_service.get_total_duration()

    display = f"**📋 Timeline ({len(clips)} clips, {format_duration(total_duration)} total)**\n\n"

    for i, clip in enumerate(clips, 1):
        display += f"**{i}.** `{clip['clip_id'][:12]}...` | "
        display += f"⏱️ {format_duration(clip['duration'])} | "
        display += f"▶️ {format_duration(clip['start_time'])}\n"

    return display

def remove_clip(clip_number: int):
    """Remove a clip from timeline"""
    try:
        clips = timeline_service.get_all_clips()

        if not clips:
            return "📝 Timeline is empty", get_timeline_display()

        if clip_number < 1 or clip_number > len(clips):
            return f"❌ Invalid clip number. Choose 1-{len(clips)}", get_timeline_display()

        clip_id = clips[clip_number - 1]['clip_id']
        timeline_service.remove_clip(clip_id)

        return f"✅ Clip {clip_number} removed", get_timeline_display()

    except Exception as e:
        logger.error(f"Error removing clip: {e}", exc_info=True)
        return f"❌ Error: {str(e)}", get_timeline_display()

def clear_timeline():
    """Clear all clips from timeline"""
    try:
        timeline_service.clear()
        return "✅ Timeline cleared", get_timeline_display()
    except Exception as e:
        logger.error(f"Error clearing timeline: {e}", exc_info=True)
        return f"❌ Error: {str(e)}", get_timeline_display()

def export_timeline(filename: str, export_format: str, progress=gr.Progress()):
    """Export timeline to audio file"""
    try:
        clips = timeline_service.get_all_clips()

        if not clips:
            return "❌ No clips to export", None

        if not filename or not filename.strip():
            filename = "output"

        progress(0, desc="🔄 Merging clips...")
        logger.info(f"Exporting timeline: {filename}.{export_format}")

        export_service.timeline_service = timeline_service

        progress(0.5, desc="💾 Encoding audio...")
        output_path = export_service.merge_clips(
            filename=filename,
            export_format=export_format
        )

        if output_path:
            progress(1.0, desc="✅ Export complete!")
            return f"✅ Exported: {os.path.basename(output_path)}", output_path
        else:
            return "❌ Export failed", None

    except Exception as e:
        logger.error(f"Error exporting: {e}", exc_info=True)
        return f"❌ Error: {str(e)}", None

def format_duration(seconds: float) -> str:
    """Format duration as MM:SS"""
    mins = int(seconds // 60)
    secs = int(seconds % 60)
    return f"{mins}:{secs:02d}"

# Create Gradio interface
with gr.Blocks(
    title="🎵 Music Generation Studio",
    theme=gr.themes.Soft(primary_hue="purple", secondary_hue="pink")
) as app:

    gr.Markdown(
        """
        # 🎵 Music Generation Studio

        Create AI-powered music with DiffRhythm2 and LyricMind AI

        💡 **Tip**: Start with 10-20 second clips for faster generation on HuggingFace Spaces
        """
    )

    with gr.Row():
        # Left Column - Generation
        with gr.Column(scale=2):
            gr.Markdown("### 🎼 Music Generation")

            prompt_input = gr.Textbox(
                label="🎯 Music Prompt",
                placeholder="energetic rock song with electric guitar at 140 BPM",
                lines=3,
                info="Describe the music style, instruments, tempo, and mood"
            )

            lyrics_mode = gr.Radio(
                choices=["Instrumental", "User Lyrics", "Auto Lyrics"],
                value="Instrumental",
                label="🎤 Vocal Mode",
                info="Instrumental: no vocals | User: provide lyrics | Auto: AI-generated"
            )

            with gr.Row():
                auto_gen_btn = gr.Button("✍️ Generate Lyrics", size="sm")

            lyrics_input = gr.Textbox(
                label="📝 Lyrics",
                placeholder="Enter lyrics or click 'Generate Lyrics'...",
                lines=6
            )

            with gr.Row():
                duration_input = gr.Slider(
                    minimum=10,
                    maximum=60,
                    value=20,
                    step=5,
                    label="⏱️ Duration (seconds)",
                    info="Shorter = faster generation"
                )
                position_input = gr.Radio(
                    choices=["intro", "previous", "next", "outro"],
                    value="next",
                    label="📍 Position"
                )

            generate_btn = gr.Button(
                "✨ Generate Music Clip",
                variant="primary",
                size="lg"
            )

            gen_status = gr.Textbox(label="📊 Status", lines=3, interactive=False)
            audio_output = gr.Audio(label="🎧 Preview", type="filepath")

        # Right Column - Timeline
        with gr.Column(scale=1):
            gr.Markdown("### 📋 Timeline")

            timeline_display = gr.Textbox(
                label="Clips",
                value=get_timeline_display(),
                lines=12,
                interactive=False
            )

            with gr.Row():
                clip_number_input = gr.Number(
                    label="Clip #",
                    precision=0,
                    minimum=1,
                    scale=1
                )
                remove_btn = gr.Button("🗑️ Remove", size="sm", scale=1)

            clear_btn = gr.Button("🗑️ Clear All", variant="stop")
            timeline_status = gr.Textbox(label="Status", lines=1, interactive=False)

    # Export Section
    gr.Markdown("---")
    gr.Markdown("### 💾 Export")

    with gr.Row():
        export_filename = gr.Textbox(
            label="Filename",
            value="my_song",
            scale=2
        )
        export_format = gr.Dropdown(
            choices=["wav", "mp3"],
            value="wav",
            label="Format",
            scale=1
        )
        export_btn = gr.Button("💾 Export", variant="primary", scale=1)

    export_status = gr.Textbox(label="Status", lines=1, interactive=False)
    export_audio = gr.Audio(label="📥 Download", type="filepath")

    # Event handlers
    auto_gen_btn.click(
        fn=generate_lyrics,
        inputs=[prompt_input, duration_input],
        outputs=lyrics_input
    )

    generate_btn.click(
        fn=generate_music,
        inputs=[prompt_input, lyrics_input, lyrics_mode, duration_input, position_input],
        outputs=[gen_status, timeline_display, audio_output]
    )

    remove_btn.click(
        fn=remove_clip,
        inputs=clip_number_input,
        outputs=[timeline_status, timeline_display]
    )

    clear_btn.click(
        fn=clear_timeline,
        outputs=[timeline_status, timeline_display]
    )

    export_btn.click(
        fn=export_timeline,
        inputs=[export_filename, export_format],
        outputs=[export_status, export_audio]
    )

    # Help section
    with gr.Accordion("ℹ️ Help & Tips", open=False):
        gr.Markdown(
            """
            ## 🚀 Quick Start

            1. **Enter a prompt**: "upbeat pop song with synth at 128 BPM"
            2. **Choose mode**: Instrumental (fastest) or with vocals
            3. **Set duration**: Start with 10-20s for quick results
            4. **Generate**: Click the button and wait ~2-4 minutes
            5. **Export**: Download your complete song

            ## ⚡ Performance Tips

            - **Shorter clips = faster**: 10-20s clips generate in ~1-2 minutes
            - **Instrumental mode**: ~30% faster than with vocals
            - **HF Spaces uses CPU**: Expect 2-4 minutes per 30s clip
            - **Build incrementally**: Generate short clips, then combine

            ## 🎯 Prompt Tips

            - **Be specific**: "energetic rock with distorted guitar" > "rock song"
            - **Include BPM**: "at 140 BPM" helps set tempo
            - **Mention instruments**: "with piano and drums"
            - **Describe mood**: "melancholic", "upbeat", "aggressive"

            ## 🎤 Vocal Modes

            - **Instrumental**: Pure music, no vocals (fastest)
            - **User Lyrics**: Provide your own lyrics
            - **Auto Lyrics**: AI generates lyrics based on prompt

            ## 📋 Timeline

            - Clips are arranged sequentially
            - Remove or clear clips as needed
            - Export combines all clips into one file

            ---

            ⏱️ **Average Generation Time**: 2-4 minutes per 30-second clip on CPU

            🎵 **Models**: DiffRhythm2 + MuQ-MuLan + LyricMind AI
            """
        )

# Configure and launch
if __name__ == "__main__":
    logger.info("🎵 Starting Music Generation Studio on HuggingFace Spaces...")

    app.queue(
        default_concurrency_limit=1,
        max_size=5
    )

    app.launch()
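The Gradio callbacks above consume the dictionary returned by PromptAnalyzer.analyze(); its real implementation lives in backend/utils/prompt_analyzer.py (listed in this commit but not shown in this hunk). From the keys accessed in app.py, a hypothetical sketch of the expected shape:

# Hypothetical shape of the PromptAnalyzer.analyze() result, inferred
# only from the keys app.py reads; not taken from the actual
# backend/utils/prompt_analyzer.py implementation.
analysis = {
    'genres': ['rock'],      # detected genres; element [0] is used as the primary genre
    'bpm': 140,              # detected tempo; callers default to 120 when absent
    'mood': 'energetic',     # detected mood; callers default to 'neutral'
    'analysis_text': 'rock, 140 BPM, energetic',  # summary string used in logs
}

# Mirrors the defaulting logic in generate_music():
genre = analysis.get('genres', ['general'])[0] if analysis.get('genres') else 'general'
bpm = analysis.get('bpm', 120)
print(genre, bpm)  # -> rock 140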
backend/__init__.py
ADDED
@@ -0,0 +1 @@
"""Backend package"""
backend/app.py
ADDED
@@ -0,0 +1,80 @@
"""
Main Flask application for the Music Generation App
"""
import os
import logging
from flask import Flask, jsonify, send_from_directory
from flask_cors import CORS
from dotenv import load_dotenv

from config.settings import Config
from routes.generation import generation_bp
from routes.timeline import timeline_bp
from routes.export import export_bp
from routes.mastering import mastering_bp
from utils.logger import setup_logger

# Load environment variables
load_dotenv()

def create_app(config_class=Config):
    """Application factory pattern"""
    app = Flask(__name__)
    app.config.from_object(config_class)

    # Enable CORS
    CORS(app, resources={r"/api/*": {"origins": "*"}})

    # Setup logging
    setup_logger(app)

    # Create necessary directories
    os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
    os.makedirs(app.config['OUTPUT_FOLDER'], exist_ok=True)
    os.makedirs(app.config['MODELS_DIR'], exist_ok=True)
    os.makedirs('logs', exist_ok=True)

    # Register blueprints
    app.register_blueprint(generation_bp, url_prefix='/api/generation')
    app.register_blueprint(timeline_bp, url_prefix='/api/timeline')
    app.register_blueprint(export_bp, url_prefix='/api/export')
    app.register_blueprint(mastering_bp, url_prefix='/api/mastering')

    # Serve static files from outputs directory with proper MIME types
    @app.route('/outputs/<path:filename>')
    def serve_output(filename):
        response = send_from_directory(app.config['OUTPUT_FOLDER'], filename)
        # Ensure WAV files have correct MIME type
        if filename.lower().endswith('.wav'):
            response.headers['Content-Type'] = 'audio/wav'
        elif filename.lower().endswith('.mp3'):
            response.headers['Content-Type'] = 'audio/mpeg'
        return response

    # Health check endpoint
    @app.route('/api/health')
    def health_check():
        return jsonify({
            'status': 'healthy',
            'version': '1.0.0'
        })

    # Error handlers
    @app.errorhandler(404)
    def not_found(error):
        return jsonify({'error': 'Not found'}), 404

    @app.errorhandler(500)
    def internal_error(error):
        app.logger.error(f'Internal server error: {str(error)}')
        return jsonify({'error': 'Internal server error'}), 500

    return app

if __name__ == '__main__':
    app = create_app()
    port = int(os.getenv('PORT', 5000))
    host = os.getenv('HOST', '0.0.0.0')

    app.logger.info(f'Starting server on {host}:{port}')
    app.run(host=host, port=port, debug=os.getenv('FLASK_DEBUG', 'False') == 'True')
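The factory above can be exercised without starting a server by using Flask's built-in test client; a minimal sketch, assuming backend/ is the working directory and flask, flask-cors, and python-dotenv are installed:

# Minimal sketch: instantiate the factory and hit the health endpoint
# with Flask's test client (no running server required).
from app import create_app

app = create_app()

with app.test_client() as client:
    resp = client.get('/api/health')
    print(resp.status_code)  # 200
    print(resp.get_json())   # {'status': 'healthy', 'version': '1.0.0'}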
backend/config/__init__.py
ADDED
@@ -0,0 +1,4 @@
"""Configuration package"""
from .settings import Config, config

__all__ = ['Config', 'config']
backend/config/settings.py
ADDED
@@ -0,0 +1,50 @@
"""
Application configuration settings
"""
import os
from pathlib import Path

class Config:
    """Base configuration"""

    # Base directory
    BASE_DIR = Path(__file__).parent.parent.parent

    # Flask settings
    SECRET_KEY = os.getenv('SECRET_KEY', 'dev-secret-key-change-in-production')
    DEBUG = os.getenv('FLASK_DEBUG', 'False') == 'True'

    # File upload settings
    UPLOAD_FOLDER = os.getenv('UPLOAD_FOLDER', str(BASE_DIR / 'uploads'))
    OUTPUT_FOLDER = os.getenv('OUTPUT_FOLDER', str(BASE_DIR / 'outputs'))
    MAX_CONTENT_LENGTH = int(os.getenv('MAX_CONTENT_LENGTH', 16 * 1024 * 1024))  # 16MB

    # Model paths
    MODELS_DIR = BASE_DIR / 'models'
    DIFFRHYTHM_MODEL_PATH = os.getenv('DIFFRHYTHM_MODEL_PATH', str(MODELS_DIR / 'diffrhythm2'))
    FISH_SPEECH_MODEL_PATH = os.getenv('FISH_SPEECH_MODEL_PATH', str(MODELS_DIR / 'fish_speech'))
    LYRICMIND_MODEL_PATH = os.getenv('LYRICMIND_MODEL_PATH', str(MODELS_DIR / 'lyricmind'))

    # Generation settings
    DEFAULT_CLIP_DURATION = int(os.getenv('DEFAULT_CLIP_DURATION', 30))
    SAMPLE_RATE = int(os.getenv('SAMPLE_RATE', 44100))
    BIT_DEPTH = int(os.getenv('BIT_DEPTH', 16))

    # Logging
    LOG_LEVEL = os.getenv('LOG_LEVEL', 'INFO')
    LOG_FILE = os.getenv('LOG_FILE', str(BASE_DIR / 'logs' / 'app.log'))

class DevelopmentConfig(Config):
    """Development configuration"""
    DEBUG = True

class ProductionConfig(Config):
    """Production configuration"""
    DEBUG = False

# Configuration dictionary
config = {
    'development': DevelopmentConfig,
    'production': ProductionConfig,
    'default': DevelopmentConfig
}
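The trailing `config` dictionary supports environment-keyed selection of a configuration class, although create_app() in backend/app.py simply defaults to `Config`. A sketch of how it could be wired, with APP_ENV as a hypothetical variable not defined anywhere in this repo:

# Sketch: choose a configuration class by environment name.
# APP_ENV is hypothetical; the repo's create_app() takes a config
# class argument instead of reading this dictionary.
import os
from config.settings import config

env = os.getenv('APP_ENV', 'default')
config_class = config.get(env, config['default'])
print(config_class.__name__, 'DEBUG =', config_class.DEBUG)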
backend/routes/__init__.py
ADDED
@@ -0,0 +1,6 @@
"""Routes package"""
from .generation import generation_bp
from .timeline import timeline_bp
from .export import export_bp

__all__ = ['generation_bp', 'timeline_bp', 'export_bp']
backend/routes/export.py
ADDED
@@ -0,0 +1,124 @@
"""
Routes for exporting/downloading music
"""
import logging
import os
from flask import Blueprint, request, jsonify, send_file, current_app
from services.export_service import ExportService
from models.schemas import ExportFormat

logger = logging.getLogger(__name__)

export_bp = Blueprint('export', __name__)
export_service = ExportService()

@export_bp.route('/merge', methods=['POST'])
def merge_timeline():
    """
    Merge all clips in the timeline into a single file

    Request body:
    {
        "format": "wav",       // wav, mp3, flac
        "filename": "my_song"  // optional
    }
    """
    try:
        data = request.get_json() or {}
        logger.info("Merging timeline clips")

        # Validate format
        export_format = data.get('format', 'wav')
        try:
            ExportFormat(export_format)
        except ValueError:
            return jsonify({
                'error': f"Invalid format. Must be one of: {', '.join([f.value for f in ExportFormat])}"
            }), 400

        filename = data.get('filename', 'merged_output')

        # Merge clips
        output_path = export_service.merge_clips(
            filename=filename,
            export_format=export_format
        )

        if not output_path:
            return jsonify({
                'error': 'No clips to merge. Add clips to timeline first.'
            }), 400

        logger.info(f"Timeline merged successfully: {output_path}")

        return jsonify({
            'success': True,
            'file_path': output_path,
            'filename': os.path.basename(output_path)
        })

    except Exception as e:
        logger.error(f"Error merging timeline: {str(e)}", exc_info=True)
        return jsonify({
            'error': 'Failed to merge timeline',
            'details': str(e)
        }), 500

@export_bp.route('/download/<filename>', methods=['GET'])
def download_file(filename):
    """Download an exported file"""
    try:
        output_folder = current_app.config['OUTPUT_FOLDER']
        file_path = os.path.join(output_folder, filename)

        if not os.path.exists(file_path):
            return jsonify({'error': 'File not found'}), 404

        # Security check: ensure file is in output folder
        if not os.path.abspath(file_path).startswith(os.path.abspath(output_folder)):
            return jsonify({'error': 'Invalid file path'}), 403

        logger.info(f"Downloading file: {filename}")

        return send_file(
            file_path,
            as_attachment=True,
            download_name=filename
        )

    except Exception as e:
        logger.error(f"Error downloading file: {str(e)}", exc_info=True)
        return jsonify({'error': str(e)}), 500

@export_bp.route('/export-clip/<clip_id>', methods=['GET'])
def export_single_clip(clip_id):
    """Export a single clip"""
    try:
        export_format = request.args.get('format', 'wav')

        try:
            ExportFormat(export_format)
        except ValueError:
            return jsonify({
                'error': f"Invalid format. Must be one of: {', '.join([f.value for f in ExportFormat])}"
            }), 400

        logger.info(f"Exporting single clip: {clip_id}")

        output_path = export_service.export_clip(
            clip_id=clip_id,
            export_format=export_format
        )

        if not output_path:
            return jsonify({'error': 'Clip not found'}), 404

        return jsonify({
            'success': True,
            'file_path': output_path,
            'filename': os.path.basename(output_path)
        })

    except Exception as e:
        logger.error(f"Error exporting clip: {str(e)}", exc_info=True)
        return jsonify({'error': str(e)}), 500
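Following the request body documented in merge_timeline(), a client-side sketch of merging and then downloading; assumes the Flask backend is running on its default localhost:5000 (per backend/app.py) and the third-party requests package is installed:

# Sketch: merge the timeline into one file, then fetch it.
import requests

BASE = "http://localhost:5000/api/export"

merged = requests.post(f"{BASE}/merge",
                       json={"format": "wav", "filename": "my_song"})
merged.raise_for_status()
info = merged.json()  # {'success': True, 'file_path': ..., 'filename': ...}

# Download the merged file via the /download route.
audio = requests.get(f"{BASE}/download/{info['filename']}")
with open(info["filename"], "wb") as f:
    f.write(audio.content)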
backend/routes/generation.py
ADDED
@@ -0,0 +1,191 @@
"""
Routes for music generation
"""
import os
import logging
from flask import Blueprint, request, jsonify, current_app
from services.diffrhythm_service import DiffRhythmService
from services.lyricmind_service import LyricMindService
from services.style_consistency_service import StyleConsistencyService
from services.timeline_service import TimelineService
from models.schemas import GenerationRequest, LyricsRequest
from utils.validators import validate_generation_params
from utils.prompt_analyzer import PromptAnalyzer

logger = logging.getLogger(__name__)

generation_bp = Blueprint('generation', __name__)

# Initialize services (lazy loading)
diffrhythm_service = None
lyricmind_service = None
style_service = None
timeline_service = None

def get_diffrhythm_service():
    """Get or create DiffRhythm service instance"""
    global diffrhythm_service
    if diffrhythm_service is None:
        diffrhythm_service = DiffRhythmService(
            model_path=current_app.config['DIFFRHYTHM_MODEL_PATH']
        )
    return diffrhythm_service

def get_lyricmind_service():
    """Get or create LyricMind service instance"""
    global lyricmind_service
    if lyricmind_service is None:
        lyricmind_service = LyricMindService(
            model_path=current_app.config['LYRICMIND_MODEL_PATH']
        )
    return lyricmind_service

def get_style_service():
    """Get or create Style Consistency service instance"""
    global style_service
    if style_service is None:
        style_service = StyleConsistencyService()
    return style_service

def get_timeline_service():
    """Get or create Timeline service instance"""
    global timeline_service
    if timeline_service is None:
        timeline_service = TimelineService()
    return timeline_service

@generation_bp.route('/generate-lyrics', methods=['POST'])
def generate_lyrics():
    """Generate lyrics from prompt using LyricMind AI with prompt analysis"""
    try:
        data = LyricsRequest(**request.json)

        # Analyze prompt for better context
        logger.info(f"Analyzing prompt for lyrics generation: {data.prompt}")
        prompt_analysis = PromptAnalyzer.analyze(data.prompt)
        logger.info(f"Prompt analysis: {prompt_analysis}")

        # Get lyrics service
        lyrics_service = get_lyricmind_service()

        # Generate lyrics with analysis context
        style = data.style or (prompt_analysis.get('genres', [''])[0] if prompt_analysis.get('genres') else None)
        logger.info(f"Generating lyrics with style: {style}")

        lyrics = lyrics_service.generate(
            prompt=data.prompt,
            style=style,
            duration=data.duration,
            prompt_analysis=prompt_analysis
        )

        return jsonify({
            'lyrics': lyrics,
            'analysis': prompt_analysis
        })

    except ValueError as e:
        logger.error(f"Validation error: {str(e)}")
        return jsonify({'error': str(e)}), 400
    except Exception as e:
        logger.error(f"Error generating lyrics: {str(e)}", exc_info=True)
        return jsonify({'error': f'Failed to generate lyrics: {str(e)}'}), 500

@generation_bp.route('/generate-music', methods=['POST'])
def generate_music():
    """
    Generate music clip from prompt with optional vocals

    Request body:
    {
        "prompt": "upbeat pop song with drums",
        "lyrics": "optional lyrics text",
        "duration": 30
    }
    """
    try:
        data = request.get_json()
        logger.info(f"Received music generation request: {data.get('prompt', 'No prompt')}")

        # Validate request
        validation_error = validate_generation_params(data)
        if validation_error:
            return jsonify({'error': validation_error}), 400

        # Parse request
        gen_request = GenerationRequest(**data)

        # Analyze prompt for musical attributes
        prompt_analysis = PromptAnalyzer.analyze(gen_request.prompt)
        logger.info(f"Prompt analysis: {prompt_analysis['analysis_text']}")

        # Get timeline clips for style consistency
        timeline_svc = get_timeline_service()
        existing_clips = timeline_svc.get_all_clips()

        # Prepare style guidance if clips exist
        reference_audio = None
        style_profile = {}
        enhanced_prompt = gen_request.prompt

        if existing_clips:
            logger.info(f"Found {len(existing_clips)} existing clips - applying style consistency")
            style_svc = get_style_service()
            reference_audio, style_profile = style_svc.get_style_guidance_for_generation(existing_clips)

            # Enhance prompt with style characteristics
            enhanced_prompt = style_svc.enhance_prompt_with_style(gen_request.prompt, style_profile)
            logger.info(f"Enhanced prompt for style consistency: {enhanced_prompt}")
        else:
            logger.info("No existing clips - generating without style guidance")

        # Generate music with DiffRhythm2 (includes vocals if lyrics provided)
        service = get_diffrhythm_service()
        lyrics_to_use = gen_request.lyrics if gen_request.lyrics else None

        final_path = service.generate(
            prompt=enhanced_prompt,
            duration=gen_request.duration,
            lyrics=lyrics_to_use,
            reference_audio=reference_audio
        )

        logger.info(f"Music generation successful: {final_path}")

        # Convert filesystem path to URL path (forward slashes, relative to outputs)
        relative_path = os.path.relpath(final_path, 'outputs')
        url_path = f"/outputs/{relative_path.replace(os.sep, '/')}"

        return jsonify({
            'success': True,
            'clip_id': os.path.basename(final_path).split('.')[0],
            'file_path': url_path,
            'duration': gen_request.duration,
            'analysis': prompt_analysis,
            'style_consistent': len(existing_clips) > 0,
            'num_reference_clips': len(existing_clips)
        })

    except Exception as e:
        logger.error(f"Error generating music: {str(e)}", exc_info=True)
        return jsonify({
            'error': 'Failed to generate music',
            'details': str(e)
        }), 500

@generation_bp.route('/status', methods=['GET'])
def get_status():
    """Check if generation services are available"""
    try:
        status = {
            'diffrhythm': diffrhythm_service is not None
        }

        return jsonify({
            'services': status,
            'ready': status['diffrhythm']
        })

    except Exception as e:
        logger.error(f"Error checking status: {str(e)}")
        return jsonify({'error': str(e)}), 500
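The /generate-music contract documented above can be exercised the same way; a sketch under the same localhost:5000 and requests assumptions:

# Sketch: request an instrumental clip via the documented body.
import requests

resp = requests.post(
    "http://localhost:5000/api/generation/generate-music",
    json={
        "prompt": "upbeat pop song with drums",
        "duration": 30,
        # "lyrics": "optional lyrics text",  # include to add vocals
    },
)
resp.raise_for_status()
result = resp.json()
print(result["clip_id"], result["file_path"], result["style_consistent"])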
backend/routes/mastering.py
ADDED
@@ -0,0 +1,185 @@
"""
Routes for audio mastering and EQ
"""
import os
import logging
from flask import Blueprint, request, jsonify, current_app, send_file
from services.mastering_service import MasteringService
from pathlib import Path

logger = logging.getLogger(__name__)

mastering_bp = Blueprint('mastering', __name__)

# Initialize service
mastering_service = None

def get_mastering_service():
    """Get or create mastering service instance"""
    global mastering_service
    if mastering_service is None:
        mastering_service = MasteringService()
    return mastering_service

@mastering_bp.route('/presets', methods=['GET'])
def get_presets():
    """Get list of all available mastering presets"""
    try:
        service = get_mastering_service()
        presets = service.get_preset_list()
        return jsonify({'presets': presets})
    except Exception as e:
        logger.error(f"Error getting presets: {str(e)}", exc_info=True)
        return jsonify({'error': 'Failed to get presets'}), 500

@mastering_bp.route('/apply-preset', methods=['POST'])
def apply_preset():
    """Apply mastering preset to audio clip"""
    try:
        data = request.json
        clip_id = data.get('clip_id')
        preset_name = data.get('preset')
        audio_path = data.get('audio_path')

        if not all([clip_id, preset_name, audio_path]):
            return jsonify({'error': 'Missing required parameters'}), 400

        # Verify audio file exists
        if not os.path.exists(audio_path):
            return jsonify({'error': 'Audio file not found'}), 404

        # Generate output path
        output_dir = Path(current_app.config['OUTPUT_FOLDER']) / 'mastered'
        output_dir.mkdir(parents=True, exist_ok=True)

        filename = Path(audio_path).stem
        output_path = output_dir / f"{filename}_mastered_{preset_name}.wav"

        # Apply preset
        service = get_mastering_service()
        processed_path = service.apply_preset(audio_path, preset_name, str(output_path))

        # Return URL to processed file
        relative_path = os.path.relpath(processed_path, current_app.config['OUTPUT_FOLDER'])
        file_url = f"/outputs/{relative_path.replace(os.sep, '/')}"

        return jsonify({
            'success': True,
            'processed_path': file_url,
            'clip_id': clip_id,
            'preset': preset_name
        })

    except ValueError as e:
        logger.error(f"Validation error: {str(e)}")
        return jsonify({'error': str(e)}), 400
    except Exception as e:
        logger.error(f"Error applying preset: {str(e)}", exc_info=True)
        return jsonify({'error': f'Failed to apply preset: {str(e)}'}), 500

@mastering_bp.route('/apply-custom-eq', methods=['POST'])
def apply_custom_eq():
    """Apply custom EQ settings to audio clip"""
    try:
        data = request.json
        clip_id = data.get('clip_id')
        audio_path = data.get('audio_path')
        eq_bands = data.get('eq_bands', [])
        compression = data.get('compression')
        limiting = data.get('limiting')

        if not all([clip_id, audio_path]):
            return jsonify({'error': 'Missing required parameters'}), 400

        # Verify audio file exists
        if not os.path.exists(audio_path):
            return jsonify({'error': 'Audio file not found'}), 404

        # Generate output path
        output_dir = Path(current_app.config['OUTPUT_FOLDER']) / 'mastered'
        output_dir.mkdir(parents=True, exist_ok=True)

        filename = Path(audio_path).stem
        output_path = output_dir / f"{filename}_custom_eq.wav"

        # Apply custom EQ
        service = get_mastering_service()
        processed_path = service.apply_custom_eq(
            audio_path,
            str(output_path),
            eq_bands,
            compression,
            limiting
        )

        # Return URL to processed file
        relative_path = os.path.relpath(processed_path, current_app.config['OUTPUT_FOLDER'])
        file_url = f"/outputs/{relative_path.replace(os.sep, '/')}"

        return jsonify({
            'success': True,
            'processed_path': file_url,
            'clip_id': clip_id
        })

    except Exception as e:
        logger.error(f"Error applying custom EQ: {str(e)}", exc_info=True)
        return jsonify({'error': f'Failed to apply custom EQ: {str(e)}'}), 500

@mastering_bp.route('/preview', methods=['POST'])
def preview_mastering():
    """Preview mastering effect (non-destructive)"""
    try:
        data = request.json
        clip_id = data.get('clip_id')
        audio_path = data.get('audio_path')
        preset_name = data.get('preset')
        eq_bands = data.get('eq_bands')

        if not all([clip_id, audio_path]):
            return jsonify({'error': 'Missing required parameters'}), 400

        # Verify audio file exists
        if not os.path.exists(audio_path):
            return jsonify({'error': 'Audio file not found'}), 404

        # Generate temp output path for preview
        output_dir = Path(current_app.config['OUTPUT_FOLDER']) / 'preview'
        output_dir.mkdir(parents=True, exist_ok=True)

        filename = Path(audio_path).stem
        output_path = output_dir / f"{filename}_preview.wav"

        service = get_mastering_service()

        if preset_name:
            # Apply preset for preview
            processed_path = service.apply_preset(audio_path, preset_name, str(output_path))
        elif eq_bands:
            # Apply custom EQ for preview
            compression = data.get('compression')
            limiting = data.get('limiting')
            processed_path = service.apply_custom_eq(
                audio_path,
                str(output_path),
                eq_bands,
                compression,
                limiting
            )
        else:
            return jsonify({'error': 'No preset or EQ settings provided'}), 400

        # Return URL to preview file
        # Use absolute path from project root for frontend to access
        relative_path = os.path.relpath(processed_path, 'outputs')
        file_url = f"http://localhost:7860/outputs/{relative_path.replace(os.sep, '/')}"

        return jsonify({
            'success': True,
            'preview_path': file_url,
            'clip_id': clip_id
        })

    except Exception as e:
        logger.error(f"Error generating preview: {str(e)}", exc_info=True)
        return jsonify({'error': f'Failed to generate preview: {str(e)}'}), 500
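For the mastering blueprint, a sketch that lists presets and applies one; the clip id, preset name, and audio path below are hypothetical placeholders, and the shape of each preset entry is defined by MasteringService (not shown in this commit view):

# Sketch: list mastering presets, then apply one to an existing clip.
# Same localhost:5000 and requests assumptions; ids and paths are hypothetical.
import requests

BASE = "http://localhost:5000/api/mastering"

print(requests.get(f"{BASE}/presets").json())

resp = requests.post(f"{BASE}/apply-preset", json={
    "clip_id": "clip_001",                       # hypothetical clip id
    "preset": "warm",                            # hypothetical preset name
    "audio_path": "outputs/music/clip_001.wav",  # hypothetical path
})
print(resp.json())  # {'success': True, 'processed_path': '/outputs/mastered/...', ...}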
backend/routes/timeline.py
ADDED
|
@@ -0,0 +1,137 @@
"""
Routes for timeline management
"""
import logging
from flask import Blueprint, request, jsonify
from services.timeline_service import TimelineService
from models.schemas import ClipPosition

logger = logging.getLogger(__name__)

timeline_bp = Blueprint('timeline', __name__)
timeline_service = TimelineService()

@timeline_bp.route('/clips', methods=['GET'])
def get_clips():
    """Get all clips in the timeline"""
    try:
        clips = timeline_service.get_all_clips()
        return jsonify({
            'success': True,
            'clips': clips,
            'total_duration': timeline_service.get_total_duration()
        })
    except Exception as e:
        logger.error(f"Error fetching clips: {str(e)}", exc_info=True)
        return jsonify({'error': str(e)}), 500

@timeline_bp.route('/clips', methods=['POST'])
def add_clip():
    """
    Add a clip to the timeline

    Request body:
    {
        "clip_id": "unique_id",
        "file_path": "/path/to/clip.wav",
        "duration": 30,
        "position": "next"  // intro, previous, next, outro
    }
    """
    try:
        data = request.get_json()
        logger.info(f"Adding clip to timeline: {data.get('clip_id')}")

        # Validate required fields
        required_fields = ['clip_id', 'file_path', 'duration', 'position']
        for field in required_fields:
            if field not in data:
                return jsonify({'error': f'Missing required field: {field}'}), 400

        # Validate position
        try:
            position = ClipPosition(data['position'])
        except ValueError:
            return jsonify({
                'error': f"Invalid position. Must be one of: {', '.join([p.value for p in ClipPosition])}"
            }), 400

        # Add clip to timeline
        result = timeline_service.add_clip(
            clip_id=data['clip_id'],
            file_path=data['file_path'],
            duration=data['duration'],
            position=position
        )

        logger.info(f"Clip added successfully at position: {result['timeline_position']}")

        return jsonify({
            'success': True,
            **result
        })

    except Exception as e:
        logger.error(f"Error adding clip: {str(e)}", exc_info=True)
        return jsonify({'error': str(e)}), 500

@timeline_bp.route('/clips/<clip_id>', methods=['DELETE'])
def remove_clip(clip_id):
    """Remove a clip from the timeline"""
    try:
        logger.info(f"Removing clip: {clip_id}")
        timeline_service.remove_clip(clip_id)

        return jsonify({
            'success': True,
            'message': f'Clip {clip_id} removed'
        })

    except Exception as e:
        logger.error(f"Error removing clip: {str(e)}", exc_info=True)
        return jsonify({'error': str(e)}), 500

@timeline_bp.route('/clips/reorder', methods=['POST'])
def reorder_clips():
    """
    Reorder clips in the timeline

    Request body:
    {
        "clip_ids": ["id1", "id2", "id3"]
    }
    """
    try:
        data = request.get_json()
        clip_ids = data.get('clip_ids', [])

        if not clip_ids:
            return jsonify({'error': 'clip_ids array is required'}), 400

        logger.info(f"Reordering clips: {clip_ids}")
        timeline_service.reorder_clips(clip_ids)

        return jsonify({
            'success': True,
            'message': 'Clips reordered successfully'
        })

    except Exception as e:
        logger.error(f"Error reordering clips: {str(e)}", exc_info=True)
        return jsonify({'error': str(e)}), 500

@timeline_bp.route('/clear', methods=['POST'])
def clear_timeline():
    """Clear all clips from the timeline"""
    try:
        logger.info("Clearing timeline")
        timeline_service.clear()

        return jsonify({
            'success': True,
            'message': 'Timeline cleared'
        })

    except Exception as e:
        logger.error(f"Error clearing timeline: {str(e)}", exc_info=True)
        return jsonify({'error': str(e)}), 500
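A sketch of driving the add-clip route over HTTP. The /api/timeline prefix is an assumption about where the blueprint is mounted; the payload mirrors the docstring above.

# Hypothetical client call; the /api/timeline mount point is an assumption.
import requests

resp = requests.post(
    "http://localhost:7860/api/timeline/clips",
    json={
        "clip_id": "clip-001",
        "file_path": "outputs/music/clip-001.wav",
        "duration": 30,
        "position": "next",  # a ClipPosition value: intro, previous, next, outro
    },
)
print(resp.json())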
backend/run.py
ADDED
@@ -0,0 +1,50 @@
"""
Startup script for the Music Generation App
"""
import sys
import os
import signal

# Add backend directory to Python path
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from app import create_app

def signal_handler(sig, frame):
    """Handle shutdown signals gracefully"""
    print('\n\n[INFO] Shutting down server...')
    sys.exit(0)

if __name__ == '__main__':
    try:
        app = create_app()
        port = int(os.getenv('PORT', 7860))  # Default to 7860 to match frontend expectations
        host = os.getenv('HOST', '0.0.0.0')

        print(f"""
================================================================
  Music Generation App Server Starting...
================================================================

  Server running at: http://{host}:{port}
  API endpoints:     http://{host}:{port}/api
  Health check:      http://{host}:{port}/api/health

  Press Ctrl+C to stop the server
================================================================
""")

        # Register signal handlers
        signal.signal(signal.SIGINT, signal_handler)
        signal.signal(signal.SIGTERM, signal_handler)

        # Use waitress for production-ready server
        from waitress import serve
        print('[INFO] Server is ready!')
        serve(app, host=host, port=port, threads=4)

    except Exception as e:
        print(f"\n[ERROR] Failed to start server: {e}", file=sys.stderr)
        import traceback
        traceback.print_exc()
        sys.exit(1)
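Usage note: running `python backend/run.py` serves the app with waitress on 0.0.0.0:7860; the `PORT` and `HOST` environment variables override those defaults, and Ctrl+C (SIGINT) or SIGTERM triggers the graceful-shutdown handler above.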
backend/services/__init__.py
ADDED
@@ -0,0 +1,12 @@
"""Services package"""
from .diffrhythm_service import DiffRhythmService
from .timeline_service import TimelineService
from .export_service import ExportService
from .fish_speech_service import FishSpeechService

__all__ = [
    'DiffRhythmService',
    'TimelineService',
    'ExportService',
    'FishSpeechService'
]
backend/services/diffrhythm_service.py
ADDED
@@ -0,0 +1,397 @@
"""
DiffRhythm 2 music generation service
Integrates with the DiffRhythm 2 model for music generation with vocals
"""
import os
import sys
import logging
import uuid
from pathlib import Path
from typing import Optional
import numpy as np
import soundfile as sf
import torch
import torchaudio
import json

# Configure espeak-ng path for phonemizer (required by g2p module)
# Note: Environment configuration handled by hf_config.py for HuggingFace Spaces
# or by launch scripts for local development
if "PHONEMIZER_ESPEAK_PATH" not in os.environ:
    # Fallback for local development without launcher
    espeak_path = Path(__file__).parent.parent.parent / "external" / "espeak-ng"
    if espeak_path.exists():
        os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = str(espeak_path / "libespeak-ng.dll")
        os.environ["PHONEMIZER_ESPEAK_PATH"] = str(espeak_path)

# Add DiffRhythm2 source code to path (cloned repo, not pip package)
diffrhythm2_src = Path(__file__).parent.parent.parent / "models" / "diffrhythm2_source"
sys.path.insert(0, str(diffrhythm2_src))

logger = logging.getLogger(__name__)

class DiffRhythmService:
    """Service for DiffRhythm 2 music generation"""

    def __init__(self, model_path: str):
        """
        Initialize DiffRhythm 2 service

        Args:
            model_path: Path to DiffRhythm 2 model files
        """
        self.model_path = model_path
        self.model = None
        self.mulan = None
        self.lrc_tokenizer = None
        self.decoder = None
        self.is_initialized = False
        self.device = self._get_device()
        logger.info(f"DiffRhythm 2 service created with model path: {model_path}")
        logger.info(f"Using device: {self.device}")

    def _get_device(self):
        """Get compute device (CUDA or CPU)"""
        # Try CUDA first (NVIDIA)
        if torch.cuda.is_available():
            logger.info("Using CUDA (NVIDIA GPU)")
            return torch.device("cuda")

        # Note: DirectML support disabled due to version conflicts with DiffRhythm2
        # DiffRhythm2 requires torch>=2.4, but torch-directml requires torch==2.4.1
        # For AMD GPU acceleration, consider using ROCm with compatible PyTorch build

        # Fallback to CPU
        logger.info("Using CPU (no GPU acceleration)")
        return torch.device("cpu")

    def _initialize_model(self):
        """Lazy load the DiffRhythm 2 model when first needed"""
        if self.is_initialized:
            return

        try:
            logger.info("Initializing DiffRhythm 2 model...")

            from diffrhythm2.cfm import CFM
            from diffrhythm2.backbones.dit import DiT
            from bigvgan.model import Generator
            from muq import MuQMuLan
            from huggingface_hub import hf_hub_download
            from safetensors.torch import load_file

            # Load DiffRhythm 2 model
            repo_id = "ASLP-lab/DiffRhythm2"

            # Download model files
            model_ckpt = hf_hub_download(
                repo_id=repo_id,
                filename="model.safetensors",
                local_dir=self.model_path,
                local_files_only=False,
            )
            model_config_path = hf_hub_download(
                repo_id=repo_id,
                filename="config.json",
                local_dir=self.model_path,
                local_files_only=False,
            )

            # Load config
            with open(model_config_path) as f:
                model_config = json.load(f)

            model_config['use_flex_attn'] = False

            # Create model
            self.model = CFM(
                transformer=DiT(**model_config),
                num_channels=model_config['mel_dim'],
                block_size=model_config['block_size'],
            )

            # Load weights
            ckpt = load_file(model_ckpt)
            self.model.load_state_dict(ckpt)
            self.model = self.model.to(self.device)

            # Load MuLan for style encoding
            self.mulan = MuQMuLan.from_pretrained(
                "OpenMuQ/MuQ-MuLan-large",
                cache_dir=os.path.join(self.model_path, "mulan")
            ).to(self.device)

            # Load tokenizer
            from g2p.g2p_generation import chn_eng_g2p
            vocab_path = os.path.join(self.model_path, "vocab.json")
            if not os.path.exists(vocab_path):
                # Download vocab
                vocab_path = hf_hub_download(
                    repo_id=repo_id,
                    filename="g2p/g2p/vocab.json",
                    local_dir=self.model_path,
                    local_files_only=False,
                )

            with open(vocab_path, 'r') as f:
                phone2id = json.load(f)['vocab']

            self.lrc_tokenizer = {
                'phone2id': phone2id,
                'g2p': chn_eng_g2p
            }

            # Load decoder (BigVGAN vocoder)
            decoder_ckpt = hf_hub_download(
                repo_id=repo_id,
                filename="decoder.bin",
                local_dir=self.model_path,
                local_files_only=False,
            )
            decoder_config = hf_hub_download(
                repo_id=repo_id,
                filename="decoder.json",
                local_dir=self.model_path,
                local_files_only=False,
            )

            self.decoder = Generator(decoder_config, decoder_ckpt)
            self.decoder = self.decoder.to(self.device)

            logger.info("✅ DiffRhythm 2 model loaded successfully")

            self.is_initialized = True
            logger.info("DiffRhythm 2 service initialized")

        except Exception as e:
            logger.error(f"Failed to initialize DiffRhythm 2: {str(e)}", exc_info=True)
            raise RuntimeError(f"Could not load DiffRhythm 2 model: {str(e)}")

    def generate(
        self,
        prompt: str,
        duration: int = 30,
        sample_rate: int = 44100,
        lyrics: Optional[str] = None,
        reference_audio: Optional[str] = None
    ) -> str:
        """
        Generate music from text prompt with optional vocals/lyrics and style reference

        Args:
            prompt: Text description of desired music
            duration: Length in seconds
            sample_rate: Audio sample rate
            lyrics: Optional lyrics for vocals
            reference_audio: Optional path to reference audio for style consistency

        Returns:
            Path to generated audio file
        """
        try:
            self._initialize_model()

            if lyrics:
                logger.info(f"Generating music with vocals: prompt='{prompt}', lyrics_length={len(lyrics)}")
            else:
                logger.info(f"Generating instrumental music: prompt='{prompt}'")

            if reference_audio and os.path.exists(reference_audio):
                logger.info(f"Using style reference: {reference_audio}")

            logger.info(f"Duration={duration}s")

            # Try to generate with DiffRhythm 2
            if self.model is not None:
                audio = self._generate_with_diffrhythm2(prompt, lyrics, duration, sample_rate, reference_audio)
            else:
                # Fallback: Generate placeholder
                logger.warning("Using placeholder audio generation")
                audio = self._generate_placeholder(duration, sample_rate)

            # Save to file
            output_dir = os.path.join('outputs', 'music')
            os.makedirs(output_dir, exist_ok=True)

            clip_id = str(uuid.uuid4())
            output_path = os.path.join(output_dir, f"{clip_id}.wav")

            # soundfile accepts both mono (samples,) and multi-channel
            # (samples, channels) arrays directly
            sf.write(output_path, audio, sample_rate)

            logger.info(f"Music generated successfully: {output_path}")

            return output_path

        except Exception as e:
            logger.error(f"Music generation failed: {str(e)}", exc_info=True)
            raise RuntimeError(f"Failed to generate music: {str(e)}")

    def _generate_with_diffrhythm2(
        self,
        prompt: str,
        lyrics: Optional[str],
        duration: int,
        sample_rate: int,
        reference_audio: Optional[str] = None
    ) -> np.ndarray:
        """
        Generate music using DiffRhythm 2 model with optional style reference

        Args:
            prompt: Music description (used as style prompt)
            lyrics: Lyrics for vocals (required for vocal generation)
            duration: Duration in seconds
            sample_rate: Sample rate
            reference_audio: Optional path to reference audio for style guidance

        Returns:
            Audio array
        """
        try:
            logger.info("Generating with DiffRhythm 2 model...")

            # Prepare lyrics tokens
            if lyrics:
                lyrics_token = self._tokenize_lyrics(lyrics)
            else:
                # For instrumental, use empty structure
                lyrics_token = torch.tensor([500, 511], dtype=torch.long, device=self.device)  # [start][stop]

            # Encode style prompt with optional reference audio blending
            with torch.no_grad():
                if reference_audio and os.path.exists(reference_audio):
                    try:
                        # Load reference audio
                        ref_waveform, ref_sr = torchaudio.load(reference_audio)
                        if ref_sr != 24000:  # MuLan expects 24kHz
                            ref_waveform = torchaudio.functional.resample(ref_waveform, ref_sr, 24000)

                        # Encode reference audio with MuLan
                        ref_waveform = ref_waveform.to(self.device)
                        audio_style_embed = self.mulan(audios=ref_waveform.unsqueeze(0))
                        text_style_embed = self.mulan(texts=[prompt])

                        # Blend reference audio style with text prompt (70% audio, 30% text)
                        style_prompt_embed = 0.7 * audio_style_embed + 0.3 * text_style_embed
                        logger.info("Using blended style: 70% reference audio + 30% text prompt")
                    except Exception as e:
                        logger.warning(f"Failed to use reference audio, using text prompt only: {e}")
                        style_prompt_embed = self.mulan(texts=[prompt])
                else:
                    style_prompt_embed = self.mulan(texts=[prompt])

            style_prompt_embed = style_prompt_embed.to(self.device).squeeze(0)

            # Use FP16 if on GPU
            if self.device.type != 'cpu':
                self.model = self.model.half()
                self.decoder = self.decoder.half()
                style_prompt_embed = style_prompt_embed.half()

            # Generate latent representation
            with torch.inference_mode():
                latent = self.model.sample_block_cache(
                    text=lyrics_token.unsqueeze(0),
                    duration=int(duration * 5),  # DiffRhythm uses 5 frames per second
                    style_prompt=style_prompt_embed.unsqueeze(0),
                    steps=16,  # Sampling steps
                    cfg_strength=2.0,  # Classifier-free guidance
                    process_bar=False
                )

                # Decode to audio
                latent = latent.transpose(1, 2)
                audio = self.decoder.decode_audio(latent, overlap=5, chunk_size=20)

            # Convert to numpy
            audio = audio.float().cpu().numpy().squeeze()

            # Resample from the model's native 24kHz to the requested rate first,
            # then enforce the exact target length (resampling after padding would
            # leave trailing silence and never change the actual rate)
            native_sr = 24000  # DiffRhythm 2 native sample rate
            if sample_rate != native_sr:
                import scipy.signal as signal
                audio = signal.resample(audio, int(len(audio) * sample_rate / native_sr))

            # Ensure correct length
            target_length = int(duration * sample_rate)
            if len(audio) > target_length:
                audio = audio[:target_length]
            elif len(audio) < target_length:
                audio = np.pad(audio, (0, target_length - len(audio)))

            logger.info("✅ DiffRhythm 2 generation successful")
            return audio.astype(np.float32)

        except Exception as e:
            logger.error(f"DiffRhythm 2 generation failed: {str(e)}")
            return self._generate_placeholder(duration, sample_rate)

    def _tokenize_lyrics(self, lyrics: str) -> torch.Tensor:
        """
        Tokenize lyrics for DiffRhythm 2

        Args:
            lyrics: Lyrics text

        Returns:
            Tokenized lyrics tensor
        """
        try:
            # Structure tags
            STRUCT_INFO = {
                "[start]": 500,
                "[end]": 501,
                "[intro]": 502,
                "[verse]": 503,
                "[chorus]": 504,
                "[outro]": 505,
                "[inst]": 506,
                "[solo]": 507,
                "[bridge]": 508,
                "[hook]": 509,
                "[break]": 510,
                "[stop]": 511,
                "[space]": 512
            }

            # Convert lyrics to phonemes and tokens
            phone, tokens = self.lrc_tokenizer['g2p'](lyrics)
            tokens = [x + 1 for x in tokens]  # Offset by 1

            # Add structure: [start] + lyrics + [stop]
            lyrics_tokens = [STRUCT_INFO['[start]']] + tokens + [STRUCT_INFO['[stop]']]

            return torch.tensor(lyrics_tokens, dtype=torch.long, device=self.device)

        except Exception as e:
            logger.error(f"Lyrics tokenization failed: {str(e)}")
            # Return minimal structure
            return torch.tensor([500, 511], dtype=torch.long, device=self.device)

    def _generate_placeholder(self, duration: int, sample_rate: int) -> np.ndarray:
        """
        Generate placeholder audio (for testing without actual model)

        Args:
            duration: Length in seconds
            sample_rate: Sample rate

        Returns:
            Audio array
        """
        logger.warning("Using placeholder audio - DiffRhythm 2 model not loaded")

        # Generate simple sine wave as placeholder
        t = np.linspace(0, duration, int(duration * sample_rate))
        frequency = 440  # A4 note
        audio = 0.3 * np.sin(2 * np.pi * frequency * t)

        return audio.astype(np.float32)
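A minimal direct-use sketch of the service above. The model_path value is an assumption (weights are downloaded there on first call); lyrics may carry the structure tags that _tokenize_lyrics recognizes, and generation is slow without a GPU.

# Hypothetical direct use of DiffRhythmService; model_path is an assumption.
from services.diffrhythm_service import DiffRhythmService

service = DiffRhythmService(model_path="models/diffrhythm2")
wav_path = service.generate(
    prompt="energetic rock song with electric guitar at 140 BPM",
    duration=30,
    lyrics="[verse]\nWe ride the night...",  # omit for an instrumental clip
)
print(wav_path)  # e.g. outputs/music/<uuid>.wav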
backend/services/export_service.py
ADDED
@@ -0,0 +1,123 @@
"""
Export and merge service
"""
import os
import logging
from typing import Optional, List
import numpy as np
import soundfile as sf
from services.timeline_service import TimelineService

logger = logging.getLogger(__name__)

class ExportService:
    """Service for exporting and merging audio"""

    def __init__(self):
        """Initialize export service"""
        self.timeline_service = TimelineService()
        logger.info("Export service initialized")

    def merge_clips(
        self,
        filename: str = "output",
        export_format: str = "wav"
    ) -> Optional[str]:
        """
        Merge all timeline clips into a single file

        Args:
            filename: Output filename (without extension)
            export_format: Output format (wav, mp3, flac)

        Returns:
            Path to merged file, or None if no clips
        """
        try:
            clips = self.timeline_service.get_all_clips()

            if not clips:
                logger.warning("No clips to merge")
                return None

            logger.info(f"Merging {len(clips)} clips")

            # Load all clips
            audio_data = []
            sample_rate = None

            for clip in clips:
                audio, sr = sf.read(clip['file_path'])

                if sample_rate is None:
                    sample_rate = sr
                elif sr != sample_rate:
                    logger.warning(f"Sample rate mismatch: {sr} vs {sample_rate}")
                    # Could resample here if needed

                audio_data.append(audio)

            # Concatenate all clips
            merged_audio = np.concatenate(audio_data)

            # Normalize
            max_val = np.abs(merged_audio).max()
            if max_val > 0:
                merged_audio = merged_audio / max_val * 0.95

            # Save merged file
            output_dir = 'outputs'
            os.makedirs(output_dir, exist_ok=True)

            output_path = os.path.join(output_dir, f"{filename}.{export_format}")

            sf.write(output_path, merged_audio, sample_rate)

            logger.info(f"Clips merged successfully: {output_path}")
            return output_path

        except Exception as e:
            logger.error(f"Failed to merge clips: {str(e)}", exc_info=True)
            raise

    def export_clip(
        self,
        clip_id: str,
        export_format: str = "wav"
    ) -> Optional[str]:
        """
        Export a single clip

        Args:
            clip_id: ID of clip to export
            export_format: Output format

        Returns:
            Path to exported file, or None if clip not found
        """
        try:
            clip = self.timeline_service.get_clip(clip_id)

            if not clip:
                logger.warning(f"Clip not found: {clip_id}")
                return None

            logger.info(f"Exporting clip: {clip_id}")

            # Load clip
            audio, sr = sf.read(clip.file_path)

            # Export with requested format
            output_dir = 'outputs'
            os.makedirs(output_dir, exist_ok=True)

            output_path = os.path.join(output_dir, f"{clip_id}.{export_format}")

            sf.write(output_path, audio, sr)

            logger.info(f"Clip exported: {output_path}")
            return output_path

        except Exception as e:
            logger.error(f"Failed to export clip: {str(e)}", exc_info=True)
            raise
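A minimal usage sketch. It assumes TimelineService keeps its clip list in shared state, so the ExportService's own instance sees clips added through the timeline routes (that sharing is an assumption about timeline_service.py).

# Hypothetical direct use of the export service.
from services.export_service import ExportService

exporter = ExportService()
merged = exporter.merge_clips(filename="my_song", export_format="wav")
print(merged)  # e.g. outputs/my_song.wav, or None when the timeline is empty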
backend/services/fish_speech_service.py
ADDED
@@ -0,0 +1,148 @@
"""
Fish Speech TTS/vocals service
"""
import os
import logging
import uuid
import torch
from typing import Optional
import numpy as np
import soundfile as sf

logger = logging.getLogger(__name__)

class FishSpeechService:
    """Service for Fish Speech TTS and vocal synthesis"""

    def __init__(self, model_path: str):
        """
        Initialize Fish Speech service

        Args:
            model_path: Path to Fish Speech model files
        """
        self.model_path = model_path
        self.model = None
        self.vocoder = None
        self.is_initialized = False
        self.device = self._get_device()
        logger.info(f"Fish Speech service created with model path: {model_path}")
        logger.info(f"Using device: {self.device}")

    def _get_device(self):
        """Get compute device (AMD GPU via DirectML or CPU)"""
        try:
            from utils.amd_gpu import DEFAULT_DEVICE
            return DEFAULT_DEVICE
        except Exception:
            return torch.device("cpu")

    def _initialize_model(self):
        """Lazy load the model when first needed"""
        if self.is_initialized:
            return

        try:
            logger.info("Initializing Fish Speech model...")
            # TODO: Load actual Fish Speech model
            # from fish_speech import FishSpeechModel
            # self.model = FishSpeechModel.load(self.model_path)

            self.is_initialized = True
            logger.info("Fish Speech model initialized successfully")

        except Exception as e:
            logger.error(f"Failed to initialize Fish Speech model: {str(e)}", exc_info=True)
            raise RuntimeError(f"Could not load Fish Speech model: {str(e)}")

    def synthesize_vocals(
        self,
        lyrics: str,
        duration: int = 30,
        sample_rate: int = 44100
    ) -> str:
        """
        Synthesize vocals from lyrics

        Args:
            lyrics: Lyrics text to sing
            duration: Target duration in seconds
            sample_rate: Audio sample rate

        Returns:
            Path to generated vocals file
        """
        try:
            self._initialize_model()

            logger.info(f"Synthesizing vocals: {len(lyrics)} characters")

            # TODO: Replace with actual Fish Speech synthesis
            # vocals = self.model.synthesize(lyrics, duration=duration, sample_rate=sample_rate)

            # Placeholder: Generate silence
            vocals = np.zeros(int(duration * sample_rate), dtype=np.float32)

            # Save to file
            output_dir = os.path.join('outputs', 'vocals')
            os.makedirs(output_dir, exist_ok=True)

            vocals_id = str(uuid.uuid4())
            output_path = os.path.join(output_dir, f"{vocals_id}.wav")

            sf.write(output_path, vocals, sample_rate)
            logger.info(f"Vocals synthesized: {output_path}")

            return output_path

        except Exception as e:
            logger.error(f"Vocal synthesis failed: {str(e)}", exc_info=True)
            raise RuntimeError(f"Failed to synthesize vocals: {str(e)}")

    def add_vocals(
        self,
        music_path: str,
        lyrics: str,
        duration: int = 30
    ) -> str:
        """
        Add synthesized vocals to music track

        Args:
            music_path: Path to music audio file
            lyrics: Lyrics to sing
            duration: Duration in seconds

        Returns:
            Path to mixed audio file
        """
        try:
            logger.info(f"Adding vocals to music: {music_path}")

            # Load music
            music_audio, sr = sf.read(music_path)

            # Synthesize vocals
            vocals_path = self.synthesize_vocals(lyrics, duration, sr)
            vocals_audio, _ = sf.read(vocals_path)

            # Mix vocals with music at a fixed 70/30 balance
            # Ensure same length (note: this assumes both signals are mono;
            # stereo music would need channel handling before the sum)
            min_len = min(len(music_audio), len(vocals_audio))
            mixed = music_audio[:min_len] * 0.7 + vocals_audio[:min_len] * 0.3

            # Save mixed audio
            output_dir = os.path.join('outputs', 'mixed')
            os.makedirs(output_dir, exist_ok=True)

            mixed_id = str(uuid.uuid4())
            output_path = os.path.join(output_dir, f"{mixed_id}.wav")

            sf.write(output_path, mixed, sr)
            logger.info(f"Vocals added successfully: {output_path}")

            return output_path

        except Exception as e:
            logger.error(f"Adding vocals failed: {str(e)}", exc_info=True)
            raise RuntimeError(f"Failed to add vocals: {str(e)}")
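A usage sketch for the placeholder pipeline above. Until the TODOs are filled in, the synthesized "vocals" are silence, so the mix is effectively the music at 70% gain; the model_path value here is an assumption.

# Hypothetical direct use; FishSpeechService currently mixes in silent vocals.
from services.fish_speech_service import FishSpeechService

svc = FishSpeechService(model_path="models/fish_speech")
mixed_path = svc.add_vocals("outputs/music/clip-001.wav", lyrics="la la la", duration=30)
print(mixed_path)  # outputs/mixed/<uuid>.wav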
+
raise RuntimeError(f"Failed to add vocals: {str(e)}")
backend/services/lyricmind_service.py
ADDED
|
@@ -0,0 +1,220 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
"""
LyricMind AI lyrics generation service
"""
import os
import logging
import torch
from typing import Optional

logger = logging.getLogger(__name__)

class LyricMindService:
    """Service for LyricMind AI lyrics generation"""

    def __init__(self, model_path: str):
        """
        Initialize LyricMind service

        Args:
            model_path: Path to LyricMind model files
        """
        self.model_path = model_path
        self.model = None
        self.tokenizer = None
        self.is_initialized = False
        self.device = self._get_device()
        logger.info(f"LyricMind service created with model path: {model_path}")
        logger.info(f"Using device: {self.device}")

    def _get_device(self):
        """Get compute device (AMD GPU via DirectML or CPU)"""
        try:
            from utils.amd_gpu import DEFAULT_DEVICE
            return DEFAULT_DEVICE
        except Exception:
            return torch.device("cpu")

    def _initialize_model(self):
        """Lazy load the model when first needed"""
        if self.is_initialized:
            return

        try:
            logger.info("Initializing LyricMind model...")

            # Try to load text generation model as fallback
            try:
                from transformers import AutoTokenizer, AutoModelForCausalLM

                fallback_path = os.path.join(os.path.dirname(self.model_path), "text_generator")

                if os.path.exists(fallback_path):
                    logger.info(f"Loading text generation model from {fallback_path}")
                    self.tokenizer = AutoTokenizer.from_pretrained(fallback_path, trust_remote_code=True)
                    self.model = AutoModelForCausalLM.from_pretrained(
                        fallback_path,
                        trust_remote_code=True,
                        torch_dtype=torch.float32  # Use FP32 for AMD GPU compatibility
                    )
                    self.model.to(self.device)
                    logger.info("✅ Text generation model loaded successfully")
                else:
                    logger.warning("Text generation model not found, using placeholder")

            except Exception as e:
                logger.warning(f"Could not load text model: {str(e)}")

            self.is_initialized = True
            logger.info("LyricMind service initialized")

        except Exception as e:
            logger.error(f"Failed to initialize LyricMind model: {str(e)}", exc_info=True)
            raise RuntimeError(f"Could not load LyricMind model: {str(e)}")

    def generate(
        self,
        prompt: str,
        style: Optional[str] = None,
        duration: int = 30,
        prompt_analysis: Optional[dict] = None
    ) -> str:
        """
        Generate lyrics from prompt using analysis context

        Args:
            prompt: Description of desired lyrics theme
            style: Music style (optional, will be detected if not provided)
            duration: Target song duration (affects lyrics length)
            prompt_analysis: Pre-computed prompt analysis (optional)

        Returns:
            Generated lyrics text
        """
        try:
            self._initialize_model()

            # Use prompt analysis for better context
            from utils.prompt_analyzer import PromptAnalyzer

            if prompt_analysis is None:
                analysis = PromptAnalyzer.analyze(prompt)
            else:
                analysis = prompt_analysis

            # Use detected genre/style if not explicitly provided
            effective_style = style or analysis.get('genre', 'pop')
            mood = analysis.get('mood', 'neutral')

            logger.info(f"Generating lyrics: prompt='{prompt}', style={effective_style}, mood={mood}")

            # Try to generate with text model
            if self.model is not None and self.tokenizer is not None:
                lyrics = self._generate_with_model(prompt, effective_style, duration, analysis)
            else:
                # Fallback: placeholder lyrics
                lyrics = self._generate_placeholder(prompt, effective_style, duration)

            logger.info("Lyrics generated successfully")
            return lyrics

        except Exception as e:
            logger.error(f"Lyrics generation failed: {str(e)}", exc_info=True)
            raise RuntimeError(f"Failed to generate lyrics: {str(e)}")

    def _generate_with_model(self, prompt: str, style: str, duration: int, analysis: dict) -> str:
        """
        Generate lyrics using text generation model with analysis context

        Args:
            prompt: Theme prompt
            style: Music style
            duration: Duration in seconds
            analysis: Prompt analysis with genre, mood, etc.

        Returns:
            Generated lyrics
        """
        try:
            logger.info("Generating lyrics with AI model...")

            # Create structured prompt with analysis context
            mood = analysis.get('mood', 'neutral')
            bpm = analysis.get('bpm', 120)

            full_prompt = f"""Write song lyrics in {style} style about: {prompt}
Mood: {mood}
Tempo: {bpm} BPM

Lyrics:
"""

            # Tokenize
            inputs = self.tokenizer(full_prompt, return_tensors="pt")
            inputs = {k: v.to(self.device) for k, v in inputs.items()}

            # Cap the maximum generation length
            max_length = min(200 + inputs["input_ids"].shape[1], 512)

            # Generate
            with torch.no_grad():
                outputs = self.model.generate(
                    **inputs,
                    max_length=max_length,
                    temperature=0.9,
                    top_p=0.95,
                    do_sample=True,
                    pad_token_id=self.tokenizer.eos_token_id
                )

            # Decode
            generated_text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

            # Extract lyrics (remove prompt)
            lyrics = generated_text.split("Lyrics:")[-1].strip()

            logger.info("✅ AI lyrics generation successful")
            return lyrics if lyrics else self._generate_placeholder(prompt, style, duration)

        except Exception as e:
            logger.error(f"Model generation failed: {str(e)}")
            return self._generate_placeholder(prompt, style, duration)

    def _generate_placeholder(
        self,
        prompt: str,
        style: str,
        duration: int
    ) -> str:
        """
        Generate placeholder lyrics for testing

        Args:
            prompt: Theme prompt
            style: Music style
            duration: Duration in seconds

        Returns:
            Placeholder lyrics
        """
        logger.warning("Using placeholder lyrics - LyricMind model not loaded")

        # Estimate number of lines based on duration
        lines_per_30s = 8
        num_lines = int((duration / 30) * lines_per_30s)

        lyrics_lines = [
            "[Verse 1]",
            f"Theme: {prompt}",
            f"Style: {style}",
            "",
            "[Chorus]",
            "This is a placeholder",
            "Generated by LyricMind AI",
            "Replace with actual model output",
        ]

        # Pad to desired length
        while len(lyrics_lines) < num_lines:
            lyrics_lines.append("La la la...")

        return "\n".join(lyrics_lines[:num_lines])
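A usage sketch for the lyrics service. It falls back to the placeholder text when no text_generator directory exists next to model_path; the model_path value below is an assumption.

# Hypothetical direct use of LyricMindService; model_path is an assumption.
from services.lyricmind_service import LyricMindService

lyricmind = LyricMindService(model_path="models/lyricmind")
print(lyricmind.generate(prompt="a summer road trip", style="pop", duration=30))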
backend/services/mastering_service.py
ADDED
@@ -0,0 +1,641 @@
"""
Audio mastering service with industry-standard presets using Pedalboard
"""
import os
import logging
import numpy as np
from pathlib import Path
from typing import Dict, List, Optional
import soundfile as sf
from pedalboard import (
    Pedalboard,
    Compressor,
    Limiter,
    Gain,
    HighpassFilter,
    LowpassFilter,
    PeakFilter,
    LowShelfFilter,
    HighShelfFilter,
    Reverb,
    Chorus,
    Delay
)

logger = logging.getLogger(__name__)

class MasteringPreset:
    """Mastering preset configuration"""

    def __init__(self, name: str, description: str, chain: List):
        self.name = name
        self.description = description
        self.chain = chain

class MasteringService:
    """Audio mastering and EQ service"""

    # Industry-standard mastering presets
    PRESETS = {
        # Clean/Transparent Presets
        "clean_master": MasteringPreset(
            "Clean Master",
            "Transparent mastering with gentle compression",
            [
                HighpassFilter(cutoff_frequency_hz=30),
                PeakFilter(cutoff_frequency_hz=100, gain_db=-1, q=0.7),
                PeakFilter(cutoff_frequency_hz=3000, gain_db=0.5, q=1.0),
                PeakFilter(cutoff_frequency_hz=10000, gain_db=1.0, q=0.7),
                Compressor(threshold_db=-12, ratio=2.0, attack_ms=5, release_ms=100),
                Limiter(threshold_db=-1.0, release_ms=100)
            ]
        ),

        "subtle_warmth": MasteringPreset(
            "Subtle Warmth",
            "Gentle low-end enhancement with smooth highs",
            [
                HighpassFilter(cutoff_frequency_hz=25),
                LowShelfFilter(cutoff_frequency_hz=100, gain_db=1.5, q=0.7),
                PeakFilter(cutoff_frequency_hz=200, gain_db=0.8, q=0.5),
                PeakFilter(cutoff_frequency_hz=8000, gain_db=-0.5, q=1.0),
                HighShelfFilter(cutoff_frequency_hz=12000, gain_db=1.0, q=0.7),
                Compressor(threshold_db=-15, ratio=2.5, attack_ms=10, release_ms=150),
                Limiter(threshold_db=-0.5, release_ms=100)
            ]
        ),

        # Pop/Commercial Presets
        "modern_pop": MasteringPreset(
            "Modern Pop",
            "Radio-ready pop sound with punchy compression",
            [
                HighpassFilter(cutoff_frequency_hz=35),
                PeakFilter(cutoff_frequency_hz=80, gain_db=-1.5, q=0.8),
                LowShelfFilter(cutoff_frequency_hz=120, gain_db=2.0, q=0.7),
                PeakFilter(cutoff_frequency_hz=2500, gain_db=1.5, q=1.2),
                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=2.5, q=0.7),
                Compressor(threshold_db=-10, ratio=4.0, attack_ms=3, release_ms=80),
                Limiter(threshold_db=-0.3, release_ms=50)
            ]
        ),

        "radio_ready": MasteringPreset(
            "Radio Ready",
            "Maximum loudness for commercial radio",
            [
                HighpassFilter(cutoff_frequency_hz=40),
                PeakFilter(cutoff_frequency_hz=60, gain_db=-2.0, q=1.0),
                LowShelfFilter(cutoff_frequency_hz=150, gain_db=1.5, q=0.8),
                PeakFilter(cutoff_frequency_hz=3000, gain_db=2.0, q=1.5),
                PeakFilter(cutoff_frequency_hz=8000, gain_db=1.5, q=1.0),
                HighShelfFilter(cutoff_frequency_hz=12000, gain_db=3.0, q=0.7),
                Compressor(threshold_db=-8, ratio=6.0, attack_ms=2, release_ms=60),
                Limiter(threshold_db=-0.1, release_ms=30)
            ]
        ),

        "punchy_commercial": MasteringPreset(
            "Punchy Commercial",
            "Aggressive punch for mainstream appeal",
            [
                HighpassFilter(cutoff_frequency_hz=30),
                PeakFilter(cutoff_frequency_hz=100, gain_db=-2.0, q=1.2),
                LowShelfFilter(cutoff_frequency_hz=200, gain_db=2.5, q=0.7),
                PeakFilter(cutoff_frequency_hz=1000, gain_db=-1.0, q=0.8),
                PeakFilter(cutoff_frequency_hz=4000, gain_db=2.5, q=1.5),
                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=2.0, q=0.8),
                Compressor(threshold_db=-9, ratio=5.0, attack_ms=1, release_ms=50),
                Limiter(threshold_db=-0.2, release_ms=40)
            ]
        ),

        # Rock/Alternative Presets
        "rock_master": MasteringPreset(
            "Rock Master",
            "Powerful rock sound with emphasis on mids",
            [
                HighpassFilter(cutoff_frequency_hz=35),
                LowShelfFilter(cutoff_frequency_hz=100, gain_db=1.0, q=0.7),
                PeakFilter(cutoff_frequency_hz=400, gain_db=1.5, q=1.0),
                PeakFilter(cutoff_frequency_hz=2000, gain_db=2.0, q=1.2),
                PeakFilter(cutoff_frequency_hz=5000, gain_db=1.5, q=1.0),
                HighShelfFilter(cutoff_frequency_hz=8000, gain_db=1.0, q=0.8),
                Compressor(threshold_db=-12, ratio=3.5, attack_ms=5, release_ms=120),
                Limiter(threshold_db=-0.5, release_ms=80)
            ]
        ),

        "metal_aggressive": MasteringPreset(
            "Metal Aggressive",
            "Heavy, aggressive metal mastering",
            [
                HighpassFilter(cutoff_frequency_hz=40),
                PeakFilter(cutoff_frequency_hz=80, gain_db=-1.5, q=1.0),
                LowShelfFilter(cutoff_frequency_hz=150, gain_db=2.0, q=0.8),
                PeakFilter(cutoff_frequency_hz=800, gain_db=-1.5, q=1.2),
                PeakFilter(cutoff_frequency_hz=3000, gain_db=3.0, q=1.5),
                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=2.5, q=0.7),
                Compressor(threshold_db=-8, ratio=6.0, attack_ms=1, release_ms=50),
                Limiter(threshold_db=-0.1, release_ms=30)
            ]
        ),

        "indie_rock": MasteringPreset(
            "Indie Rock",
            "Lo-fi character with mid presence",
            [
                HighpassFilter(cutoff_frequency_hz=30),
                LowShelfFilter(cutoff_frequency_hz=120, gain_db=0.5, q=0.7),
                PeakFilter(cutoff_frequency_hz=500, gain_db=1.5, q=1.0),
                PeakFilter(cutoff_frequency_hz=2500, gain_db=2.0, q=1.2),
                PeakFilter(cutoff_frequency_hz=7000, gain_db=-0.5, q=1.0),
                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=0.5, q=0.8),
                Compressor(threshold_db=-14, ratio=3.0, attack_ms=8, release_ms=150),
                Limiter(threshold_db=-0.8, release_ms=100)
            ]
        ),

        # Electronic/EDM Presets
        "edm_club": MasteringPreset(
            "EDM Club",
            "Powerful club sound with deep bass",
            [
                HighpassFilter(cutoff_frequency_hz=25),
                LowShelfFilter(cutoff_frequency_hz=80, gain_db=3.0, q=0.7),
                PeakFilter(cutoff_frequency_hz=150, gain_db=2.0, q=0.8),
                PeakFilter(cutoff_frequency_hz=1000, gain_db=-1.5, q=1.0),
                PeakFilter(cutoff_frequency_hz=5000, gain_db=2.0, q=1.2),
                HighShelfFilter(cutoff_frequency_hz=12000, gain_db=3.0, q=0.7),
                Compressor(threshold_db=-6, ratio=8.0, attack_ms=0.5, release_ms=40),
                Limiter(threshold_db=0.0, release_ms=20)
            ]
        ),

        "house_groovy": MasteringPreset(
            "House Groovy",
            "Smooth house music with rolling bass",
            [
                HighpassFilter(cutoff_frequency_hz=30),
                LowShelfFilter(cutoff_frequency_hz=100, gain_db=2.5, q=0.7),
                PeakFilter(cutoff_frequency_hz=250, gain_db=1.0, q=0.8),
                PeakFilter(cutoff_frequency_hz=2000, gain_db=0.5, q=1.0),
                PeakFilter(cutoff_frequency_hz=8000, gain_db=1.5, q=1.0),
                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=2.0, q=0.7),
                Compressor(threshold_db=-10, ratio=4.0, attack_ms=2, release_ms=60),
                Limiter(threshold_db=-0.2, release_ms=40)
            ]
        ),

        "techno_dark": MasteringPreset(
            "Techno Dark",
            "Dark, pounding techno master",
            [
                HighpassFilter(cutoff_frequency_hz=35),
                PeakFilter(cutoff_frequency_hz=60, gain_db=2.0, q=1.0),
                LowShelfFilter(cutoff_frequency_hz=120, gain_db=1.5, q=0.8),
                PeakFilter(cutoff_frequency_hz=800, gain_db=-2.0, q=1.5),
                PeakFilter(cutoff_frequency_hz=4000, gain_db=1.0, q=1.0),
                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=-0.5, q=0.8),
                Compressor(threshold_db=-8, ratio=6.0, attack_ms=1, release_ms=50),
                Limiter(threshold_db=-0.1, release_ms=30)
            ]
        ),

        "dubstep_heavy": MasteringPreset(
            "Dubstep Heavy",
            "Sub-bass focused with crispy highs",
            [
                HighpassFilter(cutoff_frequency_hz=20),
                PeakFilter(cutoff_frequency_hz=50, gain_db=3.5, q=1.2),
                LowShelfFilter(cutoff_frequency_hz=100, gain_db=2.5, q=0.8),
                PeakFilter(cutoff_frequency_hz=500, gain_db=-2.0, q=1.5),
                PeakFilter(cutoff_frequency_hz=6000, gain_db=2.5, q=1.2),
                HighShelfFilter(cutoff_frequency_hz=12000, gain_db=3.5, q=0.7),
                Compressor(threshold_db=-6, ratio=10.0, attack_ms=0.3, release_ms=30),
                Limiter(threshold_db=0.0, release_ms=20)
            ]
        ),

        # Hip-Hop/R&B Presets
        "hiphop_modern": MasteringPreset(
            "Hip-Hop Modern",
            "Contemporary hip-hop with deep bass",
            [
                HighpassFilter(cutoff_frequency_hz=25),
                LowShelfFilter(cutoff_frequency_hz=80, gain_db=2.5, q=0.7),
                PeakFilter(cutoff_frequency_hz=150, gain_db=1.5, q=0.8),
                PeakFilter(cutoff_frequency_hz=1000, gain_db=-1.0, q=1.0),
                PeakFilter(cutoff_frequency_hz=3500, gain_db=2.0, q=1.2),
|
| 230 |
+
HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.5, q=0.7),
|
| 231 |
+
Compressor(threshold_db=-10, ratio=4.0, attack_ms=5, release_ms=80),
|
| 232 |
+
Limiter(threshold_db=-0.3, release_ms=60)
|
| 233 |
+
]
|
| 234 |
+
),
|
| 235 |
+
|
| 236 |
+
"trap_808": MasteringPreset(
|
| 237 |
+
"Trap 808",
|
| 238 |
+
"808-focused trap mastering",
|
| 239 |
+
[
|
| 240 |
+
HighpassFilter(cutoff_frequency_hz=20),
|
| 241 |
+
PeakFilter(cutoff_frequency_hz=50, gain_db=3.0, q=1.0),
|
| 242 |
+
LowShelfFilter(cutoff_frequency_hz=100, gain_db=2.0, q=0.7),
|
| 243 |
+
PeakFilter(cutoff_frequency_hz=800, gain_db=-1.5, q=1.2),
|
| 244 |
+
PeakFilter(cutoff_frequency_hz=5000, gain_db=2.5, q=1.2),
|
| 245 |
+
HighShelfFilter(cutoff_frequency_hz=12000, gain_db=2.0, q=0.7),
|
| 246 |
+
Compressor(threshold_db=-8, ratio=5.0, attack_ms=3, release_ms=60),
|
| 247 |
+
Limiter(threshold_db=-0.2, release_ms=40)
|
| 248 |
+
]
|
| 249 |
+
),
|
| 250 |
+
|
| 251 |
+
"rnb_smooth": MasteringPreset(
|
| 252 |
+
"R&B Smooth",
|
| 253 |
+
"Silky smooth R&B sound",
|
| 254 |
+
[
|
| 255 |
+
HighpassFilter(cutoff_frequency_hz=30),
|
| 256 |
+
LowShelfFilter(cutoff_frequency_hz=100, gain_db=1.5, q=0.7),
|
| 257 |
+
PeakFilter(cutoff_frequency_hz=300, gain_db=1.0, q=0.8),
|
| 258 |
+
PeakFilter(cutoff_frequency_hz=2000, gain_db=0.5, q=1.0),
|
| 259 |
+
PeakFilter(cutoff_frequency_hz=6000, gain_db=1.5, q=1.0),
|
| 260 |
+
HighShelfFilter(cutoff_frequency_hz=10000, gain_db=2.0, q=0.7),
|
| 261 |
+
Compressor(threshold_db=-12, ratio=3.0, attack_ms=8, release_ms=120),
|
| 262 |
+
Limiter(threshold_db=-0.5, release_ms=80)
|
| 263 |
+
]
|
| 264 |
+
),
|
| 265 |
+
|
| 266 |
+
# Acoustic/Organic Presets
|
| 267 |
+
"acoustic_natural": MasteringPreset(
|
| 268 |
+
"Acoustic Natural",
|
| 269 |
+
"Natural, transparent acoustic sound",
|
| 270 |
+
[
|
| 271 |
+
HighpassFilter(cutoff_frequency_hz=25),
|
| 272 |
+
LowShelfFilter(cutoff_frequency_hz=100, gain_db=0.5, q=0.7),
|
| 273 |
+
PeakFilter(cutoff_frequency_hz=500, gain_db=0.8, q=0.8),
|
| 274 |
+
PeakFilter(cutoff_frequency_hz=3000, gain_db=1.0, q=1.0),
|
| 275 |
+
HighShelfFilter(cutoff_frequency_hz=8000, gain_db=1.5, q=0.7),
|
| 276 |
+
Compressor(threshold_db=-16, ratio=2.0, attack_ms=15, release_ms=200),
|
| 277 |
+
Limiter(threshold_db=-1.0, release_ms=120)
|
| 278 |
+
]
|
| 279 |
+
),
|
| 280 |
+
|
| 281 |
+
"folk_warm": MasteringPreset(
|
| 282 |
+
"Folk Warm",
|
| 283 |
+
"Warm, intimate folk sound",
|
| 284 |
+
[
|
| 285 |
+
HighpassFilter(cutoff_frequency_hz=30),
|
| 286 |
+
LowShelfFilter(cutoff_frequency_hz=150, gain_db=1.0, q=0.7),
|
| 287 |
+
PeakFilter(cutoff_frequency_hz=400, gain_db=1.5, q=0.8),
|
| 288 |
+
PeakFilter(cutoff_frequency_hz=2500, gain_db=1.0, q=1.0),
|
| 289 |
+
PeakFilter(cutoff_frequency_hz=7000, gain_db=-0.5, q=1.0),
|
| 290 |
+
HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.0, q=0.8),
|
| 291 |
+
Compressor(threshold_db=-18, ratio=2.5, attack_ms=20, release_ms=250),
|
| 292 |
+
Limiter(threshold_db=-1.5, release_ms=150)
|
| 293 |
+
]
|
| 294 |
+
),
|
| 295 |
+
|
| 296 |
+
"jazz_vintage": MasteringPreset(
|
| 297 |
+
"Jazz Vintage",
|
| 298 |
+
"Classic jazz warmth and space",
|
| 299 |
+
[
|
| 300 |
+
HighpassFilter(cutoff_frequency_hz=35),
|
| 301 |
+
LowShelfFilter(cutoff_frequency_hz=120, gain_db=1.0, q=0.7),
|
| 302 |
+
PeakFilter(cutoff_frequency_hz=500, gain_db=1.0, q=0.8),
|
| 303 |
+
PeakFilter(cutoff_frequency_hz=2000, gain_db=0.5, q=0.8),
|
| 304 |
+
PeakFilter(cutoff_frequency_hz=8000, gain_db=-1.0, q=1.0),
|
| 305 |
+
HighShelfFilter(cutoff_frequency_hz=12000, gain_db=0.5, q=0.8),
|
| 306 |
+
Compressor(threshold_db=-20, ratio=2.0, attack_ms=25, release_ms=300),
|
| 307 |
+
Limiter(threshold_db=-2.0, release_ms=180)
|
| 308 |
+
]
|
| 309 |
+
),
|
| 310 |
+
|
| 311 |
+
# Classical/Orchestral Presets
|
| 312 |
+
"orchestral_wide": MasteringPreset(
|
| 313 |
+
"Orchestral Wide",
|
| 314 |
+
"Wide, natural orchestral sound",
|
| 315 |
+
[
|
| 316 |
+
HighpassFilter(cutoff_frequency_hz=20),
|
| 317 |
+
LowShelfFilter(cutoff_frequency_hz=80, gain_db=0.5, q=0.7),
|
| 318 |
+
PeakFilter(cutoff_frequency_hz=300, gain_db=0.5, q=0.7),
|
| 319 |
+
PeakFilter(cutoff_frequency_hz=4000, gain_db=0.8, q=0.8),
|
| 320 |
+
HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.0, q=0.7),
|
| 321 |
+
Compressor(threshold_db=-24, ratio=1.5, attack_ms=30, release_ms=400),
|
| 322 |
+
Limiter(threshold_db=-3.0, release_ms=250)
|
| 323 |
+
]
|
| 324 |
+
),
|
| 325 |
+
|
| 326 |
+
"classical_concert": MasteringPreset(
|
| 327 |
+
"Classical Concert",
|
| 328 |
+
"Concert hall ambience and dynamics",
|
| 329 |
+
[
|
| 330 |
+
HighpassFilter(cutoff_frequency_hz=25),
|
| 331 |
+
PeakFilter(cutoff_frequency_hz=200, gain_db=0.5, q=0.7),
|
| 332 |
+
PeakFilter(cutoff_frequency_hz=1000, gain_db=0.3, q=0.8),
|
| 333 |
+
PeakFilter(cutoff_frequency_hz=6000, gain_db=0.8, q=0.8),
|
| 334 |
+
HighShelfFilter(cutoff_frequency_hz=12000, gain_db=0.5, q=0.7),
|
| 335 |
+
Compressor(threshold_db=-30, ratio=1.2, attack_ms=50, release_ms=500),
|
| 336 |
+
Limiter(threshold_db=-4.0, release_ms=300)
|
| 337 |
+
]
|
| 338 |
+
),
|
| 339 |
+
|
| 340 |
+
# Ambient/Atmospheric Presets
|
| 341 |
+
"ambient_spacious": MasteringPreset(
|
| 342 |
+
"Ambient Spacious",
|
| 343 |
+
"Wide, spacious ambient master",
|
| 344 |
+
[
|
| 345 |
+
HighpassFilter(cutoff_frequency_hz=25),
|
| 346 |
+
LowShelfFilter(cutoff_frequency_hz=100, gain_db=0.5, q=0.7),
|
| 347 |
+
PeakFilter(cutoff_frequency_hz=500, gain_db=-0.5, q=0.8),
|
| 348 |
+
PeakFilter(cutoff_frequency_hz=3000, gain_db=0.5, q=1.0),
|
| 349 |
+
HighShelfFilter(cutoff_frequency_hz=8000, gain_db=1.5, q=0.7),
|
| 350 |
+
Compressor(threshold_db=-20, ratio=2.0, attack_ms=50, release_ms=400),
|
| 351 |
+
Limiter(threshold_db=-2.0, release_ms=200)
|
| 352 |
+
]
|
| 353 |
+
),
|
| 354 |
+
|
| 355 |
+
"cinematic_epic": MasteringPreset(
|
| 356 |
+
"Cinematic Epic",
|
| 357 |
+
"Big, powerful cinematic sound",
|
| 358 |
+
[
|
| 359 |
+
HighpassFilter(cutoff_frequency_hz=30),
|
| 360 |
+
LowShelfFilter(cutoff_frequency_hz=100, gain_db=2.0, q=0.7),
|
| 361 |
+
PeakFilter(cutoff_frequency_hz=250, gain_db=1.0, q=0.8),
|
| 362 |
+
PeakFilter(cutoff_frequency_hz=2000, gain_db=1.5, q=1.0),
|
| 363 |
+
PeakFilter(cutoff_frequency_hz=6000, gain_db=2.0, q=1.0),
|
| 364 |
+
HighShelfFilter(cutoff_frequency_hz=10000, gain_db=2.5, q=0.7),
|
| 365 |
+
Compressor(threshold_db=-14, ratio=3.0, attack_ms=10, release_ms=150),
|
| 366 |
+
Limiter(threshold_db=-0.5, release_ms=100)
|
| 367 |
+
]
|
| 368 |
+
),
|
| 369 |
+
|
| 370 |
+
# Vintage/Lo-Fi Presets
|
| 371 |
+
"lofi_chill": MasteringPreset(
|
| 372 |
+
"Lo-Fi Chill",
|
| 373 |
+
"Vintage lo-fi character",
|
| 374 |
+
[
|
| 375 |
+
HighpassFilter(cutoff_frequency_hz=50),
|
| 376 |
+
LowpassFilter(cutoff_frequency_hz=10000),
|
| 377 |
+
LowShelfFilter(cutoff_frequency_hz=150, gain_db=1.5, q=0.7),
|
| 378 |
+
PeakFilter(cutoff_frequency_hz=800, gain_db=-1.0, q=1.2),
|
| 379 |
+
PeakFilter(cutoff_frequency_hz=4000, gain_db=-1.5, q=1.0),
|
| 380 |
+
Compressor(threshold_db=-12, ratio=3.0, attack_ms=15, release_ms=180),
|
| 381 |
+
Limiter(threshold_db=-1.0, release_ms=120)
|
| 382 |
+
]
|
| 383 |
+
),
|
| 384 |
+
|
| 385 |
+
"vintage_vinyl": MasteringPreset(
|
| 386 |
+
"Vintage Vinyl",
|
| 387 |
+
"Classic vinyl record warmth",
|
| 388 |
+
[
|
| 389 |
+
HighpassFilter(cutoff_frequency_hz=40),
|
| 390 |
+
LowpassFilter(cutoff_frequency_hz=12000),
|
| 391 |
+
LowShelfFilter(cutoff_frequency_hz=120, gain_db=2.0, q=0.7),
|
| 392 |
+
PeakFilter(cutoff_frequency_hz=1000, gain_db=-0.5, q=0.8),
|
| 393 |
+
PeakFilter(cutoff_frequency_hz=5000, gain_db=-1.0, q=1.0),
|
| 394 |
+
HighShelfFilter(cutoff_frequency_hz=8000, gain_db=-1.5, q=0.8),
|
| 395 |
+
Compressor(threshold_db=-16, ratio=2.5, attack_ms=20, release_ms=200),
|
| 396 |
+
Limiter(threshold_db=-1.5, release_ms=150)
|
| 397 |
+
]
|
| 398 |
+
),
|
| 399 |
+
|
| 400 |
+
"retro_80s": MasteringPreset(
|
| 401 |
+
"Retro 80s",
|
| 402 |
+
"80s digital warmth and punch",
|
| 403 |
+
[
|
| 404 |
+
HighpassFilter(cutoff_frequency_hz=35),
|
| 405 |
+
LowShelfFilter(cutoff_frequency_hz=100, gain_db=1.5, q=0.7),
|
| 406 |
+
PeakFilter(cutoff_frequency_hz=800, gain_db=1.0, q=1.0),
|
| 407 |
+
PeakFilter(cutoff_frequency_hz=3000, gain_db=2.0, q=1.2),
|
| 408 |
+
PeakFilter(cutoff_frequency_hz=8000, gain_db=1.5, q=1.0),
|
| 409 |
+
HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.0, q=0.8),
|
| 410 |
+
Compressor(threshold_db=-10, ratio=4.0, attack_ms=5, release_ms=100),
|
| 411 |
+
Limiter(threshold_db=-0.5, release_ms=80)
|
| 412 |
+
]
|
| 413 |
+
),
|
| 414 |
+
|
| 415 |
+
# Specialized Presets
|
| 416 |
+
"vocal_focused": MasteringPreset(
|
| 417 |
+
"Vocal Focused",
|
| 418 |
+
"Emphasizes vocal clarity and presence",
|
| 419 |
+
[
|
| 420 |
+
HighpassFilter(cutoff_frequency_hz=30),
|
| 421 |
+
PeakFilter(cutoff_frequency_hz=200, gain_db=-1.0, q=0.8),
|
| 422 |
+
PeakFilter(cutoff_frequency_hz=1000, gain_db=1.0, q=1.0),
|
| 423 |
+
PeakFilter(cutoff_frequency_hz=3000, gain_db=2.5, q=1.2),
|
| 424 |
+
PeakFilter(cutoff_frequency_hz=5000, gain_db=1.5, q=1.0),
|
| 425 |
+
HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.0, q=0.7),
|
| 426 |
+
Compressor(threshold_db=-12, ratio=3.0, attack_ms=5, release_ms=100),
|
| 427 |
+
Limiter(threshold_db=-0.5, release_ms=80)
|
| 428 |
+
]
|
| 429 |
+
),
|
| 430 |
+
|
| 431 |
+
"bass_heavy": MasteringPreset(
|
| 432 |
+
"Bass Heavy",
|
| 433 |
+
"Maximum low-end power",
|
| 434 |
+
[
|
| 435 |
+
HighpassFilter(cutoff_frequency_hz=20),
|
| 436 |
+
LowShelfFilter(cutoff_frequency_hz=60, gain_db=4.0, q=0.7),
|
| 437 |
+
PeakFilter(cutoff_frequency_hz=100, gain_db=2.5, q=0.8),
|
| 438 |
+
PeakFilter(cutoff_frequency_hz=500, gain_db=-1.5, q=1.0),
|
| 439 |
+
PeakFilter(cutoff_frequency_hz=4000, gain_db=1.0, q=1.0),
|
| 440 |
+
HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.5, q=0.7),
|
| 441 |
+
Compressor(threshold_db=-10, ratio=4.0, attack_ms=10, release_ms=100),
|
| 442 |
+
Limiter(threshold_db=-0.3, release_ms=60)
|
| 443 |
+
]
|
| 444 |
+
),
|
| 445 |
+
|
| 446 |
+
"bright_airy": MasteringPreset(
|
| 447 |
+
"Bright & Airy",
|
| 448 |
+
"Crystal clear highs with airiness",
|
| 449 |
+
[
|
| 450 |
+
HighpassFilter(cutoff_frequency_hz=30),
|
| 451 |
+
LowShelfFilter(cutoff_frequency_hz=100, gain_db=-0.5, q=0.7),
|
| 452 |
+
PeakFilter(cutoff_frequency_hz=500, gain_db=-1.0, q=0.8),
|
| 453 |
+
PeakFilter(cutoff_frequency_hz=5000, gain_db=2.0, q=1.0),
|
| 454 |
+
PeakFilter(cutoff_frequency_hz=10000, gain_db=2.5, q=1.0),
|
| 455 |
+
HighShelfFilter(cutoff_frequency_hz=12000, gain_db=3.0, q=0.7),
|
| 456 |
+
Compressor(threshold_db=-14, ratio=2.5, attack_ms=8, release_ms=120),
|
| 457 |
+
Limiter(threshold_db=-0.8, release_ms=100)
|
| 458 |
+
]
|
| 459 |
+
),
|
| 460 |
+
|
| 461 |
+
"midrange_punch": MasteringPreset(
|
| 462 |
+
"Midrange Punch",
|
| 463 |
+
"Powerful mids for presence",
|
| 464 |
+
[
|
| 465 |
+
HighpassFilter(cutoff_frequency_hz=30),
|
| 466 |
+
LowShelfFilter(cutoff_frequency_hz=100, gain_db=0.5, q=0.7),
|
| 467 |
+
PeakFilter(cutoff_frequency_hz=500, gain_db=2.0, q=1.0),
|
| 468 |
+
PeakFilter(cutoff_frequency_hz=1500, gain_db=2.5, q=1.2),
|
| 469 |
+
PeakFilter(cutoff_frequency_hz=3000, gain_db=2.0, q=1.0),
|
| 470 |
+
HighShelfFilter(cutoff_frequency_hz=8000, gain_db=0.5, q=0.7),
|
| 471 |
+
Compressor(threshold_db=-11, ratio=3.5, attack_ms=5, release_ms=90),
|
| 472 |
+
Limiter(threshold_db=-0.5, release_ms=70)
|
| 473 |
+
]
|
| 474 |
+
),
|
| 475 |
+
|
| 476 |
+
"dynamic_range": MasteringPreset(
|
| 477 |
+
"Dynamic Range",
|
| 478 |
+
"Preserves maximum dynamics",
|
| 479 |
+
[
|
| 480 |
+
HighpassFilter(cutoff_frequency_hz=25),
|
| 481 |
+
PeakFilter(cutoff_frequency_hz=100, gain_db=-0.5, q=0.7),
|
| 482 |
+
PeakFilter(cutoff_frequency_hz=3000, gain_db=0.5, q=0.8),
|
| 483 |
+
HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.0, q=0.7),
|
| 484 |
+
Compressor(threshold_db=-20, ratio=1.5, attack_ms=20, release_ms=250),
|
| 485 |
+
Limiter(threshold_db=-2.0, release_ms=200)
|
| 486 |
+
]
|
| 487 |
+
),
|
| 488 |
+
|
| 489 |
+
"streaming_optimized": MasteringPreset(
|
| 490 |
+
"Streaming Optimized",
|
| 491 |
+
"Optimized for streaming platforms (Spotify, Apple Music)",
|
| 492 |
+
[
|
| 493 |
+
HighpassFilter(cutoff_frequency_hz=30),
|
| 494 |
+
LowShelfFilter(cutoff_frequency_hz=100, gain_db=1.0, q=0.7),
|
| 495 |
+
PeakFilter(cutoff_frequency_hz=500, gain_db=0.5, q=0.8),
|
| 496 |
+
PeakFilter(cutoff_frequency_hz=3000, gain_db=1.5, q=1.0),
|
| 497 |
+
HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.5, q=0.7),
|
| 498 |
+
Compressor(threshold_db=-14, ratio=3.0, attack_ms=5, release_ms=100),
|
| 499 |
+
Limiter(threshold_db=-1.0, release_ms=100)
|
| 500 |
+
]
|
| 501 |
+
)
|
| 502 |
+
}
|
| 503 |
+
|
| 504 |
+
def __init__(self):
|
| 505 |
+
"""Initialize mastering service"""
|
| 506 |
+
logger.info("Mastering service initialized with 32 presets")
|
| 507 |
+
|
| 508 |
+
def apply_preset(self, audio_path: str, preset_name: str, output_path: str) -> str:
|
| 509 |
+
"""
|
| 510 |
+
Apply mastering preset to audio file
|
| 511 |
+
|
| 512 |
+
Args:
|
| 513 |
+
audio_path: Path to input audio file
|
| 514 |
+
preset_name: Name of preset to apply
|
| 515 |
+
output_path: Path to save processed audio
|
| 516 |
+
|
| 517 |
+
Returns:
|
| 518 |
+
Path to processed audio file
|
| 519 |
+
"""
|
| 520 |
+
try:
|
| 521 |
+
if preset_name not in self.PRESETS:
|
| 522 |
+
raise ValueError(f"Unknown preset: {preset_name}")
|
| 523 |
+
|
| 524 |
+
preset = self.PRESETS[preset_name]
|
| 525 |
+
logger.info(f"Applying preset '{preset.name}' to {audio_path}")
|
| 526 |
+
|
| 527 |
+
# Load audio
|
| 528 |
+
audio, sr = sf.read(audio_path)
|
| 529 |
+
|
| 530 |
+
# Ensure stereo
|
| 531 |
+
if len(audio.shape) == 1:
|
| 532 |
+
audio = np.stack([audio, audio], axis=1)
|
| 533 |
+
|
| 534 |
+
# Create pedalboard with preset chain
|
| 535 |
+
board = Pedalboard(preset.chain)
|
| 536 |
+
|
| 537 |
+
# Process audio
|
| 538 |
+
processed = board(audio.T, sr)
|
| 539 |
+
|
| 540 |
+
# Save processed audio
|
| 541 |
+
sf.write(output_path, processed.T, sr)
|
| 542 |
+
logger.info(f"Saved mastered audio to {output_path}")
|
| 543 |
+
|
| 544 |
+
return output_path
|
| 545 |
+
|
| 546 |
+
except Exception as e:
|
| 547 |
+
logger.error(f"Error applying preset: {str(e)}", exc_info=True)
|
| 548 |
+
raise
|
| 549 |
+
|
| 550 |
+
def apply_custom_eq(
|
| 551 |
+
self,
|
| 552 |
+
audio_path: str,
|
| 553 |
+
output_path: str,
|
| 554 |
+
eq_bands: List[Dict],
|
| 555 |
+
compression: Optional[Dict] = None,
|
| 556 |
+
limiting: Optional[Dict] = None
|
| 557 |
+
) -> str:
|
| 558 |
+
"""
|
| 559 |
+
Apply custom EQ settings to audio file
|
| 560 |
+
|
| 561 |
+
Args:
|
| 562 |
+
audio_path: Path to input audio file
|
| 563 |
+
output_path: Path to save processed audio
|
| 564 |
+
eq_bands: List of EQ band settings
|
| 565 |
+
compression: Compression settings (optional)
|
| 566 |
+
limiting: Limiter settings (optional)
|
| 567 |
+
|
| 568 |
+
Returns:
|
| 569 |
+
Path to processed audio file
|
| 570 |
+
"""
|
| 571 |
+
try:
|
| 572 |
+
logger.info(f"Applying custom EQ to {audio_path}")
|
| 573 |
+
|
| 574 |
+
# Load audio
|
| 575 |
+
audio, sr = sf.read(audio_path)
|
| 576 |
+
|
| 577 |
+
# Ensure stereo
|
| 578 |
+
if len(audio.shape) == 1:
|
| 579 |
+
audio = np.stack([audio, audio], axis=1)
|
| 580 |
+
|
| 581 |
+
# Build processing chain
|
| 582 |
+
chain = []
|
| 583 |
+
|
| 584 |
+
# Add EQ bands
|
| 585 |
+
for band in eq_bands:
|
| 586 |
+
band_type = band.get('type', 'peak')
|
| 587 |
+
freq = band.get('frequency', 1000)
|
| 588 |
+
gain = band.get('gain', 0)
|
| 589 |
+
q = band.get('q', 1.0)
|
| 590 |
+
|
| 591 |
+
if band_type == 'highpass':
|
| 592 |
+
chain.append(HighpassFilter(cutoff_frequency_hz=freq))
|
| 593 |
+
elif band_type == 'lowpass':
|
| 594 |
+
chain.append(LowpassFilter(cutoff_frequency_hz=freq))
|
| 595 |
+
elif band_type == 'lowshelf':
|
| 596 |
+
chain.append(LowShelfFilter(cutoff_frequency_hz=freq, gain_db=gain, q=q))
|
| 597 |
+
elif band_type == 'highshelf':
|
| 598 |
+
chain.append(HighShelfFilter(cutoff_frequency_hz=freq, gain_db=gain, q=q))
|
| 599 |
+
else: # peak
|
| 600 |
+
chain.append(PeakFilter(cutoff_frequency_hz=freq, gain_db=gain, q=q))
|
| 601 |
+
|
| 602 |
+
# Add compression if specified
|
| 603 |
+
if compression:
|
| 604 |
+
chain.append(Compressor(
|
| 605 |
+
threshold_db=compression.get('threshold', -12),
|
| 606 |
+
ratio=compression.get('ratio', 2.0),
|
| 607 |
+
attack_ms=compression.get('attack', 5),
|
| 608 |
+
release_ms=compression.get('release', 100)
|
| 609 |
+
))
|
| 610 |
+
|
| 611 |
+
# Add limiting if specified
|
| 612 |
+
if limiting:
|
| 613 |
+
chain.append(Limiter(
|
| 614 |
+
threshold_db=limiting.get('threshold', -1.0),
|
| 615 |
+
release_ms=limiting.get('release', 100)
|
| 616 |
+
))
|
| 617 |
+
|
| 618 |
+
# Create and apply pedalboard
|
| 619 |
+
board = Pedalboard(chain)
|
| 620 |
+
processed = board(audio.T, sr)
|
| 621 |
+
|
| 622 |
+
# Save processed audio
|
| 623 |
+
sf.write(output_path, processed.T, sr)
|
| 624 |
+
logger.info(f"Saved custom EQ audio to {output_path}")
|
| 625 |
+
|
| 626 |
+
return output_path
|
| 627 |
+
|
| 628 |
+
except Exception as e:
|
| 629 |
+
logger.error(f"Error applying custom EQ: {str(e)}", exc_info=True)
|
| 630 |
+
raise
|
| 631 |
+
|
| 632 |
+
def get_preset_list(self) -> List[Dict]:
|
| 633 |
+
"""Get list of available presets with descriptions"""
|
| 634 |
+
return [
|
| 635 |
+
{
|
| 636 |
+
'id': key,
|
| 637 |
+
'name': preset.name,
|
| 638 |
+
'description': preset.description
|
| 639 |
+
}
|
| 640 |
+
for key, preset in self.PRESETS.items()
|
| 641 |
+
]
|
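For reference, a minimal usage sketch of the service above. It assumes the enclosing class (defined earlier in this file, before the preset dictionary) is named MasteringService and that the module is importable under the repo layout; the file paths are illustrative.

# Usage sketch; MasteringService and the paths below are assumptions for illustration
from backend.services.mastering_service import MasteringService

service = MasteringService()

# Enumerate presets: [{'id': 'radio_ready', 'name': 'Radio Ready', ...}, ...]
for preset in service.get_preset_list():
    print(preset['id'], '-', preset['description'])

# Apply a genre preset to a rendered clip
service.apply_preset('outputs/clip.wav', 'rock_master', 'outputs/clip_mastered.wav')

# Or build a custom chain: one low-shelf boost plus gentle limiting
service.apply_custom_eq(
    'outputs/clip.wav',
    'outputs/clip_custom.wav',
    eq_bands=[{'type': 'lowshelf', 'frequency': 100, 'gain': 2.0, 'q': 0.7}],
    limiting={'threshold': -1.0, 'release': 100},
)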
backend/services/style_consistency_service.py
ADDED
@@ -0,0 +1,340 @@
"""
Style Consistency Service
Uses audio feature extraction and style embeddings to ensure consistent generation
"""
import os
import logging
import numpy as np
import librosa
import soundfile as sf
from pathlib import Path
from typing import List, Optional, Dict, Tuple, Any
import torch

logger = logging.getLogger(__name__)

class StyleConsistencyService:
    """
    Ensures style consistency across generated clips by analyzing existing audio
    and providing style guidance for new generations
    """

    def __init__(self):
        self.sample_rate = 44100
        logger.info("Style Consistency Service initialized")

    def extract_audio_features(self, audio_path: str) -> Dict[str, np.ndarray]:
        """
        Extract comprehensive audio features for style analysis

        Args:
            audio_path: Path to audio file

        Returns:
            Dictionary of extracted features
        """
        try:
            # Load audio
            audio, sr = librosa.load(audio_path, sr=self.sample_rate)

            # Extract features
            features = {}

            # Spectral features
            features['mel_spectrogram'] = librosa.feature.melspectrogram(
                y=audio, sr=sr, n_mels=128, n_fft=2048, hop_length=512
            )
            features['spectral_centroid'] = librosa.feature.spectral_centroid(
                y=audio, sr=sr, n_fft=2048, hop_length=512
            )
            features['spectral_bandwidth'] = librosa.feature.spectral_bandwidth(
                y=audio, sr=sr, n_fft=2048, hop_length=512
            )
            features['spectral_contrast'] = librosa.feature.spectral_contrast(
                y=audio, sr=sr, n_fft=2048, hop_length=512, n_bands=6
            )
            features['spectral_rolloff'] = librosa.feature.spectral_rolloff(
                y=audio, sr=sr, n_fft=2048, hop_length=512
            )

            # Temporal features
            features['zero_crossing_rate'] = librosa.feature.zero_crossing_rate(
                audio, frame_length=2048, hop_length=512
            )
            features['rms'] = librosa.feature.rms(
                y=audio, frame_length=2048, hop_length=512
            )

            # Harmonic/percussive
            harmonic, percussive = librosa.effects.hpss(audio)
            features['harmonic_ratio'] = np.mean(np.abs(harmonic)) / (np.mean(np.abs(audio)) + 1e-10)
            features['percussive_ratio'] = np.mean(np.abs(percussive)) / (np.mean(np.abs(audio)) + 1e-10)

            # Chroma features
            features['chroma'] = librosa.feature.chroma_stft(
                y=audio, sr=sr, n_chroma=12, n_fft=2048, hop_length=512
            )

            # MFCC
            features['mfcc'] = librosa.feature.mfcc(
                y=audio, sr=sr, n_mfcc=20
            )

            # Tempo and rhythm
            tempo, beats = librosa.beat.beat_track(y=audio, sr=sr)
            features['tempo'] = tempo
            features['beat_frames'] = beats

            logger.info(f"Extracted features from {audio_path}")
            return features

        except Exception as e:
            logger.error(f"Failed to extract features from {audio_path}: {e}")
            return {}

    def compute_style_statistics(self, features: Dict[str, np.ndarray]) -> Dict[str, float]:
        """
        Compute statistical summaries of audio features for style matching

        Args:
            features: Dictionary of extracted features

        Returns:
            Dictionary of style statistics
        """
        stats = {}

        # Compute mean/std for spectral features
        for key in ['spectral_centroid', 'spectral_bandwidth', 'spectral_rolloff',
                    'zero_crossing_rate', 'rms']:
            if key in features:
                stats[f'{key}_mean'] = float(np.mean(features[key]))
                stats[f'{key}_std'] = float(np.std(features[key]))

        # Spectral contrast summary
        if 'spectral_contrast' in features:
            stats['spectral_contrast_mean'] = float(np.mean(features['spectral_contrast']))
            stats['spectral_contrast_std'] = float(np.std(features['spectral_contrast']))

        # Harmonic/percussive balance
        stats['harmonic_ratio'] = float(features.get('harmonic_ratio', 0.5))
        stats['percussive_ratio'] = float(features.get('percussive_ratio', 0.5))

        # Tempo
        stats['tempo'] = float(features.get('tempo', 120.0))

        # Chroma energy distribution
        if 'chroma' in features:
            chroma_mean = np.mean(features['chroma'], axis=1)
            stats['chroma_energy'] = chroma_mean.tolist()

        # MFCC summary (timbre)
        if 'mfcc' in features:
            mfcc_mean = np.mean(features['mfcc'], axis=1)
            stats['timbre_signature'] = mfcc_mean[:13].tolist()  # First 13 MFCCs

        return stats

    def analyze_timeline_style(self, clip_paths: List[str]) -> Dict[str, Any]:
        """
        Analyze style characteristics of all clips on timeline

        Args:
            clip_paths: List of audio file paths from timeline

        Returns:
            Aggregate style profile
        """
        if not clip_paths:
            return {}

        all_features = []
        all_stats = []

        for path in clip_paths:
            if os.path.exists(path):
                features = self.extract_audio_features(path)
                if features:
                    stats = self.compute_style_statistics(features)
                    all_features.append(features)
                    all_stats.append(stats)

        if not all_stats:
            return {}

        # Aggregate statistics across all clips
        aggregate_style = {}

        # Average numerical features
        numeric_keys = [k for k in all_stats[0].keys() if isinstance(all_stats[0][k], (int, float))]
        for key in numeric_keys:
            values = [stats[key] for stats in all_stats if key in stats]
            aggregate_style[key] = float(np.mean(values))

        # Average chroma and timbre
        if 'chroma_energy' in all_stats[0]:
            chroma_arrays = [np.array(stats['chroma_energy']) for stats in all_stats if 'chroma_energy' in stats]
            if chroma_arrays:
                aggregate_style['chroma_energy'] = np.mean(chroma_arrays, axis=0).tolist()

        if 'timbre_signature' in all_stats[0]:
            timbre_arrays = [np.array(stats['timbre_signature']) for stats in all_stats if 'timbre_signature' in stats]
            if timbre_arrays:
                aggregate_style['timbre_signature'] = np.mean(timbre_arrays, axis=0).tolist()

        logger.info(f"Analyzed style from {len(clip_paths)} clips")
        return aggregate_style

    def create_style_reference_audio(self, clip_paths: List[str], output_path: str) -> str:
        """
        Mix all timeline clips into a single reference audio for style guidance

        Args:
            clip_paths: List of audio file paths
            output_path: Where to save the reference audio

        Returns:
            Path to created reference audio
        """
        if not clip_paths:
            raise ValueError("No clips provided for style reference")

        try:
            # Load all clips and find max duration
            clips_audio = []
            max_length = 0

            for path in clip_paths:
                if os.path.exists(path):
                    audio, sr = librosa.load(path, sr=self.sample_rate)
                    clips_audio.append(audio)
                    max_length = max(max_length, len(audio))

            if not clips_audio:
                raise ValueError("No valid audio files found")

            # Pad all clips to same length
            padded_clips = []
            for audio in clips_audio:
                if len(audio) < max_length:
                    audio = np.pad(audio, (0, max_length - len(audio)))
                padded_clips.append(audio)

            # Mix clips (average them)
            mixed_audio = np.mean(padded_clips, axis=0)

            # Normalize
            mixed_audio = librosa.util.normalize(mixed_audio)

            # Save reference audio
            os.makedirs(os.path.dirname(output_path), exist_ok=True)
            sf.write(output_path, mixed_audio, self.sample_rate)

            logger.info(f"Created style reference audio: {output_path}")
            return output_path

        except Exception as e:
            logger.error(f"Failed to create style reference: {e}")
            raise

    def enhance_prompt_with_style(
        self,
        base_prompt: str,
        style_profile: Dict[str, Any]
    ) -> str:
        """
        Enhance generation prompt with style characteristics

        Args:
            base_prompt: User's original prompt
            style_profile: Style analysis from timeline

        Returns:
            Enhanced prompt
        """
        if not style_profile:
            return base_prompt

        style_descriptors = []

        # Tempo descriptor
        tempo = style_profile.get('tempo', 120)
        if tempo < 90:
            style_descriptors.append("slow tempo")
        elif tempo > 140:
            style_descriptors.append("fast tempo")

        # Energy/dynamics descriptor
        rms_mean = style_profile.get('rms_mean', 0.1)
        if rms_mean > 0.15:
            style_descriptors.append("energetic")
        elif rms_mean < 0.08:
            style_descriptors.append("gentle")

        # Harmonic/percussive balance
        harmonic_ratio = style_profile.get('harmonic_ratio', 0.5)
        percussive_ratio = style_profile.get('percussive_ratio', 0.5)

        if harmonic_ratio > percussive_ratio * 1.3:
            style_descriptors.append("melodic")
        elif percussive_ratio > harmonic_ratio * 1.3:
            style_descriptors.append("rhythmic")

        # Spectral brightness
        centroid_mean = style_profile.get('spectral_centroid_mean', 2000)
        if centroid_mean > 3000:
            style_descriptors.append("bright")
        elif centroid_mean < 1500:
            style_descriptors.append("warm")

        # Combine with base prompt
        if style_descriptors:
            enhanced = f"{base_prompt}, consistent with existing style: {', '.join(style_descriptors)}"
            logger.info(f"Enhanced prompt: {enhanced}")
            return enhanced

        return base_prompt

    def get_style_guidance_for_generation(
        self,
        timeline_clips: List[Dict]
    ) -> Tuple[Optional[str], Dict[str, Any]]:
        """
        Prepare style guidance for new generation

        Args:
            timeline_clips: List of clip dictionaries from timeline

        Returns:
            Tuple of (reference_audio_path, style_profile)
        """
        if not timeline_clips:
            logger.info("No existing clips - no style guidance available")
            return None, {}

        # Get audio paths from clips
        clip_paths = []
        for clip in timeline_clips:
            audio_path = clip.get('music_path') or clip.get('mixed_path') or clip.get('file_path')
            if audio_path and os.path.exists(audio_path):
                clip_paths.append(audio_path)

        if not clip_paths:
            return None, {}

        # Analyze timeline style
        style_profile = self.analyze_timeline_style(clip_paths)

        # Create reference audio (mix of all clips)
        try:
            ref_dir = os.path.join('outputs', 'style_reference')
            os.makedirs(ref_dir, exist_ok=True)
            ref_path = os.path.join(ref_dir, 'timeline_reference.wav')

            reference_audio = self.create_style_reference_audio(clip_paths, ref_path)
            logger.info(f"Style guidance ready: {len(clip_paths)} clips analyzed")
            return reference_audio, style_profile

        except Exception as e:
            logger.error(f"Failed to create reference audio: {e}")
            return None, style_profile
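A minimal usage sketch for the service above; the clip dictionaries and file paths are illustrative, and the import path assumes the repo layout.

from backend.services.style_consistency_service import StyleConsistencyService

style_service = StyleConsistencyService()
timeline_clips = [
    {'music_path': 'outputs/clip_001.wav'},   # illustrative paths
    {'music_path': 'outputs/clip_002.wav'},
]

# Returns a mixed reference WAV plus aggregate statistics (tempo, rms_mean, ...)
reference_audio, profile = style_service.get_style_guidance_for_generation(timeline_clips)

# Fold the detected characteristics back into the next generation prompt,
# e.g. "dreamy synth pop, consistent with existing style: slow tempo, gentle, melodic"
prompt = style_service.enhance_prompt_with_style("dreamy synth pop", profile)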
backend/services/timeline_service.py
ADDED
@@ -0,0 +1,186 @@
"""
Timeline management service
"""
import logging
from typing import List, Dict, Optional
from models.schemas import ClipPosition, TimelineClip

logger = logging.getLogger(__name__)

class TimelineService:
    """Service for managing timeline clips"""

    def __init__(self):
        """Initialize timeline service"""
        self.clips: List[TimelineClip] = []
        logger.info("Timeline service initialized")

    def add_clip(
        self,
        clip_id: str,
        file_path: str,
        duration: float,
        position: ClipPosition
    ) -> Dict:
        """
        Add a clip to the timeline

        Args:
            clip_id: Unique clip identifier
            file_path: Path to audio file
            duration: Clip duration in seconds
            position: Where to place the clip

        Returns:
            Clip information with timeline position
        """
        try:
            # Calculate timeline position based on requested position
            if position == ClipPosition.INTRO:
                timeline_position = 0
                start_time = 0.0
                # Shift all existing clips
                for clip in self.clips:
                    clip.timeline_position += 1
                    clip.start_time += duration

            elif position == ClipPosition.PREVIOUS:
                if len(self.clips) == 0:
                    timeline_position = 0
                    start_time = 0.0
                else:
                    timeline_position = len(self.clips) - 1
                    start_time = self.clips[-1].start_time
                    # Shift last clip
                    self.clips[-1].timeline_position += 1
                    self.clips[-1].start_time += duration

            elif position == ClipPosition.NEXT:
                timeline_position = len(self.clips)
                start_time = self.get_total_duration()

            else:  # OUTRO
                timeline_position = len(self.clips)
                start_time = self.get_total_duration()

            # Create clip
            clip = TimelineClip(
                clip_id=clip_id,
                file_path=file_path,
                duration=duration,
                timeline_position=timeline_position,
                start_time=start_time,
                music_path=file_path  # Store as music_path for consistent access
            )

            # Insert clip at correct position
            self.clips.insert(timeline_position, clip)

            logger.info(f"Clip added: {clip_id} at position {timeline_position}")

            return {
                'clip_id': clip_id,
                'timeline_position': timeline_position,
                'start_time': start_time,
                'duration': duration
            }

        except Exception as e:
            logger.error(f"Failed to add clip: {str(e)}", exc_info=True)
            raise

    def remove_clip(self, clip_id: str):
        """
        Remove a clip from timeline

        Args:
            clip_id: Clip to remove
        """
        try:
            # Find and remove clip
            clip_index = None
            for i, clip in enumerate(self.clips):
                if clip.clip_id == clip_id:
                    clip_index = i
                    break

            if clip_index is None:
                raise ValueError(f"Clip not found: {clip_id}")

            removed_clip = self.clips.pop(clip_index)

            # Recalculate positions
            self._recalculate_positions()

            logger.info(f"Clip removed: {clip_id}")

        except Exception as e:
            logger.error(f"Failed to remove clip: {str(e)}", exc_info=True)
            raise

    def reorder_clips(self, clip_ids: List[str]):
        """
        Reorder clips on timeline

        Args:
            clip_ids: New order of clip IDs
        """
        try:
            # Validate all clip IDs exist
            existing_ids = {clip.clip_id for clip in self.clips}
            requested_ids = set(clip_ids)

            if existing_ids != requested_ids:
                raise ValueError("Clip IDs don't match existing clips")

            # Create new order
            clip_dict = {clip.clip_id: clip for clip in self.clips}
            self.clips = [clip_dict[cid] for cid in clip_ids]

            # Recalculate positions
            self._recalculate_positions()

            logger.info("Clips reordered")

        except Exception as e:
            logger.error(f"Failed to reorder clips: {str(e)}", exc_info=True)
            raise

    def get_all_clips(self) -> List[Dict]:
        """Get all clips with their information"""
        return [
            {
                'clip_id': clip.clip_id,
                'file_path': clip.file_path,
                'duration': clip.duration,
                'timeline_position': clip.timeline_position,
                'start_time': clip.start_time
            }
            for clip in self.clips
        ]

    def get_clip(self, clip_id: str) -> Optional[TimelineClip]:
        """Get a specific clip"""
        for clip in self.clips:
            if clip.clip_id == clip_id:
                return clip
        return None

    def get_total_duration(self) -> float:
        """Get total duration of all clips"""
        if not self.clips:
            return 0.0
        return sum(clip.duration for clip in self.clips)

    def clear(self):
        """Clear all clips from timeline"""
        self.clips = []
        logger.info("Timeline cleared")

    def _recalculate_positions(self):
        """Recalculate all clip positions and start times"""
        current_time = 0.0
        for i, clip in enumerate(self.clips):
            clip.timeline_position = i
            clip.start_time = current_time
            current_time += clip.duration
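A minimal usage sketch; the clip IDs, paths, and durations are illustrative, and ClipPosition comes from models.schemas as imported above.

from backend.services.timeline_service import TimelineService
from models.schemas import ClipPosition

timeline = TimelineService()
timeline.add_clip('clip-1', 'outputs/verse.wav', duration=30.0, position=ClipPosition.NEXT)
timeline.add_clip('clip-0', 'outputs/intro.wav', duration=10.0, position=ClipPosition.INTRO)

print(timeline.get_total_duration())                     # 40.0
print([c['clip_id'] for c in timeline.get_all_clips()])  # ['clip-0', 'clip-1']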
backend/start_with_env.py
ADDED
@@ -0,0 +1,21 @@
"""
Wrapper script to start the backend with required environment variables.
This is used by the PowerShell launcher to ensure environment variables are set.
"""
import os
import sys
import subprocess
from pathlib import Path

# Get project root (parent of backend directory)
project_root = Path(__file__).parent.parent

# Set required environment variables
os.environ['PHONEMIZER_ESPEAK_LIBRARY'] = str(project_root / 'external' / 'espeak-ng' / 'libespeak-ng.dll')
os.environ['PHONEMIZER_ESPEAK_PATH'] = str(project_root / 'external' / 'espeak-ng')

# Run the backend run.py script
backend_script = project_root / 'backend' / 'run.py'

# Execute run.py in the same interpreter
exec(open(backend_script).read())
backend/utils/__init__.py
ADDED
@@ -0,0 +1,5 @@
"""Utilities package"""
from .logger import setup_logger
from .validators import validate_generation_params, validate_clip_data

__all__ = ['setup_logger', 'validate_generation_params', 'validate_clip_data']
backend/utils/amd_gpu.py
ADDED
@@ -0,0 +1,96 @@
"""
AMD GPU Detection and Configuration
Enables DirectML support for AMD GPUs (including Vega 8)
Note: DirectML may not be compatible with Python 3.13+ - CPU fallback will be used
"""
import os
import logging
import torch

logger = logging.getLogger(__name__)

def setup_amd_gpu():
    """
    Configure DirectML for AMD GPU support
    Returns device to use for model inference
    """
    try:
        # Check if torch-directml is available
        try:
            import torch_directml

            # Get DirectML device
            if torch_directml.is_available():
                device = torch_directml.device()
                logger.info("✅ AMD GPU detected via DirectML")
                logger.info(f"Device: {device}")

                # Set default device
                torch.set_default_device(device)

                return device
            else:
                logger.warning("DirectML available but no compatible GPU found")
                return torch.device("cpu")
        except ImportError:
            logger.warning("torch-directml not available (may not support Python 3.13+)")
            logger.info("Using CPU mode - consider Python 3.11 for DirectML support")
            return torch.device("cpu")

    except Exception as e:
        logger.error(f"Error setting up AMD GPU: {str(e)}")
        return torch.device("cpu")

def get_device_info():
    """Get detailed information about available compute devices"""
    info = {
        "device": "cpu",
        "device_name": "CPU",
        "directml_available": False,
        "cuda_available": torch.cuda.is_available()
    }

    try:
        try:
            import torch_directml

            if torch_directml.is_available():
                info["directml_available"] = True
                info["device"] = "directml"
                info["device_name"] = "AMD GPU (DirectML)"

                # Get device
                device = torch_directml.device()
                info["device_object"] = device
        except ImportError:
            logger.info("DirectML not available - Python 3.13+ may not support torch-directml")
            logger.info("For AMD GPU support, consider using Python 3.11 with torch-directml")

    except Exception as e:
        logger.error(f"Error getting device info: {str(e)}")

    return info

def optimize_for_amd():
    """Apply optimizations for AMD GPU inference"""
    try:
        # Disable CUDA if present (prefer DirectML for AMD)
        os.environ["CUDA_VISIBLE_DEVICES"] = ""

        # Set DirectML memory management
        os.environ["PYTORCH_DIRECTML_FORCE_FP32_OPS"] = "0"  # Allow FP16

        # Enable TensorFloat-32 for better performance
        torch.backends.cudnn.allow_tf32 = True
        torch.backends.cuda.matmul.allow_tf32 = True

        logger.info("✅ AMD GPU optimizations applied")

    except Exception as e:
        logger.error(f"Error applying AMD optimizations: {str(e)}")

# Auto-configure on module import
DEFAULT_DEVICE = setup_amd_gpu()
optimize_for_amd()

logger.info(f"Default compute device: {DEFAULT_DEVICE}")
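Because the module configures itself at import time, callers can use DEFAULT_DEVICE directly. A short sketch (the import path assumes the repo layout):

import torch
from backend.utils.amd_gpu import DEFAULT_DEVICE, get_device_info

info = get_device_info()
print(info['device_name'])  # "AMD GPU (DirectML)" or "CPU"

# Tensors and models can be placed on whichever device was selected
x = torch.randn(2, 3).to(DEFAULT_DEVICE)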
backend/utils/logger.py
ADDED
@@ -0,0 +1,57 @@
"""
Logging configuration
"""
import logging
import os
from logging.handlers import RotatingFileHandler

def setup_logger(app):
    """
    Configure application logging

    Args:
        app: Flask application instance
    """
    # Create logs directory
    log_dir = 'logs'
    os.makedirs(log_dir, exist_ok=True)

    # Set log level
    log_level = getattr(logging, app.config.get('LOG_LEVEL', 'INFO'))

    # Configure root logger
    logging.basicConfig(
        level=log_level,
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'
    )

    # File handler with rotation
    log_file = app.config.get('LOG_FILE', os.path.join(log_dir, 'app.log'))
    file_handler = RotatingFileHandler(
        log_file,
        maxBytes=10 * 1024 * 1024,  # 10MB
        backupCount=5
    )
    file_handler.setLevel(log_level)
    file_handler.setFormatter(logging.Formatter(
        '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    ))

    # Console handler
    console_handler = logging.StreamHandler()
    console_handler.setLevel(log_level)
    console_handler.setFormatter(logging.Formatter(
        '%(asctime)s - %(levelname)s - %(message)s'
    ))

    # Add handlers to app logger
    app.logger.addHandler(file_handler)
    app.logger.addHandler(console_handler)
    app.logger.setLevel(log_level)

    # Set library log levels
    logging.getLogger('werkzeug').setLevel(logging.WARNING)
    logging.getLogger('urllib3').setLevel(logging.WARNING)

    app.logger.info("Logging configured successfully")
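A minimal sketch of wiring this into a Flask app; LOG_LEVEL and LOG_FILE are read from app.config as shown above, and setting them explicitly here is optional.

from flask import Flask
from backend.utils.logger import setup_logger  # import path assumes the repo layout

app = Flask(__name__)
app.config['LOG_LEVEL'] = 'INFO'     # optional; defaults to INFO
setup_logger(app)
app.logger.info("Backend starting")  # goes to the console and logs/app.log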
backend/utils/prompt_analyzer.py
ADDED
|
@@ -0,0 +1,291 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
"""
Prompt analysis utility for extracting music attributes

Analyzes user prompts to extract genre, style, BPM, mood, and other musical attributes
"""
import re
import logging
from typing import Dict, Optional, List, Any

logger = logging.getLogger(__name__)


class PromptAnalyzer:
    """Analyzes music prompts to extract musical attributes"""

    # Genre/style keywords
    GENRES = {
        'pop': ['pop', 'mainstream', 'catchy', 'radio-friendly'],
        'rock': ['rock', 'guitar', 'electric', 'distortion', 'power chords'],
        'hip-hop': ['hip-hop', 'rap', 'trap', 'beats', 'rhymes', 'flow'],
        'electronic': ['edm', 'electronic', 'synth', 'techno', 'house', 'trance'],
        'jazz': ['jazz', 'swing', 'bebop', 'saxophone', 'improvisation'],
        'classical': ['classical', 'orchestra', 'symphony', 'piano', 'strings'],
        'country': ['country', 'folk', 'acoustic', 'banjo', 'bluegrass'],
        'r&b': ['r&b', 'soul', 'rnb', 'rhythm and blues', 'groove'],
        'metal': ['metal', 'heavy', 'headbanging', 'aggressive', 'brutal'],
        'indie': ['indie', 'alternative', 'underground', 'experimental'],
        'reggae': ['reggae', 'ska', 'dub', 'jamaican', 'offbeat'],
        'blues': ['blues', 'twelve bar', 'soulful', 'melancholic']
    }

    # BPM keywords and ranges
    BPM_KEYWORDS = {
        'slow': (60, 80),
        'ballad': (60, 80),
        'moderate': (80, 120),
        'medium': (90, 110),
        'upbeat': (120, 140),
        'fast': (140, 180),
        'energetic': (130, 150),
        'intense': (150, 180)
    }

    # Mood/emotion keywords
    MOODS = {
        'happy': ['happy', 'joyful', 'cheerful', 'uplifting', 'bright'],
        'sad': ['sad', 'melancholic', 'sorrowful', 'emotional', 'tearful'],
        'energetic': ['energetic', 'powerful', 'dynamic', 'intense', 'vigorous'],
        'calm': ['calm', 'peaceful', 'relaxing', 'soothing', 'tranquil'],
        'dark': ['dark', 'ominous', 'mysterious', 'sinister', 'haunting'],
        'romantic': ['romantic', 'love', 'passionate', 'tender', 'intimate'],
        'angry': ['angry', 'aggressive', 'fierce', 'furious', 'rage'],
        'nostalgic': ['nostalgic', 'reminiscent', 'wistful', 'longing']
    }

    # Instrument keywords
    INSTRUMENTS = [
        'guitar', 'piano', 'drums', 'bass', 'synth', 'violin', 'saxophone',
        'trumpet', 'flute', 'organ', 'keyboard', 'strings', 'brass', 'percussion'
    ]

    @classmethod
    def analyze(cls, prompt: str) -> Dict[str, Any]:
        """
        Analyze a music prompt to extract attributes

        Args:
            prompt: User's music description

        Returns:
            Dictionary containing:
            - genre: Detected genre(s)
            - bpm: Estimated BPM or range
            - mood: Detected mood(s)
            - instruments: Mentioned instruments
            - style_tags: Additional style descriptors
            - analysis_text: Formatted analysis for AI models
        """
        if not prompt:
            return cls._get_default_analysis()

        prompt_lower = prompt.lower()

        # Detect genre
        detected_genres = cls._detect_genres(prompt_lower)

        # Detect BPM
        bpm_info = cls._detect_bpm(prompt_lower)

        # Detect mood
        detected_moods = cls._detect_moods(prompt_lower)

        # Detect instruments
        detected_instruments = cls._detect_instruments(prompt_lower)

        # Extract additional style tags
        style_tags = cls._extract_style_tags(prompt_lower)

        # Build structured analysis
        analysis = {
            'genre': detected_genres[0] if detected_genres else 'pop',
            'genres': detected_genres,
            'bpm': bpm_info['bpm'],
            'bpm_range': bpm_info['range'],
            'mood': detected_moods[0] if detected_moods else 'neutral',
            'moods': detected_moods,
            'instruments': detected_instruments,
            'style_tags': style_tags,
            'has_vocals': cls._should_have_vocals(prompt_lower),
            'analysis_text': cls._format_analysis_text(
                detected_genres, bpm_info, detected_moods, detected_instruments
            )
        }

        logger.info(f"Prompt analysis: genre={analysis['genre']}, bpm={analysis['bpm']}, mood={analysis['mood']}")

        return analysis

    @classmethod
    def _detect_genres(cls, prompt: str) -> List[str]:
        """Detect genres from prompt"""
        detected = []
        for genre, keywords in cls.GENRES.items():
            if any(keyword in prompt for keyword in keywords):
                detected.append(genre)
        return detected[:3]  # Top 3 genres

    @classmethod
    def _detect_bpm(cls, prompt: str) -> Dict[str, Any]:
        """Detect BPM or BPM range from prompt"""
        # Check for explicit BPM numbers
        bpm_match = re.search(r'\b(\d{2,3})\s*bpm\b', prompt)
        if bpm_match:
            bpm_value = int(bpm_match.group(1))
            return {
                'bpm': bpm_value,
                'range': (bpm_value - 5, bpm_value + 5)
            }

        # Check for BPM keywords
        for keyword, (min_bpm, max_bpm) in cls.BPM_KEYWORDS.items():
            if keyword in prompt:
                return {
                    'bpm': (min_bpm + max_bpm) // 2,
                    'range': (min_bpm, max_bpm)
                }

        # Default: moderate tempo
        return {'bpm': 120, 'range': (100, 140)}

    @classmethod
    def _detect_moods(cls, prompt: str) -> List[str]:
        """Detect moods from prompt"""
        detected = []
        for mood, keywords in cls.MOODS.items():
            if any(keyword in prompt for keyword in keywords):
                detected.append(mood)
        return detected[:2]  # Top 2 moods

    @classmethod
    def _detect_instruments(cls, prompt: str) -> List[str]:
        """Detect mentioned instruments"""
        detected = []
        for instrument in cls.INSTRUMENTS:
            if instrument in prompt:
                detected.append(instrument)
        return detected

    @classmethod
    def _extract_style_tags(cls, prompt: str) -> List[str]:
        """Extract additional style descriptors"""
        tags = []
        style_keywords = [
            'vintage', 'modern', 'retro', 'futuristic', 'minimal', 'complex',
            'acoustic', 'electric', 'orchestral', 'ambient', 'rhythmic',
            'melodic', 'harmonic', 'atmospheric', 'driving', 'groovy'
        ]

        for tag in style_keywords:
            if tag in prompt:
                tags.append(tag)

        return tags

    @classmethod
    def _should_have_vocals(cls, prompt: str) -> bool:
        """Determine if music should have vocals"""
        vocal_keywords = ['vocal', 'singing', 'voice', 'lyrics', 'song', 'sung']
        instrumental_keywords = ['instrumental', 'no vocals', 'no voice', 'without vocals']

        has_vocal_mention = any(keyword in prompt for keyword in vocal_keywords)
        has_instrumental_mention = any(keyword in prompt for keyword in instrumental_keywords)

        # Default to vocals unless explicitly instrumental
        if has_instrumental_mention:
            return False

        return True  # Default to vocals

    @classmethod
    def _format_analysis_text(
        cls,
        genres: List[str],
        bpm_info: Dict,
        moods: List[str],
        instruments: List[str]
    ) -> str:
        """Format analysis into text for AI model context"""
        parts = []

        if genres:
            parts.append(f"Genre: {', '.join(genres)}")

        if bpm_info.get('bpm'):
            parts.append(f"BPM: {bpm_info['bpm']}")

        if moods:
            parts.append(f"Mood: {', '.join(moods)}")

        if instruments:
            parts.append(f"Instruments: {', '.join(instruments)}")

        return '; '.join(parts) if parts else "General music"

    @classmethod
    def _get_default_analysis(cls) -> Dict[str, Any]:
        """Return default analysis when prompt is empty"""
        return {
            'genre': 'pop',
            'genres': ['pop'],
            'bpm': 120,
            'bpm_range': (100, 140),
            'mood': 'neutral',
            'moods': [],
            'instruments': [],
            'style_tags': [],
            'has_vocals': True,
            'analysis_text': 'General pop music at moderate tempo'
        }

    @classmethod
    def format_for_diffrhythm(cls, prompt: str, lyrics: Optional[str] = None, analysis: Optional[Dict] = None) -> str:
        """
        Format prompt for DiffRhythm model

        Args:
            prompt: Original user prompt
            lyrics: Optional lyrics
            analysis: Optional pre-computed analysis

        Returns:
            Formatted prompt for DiffRhythm
        """
        if analysis is None:
            analysis = cls.analyze(prompt)

        parts = [prompt]

        # Add analysis context
        if analysis.get('analysis_text'):
            parts.append(f"[{analysis['analysis_text']}]")

        # Add lyrics if provided
        if lyrics:
            parts.append(f"Lyrics: {lyrics}")

        return ' '.join(parts)

    @classmethod
    def format_for_lyrics_generation(cls, prompt: str, analysis: Optional[Dict] = None) -> str:
        """
        Format prompt for lyrics generation

        Args:
            prompt: Original user prompt
            analysis: Optional pre-computed analysis

        Returns:
            Formatted prompt for LyricMind
        """
        if analysis is None:
            analysis = cls.analyze(prompt)

        genre = analysis.get('genre', 'pop')
        mood = analysis.get('mood', 'neutral')

        formatted = f"Write {genre} song lyrics with a {mood} mood about: {prompt}"

        # Add additional context
        if analysis.get('style_tags'):
            formatted += f" (Style: {', '.join(analysis['style_tags'][:2])})"

        return formatted
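A minimal usage sketch for the analyzer, assuming the module is importable as backend.utils.prompt_analyzer (the package layout in this commit). The expected values follow mechanically from the keyword tables and the BPM regex above:

from backend.utils.prompt_analyzer import PromptAnalyzer

analysis = PromptAnalyzer.analyze("energetic rock song with electric guitar at 140 bpm")

# 'rock', 'guitar', and 'electric' all hit the rock entry in GENRES, the regex
# captures the explicit "140 bpm", and 'energetic' hits the MOODS table
assert analysis['genre'] == 'rock'
assert analysis['bpm'] == 140 and analysis['bpm_range'] == (135, 145)
assert analysis['mood'] == 'energetic'
assert analysis['instruments'] == ['guitar']
assert analysis['has_vocals'] is True  # no instrumental keyword, so vocals by default
print(analysis['analysis_text'])  # Genre: rock; BPM: 140; Mood: energetic; Instruments: guitar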
backend/utils/validators.py
ADDED
@@ -0,0 +1,64 @@
"""
Request validation utilities
"""
from typing import Dict, Optional


def validate_generation_params(data: Dict) -> Optional[str]:
    """
    Validate music generation parameters

    Args:
        data: Request data dictionary

    Returns:
        Error message if validation fails, None otherwise
    """
    if not data:
        return "Request body is required"

    if 'prompt' not in data:
        return "Missing required field: prompt"

    if not data['prompt'] or not data['prompt'].strip():
        return "Prompt cannot be empty"

    if 'duration' in data:
        duration = data['duration']
        if not isinstance(duration, (int, float)):
            return "Duration must be a number"
        if duration < 10 or duration > 120:
            return "Duration must be between 10 and 120 seconds"

    if 'use_vocals' in data:
        if not isinstance(data['use_vocals'], bool):
            return "use_vocals must be a boolean"

        if data['use_vocals'] and not data.get('lyrics'):
            return "Lyrics are required when use_vocals is true"

    return None


def validate_clip_data(data: Dict) -> Optional[str]:
    """
    Validate timeline clip data

    Args:
        data: Clip data dictionary

    Returns:
        Error message if validation fails, None otherwise
    """
    required_fields = ['clip_id', 'file_path', 'duration', 'position']

    for field in required_fields:
        if field not in data:
            return f"Missing required field: {field}"

    if not isinstance(data['duration'], (int, float)) or data['duration'] <= 0:
        return "Duration must be a positive number"

    valid_positions = ['intro', 'previous', 'next', 'outro']
    if data['position'] not in valid_positions:
        return f"Invalid position. Must be one of: {', '.join(valid_positions)}"

    return None
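A short sketch of how a route handler might use these validators, assuming the import path backend.utils.validators:

from backend.utils.validators import validate_generation_params

payload = {"prompt": "lofi hip-hop beat", "duration": 5}
error = validate_generation_params(payload)
assert error == "Duration must be between 10 and 120 seconds"  # below the 10 s floor

payload["duration"] = 30
assert validate_generation_params(payload) is None  # valid request, no error

Returning an error string (or None) instead of raising keeps each route down to a single "if error: respond 400" branch.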
hf_config.py
ADDED
@@ -0,0 +1,30 @@
"""
Configuration for HuggingFace Spaces deployment
Handles espeak-ng and model paths for cloud environment
"""
import os
from pathlib import Path

# Detect if running on HuggingFace Spaces
IS_SPACES = os.getenv("SPACE_ID") is not None

# Configure espeak-ng for HuggingFace Spaces
if IS_SPACES:
    # On Spaces, espeak-ng is installed via packages.txt
    # It's available system-wide
    if os.path.exists("/usr/bin/espeak-ng"):
        os.environ["PHONEMIZER_ESPEAK_PATH"] = "/usr/bin/espeak-ng"
    if os.path.exists("/usr/lib/x86_64-linux-gnu/libespeak-ng.so"):
        os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = "/usr/lib/x86_64-linux-gnu/libespeak-ng.so"
    elif os.path.exists("/usr/lib/libespeak-ng.so"):
        os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = "/usr/lib/libespeak-ng.so"
else:
    # Local development - use bundled espeak-ng
    espeak_path = Path(__file__).parent.parent / "external" / "espeak-ng"
    if espeak_path.exists():
        os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = str(espeak_path / "libespeak-ng.dll")
        os.environ["PHONEMIZER_ESPEAK_PATH"] = str(espeak_path)

print(f"🔧 Environment: {'HuggingFace Spaces' if IS_SPACES else 'Local'}")
print(f"📍 PHONEMIZER_ESPEAK_PATH: {os.getenv('PHONEMIZER_ESPEAK_PATH', 'Not set')}")
print(f"📍 PHONEMIZER_ESPEAK_LIBRARY: {os.getenv('PHONEMIZER_ESPEAK_LIBRARY', 'Not set')}")
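Because this module works entirely through environment variables set at import time, it only takes effect if it is imported before phonemizer initializes its espeak backend. A sketch of the intended import order (the noqa marker is there only because the import is used for its side effect; whether phonemizer honors these variables at first use is an assumption based on its documented PHONEMIZER_ESPEAK_* support):

import hf_config  # noqa: F401  # side effect: sets PHONEMIZER_ESPEAK_* variables

from phonemizer import phonemize  # the espeak backend can now locate the library

print(phonemize("hello world", language="en-us", backend="espeak"))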
packages.txt
ADDED
@@ -0,0 +1,3 @@
espeak-ng
ffmpeg
libsndfile1
pre_startup.sh
ADDED
@@ -0,0 +1,42 @@
#!/bin/bash

# Pre-startup script for HuggingFace Spaces
# This runs before the main application

echo "🚀 Initializing Music Generation Studio..."

# Verify espeak-ng installation
if command -v espeak-ng &> /dev/null; then
    echo "✅ espeak-ng is installed"
    espeak-ng --version
else
    echo "❌ espeak-ng not found"
    exit 1
fi

# Verify ffmpeg
if command -v ffmpeg &> /dev/null; then
    echo "✅ ffmpeg is installed"
    ffmpeg -version | head -1
else
    echo "❌ ffmpeg not found"
fi

# Create necessary directories
mkdir -p outputs/music
mkdir -p outputs/mixed
mkdir -p models
mkdir -p logs

echo "✅ Directories created"

# Check Python version
python --version

# Verify key dependencies
echo "📦 Verifying Python packages..."
python -c "import torch; print(f'✅ PyTorch {torch.__version__}')" || echo "❌ PyTorch not found"
python -c "import gradio; print(f'✅ Gradio {gradio.__version__}')" || echo "❌ Gradio not found"
python -c "import phonemizer; print('✅ phonemizer OK')" || echo "❌ phonemizer not found"

echo "✅ Pre-startup checks complete"
requirements.txt
ADDED
@@ -0,0 +1,46 @@
# Core dependencies for HuggingFace Spaces deployment
gradio==4.44.0
numpy>=1.24.0,<2.0.0
scipy>=1.10.0
librosa>=0.10.0
soundfile>=0.12.0
pydantic>=2.0.0
pyyaml>=6.0

# PyTorch - CPU mode for HuggingFace Spaces
torch>=2.4.0,<2.5.0
torchaudio>=2.4.0,<2.5.0

# DiffRhythm2 dependencies
torchdiffeq>=0.2.4
phonemizer>=3.2.0
muq>=0.1.0
jieba>=0.42.0
pypinyin>=0.50.0
cn2an>=0.5.0
onnxruntime>=1.15.0
pykakasi>=2.3.0
unidecode>=1.3.0
py3langid>=0.2.2

# AI Model dependencies
transformers==4.47.1
diffusers>=0.21.0
sentencepiece>=0.1.99
protobuf>=3.20.0,<5.0.0
accelerate>=0.20.0
einops>=0.7.0
omegaconf>=2.3.0

# Audio processing
pedalboard>=0.7.0
pydub>=0.25.1
resampy>=0.4.2

# Utilities
tqdm>=4.65.0
huggingface-hub>=0.17.0
safetensors>=0.3.0

# System dependencies note:
# espeak-ng is required by phonemizer and should be installed via packages.txt
setup_diffrhythm2_src.sh
ADDED
@@ -0,0 +1,24 @@
#!/bin/bash
# Setup script for HuggingFace Spaces
# Clones DiffRhythm2 source code if not present

set -e

echo "🔧 Setting up DiffRhythm2 source code..."

MODELS_DIR="models"
DR2_SRC_DIR="$MODELS_DIR/diffrhythm2_source"

# Create models directory
mkdir -p "$MODELS_DIR"

# Check if DiffRhythm2 source exists
if [ ! -d "$DR2_SRC_DIR" ]; then
    echo "📥 Cloning DiffRhythm2 source repository..."
    git clone https://github.com/ASLP-lab/DiffRhythm2.git "$DR2_SRC_DIR"
    echo "✅ DiffRhythm2 source cloned"
else
    echo "✅ DiffRhythm2 source already exists"
fi

echo "✅ Setup complete"
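A hypothetical sketch of how the backend could then make the cloned sources importable, assuming the same models/diffrhythm2_source path the script writes to:

import sys
from pathlib import Path

# DR2_SRC_DIR as chosen by setup_diffrhythm2_src.sh
dr2_src = Path("models") / "diffrhythm2_source"
if dr2_src.exists():
    sys.path.insert(0, str(dr2_src))  # make the DiffRhythm2 modules importable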