Gamahea committed on
Commit
aad9d66
·
1 Parent(s): 7d5476d

Deploy Music Generation Studio - 2025-12-12 16:01
.gitignore ADDED
@@ -0,0 +1,9 @@
+__pycache__/
+*.pyc
+*.pyo
+.Python
+*.log
+models/
+outputs/
+logs/
+.env
README.md CHANGED
@@ -1,14 +1,74 @@
 ---
-title: Lemm Test 100
-emoji: 👍
-colorFrom: blue
-colorTo: green
+title: Music Generation Studio
+emoji: 🎵
+colorFrom: purple
+colorTo: pink
 sdk: gradio
-sdk_version: 6.1.0
+sdk_version: 4.44.0
 app_file: app.py
 pinned: false
-license: apache-2.0
-short_description: Testing new LEMM version with new pipeline and models
+license: mit
+python_version: 3.11
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# 🎵 Music Generation Studio
+
+Create AI-powered music with intelligent prompt analysis and context-aware generation using DiffRhythm2 and LyricMind AI.
+
+## Features
+
+- **Intelligent Music Generation**: DiffRhythm2 model for high-quality music with vocals
+- **Smart Lyrics Generation**: LyricMind AI for context-aware lyric creation
+- **Prompt Analysis**: Automatically detects genre, BPM, and mood from your description
+- **Flexible Vocal Modes**:
+  - Instrumental: Pure music without vocals
+  - User Lyrics: Provide your own lyrics
+  - Auto Lyrics: AI-generated lyrics based on the prompt
+- **Timeline Management**: Build complete songs clip-by-clip
+- **Export**: Download your creations in WAV, MP3, or FLAC formats
+
+## How to Use
+
+1. **Generate Music**:
+   - Enter a descriptive prompt (e.g., "energetic rock song with electric guitar at 140 BPM")
+   - Choose a vocal mode (Instrumental, User Lyrics, or Auto Lyrics)
+   - Set the duration (10-60 seconds)
+   - Click "Generate Music Clip"
+
+2. **Manage the Timeline**:
+   - View all generated clips in the timeline
+   - Remove specific clips or clear them all
+   - Clips are arranged sequentially
+
+3. **Export**:
+   - Enter a filename
+   - Choose a format (WAV recommended for best quality)
+   - Download your complete song
+
+## Models
+
+- **DiffRhythm2**: Music generation with integrated vocals ([ASLP-lab/DiffRhythm2](https://huggingface.co/ASLP-lab/DiffRhythm2))
+- **MuQ-MuLan**: Music style encoding ([OpenMuQ/MuQ-MuLan-large](https://huggingface.co/OpenMuQ/MuQ-MuLan-large))
+
+## Performance
+
+⏱️ Generation time: ~2-4 minutes per 30-second clip on CPU (HuggingFace Spaces free tier)
+
+💡 Tip: Start with shorter durations (10-20 seconds) for faster results
+
+## Technical Details
+
+- Built with Gradio and PyTorch
+- Uses DiffRhythm2 for music generation with vocals
+- Employs flow-matching techniques for high-quality audio synthesis
+- Supports multiple languages for lyrics (English, Chinese, Japanese)
+
+## Credits
+
+- DiffRhythm2 by ASLP-lab
+- MuQ-MuLan by OpenMuQ
+- Application interface and integration by the Music Generation App Team
+
+## License
+
+MIT License - See LICENSE file for details
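
The prompt analysis described above boils down to a plain dict that the UI code reads. A minimal sketch of how it is consumed, using only the keys and fallbacks that `app.py` in this commit actually uses (`genres`, `bpm`, `mood`); it assumes `backend/` is on `sys.path`, which `app.py` arranges:

```python
# Sketch only: mirrors how app.py below consumes PromptAnalyzer output.
from utils.prompt_analyzer import PromptAnalyzer

analysis = PromptAnalyzer.analyze("energetic rock song with electric guitar at 140 BPM")

# Fallback values are the ones app.py uses when a key is missing.
genre = analysis.get('genres', ['general'])[0] if analysis.get('genres') else 'general'
bpm = analysis.get('bpm', 120)
mood = analysis.get('mood', 'neutral')
print(f"Genre: {genre} | BPM: {bpm} | Mood: {mood}")
```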
app.py ADDED
@@ -0,0 +1,487 @@
+"""
+Music Generation Studio - HuggingFace Spaces Deployment
+Main application file for the Gradio interface
+"""
+import os
+import sys
+import gradio as gr
+import logging
+from pathlib import Path
+import shutil
+import subprocess
+
+# Run DiffRhythm2 source setup if needed
+setup_script = Path(__file__).parent / "setup_diffrhythm2_src.sh"
+if setup_script.exists():
+    try:
+        subprocess.run(["bash", str(setup_script)], check=True)
+    except Exception as e:
+        print(f"Warning: Failed to run setup script: {e}")
+
+# Configure environment for HuggingFace Spaces (espeak-ng paths, etc.)
+import hf_config
+
+# Setup paths for HuggingFace Spaces
+SPACE_DIR = Path(__file__).parent
+sys.path.insert(0, str(SPACE_DIR / 'backend'))
+
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+
+# Import services
+try:
+    from services.diffrhythm_service import DiffRhythmService
+    from services.lyricmind_service import LyricMindService
+    from services.timeline_service import TimelineService
+    from services.export_service import ExportService
+    from config.settings import Config
+    from utils.prompt_analyzer import PromptAnalyzer
+except ImportError as e:
+    logger.error(f"Import error: {e}")
+    raise
+
+# Initialize configuration
+config = Config()
+
+# Create necessary directories
+os.makedirs("outputs", exist_ok=True)
+os.makedirs("outputs/music", exist_ok=True)
+os.makedirs("outputs/mixed", exist_ok=True)
+os.makedirs("models", exist_ok=True)
+os.makedirs("logs", exist_ok=True)
+
+# Initialize services
+timeline_service = TimelineService()
+export_service = ExportService()
+
+# Lazy-load AI services (heavy models)
+diffrhythm_service = None
+lyricmind_service = None
+
+def get_diffrhythm_service():
+    """Lazy load DiffRhythm service"""
+    global diffrhythm_service
+    if diffrhythm_service is None:
+        logger.info("Loading DiffRhythm2 model...")
+        diffrhythm_service = DiffRhythmService(model_path=config.DIFFRHYTHM_MODEL_PATH)
+        logger.info("DiffRhythm2 model loaded")
+    return diffrhythm_service
+
+def get_lyricmind_service():
+    """Lazy load LyricMind service"""
+    global lyricmind_service
+    if lyricmind_service is None:
+        logger.info("Loading LyricMind model...")
+        lyricmind_service = LyricMindService(model_path=config.LYRICMIND_MODEL_PATH)
+        logger.info("LyricMind model loaded")
+    return lyricmind_service
+
+def generate_lyrics(prompt: str, duration: int, progress=gr.Progress()):
+    """Generate lyrics from prompt using analysis"""
+    try:
+        if not prompt or not prompt.strip():
+            return "❌ Please enter a prompt"
+
+        progress(0, desc="🔍 Analyzing prompt...")
+        logger.info(f"Generating lyrics for: {prompt}")
+
+        # Analyze prompt
+        analysis = PromptAnalyzer.analyze(prompt)
+        genre = analysis.get('genres', ['general'])[0] if analysis.get('genres') else 'general'
+        mood = analysis.get('mood', 'unknown')
+
+        logger.info(f"Analysis - Genre: {genre}, Mood: {mood}")
+
+        progress(0.3, desc=f"✍️ Generating {genre} lyrics...")
+
+        service = get_lyricmind_service()
+        lyrics = service.generate(
+            prompt=prompt,
+            duration=duration,
+            prompt_analysis=analysis
+        )
+
+        progress(1.0, desc="✅ Lyrics generated!")
+        return lyrics
+
+    except Exception as e:
+        logger.error(f"Error generating lyrics: {e}", exc_info=True)
+        return f"❌ Error: {str(e)}"
+
+def generate_music(prompt: str, lyrics: str, lyrics_mode: str, duration: int, position: str, progress=gr.Progress()):
+    """Generate music clip and add to timeline"""
+    try:
+        if not prompt or not prompt.strip():
+            return "❌ Please enter a music prompt", get_timeline_display(), None
+
+        # Estimate time (CPU on HF Spaces)
+        est_time = int(duration * 4)  # Conservative estimate for CPU
+
+        progress(0, desc=f"🔍 Analyzing prompt... (Est. {est_time}s)")
+        logger.info(f"Generating music: {prompt}, mode={lyrics_mode}, duration={duration}s")
+
+        # Analyze prompt
+        analysis = PromptAnalyzer.analyze(prompt)
+        genre = analysis.get('genres', ['general'])[0] if analysis.get('genres') else 'general'
+        bpm = analysis.get('bpm', 120)
+        mood = analysis.get('mood', 'neutral')
+
+        logger.info(f"Analysis - Genre: {genre}, BPM: {bpm}, Mood: {mood}")
+
+        # Determine lyrics based on mode
+        lyrics_to_use = None
+
+        if lyrics_mode == "Instrumental":
+            logger.info("Generating instrumental (no vocals)")
+            progress(0.1, desc=f"🎹 Preparing instrumental generation... ({est_time}s)")
+
+        elif lyrics_mode == "User Lyrics":
+            if not lyrics or not lyrics.strip():
+                return "❌ Please enter lyrics or switch mode", get_timeline_display(), None
+            lyrics_to_use = lyrics.strip()
+            logger.info("Using user-provided lyrics")
+            progress(0.1, desc=f"🎤 Preparing vocal generation... ({est_time}s)")
+
+        elif lyrics_mode == "Auto Lyrics":
+            if lyrics and lyrics.strip():
+                lyrics_to_use = lyrics.strip()
+                logger.info("Using existing lyrics from textbox")
+                progress(0.1, desc=f"🎤 Using provided lyrics... ({est_time}s)")
+            else:
+                progress(0.1, desc="✍️ Generating lyrics...")
+                logger.info("Auto-generating lyrics...")
+                lyric_service = get_lyricmind_service()
+                lyrics_to_use = lyric_service.generate(
+                    prompt=prompt,
+                    duration=duration,
+                    prompt_analysis=analysis
+                )
+                logger.info(f"Generated {len(lyrics_to_use)} characters of lyrics")
+                progress(0.25, desc=f"🎵 Lyrics ready, generating music... ({est_time}s)")
+
+        # Generate music
+        progress(0.3, desc=f"🎼 Generating {genre} at {bpm} BPM... ({est_time}s)")
+        service = get_diffrhythm_service()
+
+        final_path = service.generate(
+            prompt=prompt,
+            duration=duration,
+            lyrics=lyrics_to_use
+        )
+
+        # Add to timeline
+        progress(0.9, desc="📊 Adding to timeline...")
+        clip_id = os.path.basename(final_path).split('.')[0]
+
+        from models.schemas import ClipPosition
+        clip_info = timeline_service.add_clip(
+            clip_id=clip_id,
+            file_path=final_path,
+            duration=float(duration),
+            position=ClipPosition(position)
+        )
+
+        logger.info(f"Music added to timeline at position {clip_info['timeline_position']}")
+
+        # Build status message
+        progress(1.0, desc="✅ Complete!")
+        status_msg = "✅ Music generated successfully!\n"
+        status_msg += f"🎸 Genre: {genre} | 🥁 BPM: {bpm} | 🎭 Mood: {mood}\n"
+        status_msg += f"🎤 Mode: {lyrics_mode} | 📍 Position: {position}\n"
+
+        if lyrics_mode == "Auto Lyrics" and lyrics_to_use and not lyrics:
+            status_msg += "✍️ (Lyrics auto-generated)"
+
+        return status_msg, get_timeline_display(), final_path
+
+    except Exception as e:
+        logger.error(f"Error generating music: {e}", exc_info=True)
+        return f"❌ Error: {str(e)}", get_timeline_display(), None
+
+def get_timeline_display():
+    """Get timeline clips as formatted text"""
+    clips = timeline_service.get_all_clips()
+
+    if not clips:
+        return "📭 Timeline is empty. Generate clips to get started!"
+
+    total_duration = timeline_service.get_total_duration()
+
+    display = f"**📊 Timeline ({len(clips)} clips, {format_duration(total_duration)} total)**\n\n"
+
+    for i, clip in enumerate(clips, 1):
+        display += f"**{i}.** `{clip['clip_id'][:12]}...` | "
+        display += f"⏱️ {format_duration(clip['duration'])} | "
+        display += f"▶️ {format_duration(clip['start_time'])}\n"
+
+    return display
+
+def remove_clip(clip_number: int):
+    """Remove a clip from timeline"""
+    try:
+        clips = timeline_service.get_all_clips()
+
+        if not clips:
+            return "📭 Timeline is empty", get_timeline_display()
+
+        if clip_number < 1 or clip_number > len(clips):
+            return f"❌ Invalid clip number. Choose 1-{len(clips)}", get_timeline_display()
+
+        clip_id = clips[clip_number - 1]['clip_id']
+        timeline_service.remove_clip(clip_id)
+
+        return f"✅ Clip {clip_number} removed", get_timeline_display()
+
+    except Exception as e:
+        logger.error(f"Error removing clip: {e}", exc_info=True)
+        return f"❌ Error: {str(e)}", get_timeline_display()
+
+def clear_timeline():
+    """Clear all clips from timeline"""
+    try:
+        timeline_service.clear()
+        return "✅ Timeline cleared", get_timeline_display()
+    except Exception as e:
+        logger.error(f"Error clearing timeline: {e}", exc_info=True)
+        return f"❌ Error: {str(e)}", get_timeline_display()
+
+def export_timeline(filename: str, export_format: str, progress=gr.Progress()):
+    """Export timeline to audio file"""
+    try:
+        clips = timeline_service.get_all_clips()
+
+        if not clips:
+            return "❌ No clips to export", None
+
+        if not filename or not filename.strip():
+            filename = "output"
+
+        progress(0, desc="🔄 Merging clips...")
+        logger.info(f"Exporting timeline: {filename}.{export_format}")
+
+        export_service.timeline_service = timeline_service
+
+        progress(0.5, desc="💾 Encoding audio...")
+        output_path = export_service.merge_clips(
+            filename=filename,
+            export_format=export_format
+        )
+
+        if output_path:
+            progress(1.0, desc="✅ Export complete!")
+            return f"✅ Exported: {os.path.basename(output_path)}", output_path
+        else:
+            return "❌ Export failed", None
+
+    except Exception as e:
+        logger.error(f"Error exporting: {e}", exc_info=True)
+        return f"❌ Error: {str(e)}", None
+
+def format_duration(seconds: float) -> str:
+    """Format duration as M:SS"""
+    mins = int(seconds // 60)
+    secs = int(seconds % 60)
+    return f"{mins}:{secs:02d}"
+
+# Create Gradio interface
+with gr.Blocks(
+    title="🎵 Music Generation Studio",
+    theme=gr.themes.Soft(primary_hue="purple", secondary_hue="pink")
+) as app:
+
+    gr.Markdown(
+        """
+        # 🎵 Music Generation Studio
+
+        Create AI-powered music with DiffRhythm2 and LyricMind AI
+
+        💡 **Tip**: Start with 10-20 second clips for faster generation on HuggingFace Spaces
+        """
+    )
+
+    with gr.Row():
+        # Left Column - Generation
+        with gr.Column(scale=2):
+            gr.Markdown("### 🎼 Music Generation")
+
+            prompt_input = gr.Textbox(
+                label="🎯 Music Prompt",
+                placeholder="energetic rock song with electric guitar at 140 BPM",
+                lines=3,
+                info="Describe the music style, instruments, tempo, and mood"
+            )
+
+            lyrics_mode = gr.Radio(
+                choices=["Instrumental", "User Lyrics", "Auto Lyrics"],
+                value="Instrumental",
+                label="🎤 Vocal Mode",
+                info="Instrumental: no vocals | User: provide lyrics | Auto: AI-generated"
+            )
+
+            with gr.Row():
+                auto_gen_btn = gr.Button("✍️ Generate Lyrics", size="sm")
+
+            lyrics_input = gr.Textbox(
+                label="📝 Lyrics",
+                placeholder="Enter lyrics or click 'Generate Lyrics'...",
+                lines=6
+            )
+
+            with gr.Row():
+                duration_input = gr.Slider(
+                    minimum=10,
+                    maximum=60,
+                    value=20,
+                    step=5,
+                    label="⏱️ Duration (seconds)",
+                    info="Shorter = faster generation"
+                )
+                position_input = gr.Radio(
+                    choices=["intro", "previous", "next", "outro"],
+                    value="next",
+                    label="📍 Position"
+                )
+
+            generate_btn = gr.Button(
+                "✨ Generate Music Clip",
+                variant="primary",
+                size="lg"
+            )
+
+            gen_status = gr.Textbox(label="📊 Status", lines=3, interactive=False)
+            audio_output = gr.Audio(label="🎧 Preview", type="filepath")
+
+        # Right Column - Timeline
+        with gr.Column(scale=1):
+            gr.Markdown("### 📊 Timeline")
+
+            timeline_display = gr.Textbox(
+                label="Clips",
+                value=get_timeline_display(),
+                lines=12,
+                interactive=False
+            )
+
+            with gr.Row():
+                clip_number_input = gr.Number(
+                    label="Clip #",
+                    precision=0,
+                    minimum=1,
+                    scale=1
+                )
+                remove_btn = gr.Button("🗑️ Remove", size="sm", scale=1)
+
+            clear_btn = gr.Button("🗑️ Clear All", variant="stop")
+            timeline_status = gr.Textbox(label="Status", lines=1, interactive=False)
+
+    # Export Section
+    gr.Markdown("---")
+    gr.Markdown("### 💾 Export")
+
+    with gr.Row():
+        export_filename = gr.Textbox(
+            label="Filename",
+            value="my_song",
+            scale=2
+        )
+        export_format = gr.Dropdown(
+            choices=["wav", "mp3"],
+            value="wav",
+            label="Format",
+            scale=1
+        )
+        export_btn = gr.Button("💾 Export", variant="primary", scale=1)
+
+    export_status = gr.Textbox(label="Status", lines=1, interactive=False)
+    export_audio = gr.Audio(label="📥 Download", type="filepath")
+
+    # Event handlers
+    auto_gen_btn.click(
+        fn=generate_lyrics,
+        inputs=[prompt_input, duration_input],
+        outputs=lyrics_input
+    )
+
+    generate_btn.click(
+        fn=generate_music,
+        inputs=[prompt_input, lyrics_input, lyrics_mode, duration_input, position_input],
+        outputs=[gen_status, timeline_display, audio_output]
+    )
+
+    remove_btn.click(
+        fn=remove_clip,
+        inputs=clip_number_input,
+        outputs=[timeline_status, timeline_display]
+    )
+
+    clear_btn.click(
+        fn=clear_timeline,
+        outputs=[timeline_status, timeline_display]
+    )
+
+    export_btn.click(
+        fn=export_timeline,
+        inputs=[export_filename, export_format],
+        outputs=[export_status, export_audio]
+    )
+
+    # Help section
+    with gr.Accordion("ℹ️ Help & Tips", open=False):
+        gr.Markdown(
+            """
+            ## 🚀 Quick Start
+
+            1. **Enter a prompt**: "upbeat pop song with synth at 128 BPM"
+            2. **Choose a mode**: Instrumental (fastest) or with vocals
+            3. **Set the duration**: Start with 10-20s for quick results
+            4. **Generate**: Click the button and wait ~2-4 minutes
+            5. **Export**: Download your complete song
+
+            ## ⚡ Performance Tips
+
+            - **Shorter clips = faster**: 10-20s clips generate in ~1-2 minutes
+            - **Instrumental mode**: ~30% faster than with vocals
+            - **HF Spaces uses CPU**: Expect 2-4 minutes per 30s clip
+            - **Build incrementally**: Generate short clips, then combine them
+
+            ## 🎯 Prompt Tips
+
+            - **Be specific**: "energetic rock with distorted guitar" > "rock song"
+            - **Include BPM**: "at 140 BPM" helps set the tempo
+            - **Mention instruments**: "with piano and drums"
+            - **Describe mood**: "melancholic", "upbeat", "aggressive"
+
+            ## 🎤 Vocal Modes
+
+            - **Instrumental**: Pure music, no vocals (fastest)
+            - **User Lyrics**: Provide your own lyrics
+            - **Auto Lyrics**: AI generates lyrics based on the prompt
+
+            ## 📊 Timeline
+
+            - Clips are arranged sequentially
+            - Remove or clear clips as needed
+            - Export combines all clips into one file
+
+            ---
+
+            ⏱️ **Average Generation Time**: 2-4 minutes per 30-second clip on CPU
+
+            🎵 **Models**: DiffRhythm2 + MuQ-MuLan + LyricMind AI
+            """
+        )
+
+# Configure and launch
+if __name__ == "__main__":
+    logger.info("🎵 Starting Music Generation Studio on HuggingFace Spaces...")
+
+    app.queue(
+        default_concurrency_limit=1,
+        max_size=5
+    )
+
+    app.launch()
backend/__init__.py ADDED
@@ -0,0 +1 @@
+"""Backend package"""
backend/app.py ADDED
@@ -0,0 +1,80 @@
+"""
+Main Flask application for the Music Generation App
+"""
+import os
+import logging
+from flask import Flask, jsonify, send_from_directory
+from flask_cors import CORS
+from dotenv import load_dotenv
+
+from config.settings import Config
+from routes.generation import generation_bp
+from routes.timeline import timeline_bp
+from routes.export import export_bp
+from routes.mastering import mastering_bp
+from utils.logger import setup_logger
+
+# Load environment variables
+load_dotenv()
+
+def create_app(config_class=Config):
+    """Application factory pattern"""
+    app = Flask(__name__)
+    app.config.from_object(config_class)
+
+    # Enable CORS
+    CORS(app, resources={r"/api/*": {"origins": "*"}})
+
+    # Setup logging
+    setup_logger(app)
+
+    # Create necessary directories
+    os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
+    os.makedirs(app.config['OUTPUT_FOLDER'], exist_ok=True)
+    os.makedirs(app.config['MODELS_DIR'], exist_ok=True)
+    os.makedirs('logs', exist_ok=True)
+
+    # Register blueprints
+    app.register_blueprint(generation_bp, url_prefix='/api/generation')
+    app.register_blueprint(timeline_bp, url_prefix='/api/timeline')
+    app.register_blueprint(export_bp, url_prefix='/api/export')
+    app.register_blueprint(mastering_bp, url_prefix='/api/mastering')
+
+    # Serve static files from outputs directory with proper MIME types
+    @app.route('/outputs/<path:filename>')
+    def serve_output(filename):
+        response = send_from_directory(app.config['OUTPUT_FOLDER'], filename)
+        # Ensure WAV files have correct MIME type
+        if filename.lower().endswith('.wav'):
+            response.headers['Content-Type'] = 'audio/wav'
+        elif filename.lower().endswith('.mp3'):
+            response.headers['Content-Type'] = 'audio/mpeg'
+        return response
+
+    # Health check endpoint
+    @app.route('/api/health')
+    def health_check():
+        return jsonify({
+            'status': 'healthy',
+            'version': '1.0.0'
+        })
+
+    # Error handlers
+    @app.errorhandler(404)
+    def not_found(error):
+        return jsonify({'error': 'Not found'}), 404
+
+    @app.errorhandler(500)
+    def internal_error(error):
+        app.logger.error(f'Internal server error: {str(error)}')
+        return jsonify({'error': 'Internal server error'}), 500
+
+    return app
+
+if __name__ == '__main__':
+    app = create_app()
+    port = int(os.getenv('PORT', 5000))
+    host = os.getenv('HOST', '0.0.0.0')
+
+    app.logger.info(f'Starting server on {host}:{port}')
+    app.run(host=host, port=port, debug=os.getenv('FLASK_DEBUG', 'False') == 'True')
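
For context on the factory above: something like the following exercises `create_app` without starting a server. A minimal sketch, assuming it is run from the `backend/` directory so the `app` and `config` imports resolve the same way they do in `backend/run.py`:

```python
# Minimal sketch; run from backend/ so the package imports resolve.
from app import create_app
from config.settings import ProductionConfig

app = create_app(ProductionConfig)  # any Config subclass works

# Flask's built-in test client: a smoke test without binding a port.
with app.test_client() as client:
    resp = client.get('/api/health')
    print(resp.get_json())  # {'status': 'healthy', 'version': '1.0.0'}
```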
backend/config/__init__.py ADDED
@@ -0,0 +1,4 @@
+"""Configuration package"""
+from .settings import Config, config
+
+__all__ = ['Config', 'config']
backend/config/settings.py ADDED
@@ -0,0 +1,50 @@
+"""
+Application configuration settings
+"""
+import os
+from pathlib import Path
+
+class Config:
+    """Base configuration"""
+
+    # Base directory
+    BASE_DIR = Path(__file__).parent.parent.parent
+
+    # Flask settings
+    SECRET_KEY = os.getenv('SECRET_KEY', 'dev-secret-key-change-in-production')
+    DEBUG = os.getenv('FLASK_DEBUG', 'False') == 'True'
+
+    # File upload settings
+    UPLOAD_FOLDER = os.getenv('UPLOAD_FOLDER', str(BASE_DIR / 'uploads'))
+    OUTPUT_FOLDER = os.getenv('OUTPUT_FOLDER', str(BASE_DIR / 'outputs'))
+    MAX_CONTENT_LENGTH = int(os.getenv('MAX_CONTENT_LENGTH', 16 * 1024 * 1024))  # 16MB
+
+    # Model paths
+    MODELS_DIR = BASE_DIR / 'models'
+    DIFFRHYTHM_MODEL_PATH = os.getenv('DIFFRHYTHM_MODEL_PATH', str(MODELS_DIR / 'diffrhythm2'))
+    FISH_SPEECH_MODEL_PATH = os.getenv('FISH_SPEECH_MODEL_PATH', str(MODELS_DIR / 'fish_speech'))
+    LYRICMIND_MODEL_PATH = os.getenv('LYRICMIND_MODEL_PATH', str(MODELS_DIR / 'lyricmind'))
+
+    # Generation settings
+    DEFAULT_CLIP_DURATION = int(os.getenv('DEFAULT_CLIP_DURATION', 30))
+    SAMPLE_RATE = int(os.getenv('SAMPLE_RATE', 44100))
+    BIT_DEPTH = int(os.getenv('BIT_DEPTH', 16))
+
+    # Logging
+    LOG_LEVEL = os.getenv('LOG_LEVEL', 'INFO')
+    LOG_FILE = os.getenv('LOG_FILE', str(BASE_DIR / 'logs' / 'app.log'))
+
+class DevelopmentConfig(Config):
+    """Development configuration"""
+    DEBUG = True
+
+class ProductionConfig(Config):
+    """Production configuration"""
+    DEBUG = False
+
+# Configuration dictionary
+config = {
+    'development': DevelopmentConfig,
+    'production': ProductionConfig,
+    'default': DevelopmentConfig
+}
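
The `config` dictionary at the bottom enables name-based lookup of a configuration class. A hedged sketch of how it could be wired to an environment variable; the `APP_ENV` name is hypothetical, since nothing in this commit reads the dict yet:

```python
# Hypothetical usage of the `config` dict above; APP_ENV is an assumed name.
import os
from config.settings import config

env_name = os.getenv('APP_ENV', 'default')  # 'development' | 'production' | 'default'
config_class = config.get(env_name, config['default'])
print(config_class.__name__, 'DEBUG =', config_class.DEBUG)
```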
backend/routes/__init__.py ADDED
@@ -0,0 +1,6 @@
+"""Routes package"""
+from .generation import generation_bp
+from .timeline import timeline_bp
+from .export import export_bp
+
+__all__ = ['generation_bp', 'timeline_bp', 'export_bp']
backend/routes/export.py ADDED
@@ -0,0 +1,124 @@
+"""
+Routes for exporting/downloading music
+"""
+import logging
+import os
+from flask import Blueprint, request, jsonify, send_file, current_app
+from services.export_service import ExportService
+from models.schemas import ExportFormat
+
+logger = logging.getLogger(__name__)
+
+export_bp = Blueprint('export', __name__)
+export_service = ExportService()
+
+@export_bp.route('/merge', methods=['POST'])
+def merge_timeline():
+    """
+    Merge all clips in the timeline into a single file
+
+    Request body:
+    {
+        "format": "wav",      // wav, mp3, flac
+        "filename": "my_song" // optional
+    }
+    """
+    try:
+        data = request.get_json() or {}
+        logger.info("Merging timeline clips")
+
+        # Validate format
+        export_format = data.get('format', 'wav')
+        try:
+            ExportFormat(export_format)
+        except ValueError:
+            return jsonify({
+                'error': f"Invalid format. Must be one of: {', '.join([f.value for f in ExportFormat])}"
+            }), 400
+
+        filename = data.get('filename', 'merged_output')
+
+        # Merge clips
+        output_path = export_service.merge_clips(
+            filename=filename,
+            export_format=export_format
+        )
+
+        if not output_path:
+            return jsonify({
+                'error': 'No clips to merge. Add clips to timeline first.'
+            }), 400
+
+        logger.info(f"Timeline merged successfully: {output_path}")
+
+        return jsonify({
+            'success': True,
+            'file_path': output_path,
+            'filename': os.path.basename(output_path)
+        })
+
+    except Exception as e:
+        logger.error(f"Error merging timeline: {str(e)}", exc_info=True)
+        return jsonify({
+            'error': 'Failed to merge timeline',
+            'details': str(e)
+        }), 500
+
+@export_bp.route('/download/<filename>', methods=['GET'])
+def download_file(filename):
+    """Download an exported file"""
+    try:
+        output_folder = current_app.config['OUTPUT_FOLDER']
+        file_path = os.path.join(output_folder, filename)
+
+        if not os.path.exists(file_path):
+            return jsonify({'error': 'File not found'}), 404
+
+        # Security check: ensure file is in output folder
+        if not os.path.abspath(file_path).startswith(os.path.abspath(output_folder)):
+            return jsonify({'error': 'Invalid file path'}), 403
+
+        logger.info(f"Downloading file: {filename}")
+
+        return send_file(
+            file_path,
+            as_attachment=True,
+            download_name=filename
+        )
+
+    except Exception as e:
+        logger.error(f"Error downloading file: {str(e)}", exc_info=True)
+        return jsonify({'error': str(e)}), 500
+
+@export_bp.route('/export-clip/<clip_id>', methods=['GET'])
+def export_single_clip(clip_id):
+    """Export a single clip"""
+    try:
+        export_format = request.args.get('format', 'wav')
+
+        try:
+            ExportFormat(export_format)
+        except ValueError:
+            return jsonify({
+                'error': f"Invalid format. Must be one of: {', '.join([f.value for f in ExportFormat])}"
+            }), 400
+
+        logger.info(f"Exporting single clip: {clip_id}")
+
+        output_path = export_service.export_clip(
+            clip_id=clip_id,
+            export_format=export_format
+        )
+
+        if not output_path:
+            return jsonify({'error': 'Clip not found'}), 404
+
+        return jsonify({
+            'success': True,
+            'file_path': output_path,
+            'filename': os.path.basename(output_path)
+        })
+
+    except Exception as e:
+        logger.error(f"Error exporting clip: {str(e)}", exc_info=True)
+        return jsonify({'error': str(e)}), 500
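
For reference, the merge and download routes above can be exercised together roughly like this. A client-side sketch, not part of the commit; it assumes the default host/port from `backend/run.py` (7860) and the `/api/export` blueprint prefix registered in `backend/app.py`:

```python
import requests

BASE = "http://localhost:7860"  # default host/port from backend/run.py

resp = requests.post(f"{BASE}/api/export/merge",
                     json={"format": "wav", "filename": "my_song"})
resp.raise_for_status()
merged = resp.json()

# The merged file can then be fetched via the download route.
audio = requests.get(f"{BASE}/api/export/download/{merged['filename']}")
with open(merged['filename'], 'wb') as f:
    f.write(audio.content)
```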
backend/routes/generation.py ADDED
@@ -0,0 +1,191 @@
+"""
+Routes for music generation
+"""
+import os
+import logging
+from flask import Blueprint, request, jsonify, current_app
+from services.diffrhythm_service import DiffRhythmService
+from services.lyricmind_service import LyricMindService
+from services.style_consistency_service import StyleConsistencyService
+from services.timeline_service import TimelineService
+from models.schemas import GenerationRequest, LyricsRequest
+from utils.validators import validate_generation_params
+from utils.prompt_analyzer import PromptAnalyzer
+
+logger = logging.getLogger(__name__)
+
+generation_bp = Blueprint('generation', __name__)
+
+# Initialize services (lazy loading)
+diffrhythm_service = None
+lyricmind_service = None
+style_service = None
+timeline_service = None
+
+def get_diffrhythm_service():
+    """Get or create DiffRhythm service instance"""
+    global diffrhythm_service
+    if diffrhythm_service is None:
+        diffrhythm_service = DiffRhythmService(
+            model_path=current_app.config['DIFFRHYTHM_MODEL_PATH']
+        )
+    return diffrhythm_service
+
+def get_lyricmind_service():
+    """Get or create LyricMind service instance"""
+    global lyricmind_service
+    if lyricmind_service is None:
+        lyricmind_service = LyricMindService(
+            model_path=current_app.config['LYRICMIND_MODEL_PATH']
+        )
+    return lyricmind_service
+
+def get_style_service():
+    """Get or create Style Consistency service instance"""
+    global style_service
+    if style_service is None:
+        style_service = StyleConsistencyService()
+    return style_service
+
+def get_timeline_service():
+    """Get or create Timeline service instance"""
+    global timeline_service
+    if timeline_service is None:
+        timeline_service = TimelineService()
+    return timeline_service
+
+@generation_bp.route('/generate-lyrics', methods=['POST'])
+def generate_lyrics():
+    """Generate lyrics from prompt using LyricMind AI with prompt analysis"""
+    try:
+        data = LyricsRequest(**request.json)
+
+        # Analyze prompt for better context
+        logger.info(f"Analyzing prompt for lyrics generation: {data.prompt}")
+        prompt_analysis = PromptAnalyzer.analyze(data.prompt)
+        logger.info(f"Prompt analysis: {prompt_analysis}")
+
+        # Get lyrics service
+        lyrics_service = get_lyricmind_service()
+
+        # Generate lyrics with analysis context
+        style = data.style or (prompt_analysis.get('genres', [''])[0] if prompt_analysis.get('genres') else None)
+        logger.info(f"Generating lyrics with style: {style}")
+
+        lyrics = lyrics_service.generate(
+            prompt=data.prompt,
+            style=style,
+            duration=data.duration,
+            prompt_analysis=prompt_analysis
+        )
+
+        return jsonify({
+            'lyrics': lyrics,
+            'analysis': prompt_analysis
+        })
+
+    except ValueError as e:
+        logger.error(f"Validation error: {str(e)}")
+        return jsonify({'error': str(e)}), 400
+    except Exception as e:
+        logger.error(f"Error generating lyrics: {str(e)}", exc_info=True)
+        return jsonify({'error': f'Failed to generate lyrics: {str(e)}'}), 500
+
+@generation_bp.route('/generate-music', methods=['POST'])
+def generate_music():
+    """
+    Generate music clip from prompt with optional vocals
+
+    Request body:
+    {
+        "prompt": "upbeat pop song with drums",
+        "lyrics": "optional lyrics text",
+        "duration": 30
+    }
+    """
+    try:
+        data = request.get_json()
+        logger.info(f"Received music generation request: {data.get('prompt', 'No prompt')}")
+
+        # Validate request
+        validation_error = validate_generation_params(data)
+        if validation_error:
+            return jsonify({'error': validation_error}), 400
+
+        # Parse request
+        gen_request = GenerationRequest(**data)
+
+        # Analyze prompt for musical attributes
+        prompt_analysis = PromptAnalyzer.analyze(gen_request.prompt)
+        logger.info(f"Prompt analysis: {prompt_analysis['analysis_text']}")
+
+        # Get timeline clips for style consistency
+        timeline_svc = get_timeline_service()
+        existing_clips = timeline_svc.get_all_clips()
+
+        # Prepare style guidance if clips exist
+        reference_audio = None
+        style_profile = {}
+        enhanced_prompt = gen_request.prompt
+
+        if existing_clips:
+            logger.info(f"Found {len(existing_clips)} existing clips - applying style consistency")
+            style_svc = get_style_service()
+            reference_audio, style_profile = style_svc.get_style_guidance_for_generation(existing_clips)
+
+            # Enhance prompt with style characteristics
+            enhanced_prompt = style_svc.enhance_prompt_with_style(gen_request.prompt, style_profile)
+            logger.info(f"Enhanced prompt for style consistency: {enhanced_prompt}")
+        else:
+            logger.info("No existing clips - generating without style guidance")
+
+        # Generate music with DiffRhythm2 (includes vocals if lyrics provided)
+        service = get_diffrhythm_service()
+        lyrics_to_use = gen_request.lyrics if gen_request.lyrics else None
+
+        final_path = service.generate(
+            prompt=enhanced_prompt,
+            duration=gen_request.duration,
+            lyrics=lyrics_to_use,
+            reference_audio=reference_audio
+        )
+
+        logger.info(f"Music generation successful: {final_path}")
+
+        # Convert filesystem path to URL path (forward slashes, relative to outputs)
+        relative_path = os.path.relpath(final_path, 'outputs')
+        url_path = f"/outputs/{relative_path.replace(os.sep, '/')}"
+
+        return jsonify({
+            'success': True,
+            'clip_id': os.path.basename(final_path).split('.')[0],
+            'file_path': url_path,
+            'duration': gen_request.duration,
+            'analysis': prompt_analysis,
+            'style_consistent': len(existing_clips) > 0,
+            'num_reference_clips': len(existing_clips)
+        })
+
+    except Exception as e:
+        logger.error(f"Error generating music: {str(e)}", exc_info=True)
+        return jsonify({
+            'error': 'Failed to generate music',
+            'details': str(e)
+        }), 500
+
+@generation_bp.route('/status', methods=['GET'])
+def get_status():
+    """Check if generation services are available"""
+    try:
+        status = {
+            'diffrhythm': diffrhythm_service is not None
+        }
+
+        return jsonify({
+            'services': status,
+            'ready': status['diffrhythm']
+        })
+
+    except Exception as e:
+        logger.error(f"Error checking status: {str(e)}")
+        return jsonify({'error': str(e)}), 500
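
The `/generate-music` route above documents its own request body; a matching client-side sketch (not part of the commit, assuming the default port from `backend/run.py`), with the printed keys taken from the route's JSON response:

```python
import requests

resp = requests.post(
    "http://localhost:7860/api/generation/generate-music",
    json={"prompt": "upbeat pop song with drums", "duration": 30},  # "lyrics" is optional
)
resp.raise_for_status()
data = resp.json()
print(data["clip_id"], data["file_path"], data["style_consistent"])
```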
backend/routes/mastering.py ADDED
@@ -0,0 +1,185 @@
+"""
+Routes for audio mastering and EQ
+"""
+import os
+import logging
+from flask import Blueprint, request, jsonify, current_app, send_file
+from services.mastering_service import MasteringService
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+mastering_bp = Blueprint('mastering', __name__)
+
+# Initialize service
+mastering_service = None
+
+def get_mastering_service():
+    """Get or create mastering service instance"""
+    global mastering_service
+    if mastering_service is None:
+        mastering_service = MasteringService()
+    return mastering_service
+
+@mastering_bp.route('/presets', methods=['GET'])
+def get_presets():
+    """Get list of all available mastering presets"""
+    try:
+        service = get_mastering_service()
+        presets = service.get_preset_list()
+        return jsonify({'presets': presets})
+    except Exception as e:
+        logger.error(f"Error getting presets: {str(e)}", exc_info=True)
+        return jsonify({'error': 'Failed to get presets'}), 500
+
+@mastering_bp.route('/apply-preset', methods=['POST'])
+def apply_preset():
+    """Apply mastering preset to audio clip"""
+    try:
+        data = request.json
+        clip_id = data.get('clip_id')
+        preset_name = data.get('preset')
+        audio_path = data.get('audio_path')
+
+        if not all([clip_id, preset_name, audio_path]):
+            return jsonify({'error': 'Missing required parameters'}), 400
+
+        # Verify audio file exists
+        if not os.path.exists(audio_path):
+            return jsonify({'error': 'Audio file not found'}), 404
+
+        # Generate output path
+        output_dir = Path(current_app.config['OUTPUT_FOLDER']) / 'mastered'
+        output_dir.mkdir(parents=True, exist_ok=True)
+
+        filename = Path(audio_path).stem
+        output_path = output_dir / f"{filename}_mastered_{preset_name}.wav"
+
+        # Apply preset
+        service = get_mastering_service()
+        processed_path = service.apply_preset(audio_path, preset_name, str(output_path))
+
+        # Return URL to processed file
+        relative_path = os.path.relpath(processed_path, current_app.config['OUTPUT_FOLDER'])
+        file_url = f"/outputs/{relative_path.replace(os.sep, '/')}"
+
+        return jsonify({
+            'success': True,
+            'processed_path': file_url,
+            'clip_id': clip_id,
+            'preset': preset_name
+        })
+
+    except ValueError as e:
+        logger.error(f"Validation error: {str(e)}")
+        return jsonify({'error': str(e)}), 400
+    except Exception as e:
+        logger.error(f"Error applying preset: {str(e)}", exc_info=True)
+        return jsonify({'error': f'Failed to apply preset: {str(e)}'}), 500
+
+@mastering_bp.route('/apply-custom-eq', methods=['POST'])
+def apply_custom_eq():
+    """Apply custom EQ settings to audio clip"""
+    try:
+        data = request.json
+        clip_id = data.get('clip_id')
+        audio_path = data.get('audio_path')
+        eq_bands = data.get('eq_bands', [])
+        compression = data.get('compression')
+        limiting = data.get('limiting')
+
+        if not all([clip_id, audio_path]):
+            return jsonify({'error': 'Missing required parameters'}), 400
+
+        # Verify audio file exists
+        if not os.path.exists(audio_path):
+            return jsonify({'error': 'Audio file not found'}), 404
+
+        # Generate output path
+        output_dir = Path(current_app.config['OUTPUT_FOLDER']) / 'mastered'
+        output_dir.mkdir(parents=True, exist_ok=True)
+
+        filename = Path(audio_path).stem
+        output_path = output_dir / f"{filename}_custom_eq.wav"
+
+        # Apply custom EQ
+        service = get_mastering_service()
+        processed_path = service.apply_custom_eq(
+            audio_path,
+            str(output_path),
+            eq_bands,
+            compression,
+            limiting
+        )
+
+        # Return URL to processed file
+        relative_path = os.path.relpath(processed_path, current_app.config['OUTPUT_FOLDER'])
+        file_url = f"/outputs/{relative_path.replace(os.sep, '/')}"
+
+        return jsonify({
+            'success': True,
+            'processed_path': file_url,
+            'clip_id': clip_id
+        })
+
+    except Exception as e:
+        logger.error(f"Error applying custom EQ: {str(e)}", exc_info=True)
+        return jsonify({'error': f'Failed to apply custom EQ: {str(e)}'}), 500
+
+@mastering_bp.route('/preview', methods=['POST'])
+def preview_mastering():
+    """Preview mastering effect (non-destructive)"""
+    try:
+        data = request.json
+        clip_id = data.get('clip_id')
+        audio_path = data.get('audio_path')
+        preset_name = data.get('preset')
+        eq_bands = data.get('eq_bands')
+
+        if not all([clip_id, audio_path]):
+            return jsonify({'error': 'Missing required parameters'}), 400
+
+        # Verify audio file exists
+        if not os.path.exists(audio_path):
+            return jsonify({'error': 'Audio file not found'}), 404
+
+        # Generate temp output path for preview
+        output_dir = Path(current_app.config['OUTPUT_FOLDER']) / 'preview'
+        output_dir.mkdir(parents=True, exist_ok=True)
+
+        filename = Path(audio_path).stem
+        output_path = output_dir / f"{filename}_preview.wav"
+
+        service = get_mastering_service()
+
+        if preset_name:
+            # Apply preset for preview
+            processed_path = service.apply_preset(audio_path, preset_name, str(output_path))
+        elif eq_bands:
+            # Apply custom EQ for preview
+            compression = data.get('compression')
+            limiting = data.get('limiting')
+            processed_path = service.apply_custom_eq(
+                audio_path,
+                str(output_path),
+                eq_bands,
+                compression,
+                limiting
+            )
+        else:
+            return jsonify({'error': 'No preset or EQ settings provided'}), 400
+
+        # Return URL to preview file
+        # Use absolute path from project root for frontend to access
+        relative_path = os.path.relpath(processed_path, 'outputs')
+        file_url = f"http://localhost:7860/outputs/{relative_path.replace(os.sep, '/')}"
+
+        return jsonify({
+            'success': True,
+            'preview_path': file_url,
+            'clip_id': clip_id
+        })
+
+    except Exception as e:
+        logger.error(f"Error generating preview: {str(e)}", exc_info=True)
+        return jsonify({'error': f'Failed to generate preview: {str(e)}'}), 500
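
A hedged client-side sketch of the preset workflow above. The response key `presets` is taken from the route code; the preset name, clip id, and audio path below are purely hypothetical examples, since the actual preset names live in `MasteringService`, which is not part of this diff:

```python
import requests

BASE = "http://localhost:7860"

# Discover the available presets first; names are defined by MasteringService.
presets = requests.get(f"{BASE}/api/mastering/presets").json()["presets"]
print(presets)

resp = requests.post(f"{BASE}/api/mastering/apply-preset", json={
    "clip_id": "abc123",                       # hypothetical clip id
    "preset": "warm",                          # hypothetical preset name
    "audio_path": "outputs/music/abc123.wav",  # hypothetical path
})
print(resp.json())
```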
backend/routes/timeline.py ADDED
@@ -0,0 +1,137 @@
+"""
+Routes for timeline management
+"""
+import logging
+from flask import Blueprint, request, jsonify
+from services.timeline_service import TimelineService
+from models.schemas import ClipPosition
+
+logger = logging.getLogger(__name__)
+
+timeline_bp = Blueprint('timeline', __name__)
+timeline_service = TimelineService()
+
+@timeline_bp.route('/clips', methods=['GET'])
+def get_clips():
+    """Get all clips in the timeline"""
+    try:
+        clips = timeline_service.get_all_clips()
+        return jsonify({
+            'success': True,
+            'clips': clips,
+            'total_duration': timeline_service.get_total_duration()
+        })
+    except Exception as e:
+        logger.error(f"Error fetching clips: {str(e)}", exc_info=True)
+        return jsonify({'error': str(e)}), 500
+
+@timeline_bp.route('/clips', methods=['POST'])
+def add_clip():
+    """
+    Add a clip to the timeline
+
+    Request body:
+    {
+        "clip_id": "unique_id",
+        "file_path": "/path/to/clip.wav",
+        "duration": 30,
+        "position": "next" // intro, previous, next, outro
+    }
+    """
+    try:
+        data = request.get_json()
+        logger.info(f"Adding clip to timeline: {data.get('clip_id')}")
+
+        # Validate required fields
+        required_fields = ['clip_id', 'file_path', 'duration', 'position']
+        for field in required_fields:
+            if field not in data:
+                return jsonify({'error': f'Missing required field: {field}'}), 400
+
+        # Validate position
+        try:
+            position = ClipPosition(data['position'])
+        except ValueError:
+            return jsonify({
+                'error': f"Invalid position. Must be one of: {', '.join([p.value for p in ClipPosition])}"
+            }), 400
+
+        # Add clip to timeline
+        result = timeline_service.add_clip(
+            clip_id=data['clip_id'],
+            file_path=data['file_path'],
+            duration=data['duration'],
+            position=position
+        )
+
+        logger.info(f"Clip added successfully at position: {result['timeline_position']}")
+
+        return jsonify({
+            'success': True,
+            **result
+        })
+
+    except Exception as e:
+        logger.error(f"Error adding clip: {str(e)}", exc_info=True)
+        return jsonify({'error': str(e)}), 500
+
+@timeline_bp.route('/clips/<clip_id>', methods=['DELETE'])
+def remove_clip(clip_id):
+    """Remove a clip from the timeline"""
+    try:
+        logger.info(f"Removing clip: {clip_id}")
+        timeline_service.remove_clip(clip_id)
+
+        return jsonify({
+            'success': True,
+            'message': f'Clip {clip_id} removed'
+        })
+
+    except Exception as e:
+        logger.error(f"Error removing clip: {str(e)}", exc_info=True)
+        return jsonify({'error': str(e)}), 500
+
+@timeline_bp.route('/clips/reorder', methods=['POST'])
+def reorder_clips():
+    """
+    Reorder clips in the timeline
+
+    Request body:
+    {
+        "clip_ids": ["id1", "id2", "id3"]
+    }
+    """
+    try:
+        data = request.get_json()
+        clip_ids = data.get('clip_ids', [])
+
+        if not clip_ids:
+            return jsonify({'error': 'clip_ids array is required'}), 400
+
+        logger.info(f"Reordering clips: {clip_ids}")
+        timeline_service.reorder_clips(clip_ids)
+
+        return jsonify({
+            'success': True,
+            'message': 'Clips reordered successfully'
+        })
+
+    except Exception as e:
+        logger.error(f"Error reordering clips: {str(e)}", exc_info=True)
+        return jsonify({'error': str(e)}), 500
+
+@timeline_bp.route('/clear', methods=['POST'])
+def clear_timeline():
+    """Clear all clips from the timeline"""
+    try:
+        logger.info("Clearing timeline")
+        timeline_service.clear()
+
+        return jsonify({
+            'success': True,
+            'message': 'Timeline cleared'
+        })
+
+    except Exception as e:
+        logger.error(f"Error clearing timeline: {str(e)}", exc_info=True)
+        return jsonify({'error': str(e)}), 500
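
The add/list routes above fit together as follows. A client-side sketch (not part of the commit; clip id and path are hypothetical, the position values and response keys come from the route code):

```python
import requests

BASE = "http://localhost:7860"

# Add a previously generated clip to the timeline.
requests.post(f"{BASE}/api/timeline/clips", json={
    "clip_id": "abc123",                     # hypothetical id
    "file_path": "outputs/music/abc123.wav", # hypothetical path
    "duration": 30,
    "position": "next",  # intro | previous | next | outro
})

# List the timeline and its total duration.
state = requests.get(f"{BASE}/api/timeline/clips").json()
print(len(state["clips"]), "clips,", state["total_duration"], "seconds")
```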
backend/run.py ADDED
@@ -0,0 +1,50 @@
+"""
+Startup script for the Music Generation App
+"""
+import sys
+import os
+import signal
+
+# Add backend directory to Python path
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+
+from app import create_app
+
+def signal_handler(sig, frame):
+    """Handle shutdown signals gracefully"""
+    print('\n\n[INFO] Shutting down server...')
+    sys.exit(0)
+
+if __name__ == '__main__':
+    try:
+        app = create_app()
+        port = int(os.getenv('PORT', 7860))  # Default to 7860 to match frontend expectations
+        host = os.getenv('HOST', '0.0.0.0')
+
+        print(f"""
+================================================================
+  Music Generation App Server Starting...
+================================================================
+
+  Server running at: http://{host}:{port}
+  API endpoints:     http://{host}:{port}/api
+  Health check:      http://{host}:{port}/api/health
+
+  Press Ctrl+C to stop the server
+================================================================
+""")
+
+        # Register signal handlers
+        signal.signal(signal.SIGINT, signal_handler)
+        signal.signal(signal.SIGTERM, signal_handler)
+
+        # Use waitress for a production-ready server
+        from waitress import serve
+        print('[INFO] Server is ready!')
+        serve(app, host=host, port=port, threads=4)
+
+    except Exception as e:
+        print(f"\n[ERROR] Failed to start server: {e}", file=sys.stderr)
+        import traceback
+        traceback.print_exc()
+        sys.exit(1)
backend/services/__init__.py ADDED
@@ -0,0 +1,12 @@
+"""Services package"""
+from .diffrhythm_service import DiffRhythmService
+from .timeline_service import TimelineService
+from .export_service import ExportService
+from .fish_speech_service import FishSpeechService
+
+__all__ = [
+    'DiffRhythmService',
+    'TimelineService',
+    'ExportService',
+    'FishSpeechService'
+]
backend/services/diffrhythm_service.py ADDED
@@ -0,0 +1,397 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ DiffRhythm 2 music generation service
3
+ Integrates with the DiffRhythm 2 model for music generation with vocals
4
+ """
5
+ import os
6
+ import sys
7
+ import logging
8
+ import uuid
9
+ from pathlib import Path
10
+ from typing import Optional
11
+ import numpy as np
12
+ import soundfile as sf
13
+ import torch
14
+ import torchaudio
15
+ import json
16
+
17
+ # Configure espeak-ng path for phonemizer (required by g2p module)
18
+ # Note: Environment configuration handled by hf_config.py for HuggingFace Spaces
19
+ # or by launch scripts for local development
20
+ if "PHONEMIZER_ESPEAK_PATH" not in os.environ:
21
+ # Fallback for local development without launcher
22
+ espeak_path = Path(__file__).parent.parent.parent / "external" / "espeak-ng"
23
+ if espeak_path.exists():
24
+ os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = str(espeak_path / "libespeak-ng.dll")
25
+ os.environ["PHONEMIZER_ESPEAK_PATH"] = str(espeak_path)
26
+
27
+ # Add DiffRhythm2 source code to path (cloned repo, not pip package)
28
+ diffrhythm2_src = Path(__file__).parent.parent.parent / "models" / "diffrhythm2_source"
29
+ sys.path.insert(0, str(diffrhythm2_src))
30
+
31
+ logger = logging.getLogger(__name__)
32
+
33
+ class DiffRhythmService:
34
+ """Service for DiffRhythm 2 music generation"""
35
+
36
+ def __init__(self, model_path: str):
37
+ """
38
+ Initialize DiffRhythm 2 service
39
+
40
+ Args:
41
+ model_path: Path to DiffRhythm 2 model files
42
+ """
43
+ self.model_path = model_path
44
+ self.model = None
45
+ self.mulan = None
46
+ self.lrc_tokenizer = None
47
+ self.decoder = None
48
+ self.is_initialized = False
49
+ self.device = self._get_device()
50
+ logger.info(f"DiffRhythm 2 service created with model path: {model_path}")
51
+ logger.info(f"Using device: {self.device}")
52
+
53
+ def _get_device(self):
54
+ """Get compute device (CUDA or CPU)"""
55
+ # Try CUDA first (NVIDIA)
56
+ if torch.cuda.is_available():
57
+ logger.info("Using CUDA (NVIDIA GPU)")
58
+ return torch.device("cuda")
59
+
60
+ # Note: DirectML support disabled due to version conflicts with DiffRhythm2
61
+ # DiffRhythm2 requires torch>=2.4, but torch-directml requires torch==2.4.1
62
+ # For AMD GPU acceleration, consider using ROCm with compatible PyTorch build
63
+
64
+ # Fallback to CPU
65
+ logger.info("Using CPU (no GPU acceleration)")
66
+ return torch.device("cpu")
67
+
68
+ def _initialize_model(self):
69
+ """Lazy load the DiffRhythm 2 model when first needed"""
70
+ if self.is_initialized:
71
+ return
72
+
73
+ try:
74
+ logger.info("Initializing DiffRhythm 2 model...")
75
+
76
+ from diffrhythm2.cfm import CFM
77
+ from diffrhythm2.backbones.dit import DiT
78
+ from bigvgan.model import Generator
79
+ from muq import MuQMuLan
80
+ from huggingface_hub import hf_hub_download
81
+ from safetensors.torch import load_file
82
+
83
+ # Load DiffRhythm 2 model
84
+ repo_id = "ASLP-lab/DiffRhythm2"
85
+
86
+ # Download model files
87
+ model_ckpt = hf_hub_download(
88
+ repo_id=repo_id,
89
+ filename="model.safetensors",
90
+ local_dir=self.model_path,
91
+ local_files_only=False,
92
+ )
93
+ model_config_path = hf_hub_download(
94
+ repo_id=repo_id,
95
+ filename="config.json",
96
+ local_dir=self.model_path,
97
+ local_files_only=False,
98
+ )
99
+
100
+ # Load config
101
+ with open(model_config_path) as f:
102
+ model_config = json.load(f)
103
+
104
+ model_config['use_flex_attn'] = False
105
+
106
+ # Create model
107
+ self.model = CFM(
108
+ transformer=DiT(**model_config),
109
+ num_channels=model_config['mel_dim'],
110
+ block_size=model_config['block_size'],
111
+ )
112
+
113
+ # Load weights
114
+ ckpt = load_file(model_ckpt)
115
+ self.model.load_state_dict(ckpt)
116
+ self.model = self.model.to(self.device)
117
+
118
+ # Load MuLan for style encoding
119
+ self.mulan = MuQMuLan.from_pretrained(
120
+ "OpenMuQ/MuQ-MuLan-large",
121
+ cache_dir=os.path.join(self.model_path, "mulan")
122
+ ).to(self.device)
123
+
124
+ # Load tokenizer
125
+ from g2p.g2p_generation import chn_eng_g2p
126
+ vocab_path = os.path.join(self.model_path, "vocab.json")
127
+ if not os.path.exists(vocab_path):
128
+ # Download vocab
129
+ vocab_path = hf_hub_download(
130
+ repo_id=repo_id,
131
+ filename="g2p/g2p/vocab.json",
132
+ local_dir=self.model_path,
133
+ local_files_only=False,
134
+ )
135
+
136
+ with open(vocab_path, 'r') as f:
137
+ phone2id = json.load(f)['vocab']
138
+
139
+ self.lrc_tokenizer = {
140
+ 'phone2id': phone2id,
141
+ 'g2p': chn_eng_g2p
142
+ }
143
+
144
+ # Load decoder (BigVGAN vocoder)
145
+ decoder_ckpt = hf_hub_download(
146
+ repo_id=repo_id,
147
+ filename="decoder.bin",
148
+ local_dir=self.model_path,
149
+ local_files_only=False,
150
+ )
151
+ decoder_config = hf_hub_download(
152
+ repo_id=repo_id,
153
+ filename="decoder.json",
154
+ local_dir=self.model_path,
155
+ local_files_only=False,
156
+ )
157
+
158
+ self.decoder = Generator(decoder_config, decoder_ckpt)
159
+ self.decoder = self.decoder.to(self.device)
160
+
161
+ logger.info("βœ… DiffRhythm 2 model loaded successfully")
162
+
163
+ self.is_initialized = True
164
+ logger.info("DiffRhythm 2 service initialized")
165
+
166
+ except Exception as e:
167
+ logger.error(f"Failed to initialize DiffRhythm 2: {str(e)}", exc_info=True)
168
+ raise RuntimeError(f"Could not load DiffRhythm 2 model: {str(e)}")
169
+
170
+ def generate(
171
+ self,
172
+ prompt: str,
173
+ duration: int = 30,
174
+ sample_rate: int = 44100,
175
+ lyrics: Optional[str] = None,
176
+ reference_audio: Optional[str] = None
177
+ ) -> str:
178
+ """
179
+ Generate music from text prompt with optional vocals/lyrics and style reference
180
+
181
+ Args:
182
+ prompt: Text description of desired music
183
+ duration: Length in seconds
184
+ sample_rate: Audio sample rate
185
+ lyrics: Optional lyrics for vocals
186
+ reference_audio: Optional path to reference audio for style consistency
187
+
188
+ Returns:
189
+ Path to generated audio file
190
+ """
191
+ try:
192
+ self._initialize_model()
193
+
194
+ if lyrics:
195
+ logger.info(f"Generating music with vocals: prompt='{prompt}', lyrics_length={len(lyrics)}")
196
+ else:
197
+ logger.info(f"Generating instrumental music: prompt='{prompt}'")
198
+
199
+ if reference_audio and os.path.exists(reference_audio):
200
+ logger.info(f"Using style reference: {reference_audio}")
201
+
202
+ logger.info(f"Duration={duration}s")
203
+
204
+ # Try to generate with DiffRhythm 2
205
+ if self.model is not None:
206
+ audio = self._generate_with_diffrhythm2(prompt, lyrics, duration, sample_rate, reference_audio)
207
+ else:
208
+ # Fallback: Generate placeholder
209
+ logger.warning("Using placeholder audio generation")
210
+ audio = self._generate_placeholder(duration, sample_rate)
211
+
212
+ # Save to file
213
+ output_dir = os.path.join('outputs', 'music')
214
+ os.makedirs(output_dir, exist_ok=True)
215
+
216
+ clip_id = str(uuid.uuid4())
217
+ output_path = os.path.join(output_dir, f"{clip_id}.wav")
218
+
219
+ # Ensure audio is in correct format (channels, samples) for soundfile
220
+ # If audio is 1D (mono), keep it as is. If 2D, ensure it's (samples, channels)
221
+ if audio.ndim == 1:
222
+ # Mono audio - soundfile expects (samples,) shape
223
+ sf.write(output_path, audio, sample_rate)
224
+ else:
225
+ # Stereo/multi-channel - soundfile expects (samples, channels)
226
+ sf.write(output_path, audio, sample_rate)
227
+
228
+ logger.info(f"Music generated successfully: {output_path}")
229
+
230
+ return output_path
231
+
232
+ except Exception as e:
233
+ logger.error(f"Music generation failed: {str(e)}", exc_info=True)
234
+ raise RuntimeError(f"Failed to generate music: {str(e)}")
235
+
236
+ def _generate_with_diffrhythm2(
237
+ self,
238
+ prompt: str,
239
+ lyrics: Optional[str],
240
+ duration: int,
241
+ sample_rate: int,
242
+ reference_audio: Optional[str] = None
243
+ ) -> np.ndarray:
244
+ """
245
+ Generate music using DiffRhythm 2 model with optional style reference
246
+
247
+ Args:
248
+ prompt: Music description (used as style prompt)
249
+ lyrics: Lyrics for vocals (required for vocal generation)
250
+ duration: Duration in seconds
251
+ sample_rate: Sample rate
252
+ reference_audio: Optional path to reference audio for style guidance
253
+
254
+ Returns:
255
+ Audio array
256
+ """
257
+ try:
258
+ logger.info("Generating with DiffRhythm 2 model...")
259
+
260
+ # Prepare lyrics tokens
261
+ if lyrics:
262
+ lyrics_token = self._tokenize_lyrics(lyrics)
263
+ else:
264
+ # For instrumental, use empty structure
265
+ lyrics_token = torch.tensor([500, 511], dtype=torch.long, device=self.device) # [start][stop]
266
+
267
+ # Encode style prompt with optional reference audio blending
268
+ with torch.no_grad():
269
+ if reference_audio and os.path.exists(reference_audio):
270
+ try:
271
+ import torchaudio
272
+ # Load reference audio
273
+ ref_waveform, ref_sr = torchaudio.load(reference_audio)
274
+ if ref_sr != 24000: # MuLan expects 24kHz
275
+ ref_waveform = torchaudio.functional.resample(ref_waveform, ref_sr, 24000)
276
+
277
+ # Encode reference audio with MuLan
278
+ ref_waveform = ref_waveform.to(self.device)
279
+ audio_style_embed = self.mulan(audios=ref_waveform.unsqueeze(0))
280
+ text_style_embed = self.mulan(texts=[prompt])
281
+
282
+ # Blend reference audio style with text prompt (70% audio, 30% text)
283
+ style_prompt_embed = 0.7 * audio_style_embed + 0.3 * text_style_embed
284
+ logger.info("Using blended style: 70% reference audio + 30% text prompt")
285
+ except Exception as e:
286
+ logger.warning(f"Failed to use reference audio, using text prompt only: {e}")
287
+ style_prompt_embed = self.mulan(texts=[prompt])
288
+ else:
289
+ style_prompt_embed = self.mulan(texts=[prompt])
290
+
291
+ style_prompt_embed = style_prompt_embed.to(self.device).squeeze(0)
292
+
293
+ # Use FP16 if on GPU
294
+ if self.device.type != 'cpu':
295
+ self.model = self.model.half()
296
+ self.decoder = self.decoder.half()
297
+ style_prompt_embed = style_prompt_embed.half()
298
+
299
+ # Generate latent representation
300
+ with torch.inference_mode():
301
+ latent = self.model.sample_block_cache(
302
+ text=lyrics_token.unsqueeze(0),
303
+ duration=int(duration * 5), # DiffRhythm uses 5 frames per second
304
+ style_prompt=style_prompt_embed.unsqueeze(0),
305
+ steps=16, # Sampling steps
306
+ cfg_strength=2.0, # Classifier-free guidance
307
+ process_bar=False
308
+ )
309
+
310
+ # Decode to audio
311
+ latent = latent.transpose(1, 2)
312
+ audio = self.decoder.decode_audio(latent, overlap=5, chunk_size=20)
313
+
314
+ # Convert to numpy
315
+ audio = audio.float().cpu().numpy().squeeze()
316
+
317
+ # Ensure correct length
318
+ target_length = int(duration * sample_rate)
319
+ if len(audio) > target_length:
320
+ audio = audio[:target_length]
321
+ elif len(audio) < target_length:
322
+ audio = np.pad(audio, (0, target_length - len(audio)))
323
+
324
+ # Resample if needed
325
+ if sample_rate != 24000: # DiffRhythm 2 native sample rate
326
+ import scipy.signal as signal
327
+ audio = signal.resample(audio, target_length)
328
+
329
+ logger.info("βœ… DiffRhythm 2 generation successful")
330
+ return audio.astype(np.float32)
331
+
332
+ except Exception as e:
333
+ logger.error(f"DiffRhythm 2 generation failed: {str(e)}")
334
+ return self._generate_placeholder(duration, sample_rate)
335
+
336
+ def _tokenize_lyrics(self, lyrics: str) -> torch.Tensor:
337
+ """
338
+ Tokenize lyrics for DiffRhythm 2
339
+
340
+ Args:
341
+ lyrics: Lyrics text
342
+
343
+ Returns:
344
+ Tokenized lyrics tensor
345
+ """
346
+ try:
347
+ # Structure tags
348
+ STRUCT_INFO = {
349
+ "[start]": 500,
350
+ "[end]": 501,
351
+ "[intro]": 502,
352
+ "[verse]": 503,
353
+ "[chorus]": 504,
354
+ "[outro]": 505,
355
+ "[inst]": 506,
356
+ "[solo]": 507,
357
+ "[bridge]": 508,
358
+ "[hook]": 509,
359
+ "[break]": 510,
360
+ "[stop]": 511,
361
+ "[space]": 512
362
+ }
363
+
364
+ # Convert lyrics to phonemes and tokens
365
+ phone, tokens = self.lrc_tokenizer['g2p'](lyrics)
366
+ tokens = [x + 1 for x in tokens] # Offset by 1
367
+
368
+ # Add structure: [start] + lyrics + [stop]
369
+ lyrics_tokens = [STRUCT_INFO['[start]']] + tokens + [STRUCT_INFO['[stop]']]
370
+
371
+ return torch.tensor(lyrics_tokens, dtype=torch.long, device=self.device)
372
+
373
+ except Exception as e:
374
+ logger.error(f"Lyrics tokenization failed: {str(e)}")
375
+ # Return minimal structure
376
+ return torch.tensor([500, 511], dtype=torch.long, device=self.device)
377
+
378
+ def _generate_placeholder(self, duration: int, sample_rate: int) -> np.ndarray:
379
+ """
380
+ Generate placeholder audio (for testing without actual model)
381
+
382
+ Args:
383
+ duration: Length in seconds
384
+ sample_rate: Sample rate
385
+
386
+ Returns:
387
+ Audio array
388
+ """
389
+ logger.warning("Using placeholder audio - DiffRhythm 2 model not loaded")
390
+
391
+ # Generate simple sine wave as placeholder
392
+ t = np.linspace(0, duration, int(duration * sample_rate))
393
+ frequency = 440 # A4 note
394
+ audio = 0.3 * np.sin(2 * np.pi * frequency * t)
395
+
396
+ return audio.astype(np.float32)
397
+
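For orientation, a minimal usage sketch of the generation service above. This is a hedged sketch, not the app's actual wiring: the class name `MusicService`, the module path, and the constructor argument are assumptions (only the method bodies appear in this diff), and the first call downloads the model weights, which can take several minutes.

```python
# Hypothetical usage -- MusicService and its constructor are assumed;
# generate()'s signature matches the method defined above.
from services.music_service import MusicService

service = MusicService(model_path="models/diffrhythm2")

# Instrumental clip from a text prompt alone
instrumental = service.generate(
    prompt="mellow lo-fi hip hop with warm keys",
    duration=20,           # seconds; shorter clips generate faster
    sample_rate=44100,
)

# Passing lyrics switches generate() onto the vocal path
vocal_clip = service.generate(
    prompt="upbeat pop song with bright synths",
    duration=30,
    lyrics="Sunrise on the highway\nWe are never looking back",
)
print(instrumental, vocal_clip)
```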
backend/services/export_service.py ADDED
@@ -0,0 +1,123 @@
+"""
+Export and merge service
+"""
+import os
+import logging
+from typing import Optional, List
+import numpy as np
+import soundfile as sf
+from services.timeline_service import TimelineService
+
+logger = logging.getLogger(__name__)
+
+class ExportService:
+    """Service for exporting and merging audio"""
+
+    def __init__(self):
+        """Initialize export service"""
+        self.timeline_service = TimelineService()
+        logger.info("Export service initialized")
+
+    def merge_clips(
+        self,
+        filename: str = "output",
+        export_format: str = "wav"
+    ) -> Optional[str]:
+        """
+        Merge all timeline clips into a single file
+
+        Args:
+            filename: Output filename (without extension)
+            export_format: Output format (wav, mp3, flac)
+
+        Returns:
+            Path to merged file, or None if no clips
+        """
+        try:
+            clips = self.timeline_service.get_all_clips()
+
+            if not clips:
+                logger.warning("No clips to merge")
+                return None
+
+            logger.info(f"Merging {len(clips)} clips")
+
+            # Load all clips
+            audio_data = []
+            sample_rate = None
+
+            for clip in clips:
+                audio, sr = sf.read(clip['file_path'])
+
+                if sample_rate is None:
+                    sample_rate = sr
+                elif sr != sample_rate:
+                    logger.warning(f"Sample rate mismatch: {sr} vs {sample_rate}")
+                    # Could resample here if needed
+
+                audio_data.append(audio)
+
+            # Concatenate all clips
+            merged_audio = np.concatenate(audio_data)
+
+            # Normalize
+            max_val = np.abs(merged_audio).max()
+            if max_val > 0:
+                merged_audio = merged_audio / max_val * 0.95
+
+            # Save merged file
+            output_dir = 'outputs'
+            os.makedirs(output_dir, exist_ok=True)
+
+            output_path = os.path.join(output_dir, f"{filename}.{export_format}")
+
+            sf.write(output_path, merged_audio, sample_rate)
+
+            logger.info(f"Clips merged successfully: {output_path}")
+            return output_path
+
+        except Exception as e:
+            logger.error(f"Failed to merge clips: {str(e)}", exc_info=True)
+            raise
+
+    def export_clip(
+        self,
+        clip_id: str,
+        export_format: str = "wav"
+    ) -> Optional[str]:
+        """
+        Export a single clip
+
+        Args:
+            clip_id: ID of clip to export
+            export_format: Output format
+
+        Returns:
+            Path to exported file, or None if clip not found
+        """
+        try:
+            clip = self.timeline_service.get_clip(clip_id)
+
+            if not clip:
+                logger.warning(f"Clip not found: {clip_id}")
+                return None
+
+            logger.info(f"Exporting clip: {clip_id}")
+
+            # Load clip
+            audio, sr = sf.read(clip.file_path)
+
+            # Export with requested format
+            output_dir = 'outputs'
+            os.makedirs(output_dir, exist_ok=True)
+
+            output_path = os.path.join(output_dir, f"{clip_id}.{export_format}")
+
+            sf.write(output_path, audio, sr)
+
+            logger.info(f"Clip exported: {output_path}")
+            return output_path
+
+        except Exception as e:
+            logger.error(f"Failed to export clip: {str(e)}", exc_info=True)
+            raise
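A short sketch of driving the export flow, assuming clips have already been added to the shared timeline elsewhere in the app (the module path is inferred from the import at the top of this file, and the clip id is a placeholder):

```python
from services.export_service import ExportService

exporter = ExportService()

# Merge everything on the timeline into one peak-normalized file
merged = exporter.merge_clips(filename="my_song", export_format="wav")
if merged is None:
    print("Timeline is empty - nothing to export")
else:
    print(f"Exported: {merged}")

# Or re-export a single clip in another format
single = exporter.export_clip("some-clip-uuid", export_format="flac")
```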
backend/services/fish_speech_service.py ADDED
@@ -0,0 +1,148 @@
+"""
+Fish Speech TTS/vocals service
+"""
+import os
+import logging
+import uuid
+import torch
+from typing import Optional
+import numpy as np
+import soundfile as sf
+
+logger = logging.getLogger(__name__)
+
+class FishSpeechService:
+    """Service for Fish Speech TTS and vocal synthesis"""
+
+    def __init__(self, model_path: str):
+        """
+        Initialize Fish Speech service
+
+        Args:
+            model_path: Path to Fish Speech model files
+        """
+        self.model_path = model_path
+        self.model = None
+        self.vocoder = None
+        self.is_initialized = False
+        self.device = self._get_device()
+        logger.info(f"Fish Speech service created with model path: {model_path}")
+        logger.info(f"Using device: {self.device}")
+
+    def _get_device(self):
+        """Get compute device (AMD GPU via DirectML or CPU)"""
+        try:
+            from utils.amd_gpu import DEFAULT_DEVICE
+            return DEFAULT_DEVICE
+        except Exception:
+            return torch.device("cpu")
+
+    def _initialize_model(self):
+        """Lazy-load the model when first needed"""
+        if self.is_initialized:
+            return
+
+        try:
+            logger.info("Initializing Fish Speech model...")
+            # TODO: Load actual Fish Speech model
+            # from fish_speech import FishSpeechModel
+            # self.model = FishSpeechModel.load(self.model_path)
+
+            self.is_initialized = True
+            logger.info("Fish Speech model initialized successfully")
+
+        except Exception as e:
+            logger.error(f"Failed to initialize Fish Speech model: {str(e)}", exc_info=True)
+            raise RuntimeError(f"Could not load Fish Speech model: {str(e)}")
+
+    def synthesize_vocals(
+        self,
+        lyrics: str,
+        duration: int = 30,
+        sample_rate: int = 44100
+    ) -> str:
+        """
+        Synthesize vocals from lyrics
+
+        Args:
+            lyrics: Lyrics text to sing
+            duration: Target duration in seconds
+            sample_rate: Audio sample rate
+
+        Returns:
+            Path to the generated vocals file
+        """
+        try:
+            self._initialize_model()
+
+            logger.info(f"Synthesizing vocals: {len(lyrics)} characters")
+
+            # TODO: Replace with actual Fish Speech synthesis
+            # vocals = self.model.synthesize(lyrics, duration=duration, sample_rate=sample_rate)
+
+            # Placeholder: generate silence
+            vocals = np.zeros(int(duration * sample_rate), dtype=np.float32)
+
+            # Save to file
+            output_dir = os.path.join('outputs', 'vocals')
+            os.makedirs(output_dir, exist_ok=True)
+
+            vocals_id = str(uuid.uuid4())
+            output_path = os.path.join(output_dir, f"{vocals_id}.wav")
+
+            sf.write(output_path, vocals, sample_rate)
+            logger.info(f"Vocals synthesized: {output_path}")
+
+            return output_path
+
+        except Exception as e:
+            logger.error(f"Vocal synthesis failed: {str(e)}", exc_info=True)
+            raise RuntimeError(f"Failed to synthesize vocals: {str(e)}")
+
+    def add_vocals(
+        self,
+        music_path: str,
+        lyrics: str,
+        duration: int = 30
+    ) -> str:
+        """
+        Add synthesized vocals to a music track
+
+        Args:
+            music_path: Path to the music audio file
+            lyrics: Lyrics to sing
+            duration: Duration in seconds
+
+        Returns:
+            Path to the mixed audio file
+        """
+        try:
+            logger.info(f"Adding vocals to music: {music_path}")
+
+            # Load music
+            music_audio, sr = sf.read(music_path)
+
+            # Synthesize vocals
+            vocals_path = self.synthesize_vocals(lyrics, duration, sr)
+            vocals_audio, _ = sf.read(vocals_path)
+
+            # Mix vocals with music, truncating to the shorter track
+            min_len = min(len(music_audio), len(vocals_audio))
+            mixed = music_audio[:min_len] * 0.7 + vocals_audio[:min_len] * 0.3
+
+            # Save mixed audio
+            output_dir = os.path.join('outputs', 'mixed')
+            os.makedirs(output_dir, exist_ok=True)
+
+            mixed_id = str(uuid.uuid4())
+            output_path = os.path.join(output_dir, f"{mixed_id}.wav")
+
+            sf.write(output_path, mixed, sr)
+            logger.info(f"Vocals added successfully: {output_path}")
+
+            return output_path
+
+        except Exception as e:
+            logger.error(f"Adding vocals failed: {str(e)}", exc_info=True)
+            raise RuntimeError(f"Failed to add vocals: {str(e)}")
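The 70/30 blend in `add_vocals` assumes both arrays have the same channel layout; if the music is stereo and the vocals mono, the NumPy addition will fail to broadcast. A small, hedged sketch of a shape-safe version of the same blend:

```python
import numpy as np

def mix_tracks(music: np.ndarray, vocals: np.ndarray,
               music_gain: float = 0.7, vocal_gain: float = 0.3) -> np.ndarray:
    """Blend two tracks at fixed gains, tolerating mono/stereo mismatches."""
    # Downmix any stereo input to mono so the arrays always broadcast
    if music.ndim == 2:
        music = music.mean(axis=1)
    if vocals.ndim == 2:
        vocals = vocals.mean(axis=1)
    # Truncate to the shorter track, as add_vocals does above
    n = min(len(music), len(vocals))
    return (music[:n] * music_gain + vocals[:n] * vocal_gain).astype(np.float32)
```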
backend/services/lyricmind_service.py ADDED
@@ -0,0 +1,220 @@
+"""
+LyricMind AI lyrics generation service
+"""
+import os
+import logging
+import torch
+from typing import Optional
+
+logger = logging.getLogger(__name__)
+
+class LyricMindService:
+    """Service for LyricMind AI lyrics generation"""
+
+    def __init__(self, model_path: str):
+        """
+        Initialize LyricMind service
+
+        Args:
+            model_path: Path to LyricMind model files
+        """
+        self.model_path = model_path
+        self.model = None
+        self.tokenizer = None
+        self.is_initialized = False
+        self.device = self._get_device()
+        logger.info(f"LyricMind service created with model path: {model_path}")
+        logger.info(f"Using device: {self.device}")
+
+    def _get_device(self):
+        """Get compute device (AMD GPU via DirectML or CPU)"""
+        try:
+            from utils.amd_gpu import DEFAULT_DEVICE
+            return DEFAULT_DEVICE
+        except Exception:
+            return torch.device("cpu")
+
+    def _initialize_model(self):
+        """Lazy-load the model when first needed"""
+        if self.is_initialized:
+            return
+
+        try:
+            logger.info("Initializing LyricMind model...")
+
+            # Try to load a text generation model as fallback
+            try:
+                from transformers import AutoTokenizer, AutoModelForCausalLM
+
+                fallback_path = os.path.join(os.path.dirname(self.model_path), "text_generator")
+
+                if os.path.exists(fallback_path):
+                    logger.info(f"Loading text generation model from {fallback_path}")
+                    self.tokenizer = AutoTokenizer.from_pretrained(fallback_path, trust_remote_code=True)
+                    self.model = AutoModelForCausalLM.from_pretrained(
+                        fallback_path,
+                        trust_remote_code=True,
+                        torch_dtype=torch.float32  # Use FP32 for AMD GPU compatibility
+                    )
+                    self.model.to(self.device)
+                    logger.info("✅ Text generation model loaded successfully")
+                else:
+                    logger.warning("Text generation model not found, using placeholder")
+
+            except Exception as e:
+                logger.warning(f"Could not load text model: {str(e)}")
+
+            self.is_initialized = True
+            logger.info("LyricMind service initialized")
+
+        except Exception as e:
+            logger.error(f"Failed to initialize LyricMind model: {str(e)}", exc_info=True)
+            raise RuntimeError(f"Could not load LyricMind model: {str(e)}")
+
+    def generate(
+        self,
+        prompt: str,
+        style: Optional[str] = None,
+        duration: int = 30,
+        prompt_analysis: Optional[dict] = None
+    ) -> str:
+        """
+        Generate lyrics from a prompt using analysis context
+
+        Args:
+            prompt: Description of the desired lyrics theme
+            style: Music style (optional; detected from the prompt if not provided)
+            duration: Target song duration (affects lyrics length)
+            prompt_analysis: Pre-computed prompt analysis (optional)
+
+        Returns:
+            Generated lyrics text
+        """
+        try:
+            self._initialize_model()
+
+            # Use prompt analysis for better context
+            from utils.prompt_analyzer import PromptAnalyzer
+
+            if prompt_analysis is None:
+                analysis = PromptAnalyzer.analyze(prompt)
+            else:
+                analysis = prompt_analysis
+
+            # Use the detected genre/style if not explicitly provided
+            effective_style = style or analysis.get('genre', 'pop')
+            mood = analysis.get('mood', 'neutral')
+
+            logger.info(f"Generating lyrics: prompt='{prompt}', style={effective_style}, mood={mood}")
+
+            # Try to generate with the text model
+            if self.model is not None and self.tokenizer is not None:
+                lyrics = self._generate_with_model(prompt, effective_style, duration, analysis)
+            else:
+                # Fallback: placeholder lyrics
+                lyrics = self._generate_placeholder(prompt, effective_style, duration)
+
+            logger.info("Lyrics generated successfully")
+            return lyrics
+
+        except Exception as e:
+            logger.error(f"Lyrics generation failed: {str(e)}", exc_info=True)
+            raise RuntimeError(f"Failed to generate lyrics: {str(e)}")
+
+    def _generate_with_model(self, prompt: str, style: str, duration: int, analysis: dict) -> str:
+        """
+        Generate lyrics using the text generation model with analysis context
+
+        Args:
+            prompt: Theme prompt
+            style: Music style
+            duration: Duration in seconds
+            analysis: Prompt analysis with genre, mood, etc.
+
+        Returns:
+            Generated lyrics
+        """
+        try:
+            logger.info("Generating lyrics with AI model...")
+
+            # Create a structured prompt with analysis context
+            mood = analysis.get('mood', 'neutral')
+            bpm = analysis.get('bpm', 120)
+
+            full_prompt = f"""Write song lyrics in {style} style about: {prompt}
+Mood: {mood}
+Tempo: {bpm} BPM
+
+Lyrics:
+"""
+
+            # Tokenize
+            inputs = self.tokenizer(full_prompt, return_tensors="pt")
+            inputs = {k: v.to(self.device) for k, v in inputs.items()}
+
+            # Cap total length: the prompt plus up to 200 new tokens, 512 max
+            max_length = min(200 + inputs["input_ids"].shape[1], 512)
+
+            # Generate
+            with torch.no_grad():
+                outputs = self.model.generate(
+                    **inputs,
+                    max_length=max_length,
+                    temperature=0.9,
+                    top_p=0.95,
+                    do_sample=True,
+                    pad_token_id=self.tokenizer.eos_token_id
+                )
+
+            # Decode
+            generated_text = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+            # Extract lyrics (remove the prompt)
+            lyrics = generated_text.split("Lyrics:")[-1].strip()
+
+            logger.info("✅ AI lyrics generation successful")
+            return lyrics if lyrics else self._generate_placeholder(prompt, style, duration)
+
+        except Exception as e:
+            logger.error(f"Model generation failed: {str(e)}")
+            return self._generate_placeholder(prompt, style, duration)
+
+    def _generate_placeholder(
+        self,
+        prompt: str,
+        style: str,
+        duration: int
+    ) -> str:
+        """
+        Generate placeholder lyrics for testing
+
+        Args:
+            prompt: Theme prompt
+            style: Music style
+            duration: Duration in seconds
+
+        Returns:
+            Placeholder lyrics
+        """
+        logger.warning("Using placeholder lyrics - LyricMind model not loaded")
+
+        # Estimate the number of lines from the duration
+        lines_per_30s = 8
+        num_lines = int((duration / 30) * lines_per_30s)
+
+        lyrics_lines = [
+            "[Verse 1]",
+            f"Theme: {prompt}",
+            f"Style: {style}",
+            "",
+            "[Chorus]",
+            "This is a placeholder",
+            "Generated by LyricMind AI",
+            "Replace with actual model output",
+        ]
+
+        # Pad to the desired length
+        while len(lyrics_lines) < num_lines:
+            lyrics_lines.append("La la la...")
+
+        return "\n".join(lyrics_lines[:num_lines])
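A usage sketch for the lyrics service; the module path mirrors the other services and is an assumption. Note that `generate()` imports `utils.prompt_analyzer` internally, so this must run inside the app's package layout:

```python
from services.lyricmind_service import LyricMindService

lyricist = LyricMindService(model_path="models/lyricmind")

# Style and mood are detected from the prompt when not supplied
lyrics = lyricist.generate(
    prompt="driving through the city at night",
    duration=30,
)
print(lyrics)

# An explicit style overrides the detected genre
rock_lyrics = lyricist.generate(prompt="leaving home", style="rock", duration=60)
```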
backend/services/mastering_service.py ADDED
@@ -0,0 +1,641 @@
+"""
+Audio mastering service with industry-standard presets using Pedalboard
+"""
+import os
+import logging
+import numpy as np
+from pathlib import Path
+from typing import Dict, List, Optional
+import soundfile as sf
+from pedalboard import (
+    Pedalboard,
+    Compressor,
+    Limiter,
+    Gain,
+    HighpassFilter,
+    LowpassFilter,
+    PeakFilter,
+    LowShelfFilter,
+    HighShelfFilter,
+    Reverb,
+    Chorus,
+    Delay
+)
+
+logger = logging.getLogger(__name__)
+
+class MasteringPreset:
+    """Mastering preset configuration"""
+
+    def __init__(self, name: str, description: str, chain: List):
+        self.name = name
+        self.description = description
+        self.chain = chain
+
+class MasteringService:
+    """Audio mastering and EQ service"""
+
+    # Industry-standard mastering presets
+    PRESETS = {
+        # Clean/Transparent Presets
+        "clean_master": MasteringPreset(
+            "Clean Master",
+            "Transparent mastering with gentle compression",
+            [
+                HighpassFilter(cutoff_frequency_hz=30),
+                PeakFilter(cutoff_frequency_hz=100, gain_db=-1, q=0.7),
+                PeakFilter(cutoff_frequency_hz=3000, gain_db=0.5, q=1.0),
+                PeakFilter(cutoff_frequency_hz=10000, gain_db=1.0, q=0.7),
+                Compressor(threshold_db=-12, ratio=2.0, attack_ms=5, release_ms=100),
+                Limiter(threshold_db=-1.0, release_ms=100)
+            ]
+        ),
+
+        "subtle_warmth": MasteringPreset(
+            "Subtle Warmth",
+            "Gentle low-end enhancement with smooth highs",
+            [
+                HighpassFilter(cutoff_frequency_hz=25),
+                LowShelfFilter(cutoff_frequency_hz=100, gain_db=1.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=200, gain_db=0.8, q=0.5),
+                PeakFilter(cutoff_frequency_hz=8000, gain_db=-0.5, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=12000, gain_db=1.0, q=0.7),
+                Compressor(threshold_db=-15, ratio=2.5, attack_ms=10, release_ms=150),
+                Limiter(threshold_db=-0.5, release_ms=100)
+            ]
+        ),
+
+        # Pop/Commercial Presets
+        "modern_pop": MasteringPreset(
+            "Modern Pop",
+            "Radio-ready pop sound with punchy compression",
+            [
+                HighpassFilter(cutoff_frequency_hz=35),
+                PeakFilter(cutoff_frequency_hz=80, gain_db=-1.5, q=0.8),
+                LowShelfFilter(cutoff_frequency_hz=120, gain_db=2.0, q=0.7),
+                PeakFilter(cutoff_frequency_hz=2500, gain_db=1.5, q=1.2),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=2.5, q=0.7),
+                Compressor(threshold_db=-10, ratio=4.0, attack_ms=3, release_ms=80),
+                Limiter(threshold_db=-0.3, release_ms=50)
+            ]
+        ),
+
+        "radio_ready": MasteringPreset(
+            "Radio Ready",
+            "Maximum loudness for commercial radio",
+            [
+                HighpassFilter(cutoff_frequency_hz=40),
+                PeakFilter(cutoff_frequency_hz=60, gain_db=-2.0, q=1.0),
+                LowShelfFilter(cutoff_frequency_hz=150, gain_db=1.5, q=0.8),
+                PeakFilter(cutoff_frequency_hz=3000, gain_db=2.0, q=1.5),
+                PeakFilter(cutoff_frequency_hz=8000, gain_db=1.5, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=12000, gain_db=3.0, q=0.7),
+                Compressor(threshold_db=-8, ratio=6.0, attack_ms=2, release_ms=60),
+                Limiter(threshold_db=-0.1, release_ms=30)
+            ]
+        ),
+
+        "punchy_commercial": MasteringPreset(
+            "Punchy Commercial",
+            "Aggressive punch for mainstream appeal",
+            [
+                HighpassFilter(cutoff_frequency_hz=30),
+                PeakFilter(cutoff_frequency_hz=100, gain_db=-2.0, q=1.2),
+                LowShelfFilter(cutoff_frequency_hz=200, gain_db=2.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=1000, gain_db=-1.0, q=0.8),
+                PeakFilter(cutoff_frequency_hz=4000, gain_db=2.5, q=1.5),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=2.0, q=0.8),
+                Compressor(threshold_db=-9, ratio=5.0, attack_ms=1, release_ms=50),
+                Limiter(threshold_db=-0.2, release_ms=40)
+            ]
+        ),
+
+        # Rock/Alternative Presets
+        "rock_master": MasteringPreset(
+            "Rock Master",
+            "Powerful rock sound with emphasis on mids",
+            [
+                HighpassFilter(cutoff_frequency_hz=35),
+                LowShelfFilter(cutoff_frequency_hz=100, gain_db=1.0, q=0.7),
+                PeakFilter(cutoff_frequency_hz=400, gain_db=1.5, q=1.0),
+                PeakFilter(cutoff_frequency_hz=2000, gain_db=2.0, q=1.2),
+                PeakFilter(cutoff_frequency_hz=5000, gain_db=1.5, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=8000, gain_db=1.0, q=0.8),
+                Compressor(threshold_db=-12, ratio=3.5, attack_ms=5, release_ms=120),
+                Limiter(threshold_db=-0.5, release_ms=80)
+            ]
+        ),
+
+        "metal_aggressive": MasteringPreset(
+            "Metal Aggressive",
+            "Heavy, aggressive metal mastering",
+            [
+                HighpassFilter(cutoff_frequency_hz=40),
+                PeakFilter(cutoff_frequency_hz=80, gain_db=-1.5, q=1.0),
+                LowShelfFilter(cutoff_frequency_hz=150, gain_db=2.0, q=0.8),
+                PeakFilter(cutoff_frequency_hz=800, gain_db=-1.5, q=1.2),
+                PeakFilter(cutoff_frequency_hz=3000, gain_db=3.0, q=1.5),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=2.5, q=0.7),
+                Compressor(threshold_db=-8, ratio=6.0, attack_ms=1, release_ms=50),
+                Limiter(threshold_db=-0.1, release_ms=30)
+            ]
+        ),
+
+        "indie_rock": MasteringPreset(
+            "Indie Rock",
+            "Lo-fi character with mid presence",
+            [
+                HighpassFilter(cutoff_frequency_hz=30),
+                LowShelfFilter(cutoff_frequency_hz=120, gain_db=0.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=500, gain_db=1.5, q=1.0),
+                PeakFilter(cutoff_frequency_hz=2500, gain_db=2.0, q=1.2),
+                PeakFilter(cutoff_frequency_hz=7000, gain_db=-0.5, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=0.5, q=0.8),
+                Compressor(threshold_db=-14, ratio=3.0, attack_ms=8, release_ms=150),
+                Limiter(threshold_db=-0.8, release_ms=100)
+            ]
+        ),
+
+        # Electronic/EDM Presets
+        "edm_club": MasteringPreset(
+            "EDM Club",
+            "Powerful club sound with deep bass",
+            [
+                HighpassFilter(cutoff_frequency_hz=25),
+                LowShelfFilter(cutoff_frequency_hz=80, gain_db=3.0, q=0.7),
+                PeakFilter(cutoff_frequency_hz=150, gain_db=2.0, q=0.8),
+                PeakFilter(cutoff_frequency_hz=1000, gain_db=-1.5, q=1.0),
+                PeakFilter(cutoff_frequency_hz=5000, gain_db=2.0, q=1.2),
+                HighShelfFilter(cutoff_frequency_hz=12000, gain_db=3.0, q=0.7),
+                Compressor(threshold_db=-6, ratio=8.0, attack_ms=0.5, release_ms=40),
+                Limiter(threshold_db=0.0, release_ms=20)
+            ]
+        ),
+
+        "house_groovy": MasteringPreset(
+            "House Groovy",
+            "Smooth house music with rolling bass",
+            [
+                HighpassFilter(cutoff_frequency_hz=30),
+                LowShelfFilter(cutoff_frequency_hz=100, gain_db=2.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=250, gain_db=1.0, q=0.8),
+                PeakFilter(cutoff_frequency_hz=2000, gain_db=0.5, q=1.0),
+                PeakFilter(cutoff_frequency_hz=8000, gain_db=1.5, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=2.0, q=0.7),
+                Compressor(threshold_db=-10, ratio=4.0, attack_ms=2, release_ms=60),
+                Limiter(threshold_db=-0.2, release_ms=40)
+            ]
+        ),
+
+        "techno_dark": MasteringPreset(
+            "Techno Dark",
+            "Dark, pounding techno master",
+            [
+                HighpassFilter(cutoff_frequency_hz=35),
+                PeakFilter(cutoff_frequency_hz=60, gain_db=2.0, q=1.0),
+                LowShelfFilter(cutoff_frequency_hz=120, gain_db=1.5, q=0.8),
+                PeakFilter(cutoff_frequency_hz=800, gain_db=-2.0, q=1.5),
+                PeakFilter(cutoff_frequency_hz=4000, gain_db=1.0, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=-0.5, q=0.8),
+                Compressor(threshold_db=-8, ratio=6.0, attack_ms=1, release_ms=50),
+                Limiter(threshold_db=-0.1, release_ms=30)
+            ]
+        ),
+
+        "dubstep_heavy": MasteringPreset(
+            "Dubstep Heavy",
+            "Sub-bass focused with crispy highs",
+            [
+                HighpassFilter(cutoff_frequency_hz=20),
+                PeakFilter(cutoff_frequency_hz=50, gain_db=3.5, q=1.2),
+                LowShelfFilter(cutoff_frequency_hz=100, gain_db=2.5, q=0.8),
+                PeakFilter(cutoff_frequency_hz=500, gain_db=-2.0, q=1.5),
+                PeakFilter(cutoff_frequency_hz=6000, gain_db=2.5, q=1.2),
+                HighShelfFilter(cutoff_frequency_hz=12000, gain_db=3.5, q=0.7),
+                Compressor(threshold_db=-6, ratio=10.0, attack_ms=0.3, release_ms=30),
+                Limiter(threshold_db=0.0, release_ms=20)
+            ]
+        ),
+
+        # Hip-Hop/R&B Presets
+        "hiphop_modern": MasteringPreset(
+            "Hip-Hop Modern",
+            "Contemporary hip-hop with deep bass",
+            [
+                HighpassFilter(cutoff_frequency_hz=25),
+                LowShelfFilter(cutoff_frequency_hz=80, gain_db=2.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=150, gain_db=1.5, q=0.8),
+                PeakFilter(cutoff_frequency_hz=1000, gain_db=-1.0, q=1.0),
+                PeakFilter(cutoff_frequency_hz=3500, gain_db=2.0, q=1.2),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.5, q=0.7),
+                Compressor(threshold_db=-10, ratio=4.0, attack_ms=5, release_ms=80),
+                Limiter(threshold_db=-0.3, release_ms=60)
+            ]
+        ),
+
+        "trap_808": MasteringPreset(
+            "Trap 808",
+            "808-focused trap mastering",
+            [
+                HighpassFilter(cutoff_frequency_hz=20),
+                PeakFilter(cutoff_frequency_hz=50, gain_db=3.0, q=1.0),
+                LowShelfFilter(cutoff_frequency_hz=100, gain_db=2.0, q=0.7),
+                PeakFilter(cutoff_frequency_hz=800, gain_db=-1.5, q=1.2),
+                PeakFilter(cutoff_frequency_hz=5000, gain_db=2.5, q=1.2),
+                HighShelfFilter(cutoff_frequency_hz=12000, gain_db=2.0, q=0.7),
+                Compressor(threshold_db=-8, ratio=5.0, attack_ms=3, release_ms=60),
+                Limiter(threshold_db=-0.2, release_ms=40)
+            ]
+        ),
+
+        "rnb_smooth": MasteringPreset(
+            "R&B Smooth",
+            "Silky smooth R&B sound",
+            [
+                HighpassFilter(cutoff_frequency_hz=30),
+                LowShelfFilter(cutoff_frequency_hz=100, gain_db=1.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=300, gain_db=1.0, q=0.8),
+                PeakFilter(cutoff_frequency_hz=2000, gain_db=0.5, q=1.0),
+                PeakFilter(cutoff_frequency_hz=6000, gain_db=1.5, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=2.0, q=0.7),
+                Compressor(threshold_db=-12, ratio=3.0, attack_ms=8, release_ms=120),
+                Limiter(threshold_db=-0.5, release_ms=80)
+            ]
+        ),
+
+        # Acoustic/Organic Presets
+        "acoustic_natural": MasteringPreset(
+            "Acoustic Natural",
+            "Natural, transparent acoustic sound",
+            [
+                HighpassFilter(cutoff_frequency_hz=25),
+                LowShelfFilter(cutoff_frequency_hz=100, gain_db=0.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=500, gain_db=0.8, q=0.8),
+                PeakFilter(cutoff_frequency_hz=3000, gain_db=1.0, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=8000, gain_db=1.5, q=0.7),
+                Compressor(threshold_db=-16, ratio=2.0, attack_ms=15, release_ms=200),
+                Limiter(threshold_db=-1.0, release_ms=120)
+            ]
+        ),
+
+        "folk_warm": MasteringPreset(
+            "Folk Warm",
+            "Warm, intimate folk sound",
+            [
+                HighpassFilter(cutoff_frequency_hz=30),
+                LowShelfFilter(cutoff_frequency_hz=150, gain_db=1.0, q=0.7),
+                PeakFilter(cutoff_frequency_hz=400, gain_db=1.5, q=0.8),
+                PeakFilter(cutoff_frequency_hz=2500, gain_db=1.0, q=1.0),
+                PeakFilter(cutoff_frequency_hz=7000, gain_db=-0.5, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.0, q=0.8),
+                Compressor(threshold_db=-18, ratio=2.5, attack_ms=20, release_ms=250),
+                Limiter(threshold_db=-1.5, release_ms=150)
+            ]
+        ),
+
+        "jazz_vintage": MasteringPreset(
+            "Jazz Vintage",
+            "Classic jazz warmth and space",
+            [
+                HighpassFilter(cutoff_frequency_hz=35),
+                LowShelfFilter(cutoff_frequency_hz=120, gain_db=1.0, q=0.7),
+                PeakFilter(cutoff_frequency_hz=500, gain_db=1.0, q=0.8),
+                PeakFilter(cutoff_frequency_hz=2000, gain_db=0.5, q=0.8),
+                PeakFilter(cutoff_frequency_hz=8000, gain_db=-1.0, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=12000, gain_db=0.5, q=0.8),
+                Compressor(threshold_db=-20, ratio=2.0, attack_ms=25, release_ms=300),
+                Limiter(threshold_db=-2.0, release_ms=180)
+            ]
+        ),
+
+        # Classical/Orchestral Presets
+        "orchestral_wide": MasteringPreset(
+            "Orchestral Wide",
+            "Wide, natural orchestral sound",
+            [
+                HighpassFilter(cutoff_frequency_hz=20),
+                LowShelfFilter(cutoff_frequency_hz=80, gain_db=0.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=300, gain_db=0.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=4000, gain_db=0.8, q=0.8),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.0, q=0.7),
+                Compressor(threshold_db=-24, ratio=1.5, attack_ms=30, release_ms=400),
+                Limiter(threshold_db=-3.0, release_ms=250)
+            ]
+        ),
+
+        "classical_concert": MasteringPreset(
+            "Classical Concert",
+            "Concert hall ambience and dynamics",
+            [
+                HighpassFilter(cutoff_frequency_hz=25),
+                PeakFilter(cutoff_frequency_hz=200, gain_db=0.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=1000, gain_db=0.3, q=0.8),
+                PeakFilter(cutoff_frequency_hz=6000, gain_db=0.8, q=0.8),
+                HighShelfFilter(cutoff_frequency_hz=12000, gain_db=0.5, q=0.7),
+                Compressor(threshold_db=-30, ratio=1.2, attack_ms=50, release_ms=500),
+                Limiter(threshold_db=-4.0, release_ms=300)
+            ]
+        ),
+
+        # Ambient/Atmospheric Presets
+        "ambient_spacious": MasteringPreset(
+            "Ambient Spacious",
+            "Wide, spacious ambient master",
+            [
+                HighpassFilter(cutoff_frequency_hz=25),
+                LowShelfFilter(cutoff_frequency_hz=100, gain_db=0.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=500, gain_db=-0.5, q=0.8),
+                PeakFilter(cutoff_frequency_hz=3000, gain_db=0.5, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=8000, gain_db=1.5, q=0.7),
+                Compressor(threshold_db=-20, ratio=2.0, attack_ms=50, release_ms=400),
+                Limiter(threshold_db=-2.0, release_ms=200)
+            ]
+        ),
+
+        "cinematic_epic": MasteringPreset(
+            "Cinematic Epic",
+            "Big, powerful cinematic sound",
+            [
+                HighpassFilter(cutoff_frequency_hz=30),
+                LowShelfFilter(cutoff_frequency_hz=100, gain_db=2.0, q=0.7),
+                PeakFilter(cutoff_frequency_hz=250, gain_db=1.0, q=0.8),
+                PeakFilter(cutoff_frequency_hz=2000, gain_db=1.5, q=1.0),
+                PeakFilter(cutoff_frequency_hz=6000, gain_db=2.0, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=2.5, q=0.7),
+                Compressor(threshold_db=-14, ratio=3.0, attack_ms=10, release_ms=150),
+                Limiter(threshold_db=-0.5, release_ms=100)
+            ]
+        ),
+
+        # Vintage/Lo-Fi Presets
+        "lofi_chill": MasteringPreset(
+            "Lo-Fi Chill",
+            "Vintage lo-fi character",
+            [
+                HighpassFilter(cutoff_frequency_hz=50),
+                LowpassFilter(cutoff_frequency_hz=10000),
+                LowShelfFilter(cutoff_frequency_hz=150, gain_db=1.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=800, gain_db=-1.0, q=1.2),
+                PeakFilter(cutoff_frequency_hz=4000, gain_db=-1.5, q=1.0),
+                Compressor(threshold_db=-12, ratio=3.0, attack_ms=15, release_ms=180),
+                Limiter(threshold_db=-1.0, release_ms=120)
+            ]
+        ),
+
+        "vintage_vinyl": MasteringPreset(
+            "Vintage Vinyl",
+            "Classic vinyl record warmth",
+            [
+                HighpassFilter(cutoff_frequency_hz=40),
+                LowpassFilter(cutoff_frequency_hz=12000),
+                LowShelfFilter(cutoff_frequency_hz=120, gain_db=2.0, q=0.7),
+                PeakFilter(cutoff_frequency_hz=1000, gain_db=-0.5, q=0.8),
+                PeakFilter(cutoff_frequency_hz=5000, gain_db=-1.0, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=8000, gain_db=-1.5, q=0.8),
+                Compressor(threshold_db=-16, ratio=2.5, attack_ms=20, release_ms=200),
+                Limiter(threshold_db=-1.5, release_ms=150)
+            ]
+        ),
+
+        "retro_80s": MasteringPreset(
+            "Retro 80s",
+            "80s digital warmth and punch",
+            [
+                HighpassFilter(cutoff_frequency_hz=35),
+                LowShelfFilter(cutoff_frequency_hz=100, gain_db=1.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=800, gain_db=1.0, q=1.0),
+                PeakFilter(cutoff_frequency_hz=3000, gain_db=2.0, q=1.2),
+                PeakFilter(cutoff_frequency_hz=8000, gain_db=1.5, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.0, q=0.8),
+                Compressor(threshold_db=-10, ratio=4.0, attack_ms=5, release_ms=100),
+                Limiter(threshold_db=-0.5, release_ms=80)
+            ]
+        ),
+
+        # Specialized Presets
+        "vocal_focused": MasteringPreset(
+            "Vocal Focused",
+            "Emphasizes vocal clarity and presence",
+            [
+                HighpassFilter(cutoff_frequency_hz=30),
+                PeakFilter(cutoff_frequency_hz=200, gain_db=-1.0, q=0.8),
+                PeakFilter(cutoff_frequency_hz=1000, gain_db=1.0, q=1.0),
+                PeakFilter(cutoff_frequency_hz=3000, gain_db=2.5, q=1.2),
+                PeakFilter(cutoff_frequency_hz=5000, gain_db=1.5, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.0, q=0.7),
+                Compressor(threshold_db=-12, ratio=3.0, attack_ms=5, release_ms=100),
+                Limiter(threshold_db=-0.5, release_ms=80)
+            ]
+        ),
+
+        "bass_heavy": MasteringPreset(
+            "Bass Heavy",
+            "Maximum low-end power",
+            [
+                HighpassFilter(cutoff_frequency_hz=20),
+                LowShelfFilter(cutoff_frequency_hz=60, gain_db=4.0, q=0.7),
+                PeakFilter(cutoff_frequency_hz=100, gain_db=2.5, q=0.8),
+                PeakFilter(cutoff_frequency_hz=500, gain_db=-1.5, q=1.0),
+                PeakFilter(cutoff_frequency_hz=4000, gain_db=1.0, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.5, q=0.7),
+                Compressor(threshold_db=-10, ratio=4.0, attack_ms=10, release_ms=100),
+                Limiter(threshold_db=-0.3, release_ms=60)
+            ]
+        ),
+
+        "bright_airy": MasteringPreset(
+            "Bright & Airy",
+            "Crystal clear highs with airiness",
+            [
+                HighpassFilter(cutoff_frequency_hz=30),
+                LowShelfFilter(cutoff_frequency_hz=100, gain_db=-0.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=500, gain_db=-1.0, q=0.8),
+                PeakFilter(cutoff_frequency_hz=5000, gain_db=2.0, q=1.0),
+                PeakFilter(cutoff_frequency_hz=10000, gain_db=2.5, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=12000, gain_db=3.0, q=0.7),
+                Compressor(threshold_db=-14, ratio=2.5, attack_ms=8, release_ms=120),
+                Limiter(threshold_db=-0.8, release_ms=100)
+            ]
+        ),
+
+        "midrange_punch": MasteringPreset(
+            "Midrange Punch",
+            "Powerful mids for presence",
+            [
+                HighpassFilter(cutoff_frequency_hz=30),
+                LowShelfFilter(cutoff_frequency_hz=100, gain_db=0.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=500, gain_db=2.0, q=1.0),
+                PeakFilter(cutoff_frequency_hz=1500, gain_db=2.5, q=1.2),
+                PeakFilter(cutoff_frequency_hz=3000, gain_db=2.0, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=8000, gain_db=0.5, q=0.7),
+                Compressor(threshold_db=-11, ratio=3.5, attack_ms=5, release_ms=90),
+                Limiter(threshold_db=-0.5, release_ms=70)
+            ]
+        ),
+
+        "dynamic_range": MasteringPreset(
+            "Dynamic Range",
+            "Preserves maximum dynamics",
+            [
+                HighpassFilter(cutoff_frequency_hz=25),
+                PeakFilter(cutoff_frequency_hz=100, gain_db=-0.5, q=0.7),
+                PeakFilter(cutoff_frequency_hz=3000, gain_db=0.5, q=0.8),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.0, q=0.7),
+                Compressor(threshold_db=-20, ratio=1.5, attack_ms=20, release_ms=250),
+                Limiter(threshold_db=-2.0, release_ms=200)
+            ]
+        ),
+
+        "streaming_optimized": MasteringPreset(
+            "Streaming Optimized",
+            "Optimized for streaming platforms (Spotify, Apple Music)",
+            [
+                HighpassFilter(cutoff_frequency_hz=30),
+                LowShelfFilter(cutoff_frequency_hz=100, gain_db=1.0, q=0.7),
+                PeakFilter(cutoff_frequency_hz=500, gain_db=0.5, q=0.8),
+                PeakFilter(cutoff_frequency_hz=3000, gain_db=1.5, q=1.0),
+                HighShelfFilter(cutoff_frequency_hz=10000, gain_db=1.5, q=0.7),
+                Compressor(threshold_db=-14, ratio=3.0, attack_ms=5, release_ms=100),
+                Limiter(threshold_db=-1.0, release_ms=100)
+            ]
+        )
+    }
+
+    def __init__(self):
+        """Initialize mastering service"""
+        logger.info(f"Mastering service initialized with {len(self.PRESETS)} presets")
+
+    def apply_preset(self, audio_path: str, preset_name: str, output_path: str) -> str:
+        """
+        Apply a mastering preset to an audio file
+
+        Args:
+            audio_path: Path to input audio file
+            preset_name: Name of preset to apply
+            output_path: Path to save processed audio
+
+        Returns:
+            Path to processed audio file
+        """
+        try:
+            if preset_name not in self.PRESETS:
+                raise ValueError(f"Unknown preset: {preset_name}")
+
+            preset = self.PRESETS[preset_name]
+            logger.info(f"Applying preset '{preset.name}' to {audio_path}")
+
+            # Load audio
+            audio, sr = sf.read(audio_path)
+
+            # Ensure stereo
+            if len(audio.shape) == 1:
+                audio = np.stack([audio, audio], axis=1)
+
+            # Create pedalboard with preset chain
+            board = Pedalboard(preset.chain)
+
+            # Process audio (transposed to channels-first)
+            processed = board(audio.T, sr)
+
+            # Save processed audio
+            sf.write(output_path, processed.T, sr)
+            logger.info(f"Saved mastered audio to {output_path}")
+
+            return output_path
+
+        except Exception as e:
+            logger.error(f"Error applying preset: {str(e)}", exc_info=True)
+            raise
+
+    def apply_custom_eq(
+        self,
+        audio_path: str,
+        output_path: str,
+        eq_bands: List[Dict],
+        compression: Optional[Dict] = None,
+        limiting: Optional[Dict] = None
+    ) -> str:
+        """
+        Apply custom EQ settings to an audio file
+
+        Args:
+            audio_path: Path to input audio file
+            output_path: Path to save processed audio
+            eq_bands: List of EQ band settings
+            compression: Compression settings (optional)
+            limiting: Limiter settings (optional)
+
+        Returns:
+            Path to processed audio file
+        """
+        try:
+            logger.info(f"Applying custom EQ to {audio_path}")
+
+            # Load audio
+            audio, sr = sf.read(audio_path)
+
+            # Ensure stereo
+            if len(audio.shape) == 1:
+                audio = np.stack([audio, audio], axis=1)
+
+            # Build processing chain
+            chain = []
+
+            # Add EQ bands
+            for band in eq_bands:
+                band_type = band.get('type', 'peak')
+                freq = band.get('frequency', 1000)
+                gain = band.get('gain', 0)
+                q = band.get('q', 1.0)
+
+                if band_type == 'highpass':
+                    chain.append(HighpassFilter(cutoff_frequency_hz=freq))
+                elif band_type == 'lowpass':
+                    chain.append(LowpassFilter(cutoff_frequency_hz=freq))
+                elif band_type == 'lowshelf':
+                    chain.append(LowShelfFilter(cutoff_frequency_hz=freq, gain_db=gain, q=q))
+                elif band_type == 'highshelf':
+                    chain.append(HighShelfFilter(cutoff_frequency_hz=freq, gain_db=gain, q=q))
+                else:  # peak
+                    chain.append(PeakFilter(cutoff_frequency_hz=freq, gain_db=gain, q=q))
+
+            # Add compression if specified
+            if compression:
+                chain.append(Compressor(
+                    threshold_db=compression.get('threshold', -12),
+                    ratio=compression.get('ratio', 2.0),
+                    attack_ms=compression.get('attack', 5),
+                    release_ms=compression.get('release', 100)
+                ))
+
+            # Add limiting if specified
+            if limiting:
+                chain.append(Limiter(
+                    threshold_db=limiting.get('threshold', -1.0),
+                    release_ms=limiting.get('release', 100)
+                ))
+
+            # Create and apply pedalboard
+            board = Pedalboard(chain)
+            processed = board(audio.T, sr)
+
+            # Save processed audio
+            sf.write(output_path, processed.T, sr)
+            logger.info(f"Saved custom EQ audio to {output_path}")
+
+            return output_path
+
+        except Exception as e:
+            logger.error(f"Error applying custom EQ: {str(e)}", exc_info=True)
+            raise
+
+    def get_preset_list(self) -> List[Dict]:
+        """Get list of available presets with descriptions"""
+        return [
+            {
+                'id': key,
+                'name': preset.name,
+                'description': preset.description
+            }
+            for key, preset in self.PRESETS.items()
+        ]
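A sketch of both mastering entry points. The preset ids are the `PRESETS` dict keys above; the file paths are placeholders:

```python
from services.mastering_service import MasteringService

mastering = MasteringService()

# Apply a built-in preset
mastering.apply_preset("clip.wav", "streaming_optimized", "clip_mastered.wav")

# Or build a custom chain: EQ bands plus optional compression and limiting
mastering.apply_custom_eq(
    "clip.wav",
    "clip_custom.wav",
    eq_bands=[
        {"type": "highpass", "frequency": 30},
        {"type": "peak", "frequency": 3000, "gain": 1.5, "q": 1.0},
        {"type": "highshelf", "frequency": 10000, "gain": 2.0, "q": 0.7},
    ],
    compression={"threshold": -14, "ratio": 3.0, "attack": 5, "release": 100},
    limiting={"threshold": -1.0, "release": 100},
)

# List presets, e.g. for a UI dropdown
for p in mastering.get_preset_list():
    print(p["id"], "-", p["description"])
```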
backend/services/style_consistency_service.py ADDED
@@ -0,0 +1,340 @@
1
+ """
2
+ Style Consistency Service
3
+ Uses audio feature extraction and style embeddings to ensure consistent generation
4
+ """
5
+ import os
6
+ import logging
7
+ import numpy as np
8
+ import librosa
9
+ import soundfile as sf
10
+ from pathlib import Path
11
+ from typing import List, Optional, Dict, Tuple
12
+ import torch
13
+
14
+ logger = logging.getLogger(__name__)
15
+
16
+ class StyleConsistencyService:
17
+ """
18
+ Ensures style consistency across generated clips by analyzing existing audio
19
+ and providing style guidance for new generations
20
+ """
21
+
22
+ def __init__(self):
23
+ self.sample_rate = 44100
24
+ logger.info("Style Consistency Service initialized")
25
+
26
+ def extract_audio_features(self, audio_path: str) -> Dict[str, np.ndarray]:
27
+ """
28
+ Extract comprehensive audio features for style analysis
29
+
30
+ Args:
31
+ audio_path: Path to audio file
32
+
33
+ Returns:
34
+ Dictionary of extracted features
35
+ """
36
+ try:
37
+ # Load audio
38
+ audio, sr = librosa.load(audio_path, sr=self.sample_rate)
39
+
40
+ # Extract features
41
+ features = {}
42
+
43
+ # Spectral features
44
+ features['mel_spectrogram'] = librosa.feature.melspectrogram(
45
+ y=audio, sr=sr, n_mels=128, n_fft=2048, hop_length=512
46
+ )
47
+ features['spectral_centroid'] = librosa.feature.spectral_centroid(
48
+ y=audio, sr=sr, n_fft=2048, hop_length=512
49
+ )
50
+ features['spectral_bandwidth'] = librosa.feature.spectral_bandwidth(
51
+ y=audio, sr=sr, n_fft=2048, hop_length=512
52
+ )
53
+ features['spectral_contrast'] = librosa.feature.spectral_contrast(
54
+ y=audio, sr=sr, n_fft=2048, hop_length=512, n_bands=6
55
+ )
56
+ features['spectral_rolloff'] = librosa.feature.spectral_rolloff(
57
+ y=audio, sr=sr, n_fft=2048, hop_length=512
58
+ )
59
+
60
+ # Temporal features
61
+ features['zero_crossing_rate'] = librosa.feature.zero_crossing_rate(
62
+ audio, frame_length=2048, hop_length=512
63
+ )
64
+ features['rms'] = librosa.feature.rms(
65
+ y=audio, frame_length=2048, hop_length=512
66
+ )
67
+
68
+ # Harmonic/percussive
69
+ harmonic, percussive = librosa.effects.hpss(audio)
70
+ features['harmonic_ratio'] = np.mean(np.abs(harmonic)) / (np.mean(np.abs(audio)) + 1e-10)
71
+ features['percussive_ratio'] = np.mean(np.abs(percussive)) / (np.mean(np.abs(audio)) + 1e-10)
72
+
73
+ # Chroma features
74
+ features['chroma'] = librosa.feature.chroma_stft(
75
+ y=audio, sr=sr, n_chroma=12, n_fft=2048, hop_length=512
76
+ )
77
+
78
+ # MFCC
79
+ features['mfcc'] = librosa.feature.mfcc(
80
+ y=audio, sr=sr, n_mfcc=20
81
+ )
82
+
83
+ # Tempo and rhythm
84
+ tempo, beats = librosa.beat.beat_track(y=audio, sr=sr)
85
+ features['tempo'] = tempo
86
+ features['beat_frames'] = beats
87
+
88
+ logger.info(f"Extracted features from {audio_path}")
89
+ return features
90
+
91
+ except Exception as e:
92
+ logger.error(f"Failed to extract features from {audio_path}: {e}")
93
+ return {}
94
+
95
+ def compute_style_statistics(self, features: Dict[str, np.ndarray]) -> Dict[str, float]:
96
+ """
97
+ Compute statistical summaries of audio features for style matching
98
+
99
+ Args:
100
+ features: Dictionary of extracted features
101
+
102
+ Returns:
103
+ Dictionary of style statistics
104
+ """
105
+ stats = {}
106
+
107
+ # Compute mean/std for spectral features
108
+ for key in ['spectral_centroid', 'spectral_bandwidth', 'spectral_rolloff',
109
+ 'zero_crossing_rate', 'rms']:
110
+ if key in features:
111
+ stats[f'{key}_mean'] = float(np.mean(features[key]))
112
+ stats[f'{key}_std'] = float(np.std(features[key]))
113
+
114
+ # Spectral contrast summary
115
+ if 'spectral_contrast' in features:
116
+ stats['spectral_contrast_mean'] = float(np.mean(features['spectral_contrast']))
117
+ stats['spectral_contrast_std'] = float(np.std(features['spectral_contrast']))
118
+
119
+ # Harmonic/percussive balance
120
+ stats['harmonic_ratio'] = float(features.get('harmonic_ratio', 0.5))
121
+ stats['percussive_ratio'] = float(features.get('percussive_ratio', 0.5))
122
+
123
+ # Tempo
124
+ stats['tempo'] = float(features.get('tempo', 120.0))
125
+
126
+ # Chroma energy distribution
127
+ if 'chroma' in features:
128
+ chroma_mean = np.mean(features['chroma'], axis=1)
129
+ stats['chroma_energy'] = chroma_mean.tolist()
130
+
131
+ # MFCC summary (timbre)
132
+ if 'mfcc' in features:
133
+ mfcc_mean = np.mean(features['mfcc'], axis=1)
134
+ stats['timbre_signature'] = mfcc_mean[:13].tolist() # First 13 MFCCs
135
+
136
+ return stats
137
+
138
+ def analyze_timeline_style(self, clip_paths: List[str]) -> Dict[str, any]:
139
+ """
140
+ Analyze style characteristics of all clips on timeline
141
+
142
+ Args:
143
+             clip_paths: List of audio file paths from timeline
+
+         Returns:
+             Aggregate style profile
+         """
+         if not clip_paths:
+             return {}
+
+         all_features = []
+         all_stats = []
+
+         for path in clip_paths:
+             if os.path.exists(path):
+                 features = self.extract_audio_features(path)
+                 if features:
+                     stats = self.compute_style_statistics(features)
+                     all_features.append(features)
+                     all_stats.append(stats)
+
+         if not all_stats:
+             return {}
+
+         # Aggregate statistics across all clips
+         aggregate_style = {}
+
+         # Average numerical features
+         numeric_keys = [k for k in all_stats[0].keys() if isinstance(all_stats[0][k], (int, float))]
+         for key in numeric_keys:
+             values = [stats[key] for stats in all_stats if key in stats]
+             aggregate_style[key] = float(np.mean(values))
+
+         # Average chroma and timbre vectors element-wise
+         if 'chroma_energy' in all_stats[0]:
+             chroma_arrays = [np.array(stats['chroma_energy']) for stats in all_stats if 'chroma_energy' in stats]
+             if chroma_arrays:
+                 aggregate_style['chroma_energy'] = np.mean(chroma_arrays, axis=0).tolist()
+
+         if 'timbre_signature' in all_stats[0]:
+             timbre_arrays = [np.array(stats['timbre_signature']) for stats in all_stats if 'timbre_signature' in stats]
+             if timbre_arrays:
+                 aggregate_style['timbre_signature'] = np.mean(timbre_arrays, axis=0).tolist()
+
+         logger.info(f"Analyzed style from {len(clip_paths)} clips")
+         return aggregate_style
+
+     def create_style_reference_audio(self, clip_paths: List[str], output_path: str) -> str:
+         """
+         Mix all timeline clips into a single reference audio for style guidance
+
+         Args:
+             clip_paths: List of audio file paths
+             output_path: Where to save the reference audio
+
+         Returns:
+             Path to the created reference audio
+         """
+         if not clip_paths:
+             raise ValueError("No clips provided for style reference")
+
+         try:
+             # Load all clips and find the longest one
+             clips_audio = []
+             max_length = 0
+
+             for path in clip_paths:
+                 if os.path.exists(path):
+                     audio, _ = librosa.load(path, sr=self.sample_rate)
+                     clips_audio.append(audio)
+                     max_length = max(max_length, len(audio))
+
+             if not clips_audio:
+                 raise ValueError("No valid audio files found")
+
+             # Pad all clips to the same length
+             padded_clips = []
+             for audio in clips_audio:
+                 if len(audio) < max_length:
+                     audio = np.pad(audio, (0, max_length - len(audio)))
+                 padded_clips.append(audio)
+
+             # Mix clips by averaging them
+             mixed_audio = np.mean(padded_clips, axis=0)
+
+             # Normalize to avoid clipping
+             mixed_audio = librosa.util.normalize(mixed_audio)
+
+             # Save reference audio
+             os.makedirs(os.path.dirname(output_path), exist_ok=True)
+             sf.write(output_path, mixed_audio, self.sample_rate)
+
+             logger.info(f"Created style reference audio: {output_path}")
+             return output_path
+
+         except Exception as e:
+             logger.error(f"Failed to create style reference: {e}")
+             raise
+
+     def enhance_prompt_with_style(
+         self,
+         base_prompt: str,
+         style_profile: Dict[str, Any]
+     ) -> str:
+         """
+         Enhance the generation prompt with style characteristics
+
+         Args:
+             base_prompt: User's original prompt
+             style_profile: Style analysis from the timeline
+
+         Returns:
+             Enhanced prompt
+         """
+         if not style_profile:
+             return base_prompt
+
+         style_descriptors = []
+
+         # Tempo descriptor
+         tempo = style_profile.get('tempo', 120)
+         if tempo < 90:
+             style_descriptors.append("slow tempo")
+         elif tempo > 140:
+             style_descriptors.append("fast tempo")
+
+         # Energy/dynamics descriptor
+         rms_mean = style_profile.get('rms_mean', 0.1)
+         if rms_mean > 0.15:
+             style_descriptors.append("energetic")
+         elif rms_mean < 0.08:
+             style_descriptors.append("gentle")
+
+         # Harmonic/percussive balance
+         harmonic_ratio = style_profile.get('harmonic_ratio', 0.5)
+         percussive_ratio = style_profile.get('percussive_ratio', 0.5)
+
+         if harmonic_ratio > percussive_ratio * 1.3:
+             style_descriptors.append("melodic")
+         elif percussive_ratio > harmonic_ratio * 1.3:
+             style_descriptors.append("rhythmic")
+
+         # Spectral brightness
+         centroid_mean = style_profile.get('spectral_centroid_mean', 2000)
+         if centroid_mean > 3000:
+             style_descriptors.append("bright")
+         elif centroid_mean < 1500:
+             style_descriptors.append("warm")
+
+         # Combine with the base prompt
+         if style_descriptors:
+             enhanced = f"{base_prompt}, consistent with existing style: {', '.join(style_descriptors)}"
+             logger.info(f"Enhanced prompt: {enhanced}")
+             return enhanced
+
+         return base_prompt
+
+     def get_style_guidance_for_generation(
+         self,
+         timeline_clips: List[Dict]
+     ) -> Tuple[Optional[str], Dict[str, Any]]:
+         """
+         Prepare style guidance for a new generation
+
+         Args:
+             timeline_clips: List of clip dictionaries from the timeline
+
+         Returns:
+             Tuple of (reference_audio_path, style_profile)
+         """
+         if not timeline_clips:
+             logger.info("No existing clips - no style guidance available")
+             return None, {}
+
+         # Get audio paths from clips
+         clip_paths = []
+         for clip in timeline_clips:
+             audio_path = clip.get('music_path') or clip.get('mixed_path') or clip.get('file_path')
+             if audio_path and os.path.exists(audio_path):
+                 clip_paths.append(audio_path)
+
+         if not clip_paths:
+             return None, {}
+
+         # Analyze timeline style
+         style_profile = self.analyze_timeline_style(clip_paths)
+
+         # Create reference audio (mix of all clips)
+         try:
+             ref_dir = os.path.join('outputs', 'style_reference')
+             os.makedirs(ref_dir, exist_ok=True)
+             ref_path = os.path.join(ref_dir, 'timeline_reference.wav')
+
+             reference_audio = self.create_style_reference_audio(clip_paths, ref_path)
+             logger.info(f"Style guidance ready: {len(clip_paths)} clips analyzed")
+             return reference_audio, style_profile
+
+         except Exception as e:
+             logger.error(f"Failed to create reference audio: {e}")
+             return None, style_profile
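Usage sketch (illustrative, not part of this commit): how the style-guidance flow above fits together. The class and module names below are assumptions, since the class definition sits above this hunk.

# Hypothetical usage; 'StyleAnalysisService' and its module path are assumed names.
from services.style_service import StyleAnalysisService

service = StyleAnalysisService()
timeline_clips = [
    {'clip_id': 'clip_1', 'music_path': 'outputs/music/clip_1.wav'},
    {'clip_id': 'clip_2', 'music_path': 'outputs/music/clip_2.wav'},
]

# Mix the existing clips into a reference track and compute an aggregate profile
ref_path, profile = service.get_style_guidance_for_generation(timeline_clips)

# Fold the profile into the next prompt, producing something like
# "dreamy synth outro, consistent with existing style: slow tempo, warm"
next_prompt = service.enhance_prompt_with_style("dreamy synth outro", profile)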
backend/services/timeline_service.py ADDED
@@ -0,0 +1,186 @@
+ """
+ Timeline management service
+ """
+ import logging
+ from typing import List, Dict, Optional
+ from models.schemas import ClipPosition, TimelineClip
+
+ logger = logging.getLogger(__name__)
+
+ class TimelineService:
+     """Service for managing timeline clips"""
+
+     def __init__(self):
+         """Initialize timeline service"""
+         self.clips: List[TimelineClip] = []
+         logger.info("Timeline service initialized")
+
+     def add_clip(
+         self,
+         clip_id: str,
+         file_path: str,
+         duration: float,
+         position: ClipPosition
+     ) -> Dict:
+         """
+         Add a clip to the timeline
+
+         Args:
+             clip_id: Unique clip identifier
+             file_path: Path to audio file
+             duration: Clip duration in seconds
+             position: Where to place the clip
+
+         Returns:
+             Clip information with timeline position
+         """
+         try:
+             # Calculate timeline position based on the requested position
+             if position == ClipPosition.INTRO:
+                 timeline_position = 0
+                 start_time = 0.0
+                 # Shift all existing clips to the right
+                 for clip in self.clips:
+                     clip.timeline_position += 1
+                     clip.start_time += duration
+
+             elif position == ClipPosition.PREVIOUS:
+                 if len(self.clips) == 0:
+                     timeline_position = 0
+                     start_time = 0.0
+                 else:
+                     timeline_position = len(self.clips) - 1
+                     start_time = self.clips[-1].start_time
+                     # Shift the last clip to make room
+                     self.clips[-1].timeline_position += 1
+                     self.clips[-1].start_time += duration
+
+             elif position == ClipPosition.NEXT:
+                 timeline_position = len(self.clips)
+                 start_time = self.get_total_duration()
+
+             else:  # OUTRO
+                 timeline_position = len(self.clips)
+                 start_time = self.get_total_duration()
+
+             # Create clip
+             clip = TimelineClip(
+                 clip_id=clip_id,
+                 file_path=file_path,
+                 duration=duration,
+                 timeline_position=timeline_position,
+                 start_time=start_time,
+                 music_path=file_path  # Store as music_path for consistent access
+             )
+
+             # Insert clip at the correct position
+             self.clips.insert(timeline_position, clip)
+
+             logger.info(f"Clip added: {clip_id} at position {timeline_position}")
+
+             return {
+                 'clip_id': clip_id,
+                 'timeline_position': timeline_position,
+                 'start_time': start_time,
+                 'duration': duration
+             }
+
+         except Exception as e:
+             logger.error(f"Failed to add clip: {str(e)}", exc_info=True)
+             raise
+
+     def remove_clip(self, clip_id: str):
+         """
+         Remove a clip from the timeline
+
+         Args:
+             clip_id: Clip to remove
+         """
+         try:
+             # Find and remove clip
+             clip_index = None
+             for i, clip in enumerate(self.clips):
+                 if clip.clip_id == clip_id:
+                     clip_index = i
+                     break
+
+             if clip_index is None:
+                 raise ValueError(f"Clip not found: {clip_id}")
+
+             self.clips.pop(clip_index)
+
+             # Recalculate positions
+             self._recalculate_positions()
+
+             logger.info(f"Clip removed: {clip_id}")
+
+         except Exception as e:
+             logger.error(f"Failed to remove clip: {str(e)}", exc_info=True)
+             raise
+
+     def reorder_clips(self, clip_ids: List[str]):
+         """
+         Reorder clips on the timeline
+
+         Args:
+             clip_ids: New order of clip IDs
+         """
+         try:
+             # Validate that all clip IDs exist
+             existing_ids = {clip.clip_id for clip in self.clips}
+             requested_ids = set(clip_ids)
+
+             if existing_ids != requested_ids:
+                 raise ValueError("Clip IDs don't match existing clips")
+
+             # Create new order
+             clip_dict = {clip.clip_id: clip for clip in self.clips}
+             self.clips = [clip_dict[cid] for cid in clip_ids]
+
+             # Recalculate positions
+             self._recalculate_positions()
+
+             logger.info("Clips reordered")
+
+         except Exception as e:
+             logger.error(f"Failed to reorder clips: {str(e)}", exc_info=True)
+             raise
+
+     def get_all_clips(self) -> List[Dict]:
+         """Get all clips with their information"""
+         return [
+             {
+                 'clip_id': clip.clip_id,
+                 'file_path': clip.file_path,
+                 'duration': clip.duration,
+                 'timeline_position': clip.timeline_position,
+                 'start_time': clip.start_time
+             }
+             for clip in self.clips
+         ]
+
+     def get_clip(self, clip_id: str) -> Optional[TimelineClip]:
+         """Get a specific clip"""
+         for clip in self.clips:
+             if clip.clip_id == clip_id:
+                 return clip
+         return None
+
+     def get_total_duration(self) -> float:
+         """Get total duration of all clips"""
+         if not self.clips:
+             return 0.0
+         return sum(clip.duration for clip in self.clips)
+
+     def clear(self):
+         """Clear all clips from the timeline"""
+         self.clips = []
+         logger.info("Timeline cleared")
+
+     def _recalculate_positions(self):
+         """Recalculate all clip positions and start times"""
+         current_time = 0.0
+         for i, clip in enumerate(self.clips):
+             clip.timeline_position = i
+             clip.start_time = current_time
+             current_time += clip.duration
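Usage sketch (illustrative, not part of this commit) of the positioning rules above: INTRO shifts every existing clip to the right, NEXT and OUTRO append at the end, and reordering recalculates start times.

from services.timeline_service import TimelineService
from models.schemas import ClipPosition

timeline = TimelineService()
timeline.add_clip('verse', 'outputs/music/verse.wav', 30.0, ClipPosition.NEXT)    # starts at 0.0
timeline.add_clip('chorus', 'outputs/music/chorus.wav', 20.0, ClipPosition.NEXT)  # starts at 30.0
timeline.add_clip('intro', 'outputs/music/intro.wav', 10.0, ClipPosition.INTRO)   # shifts the others by 10 s

assert timeline.get_total_duration() == 60.0
timeline.reorder_clips(['intro', 'chorus', 'verse'])  # start times are recalculated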
backend/start_with_env.py ADDED
@@ -0,0 +1,21 @@
+ """
+ Wrapper script to start the backend with required environment variables.
+ This is used by the PowerShell launcher to ensure environment variables are set.
+ """
+ import os
+ import runpy
+ from pathlib import Path
+
+ # Get project root (parent of the backend directory)
+ project_root = Path(__file__).parent.parent
+
+ # Point phonemizer at the bundled espeak-ng build
+ os.environ['PHONEMIZER_ESPEAK_LIBRARY'] = str(project_root / 'external' / 'espeak-ng' / 'libespeak-ng.dll')
+ os.environ['PHONEMIZER_ESPEAK_PATH'] = str(project_root / 'external' / 'espeak-ng')
+
+ # Locate the backend entry point
+ backend_script = project_root / 'backend' / 'run.py'
+
+ # Execute run.py in the same interpreter; run_name='__main__' ensures its
+ # `if __name__ == '__main__'` block still fires
+ runpy.run_path(str(backend_script), run_name='__main__')
backend/utils/__init__.py ADDED
@@ -0,0 +1,5 @@
+ """Utilities package"""
+ from .logger import setup_logger
+ from .validators import validate_generation_params, validate_clip_data
+
+ __all__ = ['setup_logger', 'validate_generation_params', 'validate_clip_data']
backend/utils/amd_gpu.py ADDED
@@ -0,0 +1,96 @@
+ """
+ AMD GPU Detection and Configuration
+ Enables DirectML support for AMD GPUs (including Vega 8)
+ Note: DirectML may not be compatible with Python 3.13+ - CPU fallback will be used
+ """
+ import os
+ import logging
+ import torch
+
+ logger = logging.getLogger(__name__)
+
+ def setup_amd_gpu():
+     """
+     Configure DirectML for AMD GPU support
+     Returns the device to use for model inference
+     """
+     try:
+         # Check if torch-directml is available
+         try:
+             import torch_directml
+
+             # Get DirectML device
+             if torch_directml.is_available():
+                 device = torch_directml.device()
+                 logger.info("βœ… AMD GPU detected via DirectML")
+                 logger.info(f"Device: {device}")
+
+                 # Set default device
+                 torch.set_default_device(device)
+
+                 return device
+             else:
+                 logger.warning("DirectML available but no compatible GPU found")
+                 return torch.device("cpu")
+         except ImportError:
+             logger.warning("torch-directml not available (may not support Python 3.13+)")
+             logger.info("Using CPU mode - consider Python 3.11 for DirectML support")
+             return torch.device("cpu")
+
+     except Exception as e:
+         logger.error(f"Error setting up AMD GPU: {str(e)}")
+         return torch.device("cpu")
+
+ def get_device_info():
+     """Get detailed information about available compute devices"""
+     info = {
+         "device": "cpu",
+         "device_name": "CPU",
+         "directml_available": False,
+         "cuda_available": torch.cuda.is_available()
+     }
+
+     try:
+         try:
+             import torch_directml
+
+             if torch_directml.is_available():
+                 info["directml_available"] = True
+                 info["device"] = "directml"
+                 info["device_name"] = "AMD GPU (DirectML)"
+
+                 # Get device handle
+                 device = torch_directml.device()
+                 info["device_object"] = device
+         except ImportError:
+             logger.info("DirectML not available - Python 3.13+ may not support torch-directml")
+             logger.info("For AMD GPU support, consider using Python 3.11 with torch-directml")
+
+     except Exception as e:
+         logger.error(f"Error getting device info: {str(e)}")
+
+     return info
+
+ def optimize_for_amd():
+     """Apply optimizations for AMD GPU inference"""
+     try:
+         # Disable CUDA if present (prefer DirectML for AMD)
+         os.environ["CUDA_VISIBLE_DEVICES"] = ""
+
+         # Set DirectML memory management
+         os.environ["PYTORCH_DIRECTML_FORCE_FP32_OPS"] = "0"  # Allow FP16
+
+         # Enable TensorFloat-32 (a CUDA/cuDNN feature; a harmless no-op when CUDA is disabled)
+         torch.backends.cudnn.allow_tf32 = True
+         torch.backends.cuda.matmul.allow_tf32 = True
+
+         logger.info("βœ… AMD GPU optimizations applied")
+
+     except Exception as e:
+         logger.error(f"Error applying AMD optimizations: {str(e)}")
+
+ # Auto-configure on module import
+ DEFAULT_DEVICE = setup_amd_gpu()
+ optimize_for_amd()
+
+ logger.info(f"Default compute device: {DEFAULT_DEVICE}")
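Because the module configures a device as a side effect of being imported, downstream code only needs DEFAULT_DEVICE; a hedged sketch (assumes backend/ is on sys.path):

import torch
from utils.amd_gpu import DEFAULT_DEVICE, get_device_info  # import runs setup_amd_gpu()

info = get_device_info()
print(info['device_name'])  # "AMD GPU (DirectML)" or "CPU"

# New tensors (and models moved with .to()) follow the detected device
x = torch.randn(1, 16, device=DEFAULT_DEVICE)
print(x.device)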
backend/utils/logger.py ADDED
@@ -0,0 +1,57 @@
+ """
+ Logging configuration
+ """
+ import logging
+ import os
+ from logging.handlers import RotatingFileHandler
+
+ def setup_logger(app):
+     """
+     Configure application logging
+
+     Args:
+         app: Flask application instance
+     """
+     # Create logs directory
+     log_dir = 'logs'
+     os.makedirs(log_dir, exist_ok=True)
+
+     # Set log level
+     log_level = getattr(logging, app.config.get('LOG_LEVEL', 'INFO'))
+
+     # Configure root logger
+     logging.basicConfig(
+         level=log_level,
+         format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
+         datefmt='%Y-%m-%d %H:%M:%S'
+     )
+
+     # File handler with rotation
+     log_file = app.config.get('LOG_FILE', os.path.join(log_dir, 'app.log'))
+     file_handler = RotatingFileHandler(
+         log_file,
+         maxBytes=10 * 1024 * 1024,  # 10 MB
+         backupCount=5
+     )
+     file_handler.setLevel(log_level)
+     file_handler.setFormatter(logging.Formatter(
+         '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+     ))
+
+     # Console handler
+     console_handler = logging.StreamHandler()
+     console_handler.setLevel(log_level)
+     console_handler.setFormatter(logging.Formatter(
+         '%(asctime)s - %(levelname)s - %(message)s'
+     ))
+
+     # Add handlers to the app logger
+     app.logger.addHandler(file_handler)
+     app.logger.addHandler(console_handler)
+     app.logger.setLevel(log_level)
+
+     # Set library log levels
+     logging.getLogger('werkzeug').setLevel(logging.WARNING)
+     logging.getLogger('urllib3').setLevel(logging.WARNING)
+
+     app.logger.info("Logging configured successfully")
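setup_logger expects a Flask app and reads the optional LOG_LEVEL and LOG_FILE config keys; a minimal wiring sketch (the app factory itself is not part of this hunk):

from flask import Flask
from utils.logger import setup_logger

app = Flask(__name__)
app.config['LOG_LEVEL'] = 'DEBUG'        # falls back to INFO when unset
app.config['LOG_FILE'] = 'logs/app.log'  # rotated at 10 MB, 5 backups kept

setup_logger(app)
app.logger.info("Backend starting")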
backend/utils/prompt_analyzer.py ADDED
@@ -0,0 +1,291 @@
+ """
+ Prompt analysis utility for extracting music attributes
+ Analyzes user prompts to extract genre, style, BPM, mood, and other musical attributes
+ """
+ import re
+ import logging
+ from typing import Dict, Optional, List, Any
+
+ logger = logging.getLogger(__name__)
+
+ class PromptAnalyzer:
+     """Analyzes music prompts to extract musical attributes"""
+
+     # Genre/style keywords
+     GENRES = {
+         'pop': ['pop', 'mainstream', 'catchy', 'radio-friendly'],
+         'rock': ['rock', 'guitar', 'electric', 'distortion', 'power chords'],
+         'hip-hop': ['hip-hop', 'rap', 'trap', 'beats', 'rhymes', 'flow'],
+         'electronic': ['edm', 'electronic', 'synth', 'techno', 'house', 'trance'],
+         'jazz': ['jazz', 'swing', 'bebop', 'saxophone', 'improvisation'],
+         'classical': ['classical', 'orchestra', 'symphony', 'piano', 'strings'],
+         'country': ['country', 'folk', 'acoustic', 'banjo', 'bluegrass'],
+         'r&b': ['r&b', 'soul', 'rnb', 'rhythm and blues', 'groove'],
+         'metal': ['metal', 'heavy', 'headbanging', 'aggressive', 'brutal'],
+         'indie': ['indie', 'alternative', 'underground', 'experimental'],
+         'reggae': ['reggae', 'ska', 'dub', 'jamaican', 'offbeat'],
+         'blues': ['blues', 'twelve bar', 'soulful', 'melancholic']
+     }
+
+     # BPM keywords and ranges
+     BPM_KEYWORDS = {
+         'slow': (60, 80),
+         'ballad': (60, 80),
+         'moderate': (80, 120),
+         'medium': (90, 110),
+         'upbeat': (120, 140),
+         'fast': (140, 180),
+         'energetic': (130, 150),
+         'intense': (150, 180)
+     }
+
+     # Mood/emotion keywords
+     MOODS = {
+         'happy': ['happy', 'joyful', 'cheerful', 'uplifting', 'bright'],
+         'sad': ['sad', 'melancholic', 'sorrowful', 'emotional', 'tearful'],
+         'energetic': ['energetic', 'powerful', 'dynamic', 'intense', 'vigorous'],
+         'calm': ['calm', 'peaceful', 'relaxing', 'soothing', 'tranquil'],
+         'dark': ['dark', 'ominous', 'mysterious', 'sinister', 'haunting'],
+         'romantic': ['romantic', 'love', 'passionate', 'tender', 'intimate'],
+         'angry': ['angry', 'aggressive', 'fierce', 'furious', 'rage'],
+         'nostalgic': ['nostalgic', 'reminiscent', 'wistful', 'longing']
+     }
+
+     # Instrument keywords
+     INSTRUMENTS = [
+         'guitar', 'piano', 'drums', 'bass', 'synth', 'violin', 'saxophone',
+         'trumpet', 'flute', 'organ', 'keyboard', 'strings', 'brass', 'percussion'
+     ]
+
+     @classmethod
+     def analyze(cls, prompt: str) -> Dict[str, Any]:
+         """
+         Analyze a music prompt to extract attributes
+
+         Args:
+             prompt: User's music description
+
+         Returns:
+             Dictionary containing:
+             - genre: Detected genre(s)
+             - bpm: Estimated BPM or range
+             - mood: Detected mood(s)
+             - instruments: Mentioned instruments
+             - style_tags: Additional style descriptors
+             - analysis_text: Formatted analysis for AI models
+         """
+         if not prompt:
+             return cls._get_default_analysis()
+
+         prompt_lower = prompt.lower()
+
+         # Detect genre
+         detected_genres = cls._detect_genres(prompt_lower)
+
+         # Detect BPM
+         bpm_info = cls._detect_bpm(prompt_lower)
+
+         # Detect mood
+         detected_moods = cls._detect_moods(prompt_lower)
+
+         # Detect instruments
+         detected_instruments = cls._detect_instruments(prompt_lower)
+
+         # Extract additional style tags
+         style_tags = cls._extract_style_tags(prompt_lower)
+
+         # Build structured analysis
+         analysis = {
+             'genre': detected_genres[0] if detected_genres else 'pop',
+             'genres': detected_genres,
+             'bpm': bpm_info['bpm'],
+             'bpm_range': bpm_info['range'],
+             'mood': detected_moods[0] if detected_moods else 'neutral',
+             'moods': detected_moods,
+             'instruments': detected_instruments,
+             'style_tags': style_tags,
+             'has_vocals': cls._should_have_vocals(prompt_lower),
+             'analysis_text': cls._format_analysis_text(
+                 detected_genres, bpm_info, detected_moods, detected_instruments
+             )
+         }
+
+         logger.info(f"Prompt analysis: genre={analysis['genre']}, bpm={analysis['bpm']}, mood={analysis['mood']}")
+
+         return analysis
+
+     @classmethod
+     def _detect_genres(cls, prompt: str) -> List[str]:
+         """Detect genres from prompt"""
+         detected = []
+         for genre, keywords in cls.GENRES.items():
+             if any(keyword in prompt for keyword in keywords):
+                 detected.append(genre)
+         return detected[:3]  # At most 3 genres
+
+     @classmethod
+     def _detect_bpm(cls, prompt: str) -> Dict[str, Any]:
+         """Detect BPM or BPM range from prompt"""
+         # Check for an explicit BPM number, e.g. "140 bpm"
+         bpm_match = re.search(r'\b(\d{2,3})\s*bpm\b', prompt)
+         if bpm_match:
+             bpm_value = int(bpm_match.group(1))
+             return {
+                 'bpm': bpm_value,
+                 'range': (bpm_value - 5, bpm_value + 5)
+             }
+
+         # Check for BPM keywords
+         for keyword, (min_bpm, max_bpm) in cls.BPM_KEYWORDS.items():
+             if keyword in prompt:
+                 return {
+                     'bpm': (min_bpm + max_bpm) // 2,
+                     'range': (min_bpm, max_bpm)
+                 }
+
+         # Default: moderate tempo
+         return {'bpm': 120, 'range': (100, 140)}
+
+     @classmethod
+     def _detect_moods(cls, prompt: str) -> List[str]:
+         """Detect moods from prompt"""
+         detected = []
+         for mood, keywords in cls.MOODS.items():
+             if any(keyword in prompt for keyword in keywords):
+                 detected.append(mood)
+         return detected[:2]  # At most 2 moods
+
+     @classmethod
+     def _detect_instruments(cls, prompt: str) -> List[str]:
+         """Detect mentioned instruments"""
+         detected = []
+         for instrument in cls.INSTRUMENTS:
+             if instrument in prompt:
+                 detected.append(instrument)
+         return detected
+
+     @classmethod
+     def _extract_style_tags(cls, prompt: str) -> List[str]:
+         """Extract additional style descriptors"""
+         tags = []
+         style_keywords = [
+             'vintage', 'modern', 'retro', 'futuristic', 'minimal', 'complex',
+             'acoustic', 'electric', 'orchestral', 'ambient', 'rhythmic',
+             'melodic', 'harmonic', 'atmospheric', 'driving', 'groovy'
+         ]
+
+         for tag in style_keywords:
+             if tag in prompt:
+                 tags.append(tag)
+
+         return tags
+
+     @classmethod
+     def _should_have_vocals(cls, prompt: str) -> bool:
+         """Determine if music should have vocals"""
+         vocal_keywords = ['vocal', 'singing', 'voice', 'lyrics', 'song', 'sung']
+         instrumental_keywords = ['instrumental', 'no vocals', 'no voice', 'without vocals']
+
+         has_vocal_mention = any(keyword in prompt for keyword in vocal_keywords)
+         has_instrumental_mention = any(keyword in prompt for keyword in instrumental_keywords)
+
+         # Default to vocals unless explicitly instrumental
+         if has_instrumental_mention:
+             return False
+
+         return True  # Default to vocals
+
+     @classmethod
+     def _format_analysis_text(
+         cls,
+         genres: List[str],
+         bpm_info: Dict,
+         moods: List[str],
+         instruments: List[str]
+     ) -> str:
+         """Format analysis into text for AI model context"""
+         parts = []
+
+         if genres:
+             parts.append(f"Genre: {', '.join(genres)}")
+
+         if bpm_info.get('bpm'):
+             parts.append(f"BPM: {bpm_info['bpm']}")
+
+         if moods:
+             parts.append(f"Mood: {', '.join(moods)}")
+
+         if instruments:
+             parts.append(f"Instruments: {', '.join(instruments)}")
+
+         return '; '.join(parts) if parts else "General music"
+
+     @classmethod
+     def _get_default_analysis(cls) -> Dict[str, Any]:
+         """Return default analysis when the prompt is empty"""
+         return {
+             'genre': 'pop',
+             'genres': ['pop'],
+             'bpm': 120,
+             'bpm_range': (100, 140),
+             'mood': 'neutral',
+             'moods': [],
+             'instruments': [],
+             'style_tags': [],
+             'has_vocals': True,
+             'analysis_text': 'General pop music at moderate tempo'
+         }
+
+     @classmethod
+     def format_for_diffrhythm(cls, prompt: str, lyrics: Optional[str] = None, analysis: Optional[Dict] = None) -> str:
+         """
+         Format prompt for the DiffRhythm model
+
+         Args:
+             prompt: Original user prompt
+             lyrics: Optional lyrics
+             analysis: Optional pre-computed analysis
+
+         Returns:
+             Formatted prompt for DiffRhythm
+         """
+         if analysis is None:
+             analysis = cls.analyze(prompt)
+
+         parts = [prompt]
+
+         # Add analysis context
+         if analysis.get('analysis_text'):
+             parts.append(f"[{analysis['analysis_text']}]")
+
+         # Add lyrics if provided
+         if lyrics:
+             parts.append(f"Lyrics: {lyrics}")
+
+         return ' '.join(parts)
+
+     @classmethod
+     def format_for_lyrics_generation(cls, prompt: str, analysis: Optional[Dict] = None) -> str:
+         """
+         Format prompt for lyrics generation
+
+         Args:
+             prompt: Original user prompt
+             analysis: Optional pre-computed analysis
+
+         Returns:
+             Formatted prompt for LyricMind
+         """
+         if analysis is None:
+             analysis = cls.analyze(prompt)
+
+         genre = analysis.get('genre', 'pop')
+         mood = analysis.get('mood', 'neutral')
+
+         formatted = f"Write {genre} song lyrics with a {mood} mood about: {prompt}"
+
+         # Add additional context
+         if analysis.get('style_tags'):
+             formatted += f" (Style: {', '.join(analysis['style_tags'][:2])})"
+
+         return formatted
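Example of what analyze() extracts from a typical prompt; the expected values follow directly from the keyword tables above:

from utils.prompt_analyzer import PromptAnalyzer

analysis = PromptAnalyzer.analyze("energetic rock song with electric guitar at 140 BPM")
print(analysis['genre'])       # 'rock' (matched via 'rock', 'guitar', 'electric')
print(analysis['bpm'])         # 140 (an explicit "140 BPM" overrides keyword ranges)
print(analysis['mood'])        # 'energetic'
print(analysis['has_vocals'])  # True (no instrumental keywords present)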
backend/utils/validators.py ADDED
@@ -0,0 +1,64 @@
+ """
+ Request validation utilities
+ """
+ from typing import Dict, Optional
+
+ def validate_generation_params(data: Dict) -> Optional[str]:
+     """
+     Validate music generation parameters
+
+     Args:
+         data: Request data dictionary
+
+     Returns:
+         Error message if validation fails, None otherwise
+     """
+     if not data:
+         return "Request body is required"
+
+     if 'prompt' not in data:
+         return "Missing required field: prompt"
+
+     if not data['prompt'] or not data['prompt'].strip():
+         return "Prompt cannot be empty"
+
+     if 'duration' in data:
+         duration = data['duration']
+         if not isinstance(duration, (int, float)):
+             return "Duration must be a number"
+         if duration < 10 or duration > 120:
+             return "Duration must be between 10 and 120 seconds"
+
+     if 'use_vocals' in data:
+         if not isinstance(data['use_vocals'], bool):
+             return "use_vocals must be a boolean"
+
+         if data['use_vocals'] and not data.get('lyrics'):
+             return "Lyrics are required when use_vocals is true"
+
+     return None
+
+ def validate_clip_data(data: Dict) -> Optional[str]:
+     """
+     Validate timeline clip data
+
+     Args:
+         data: Clip data dictionary
+
+     Returns:
+         Error message if validation fails, None otherwise
+     """
+     required_fields = ['clip_id', 'file_path', 'duration', 'position']
+
+     for field in required_fields:
+         if field not in data:
+             return f"Missing required field: {field}"
+
+     if not isinstance(data['duration'], (int, float)) or data['duration'] <= 0:
+         return "Duration must be a positive number"
+
+     valid_positions = ['intro', 'previous', 'next', 'outro']
+     if data['position'] not in valid_positions:
+         return f"Invalid position. Must be one of: {', '.join(valid_positions)}"
+
+     return None
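The validators return an error string or None, so route handlers can short-circuit before doing any work; a hedged Flask sketch (the endpoint path is illustrative):

from flask import Flask, request, jsonify
from utils.validators import validate_generation_params

app = Flask(__name__)

@app.route('/api/generate', methods=['POST'])
def generate():
    error = validate_generation_params(request.get_json(silent=True))
    if error:
        return jsonify({'error': error}), 400
    return jsonify({'status': 'queued'})  # hand off to the generation service here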
hf_config.py ADDED
@@ -0,0 +1,30 @@
+ """
+ Configuration for HuggingFace Spaces deployment
+ Handles espeak-ng and model paths for the cloud environment
+ """
+ import os
+ from pathlib import Path
+
+ # Detect if running on HuggingFace Spaces
+ IS_SPACES = os.getenv("SPACE_ID") is not None
+
+ # Configure espeak-ng for HuggingFace Spaces
+ if IS_SPACES:
+     # On Spaces, espeak-ng is installed via packages.txt
+     # and is available system-wide
+     if os.path.exists("/usr/bin/espeak-ng"):
+         os.environ["PHONEMIZER_ESPEAK_PATH"] = "/usr/bin/espeak-ng"
+     if os.path.exists("/usr/lib/x86_64-linux-gnu/libespeak-ng.so"):
+         os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = "/usr/lib/x86_64-linux-gnu/libespeak-ng.so"
+     elif os.path.exists("/usr/lib/libespeak-ng.so"):
+         os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = "/usr/lib/libespeak-ng.so"
+ else:
+     # Local development - use the bundled espeak-ng at <repo root>/external/espeak-ng
+     espeak_path = Path(__file__).parent / "external" / "espeak-ng"
+     if espeak_path.exists():
+         os.environ["PHONEMIZER_ESPEAK_LIBRARY"] = str(espeak_path / "libespeak-ng.dll")
+         os.environ["PHONEMIZER_ESPEAK_PATH"] = str(espeak_path)
+
+ print(f"πŸ”§ Environment: {'HuggingFace Spaces' if IS_SPACES else 'Local'}")
+ print(f"πŸ”Š PHONEMIZER_ESPEAK_PATH: {os.getenv('PHONEMIZER_ESPEAK_PATH', 'Not set')}")
+ print(f"πŸ“š PHONEMIZER_ESPEAK_LIBRARY: {os.getenv('PHONEMIZER_ESPEAK_LIBRARY', 'Not set')}")
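Because hf_config sets the PHONEMIZER_ESPEAK_* variables as an import side effect, it has to be imported before phonemizer is loaded; a sketch of the intended ordering (assuming app.py does this near the top, which is not shown in this hunk):

import hf_config                  # must run first: sets PHONEMIZER_ESPEAK_*
from phonemizer import phonemize  # now picks up the configured espeak-ng

print(phonemize("hello world", language="en-us"))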
packages.txt ADDED
@@ -0,0 +1,3 @@
+ espeak-ng
+ ffmpeg
+ libsndfile1
pre_startup.sh ADDED
@@ -0,0 +1,42 @@
+ #!/bin/bash
+
+ # Pre-startup script for HuggingFace Spaces
+ # This runs before the main application
+
+ echo "πŸš€ Initializing Music Generation Studio..."
+
+ # Verify espeak-ng installation
+ if command -v espeak-ng &> /dev/null; then
+     echo "βœ… espeak-ng is installed"
+     espeak-ng --version
+ else
+     echo "❌ espeak-ng not found"
+     exit 1
+ fi
+
+ # Verify ffmpeg
+ if command -v ffmpeg &> /dev/null; then
+     echo "βœ… ffmpeg is installed"
+     ffmpeg -version | head -1
+ else
+     echo "❌ ffmpeg not found"
+ fi
+
+ # Create necessary directories
+ mkdir -p outputs/music
+ mkdir -p outputs/mixed
+ mkdir -p models
+ mkdir -p logs
+
+ echo "βœ… Directories created"
+
+ # Check Python version
+ python --version
+
+ # Verify key dependencies
+ echo "πŸ“¦ Verifying Python packages..."
+ python -c "import torch; print(f'βœ… PyTorch {torch.__version__}')" || echo "❌ PyTorch not found"
+ python -c "import gradio; print(f'βœ… Gradio {gradio.__version__}')" || echo "❌ Gradio not found"
+ python -c "import phonemizer; print('βœ… phonemizer OK')" || echo "❌ phonemizer not found"
+
+ echo "βœ… Pre-startup checks complete"
requirements.txt ADDED
@@ -0,0 +1,46 @@
+ # Core dependencies for HuggingFace Spaces deployment
+ gradio==4.44.0
+ numpy>=1.24.0,<2.0.0
+ scipy>=1.10.0
+ librosa>=0.10.0
+ soundfile>=0.12.0
+ pydantic>=2.0.0
+ pyyaml>=6.0
+
+ # PyTorch - CPU mode for HuggingFace Spaces
+ torch>=2.4.0,<2.5.0
+ torchaudio>=2.4.0,<2.5.0
+
+ # DiffRhythm2 dependencies
+ torchdiffeq>=0.2.4
+ phonemizer>=3.2.0
+ muq>=0.1.0
+ jieba>=0.42.0
+ pypinyin>=0.50.0
+ cn2an>=0.5.0
+ onnxruntime>=1.15.0
+ pykakasi>=2.3.0
+ unidecode>=1.3.0
+ py3langid>=0.2.2
+
+ # AI Model dependencies
+ transformers==4.47.1
+ diffusers>=0.21.0
+ sentencepiece>=0.1.99
+ protobuf>=3.20.0,<5.0.0
+ accelerate>=0.20.0
+ einops>=0.7.0
+ omegaconf>=2.3.0
+
+ # Audio processing
+ pedalboard>=0.7.0
+ pydub>=0.25.1
+ resampy>=0.4.2
+
+ # Utilities
+ tqdm>=4.65.0
+ huggingface-hub>=0.17.0
+ safetensors>=0.3.0
+
+ # System dependencies note:
+ # espeak-ng is required by phonemizer and should be installed via packages.txt
setup_diffrhythm2_src.sh ADDED
@@ -0,0 +1,24 @@
+ #!/bin/bash
+ # Setup script for HuggingFace Spaces
+ # Clones DiffRhythm2 source code if not present
+
+ set -e
+
+ echo "πŸ”§ Setting up DiffRhythm2 source code..."
+
+ MODELS_DIR="models"
+ DR2_SRC_DIR="$MODELS_DIR/diffrhythm2_source"
+
+ # Create models directory
+ mkdir -p "$MODELS_DIR"
+
+ # Check if DiffRhythm2 source exists
+ if [ ! -d "$DR2_SRC_DIR" ]; then
+     echo "πŸ“₯ Cloning DiffRhythm2 source repository..."
+     git clone https://github.com/ASLP-lab/DiffRhythm2.git "$DR2_SRC_DIR"
+     echo "βœ… DiffRhythm2 source cloned"
+ else
+     echo "βœ… DiffRhythm2 source already exists"
+ fi
+
+ echo "βœ… Setup complete"
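Once cloned, the checkout presumably has to be made importable by the application; a speculative sketch of the path wiring (the actual import mechanism is not shown in this commit):

import sys
from pathlib import Path

# Match the directory layout created by the script above
dr2_src = Path('models') / 'diffrhythm2_source'
if dr2_src.exists():
    sys.path.insert(0, str(dr2_src))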