HuggingFace Collection Integration - Complete

🎯 Overview

Full integration with HuggingFace Collection for LEMM LoRAs and datasets, including automatic syncing, import/export, and name conflict resolution.

✅ Implemented Features

1. Dataset Import (import_prepared_dataset)

  • Location: backend/services/dataset_service.py
  • Purpose: Import prepared datasets from ZIP files
  • Features:
    • Supports both root-level and subfolder dataset_info.json structures
    • Automatic name conflict resolution with numeric suffixes (_1, _2, etc.)
    • Validates dataset structure before import
    • Updates metadata with new dataset key if renamed
```python
# Example usage in app.py
def import_dataset(zip_file):
    dataset_service = DatasetService()
    dataset_key = dataset_service.import_prepared_dataset(zip_file)
    return f"✅ Imported dataset: {dataset_key}"
```

2. LoRA Collection Sync (sync_on_startup)

  • Location: backend/services/hf_storage_service.py
  • Purpose: Automatically download missing LoRAs from HF collection on app startup
  • Features:
    • Lists all LoRAs in collection
    • Compares with local LoRA directory
    • Downloads only missing LoRAs
    • Handles name conflicts with numeric suffixes
    • Logs sync activity
```python
# Called automatically on app startup (app.py line 82)
hf_storage = HFStorageService(username="Gamahea", collection_slug="lemm-100-pre-beta")
sync_result = hf_storage.sync_on_startup(loras_dir=Path("models/loras"))
```

3. Enhanced LoRA Upload

  • Location: app.py - start_lora_training() function
  • Purpose: Upload trained LoRAs to HF collection with full metadata
  • Features:
    • Uploads LoRA to individual model repo
    • Adds to collection automatically
    • Includes training config in metadata
    • Returns repo URL and collection link
    • Graceful error handling (saves locally if upload fails)
```python
# Upload after training (app.py lines 1397-1411)
upload_result = hf_storage.upload_lora(lora_dir, training_config=config)
if upload_result and 'repo_id' in upload_result:
    # Success - show URLs
    progress += "\n✅ LoRA uploaded successfully!"
    progress += f"\n🔗 Model: {upload_result['repo_id']}"
    progress += "\n📚 Collection: https://huggingface.co/collections/Gamahea/lemm-100-pre-beta"
```

📦 Name Conflict Resolution

All import functions implement automatic name conflict resolution:

  1. First Check: Try original name
  2. If Exists: Append _1, _2, _3, etc.
  3. Update Metadata: Store new name in dataset_info.json or metadata.json
  4. Log Action: Inform user of renaming

Example Flow

Original: my_dataset
Already exists → my_dataset_1
Already exists → my_dataset_2
Available → Use my_dataset_2 ✅
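
The suffixing rule above can be sketched as a small helper (a minimal sketch; `resolve_name_conflict` is an illustrative name, not necessarily the function used in LEMM):

```python
from pathlib import Path

def resolve_name_conflict(base_name: str, parent_dir: Path) -> str:
    """Return base_name if unused; otherwise append _1, _2, ... until free."""
    candidate = base_name
    suffix = 1
    while (parent_dir / candidate).exists():
        candidate = f"{base_name}_{suffix}"
        suffix += 1
    return candidate
```

The same helper works for both dataset and LoRA imports, since both store each item as a directory under a known parent.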

🔄 Automatic Workflows

On App Startup

  1. Check HF collection for LoRAs
  2. Compare with local models/loras/ directory
  3. Download any missing LoRAs
  4. Log sync results
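
A minimal sketch of this sync loop, with the HF API calls abstracted away (`collection_names` and `download_fn` are stand-ins, not the real `HFStorageService` internals):

```python
from pathlib import Path

def sync_missing_loras(collection_names, loras_dir: Path, download_fn):
    """Download LoRAs that are in the collection but missing locally.

    Returns the list of names that were fetched, for logging.
    """
    loras_dir.mkdir(parents=True, exist_ok=True)
    local = {p.name for p in loras_dir.iterdir() if p.is_dir()}
    missing = [name for name in collection_names if name not in local]
    for name in missing:
        download_fn(name, loras_dir / name)
    return missing
```

Downloading only the missing entries keeps startup fast when most LoRAs are already cached locally.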

After LoRA Training

  1. Train LoRA adapter locally
  2. Upload to HF as individual model repo
  3. Add to collection
  4. Return URLs for viewing

Dataset Import

  1. User uploads ZIP file
  2. Extract and validate structure
  3. Check for name conflicts
  4. Copy to training_data/ directory
  5. Update dropdown lists
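
The import steps above might look roughly like this (an illustrative sketch, not the actual `import_prepared_dataset` implementation; it handles both ZIP layouts and the numeric-suffix renaming):

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

def import_dataset_zip(zip_path: Path, training_data: Path) -> str:
    """Extract a dataset ZIP, resolve name conflicts, and copy it into
    training_data/. Accepts dataset_info.json at the root or one level down."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        # Locate the folder that contains dataset_info.json.
        if (tmp / "dataset_info.json").exists():
            root = tmp
        else:
            root = next(
                (s for s in tmp.iterdir()
                 if s.is_dir() and (s / "dataset_info.json").exists()),
                None,
            )
        if root is None:
            raise ValueError("No dataset_info.json found in archive")
        # Resolve name conflicts with numeric suffixes.
        base = zip_path.stem if root == tmp else root.name
        name, suffix = base, 1
        while (training_data / name).exists():
            name = f"{base}_{suffix}"
            suffix += 1
        shutil.copytree(root, training_data / name)
    return name
```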

🛠️ Technical Details

File Structure Support

LoRA ZIP Files (both supported):

Option 1 (root):
  my_lora.zip/
    ├── metadata.json
    ├── adapter_config.json
    └── adapter_model.safetensors

Option 2 (subfolder):
  my_lora.zip/
    └── my_lora/
        ├── metadata.json
        ├── adapter_config.json
        └── adapter_model.safetensors

Dataset ZIP Files (both supported):

Option 1 (root):
  my_dataset.zip/
    ├── dataset_info.json
    ├── audio/
    │   ├── sample_000001.wav
    │   └── sample_000002.wav
    └── splits.json

Option 2 (subfolder):
  my_dataset.zip/
    └── my_dataset/
        ├── dataset_info.json
        ├── audio/
        └── splits.json

Error Handling

All import/sync functions include:

  • Try-catch blocks for graceful error handling
  • Comprehensive logging with context
  • User-friendly error messages
  • Fallback behavior (e.g., save locally if upload fails)
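
The fallback pattern can be sketched as follows (`upload_with_fallback` and `upload_fn` are hypothetical names; the real code wraps the actual HF upload call inside `start_lora_training`):

```python
import logging

logger = logging.getLogger("lemm.hf_storage")

def upload_with_fallback(upload_fn, lora_name: str) -> str:
    """Attempt an HF upload; on failure, log the error and report a
    local-only result instead of failing the whole training run."""
    try:
        repo_id = upload_fn()
        return f"✅ Uploaded to {repo_id}"
    except Exception as exc:
        logger.warning("Upload of %s failed, keeping local copy: %s",
                       lora_name, exc)
        return f"⚠️ Upload failed; {lora_name} saved locally only"
```

Catching the exception at this boundary means a network or auth failure degrades gracefully: the trained LoRA stays usable locally and the user sees a clear message.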

📊 HuggingFace Collection Structure

Collection: Gamahea/lemm-100-pre-beta

  • Purpose: Organize all LEMM LoRA adapters
  • Visibility: Public
  • Items: Individual model repos

Model Repos: Gamahea/lemm-lora-{name}

  • Type: LoRA adapters (safetensors)
  • Metadata: Training config, dataset info, creation date
  • Files: adapter_model.safetensors, adapter_config.json, metadata.json
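
For illustration, writing such a metadata file might look like this (the field names and values here are assumptions, not the exact LEMM schema):

```python
import json
from pathlib import Path

# Illustrative metadata.json payload; fields are assumed, not LEMM's schema.
METADATA = {
    "name": "my_lora",
    "training_config": {"rank": 16, "learning_rate": 1e-4, "epochs": 10},
    "dataset": "my_dataset",
    "created": "2024-01-01T00:00:00Z",
}

def write_metadata(lora_dir: Path) -> Path:
    """Serialize the metadata dict next to the adapter files."""
    path = lora_dir / "metadata.json"
    path.write_text(json.dumps(METADATA, indent=2))
    return path
```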

🎯 User Workflows

Train & Share a LoRA

  1. Prepare dataset (curated or user audio)
  2. Configure training parameters
  3. Click "Start Training"
  4. Wait for completion
  5. LoRA automatically uploaded to HF collection
  6. Share collection link with others

Use Someone's LoRA

  1. Open LEMM Space
  2. App automatically syncs LoRAs from collection
  3. Select LoRA in generation dropdown
  4. Generate music with custom style

Import a Dataset

  1. Export dataset from another LEMM instance
  2. Click "Import Dataset" in training tab
  3. Upload ZIP file
  4. Dataset appears in training dropdown
  5. Use for LoRA training

🔗 Related Files

📝 Commit History

  • 17f5813 (latest): Add dataset import & LoRA collection sync

    • import_prepared_dataset() method
    • sync_on_startup() method
    • Enhanced upload_lora() with training_config
    • Numeric suffix naming for conflicts
  • f65e448: Fixed LoRA import to support both ZIP structures

  • 2f0c8b4: Added "Load for Training" workflow

  • b40ee5f: Fixed DataFrame handling in dataset preparation

🎉 Result

Complete HuggingFace ecosystem integration!

  • ✅ Auto-sync LoRAs from collection
  • ✅ Upload trained LoRAs to collection
  • ✅ Import/export datasets
  • ✅ Name conflict resolution
  • ✅ Comprehensive error handling
  • ✅ User-friendly feedback

All three issues from the screenshots are now resolved! 🚀