HuggingFace Collection Integration - Complete

🎯 Overview

Full integration with HuggingFace Collection for LEMM LoRAs and datasets, including automatic syncing, import/export, and name conflict resolution.

✅ Implemented Features

1. Dataset Import (import_prepared_dataset)

  • Location: backend/services/dataset_service.py
  • Purpose: Import prepared datasets from ZIP files
  • Features:
    • Supports both root-level and subfolder dataset_info.json structures
    • Automatic name conflict resolution with numeric suffixes (_1, _2, etc.)
    • Validates dataset structure before import
    • Updates metadata with new dataset key if renamed
```python
# Example usage in app.py
def import_dataset(zip_file):
    dataset_service = DatasetService()
    dataset_key = dataset_service.import_prepared_dataset(zip_file)
    return f"✅ Imported dataset: {dataset_key}"
```

2. LoRA Collection Sync (sync_on_startup)

  • Location: backend/services/hf_storage_service.py
  • Purpose: Automatically download missing LoRAs from HF collection on app startup
  • Features:
    • Lists all LoRAs in collection
    • Compares with local LoRA directory
    • Downloads only missing LoRAs
    • Handles name conflicts with numeric suffixes
    • Logs sync activity
```python
# Called automatically on app startup (app.py line 82)
hf_storage = HFStorageService(username="Gamahea", collection_slug="lemm-100-pre-beta")
sync_result = hf_storage.sync_on_startup(loras_dir=Path("models/loras"))
```

3. Enhanced LoRA Upload

  • Location: app.py - start_lora_training() function
  • Purpose: Upload trained LoRAs to HF collection with full metadata
  • Features:
    • Uploads LoRA to individual model repo
    • Adds to collection automatically
    • Includes training config in metadata
    • Returns repo URL and collection link
    • Graceful error handling (saves locally if upload fails)
```python
# Upload after training (app.py lines 1397-1411)
upload_result = hf_storage.upload_lora(lora_dir, training_config=config)
if upload_result and 'repo_id' in upload_result:
    # Success - show URLs
    progress += "\n✅ LoRA uploaded successfully!"
    progress += f"\n🔗 Model: {upload_result['repo_id']}"
    progress += "\n📚 Collection: https://huggingface.co/collections/Gamahea/lemm-100-pre-beta"
```

📦 Name Conflict Resolution

All import functions implement automatic name conflict resolution:

  1. First Check: Try original name
  2. If Exists: Append _1, _2, _3, etc.
  3. Update Metadata: Store new name in dataset_info.json or metadata.json
  4. Log Action: Inform user of renaming

Example Flow

Original: my_dataset
Already exists → my_dataset_1
Already exists → my_dataset_2
Available → Use my_dataset_2 ✅
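
The suffixing rule above can be sketched as a small helper (a minimal sketch; `resolve_name_conflict` is an illustrative name, not necessarily the function used in LEMM):

```python
from pathlib import Path

def resolve_name_conflict(base_name: str, parent_dir: Path) -> str:
    """Return base_name if unused; otherwise append _1, _2, ... until free."""
    candidate = base_name
    suffix = 1
    while (parent_dir / candidate).exists():
        candidate = f"{base_name}_{suffix}"
        suffix += 1
    return candidate
```

The same helper works for both dataset and LoRA imports, since both store each item as a directory under a known parent.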

🔄 Automatic Workflows

On App Startup

  1. Check HF collection for LoRAs
  2. Compare with local models/loras/ directory
  3. Download any missing LoRAs
  4. Log sync results
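
A minimal sketch of this sync loop, with the HF API calls abstracted away (`collection_names` and `download_fn` are stand-ins, not the real `HFStorageService` internals):

```python
from pathlib import Path

def sync_missing_loras(collection_names, loras_dir: Path, download_fn):
    """Download LoRAs that are in the collection but missing locally.

    Returns the list of names that were fetched, for logging.
    """
    loras_dir.mkdir(parents=True, exist_ok=True)
    local = {p.name for p in loras_dir.iterdir() if p.is_dir()}
    missing = [name for name in collection_names if name not in local]
    for name in missing:
        download_fn(name, loras_dir / name)
    return missing
```

Downloading only the missing entries keeps startup fast when most LoRAs are already cached locally.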

After LoRA Training

  1. Train LoRA adapter locally
  2. Upload to HF as individual model repo
  3. Add to collection
  4. Return URLs for viewing

Dataset Import

  1. User uploads ZIP file
  2. Extract and validate structure
  3. Check for name conflicts
  4. Copy to training_data/ directory
  5. Update dropdown lists
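
The import steps above might look roughly like this (an illustrative sketch, not the actual `import_prepared_dataset` implementation; it handles both ZIP layouts and the numeric-suffix renaming):

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

def import_dataset_zip(zip_path: Path, training_data: Path) -> str:
    """Extract a dataset ZIP, resolve name conflicts, and copy it into
    training_data/. Accepts dataset_info.json at the root or one level down."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        # Locate the folder that contains dataset_info.json.
        if (tmp / "dataset_info.json").exists():
            root = tmp
        else:
            root = next(
                (s for s in tmp.iterdir()
                 if s.is_dir() and (s / "dataset_info.json").exists()),
                None,
            )
        if root is None:
            raise ValueError("No dataset_info.json found in archive")
        # Resolve name conflicts with numeric suffixes.
        base = zip_path.stem if root == tmp else root.name
        name, suffix = base, 1
        while (training_data / name).exists():
            name = f"{base}_{suffix}"
            suffix += 1
        shutil.copytree(root, training_data / name)
    return name
```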

🛠️ Technical Details

File Structure Support

LoRA ZIP Files (both supported):

Option 1 (root):
  my_lora.zip/
    ├── metadata.json
    ├── adapter_config.json
    └── adapter_model.safetensors

Option 2 (subfolder):
  my_lora.zip/
    └── my_lora/
        ├── metadata.json
        ├── adapter_config.json
        └── adapter_model.safetensors

Dataset ZIP Files (both supported):

Option 1 (root):
  my_dataset.zip/
    ├── dataset_info.json
    ├── audio/
    │   ├── sample_000001.wav
    │   └── sample_000002.wav
    └── splits.json

Option 2 (subfolder):
  my_dataset.zip/
    └── my_dataset/
        ├── dataset_info.json
        ├── audio/
        └── splits.json

Error Handling

All import/sync functions include:

  • Try-catch blocks for graceful error handling
  • Comprehensive logging with context
  • User-friendly error messages
  • Fallback behavior (e.g., save locally if upload fails)
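
The fallback pattern can be sketched as follows (`upload_with_fallback` and `upload_fn` are hypothetical names; the real code wraps the actual HF upload call inside `start_lora_training`):

```python
import logging

logger = logging.getLogger("lemm.hf_storage")

def upload_with_fallback(upload_fn, lora_name: str) -> str:
    """Attempt an HF upload; on failure, log the error and report a
    local-only result instead of failing the whole training run."""
    try:
        repo_id = upload_fn()
        return f"✅ Uploaded to {repo_id}"
    except Exception as exc:
        logger.warning("Upload of %s failed, keeping local copy: %s",
                       lora_name, exc)
        return f"⚠️ Upload failed; {lora_name} saved locally only"
```

Catching the exception at this boundary means a network or auth failure degrades gracefully: the trained LoRA stays usable locally and the user sees a clear message.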

📊 HuggingFace Collection Structure

Collection: Gamahea/lemm-100-pre-beta

  • Purpose: Organize all LEMM LoRA adapters
  • Visibility: Public
  • Items: Individual model repos

Model Repos: Gamahea/lemm-lora-{name}

  • Type: LoRA adapters (safetensors)
  • Metadata: Training config, dataset info, creation date
  • Files: adapter_model.safetensors, adapter_config.json, metadata.json
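
For illustration, writing such a metadata file might look like this (the field names and values here are assumptions, not the exact LEMM schema):

```python
import json
from pathlib import Path

# Illustrative metadata.json payload; fields are assumed, not LEMM's schema.
METADATA = {
    "name": "my_lora",
    "training_config": {"rank": 16, "learning_rate": 1e-4, "epochs": 10},
    "dataset": "my_dataset",
    "created": "2024-01-01T00:00:00Z",
}

def write_metadata(lora_dir: Path) -> Path:
    """Serialize the metadata dict next to the adapter files."""
    path = lora_dir / "metadata.json"
    path.write_text(json.dumps(METADATA, indent=2))
    return path
```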

🎯 User Workflows

Train & Share a LoRA

  1. Prepare dataset (curated or user audio)
  2. Configure training parameters
  3. Click "Start Training"
  4. Wait for completion
  5. LoRA automatically uploaded to HF collection
  6. Share collection link with others

Use Someone's LoRA

  1. Open LEMM Space
  2. App automatically syncs LoRAs from collection
  3. Select LoRA in generation dropdown
  4. Generate music with custom style

Import a Dataset

  1. Export dataset from another LEMM instance
  2. Click "Import Dataset" in training tab
  3. Upload ZIP file
  4. Dataset appears in training dropdown
  5. Use for LoRA training

🔗 Related Files

📝 Commit History

  • 17f5813 (latest): Add dataset import & LoRA collection sync

    • import_prepared_dataset() method
    • sync_on_startup() method
    • Enhanced upload_lora() with training_config
    • Numeric suffix naming for conflicts
  • f65e448: Fixed LoRA import to support both ZIP structures

  • 2f0c8b4: Added "Load for Training" workflow

  • b40ee5f: Fixed DataFrame handling in dataset preparation

🎉 Result

Complete HuggingFace ecosystem integration!

  • ✅ Auto-sync LoRAs from collection
  • ✅ Upload trained LoRAs to collection
  • ✅ Import/export datasets
  • ✅ Name conflict resolution
  • ✅ Comprehensive error handling
  • ✅ User-friendly feedback

All three issues from the screenshots are now resolved! 🚀