Spaces: smartTranscend/Bert_two


smartTranscend committed on Nov 1

Commit fa3056f · verified · 1 Parent(s): 76f5d0d

Upload 3 files

Files changed (3)

bert_readme.txt +214 -0
bert_requirements.py +9 -0
bert_second_finetuning.py +1657 -0

bert_readme.txt ADDED

	@@ -0,0 +1,214 @@

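+# 🥼 BERT Breast Cancer Survival Prediction - Two-Stage Fine-tuning Platform
+A complete two-stage BERT fine-tuning system. It covers the full workflow from the first fine-tuning through the second, and can compare several models on new data.
+## 🌟 Core Features
+### 1️⃣ First Fine-tuning
+- Trains starting from plain (pretrained) BERT
+- Supports three fine-tuning methods:
+  - **Full Fine-tuning**: trains all parameters
+  - **LoRA**: low-rank adaptation, parameter-efficient
+  - **AdaLoRA**: adaptive LoRA that adjusts the rank dynamically
+- Automatically compares plain BERT with the first fine-tuned model
+### 2️⃣ Second Fine-tuning
+- Continues training from a first fine-tuned model
+- Uses new training data
+- Automatically inherits the fine-tuning method used in the first stage
+- Suited to incremental learning and domain adaptation
+### 3️⃣ Testing on New Data
+- Upload new test data
+- Compare up to 3 models at once:
+  - Plain BERT (Baseline)
+  - First fine-tuned model
+  - Second fine-tuned model
+- All evaluation metrics are shown side by side
+### 4️⃣ Model Prediction
+- Select any trained model
+- Enter clinical note text to get a prediction
+- Predictions from the un-fine-tuned and the fine-tuned model are shown together
+## 📋 Data Format
+The CSV file must contain the following columns:
+- **Text**: clinical note text (English)
+- **label**: label (0 = survived, 1 = died)
+Example: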
+```csv
+Text,label
+"Patient is a 45-year-old female with stage II breast cancer...",0
+"65-year-old woman diagnosed with triple-negative breast cancer...",1
+```
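A file can be checked against this format before it is uploaded. The sketch below uses only the standard library; the helper name `validate_rows` is hypothetical and is not part of the uploaded app, which reads the file with `pandas.read_csv`. It also assumes that no field contains an embedded newline, so that row numbers match file lines.

```python
import csv
import io

# Column names taken from the README's data-format section.
REQUIRED_COLUMNS = ("Text", "label")


def validate_rows(csv_text):
    """Return a list of problems found in the CSV text (empty list = OK)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing column: {c}" for c in missing]

    problems = []
    # Data rows start at line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if not (row["Text"] or "").strip():
            problems.append(f"line {line_no}: empty Text")
        if (row["label"] or "").strip() not in {"0", "1"}:
            problems.append(f"line {line_no}: label must be 0 or 1")
    return problems


sample = 'Text,label\n"Patient is a 45-year-old female...",0\n"Some note",2\n'
print(validate_rows(sample))
```

Returning a list of messages instead of raising on the first error lets a caller report every bad row in one pass.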
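+## 🚀 Workflow
+### Step 1: First Fine-tuning
+1. Open the 「1️⃣ 第一次微調」 (First Fine-tuning) page
+2. Upload training data A (CSV)
+3. Choose a fine-tuning method (starting with Full Fine-tuning is recommended)
+4. Adjust the training parameters:
+   - Weight multiplier: 0.8 (handles class imbalance)
+   - Epochs: 8-10
+   - Learning rate: 2e-5
+5. Click 「開始第一次微調」 (Start First Fine-tuning)
+6. Wait for training to finish and review the results
+### Step 2: Second Fine-tuning
+1. Open the 「2️⃣ 二次微調」 (Second Fine-tuning) page
+2. Click 「🔄 重新整理模型列表」 (Refresh Model List)
+3. Select the first fine-tuned model
+4. Upload the new training data B
+5. Adjust the training parameters (recommended):
+   - Epochs: 3-5 (fewer than in the first stage)
+   - Learning rate: 1e-5 (lower than in the first stage)
+6. Click 「開始二次微調」 (Start Second Fine-tuning)
+7. Wait for training to finish
+### Step 3: Testing on New Data
+1. Open the 「3️⃣ 新數據測試」 (New Data Test) page
+2. Upload test data C
+3. Select the models to compare:
+   - Plain BERT: choose 「評估純 BERT」 (Evaluate Plain BERT)
+   - First fine-tuning: choose from the dropdown
+   - Second fine-tuning: choose from the dropdown
+4. Click 「開始測試」 (Start Test)
+5. Review the comparison of the three models
+### Step 4: Prediction
+1. Open the 「4️⃣ 模型預測」 (Model Prediction) page
+2. Select the model to use
+3. Enter the clinical note text
+4. Click 「開始預測」 (Start Prediction)
+5. Review the prediction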
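+## 🎯 Fine-tuning Method Comparison
+| Method | Trainable parameters | Training speed | Memory use | Quality |
+|------|--------|---------|-----------|------|
+| **Full Fine-tuning** | 100% | 1x (baseline) | High | Best |
+| **LoRA** | ~1% | 3-5x faster | Low | Good |
+| **AdaLoRA** | ~1% | 3-5x faster | Low | Good |
+These figures are rough guidelines; actual values depend on the LoRA rank, the target modules, and the hardware.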
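The "~1%" figure depends on the rank and on which modules are adapted. The back-of-the-envelope count below assumes BERT-base dimensions (hidden size 768, 12 encoder layers), LoRA applied to the `query` and `value` projections (the script's default when no modules are given), and a total of roughly 110M parameters for `bert-base-uncased`. The function name is illustrative, and the count ignores the classification head, which is also trained.

```python
HIDDEN = 768                 # BERT-base hidden size
LAYERS = 12                  # BERT-base encoder layers
TOTAL_PARAMS = 110_000_000   # approximate size of bert-base-uncased (assumption)


def lora_param_count(rank, modules_per_layer=2, hidden=HIDDEN, layers=LAYERS):
    """Parameters added by LoRA.

    Each adapted hidden x hidden weight matrix gets two low-rank factors:
    A (rank x hidden) and B (hidden x rank).
    """
    per_matrix = rank * hidden + hidden * rank
    return per_matrix * modules_per_layer * layers


for r in (8, 16, 32):
    added = lora_param_count(r)
    share = added / TOTAL_PARAMS
    print(f"r={r}: {added:,} adapter parameters ({share:.2%} of the base model)")
```

Under these assumptions the adapter share stays below 1% even at rank 32, which is consistent with the order of magnitude given in the table.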
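+## 💡 Best Practices for Second Fine-tuning
+### When to use a second fine-tuning?
+1. **Domain adaptation**
+   - First stage: general medical data
+   - Second stage: data from a specific hospital or department
+2. **Incremental learning**
+   - First stage: historical data
+   - Second stage: newly collected data
+3. **Data scarcity**
+   - First stage: a large amount of data from a related domain
+   - Second stage: a small amount of target-domain data
+### Suggested parameter adjustments
+| Parameter | First fine-tuning | Second fine-tuning | Reason |
+|------|----------|----------|------|
+| **Epochs** | 8-10 | 3-5 | Avoid overfitting |
+| **Learning Rate** | 2e-5 | 1e-5 | Preserve what was already learned |
+| **Warmup Steps** | 200 | 100 | Less warmup is needed |
+| **Weight multiplier** | Tune to the data | Tune to the new data | Handle class imbalance |
+### Notes
+⚠️ **Important**:
+- The second fine-tuning automatically uses the first stage's method; it cannot be changed
+- Use a lower learning rate in the second stage than in the first to avoid catastrophic forgetting
+- If the second-stage data differs greatly from the first, more epochs may be needed
+- Always test on new data to confirm that performance has not degraded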
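The suggested adjustments can be captured as a pair of presets. This is a minimal sketch: `TrainConfig` and `second_stage_from` are hypothetical names, and the halving rule is an assumption chosen only because it reproduces the values in the parameter table; the app itself takes these values from the UI.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainConfig:
    epochs: int
    learning_rate: float
    warmup_steps: int


# First-stage values from the parameter table (lower end of the epoch range).
FIRST_STAGE = TrainConfig(epochs=8, learning_rate=2e-5, warmup_steps=200)


def second_stage_from(first: TrainConfig) -> TrainConfig:
    """Halve epochs, learning rate, and warmup for the second stage."""
    return replace(
        first,
        epochs=max(1, first.epochs // 2),
        learning_rate=first.learning_rate / 2,
        warmup_steps=first.warmup_steps // 2,
    )


SECOND_STAGE = second_stage_from(FIRST_STAGE)
print(SECOND_STAGE)
```

Deriving the second-stage preset from the first keeps the two in step if the first-stage values are later retuned.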
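+## 📊 Evaluation Metrics
+| Metric | Description | When to use |
+|------|------|---------|
+| **F1 Score** | Harmonic mean of precision and recall | Balanced evaluation; general-purpose metric |
+| **Accuracy** | Overall fraction of correct predictions | When the classes are balanced |
+| **Precision** | Fraction of predicted deaths that are actual deaths | Optimize to avoid false alarms |
+| **Recall** | Fraction of actual deaths that are identified | Optimize to avoid missed cases |
+| **Sensitivity** | Same as Recall | Common in medical settings |
+| **Specificity** | Fraction of actual survivors that are identified | Avoid overtreatment |
+| **AUC** | Area under the ROC curve | Overall discriminative ability |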
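All of the count-based metrics in the table follow from the four confusion-matrix cells, with death (label 1) as the positive class. A minimal sketch in plain Python follows; the function name is illustrative, and AUC is omitted because it needs predicted probabilities rather than counts.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Compute count-based metrics; a zero denominator yields 0.0."""

    def safe_div(num, den):
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)          # identical to sensitivity
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for an imbalanced validation set of 100 records.
example = metrics_from_counts(tp=8, tn=80, fp=2, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With these hypothetical counts, accuracy is high while recall is below one half, which is why the README recommends not relying on accuracy alone for imbalanced data.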
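+## 🔧 Technical Details
+### Training pipeline
+1. **Data preparation**
+   - Load the CSV
+   - Keep the original class ratio
+   - Tokenization (max_length=256)
+   - 80/20 train/validation split
+2. **Model initialization**
+   - First stage: load from `bert-base-uncased`
+   - Second stage: load from the first fine-tuned model
+   - Apply the PEFT configuration (when using LoRA/AdaLoRA)
+3. **Training**
+   - Class weights handle the imbalance
+   - Evaluate on the validation set every epoch and reload the best checkpoint at the end (`load_best_model_at_end=True`)
+   - Save the best model
+4. **Evaluation**
+   - Evaluate on the validation set
+   - Compute all metrics
+   - Build the confusion matrix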
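The class-weighting step can be illustrated without any ML library. The sketch below mirrors the rule used in the training script (survival weight fixed at 1.0, death weight = imbalance ratio × multiplier). The function name and the explicit check for a missing positive class are additions for illustration.

```python
def class_weights(labels, multiplier):
    """Return (weight_for_label_0, weight_for_label_1)."""
    n_survived = sum(1 for y in labels if y == 0)
    n_died = sum(1 for y in labels if y == 1)
    if n_died == 0:
        # Without this check the ratio below would divide by zero.
        raise ValueError("no samples with label 1; cannot compute the ratio")
    ratio = n_survived / n_died
    return 1.0, ratio * multiplier


# Hypothetical data set: 90 survivors and 10 deaths, multiplier 0.8.
labels = [0] * 90 + [1] * 10
w0, w1 = class_weights(labels, multiplier=0.8)
print(f"weight_0={w0:.3f}, weight_1={w1:.3f}")
```

A multiplier below 1.0 deliberately under-corrects the imbalance, which trades some recall on the minority class for fewer false alarms.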
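+### Model storage
+- Model files: `./breast_cancer_bert_{method}_{type}_{timestamp}/`
+- Model list: `./saved_models_list.json`
+- Each entry in the list records the training information (method, training type, best metric, timestamp) and hyperparameters such as the weight multiplier and the number of epochs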
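The model list is a plain JSON array with one record per saved model. The sketch below shows the append-and-lookup pattern using only the standard library; `register_model` and `find_model` are hypothetical helper names (the script inlines this logic), and the records contain only two of the fields the script stores.

```python
import json
import os
import tempfile


def register_model(list_path, info):
    """Append one model record to the JSON list, creating the file if needed."""
    models = []
    if os.path.exists(list_path):
        with open(list_path, "r", encoding="utf-8") as f:
            models = json.load(f)
    models.append(info)
    with open(list_path, "w", encoding="utf-8") as f:
        json.dump(models, f, indent=2)
    return models


def find_model(list_path, model_path):
    """Return the record whose 'model_path' matches, or None."""
    if not os.path.exists(list_path):
        return None
    with open(list_path, "r", encoding="utf-8") as f:
        models = json.load(f)
    return next((m for m in models if m["model_path"] == model_path), None)


# Demo in a temporary directory so that no real model list is touched.
with tempfile.TemporaryDirectory() as tmp:
    registry = os.path.join(tmp, "saved_models_list.json")
    register_model(registry, {"model_path": "./model_a", "tuning_method": "LoRA"})
    register_model(registry, {"model_path": "./model_b", "tuning_method": "Full Fine-tuning"})
    found = find_model(registry, "./model_a")
    missing = find_model(registry, "./model_c")
    print(found, missing)
```

Looking up the stored `tuning_method` is what lets the loader decide between a PEFT adapter and a regular checkpoint.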
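+## 🐛 FAQ
+### Q1: Why can't the method be changed for the second fine-tuning?
+**A**: Because the methods have different parameter structures. For example, LoRA adds low-rank matrices; switching to Full Fine-tuning would lose those parameters.
+### Q2: How much data does the second fine-tuning need?
+**A**: At least 100 rows is recommended, though it can be less than in the first stage. With too little data the model may overfit.
+### Q3: How do I choose the optimization metric?
+**A**:
+- Medical settings usually prioritize **Recall** (avoid missed cases)
+- If false alarms are costly, choose **Precision**
+- For a balanced trade-off, choose **F1 Score**
+### Q4: What if GPU memory runs out?
+**A**:
+- Use LoRA or AdaLoRA (roughly 90% less memory)
+- Reduce the batch size
+- Reduce max_length
+### Q5: Training takes too long?
+**A**:
+- Use LoRA/AdaLoRA (about 3-5x faster)
+- Reduce the number of epochs
+- Increase the batch size (if memory allows)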
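+## 📝 Version Information
+- **Version**: 1.0.0
+- **Python**: 3.10+
+- **Main dependencies**:
+  - transformers 4.36.0
+  - torch 2.1.0
+  - peft 0.7.1
+  - gradio 4.44.0
+## 📄 License
+This project keeps your original program logic fully intact and only adds the second fine-tuning and testing features.
+## 🙏 Acknowledgements
+Built on the BERT model and the Hugging Face Transformers library.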

bert_requirements.py ADDED

	@@ -0,0 +1,9 @@

+gradio==4.44.0
+pandas==2.0.3
+torch==2.1.0
+transformers==4.36.0
+datasets==2.14.6
+scikit-learn==1.3.2
+numpy==1.24.3
+peft==0.7.1
+accelerate==0.25.0

bert_second_finetuning.py ADDED

	@@ -0,0 +1,1657 @@

+import gradio as gr
+import pandas as pd
+import torch
+from torch import nn
+from transformers import (
+    BertTokenizer,
+    BertForSequenceClassification,
+    TrainingArguments,
+    Trainer
+)
+from datasets import Dataset
+from sklearn.metrics import (
+    accuracy_score,
+    precision_recall_fscore_support,
+    roc_auc_score,
+    confusion_matrix
+)
+import numpy as np
+from datetime import datetime
+import json
+import os
+import gc
+# PEFT imports (LoRA and AdaLoRA)
+try:
+    from peft import (
+        LoraConfig,
+        AdaLoraConfig,
+        get_peft_model,
+        TaskType,
+        PeftModel
+    )
+    PEFT_AVAILABLE = True
+except ImportError:
+    PEFT_AVAILABLE = False
+    print("⚠️ PEFT is not installed; LoRA and AdaLoRA will be unavailable")
+# Use the GPU when available
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+LAST_MODEL_PATH = None
+LAST_TOKENIZER = None
+LAST_TUNING_METHOD = None
+# ==================== Your original functions (left unchanged) ====================
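+def evaluate_baseline_bert(eval_dataset, df_clean):
+    """
+    Evaluate the original BERT (which has never seen the data).
+    This is the baseline comparison logic extracted from your notebook cell 5.
+    """
+    print("\n" + "=" * 80)
+    print("Evaluating baseline plain BERT (has never seen the data)")
+    print("=" * 80)
+    # Load plain BERT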
+    baseline_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+    baseline_model = BertForSequenceClassification.from_pretrained(
+        "bert-base-uncased",
+        num_labels=2
+    ).to(device)
+    baseline_model.eval()
+    print("   ⚠️ This model was not trained on any of your data")
+    # Re-tokenize and re-split to rebuild the validation set
+    baseline_dataset = Dataset.from_pandas(df_clean[['text', 'label']])
+    def baseline_preprocess(examples):
+        return baseline_tokenizer(examples['text'], truncation=True, padding='max_length', max_length=256)
+    baseline_tokenized = baseline_dataset.map(baseline_preprocess, batched=True)
+    baseline_split = baseline_tokenized.train_test_split(test_size=0.2, seed=42)
+    baseline_eval_dataset = baseline_split['test']
+    # Build the baseline Trainer
+    baseline_trainer_args = TrainingArguments(
+        output_dir='./temp_baseline',
+        per_device_eval_batch_size=32,
+        report_to="none"
+    )
+    baseline_trainer = Trainer(
+        model=baseline_model,
+        args=baseline_trainer_args,
+    )
+    # Evaluate the baseline
+    print("📄 Evaluating plain BERT...")
+    predictions_output = baseline_trainer.predict(baseline_eval_dataset)
+    all_preds = predictions_output.predictions.argmax(-1)
+    all_labels = predictions_output.label_ids
+    probs = torch.nn.functional.softmax(torch.tensor(predictions_output.predictions), dim=-1)[:, 1].numpy()
+    # Compute metrics
+    precision, recall, f1, _ = precision_recall_fscore_support(
+        all_labels, all_preds, average='binary', pos_label=1, zero_division=0
+    )
+    acc = accuracy_score(all_labels, all_preds)
+    try:
+        auc = roc_auc_score(all_labels, probs)
+    except ValueError:
+        # roc_auc_score raises ValueError when only one class is present
+        auc = 0.0
+    cm = confusion_matrix(all_labels, all_preds)
+    if cm.shape == (2, 2):
+        tn, fp, fn, tp = cm.ravel()
+        sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0
+        specificity = tn / (tn + fp) if (tn + fp) > 0 else 0
+    else:
+        sensitivity = specificity = 0
+        tn = fp = fn = tp = 0
+    baseline_results = {
+        'f1': float(f1),
+        'accuracy': float(acc),
+        'precision': float(precision),
+        'recall': float(recall),
+        'sensitivity': float(sensitivity),
+        'specificity': float(specificity),
+        'auc': float(auc),
+        'tp': int(tp),
+        'tn': int(tn),
+        'fp': int(fp),
+        'fn': int(fn)
+    }
+    print("✅ Baseline evaluation complete")
+    # Clean up
+    del baseline_model
+    del baseline_trainer
+    torch.cuda.empty_cache()
+    gc.collect()
+    return baseline_results
+def run_original_code_with_tuning(
+    file_path,
+    weight_multiplier,
+    epochs,
+    batch_size,
+    learning_rate,
+    warmup_steps,
+    tuning_method,
+    best_metric,
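+    # LoRA parameters
+    lora_r,
+    lora_alpha,
+    lora_dropout,
+    lora_modules,
+    # AdaLoRA parameters
+    adalora_init_r,
+    adalora_target_r,
+    adalora_tinit,
+    adalora_tfinal,
+    adalora_delta_t,
+    # Added: whether this run is a second fine-tuning
+    is_second_finetuning=False,
+    base_model_path=None
+):
+    """
+    Your original code plus a choice of fine-tuning methods and a baseline comparison.
+    The core logic is unchanged; only model initialization gains a conditional branch.
+    Added parameters:
+    - is_second_finetuning: whether this run is a second fine-tuning
+    - base_model_path: path to the first fine-tuned model (second fine-tuning only)
+    """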
+    global LAST_MODEL_PATH, LAST_TOKENIZER, LAST_TUNING_METHOD
+    # ==================== Free memory (before training) ====================
+    torch.cuda.empty_cache()
+    gc.collect()
+    print("🧹 Memory cleared")
+    # ==================== Your original code starts here ====================
+    # Read the uploaded file
+    df_original = pd.read_csv(file_path)
+    df_clean = pd.DataFrame({
+        'text': df_original['Text'],
+        'label': df_original['label']
+    })
+    df_clean = df_clean.dropna()
+    # These label values stay in Chinese because they are written to the model list:
+    # "二次微調" = second fine-tuning, "第一次微調" = first fine-tuning
+    training_type = "二次微調" if is_second_finetuning else "第一次微調"
+    print("\n" + "=" * 80)
+    print(f"Breast cancer survival prediction BERT {training_type} - {tuning_method} method")
+    print("=" * 80)
+    print(f"Start time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+    print(f"Training type: {training_type}")
+    print(f"Fine-tuning method: {tuning_method}")
+    print(f"Optimization metric: {best_metric}")
+    if is_second_finetuning:
+        print(f"Base model: {base_model_path}")
+    print("=" * 80)
+    # Load the tokenizer
+    print("\n📦 Loading BERT tokenizer...")
+    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+    print("✅ Tokenizer loaded")
+    # Metrics function - exactly your original code, unchanged
+    def compute_metrics(pred):
+        labels = pred.label_ids
+        preds = pred.predictions.argmax(-1)
+        probs = torch.nn.functional.softmax(torch.tensor(pred.predictions), dim=-1)[:, 1].numpy()
+        precision, recall, f1, _ = precision_recall_fscore_support(
+            labels, preds, average='binary', pos_label=1, zero_division=0
+        )
+        acc = accuracy_score(labels, preds)
+        try:
+            auc = roc_auc_score(labels, probs)
+        except ValueError:
+            # roc_auc_score raises ValueError when only one class is present
+            auc = 0.0
+        cm = confusion_matrix(labels, preds)
+        if cm.shape == (2, 2):
+            tn, fp, fn, tp = cm.ravel()
+        else:
+            if len(np.unique(preds)) == 1:
+                if preds[0] == 0:
+                    tn, fp, fn, tp = sum(labels == 0), 0, sum(labels == 1), 0
+                else:
+                    tn, fp, fn, tp = 0, sum(labels == 0), 0, sum(labels == 1)
+            else:
+                tn = fp = fn = tp = 0
+        sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0
+        specificity = tn / (tn + fp) if (tn + fp) > 0 else 0
+        return {
+            'accuracy': acc, 'f1': f1, 'precision': precision, 'recall': recall,
+            'auc': auc, 'sensitivity': sensitivity, 'specificity': specificity,
+            'tp': int(tp), 'tn': int(tn), 'fp': int(fp), 'fn': int(fn)
+        }
+    # ============================================================================
+    # Step 1: Prepare the data (no rebalancing) - your original code
+    # ============================================================================
+    print("\n" + "=" * 80)
+    print("Step 1: Prepare the data (keep the original class ratio)")
+    print("=" * 80)
+    print(f"\nOriginal class distribution:")
+    print(f"  Survived (0): {sum(df_clean['label']==0)} rows ({sum(df_clean['label']==0)/len(df_clean)*100:.1f}%)")
+    print(f"  Died (1): {sum(df_clean['label']==1)} rows ({sum(df_clean['label']==1)/len(df_clean)*100:.1f}%)")
+    ratio = sum(df_clean['label']==0) / sum(df_clean['label']==1)
+    print(f"  Imbalance ratio: {ratio:.1f}:1")
+    # ============================================================================
+    # Step 2: Tokenization - your original code
+    # ============================================================================
+    print("\n" + "=" * 80)
+    print("Step 2: Tokenization")
+    print("=" * 80)
+    dataset = Dataset.from_pandas(df_clean[['text', 'label']])
+    def preprocess_function(examples):
+        return tokenizer(examples['text'], truncation=True, padding='max_length', max_length=256)
+    tokenized_dataset = dataset.map(preprocess_function, batched=True)
+    train_test_split = tokenized_dataset.train_test_split(test_size=0.2, seed=42)
+    train_dataset = train_test_split['train']
+    eval_dataset = train_test_split['test']
+    print(f"\n✅ Datasets ready:")
+    print(f"  Training set: {len(train_dataset)} rows")
+    print(f"  Validation set: {len(eval_dataset)} rows")
+    # ============================================================================
+    # Step 3: Set class weights - your original code
+    # ============================================================================
+    print("\n" + "=" * 80)
+    print(f"Step 3: Set class weights ({weight_multiplier}x multiplier)")
+    print("=" * 80)
+    weight_0 = 1.0
+    weight_1 = ratio * weight_multiplier
+    print(f"\nWeight settings:")
+    print(f"  Multiplier: {weight_multiplier}x")
+    print(f"  Survived-class weight: {weight_0:.3f}")
+    print(f"  Died-class weight: {weight_1:.3f} (= {ratio:.1f} × {weight_multiplier})")
+    class_weights = torch.tensor([weight_0, weight_1], dtype=torch.float).to(device)
+    # ============================================================================
+    # Step 4: Train the model - the second fine-tuning logic is added here
+    # ============================================================================
+    print("\n" + "=" * 80)
+    print(f"Step 4: Train the {tuning_method} BERT model ({training_type})")
+    print("=" * 80)
+    print(f"\n🔄 Initializing model ({tuning_method})...")
+    # [Added] Second fine-tuning: load the first fine-tuned model
+    if is_second_finetuning and base_model_path:
+        print(f"📦 Loading first fine-tuned model: {base_model_path}")
+        # Read the first model's metadata
+        with open('./saved_models_list.json', 'r') as f:
+            models_list = json.load(f)
+        base_model_info = None
+        for model_info in models_list:
+            if model_info['model_path'] == base_model_path:
+                base_model_info = model_info
+                break
+        if base_model_info is None:
+            raise ValueError(f"Base model info not found: {base_model_path}")
+        base_tuning_method = base_model_info['tuning_method']
+        print(f"   First fine-tuning method: {base_tuning_method}")
+        # Load the model according to the first-stage method
+        if base_tuning_method in ["LoRA", "AdaLoRA"] and PEFT_AVAILABLE:
+            # Load the PEFT model; is_trainable=True keeps the adapter weights
+            # trainable (from_pretrained loads adapters frozen by default)
+            base_bert = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
+            model = PeftModel.from_pretrained(base_bert, base_model_path, is_trainable=True)
+            print(f"   ✅ Loaded {base_tuning_method} model")
+        else:
+            # Load a regular (fully fine-tuned) model
+            model = BertForSequenceClassification.from_pretrained(base_model_path, num_labels=2)
+            print(f"   ✅ Loaded Full Fine-tuning model")
+        model = model.to(device)
+        print(f"   ⚠️ Note: the second fine-tuning uses the same method as the first ({base_tuning_method})")
+        # Force the same method for the second fine-tuning
+        tuning_method = base_tuning_method
+    else:
+        # [Original logic] First fine-tuning: start from plain BERT
+        model = BertForSequenceClassification.from_pretrained(
+            "bert-base-uncased", num_labels=2, problem_type="single_label_classification"
+        )
+        # Configure the model for the selected fine-tuning method
+        if tuning_method == "Full Fine-tuning":
+            # Your original method - unchanged
+            model = model.to(device)
+            print("✅ Using full fine-tuning (all parameters trainable)")
+            trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
+            all_params = sum(p.numel() for p in model.parameters())
+            print(f"  Trainable parameters: {trainable_params:,} / {all_params:,} ({100 * trainable_params / all_params:.2f}%)")
+        elif tuning_method == "LoRA" and PEFT_AVAILABLE:
+            # LoRA configuration
+            target_modules = lora_modules.split(",") if lora_modules else ["query", "value"]
+            target_modules = [m.strip() for m in target_modules]
+            peft_config = LoraConfig(
+                task_type=TaskType.SEQ_CLS,
+                r=int(lora_r),
+                lora_alpha=int(lora_alpha),
+                lora_dropout=float(lora_dropout),
+                target_modules=target_modules
+            )
+            model = get_peft_model(model, peft_config)
+            model = model.to(device)
+            print("✅ Using LoRA fine-tuning")
+            print(f"  LoRA rank (r): {lora_r}")
+            print(f"  LoRA alpha: {lora_alpha}")
+            print(f"  LoRA dropout: {lora_dropout}")
+            print(f"  Target modules: {target_modules}")
+            model.print_trainable_parameters()
+        elif tuning_method == "AdaLoRA" and PEFT_AVAILABLE:
+            # AdaLoRA configuration
+            target_modules = lora_modules.split(",") if lora_modules else ["query", "value"]
+            target_modules = [m.strip() for m in target_modules]
+            peft_config = AdaLoraConfig(
+                task_type=TaskType.SEQ_CLS,
+                init_r=int(adalora_init_r),
+                target_r=int(adalora_target_r),
+                tinit=int(adalora_tinit),
+                tfinal=int(adalora_tfinal),
+                deltaT=int(adalora_delta_t),
+                lora_alpha=int(lora_alpha),
+                lora_dropout=float(lora_dropout),
+                target_modules=target_modules
+            )
+            model = get_peft_model(model, peft_config)
+            model = model.to(device)
+            print("✅ Using AdaLoRA fine-tuning")
+            print(f"  Initial rank: {adalora_init_r}")
+            print(f"  Target rank: {adalora_target_r}")
+            print(f"  Tinit: {adalora_tinit}, Tfinal: {adalora_tfinal}, DeltaT: {adalora_delta_t}")
+            model.print_trainable_parameters()
+        else:
+            # Fall back to Full Fine-tuning
+            model = model.to(device)
+            print("⚠️ PEFT is not installed or the method is invalid; using Full Fine-tuning")
+    # Custom Trainer (applies class weights) - your original code
+    class WeightedTrainer(Trainer):
+        def compute_loss(self, model, inputs, return_outputs=False):
+            labels = inputs.pop("labels")
+            outputs = model(**inputs)
+            loss_fct = nn.CrossEntropyLoss(weight=class_weights)
+            loss = loss_fct(outputs.logits.view(-1, 2), labels.view(-1))
+            return (loss, outputs) if return_outputs else loss
+    # Training settings - the best-model metric follows the user's selection
+    metric_map = {
+        "f1": "f1",
+        "accuracy": "accuracy",
+        "precision": "precision",
+        "recall": "recall",
+        "sensitivity": "sensitivity",
+        "specificity": "specificity",
+        "auc": "auc"
+    }
+    training_args = TrainingArguments(
+        output_dir='./results_weight',
+        num_train_epochs=epochs,
+        per_device_train_batch_size=batch_size,
+        per_device_eval_batch_size=batch_size*2,
+        warmup_steps=warmup_steps,
+        weight_decay=0.01,
+        learning_rate=learning_rate,
+        logging_steps=50,
+        evaluation_strategy="epoch",
+        save_strategy="epoch",
+        load_best_model_at_end=True,
+        metric_for_best_model=metric_map.get(best_metric, "f1"),
+        report_to="none",
+        greater_is_better=True
+    )
+    trainer = WeightedTrainer(
+        model=model, args=training_args,
+        train_dataset=train_dataset, eval_dataset=eval_dataset,
+        compute_metrics=compute_metrics
+    )
+    print(f"\n🚀 Starting training ({epochs} epochs)...")
+    print(f"   Optimization metric: {best_metric}")
+    print("-" * 80)
+    trainer.train()
+    print("\n✅ Model training complete!")
+    # Evaluate the model
+    print("\n📊 Evaluating model...")
+    results = trainer.evaluate()
+    print(f"\n{training_type} {tuning_method} BERT ({weight_multiplier}x weight) performance:")
+    print(f"  F1 Score: {results['eval_f1']:.4f}")
+    print(f"  Accuracy: {results['eval_accuracy']:.4f}")
+    print(f"  Precision: {results['eval_precision']:.4f}")
+    print(f"  Recall: {results['eval_recall']:.4f}")
+    print(f"  Sensitivity: {results['eval_sensitivity']:.4f}")
+    print(f"  Specificity: {results['eval_specificity']:.4f}")
+    print(f"  AUC: {results['eval_auc']:.4f}")
+    print(f"  Confusion matrix: Tp={results['eval_tp']}, Tn={results['eval_tn']}, "
+          f"Fp={results['eval_fp']}, Fn={results['eval_fn']}")
+    # ============================================================================
+    # Step 5: Baseline comparison (plain BERT) - first fine-tuning only
+    # ============================================================================
+    if not is_second_finetuning:
+        print("\n" + "=" * 80)
+        print("Step 5: Baseline comparison - plain BERT (has never seen the data)")
+        print("=" * 80)
+        baseline_results = evaluate_baseline_bert(eval_dataset, df_clean)
+        # ============================================================================
+        # Step 6: Compare results
+        # ============================================================================
+        print("\n" + "=" * 80)
+        print(f"📊 [Comparison] Plain BERT vs {tuning_method} BERT")
+        print("=" * 80)
+        print("\n📋 Detailed comparison table:")
+        print("-" * 100)
+        print(f"{'Metric':<15} {'Plain BERT':<20} {tuning_method:<20} {'Improvement':<20}")
+        print("-" * 100)
+        metrics_to_compare = [
+            ('F1 Score', 'f1', 'eval_f1'),
+            ('Accuracy', 'accuracy', 'eval_accuracy'),
+            ('Precision', 'precision', 'eval_precision'),
+            ('Recall', 'recall', 'eval_recall'),
+            ('Sensitivity', 'sensitivity', 'eval_sensitivity'),
+            ('Specificity', 'specificity', 'eval_specificity'),
+            ('AUC', 'auc', 'eval_auc')
+        ]
+        for name, baseline_key, finetuned_key in metrics_to_compare:
+            baseline_val = baseline_results[baseline_key]
+            finetuned_val = results[finetuned_key]
+            improvement = ((finetuned_val - baseline_val) / baseline_val * 100) if baseline_val > 0 else 0
+            print(f"{name:<15} {baseline_val:<20.4f} {finetuned_val:<20.4f} {improvement:>+18.1f}%")
+        print("-" * 100)
+    else:
+        baseline_results = None
+    # Save the model
+    training_label = "second" if is_second_finetuning else "first"
+    save_dir = f'./breast_cancer_bert_{tuning_method.lower().replace(" ", "_")}_{training_label}_{datetime.now().strftime("%Y%m%d_%H%M%S")}'
+    # save_pretrained is called the same way for PEFT and regular models:
+    # PEFT models write the adapter files, regular models the full weights
+    model.save_pretrained(save_dir)
+    tokenizer.save_pretrained(save_dir)
+    # Save model metadata to a JSON file (used by the prediction page's model selector)
+    model_info = {
+        'model_path': save_dir,
+        'tuning_method': tuning_method,
+        'training_type': training_type,
+        'best_metric': best_metric,
+        'best_metric_value': float(results[f'eval_{metric_map.get(best_metric, "f1")}']),
+        'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
+        'weight_multiplier': weight_multiplier,
+        'epochs': epochs,
+        'is_second_finetuning': is_second_finetuning,
+        'base_model_path': base_model_path if is_second_finetuning else None
+    }
+    # Read the existing model list
+    models_list_file = './saved_models_list.json'
+    if os.path.exists(models_list_file):
+        with open(models_list_file, 'r') as f:
+            models_list = json.load(f)
+    else:
+        models_list = []
+    # Append the new model's metadata
+    models_list.append(model_info)
+    # Write the updated list back
+    with open(models_list_file, 'w') as f:
+        json.dump(models_list, f, indent=2)
+    # Keep in globals for the prediction step
+    LAST_MODEL_PATH = save_dir
+    LAST_TOKENIZER = tokenizer
+    LAST_TUNING_METHOD = tuning_method
+    print(f"\n💾 Model saved to: {save_dir}")
+    print("\n" + "=" * 80)
+    print("🎉 Training complete!")
+    print("=" * 80)
+    print(f"Finished at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+    # ==================== Free memory (after training) ====================
+    del model
+    del trainer
+    torch.cuda.empty_cache()
+    gc.collect()
+    print("🧹 Memory cleared after training")
+    # Add all metadata to the results
+    results['tuning_method'] = tuning_method
+    results['training_type'] = training_type
+    results['best_metric'] = best_metric
+    results['best_metric_value'] = results[f'eval_{metric_map.get(best_metric, "f1")}']
+    results['baseline_results'] = baseline_results
+    results['model_path'] = save_dir
+    results['is_second_finetuning'] = is_second_finetuning
+    return results
+# ==================== Added: new-data test function ====================
+def test_on_new_data(test_file_path, baseline_model_path, first_model_path, second_model_path):
+    """
+    Compare three models on new test data:
+    1. Plain BERT (baseline)
+    2. First fine-tuned model
+    3. Second fine-tuned model
+    """
+    print("\n" + "=" * 80)
+    print("📊 New-data test - three-model comparison")
+    print("=" * 80)
+    # Load the test data
+    df_test = pd.read_csv(test_file_path)
+    df_clean = pd.DataFrame({
+        'text': df_test['Text'],
+        'label': df_test['label']
+    })
+    df_clean = df_clean.dropna()
+    print(f"\nTest data:")
+    print(f"  Total rows: {len(df_clean)}")
+    print(f"  Survived (0): {sum(df_clean['label']==0)} rows")
+    print(f"  Died (1): {sum(df_clean['label']==1)} rows")
+    # Prepare the test data
+    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+    test_dataset = Dataset.from_pandas(df_clean[['text', 'label']])
+    def preprocess_function(examples):
+        return tokenizer(examples['text'], truncation=True, padding='max_length', max_length=256)
+    test_tokenized = test_dataset.map(preprocess_function, batched=True)
+    # Evaluation helper
+    def evaluate_model(model, dataset_name):
+        model.eval()
+        trainer_args = TrainingArguments(
+            output_dir='./temp_test',
+            per_device_eval_batch_size=32,
+            report_to="none"
+        )
+        trainer = Trainer(
+            model=model,
+            args=trainer_args,
+        )
+        predictions_output = trainer.predict(test_tokenized)
+        all_preds = predictions_output.predictions.argmax(-1)
+        all_labels = predictions_output.label_ids
+        probs = torch.nn.functional.softmax(torch.tensor(predictions_output.predictions), dim=-1)[:, 1].numpy()
+        precision, recall, f1, _ = precision_recall_fscore_support(
+            all_labels, all_preds, average='binary', pos_label=1, zero_division=0
+        )
+        acc = accuracy_score(all_labels, all_preds)
+        try:
+            auc = roc_auc_score(all_labels, probs)
+        except ValueError:
+            # roc_auc_score raises ValueError when only one class is present
+            auc = 0.0
+        cm = confusion_matrix(all_labels, all_preds)
+        if cm.shape == (2, 2):
+            tn, fp, fn, tp = cm.ravel()
+            sensitivity = tp / (tp + fn) if (tp + fn) > 0 else 0
+            specificity = tn / (tn + fp) if (tn + fp) > 0 else 0
+        else:
+            sensitivity = specificity = 0
+            tn = fp = fn = tp = 0
+        results = {
+            'f1': float(f1),
+            'accuracy': float(acc),
+            'precision': float(precision),
+            'recall': float(recall),
+            'sensitivity': float(sensitivity),
+            'specificity': float(specificity),
+            'auc': float(auc),
+            'tp': int(tp),
+            'tn': int(tn),
+            'fp': int(fp),
+            'fn': int(fn)
+        }
+        print(f"\n✅ {dataset_name} evaluation complete")
+        del trainer
+        torch.cuda.empty_cache()
+        gc.collect()
+        return results
+    all_results = {}
+    # 1. Evaluate plain BERT
+    # "跳過" means "Skip"; the value is kept as-is because it is compared against the caller's input
+    if baseline_model_path != "跳過":
+        print("\n" + "-" * 80)
+        print("1️⃣ Evaluating plain BERT (Baseline)")
+        print("-" * 80)
+        baseline_model = BertForSequenceClassification.from_pretrained(
+            "bert-base-uncased",
+            num_labels=2
+        ).to(device)
+        all_results['baseline'] = evaluate_model(baseline_model, "Plain BERT")
+        del baseline_model
+        torch.cuda.empty_cache()
+    else:
+        all_results['baseline'] = None
+    # 2. Evaluate the first fine-tuned model
+    # "請選擇" means "Please select"; it is kept as-is for the same reason
+    if first_model_path != "請選擇":
+        print("\n" + "-" * 80)
+        print("2️⃣ Evaluating the first fine-tuned model")
+        print("-" * 80)
+        # Read model metadata
+        with open('./saved_models_list.json', 'r') as f:
+            models_list = json.load(f)
+        first_model_info = None
+        for model_info in models_list:
+            if model_info['model_path'] == first_model_path:
+                first_model_info = model_info
+                break
+        if first_model_info:
+            tuning_method = first_model_info['tuning_method']
+            if tuning_method in ["LoRA", "AdaLoRA"] and PEFT_AVAILABLE:
+                base_model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
+                first_model = PeftModel.from_pretrained(base_model, first_model_path)
+                first_model = first_model.to(device)
+            else:
+                first_model = BertForSequenceClassification.from_pretrained(first_model_path).to(device)
+            all_results['first'] = evaluate_model(first_model, "First fine-tuned model")
+            del first_model
+            torch.cuda.empty_cache()
+        else:
+            all_results['first'] = None
+    else:
+        all_results['first'] = None
+    # 3. Evaluate the second fine-tuned model
+    if second_model_path != "請選擇":
+        print("\n" + "-" * 80)
+        print("3️⃣ Evaluating the second fine-tuned model")
+        print("-" * 80)
+        # Read model metadata
+        with open('./saved_models_list.json', 'r') as f:
+            models_list = json.load(f)
+        second_model_info = None
+        for model_info in models_list:
+            if model_info['model_path'] == second_model_path:
+                second_model_info = model_info
+                break
+        if second_model_info:
+            tuning_method = second_model_info['tuning_method']
+            if tuning_method in ["LoRA", "AdaLoRA"] and PEFT_AVAILABLE:
+                base_model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
+                second_model = PeftModel.from_pretrained(base_model, second_model_path)
+                second_model = second_model.to(device)
+            else:
+                second_model = BertForSequenceClassification.from_pretrained(second_model_path).to(device)
+            all_results['second'] = evaluate_model(second_model, "Second fine-tuned model")
+            del second_model
+            torch.cuda.empty_cache()
+        else:
+            all_results['second'] = None
+    else:
+        all_results['second'] = None
+    print("\n" + "=" * 80)
+    print("✅ New-data test complete")
+    print("=" * 80)
+    return all_results
+# ==================== Prediction function (kept as-is) ====================
+def predict_text(model_choice, text_input):
+    """
+    Prediction: select a trained model and show the predictions of both the
+    un-fine-tuned and the fine-tuned model.
+    """
+    if not text_input or text_input.strip() == "":
+        return "Please enter text", "Please enter text"
+    try:
+        # ==================== Un-fine-tuned BERT prediction ====================
+        print("\nPredicting with un-fine-tuned BERT...")
+        baseline_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+        baseline_model = BertForSequenceClassification.from_pretrained(
+            "bert-base-uncased",
+            num_labels=2
+        ).to(device)
+        baseline_model.eval()
+        # Tokenize the input (un-fine-tuned)
+        baseline_inputs = baseline_tokenizer(
+            text_input,
+            truncation=True,
+            padding='max_length',
+            max_length=256,
+            return_tensors='pt'
+        ).to(device)
+        # Predict (un-fine-tuned)
+        with torch.no_grad():
+            baseline_outputs = baseline_model(**baseline_inputs)
+            baseline_probs = torch.nn.functional.softmax(baseline_outputs.logits, dim=-1)
+            baseline_pred_class = baseline_probs.argmax(-1).item()
+            baseline_confidence = baseline_probs[0][baseline_pred_class].item()
+        baseline_result = "存活" if baseline_pred_class == 0 else "死亡"
+        baseline_prob_survive = baseline_probs[0][0].item()
+        baseline_prob_death = baseline_probs[0][1].item()
+        baseline_output = f"""
+# 🔵 未微調 BERT 預測結果
+## 預測類別: **{baseline_result}**
+## 信心度: **{baseline_confidence:.1%}**
+## 機率分布:
+- 🟢 **存活機率**: {baseline_prob_survive:.2%}
+- 🔴 **死亡機率**: {baseline_prob_death:.2%}
+---
+**說明**: 此為原始 BERT 模型,未經任何領域資料訓練
+        """
+        # 清空記憶體
+        del baseline_model
+        del baseline_tokenizer
+        torch.cuda.empty_cache()
+        # ==================== 微調後的 BERT 預測 ====================
+        if model_choice == "請先訓練模型":
+            finetuned_output = """
+# 🟢 微調 BERT 預測結果
+❌ No model has been trained yet. Train one on the 「1️⃣ 第一次微調」 tab first.
+            """
+            return baseline_output, finetuned_output
+        # 解析選擇的模型路徑
+        model_path = model_choice.split(" | ")[0].replace("路徑: ", "")
+        # 從 JSON 讀取模型資訊
+        with open('./saved_models_list.json', 'r') as f:
+            models_list = json.load(f)
+        selected_model_info = None
+        for model_info in models_list:
+            if model_info['model_path'] == model_path:
+                selected_model_info = model_info
+                break
+        if selected_model_info is None:
+            finetuned_output = f"""
+# 🟢 微調 BERT 預測結果
+❌ 找不到模型:{model_path}
+            """
+            return baseline_output, finetuned_output
+        print(f"\n使用微調模型: {model_path}")
+        # 載入 tokenizer
+        finetuned_tokenizer = BertTokenizer.from_pretrained(model_path)
+        # 載入模型
+        tuning_method = selected_model_info['tuning_method']
+        if tuning_method in ["LoRA", "AdaLoRA"] and PEFT_AVAILABLE:
+            # 載入 PEFT 模型
+            base_model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
+            finetuned_model = PeftModel.from_pretrained(base_model, model_path)
+            finetuned_model = finetuned_model.to(device)
+        else:
+            # 載入一般模型
+            finetuned_model = BertForSequenceClassification.from_pretrained(model_path).to(device)
+        finetuned_model.eval()
+        # Tokenize 輸入(微調)
+        finetuned_inputs = finetuned_tokenizer(
+            text_input,
+            truncation=True,
+            padding='max_length',
+            max_length=256,
+            return_tensors='pt'
+        ).to(device)
+        # 預測(微調)
+        with torch.no_grad():
+            finetuned_outputs = finetuned_model(**finetuned_inputs)
+            finetuned_probs = torch.nn.functional.softmax(finetuned_outputs.logits, dim=-1)
+            finetuned_pred_class = finetuned_probs.argmax(-1).item()
+            finetuned_confidence = finetuned_probs[0][finetuned_pred_class].item()
+        finetuned_result = "存活" if finetuned_pred_class == 0 else "死亡"
+        finetuned_prob_survive = finetuned_probs[0][0].item()
+        finetuned_prob_death = finetuned_probs[0][1].item()
+        training_type_label = "二次微調" if selected_model_info.get('is_second_finetuning', False) else "第一次微調"
+        finetuned_output = f"""
+# 🟢 微調 BERT 預測結果
+## 預測類別: **{finetuned_result}**
+## 信心度: **{finetuned_confidence:.1%}**
+## 機率分布:
+- 🟢 **存活機率**: {finetuned_prob_survive:.2%}
+- 🔴 **死亡機率**: {finetuned_prob_death:.2%}
+---
+### 模型資訊:
+- **訓練類型**: {training_type_label}
+- **微調方法**: {selected_model_info['tuning_method']}
+- **最佳化指標**: {selected_model_info['best_metric']}
+- **訓練時間**: {selected_model_info['timestamp']}
+- **模型路徑**: {model_path}
+---
+**注意**: 此預測僅供參考,實際醫療決策應由專業醫師判斷。
+        """
+        # 清空記憶體
+        del finetuned_model
+        del finetuned_tokenizer
+        torch.cuda.empty_cache()
+        return baseline_output, finetuned_output
+    except Exception as e:
+        import traceback
+        error_msg = f"❌ 預測錯誤:{str(e)}\n\n詳細錯誤訊息:\n{traceback.format_exc()}"
+        return error_msg, error_msg
+def get_available_models():
+    """
+    Return the dropdown entries for every trained model in the registry.
+    """
+    models_list_file = './saved_models_list.json'
+    if not os.path.exists(models_list_file):
+        return ["請先訓練模型"]
+    with open(models_list_file, 'r') as f:
+        models_list = json.load(f)
+    if len(models_list) == 0:
+        return ["請先訓練模型"]
+    # Format the dropdown entries; predict_text() parses the leading "路徑: " field to recover the path
+    model_choices = []
+    for model_info in models_list:
+        training_type = model_info.get('training_type', '第一次微調')
+        choice = f"路徑: {model_info['model_path']} | 類型: {training_type} | 方法: {model_info['tuning_method']} | 時間: {model_info['timestamp']}"
+        model_choices.append(choice)
+    return model_choices
+def get_first_finetuning_models():
+    """
+    Return all first-stage fine-tuned models (the candidates for second-stage fine-tuning).
+    """
+    models_list_file = './saved_models_list.json'
+    if not os.path.exists(models_list_file):
+        return ["請先進行第一次微調"]
+    with open(models_list_file, 'r') as f:
+        models_list = json.load(f)
+    # Keep only first-stage models
+    first_models = [m for m in models_list if not m.get('is_second_finetuning', False)]
+    if len(first_models) == 0:
+        return ["請先進行第一次微調"]
+    model_choices = []
+    for model_info in first_models:
+        choice = f"{model_info['model_path']}"
+        model_choices.append(choice)
+    return model_choices
+# ==================== Wrapper functions ====================
+def train_first_wrapper(
+    file, tuning_method, weight_mult, epochs, batch_size, lr, warmup, best_metric,
+    lora_r, lora_alpha, lora_dropout, lora_modules,
+    adalora_init_r, adalora_target_r, adalora_tinit, adalora_tfinal, adalora_delta_t
+):
+    """Wrapper for the first fine-tuning run."""
+    if file is None:
+        return "請上傳 CSV 檔案", "", ""
+    try:
+        results = run_original_code_with_tuning(
+            file_path=file.name,
+            weight_multiplier=weight_mult,
+            epochs=int(epochs),
+            batch_size=int(batch_size),
+            learning_rate=lr,
+            warmup_steps=int(warmup),
+            tuning_method=tuning_method,
+            best_metric=best_metric,
+            lora_r=lora_r,
+            lora_alpha=lora_alpha,
+            lora_dropout=lora_dropout,
+            lora_modules=lora_modules,
+            adalora_init_r=adalora_init_r,
+            adalora_target_r=adalora_target_r,
+            adalora_tinit=adalora_tinit,
+            adalora_tfinal=adalora_tfinal,
+            adalora_delta_t=adalora_delta_t,
+            is_second_finetuning=False
+        )
+        baseline_results = results['baseline_results']
+        # 格式化輸出
+        data_info = f"""
+# 📊 資料資訊 (第一次微調)
+## 🔧 訓練配置
+- **微調方法**: {results['tuning_method']}
+- **最佳化指標**: {results['best_metric']}
+- **最佳指標值**: {results['best_metric_value']:.4f}
+## ⚙️ 訓練參數
+- **權重倍數**: {weight_mult}x
+- **訓練輪數**: {epochs}
+- **批次大小**: {batch_size}
+- **學習率**: {lr}
+- **Warmup Steps**: {warmup}
+✅ 第一次微調完成!可進行二次微調或預測!
+        """
+        baseline_output = f"""
+# 🔵 純 BERT (Baseline)
+### 📈 評估指標
+| 指標 | 數值 |
+|------|------|
+| **F1 Score** | {baseline_results['f1']:.4f} |
+| **Accuracy** | {baseline_results['accuracy']:.4f} |
+| **Precision** | {baseline_results['precision']:.4f} |
+| **Recall** | {baseline_results['recall']:.4f} |
+| **Sensitivity** | {baseline_results['sensitivity']:.4f} |
+| **Specificity** | {baseline_results['specificity']:.4f} |
+| **AUC** | {baseline_results['auc']:.4f} |
+### 📈 混淆矩陣
+|  | 預測:存活 | 預測:死亡 |
+|---|-----------|-----------|
+| **實際:存活** | TN={baseline_results['tn']} | FP={baseline_results['fp']} |
+| **實際:死亡** | FN={baseline_results['fn']} | TP={baseline_results['tp']} |
+        """
+        finetuned_output = f"""
+# 🟢 第一次微調 BERT
+### 📈 評估指標
+| 指標 | 數值 |
+|------|------|
+| **F1 Score** | {results['eval_f1']:.4f} |
+| **Accuracy** | {results['eval_accuracy']:.4f} |
+| **Precision** | {results['eval_precision']:.4f} |
+| **Recall** | {results['eval_recall']:.4f} |
+| **Sensitivity** | {results['eval_sensitivity']:.4f} |
+| **Specificity** | {results['eval_specificity']:.4f} |
+| **AUC** | {results['eval_auc']:.4f} |
+### 📈 混淆矩陣
+|  | 預測:存活 | 預測:死亡 |
+|---|-----------|-----------|
+| **實際:存活** | TN={results['eval_tn']} | FP={results['eval_fp']} |
+| **實際:死亡** | FN={results['eval_fn']} | TP={results['eval_tp']} |
+        """
+        return data_info, baseline_output, finetuned_output
+    except Exception as e:
+        import traceback
+        error_msg = f"❌ 錯誤:{str(e)}\n\n詳細錯誤訊息:\n{traceback.format_exc()}"
+        return error_msg, "", ""
+def train_second_wrapper(
+    base_model_choice, file, weight_mult, epochs, batch_size, lr, warmup, best_metric
+):
+    """Wrapper for the second fine-tuning run."""
+    if base_model_choice == "請先進行第一次微調":
+        return "請先在「第一次微調」頁面訓練模型", ""
+    if file is None:
+        return "請上傳新的訓練數據 CSV 檔案", ""
+    try:
+        # The dropdown value is the base model path itself
+        base_model_path = base_model_choice
+        # Look up the first run's entry in the model registry
+        with open('./saved_models_list.json', 'r') as f:
+            models_list = json.load(f)
+        base_model_info = None
+        for model_info in models_list:
+            if model_info['model_path'] == base_model_path:
+                base_model_info = model_info
+                break
+        if base_model_info is None:
+            return "找不到基礎模型資訊", ""
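+        # Reuse the first run's tuning method (second-stage fine-tuning does not switch methods)
+        tuning_method = base_model_info['tuning_method']
+        # NOTE: the PEFT hyperparameters below are fixed defaults that match the UI defaults
+        # on the first tab; they are not read back from the first run's saved configuration.
+        lora_r = 16
+        lora_alpha = 32
+        lora_dropout = 0.1
+        lora_modules = "query,value"
+        adalora_init_r = 12
+        adalora_target_r = 8
+        adalora_tinit = 0
+        adalora_tfinal = 0
+        adalora_delta_t = 1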
+        results = run_original_code_with_tuning(
+            file_path=file.name,
+            weight_multiplier=weight_mult,
+            epochs=int(epochs),
+            batch_size=int(batch_size),
+            learning_rate=lr,
+            warmup_steps=int(warmup),
+            tuning_method=tuning_method,
+            best_metric=best_metric,
+            lora_r=lora_r,
+            lora_alpha=lora_alpha,
+            lora_dropout=lora_dropout,
+            lora_modules=lora_modules,
+            adalora_init_r=adalora_init_r,
+            adalora_target_r=adalora_target_r,
+            adalora_tinit=adalora_tinit,
+            adalora_tfinal=adalora_tfinal,
+            adalora_delta_t=adalora_delta_t,
+            is_second_finetuning=True,
+            base_model_path=base_model_path
+        )
+        data_info = f"""
+# 📊 二次微調結果
+## 🔧 訓練配置
+- **基礎模型**: {base_model_path}
+- **微調方法**: {results['tuning_method']} (繼承自第一次)
+- **最佳化指標**: {results['best_metric']}
+- **最佳指標值**: {results['best_metric_value']:.4f}
+## ⚙️ 訓練參數
+- **權重倍數**: {weight_mult}x
+- **訓練輪數**: {epochs}
+- **批次大小**: {batch_size}
+- **學習率**: {lr}
+- **Warmup Steps**: {warmup}
+✅ 二次微調完成!可進行預測或新數據測試!
+        """
+        finetuned_output = f"""
+# 🟢 二次微調 BERT
+### 📈 評估指標
+| 指標 | 數值 |
+|------|------|
+| **F1 Score** | {results['eval_f1']:.4f} |
+| **Accuracy** | {results['eval_accuracy']:.4f} |
+| **Precision** | {results['eval_precision']:.4f} |
+| **Recall** | {results['eval_recall']:.4f} |
+| **Sensitivity** | {results['eval_sensitivity']:.4f} |
+| **Specificity** | {results['eval_specificity']:.4f} |
+| **AUC** | {results['eval_auc']:.4f} |
+### 📈 混淆矩陣
+|  | 預測:存活 | 預測:死亡 |
+|---|-----------|-----------|
+| **實際:存活** | TN={results['eval_tn']} | FP={results['eval_fp']} |
+| **實際:死亡** | FN={results['eval_fn']} | TP={results['eval_tp']} |
+        """
+        return data_info, finetuned_output
+    except Exception as e:
+        import traceback
+        error_msg = f"❌ 錯誤:{str(e)}\n\n詳細錯誤訊息:\n{traceback.format_exc()}"
+        return error_msg, ""
+def test_new_data_wrapper(test_file, baseline_choice, first_choice, second_choice):
+    """Wrapper for testing on new data."""
+    if test_file is None:
+        return "請上傳測試數據 CSV 檔案", "", ""
+    try:
+        all_results = test_on_new_data(
+            test_file.name,
+            baseline_choice,
+            first_choice,
+            second_choice
+        )
+        # 格式化輸出
+        outputs = []
+        # 1. 純 BERT
+        if all_results['baseline']:
+            r = all_results['baseline']
+            baseline_output = f"""
+# 🔵 純 BERT (Baseline)
+| 指標 | 數值 |
+|------|------|
+| **F1 Score** | {r['f1']:.4f} |
+| **Accuracy** | {r['accuracy']:.4f} |
+| **Precision** | {r['precision']:.4f} |
+| **Recall** | {r['recall']:.4f} |
+| **Sensitivity** | {r['sensitivity']:.4f} |
+| **Specificity** | {r['specificity']:.4f} |
+| **AUC** | {r['auc']:.4f} |
+### 混淆矩陣
+|  | 預測:存活 | 預測:死亡 |
+|---|-----------|-----------|
+| **實際:存活** | TN={r['tn']} | FP={r['fp']} |
+| **實際:死亡** | FN={r['fn']} | TP={r['tp']} |
+            """
+        else:
+            baseline_output = "未選擇評估純 BERT"
+        outputs.append(baseline_output)
+        # 2. 第一次微調
+        if all_results['first']:
+            r = all_results['first']
+            first_output = f"""
+# 🟢 第一次微調模型
+| 指標 | 數值 |
+|------|------|
+| **F1 Score** | {r['f1']:.4f} |
+| **Accuracy** | {r['accuracy']:.4f} |
+| **Precision** | {r['precision']:.4f} |
+| **Recall** | {r['recall']:.4f} |
+| **Sensitivity** | {r['sensitivity']:.4f} |
+| **Specificity** | {r['specificity']:.4f} |
+| **AUC** | {r['auc']:.4f} |
+### 混淆矩陣
+|  | 預測:存活 | 預測:死亡 |
+|---|-----------|-----------|
+| **實際:存活** | TN={r['tn']} | FP={r['fp']} |
+| **實際:死亡** | FN={r['fn']} | TP={r['tp']} |
+            """
+        else:
+            first_output = "未選擇第一次微調模型"
+        outputs.append(first_output)
+        # 3. 第二次微調
+        if all_results['second']:
+            r = all_results['second']
+            second_output = f"""
+# 🟡 第二次微調模型
+| 指標 | 數值 |
+|------|------|
+| **F1 Score** | {r['f1']:.4f} |
+| **Accuracy** | {r['accuracy']:.4f} |
+| **Precision** | {r['precision']:.4f} |
+| **Recall** | {r['recall']:.4f} |
+| **Sensitivity** | {r['sensitivity']:.4f} |
+| **Specificity** | {r['specificity']:.4f} |
+| **AUC** | {r['auc']:.4f} |
+### 混淆矩陣
+|  | 預測:存活 | 預測:死亡 |
+|---|-----------|-----------|
+| **實際:存活** | TN={r['tn']} | FP={r['fp']} |
+| **實際:死亡** | FN={r['fn']} | TP={r['tp']} |
+            """
+        else:
+            second_output = "未選擇第二次微調模型"
+        outputs.append(second_output)
+        return outputs[0], outputs[1], outputs[2]
+    except Exception as e:
+        import traceback
+        error_msg = f"❌ 錯誤:{str(e)}\n\n詳細錯誤訊息:\n{traceback.format_exc()}"
+        return error_msg, "", ""
+# ============================================================================
+# Gradio interface
+# ============================================================================
+with gr.Blocks(title="BERT 二次微調平台", theme=gr.themes.Soft()) as demo:
+    gr.Markdown("""
+    # 🥼 BERT 乳癌存活預測 - 二次微調完整平台
+    ### 🌟 功能特色:
+    - 🎯 第一次微調:從純 BERT 開始訓練
+    - 🔄 第二次微調:基於第一次模型用新數據繼續訓練
+    - 📊 新數據測試:比較三個模型在新數據的表現
+    - 🔮 預測功能:使用訓練好的模型進行預測
+    """)
+    # Tab 1: 第一次微調
+    with gr.Tab("1️⃣ 第一次微調"):
+        with gr.Row():
+            with gr.Column(scale=1):
+                gr.Markdown("### 📤 資料上傳")
+                file_input_first = gr.File(label="上傳訓練數據 CSV", file_types=[".csv"])
+                gr.Markdown("### 🔧 微調方法選擇")
+                tuning_method_first = gr.Radio(
+                    choices=["Full Fine-tuning", "LoRA", "AdaLoRA"],
+                    value="Full Fine-tuning",
+                    label="選擇微調方法"
+                )
+                gr.Markdown("### 🎯 最佳模型選擇")
+                best_metric_first = gr.Dropdown(
+                    choices=["f1", "accuracy", "precision", "recall", "sensitivity", "specificity", "auc"],
+                    value="f1",
+                    label="選擇最佳化指標"
+                )
+                gr.Markdown("### ⚙️ 訓練參數")
+                weight_slider_first = gr.Slider(0.1, 2.0, value=0.8, step=0.1, label="權重倍數")
+                epochs_input_first = gr.Number(value=8, label="訓練輪數")
+                batch_size_input_first = gr.Number(value=16, label="批次大小")
+                lr_input_first = gr.Number(value=2e-5, label="學習率")
+                warmup_input_first = gr.Number(value=200, label="Warmup Steps")
+                # LoRA 參數
+                with gr.Column(visible=False) as lora_params_first:
+                    gr.Markdown("### 🔷 LoRA 參數")
+                    lora_r_first = gr.Slider(4, 64, value=16, step=4, label="LoRA Rank (r)")
+                    lora_alpha_first = gr.Slider(8, 128, value=32, step=8, label="LoRA Alpha")
+                    lora_dropout_first = gr.Slider(0.0, 0.5, value=0.1, step=0.05, label="LoRA Dropout")
+                    lora_modules_first = gr.Textbox(value="query,value", label="目標模組")
+                # AdaLoRA 參數
+                with gr.Column(visible=False) as adalora_params_first:
+                    gr.Markdown("### 🔶 AdaLoRA 參數")
+                    adalora_init_r_first = gr.Slider(4, 64, value=12, step=4, label="初始 Rank")
+                    adalora_target_r_first = gr.Slider(4, 64, value=8, step=4, label="目標 Rank")
+                    adalora_tinit_first = gr.Number(value=0, label="Tinit")
+                    adalora_tfinal_first = gr.Number(value=0, label="Tfinal")
+                    adalora_delta_t_first = gr.Number(value=1, label="Delta T")
+                train_button_first = gr.Button("🚀 開始第一次微調", variant="primary", size="lg")
+            with gr.Column(scale=2):
+                gr.Markdown("### 📊 第一次微調結果")
+                data_info_output_first = gr.Markdown(value="等待訓練...")
+                with gr.Row():
+                    baseline_output_first = gr.Markdown(value="### 純 BERT\n等待訓練...")
+                    finetuned_output_first = gr.Markdown(value="### 第一次微調\n等待訓練...")
+    # Tab 2: 二次微調
+    with gr.Tab("2️⃣ 二次微調"):
+        with gr.Row():
+            with gr.Column(scale=1):
+                gr.Markdown("### 🔄 選擇基礎模型")
+                base_model_dropdown = gr.Dropdown(
+                    label="選擇第一次微調的模型",
+                    choices=["請先進行第一次微調"],
+                    value="請先進行第一次微調"
+                )
+                refresh_base_models = gr.Button("🔄 重新整理模型列表", size="sm")
+                gr.Markdown("### 📤 上傳新訓練數據")
+                file_input_second = gr.File(label="上傳新的訓練數據 CSV", file_types=[".csv"])
+                gr.Markdown("### ⚙️ 訓練參數")
+                gr.Markdown("⚠️ 微調方法將自動繼承第一次微調的方法")
+                best_metric_second = gr.Dropdown(
+                    choices=["f1", "accuracy", "precision", "recall", "sensitivity", "specificity", "auc"],
+                    value="f1",
+                    label="選擇最佳化指標"
+                )
+                weight_slider_second = gr.Slider(0.1, 2.0, value=0.8, step=0.1, label="權重倍數")
+                epochs_input_second = gr.Number(value=5, label="訓練輪數", info="建議比第一次少")
+                batch_size_input_second = gr.Number(value=16, label="批次大小")
+                lr_input_second = gr.Number(value=1e-5, label="學習率", info="建議比第一次小")
+                warmup_input_second = gr.Number(value=100, label="Warmup Steps")
+                train_button_second = gr.Button("🚀 開始二次微調", variant="primary", size="lg")
+            with gr.Column(scale=2):
+                gr.Markdown("### 📊 二次微調結果")
+                data_info_output_second = gr.Markdown(value="等待訓練...")
+                finetuned_output_second = gr.Markdown(value="### 二次微調\n等待訓練...")
+    # Tab 3: 新數據測試
+    with gr.Tab("3️⃣ 新數據測試"):
+        with gr.Row():
+            with gr.Column(scale=1):
+                gr.Markdown("### 📤 上傳測試數據")
+                test_file_input = gr.File(label="上傳測試數據 CSV", file_types=[".csv"])
+                gr.Markdown("### 🎯 選擇要比較的模型")
+                gr.Markdown("可選擇 1-3 個模型進行比較")
+                baseline_test_choice = gr.Radio(
+                    choices=["評估純 BERT", "跳過"],
+                    value="評估純 BERT",
+                    label="純 BERT (Baseline)"
+                )
+                first_model_test_dropdown = gr.Dropdown(
+                    label="第一次微調模型",
+                    choices=["請選擇"],
+                    value="請選擇"
+                )
+                second_model_test_dropdown = gr.Dropdown(
+                    label="第二次微調模型",
+                    choices=["請選擇"],
+                    value="請選擇"
+                )
+                refresh_test_models = gr.Button("🔄 重新整理模型列表", size="sm")
+                test_button = gr.Button("📊 開始測試", variant="primary", size="lg")
+            with gr.Column(scale=2):
+                gr.Markdown("### 📊 新數據測試結果 - 三模型比較")
+                with gr.Row():
+                    baseline_test_output = gr.Markdown(value="### 純 BERT\n等待測試...")
+                    first_test_output = gr.Markdown(value="### 第一次微調\n等待測試...")
+                    second_test_output = gr.Markdown(value="### 二次微調\n等待測試...")
+    # Tab 4: 預測
+    with gr.Tab("4️⃣ 模型預測"):
+        gr.Markdown("""
+        ### 使用訓練好的模型進行預測
+        選擇已訓練的模型,輸入病歷文本進行預測。
+        """)
+        with gr.Row():
+            with gr.Column():
+                model_dropdown = gr.Dropdown(
+                    label="選擇模型",
+                    choices=["請先訓練模型"],
+                    value="請先訓練模型"
+                )
+                refresh_predict_models = gr.Button("🔄 重新整理模型列表", size="sm")
+                text_input = gr.Textbox(
+                    label="輸入病歷文本",
+                    placeholder="請輸入患者的病歷描述(英文)...",
+                    lines=10
+                )
+                predict_button = gr.Button("🔮 開始預測", variant="primary", size="lg")
+            with gr.Column():
+                gr.Markdown("### 預測結果比較")
+                baseline_prediction_output = gr.Markdown(label="未微調 BERT", value="等待預測...")
+                finetuned_prediction_output = gr.Markdown(label="微調 BERT", value="等待預測...")
+    # Tab 5: 使用說明
+    with gr.Tab("📖 使用說明"):
+        gr.Markdown("""
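+        ## 🔄 Two-Stage Fine-Tuning Workflow
+        ### Step 1: First fine-tuning
+        1. Upload training dataset A (CSV with `Text` and `label` columns)
+        2. Choose a fine-tuning method (Full Fine-tuning / LoRA / AdaLoRA)
+        3. Adjust the training parameters
+        4. Start training
+        5. The system automatically compares base BERT with the first fine-tuned model
+        ### Step 2: Second fine-tuning
+        1. Select a model produced by the first fine-tuning
+        2. Upload the new training dataset B
+        3. Adjust the training parameters (fewer epochs and a smaller learning rate are recommended)
+        4. Start training (the method is inherited from the first run)
+        5. The model continues learning from the first run's weights
+        ### Step 3: Testing on new data
+        1. Upload test dataset C
+        2. Select the models to compare (base BERT / first / second)
+        3. The system shows the selected models' results side by side
+        ### Step 4: Prediction
+        1. Select any trained model
+        2. Enter the medical record text
+        3. Review the prediction
+        ## 🎯 Fine-Tuning Methods
+        | Method | Training speed | Memory | Performance |
+        |------|---------|--------|------|
+        | **Full Fine-tuning** | 1x (baseline) | High | Best |
+        | **LoRA** | 3-5x faster | Low | Good |
+        | **AdaLoRA** | 3-5x faster | Low | Good |
+        ## 💡 Recommendations for the Second Fine-Tuning
+        ### Training parameters:
+        - **Epochs**: 3-5 for the second run (the first run typically uses 8-10)
+        - **Learning Rate**: 1e-5 for the second run (the first run typically uses 2e-5)
+        - **Warmup Steps**: about half of the first run's value
+        ### Use cases:
+        1. **Domain adaptation**: general medical data first, then data from a specific hospital
+        2. **Incremental learning**: add new patient records as they accumulate over time
+        3. **Data scarcity**: train on a large related dataset first, then fine-tune on a small target dataset
+        ## ⚠️ Notes
+        - The CSV must contain the `Text` and `label` columns
+        - The second fine-tuning automatically uses the first run's method
+        - Use a smaller learning rate for the second run to avoid overwriting what the model has already learned
+        - Testing on new data can evaluate up to 3 models at once
+        ## 📊 Metrics
+        - **F1 Score**: harmonic mean of precision and recall
+        - **Accuracy**: overall fraction of correct predictions
+        - **Precision**: fraction of predicted deaths that were actual deaths
+        - **Recall/Sensitivity**: fraction of actual deaths that were correctly identified
+        - **Specificity**: fraction of actual survivors that were correctly identified
+        - **AUC**: area under the ROC curve, a threshold-independent measure of classification ability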
+        """)
+    # ==================== Event bindings ====================
+    # First fine-tuning: show or hide the parameter panels (AdaLoRA shows the LoRA panel as well)
+    def update_first_params(method):
+        if method == "LoRA":
+            return gr.update(visible=True), gr.update(visible=False)
+        elif method == "AdaLoRA":
+            return gr.update(visible=True), gr.update(visible=True)
+        else:
+            return gr.update(visible=False), gr.update(visible=False)
+    tuning_method_first.change(
+        fn=update_first_params,
+        inputs=[tuning_method_first],
+        outputs=[lora_params_first, adalora_params_first]
+    )
+    # 第一次微調按鈕
+    train_button_first.click(
+        fn=train_first_wrapper,
+        inputs=[
+            file_input_first, tuning_method_first, weight_slider_first,
+            epochs_input_first, batch_size_input_first, lr_input_first,
+            warmup_input_first, best_metric_first,
+            lora_r_first, lora_alpha_first, lora_dropout_first, lora_modules_first,
+            adalora_init_r_first, adalora_target_r_first, adalora_tinit_first,
+            adalora_tfinal_first, adalora_delta_t_first
+        ],
+        outputs=[data_info_output_first, baseline_output_first, finetuned_output_first]
+    )
+    # 刷新基礎模型列表
+    def refresh_base_models_list():
+        choices = get_first_finetuning_models()
+        return gr.update(choices=choices, value=choices[0])
+    refresh_base_models.click(
+        fn=refresh_base_models_list,
+        outputs=[base_model_dropdown]
+    )
+    # 二次微調按鈕
+    train_button_second.click(
+        fn=train_second_wrapper,
+        inputs=[
+            base_model_dropdown, file_input_second, weight_slider_second,
+            epochs_input_second, batch_size_input_second, lr_input_second,
+            warmup_input_second, best_metric_second
+        ],
+        outputs=[data_info_output_second, finetuned_output_second]
+    )
+    # Refresh the model lists on the test tab
+    def refresh_test_models_list():
+        first_models = get_first_finetuning_models()
+        if first_models[0] == "請先進行第一次微調":
+            first_models = []
+        # Collect second-stage models; the registry file does not exist before the first training run
+        second_models = []
+        if os.path.exists('./saved_models_list.json'):
+            with open('./saved_models_list.json', 'r') as f:
+                models_list = json.load(f)
+            second_models = [m['model_path'] for m in models_list if m.get('is_second_finetuning', False)]
+        # Keep the "請選擇" placeholder in both choice lists so the default value is always a valid choice
+        return (
+            gr.update(choices=["請選擇"] + first_models, value="請選擇"),
+            gr.update(choices=["請選擇"] + second_models, value="請選擇")
+        )
+    refresh_test_models.click(
+        fn=refresh_test_models_list,
+        outputs=[first_model_test_dropdown, second_model_test_dropdown]
+    )
+    # 測試按鈕
+    test_button.click(
+        fn=test_new_data_wrapper,
+        inputs=[test_file_input, baseline_test_choice, first_model_test_dropdown, second_model_test_dropdown],
+        outputs=[baseline_test_output, first_test_output, second_test_output]
+    )
+    # 刷新預測模型列表
+    def refresh_predict_models_list():
+        choices = get_available_models()
+        return gr.update(choices=choices, value=choices[0])
+    refresh_predict_models.click(
+        fn=refresh_predict_models_list,
+        outputs=[model_dropdown]
+    )
+    # 預測按鈕
+    predict_button.click(
+        fn=predict_text,
+        inputs=[model_dropdown, text_input],
+        outputs=[baseline_prediction_output, finetuned_prediction_output]
+    )
+if __name__ == "__main__":
+    demo.launch()
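
Several functions in this file open `./saved_models_list.json` directly, and only some of them check that the file exists first. A shared loader could centralize that check. The following is a minimal sketch, assuming the registry is a JSON list of dicts carrying the `model_path` and `is_second_finetuning` keys used above; the names `load_models_list` and `split_by_stage` are hypothetical and not part of the commit.

```python
import json
import os

# Same registry path the app uses; the helpers below are illustrative only.
MODELS_LIST_FILE = "./saved_models_list.json"


def load_models_list(path=MODELS_LIST_FILE):
    """Return the saved-model registry as a list, or [] if it is missing or unreadable."""
    if not os.path.exists(path):
        return []
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (json.JSONDecodeError, OSError):
        return []
    # Guard against a registry file that holds something other than a list.
    return data if isinstance(data, list) else []


def split_by_stage(models_list):
    """Split registry entries into (first-stage paths, second-stage paths)."""
    first = [m["model_path"] for m in models_list
             if not m.get("is_second_finetuning", False)]
    second = [m["model_path"] for m in models_list
              if m.get("is_second_finetuning", False)]
    return first, second
```

With helpers like these, each dropdown refresh callback could call `split_by_stage(load_models_list())` and prepend its own placeholder entry, so a missing registry file would yield empty lists instead of raising `FileNotFoundError`.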