haeylee/ssl_ft_pron

Feature Extraction

self-supervised-learning

pronunciation-assessment

haeylee committed on Sep 24

Commit ac652a1 · verified · 1 Parent(s): 566ddfd

Update README.md

Files changed (1)

README.md +0 -39

README.md CHANGED

@@ -57,21 +57,13 @@ Three strategies are provided per backbone:
 ---
 ## Uses
-### Direct Use
 - Research/prototyping for **pronunciation scoring** and **representation analysis** (e.g., PCA on hidden states).
 - Feature extraction for downstream APA tasks.
-### Downstream Use
-- Integrate APA scores into CALL and assessment tools.
-- Use **CTC** variants in ASR-aligned pipelines; use **General/Freeze** for regression of APA scores.
 ---
 ## Bias, Risks, and Limitations
 - Trained/evaluated on **Speechocean762** (read English by L2 speakers). Generalization to other languages/speaking styles is not guaranteed.
 - APA relies on subjective human scores; apply domain calibration and monitor subgroup performance.
 **Recommendation:** Validate on in-domain data; report uncertainty and subgroup metrics.
 ---
@@ -178,37 +170,6 @@ python train/ctc.py \
 - **Test set:** Speechocean762 (held-out split prepared by `preprocess_dataset.py`)
 - **Factors:** Backbone (Wav2Vec2 / HuBERT / WavLM) × strategy (CTC / General / Freeze)
 - **Metric:** `pearsonr` (Pearson correlation coefficient, PCC) for Accuracy, Fluency, Prosody, and Total.
-### Results (PCC highlights)
-- **Best Total PCC (paper):** ~**0.745** (HuBERT xlarge ls960-ft; strong results for CTC/Freeze variants).
-- Wav2Vec2-large/960h show strong **Fluency**/**Total** under General.
-- Full table is in the paper and GitHub README.
-#### Summary
-- **CTC** benefits ASR-aligned objectives.
-- **General/Freeze** directly regress APA scores and support representation analysis (e.g., PCA).
----
-## Model Examination (Intrinsic Analysis)
-PCA on hidden representations reveals distinct geometries:
-- **Wav2Vec2:** conical (score continuity)
-- **HuBERT:** V-shape (two-axis decision)
-- **WavLM:** S-shape (diverse scoring factors)
----
-## Technical Specifications
-### Architecture & Objective
-- Backbones: Wav2Vec2.0 / HuBERT / WavLM
-- Objectives:
-  - **CTC:** ASR-style CTC head
-  - **General/Freeze:** regression head predicting 4 APA scores
-### Compute Infrastructure
-- See saved configs/logs per run (`trainer_state.json`, `training_args.bin`, `args.json`).
 ---
 ## Citation
