Commit ac652a1 (verified) by haeylee · 1 parent: 566ddfd

Update README.md

Files changed (1)
  1. README.md +0 -39
README.md CHANGED
@@ -57,21 +57,13 @@ Three strategies are provided per backbone:
  ---
 
  ## Uses
-
- ### Direct Use
  - Research/prototyping for **pronunciation scoring** and **representation analysis** (e.g., PCA on hidden states).
  - Feature extraction for downstream APA tasks.
-
- ### Downstream Use
- - Integrate APA scores into CALL and assessment tools.
- - Use **CTC** variants in ASR-aligned pipelines; use **General/Freeze** for regression of APA scores.
-
  ---
 
  ## Bias, Risks, and Limitations
  - Trained/evaluated on **Speechocean762** (read English by L2 speakers). Generalization to other languages/speaking styles is not guaranteed.
  - APA relies on subjective human scores; apply domain calibration and monitor subgroup performance.
-
  **Recommendation:** Validate on in-domain data; report uncertainty and subgroup metrics.
 
  ---
@@ -178,37 +170,6 @@ python train/ctc.py \
  - **Test set:** Speechocean762 (held-out split prepared by `preprocess_dataset.py`)
  - **Factors:** Backbone (Wav2Vec2 / HuBERT / WavLM) × strategy (CTC / General / Freeze)
  - **Metric:** `pearsonr` (Pearson correlation coefficient, PCC) for Accuracy, Fluency, Prosody, and Total.
-
- ### Results (PCC highlights)
- - **Best Total PCC (paper):** ~**0.745** (HuBERT xlarge ls960-ft; strong results for CTC/Freeze variants).
- - Wav2Vec2-large/960h show strong **Fluency**/**Total** under General.
- - Full table is in the paper and GitHub README.
-
- #### Summary
- - **CTC** benefits ASR-aligned objectives.
- - **General/Freeze** directly regress APA scores and support representation analysis (e.g., PCA).
-
- ---
-
- ## Model Examination (Intrinsic Analysis)
- PCA on hidden representations reveals distinct geometries:
- - **Wav2Vec2:** conical (score continuity)
- - **HuBERT:** V-shape (two-axis decision)
- - **WavLM:** S-shape (diverse scoring factors)
-
- ---
-
- ## Technical Specifications
-
- ### Architecture & Objective
- - Backbones: Wav2Vec2.0 / HuBERT / WavLM
- - Objectives:
-   - **CTC:** ASR-style CTC head
-   - **General/Freeze:** regression head predicting 4 APA scores
-
- ### Compute Infrastructure
- - See saved configs/logs per run (`trainer_state.json`, `training_args.bin`, `args.json`).
-
  ---
 
  ## Citation
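
Note on the retained **Metric** line: the README reports one PCC per score dimension (Accuracy, Fluency, Prosody, Total). The sketch below illustrates that computation in plain Python so it runs without dependencies. It is not the repository's evaluation code: the helper names `pearson` and `pcc_per_dimension`, the dictionary layout, and the toy scores are all assumptions made for illustration. The README's `pearsonr` presumably refers to `scipy.stats.pearsonr`, which computes the same statistic.

```python
import math

# Score dimensions named in the README's Metric line (lowercased keys are an assumption).
DIMENSIONS = ("accuracy", "fluency", "prosody", "total")


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def pcc_per_dimension(predictions, references):
    """Map each APA dimension to the PCC between predicted and human scores."""
    return {dim: pearson(predictions[dim], references[dim]) for dim in DIMENSIONS}


if __name__ == "__main__":
    # Toy, made-up scores for four utterances (not Speechocean762 data).
    human = {dim: [3.0, 5.0, 7.0, 9.0] for dim in DIMENSIONS}
    model = {
        "accuracy": [2.5, 5.5, 6.5, 9.5],
        "fluency": [4.0, 4.5, 8.0, 8.5],
        "prosody": [3.0, 6.0, 6.0, 9.0],
        "total": [3.5, 5.0, 7.5, 9.0],
    }
    for dim, value in pcc_per_dimension(model, human).items():
        print(f"{dim}: PCC = {value:.3f}")
```

Because PCC is invariant to positive linear rescaling of either sequence, a high value indicates that predictions track the ordering and spacing of the human scores, not that they match them in absolute terms.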
 