Update README.md
README.md
CHANGED
@@ -57,21 +57,13 @@ Three strategies are provided per backbone:
 ---
 
 ## Uses
-
-### Direct Use
 - Research/prototyping for **pronunciation scoring** and **representation analysis** (e.g., PCA on hidden states).
 - Feature extraction for downstream APA tasks.
-
-### Downstream Use
-- Integrate APA scores into CALL and assessment tools.
-- Use **CTC** variants in ASR-aligned pipelines; use **General/Freeze** for regression of APA scores.
-
 ---
 
 ## Bias, Risks, and Limitations
 - Trained/evaluated on **Speechocean762** (read English by L2 speakers). Generalization to other languages/speaking styles is not guaranteed.
 - APA relies on subjective human scores; apply domain calibration and monitor subgroup performance.
-
 **Recommendation:** Validate on in-domain data; report uncertainty and subgroup metrics.
 
 ---
@@ -178,37 +170,6 @@ python train/ctc.py \
 - **Test set:** Speechocean762 (held-out split prepared by `preprocess_dataset.py`)
 - **Factors:** Backbone (Wav2Vec2 / HuBERT / WavLM) × strategy (CTC / General / Freeze)
 - **Metric:** `pearsonr` (Pearson correlation coefficient, PCC) for Accuracy, Fluency, Prosody, and Total.
-
-### Results (PCC highlights)
-- **Best Total PCC (paper):** ~**0.745** (HuBERT xlarge ls960-ft; strong results for CTC/Freeze variants).
-- Wav2Vec2-large/960h show strong **Fluency**/**Total** under General.
-- Full table is in the paper and GitHub README.
-
-#### Summary
-- **CTC** benefits ASR-aligned objectives.
-- **General/Freeze** directly regress APA scores and support representation analysis (e.g., PCA).
-
----
-
-## Model Examination (Intrinsic Analysis)
-PCA on hidden representations reveals distinct geometries:
-- **Wav2Vec2:** conical (score continuity)
-- **HuBERT:** V-shape (two-axis decision)
-- **WavLM:** S-shape (diverse scoring factors)
-
----
-
-## Technical Specifications
-
-### Architecture & Objective
-- Backbones: Wav2Vec2.0 / HuBERT / WavLM
-- Objectives:
-  - **CTC:** ASR-style CTC head
-  - **General/Freeze:** regression head predicting 4 APA scores
-
-### Compute Infrastructure
-- See saved configs/logs per run (`trainer_state.json`, `training_args.bin`, `args.json`).
-
 ---
 
 ## Citation
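The **Metric** line kept by this change names `pearsonr` (PCC) for Accuracy, Fluency, Prosody, and Total. A minimal sketch of that computation follows, assuming `pearsonr` denotes an ordinary Pearson correlation (as in `scipy.stats.pearsonr`). The pure-Python `pearson_r` helper, the `pcc_per_aspect` function, and the toy scores are hypothetical stand-ins for illustration, not the repository's evaluation code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("PCC is undefined for constant input")
    return cov / norm


def pcc_per_aspect(human, predicted):
    """Compute one PCC per score aspect (keys are assumed to match)."""
    return {aspect: pearson_r(human[aspect], predicted[aspect]) for aspect in human}


# Toy, made-up scores for four utterances (not real Speechocean762 data).
human = {
    "accuracy": [8, 6, 9, 4],
    "fluency": [7, 6, 9, 5],
    "prosody": [7, 5, 8, 5],
    "total": [8, 6, 9, 4],
}
predicted = {
    "accuracy": [7.5, 6.2, 8.8, 4.9],
    "fluency": [7.1, 5.5, 8.4, 5.6],
    "prosody": [6.8, 5.9, 7.7, 5.1],
    "total": [7.6, 6.1, 8.9, 4.7],
}

if __name__ == "__main__":
    for aspect, pcc in pcc_per_aspect(human, predicted).items():
        print(f"{aspect}: PCC = {pcc:.3f}")
```

In practice the `pearson_r` helper could be swapped for `scipy.stats.pearsonr(x, y)`, which also returns a p-value alongside the coefficient.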