Commit · 5388d50
1 Parent(s): 3b66fd5
update README.md
README.md CHANGED
|
@@ -18,11 +18,16 @@ tags:
|
|
| 18 |
- Speaker Recognition
|
| 19 |
---
|
| 20 |
|
| 21 |
-
## SpeechT5 SID
|
| 22 |
|
| 23 |
| [**Github**](https://github.com/microsoft/SpeechT5) | [**Huggingface**](https://huggingface.co/mechanicalsea/speecht5-sid) |
|
| 24 |
|
| 25 |
-
This manifest is an attempt to recreate the Speaker Identification recipe used for training [SpeechT5](https://aclanthology.org/2022.acl-long.393). This manifest was constructed using [
|
| 26 |
|
| 27 |
### Requirements
|
| 28 |
|
|
@@ -35,8 +40,9 @@ This manifest is an attempt to recreate the Speaker Identification recipe used f
|
|
| 35 |
|
| 36 |
### Model and Results
|
| 37 |
|
| 38 |
-
- [`speecht5_sid.pt`](.) are reimplemented Speaker Identification fine-tuning on the released manifest **but with a smaller batch size** (Ensure the manifest is ok).
|
| 39 |
-
- `results` are reproduced by the released fine-tuned model.
|
| 40 |
|
| 41 |
### Reference
|
| 42 |
|
|
|
|
| 18 |
- Speaker Recognition
|
| 19 |
---
|
| 20 |
|
| 21 |
+
## SpeechT5 SID
|
| 22 |
|
| 23 |
| [**Github**](https://github.com/microsoft/SpeechT5) | [**Huggingface**](https://huggingface.co/mechanicalsea/speecht5-sid) |
|
| 24 |
|
| 25 |
+
This manifest is an attempt to recreate the Speaker Identification recipe used for training [SpeechT5](https://aclanthology.org/2022.acl-long.393). This manifest was constructed using [VoxCeleb1](https://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html) containing over 100,000 utterances for 1,251 celebrities. The identification split are given as follows.
|
| 26 |
+
|
| 27 |
+
| | train | valid | test |
|
| 28 |
+
| ------------------- | ------: | ----: | ----: |
|
| 29 |
+
| **# of speakers** | 1,251 | 1,251 | 1,251 |
|
| 30 |
+
| **# of utterances** | 138,361 | 6,904 | 8,251 |
|
| 31 |
|
| 32 |
### Requirements
|
| 33 |
|
|
|
|
| 40 |
|
| 41 |
### Model and Results
|
| 42 |
|
| 43 |
+
- [`speecht5_sid.pt`](./speecht5_sid.pt) are reimplemented Speaker Identification fine-tuning on the released manifest **but with a smaller batch size** (Ensure the manifest is ok).
|
| 44 |
+
- `results` are reproduced by the released fine-tuned model and the accuracy is $96.194\%$.
|
| 45 |
+
- `log` is the tensorboard log of fine-tuning the released model.
|
| 46 |
|
| 47 |
### Reference
|
| 48 |
|