Commit a3f055a
1 Parent(s): 67be3d4
Update README.md
README.md
CHANGED
@@ -24,9 +24,17 @@ tags:
 
 # Bark
 
-
+Bark is a transformer-based text-to-audio model created by [Suno](https://www.suno.ai).
+Bark can generate highly realistic, multilingual speech as well as other audio - including music,
+background noise and simple sound effects. The model can also produce nonverbal
+communications like laughing, sighing and crying. To support the research community,
+we are providing access to pretrained model checkpoints ready for inference.
 
-The original github repo and model card can be found [here](https://github.com/suno-ai/bark)
+The original github repo and model card can be found [here](https://github.com/suno-ai/bark).
+
+This model is meant for research purposes only.
+The model output is not censored and the authors do not endorse the opinions in the generated content.
+Use at your own risk.
 
 The following is additional information about the models released here.
 
@@ -80,9 +88,9 @@ Bark is a series of three transformer models that turn text into audio.
 ### Architecture
 | Model | Parameters | Attention | Output Vocab size |
 |:-------------------------:|:----------:|------------|:-----------------:|
-| Text to semantic tokens | 80 M | Causal | 10,000 |
-| Semantic to coarse tokens | 80 M | Causal | 2x 1,024 |
-| Coarse to fine tokens | 80 M | Non-causal | 6x 1,024 |
+| Text to semantic tokens | 80/300 M | Causal | 10,000 |
+| Semantic to coarse tokens | 80/300 M | Causal | 2x 1,024 |
+| Coarse to fine tokens | 80/300 M | Non-causal | 6x 1,024 |
 
 
 ### Release date
@@ -90,9 +98,8 @@ April 2023
 
 ## Broader Implications
 We anticipate that this model's text to audio capabilities can be used to improve accessibility tools in a variety of languages.
-Straightforward improvements will allow models to run faster than realtime, rendering them useful for applications such as virtual assistants.
 
 While we hope that this release will enable users to express their creativity and build applications that are a force
 for good, we acknowledge that any text to audio model has the potential for dual use. While it is not straightforward
-to voice clone known people with Bark,
+to voice clone known people with Bark, it can still be used for nefarious purposes. To further reduce the chances of unintended use of Bark,
 we also release a simple classifier to detect Bark-generated audio with high accuracy (see notebooks section of the main repository).