Instructions to use kyutai/stt-1b-en_fr with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Moshi
How to use kyutai/stt-1b-en_fr with Moshi:
```shell
# pip install moshi
# Run the interactive web server
python -m moshi.server --hf-repo "kyutai/stt-1b-en_fr"
# Then open https://localhost:8998 in your browser
```
```python
# pip install moshi
import torch
from moshi.models import loaders

# Load checkpoint info from HuggingFace
checkpoint = loaders.CheckpointInfo.from_hf_repo("kyutai/stt-1b-en_fr")

# Load the Mimi audio codec (this snippet assumes a CUDA GPU)
mimi = checkpoint.get_mimi(device="cuda")
mimi.set_num_codebooks(8)

# Encode audio (24kHz, mono); here, 10 seconds of random noise
wav = torch.randn(1, 1, 24000 * 10)  # [batch, channels, samples]
with torch.no_grad():
    codes = mimi.encode(wav.cuda())  # discrete codes, [batch, codebooks, frames]
    decoded = mimi.decode(codes)     # waveform reconstructed from the codes
```
- Notebooks
- Google Colab
- Kaggle
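The Moshi Python snippet encodes ten seconds of random noise. With real recordings, the input has to be 24 kHz mono, and it helps to know what shape `mimi.encode` should return. The sketch below is plain Python with no moshi dependency. Its constants assume Mimi's published 24 kHz sample rate and 12.5 Hz frame rate (1920 samples per codec frame), and `expected_code_shape`, `to_mono`, and `frame_chunks` are hypothetical helpers for illustration, not moshi functions. Resampling from other rates is out of scope here.

```python
"""Hypothetical helpers for preparing audio for Mimi (not part of the moshi API).

Assumes 24 kHz mono input and a 12.5 Hz codec frame rate,
i.e. 1920 samples per codec frame.
"""
import math

SAMPLE_RATE = 24_000  # Hz, assumed Mimi input rate
FRAME_RATE = 12.5     # codec frames per second (assumed)
FRAME_SIZE = int(SAMPLE_RATE / FRAME_RATE)  # 1920 samples per frame


def expected_code_shape(num_samples, batch=1, num_codebooks=8):
    """Return the [batch, codebooks, frames] shape we expect from encode().

    Assumes a partial trailing frame still yields one extra code frame.
    """
    frames = math.ceil(num_samples / FRAME_SIZE)
    return (batch, num_codebooks, frames)


def to_mono(channels):
    """Average equal-length per-channel sample lists into one mono channel."""
    return [sum(samples) / len(samples) for samples in zip(*channels)]


def frame_chunks(samples):
    """Split mono samples into FRAME_SIZE chunks, zero-padding the last one."""
    chunks = []
    for start in range(0, len(samples), FRAME_SIZE):
        chunk = list(samples[start:start + FRAME_SIZE])
        chunk.extend([0.0] * (FRAME_SIZE - len(chunk)))
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    # 10 s at 24 kHz, matching the torch.randn example above
    print(expected_code_shape(SAMPLE_RATE * 10))  # (1, 8, 125)

    # A short stereo clip: downmix, then cut into codec-frame-sized chunks
    stereo = [[0.5] * 2000, [-0.5] * 2000]
    chunks = frame_chunks(to_mono(stereo))
    print(len(chunks), len(chunks[-1]))  # 2 1920
```

Fixed-size chunks of one codec frame are a convenient unit if you later feed audio incrementally rather than as one tensor; converting a chunk list to the `[batch, channels, samples]` tensor layout used above is left to `torch`.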
Community discussions:
- #11 GGUF + pure-C++ runtime in CrispASR — Kyutai STT (Mimi codec + LM), opened 23 days ago by cstr
- #10 `AutoProcessor.from_pretrained(model_id, language="pa", task="transcribe")` - Error - Transformers does not recognize this architecture - model type `stt`, opened 8 months ago by jssaluja
- #9 What is the tokenization and alignment approach? i.e. collation, opened 8 months ago by RonanMcGovern
- #8 Improve model card: Add pipeline tag, paper link, and sample usage, opened 9 months ago by nielsr
- #4 Update README.md, opened 11 months ago by huhe-2024
- #2 Local Installation Video and Testing - Step by Step, opened 11 months ago by fahdmirzac
- #1 Thank you!, opened 11 months ago by ndgold