Instructions to use junnei/gemma-3-4b-it-speech with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use junnei/gemma-3-4b-it-speech with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="junnei/gemma-3-4b-it-speech", trust_remote_code=True)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("junnei/gemma-3-4b-it-speech", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
speech?
Hi! Sorry for the stupid question, but does this generate speech?
Hi! This is an audio adapter model trained for ASR / AST tasks. It's an ongoing project, though, so it will be taken down and re-uploaded soon.
Wow, very cool, we'll wait for instructions on how to hook up such a miracle XD
Hi Sir, @junnei !
I was looking into ASR models and I want to train one specifically for English and Telugu (an Indian language). I would love to try training it the same way you did. If you could give me some guidance or point me to a helpful resource, I'd really appreciate it.
Hi @jsbeaudry @salmankhanpm.
Here is a new update: a fine-tuning Python file you can use: Link
Let me know if there are any issues!
Thank you @junnei . I will let you know, good job.
Can you confirm whether junnei/gemma-3-4b-it-speech is a fresh base model, with only the Phi-4-MM audio encoder attached to a Gemma 3 model and no Korean ASR/AST fine-tuning done on top of this stack?
I plan to fine-tune this model on multiple multilingual ASR/AST datasets, so I want to make sure junnei/gemma-3-4b-it-speech is a fresh base model with no Korean fine-tuning, since that would interfere with my multilingual fine-tuning.
Additionally, could you share whether you saw improvements with scale? How did your evals look for models trained at 1B, 4B, 12B, and 27B?
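For anyone comparing ASR results across model sizes, one commonly used metric is word error rate (WER). Below is a minimal sketch, assuming lowercase normalization and whitespace tokenization. Both choices, and the function name `word_error_rate`, are illustrative assumptions and not something confirmed in this thread or used by the model author.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is lowercased and split on whitespace. Real ASR evals
    usually apply a more careful, language-specific normalizer first.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

WER can exceed 1.0 when the hypothesis contains many inserted words, so it is not a percentage bounded at 100. For AST (speech translation) a translation metric would be needed instead; this sketch only covers the ASR side.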