speech?

by MikuXvDev - opened Mar 22, 2025

Discussion

MikuXvDev

Mar 22, 2025

Hi! Sorry for the stupid question, but does this generate speech?

junnei

Owner Mar 23, 2025

Hi! This is an audio adapter model trained for ASR / AST tasks. It's an ongoing project, though, so it will be taken down and re-uploaded soon.

MikuXvDev

Mar 23, 2025

wow, very cool, we'll wait for instructions on how to screw such a miracle XD

jsbeaudry

Apr 4, 2025

Hi @junnei , Can you please provide us with some guidance or a Colab URL so we can finetune this model for other low-resource languages?

salmankhanpm

Apr 5, 2025

Hi Sir, @junnei !

I was looking into ASR models and I want to train one specifically for English and Telugu (Indian language). I would love to try training it the same way you did. If you could help me with some guidance or point me to a resource that could help, I’d really appreciate it.

junnei

Owner Apr 10, 2025

Hi @jsbeaudry @salmankhanpm .
Here is the updated fine-tuning Python file you can use: Link
Let me know if there are any issues!

jsbeaudry

Apr 10, 2025

Thank you @junnei . I will let you know, good job.

StephennFernandes

May 2, 2025

@junnei

Can you confirm whether junnei/gemma-3-4b-it-speech is a fresh base model, with only the Phi-4-MM audio encoder attached to a Gemma 3 model and no Korean ASR/AST fine-tuning done on top of this stack?
I plan to fine-tune this model on multiple multilingual ASR/AST datasets, so I want to be sure it has had no Korean fine-tuning, as that would interfere with my multilingual fine-tuning.

Additionally, could you share whether you saw improvements with scale? How did your evals compare across the 1B, 4B, 12B, and 27B models?
