Commits · nsfwalex/whisper-transcribe-new

Update speaker diarization model and refactor WhisperTranscriber alignment process. Introduce align_timestamp method for improved word-level alignment and streamline segment handling. Adjusted print statements for clarity and removed unnecessary comments.

28823e9

liuyang committed on 9 days ago

fix params

5d33cf4

liuyang committed on 27 days ago

test word_timestamps=True,

af2c324

liuyang committed on 27 days ago

test word_timestamps=True,

f932439

liuyang committed on 27 days ago

fix audio param

63a373b

liuyang committed on 27 days ago

fix audio param

ba3077f

liuyang committed on 27 days ago

log

1d18680

liuyang committed on 27 days ago

Remove 'batch_size' from fw_kwargs in WhisperTranscriber to streamline transcription options.

976003d

liuyang committed on 27 days ago

Enhance audio transcription by adding support for 'faster_whisper' engine alongside 'whisperx'. Implement lazy loading for both transcription models and improve handling of transcribe options. Update transcribe_full_audio method to accommodate engine selection and adjust alignment process accordingly.

25a2b6b

liuyang committed on 27 days ago

disable deletion for test

57aeeb0

liuyang committed on 28 days ago

modify out dir

c998073

liuyang committed on 28 days ago

log audio path

245f6e3

liuyang committed on 28 days ago

segments log

6f9bd28

liuyang committed on 28 days ago

add log

fe1fbc5

liuyang committed on Oct 10

modify preload logic

ae73284

liuyang committed on Oct 10

switch to whisperX

d36869b

liuyang committed on Oct 10

preload

62ed41c

liuyang committed on Oct 10

cuda check

b5b0753

liuyang committed on Oct 10

remove cuda check

d3ed5e3

liuyang committed on Oct 10

restore to whisper

726a091

liuyang committed on Oct 10

switch transcribing back to faster_whisper

12b670c

liuyang committed on Oct 10

add log

4dc1641

liuyang committed on Oct 10

transcribe

8f6476d

liuyang committed on Oct 10

model load

80e245d

liuyang committed on Oct 10

preload

c088b23

liuyang committed on Oct 10

preload

c18172e

liuyang committed on Oct 10

cuda on init

02f099e

liuyang committed on Oct 10

switch to whisperX

3de05cb

liuyang committed on Oct 10

fix typo

d2ef882

liuyang committed on Oct 8

clean up

abc1edf

liuyang committed on Oct 7

modify params

c97acaf

liuyang committed on Oct 7

Refactor audio processing: Simplified the handling of audio chunks in prepare_and_save_audio_for_model and updated preprocess_from_task_json to support both single and multiple chunk tasks, enhancing flexibility in audio preparation.

6c3a671

liuyang committed on Oct 7

fix field

64397b6

liuyang committed on Oct 7

fix field key

9e14752

liuyang committed on Oct 7

Refactor transcription methods to return results: Updated the transcribe_chunk and transcribe_segments methods to return their results instead of processing them directly, improving the flow of data handling in the WhisperTranscriber class.

646c8e8

liuyang committed on Sep 20

update params

ba746a9

liuyang committed on Sep 20

add fields

9b80850

liuyang committed on Sep 20

fix typo

3dae8f9

liuyang committed on Sep 20

add fields

5d0a1ef

liuyang committed on Sep 20

fix bug

d45c437

liuyang committed on Sep 20

fix bug

9fc1e97

liuyang committed on Sep 20

download all models on startup

d29acc5

liuyang committed on Sep 19

fix value

6bea290

liuyang committed on Sep 19

Add audio diarization task to Gradio interface: Introduced a new button and function for audio diarization, allowing users to process audio with speaker separation. Updated existing button labels for clarity.

e79159f

liuyang committed on Sep 19

Refactor model management and transcription process: Introduced a model registry for easier management of Whisper models, added functionality to download models on startup, and streamlined the audio processing pipeline to support both chunk and segment transcriptions with improved error handling and cleanup.

e3d9c9e

liuyang committed on Sep 19

unmatched_diarization_segments

a4568c6

liuyang committed on Sep 16

disable clip_timestamps

b68d580

liuyang committed on Sep 16

disable unmatched_diarization_segments

f425ecd

liuyang committed on Sep 16

update threshold

78d61ea

liuyang committed on Sep 16

try use diarization as clip_timestamp

0b6cc7c

liuyang committed on Sep 16
