Commit History

Update speaker diarization model and refactor WhisperTranscriber alignment process. Introduce align_timestamp method for improved word-level alignment and streamline segment handling. Adjust print statements for clarity and remove unnecessary comments.
28823e9

liuyang committed on

fix params
5d33cf4

liuyang committed on

test word_timestamps=True
af2c324

liuyang committed on

test word_timestamps=True
f932439

liuyang committed on

fix audio param
63a373b

liuyang committed on

fix audio param
ba3077f

liuyang committed on

log
1d18680

liuyang committed on

Remove 'batch_size' from fw_kwargs in WhisperTranscriber to streamline transcription options.
976003d

liuyang committed on

Enhance audio transcription by adding support for 'faster_whisper' engine alongside 'whisperx'. Implement lazy loading for both transcription models and improve handling of transcribe options. Update transcribe_full_audio method to accommodate engine selection and adjust alignment process accordingly.
25a2b6b

liuyang committed on

disable deletion for test
57aeeb0

liuyang committed on

modify out dir
c998073

liuyang committed on

log audio path
245f6e3

liuyang committed on

segments log
6f9bd28

liuyang committed on

add log
fe1fbc5

liuyang committed on

modify preload logic
ae73284

liuyang committed on

switch to whisperX
d36869b

liuyang committed on

preload
62ed41c

liuyang committed on

cuda check
b5b0753

liuyang committed on

remove cuda check
d3ed5e3

liuyang committed on

restore to whisper
726a091

liuyang committed on

switch transcribing back to faster_whisper
12b670c

liuyang committed on

add log
4dc1641

liuyang committed on

transcribe
8f6476d

liuyang committed on

model load
80e245d

liuyang committed on

preload
c088b23

liuyang committed on

preload
c18172e

liuyang committed on

cuda on init
02f099e

liuyang committed on

switch to whisperX
3de05cb

liuyang committed on

fix typo
d2ef882

liuyang committed on

clean up
abc1edf

liuyang committed on

modify params
c97acaf

liuyang committed on

Refactor audio processing: simplify the handling of audio chunks in prepare_and_save_audio_for_model and update preprocess_from_task_json to support both single-chunk and multi-chunk tasks, making audio preparation more flexible.
6c3a671

liuyang committed on

fix field
64397b6

liuyang committed on

fix field key
9e14752

liuyang committed on

Refactor transcription methods to return results: the transcribe_chunk and transcribe_segments methods now return their results instead of processing them directly, improving data flow in the WhisperTranscriber class.
646c8e8

liuyang committed on

update params
ba746a9

liuyang committed on

add fields
9b80850

liuyang committed on

fix typo
3dae8f9

liuyang committed on

add fields
5d0a1ef

liuyang committed on

fix bug
d45c437

liuyang committed on

fix bug
9fc1e97

liuyang committed on

download all models on startup
d29acc5

liuyang committed on

fix value
6bea290

liuyang committed on

Add audio diarization task to Gradio interface: introduce a new button and function for audio diarization so users can process audio with speaker separation. Update existing button labels for clarity.
e79159f

liuyang committed on

Refactor model management and transcription process: introduce a model registry for easier management of Whisper models, download models on startup, and streamline the audio processing pipeline to support both chunk and segment transcription with improved error handling and cleanup.
e3d9c9e

liuyang committed on

unmatched_diarization_segments
a4568c6

liuyang committed on

disable clip_timestamps
b68d580

liuyang committed on

disable unmatched_diarization_segments
f425ecd

liuyang committed on

update threshold
78d61ea

liuyang committed on

try using diarization as clip_timestamp
0b6cc7c

liuyang committed on