haolinwu committed
Commit 802470f · verified · 1 Parent(s): 0a09522

[Bug Fix] Update demo code to work with the latest transformers library version


The current demo code fails to run correctly with newer versions of the transformers library.

First, the `audios` parameter in the `__call__` method of `Qwen2AudioProcessor` has been renamed to `audio` starting from transformers v4.54.0. While this doesn't throw an explicit error, the model silently fails to receive audio input. This is particularly confusing for beginners, who may mistakenly assume the model itself is corrupted rather than identifying the parameter mismatch.
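If the demo needs to stay runnable on both sides of the rename, one option is to choose the keyword at runtime by inspecting the processor's `__call__` signature rather than hard-coding it. A minimal sketch of that idea — the two stub functions below are hypothetical stand-ins for the old and new signatures, not the real transformers API:

```python
import inspect

def audio_kwarg_name(call_fn):
    """Return the keyword this processor accepts for audio input:
    'audio' after the rename (transformers >= v4.54.0), else 'audios'."""
    params = inspect.signature(call_fn).parameters
    return "audio" if "audio" in params else "audios"

# Hypothetical stand-ins for the two processor signatures, for illustration only.
def new_style_call(text=None, audio=None, return_tensors=None, padding=False):
    pass

def old_style_call(text=None, audios=None, return_tensors=None, padding=False):
    pass
```

With a real processor this would be used as `kw = audio_kwarg_name(processor.__call__)` followed by `inputs = processor(text=text, **{kw: audios}, return_tensors="pt", padding=True)`, so the silent-failure path never triggers.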

Second, the existing GPU migration code (`inputs.input_ids = inputs.input_ids.to("cuda")`) causes a device mismatch error: `RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__index_select)`. This can be fixed by moving the entire `inputs` object to CUDA instead of just `input_ids`.
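The reason moving the whole `inputs` object works is that the processor output holds several tensors (`input_ids`, `attention_mask`, the audio features), and generation indexes across them, so they must all live on the same device; moving only `input_ids` strands the rest on CPU. A minimal sketch of the device-wide move, using a fake tensor class instead of torch so the example stands alone:

```python
class FakeTensor:
    """Stand-in for a torch.Tensor that only tracks its device."""
    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        # Like torch, .to() returns a new tensor on the target device.
        return FakeTensor(device)

def move_batch(batch, device):
    """Move every tensor-valued entry of a processor output to `device` --
    the same effect as calling .to("cuda") on the whole BatchFeature."""
    return {k: (v.to(device) if hasattr(v, "to") else v)
            for k, v in batch.items()}

batch = {"input_ids": FakeTensor(),
         "attention_mask": FakeTensor(),
         "input_features": FakeTensor()}
moved = move_batch(batch, "cuda")
```

Moving only `batch["input_ids"]` would leave `attention_mask` and `input_features` on CPU, which is exactly the mixed-device state the `RuntimeError` above complains about.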

This PR updates the demo code to address both issues, ensuring compatibility with the latest transformers versions.

Files changed (1)
  1. README.md +6 -6

README.md
@@ -68,8 +68,8 @@ for message in conversation:
                 sr=processor.feature_extractor.sampling_rate)[0]
         )
 
-inputs = processor(text=text, audios=audios, return_tensors="pt", padding=True)
-inputs.input_ids = inputs.input_ids.to("cuda")
+inputs = processor(text=text, audio=audios, return_tensors="pt", padding=True)
+inputs = inputs.to("cuda")
 
 generate_ids = model.generate(**inputs, max_length=256)
 generate_ids = generate_ids[:, inputs.input_ids.size(1):]
@@ -116,8 +116,8 @@ for message in conversation:
                 sr=processor.feature_extractor.sampling_rate)[0]
         )
 
-inputs = processor(text=text, audios=audios, return_tensors="pt", padding=True)
-inputs.input_ids = inputs.input_ids.to("cuda")
+inputs = processor(text=text, audio=audios, return_tensors="pt", padding=True)
+inputs = inputs.to("cuda")
 
 generate_ids = model.generate(**inputs, max_length=256)
 generate_ids = generate_ids[:, inputs.input_ids.size(1):]
@@ -171,9 +171,9 @@ for conversation in conversations:
                 sr=processor.feature_extractor.sampling_rate)[0]
         )
 
-inputs = processor(text=text, audios=audios, return_tensors="pt", padding=True)
+inputs = processor(text=text, audio=audios, return_tensors="pt", padding=True)
 inputs['input_ids'] = inputs['input_ids'].to("cuda")
-inputs.input_ids = inputs.input_ids.to("cuda")
+inputs = inputs.to("cuda")
 
 generate_ids = model.generate(**inputs, max_length=256)
 generate_ids = generate_ids[:, inputs.input_ids.size(1):]