ZoneTwelve committed
Commit 271125a · 1 Parent(s): 7f77d8b

Summary of changes

Files changed (1): README.md (+116, -0)
README.md CHANGED
@@ -174,6 +174,122 @@ We would like to thank [Vary](https://github.com/Ucas-HaoranWei/Vary/), [GOT-OCR
 
We also appreciate the benchmarks: [Fox](https://github.com/ucaslcl/Fox), [OmniDocBench](https://github.com/opendatalab/OmniDocBench).

---

## 🔍 Summary of Changes

### 1. **Formatting & Clean-Up**

* Removed extra spaces, blank lines, and inconsistent indentation.
* Fixed small style issues, such as missing spaces in comments.
* Added a missing newline at the end of the file.

---

### 2. **Device and Dtype Handling**

* Added automatic device detection:

```python
model_device = next(self.parameters()).device
```

* Added adaptive dtype logic:

```python
image_dtype = torch.bfloat16 if model_device.type == "cuda" else torch.float32
```

* Replaced all hardcoded `.cuda()` and `.to(torch.bfloat16)` calls with:

```python
.to(model_device)
.to(image_dtype)
```

✅ **Now works automatically on both GPU and CPU**, without device-mismatch errors.
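
For context, a self-contained sketch of this pattern; the `nn.Linear` stand-in and the tensor shape are illustrative placeholders, not code from this repository:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; any nn.Module exposes its device the same way.
model = nn.Linear(16, 16)

# Detect where the weights live instead of hardcoding .cuda().
model_device = next(model.parameters()).device
# bfloat16 is only a safe default on CUDA; fall back to float32 on CPU.
image_dtype = torch.bfloat16 if model_device.type == "cuda" else torch.float32

# Inputs follow the model's device and the adaptive dtype.
image_tensor = torch.randn(1, 3, 1024, 1024).to(model_device).to(image_dtype)
print(image_tensor.device, image_tensor.dtype)
```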

---

### 3. **Autocast and Inference Improvements**

* Wrapped generation in a conditional autocast block (a fuller sketch follows this list):

```python
use_autocast = model_device.type == "cuda"
if use_autocast:
    with torch.autocast("cuda", dtype=torch.bfloat16):
        with torch.no_grad():
            ...
else:
    with torch.no_grad():
        ...
```

* Reduces memory usage and speeds up inference on GPU.
* Added `torch.no_grad()` for safer evaluation (no gradient tracking).
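
A fuller version of the same wrapper, with the elided body replaced by a hypothetical `model.generate(**inputs)` call; `model`, `inputs`, and `model_device` are assumed from the surrounding sketches:

```python
use_autocast = model_device.type == "cuda"
if use_autocast:
    # GPU path: bfloat16 autocast cuts activation memory and speeds up matmuls.
    with torch.autocast("cuda", dtype=torch.bfloat16):
        with torch.no_grad():  # no gradient tracking during inference
            output_ids = model.generate(**inputs)
else:
    # CPU path: plain float32, still with gradient tracking disabled.
    with torch.no_grad():
        output_ids = model.generate(**inputs)
```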

---

### 4. **Image Preprocessing**

* All image tensors are now cast with `.to(image_dtype)` instead of a hardcoded `torch.bfloat16` (see the sketch below).
* Improves flexibility and prevents dtype errors when running on CPU.
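
A minimal sketch of where that cast lands in a preprocessing path; the resize size and normalization constants are common CLIP-style defaults used purely for illustration, and `page.png` is a hypothetical input file:

```python
import torch
from PIL import Image
from torchvision import transforms

# Illustrative pipeline; these values are not taken from this repository.
preprocess = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                         std=(0.26862954, 0.26130258, 0.27577711)),
])

image = Image.open("page.png").convert("RGB")
# Adaptive dtype instead of a hardcoded torch.bfloat16 cast.
image_tensor = preprocess(image).unsqueeze(0).to(model_device).to(image_dtype)
```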

---

### 5. **Generation Parameter Updates**

* Adjusted text generation settings (passed to `generate` as sketched below):

```python
do_sample = False
num_beams = 1
max_new_tokens = 4096  # was 8192
min_new_tokens = 1
repetition_penalty = 1.2
pad_token_id = tokenizer.pad_token_id or tokenizer.eos_token_id
```

🧠 Result: faster, more controlled generation that avoids repetitive or runaway outputs.
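
These settings map directly onto a Hugging Face-style `generate` call; `model`, `inputs`, and `tokenizer` are assumed from the earlier sketches:

```python
output_ids = model.generate(
    **inputs,
    do_sample=False,         # greedy decoding: deterministic, no sampling noise
    num_beams=1,             # no beam search, keeps latency low
    max_new_tokens=4096,     # hard cap on output length (was 8192)
    min_new_tokens=1,        # always emit at least one token
    repetition_penalty=1.2,  # discourages degenerate repetition loops
    pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
)
```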

---

### 6. **Safer Decoding**

* Cleaned up decoding logic (see the usage sketch below):

```python
input_length = input_ids.unsqueeze(0).to(model_device).shape[1]
outputs = tokenizer.decode(output_ids[0, input_length:])
```

✅ Avoids CUDA-specific assumptions; consistent across devices.
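
End to end, the decode step fits the running sketch like this; the prompt text is a hypothetical example:

```python
inputs = tokenizer("Describe the document.", return_tensors="pt").to(model_device)
output_ids = model.generate(**inputs, max_new_tokens=64)

# Skip the echoed prompt so only newly generated tokens are decoded.
input_length = inputs["input_ids"].shape[1]
print(tokenizer.decode(output_ids[0, input_length:]))
```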

---

### 7. **Miscellaneous**

* Added helpful comments for clarity.
* Improved readability around image transformation and saving results.
* Added extra blank lines for cleaner structure.

---

## ⚙️ Overall Impact

| Category        | Before                | After                                    |
| --------------- | --------------------- | ---------------------------------------- |
| Device handling | Hardcoded `.cuda()`   | Auto-detected and flexible               |
| Dtype           | Always bfloat16       | Adaptive: bfloat16 (GPU) / float32 (CPU) |
| Inference       | Could crash on CPU    | Runs safely everywhere                   |
| Generation      | Unbounded, repetitive | Tuned and stable                         |
| Readability     | Mixed formatting      | Clean and consistent                     |

---

## Citation
```bibtex