Commit `271125a` (parent `7f77d8b`): Summary of changes to `README.md`
We also appreciate the benchmarks: [Fox](https://github.com/ucaslcl/Fox), [OmniDocBench](https://github.com/opendatalab/OmniDocBench).
---

## 🔍 Summary of Changes

### 1. **Formatting & Clean-Up**

* Removed extra spaces, blank lines, and inconsistent indentation.
* Fixed small style issues (like missing spaces in comments).
* Added a missing newline at the end of the file.

---

### 2. **Device and Dtype Handling**

* Added automatic device detection:

  ```python
  model_device = next(self.parameters()).device
  ```
* Added adaptive dtype logic:

  ```python
  image_dtype = torch.bfloat16 if model_device.type == "cuda" else torch.float32
  ```
* Replaced all hardcoded `.cuda()` and `.to(torch.bfloat16)` calls with:

  ```python
  .to(model_device)
  .to(image_dtype)
  ```

✅ **Now works automatically on both GPU and CPU**, without device-mismatch errors.
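Put together, the pattern reads roughly as below. This is a minimal sketch, not the repository's actual code: `nn.Linear` stands in for the real model class, and the tensor names are illustrative.

```python
import torch
import torch.nn as nn

# Stand-in for the real model; any nn.Module with parameters works the same way.
model = nn.Linear(4, 4)

# Detect the device the model's weights actually live on (CPU or CUDA).
model_device = next(model.parameters()).device

# Match the dtype to the device: bfloat16 on CUDA, float32 on CPU.
image_dtype = torch.bfloat16 if model_device.type == "cuda" else torch.float32

# Move inputs the same way the hardcoded .cuda()/.to(torch.bfloat16) calls
# were replaced: first to the model's device, then to the adaptive dtype.
image_tensor = torch.randn(1, 3, 4, 4).to(model_device).to(image_dtype)
```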
---

### 3. **Autocast and Inference Improvements**

* Wrapped generation in a conditional autocast block:

  ```python
  use_autocast = model_device.type == "cuda"
  if use_autocast:
      with torch.autocast("cuda", dtype=torch.bfloat16):
          with torch.no_grad():
              ...
  else:
      with torch.no_grad():
          ...
  ```
* Reduces memory usage and speeds up inference on GPU.
* Added `torch.no_grad()` for safer evaluation (no gradient tracking).
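As a side note, `torch.autocast` also accepts an `enabled=` flag, so the two branches can be collapsed into one block with identical behaviour. A sketch, with `model_device` hardcoded to CPU here only so the snippet is self-contained:

```python
import torch

model_device = torch.device("cpu")  # stand-in for next(model.parameters()).device
use_autocast = model_device.type == "cuda"

# enabled=False turns the autocast context into a no-op, so a single
# block covers both the GPU and CPU paths.
with torch.autocast(model_device.type, dtype=torch.bfloat16, enabled=use_autocast):
    with torch.no_grad():
        result = torch.ones(2, 2) @ torch.ones(2, 2)
```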
---

### 4. **Image Preprocessing**

* All image tensors now use:

  ```python
  .to(image_dtype)
  ```

  instead of hardcoded `torch.bfloat16`.
* Improves flexibility and prevents dtype errors when running on CPU.

---

### 5. **Generation Parameter Updates**

* Adjusted text-generation settings:

  ```python
  do_sample = False
  num_beams = 1
  max_new_tokens = 4096  # was 8192
  min_new_tokens = 1
  repetition_penalty = 1.2
  pad_token_id = tokenizer.pad_token_id or tokenizer.eos_token_id
  ```

🧠 Result: faster, more controlled generation that avoids repetitive or runaway outputs.
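Collected into one place, these settings might be passed to `generate()` as a kwargs dict. A hypothetical sketch: `_Tok` stands in for the real tokenizer, and the token IDs are made up.

```python
# Hypothetical stand-in for the real tokenizer object; some tokenizers
# define no pad token, so the code falls back to the eos token.
class _Tok:
    pad_token_id = None
    eos_token_id = 2

tokenizer = _Tok()

gen_kwargs = dict(
    do_sample=False,       # greedy decoding: deterministic output
    num_beams=1,
    max_new_tokens=4096,   # halved from 8192 to bound generation length
    min_new_tokens=1,
    repetition_penalty=1.2,
    pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
)
# Usage: output_ids = model.generate(input_ids, **gen_kwargs)
```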
---

### 6. **Safer Decoding**

* Cleaned up the decoding logic:

  ```python
  input_length = input_ids.unsqueeze(0).to(model_device).shape[1]
  outputs = tokenizer.decode(output_ids[0, input_length:])
  ```

✅ Avoids CUDA-specific assumptions; consistent across devices.
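The slicing idea can be illustrated device-free with plain token lists (the IDs below are illustrative, not from any real tokenizer):

```python
# generate() returns the prompt tokens followed by the new tokens, so
# decoding must skip the first input_length positions.
prompt_ids = [101, 7, 8, 9]               # tokens fed to generate()
output_ids = [[101, 7, 8, 9, 42, 43, 2]]  # batch of 1: prompt + continuation

input_length = len(prompt_ids)
generated = output_ids[0][input_length:]  # only the newly generated tokens
```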
---

### 7. **Miscellaneous**

* Added helpful comments for clarity.
* Improved readability around image transformation and the saving of results.
* Added extra blank lines for cleaner structure.

---

## ⚙️ Overall Impact

| Category        | Before                | After                                    |
| --------------- | --------------------- | ---------------------------------------- |
| Device handling | Hardcoded `.cuda()`   | Auto-detected and flexible               |
| Dtype           | Always bfloat16       | Adaptive: bfloat16 (GPU) / float32 (CPU) |
| Inference       | Could crash on CPU    | Runs safely everywhere                   |
| Generation      | Unbounded, repetitive | Tuned and stable                         |
| Readability     | Mixed formatting      | Clean and consistent                     |

---

## Citation

```bibtex