Merge Request
Hi,
If there is any update that needs to be applied to Dogacel/DeepSeek-OCR-Metal-MPS, please let me know; I can test and merge it.
Even though my repo was targeting MPS, it actually supports CPU inference too.
Thanks.
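For anyone trying it, loading looks roughly like this; a minimal sketch, assuming the usual AutoModel/trust_remote_code pattern, with the dtype choice as an assumption rather than the repo's exact code:

```python
# Sketch (assumptions, not the repo's exact code): use MPS on Apple Silicon when
# available, otherwise fall back to CPU inference.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Dogacel/DeepSeek-OCR-Metal-MPS"
device = "mps" if torch.backends.mps.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16 if device == "mps" else torch.float32,  # dtype is an assumption
)
model = model.to(device).eval()
```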
I just took your patches and applied them on top of prithivMLmods/DeepSeek-OCR-Latest-BF16, that's all.
Your repo has the patches for DeepSeek-OCR to work with CPU/MPS, and prithivMLmods/DeepSeek-OCR-Latest-BF16 has the patches for it to work with newer Hugging Face Transformers versions, so I merged the two together.
Thanks!
How did you mark the model as a "finetune" of mine? I wish I had done that for the original DeepSeek-OCR.
It seems like the patched model also supports different attention implementations.
This version allows flexible configuration of attention implementations, such as flash_attention or sdpa, for performance optimization or standardization. Users can also opt out of specific attention implementations if desired.
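A minimal sketch of what that looks like at load time; the model id is just for illustration, "sdpa" and "eager" ship with Transformers, and "flash_attention_2" additionally needs the flash-attn package and a supported GPU:

```python
# Sketch: choosing the attention implementation when loading the patched model.
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "prithivMLmods/DeepSeek-OCR-Latest-BF16",  # assumed model id for illustration
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",  # alternatives: "eager", "flash_attention_2"
)
```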
I would be interested in applying those patches to my fine-tune if they are backwards compatible as well.
base_model:
- deepseek-ai/DeepSeek-OCR
Add this to your model card metadata; it should work.
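For context, that key goes in the YAML front matter at the top of the repo's README.md, for example (placement shown as an assumption of the usual model-card layout):

---
base_model:
- deepseek-ai/DeepSeek-OCR
---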
Can you test my version as well? I've applied my own patch based on https://github.com/huggingface/transformers/pull/29467/files
I didn't like some of the changes that version has, as I thought them irrelevant and confusing.
$ python run_dpsk_ocr.py
You are using a model of type deepseek_vl_v2 to instantiate a model of type DeepseekOCR. This is not supported for all configurations of models and can yield errors.
Some weights of DeepseekOCRForCausalLM were not initialized from the model checkpoint at ../ and are newly initialized: ['model.vision_model.embeddings.position_ids']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
directly resize
/home/th1nhhdk/deepseek-ocr/venv/lib/python3.12/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48
=====================
BASE: torch.Size([1, 100, 1280])
NO PATCHES
=====================
The attention layers in this model are transitioning from computing the RoPE embeddings internally through `position_ids` (2D tensor with the indexes of the tokens), to using externally computed `position_embeddings` (Tuple of tensors, containing cos and sin). In v4.46 `position_ids` will be removed and `position_embeddings` will be mandatory.
<|ref|>title<|/ref|><|det|>[[100, 94, 859, 150]]<|/det|>
# Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
...
It seems to work fine