Make compatible with newer transformers #38
opened by harpreetsahota
Issue
The model fails to load with newer Transformers versions because it imports classes that have been removed:

```
ImportError: cannot import name 'LlamaFlashAttention2' from 'transformers.models.llama.modeling_llama'
```
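For reference, this is how the failure typically surfaces when loading the model from the Hub (the repo id below is an assumption; substitute whichever checkpoint you are using):

```python
from transformers import AutoModel

# On newer transformers this raises the ImportError above while executing
# the remote modeling_deepseekv2.py, before any weights are loaded.
model = AutoModel.from_pretrained(
    "deepseek-ai/DeepSeek-OCR",  # assumed repo id
    trust_remote_code=True,
)
```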
Root Cause
In modeling_deepseekv2.py (lines 37-39), the code imports:
```python
from transformers.models.llama.modeling_llama import (
    LlamaAttention,
    LlamaFlashAttention2
)
```
These classes were removed in Transformers 4.47+ as part of the attention refactoring.
Proposed Fix
Since DeepSeek-OCR uses MLA (Multi-head Latent Attention) by default (config.use_mla = True), the Llama attention classes are only used as fallbacks for MHA mode.
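For context, the model presumably picks an attention class by looking up a key built from the MLA flag plus the configured attention implementation. A minimal sketch of that assumed lookup (the helper name and key format are illustrative, not taken from the file):

```python
# Sketch of the assumed key-based lookup; modeling_deepseekv2.py may
# build the key differently.
def pick_attention_class(attention_classes: dict, use_mla: bool, attn_impl: str):
    prefix = "mla_" if use_mla else "mha_"
    return attention_classes[prefix + attn_impl]  # e.g. "mla_flash_attention_2"
```

With config.use_mla = True, only the mla_* keys are ever hit, which is why the mha_* entries (and the Llama imports backing them) can safely be dropped or remapped.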
Option 1: Remove MHA support (simplest)
- Remove the imports (lines 37-39)
- Update the ATTENTION_CLASSES dict (lines 1022-1029):
```python
ATTENTION_CLASSES = {
    "eager": DeepseekV2Attention,
    "flash_attention_2": DeepseekV2FlashAttention2,
    "mla_eager": DeepseekV2Attention,
    "mla_flash_attention_2": DeepseekV2FlashAttention2,
    # Removed mha_eager and mha_flash_attention_2
}
```
Option 2: Use DeepSeek attention for MHA mode (backward compatible)
Keep the same keys but map to DeepSeek classes:
```python
ATTENTION_CLASSES = {
    "eager": DeepseekV2Attention,
    "flash_attention_2": DeepseekV2FlashAttention2,
    "mla_eager": DeepseekV2Attention,
    "mla_flash_attention_2": DeepseekV2FlashAttention2,
    "mha_eager": DeepseekV2Attention,  # Changed
    "mha_flash_attention_2": DeepseekV2FlashAttention2,  # Changed
}
```
Option 3: Conditional import (most flexible)
```python
try:
    from transformers.models.llama.modeling_llama import (
        LlamaAttention,
        LlamaFlashAttention2
    )
    HAS_LLAMA_ATTENTION = True
except ImportError:
    HAS_LLAMA_ATTENTION = False

ATTENTION_CLASSES = {
    "eager": DeepseekV2Attention,
    "flash_attention_2": DeepseekV2FlashAttention2,
    "mla_eager": DeepseekV2Attention,
    "mla_flash_attention_2": DeepseekV2FlashAttention2,
}

if HAS_LLAMA_ATTENTION:
    ATTENTION_CLASSES.update({
        "mha_eager": LlamaAttention,
        "mha_flash_attention_2": LlamaFlashAttention2
    })
else:
    ATTENTION_CLASSES.update({
        "mha_eager": DeepseekV2Attention,
        "mha_flash_attention_2": DeepseekV2FlashAttention2
    })
```
This works because DeepSeek-OCR uses MLA by default anyway.
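As a quick sanity check after applying any of the options, the patched module should import cleanly and expose the expected keys. Importing modeling_deepseekv2 directly from a local clone is an assumption; adapt the import to however you load the file:

```python
# Assumes the patched modeling_deepseekv2.py is importable from the
# current working directory (e.g. a local clone of the model repo).
from modeling_deepseekv2 import ATTENTION_CLASSES

assert "mla_eager" in ATTENTION_CLASSES and "mla_flash_attention_2" in ATTENTION_CLASSES
print(sorted(ATTENTION_CLASSES))  # Options 2 and 3 should also list the mha_* keys
```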
All the same issues are still there: nothing will open and nothing works. This is a useless app. Fix it.