Is there a way to train Jina-embeddings-v3 using Transformers Trainer API? (Encountering NameError in rotary.py)
Hi everyone,
I'm attempting to fine-tune the Jina-embeddings-v3 model for a sequence classification task. My goal is to use the standard Hugging Face `transformers.Trainer` API for this process.
To achieve this, I've been following these steps:
- Loading the base model using `AutoModel.from_pretrained("Jina-embeddings-v3", trust_remote_code=True, config=modified_config)`, where `modified_config` might have `lora_main_params_trainable` adjusted.
- Creating a custom PyTorch `nn.Module` wrapper around this base Jina model (a rough sketch is shown right after this list). This wrapper:
  - Adds a classification head (`nn.Linear`).
  - Implements mean pooling over the base model's `last_hidden_state` (as the model doesn't use a CLS token for sequence representation).
  - Handles passing an `adapter_mask` to the base model's forward pass if a specific LoRA adaptation (like "classification") is targeted.
  - Computes `CrossEntropyLoss` if labels are provided.
- Setting `requires_grad` appropriately to fine-tune only the LoRA adapter parameters (identified by "parametrizations" in their names) and the new classification head, keeping other base model weights frozen (if `lora_main_params_trainable=False`).
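For concreteness, here is a rough sketch of what that wrapper looks like (simplified, not my exact code; the class name, the number of labels, the hidden-size lookup, and the way I forward `adapter_mask` are placeholders/assumptions):

```python
# Rough sketch only -- names and defaults are placeholders, not my exact code.
import torch.nn as nn
from transformers import AutoConfig, AutoModel


class JinaForSequenceClassification(nn.Module):
    def __init__(self, model_name, num_labels, lora_main_params_trainable=False):
        super().__init__()
        config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
        config.lora_main_params_trainable = lora_main_params_trainable
        self.base = AutoModel.from_pretrained(model_name, trust_remote_code=True, config=config)
        self.classifier = nn.Linear(self.base.config.hidden_size, num_labels)

        # Train only the LoRA parametrizations of the base model; the new head stays trainable.
        for name, param in self.base.named_parameters():
            param.requires_grad = "parametrizations" in name

    def forward(self, input_ids=None, attention_mask=None, labels=None, adapter_mask=None, **kwargs):
        outputs = self.base(
            input_ids=input_ids,
            attention_mask=attention_mask,
            adapter_mask=adapter_mask,  # selects a LoRA adaptation (e.g. "classification"); None = no adapter
        )
        # Mean pooling over non-padding tokens (the model has no CLS-token pooling).
        mask = attention_mask.unsqueeze(-1).to(outputs.last_hidden_state.dtype)
        pooled = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        logits = self.classifier(pooled)

        loss = None
        if labels is not None:
            loss = nn.CrossEntropyLoss()(logits, labels)
        return {"loss": loss, "logits": logits}
```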
The forward pass of my custom wrapped model works correctly when tested with tokenized inputs. However, when I try to train it using `trainer.train()`, I consistently encounter a `NameError` during the backward pass.
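The training itself is launched roughly like this (a simplified, self-contained sketch; the tiny dummy dataset, output directory, and hyperparameters are placeholders, not my real setup):

```python
from datasets import Dataset
from transformers import AutoTokenizer, DataCollatorWithPadding, Trainer, TrainingArguments

model_name = "jinaai/jina-embeddings-v3"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = JinaForSequenceClassification(model_name, num_labels=2)

# Tiny placeholder dataset just to make the example runnable.
raw = Dataset.from_dict({"text": ["good", "bad", "great", "awful"], "label": [1, 0, 1, 0]})
tokenized = raw.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="jina-cls", per_device_train_batch_size=2, num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()  # <- the backward pass inside this call raises the NameError shown below
```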
Here's the relevant part of the traceback:
```
Traceback (most recent call last):
  File "/path/to/your/script.py", line X, in <your_training_initiation_function_or_cell>  # Adjust path
    trainer.train()
  File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2245, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 2560, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/transformers/trainer.py", line 3782, in training_step
    self.accelerator.backward(loss, **kwargs)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/accelerator.py", line 2359, in backward
    loss.backward(**kwargs)
  File "/usr/local/lib/python3.11/dist-packages/torch/_tensor.py", line 626, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.11/dist-packages/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/usr/local/lib/python3.11/dist-packages/torch/autograd/graph.py", line 823, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/autograd/function.py", line 307, in apply
    return user_fn(self, *args)
           ^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/jinaai/xlm-roberta-flash-implementation/2b6bc3f30750b3a9648fe9b63448c09920efe9be/rotary.py", line 257, in backward
    apply_rotary(  # <--- THIS IS THE LINE CAUSING THE NameError
    ^^^^^^^^^^^^
NameError: name 'apply_rotary' is not defined
```
The error `NameError: name 'apply_rotary' is not defined` seems to originate from the model's custom `rotary.py` file (specifically the one cached from `jinaai/xlm-roberta-flash-implementation`). This suggests an issue within the model's own implementation of the backward pass for its rotary position embeddings, where the `apply_rotary` function is not accessible within the scope of the custom `torch.autograd.Function`'s `backward` method.
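One thing I suspect (my guess, not a confirmed diagnosis): `apply_rotary` appears to be provided by flash-attention's Triton rotary kernel, so if `flash-attn` (or that specific op) isn't importable in the training environment, the name may simply never get bound in that module. A quick check:

```python
# Check whether flash-attn, and the Triton rotary op where apply_rotary is
# assumed to live, can actually be imported in this environment.
import importlib

for mod in ("flash_attn", "flash_attn.ops.triton.rotary"):
    try:
        importlib.import_module(mod)
        print(f"{mod}: importable")
    except Exception as exc:  # ImportError, missing CUDA libs, etc.
        print(f"{mod}: NOT importable -> {type(exc).__name__}: {exc}")
```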
My questions are:
- Has anyone else encountered this `NameError` when trying to fine-tune this Jina model (or related Jina embedding models that might use the same `rotary.py` module)?
- Is there a known fix or workaround for this issue that would allow training with the `transformers.Trainer`?
- Given this internal error, what is the approach recommended by Jina AI for fine-tuning these models for a downstream task like sequence classification if the standard `Trainer` approach hits this roadblock? (The model card mentions `SentenceTransformerTrainer`, but my aim was to use the standard `transformers.Trainer` with a classification head.)
Any insights or suggestions would be greatly appreciated!
Thanks!
I'm experiencing the exact same problem when trying to fine-tune on my own dataset using the `BatchAllTripletLoss` function and a `BinaryClassificationEvaluator`, and I'm using the `SentenceTransformerTrainer`.
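Roughly, my setup before the trainer looks like this (a simplified sketch; the dummy data, column names, and evaluator pairs are placeholders, not my real dataset):

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import BatchAllTripletLoss
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

# Placeholder datasets: one text column plus a class label, as BatchAllTripletLoss expects.
train_dataset = Dataset.from_dict({"sentence": ["a", "b", "c", "d"], "label": [0, 1, 0, 1]})
eval_dataset = Dataset.from_dict({"sentence": ["e", "f"], "label": [0, 1]})

loss_all = BatchAllTripletLoss(model)
evaluator = BinaryClassificationEvaluator(
    sentences1=["e", "g"], sentences2=["f", "h"], labels=[1, 0],  # placeholder pairs
)
args = SentenceTransformerTrainingArguments(output_dir="jina-triplet")
```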
Then:
```python
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss_all,
    evaluator=evaluator,
)
trainer.train()
```
gives me the same message you're getting.