Newest commit breaks compatibility with transformers==4.57.6, while 5.3.0 is broken as well
#22
by leucht - opened
Anything above transformers==5 with AutoModelForImageTextToText has seemingly always been broken, so I stayed locked on 4.57.6 and continued to use AutoModelForCausalLM for both vl & vl-1.5. With the most recent commit, 4.57.6 is now also broken, which makes using Paddle through Hugging Face functionally impossible.
python==3.13.0 inside rocm/pytorch container on Ubuntu 24.04.4
File "/opt/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 1081, in wrapper
raise original_exception
File "/opt/venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 1072, in wrapper
outputs = func(self, *args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/PaddlePaddle/PaddleOCR_hyphen_VL/a1e4dc4c0e1e16eb0f614a0da0dcacf58b2d18de/modeling_paddleocr_vl.py", line 606, in forward
causal_mask = create_causal_mask(
config=self.config,
...<4 lines>...
cache_position=cache_position,
)
TypeError: create_causal_mask() got an unexpected keyword argument 'inputs_embeds'. Did you mean 'input_embeds'?
This affects both vl & vl-1.5.
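The TypeError above is a keyword mismatch: the remote modeling code passes `inputs_embeds`, while the installed `create_causal_mask` only accepts `input_embeds`. A minimal sketch of one possible way to tolerate both spellings is to inspect the callee's signature before calling it. The helper `call_with_embeds` and the two stand-in mask functions are hypothetical and written only for illustration; they are not part of transformers or of the model repository.

```python
import inspect


def call_with_embeds(mask_fn, embeds, **kwargs):
    """Call mask_fn, passing embeds under whichever keyword name it declares."""
    params = inspect.signature(mask_fn).parameters
    for name in ("inputs_embeds", "input_embeds"):
        if name in params:
            return mask_fn(**{name: embeds}, **kwargs)
    raise TypeError(
        f"{mask_fn.__name__}() declares neither 'inputs_embeds' nor 'input_embeds'"
    )


# Hypothetical stand-ins for the two signatures (NOT the real library functions).
def mask_fn_singular(config, input_embeds, attention_mask=None):
    return ("input_embeds", input_embeds)


def mask_fn_plural(config, inputs_embeds, attention_mask=None):
    return ("inputs_embeds", inputs_embeds)


if __name__ == "__main__":
    print(call_with_embeds(mask_fn_singular, "E", config=None))
    print(call_with_embeds(mask_fn_plural, "E", config=None))
```

This only works when the callee names the parameter explicitly; a function that hides it behind `**kwargs` would need a different check.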
Updating to transformers==5.3.0 is not a valid solution because of this issue, which results in the following error:
Traceback (most recent call last):
File "/workspaces/sup-convert/main.py", line 285, in <module>
main()
~~~~^^
File "/workspaces/sup-convert/main.py", line 200, in main
gpu_core = OCRModelCore(options=options)
File "/workspaces/sup-convert/model_core.py", line 87, in __init__
AutoModelForImageTextToText.from_pretrained(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
model_name,
^^^^^^^^^^^
...<2 lines>...
attn_implementation=attn_implementation,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/opt/venv/lib/python3.13/site-packages/transformers/models/auto/auto_factory.py", line 374, in from_pretrained
return model_class.from_pretrained(
~~~~~~~~~~~~~~~~~~~~~~~~~~~^
pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/opt/venv/lib/python3.13/site-packages/transformers/modeling_utils.py", line 4094, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/opt/venv/lib/python3.13/site-packages/transformers/models/paddleocr_vl/modeling_paddleocr_vl.py", line 1402, in __init__
self.model = PaddleOCRVLModel(config)
~~~~~~~~~~~~~~~~^^^^^^^^
File "/opt/venv/lib/python3.13/site-packages/transformers/models/paddleocr_vl/modeling_paddleocr_vl.py", line 1051, in __init__
self.language_model = PaddleOCRTextModel._from_config(config.text_config)
^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.13/site-packages/transformers/configuration_utils.py", line 164, in __getattribute__
return super().__getattribute__(key)
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
AttributeError: 'PaddleOCRVLConfig' object has no attribute 'text_config'. Did you mean: 'get_text_config'?
which most likely comes from this deprecation:
Unrecognized keys in `rope_parameters` for 'rope_type'='default': {'mrope_section'}
modeling_rope_utils.py:936: FutureWarning: `rope_config_validation` is deprecated and has been removed.
Its functionality has been moved to RotaryEmbeddingConfigMixin.validate_rope method.
PreTrainedConfig inherits this class, so please call self.validate_rope() instead.
Also, make sure to use the new rope_parameters syntax. You can call self.standardize_rope_params() in the meantime.
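For the `text_config` AttributeError above, it may help to check which layout the loaded config actually has before the model class is constructed. The sketch below is only a diagnostic under stated assumptions: `text_config_layout` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for a real config, used so the snippet runs without transformers installed.

```python
from types import SimpleNamespace


def text_config_layout(config):
    """Classify how a config object exposes its text-model settings."""
    # Nested layout: the text settings live on a separate sub-config.
    if getattr(config, "text_config", None) is not None:
        return "nested"
    # Flat layout that still offers an accessor method.
    if callable(getattr(config, "get_text_config", None)):
        return "flat-with-getter"
    # Flat layout with no accessor at all.
    return "flat"


# Stand-in configs for illustration only (not real PaddleOCRVLConfig objects).
nested = SimpleNamespace(text_config=SimpleNamespace(hidden_size=1024))
flat = SimpleNamespace(hidden_size=1024)
flat_getter = SimpleNamespace(hidden_size=1024)
flat_getter.get_text_config = lambda: flat_getter

if __name__ == "__main__":
    for name, cfg in [("nested", nested), ("flat", flat), ("flat_getter", flat_getter)]:
        print(name, "->", text_config_layout(cfg))
```

In the traceback, the model class reads `config.text_config` directly, so a config reporting anything other than "nested" would be expected to fail at that line.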