PaddlePaddle/PaddleOCR-VL-1.5 · Newest commit breaks compatibility with transformers==4.57.6, while 5.3.0 is broken as well

#22

by leucht - opened Mar 19

With transformers 5.x, loading the model through AutoModelForImageTextToText has seemingly always been broken. I therefore stayed pinned to transformers==4.57.6 and kept using AutoModelForCausalLM for both PaddleOCR-VL and PaddleOCR-VL-1.5. The most recent commit to the model repository now breaks 4.57.6 as well, so running PaddleOCR-VL through Hugging Face transformers is effectively impossible at the moment.
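The version-dependent setup described above can be sketched as a small helper. This is a minimal sketch, not an official recipe: `loader_settings`, `load_model` and `PINNED_REVISION` are hypothetical names, and the placeholder has to be replaced with the hash of a repository commit that predates the breaking change (this post does not identify one). It also assumes that passing `revision` to `from_pretrained` pins the remote modeling code fetched with `trust_remote_code=True`, since that code lives in the same repository as the weights.

```python
from typing import Any

# Hypothetical placeholder: replace with the hash of a repository commit that
# predates the breaking change (this post does not identify one).
PINNED_REVISION = "<known-good-commit-hash>"


def major_version(version: str) -> int:
    """Return the major component of a version string such as '4.57.6'."""
    return int(version.split(".")[0])


def loader_settings(
    transformers_version: str, revision: str = PINNED_REVISION
) -> tuple[str, dict[str, Any]]:
    """Choose the Auto class name and from_pretrained kwargs for a transformers version.

    4.x: the repository's remote code via AutoModelForCausalLM, pinned to a revision.
    5.x: the native implementation via AutoModelForImageTextToText.
    """
    if major_version(transformers_version) < 5:
        return "AutoModelForCausalLM", {"trust_remote_code": True, "revision": revision}
    return "AutoModelForImageTextToText", {}


def load_model(model_name: str) -> Any:
    """Load model_name with the settings chosen for the installed transformers."""
    import transformers  # imported lazily so the helpers above run without it

    class_name, kwargs = loader_settings(transformers.__version__)
    auto_class = getattr(transformers, class_name)
    return auto_class.from_pretrained(model_name, **kwargs)
```

On 4.57.6, `load_model("PaddlePaddle/PaddleOCR-VL-1.5")` would take the pinned remote-code path; on 5.3.0 it selects the native class, which per this report currently fails as well.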

Environment: Python 3.13.0 inside the rocm/pytorch container on Ubuntu 24.04.4.

  File "/opt/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 1081, in wrapper
    raise original_exception
  File "/opt/venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 1072, in wrapper
    outputs = func(self, *args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/PaddlePaddle/PaddleOCR_hyphen_VL/a1e4dc4c0e1e16eb0f614a0da0dcacf58b2d18de/modeling_paddleocr_vl.py", line 606, in forward
    causal_mask = create_causal_mask(
        config=self.config,
    ...<4 lines>...
        cache_position=cache_position,
    )
TypeError: create_causal_mask() got an unexpected keyword argument 'inputs_embeds'. Did you mean 'input_embeds'?

This affects both PaddleOCR-VL and PaddleOCR-VL-1.5.
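Until the remote code and the installed library agree on the keyword name, one local stopgap is to rename the argument based on the signature of the function that is actually installed. This is a hypothetical sketch: `call_with_renamed_kwargs` and `legacy_create_causal_mask` are illustrative names, and the stub only mimics the `input_embeds` spelling suggested by the 4.57.6 error message; it is not the real transformers function. Using it would mean patching a local copy of the cached `modeling_paddleocr_vl.py`.

```python
import inspect
from typing import Any, Callable


def call_with_renamed_kwargs(
    func: Callable[..., Any], renames: dict[str, str], /, **kwargs: Any
) -> Any:
    """Call func(**kwargs), renaming keywords that func does not accept.

    renames maps a keyword the caller uses to the alternative spelling that
    func may expect, e.g. {"inputs_embeds": "input_embeds"}.
    """
    params = inspect.signature(func).parameters
    for old, new in renames.items():
        # Rename only when the old spelling is rejected and the new one is accepted.
        if old in kwargs and old not in params and new in params:
            kwargs[new] = kwargs.pop(old)
    return func(**kwargs)


def legacy_create_causal_mask(config, input_embeds, attention_mask, cache_position, past_key_values):
    """Hypothetical stub using the older keyword spelling; it just echoes two inputs."""
    return input_embeds, attention_mask


mask = call_with_renamed_kwargs(
    legacy_create_causal_mask,
    {"inputs_embeds": "input_embeds"},
    config=None,
    inputs_embeds="embeddings",
    attention_mask="mask",
    cache_position=None,
    past_key_values=None,
)
print(mask)  # ('embeddings', 'mask')
```

Because the rename only happens when the target function rejects the caller's spelling, the same call keeps working against a function that already uses `inputs_embeds`.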

Upgrading to transformers==5.3.0 is not a valid workaround either, because of this issue, which results in the following error:

Traceback (most recent call last):
  File "/workspaces/sup-convert/main.py", line 285, in <module>
    main()
    ~~~~^^
  File "/workspaces/sup-convert/main.py", line 200, in main
    gpu_core = OCRModelCore(options=options)
  File "/workspaces/sup-convert/model_core.py", line 87, in __init__
    AutoModelForImageTextToText.from_pretrained(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        model_name,
        ^^^^^^^^^^^
    ...<2 lines>...
        attn_implementation=attn_implementation,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/opt/venv/lib/python3.13/site-packages/transformers/models/auto/auto_factory.py", line 374, in from_pretrained
    return model_class.from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/opt/venv/lib/python3.13/site-packages/transformers/modeling_utils.py", line 4094, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/opt/venv/lib/python3.13/site-packages/transformers/models/paddleocr_vl/modeling_paddleocr_vl.py", line 1402, in __init__
    self.model = PaddleOCRVLModel(config)
                 ~~~~~~~~~~~~~~~~^^^^^^^^
  File "/opt/venv/lib/python3.13/site-packages/transformers/models/paddleocr_vl/modeling_paddleocr_vl.py", line 1051, in __init__
    self.language_model = PaddleOCRTextModel._from_config(config.text_config)
                                                          ^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.13/site-packages/transformers/configuration_utils.py", line 164, in __getattribute__
    return super().__getattribute__(key)
           ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^
AttributeError: 'PaddleOCRVLConfig' object has no attribute 'text_config'. Did you mean: 'get_text_config'?
2026-03-19 12:42:17,845 - httpcore.connection - DEBUG - close.started
2026-03-19 12:42:17,846 - httpcore.connection - DEBUG - close.complete

which most likely comes from this deprecation:

Unrecognized keys in `rope_parameters` for 'rope_type'='default': {'mrope_section'}

modeling_rope_utils.py:936: FutureWarning: `rope_config_validation` is deprecated and has been removed. 
Its functionality has been moved to RotaryEmbeddingConfigMixin.validate_rope method. 
PreTrainedConfig inherits this class, so please call self.validate_rope() instead. 
Also, make sure to use the new rope_parameters syntax. You can call self.standardize_rope_params() in the meantime.
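The first warning amounts to a key check: for `rope_type` `default`, the validator does not expect `mrope_section`, a key specific to multimodal (M-RoPE) configurations. The sketch below reproduces that kind of check with an optional ignore list. `ALLOWED_ROPE_KEYS` and `unrecognized_rope_keys` are hypothetical, the allowed-key set is an assumption rather than the list transformers actually uses, and the `mrope_section` values are made up for illustration.

```python
from typing import Any

# Assumed, illustrative key sets per rope_type; the real validator's lists may differ.
ALLOWED_ROPE_KEYS: dict[str, set[str]] = {
    "default": {"rope_type", "rope_theta"},
}


def unrecognized_rope_keys(
    rope_parameters: dict[str, Any], ignore_keys: frozenset[str] = frozenset()
) -> set[str]:
    """Return keys in rope_parameters that are neither expected nor explicitly ignored."""
    rope_type = rope_parameters.get("rope_type", "default")
    allowed = ALLOWED_ROPE_KEYS.get(rope_type, set())
    return set(rope_parameters) - allowed - set(ignore_keys)


# Made-up example values; only the key names mirror the warning in this post.
params = {"rope_type": "default", "rope_theta": 500000.0, "mrope_section": [16, 24, 24]}
print(unrecognized_rope_keys(params))  # {'mrope_section'}
print(unrecognized_rope_keys(params, frozenset({"mrope_section"})))  # set()
```

An ignore list of this kind is one way a model-specific key could be exempted without disabling the check for other keys.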
