Papers
arxiv:2311.04257

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Published on Nov 7, 2023
· Submitted by AK on Nov 8, 2023

Abstract

mPLUG-Owl2, a versatile multi-modal large language model, uses a modularized network design to enhance performance in both text and multi-modal tasks through modality collaboration and modality-adaptive modules.

AI-generated summary

Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks. However, previous methods primarily focus on enhancing multi-modal capabilities. In this work, we introduce a versatile multi-modal large language model, mPLUG-Owl2, which effectively leverages modality collaboration to improve performance in both text and multi-modal tasks. mPLUG-Owl2 utilizes a modularized network design, with the language decoder acting as a universal interface for managing different modalities. Specifically, mPLUG-Owl2 incorporates shared functional modules to facilitate modality collaboration and introduces a modality-adaptive module that preserves modality-specific features. Extensive experiments reveal that mPLUG-Owl2 is capable of generalizing both text tasks and multi-modal tasks and achieving state-of-the-art performances with a single generic model. Notably, mPLUG-Owl2 is the first MLLM model that demonstrates the modality collaboration phenomenon in both pure-text and multi-modal scenarios, setting a pioneering path in the development of future multi-modal foundation models.
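
The modality-adaptive module described in the abstract can be pictured as an attention block in which a few components are duplicated per modality (so visual and textual tokens keep their own statistics) while the rest of the layer stays shared (so the modalities can collaborate through common weights). Below is a minimal PyTorch sketch of that idea, not the authors' released code: the class and parameter names (ModalityAdaptiveAttention, modality_ids), the choice of which projections are shared versus modality-specific, and the omission of causal masking are all illustrative assumptions made for this sketch.

# Minimal sketch (assumes PyTorch >= 2.0), illustrating the modality-adaptive idea:
# layer norm and key/value projections are duplicated per modality, while the
# query and output projections are shared across modalities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAdaptiveAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int, num_modalities: int = 2):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Modality-specific pieces: one LayerNorm and one K/V projection per modality
        # (index 0 = text tokens, index 1 = visual tokens, by assumption).
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(num_modalities)])
        self.k_projs = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_modalities)])
        self.v_projs = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_modalities)])
        # Shared pieces: query and output projections are common to all modalities.
        self.q_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); modality_ids: (batch, seq) with values in {0, 1}.
        b, s, d = x.shape
        normed = torch.zeros_like(x)
        k = torch.zeros_like(x)
        v = torch.zeros_like(x)
        for m, (norm, k_proj, v_proj) in enumerate(
            zip(self.norms, self.k_projs, self.v_projs)
        ):
            mask = (modality_ids == m).unsqueeze(-1)  # (batch, seq, 1)
            nx = norm(x)
            normed = torch.where(mask, nx, normed)
            k = torch.where(mask, k_proj(nx), k)
            v = torch.where(mask, v_proj(nx), v)
        q = self.q_proj(normed)
        # Standard multi-head attention over the mixed text/visual sequence
        # (causal masking omitted for brevity).
        q = q.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v)
        attn = attn.transpose(1, 2).reshape(b, s, d)
        return self.out_proj(attn)

The shared query projection and output projection are where joint text/image training can help pure-text behavior, while the per-modality norms and K/V projections keep the two token distributions from interfering with each other.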

Community

Training, inference, and evaluation code, plus trained models with a demo. If only all papers were released this completely.

Outstanding release! Been looking forward to this one!

Thank you team!


Get this paper in your agent:

hf papers read 2311.04257
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0

No datasets link to this paper yet.

Cite arxiv.org/abs/2311.04257 in a dataset README.md to link it from this page.

Spaces citing this paper 2

Collections including this paper 9