arxiv:2106.09685

LoRA: Low-Rank Adaptation of Large Language Models

Published on Jun 17, 2021
Authors:
AI-generated summary

Low-Rank Adaptation (LoRA) reduces the number of trainable parameters and GPU memory usage in large-scale pre-trained models while maintaining or improving performance on downstream tasks.

Abstract

An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.
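The low-rank update described in the abstract can be sketched in a few lines: the pretrained weight W is frozen, and only two small matrices B and A are trained, with the adapted layer computing Wx + BAx. The shapes, the zero initialization of B, and the alpha/r scaling below follow common LoRA implementations; this is an illustrative sketch, not the authors' released package.

```python
import numpy as np

d, k, r, alpha = 64, 64, 4, 8  # layer dims, LoRA rank, scaling (assumed values)

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))          # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01   # trainable, small random init
B = np.zeros((d, r))                     # trainable, zero init -> no change at start

def forward(x):
    # h = W x + (alpha / r) * B A x ; only A and B would receive gradients
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(k)
# With B = 0, the adapted layer exactly matches the frozen model at the start.
assert np.allclose(forward(x), W @ x)

# Trainable parameters per layer shrink from d*k to r*(d + k).
print(d * k, r * (d + k))  # 4096 vs 512
```

Because the update BA can be merged into W after training (W' = W + (alpha/r)·BA), the adapted model incurs no extra inference latency, unlike adapter layers.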

Community

one of the most impactful papers of the last 2 years IMO

Just seeing this notification. Glad you found my write-up useful!

LoRA: Revolutionizing Fine-Tuning for Large Language Models

Links šŸ”—:

šŸ‘‰ Subscribe: https://www.youtube.com/@Arxflix
šŸ‘‰ Twitter: https://x.com/arxflix
šŸ‘‰ LMNT (Partner): https://lmnt.com/

By Arxflix

Great work!



Get this paper in your agent:

hf papers read 2106.09685
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 361

Datasets citing this paper 10

Spaces citing this paper 273

Collections including this paper 45