# Hierarchical Low-Rank Compression for LLMs
Compressed model checkpoints from the paper "Hierarchical Low-Rank Compression for LLMs" (NeurIPS 2026 submission).
## Models
This repository contains compressed models organized by base model, compression ratio, and optimization stage:
| Base Model | Ratio | Stage B (SVD) | Stage D (Block-opt) | Stage F (Fine-tuned) |
|---|---|---|---|---|
| LLaMA-7B | r=0.2 (80% compressed) | ✓ | ✓ | ✓ |
| LLaMA-7B | r=0.4 (60% compressed) | ✓ | ✓ | ✓ |
| LLaMA-7B | r=0.6 (40% compressed) | ✓ | ✓ | ✓ |
| LLaMA-7B | r=0.8 (20% compressed) | ✓ | ✓ | ✓ |
| Qwen3-14B | r=0.2 (80% compressed) | ✓ | ✓ | ✓ |
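The ratio r is the fraction of parameters kept per weight matrix. For a d_out × d_in weight factored as A (d_out × k) times B (k × d_in), the rank k that keeps a fraction r of the original parameters can be computed as below (illustrative arithmetic; the helper name and rounding are assumptions, not taken from the paper's code):

```python
def rank_for_ratio(d_out: int, d_in: int, r: float) -> int:
    # The factors A (d_out x k) and B (k x d_in) store k*(d_out + d_in)
    # parameters; keep a fraction r of the original d_out*d_in.
    return int(r * d_out * d_in / (d_out + d_in))

# Example: a 4096 x 4096 projection at r=0.4 (60% compressed)
k = rank_for_ratio(4096, 4096, 0.4)
print(k)  # 819
```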
## Stages
- B (Base SVD): Per-matrix whitened SVD compression
- D (Block-optimized): Block-level reconstruction optimization on top of B
- F (Fine-tuned): End-to-end LM-loss refinement via Sequential LoRA
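Stage B can be pictured with a small numpy sketch: whiten the weight with a Cholesky factor of the calibration activations' second-moment matrix, truncate the SVD in the whitened space, then fold the whitening back into the factors. This is an illustrative reconstruction of the general whitened-SVD idea, assuming Cholesky-based whitening; the function name, jitter term, and shapes are assumptions, not the paper's implementation.

```python
import numpy as np

def whitened_svd_compress(W, X, rank):
    """Compress W (d_out x d_in) to the given rank, whitening by the
    second moment of calibration activations X (n x d_in). Sketch only."""
    S = X.T @ X / X.shape[0]                       # activation second moment
    L = np.linalg.cholesky(S + 1e-6 * np.eye(S.shape[0]))  # whitening factor
    U, s, Vt = np.linalg.svd(W @ L, full_matrices=False)   # SVD in whitened space
    A = U[:, :rank] * s[:rank]                     # d_out x rank
    B = Vt[:rank] @ np.linalg.inv(L)               # rank x d_in, un-whitened
    return A, B                                    # W is approximated by A @ B

# Example: compress a random 64 x 128 layer to rank 8
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
X = rng.standard_normal((256, 128))
A, B = whitened_svd_compress(W, X, rank=8)
print(A.shape, B.shape)  # (64, 8) (8, 128)
```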
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a specific model variant
repo = "zhc12/HLC-compressed-models"

# Example: LLaMA-7B at 60% compression, fine-tuned (Stage F)
model = AutoModelForCausalLM.from_pretrained(repo, subfolder="llama7b/r04/F")
tokenizer = AutoTokenizer.from_pretrained(repo, subfolder="llama7b/r04/F")

# Example: Qwen3-14B at 80% compression, fine-tuned
model = AutoModelForCausalLM.from_pretrained(repo, subfolder="qwen3_14b/r02/F")
tokenizer = AutoTokenizer.from_pretrained(repo, subfolder="qwen3_14b/r02/F")
```
## Directory Structure
```
├── llama7b/
│   ├── r02/{B,D,F}/   # 80% compression
│   ├── r04/{B,D,F}/   # 60% compression
│   ├── r06/{B,D,F}/   # 40% compression
│   └── r08/{B,D,F}/   # 20% compression
└── qwen3_14b/
    └── r02/{B,F}/     # 80% compression
```
## Citation
```bibtex
@inproceedings{hlc2026,
  title={Hierarchical Low-Rank Compression for LLMs},
  author={Anonymous},
  booktitle={NeurIPS},
  year={2026}
}
```
## License
Apache 2.0