# Hierarchical Low-Rank Compression for LLMs
Compressed model checkpoints from the paper "Hierarchical Low-Rank Compression for LLMs" (NeurIPS 2026 submission).
## Models
This repository contains compressed models organized by base model, compression ratio, and optimization stage:
| Base Model | Ratio | Stage B (SVD) | Stage D (Block-opt) | Stage F (Fine-tuned) |
|---|---|---|---|---|
| LLaMA-7B | r=0.2 (80% compressed) | ✓ | ✓ | ✓ |
| LLaMA-7B | r=0.4 (60% compressed) | ✓ | ✓ | ✓ |
| LLaMA-7B | r=0.6 (40% compressed) | ✓ | ✓ | ✓ |
| LLaMA-7B | r=0.8 (20% compressed) | ✓ | ✓ | ✓ |
| Qwen3-14B | r=0.2 (80% compressed) | ✓ | ✓ | ✓ |
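The ratio r is the fraction of parameters kept per weight matrix. For a d_out × d_in weight factored as A (d_out × k) times B (k × d_in), the rank k that keeps a fraction r of the original parameters can be computed as below (illustrative arithmetic; the helper name and rounding are assumptions, not taken from the paper's code):

```python
def rank_for_ratio(d_out: int, d_in: int, r: float) -> int:
    # The factors A (d_out x k) and B (k x d_in) store k*(d_out + d_in)
    # parameters; keep a fraction r of the original d_out*d_in.
    return int(r * d_out * d_in / (d_out + d_in))

# Example: a 4096 x 4096 projection at r=0.4 (60% compressed)
k = rank_for_ratio(4096, 4096, 0.4)
print(k)  # 819
```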
## Stages
- B (Base SVD): Per-matrix whitened SVD compression
- D (Block-optimized): Block-level reconstruction optimization on top of B
- F (Fine-tuned): End-to-end LM-loss refinement via Sequential LoRA
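Stage B can be pictured with a small numpy sketch: whiten the weight with a Cholesky factor of the calibration activations' second-moment matrix, truncate the SVD in the whitened space, then fold the whitening back into the factors. This is an illustrative reconstruction of the general whitened-SVD idea, assuming Cholesky-based whitening; the function name, jitter term, and shapes are assumptions, not the paper's implementation.

```python
import numpy as np

def whitened_svd_compress(W, X, rank):
    """Compress W (d_out x d_in) to the given rank, whitening by the
    second moment of calibration activations X (n x d_in). Sketch only."""
    S = X.T @ X / X.shape[0]                       # activation second moment
    L = np.linalg.cholesky(S + 1e-6 * np.eye(S.shape[0]))  # whitening factor
    U, s, Vt = np.linalg.svd(W @ L, full_matrices=False)   # SVD in whitened space
    A = U[:, :rank] * s[:rank]                     # d_out x rank
    B = Vt[:rank] @ np.linalg.inv(L)               # rank x d_in, un-whitened
    return A, B                                    # W is approximated by A @ B

# Example: compress a random 64 x 128 layer to rank 8
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))
X = rng.standard_normal((256, 128))
A, B = whitened_svd_compress(W, X, rank=8)
print(A.shape, B.shape)  # (64, 8) (8, 128)
```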
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a specific model variant
repo = "zhc12/HLC-compressed-models"

# Example: LLaMA-7B at 60% compression, fine-tuned (Stage F)
model = AutoModelForCausalLM.from_pretrained(repo, subfolder="llama7b/r04/F")
tokenizer = AutoTokenizer.from_pretrained(repo, subfolder="llama7b/r04/F")

# Example: Qwen3-14B at 80% compression, fine-tuned
model = AutoModelForCausalLM.from_pretrained(repo, subfolder="qwen3_14b/r02/F")
tokenizer = AutoTokenizer.from_pretrained(repo, subfolder="qwen3_14b/r02/F")
```
## Directory Structure
```
├── llama7b/
│   ├── r02/{B,D,F}/   # 80% compression
│   ├── r04/{B,D,F}/   # 60% compression
│   ├── r06/{B,D,F}/   # 40% compression
│   └── r08/{B,D,F}/   # 20% compression
└── qwen3_14b/
    └── r02/{B,F}/     # 80% compression
```
## Citation
```bibtex
@inproceedings{hlc2026,
  title={Hierarchical Low-Rank Compression for LLMs},
  author={Anonymous},
  booktitle={NeurIPS},
  year={2026}
}
```
## License
Apache 2.0