Hierarchical Low-Rank Compression for LLMs

Compressed model checkpoints from the paper "Hierarchical Low-Rank Compression for LLMs" (NeurIPS 2026 submission).

Models

This repository contains compressed models organized by base model, compression ratio, and optimization stage:

| Base Model | Ratio | Stage B (SVD) | Stage D (Block-opt) | Stage F (Fine-tuned) |
|------------|-------|---------------|---------------------|----------------------|
| LLaMA-7B | r=0.2 (80% compressed) | ✅ | ✅ | ✅ |
| LLaMA-7B | r=0.4 (60% compressed) | ✅ | ✅ | ✅ |
| LLaMA-7B | r=0.6 (40% compressed) | ✅ | ✅ | ✅ |
| LLaMA-7B | r=0.8 (20% compressed) | ✅ | ✅ | ✅ |
| Qwen3-14B | r=0.2 (80% compressed) | ✅ | — | ✅ |
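As a rough guide to what the ratio means per matrix (an illustrative sketch, not the paper's exact rank-selection rule): replacing a dense m×n weight with rank-k factors costs k(m+n) parameters instead of mn, so keeping a fraction r of the parameters means k ≈ r·mn/(m+n).

```python
def factor_rank(m: int, n: int, r: float) -> int:
    # Rank that keeps roughly a fraction r of a dense m x n matrix's
    # parameters, since rank-k factors cost k*(m+n) parameters.
    # Illustrative only; the paper may allocate ranks differently.
    return max(1, round(r * m * n / (m + n)))

# Example: a 4096x4096 projection at r=0.4 (60% compressed)
k = factor_rank(4096, 4096, 0.4)
```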

Stages

  • B (Base SVD): Per-matrix whitened SVD compression
  • D (Block-optimized): Block-level reconstruction optimization on top of B
  • F (Fine-tuned): End-to-end LM-loss refinement via Sequential LoRA
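The Stage B idea can be sketched in a few lines of numpy. This is a minimal toy, not the paper's implementation: it assumes "whitened SVD" means whitening the weight by a Cholesky factor of the calibration-activation covariance before truncating, so the truncation minimizes activation-aware rather than plain Frobenius error.

```python
import numpy as np

def whitened_svd_compress(W, X, rank):
    """Toy activation-whitened SVD: W (out, in), X (samples, in)."""
    # Cholesky factor of the calibration-activation covariance.
    cov = X.T @ X / X.shape[0]
    cov += 1e-6 * np.eye(cov.shape[0])  # numerical stability
    L = np.linalg.cholesky(cov)
    # Truncated SVD of the whitened weight W @ L.
    U, s, Vt = np.linalg.svd(W @ L, full_matrices=False)
    A = U[:, :rank] * s[:rank]            # (out, rank)
    B = Vt[:rank] @ np.linalg.inv(L)      # (rank, in)
    return A, B                           # W ~= A @ B

# Toy check: rank-16 factors of a 64x64 weight
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
X = rng.standard_normal((256, 64))
A, B = whitened_svd_compress(W, X, 16)
```

Stages D and F then refine these factors (block-level reconstruction and LM-loss fine-tuning, respectively) rather than re-deriving them from scratch.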

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load a specific model variant
repo = "zhc12/HLC-compressed-models"

# Example: LLaMA-7B at 60% compression, fine-tuned (Stage F)
model = AutoModelForCausalLM.from_pretrained(repo, subfolder="llama7b/r04/F")
tokenizer = AutoTokenizer.from_pretrained(repo, subfolder="llama7b/r04/F")

# Example: Qwen3-14B at 80% compression, fine-tuned
model = AutoModelForCausalLM.from_pretrained(repo, subfolder="qwen3_14b/r02/F")
tokenizer = AutoTokenizer.from_pretrained(repo, subfolder="qwen3_14b/r02/F")
```

Directory Structure

```
├── llama7b/
│   ├── r02/{B,D,F}/    # 80% compression
│   ├── r04/{B,D,F}/    # 60% compression
│   ├── r06/{B,D,F}/    # 40% compression
│   └── r08/{B,D,F}/    # 20% compression
└── qwen3_14b/
    └── r02/{B,F}/      # 80% compression
```

Citation

```bibtex
@inproceedings{hlc2026,
  title={Hierarchical Low-Rank Compression for LLMs},
  author={Anonymous},
  booktitle={NeurIPS},
  year={2026}
}
```

License

Apache 2.0
