PatchSVAE β€” Fresnel (128Γ—128)

The Image Geometric Compression Engine: A patch-based SVD autoencoder that achieves 99.993% reconstruction fidelity on ImageNet-1K using pure deterministic geometry. Named for Augustin-Jean Fresnel's lighthouse lens β€” massive optical power compressed into thin concentric rings.

The Fresnel SVAE line is dedicated specifically to images: autoencoded decomposition and reconstruction.

Key Results

| Dataset | Resolution | MSE | Fidelity | Params | Epochs |
|---|---|---|---|---|---|
| ImageNet-1K | 128×128 | 0.0000734 | 99.993% | 17M | 50 |
| ImageNet-1K | 128×128 | 0.000206 | 99.98% | 17M | 12 |
| TinyImageNet | 64×64 | 0.000478 | 99.95% | 17M | 200 |

No KL divergence. No perceptual loss. No adversarial training.

Six linear layers, one F.normalize, one eigendecomposition, one convolution for stitching only, and a 2,272-parameter cross-attention.

Architecture

Image (B, 3, 128, 128)
  β†’ 64 patches of 16Γ—16
  β†’ shared MLP encoder per patch (4 residual blocks, hidden=768)
  β†’ (256, 16) matrix per patch
  β†’ F.normalize(M, dim=-1)  β€” rows to S^15
  β†’ SVD via fp64 Gram + eigh
  β†’ 64 spectral vectors S ∈ ℝ^16
  β†’ 2-layer spectral cross-attention (learned per-mode Ξ±)
  β†’ coordinated S + per-patch U, Vt
  β†’ shared MLP decoder per patch (4 residual blocks)
  β†’ stitch patches β†’ boundary smooth
  β†’ Reconstructed image (B, 3, 128, 128)

Latent shape: (B, 16, 8, 8) = 1,024 values β€” 48:1 compression, nearly lossless.
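The "SVD via fp64 Gram + eigh" step above can be sketched as follows. This is a minimal illustration of the technique (decompose a tall matrix through its small D×D Gram matrix in float64), not the model's actual code, which may differ in detail:

```python
import torch

def gram_svd(M: torch.Tensor):
    """SVD of a tall (V, D) matrix via its D x D Gram matrix in float64.

    Sketch of the fp64 Gram + eigh step; rows of M are assumed unit-norm
    (after F.normalize(M, dim=-1)).
    """
    M64 = M.double()
    G = M64.T @ M64                            # (D, D) Gram matrix
    evals, V = torch.linalg.eigh(G)            # eigenvalues ascending
    idx = torch.argsort(evals, descending=True)
    S = evals[idx].clamp_min(0).sqrt()         # singular values, descending
    V = V[:, idx]
    U = M64 @ V / S.clamp_min(1e-12)           # recover left singular vectors
    return U.float(), S.float(), V.T.float()

# One patch matrix: 256 rows normalized to the unit sphere S^15
M = torch.nn.functional.normalize(torch.randn(256, 16), dim=-1)
U, S, Vt = gram_svd(M)
recon = U @ torch.diag(S) @ Vt
print(torch.allclose(recon, M, atol=1e-4))  # -> True
```

Working on the 16×16 Gram matrix instead of the full 256×16 matrix keeps the eigendecomposition small, and doing it in float64 avoids the precision loss that squaring the matrix would otherwise introduce.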

Usage

Fresnel uses the geolip-core repo for eigh-based SVD optimization; the Triton SVD optimizations are not supported yet.

!pip install "git+https://github.com/AbstractEyes/geolip-core.git"
from transformers import AutoModel
import torch

model = AutoModel.from_pretrained("AbstractPhil/svae-fresnel-128", trust_remote_code=True)

# Full reconstruction
output = model(images)
recon = output.recon          # (B, 3, 128, 128)
latent = output.latent        # (B, 16, 8, 8) β€” omega tokens

# Encode only (for downstream tasks)
omega_tokens = model.encode(images)  # (B, 16, 8, 8)

# Full SVD decomposition
svd = model.encode_full(images)
# svd['U'], svd['S'], svd['Vt'], svd['M'] per patch

# Decode from omega tokens (requires U, Vt for lossless)
recon = model.decode(omega_tokens, U=svd['U'], Vt=svd['Vt'])

Example Script

This script works in practice; the shorter usage snippet above may not.

"""Fresnel 128Γ—128 β€” AutoModel Inference Test"""

import torch
import torch.nn.functional as F
import torchvision.transforms as T
from transformers import AutoModel
from datasets import load_dataset
import matplotlib.pyplot as plt
import numpy as np

REPO = "AbstractPhil/svae-fresnel-128"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# ── Load model ──
print(f"Loading Fresnel from {REPO}...")
model = AutoModel.from_pretrained(REPO, trust_remote_code=True).to(DEVICE).eval()
config = model.config
print(f"  Params: {sum(p.numel() for p in model.parameters()):,}")
print(f"  Latent: ({config.latent_channels}, {config.latent_size}, {config.latent_size})")

# ── Grab 4 images via streaming ──
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
denorm_mean = torch.tensor([0.485, 0.456, 0.406]).reshape(1, 3, 1, 1).to(DEVICE)
denorm_std = torch.tensor([0.229, 0.224, 0.225]).reshape(1, 3, 1, 1).to(DEVICE)
def denorm(t):
    return (t * denorm_std + denorm_mean).clamp(0, 1).cpu()

ds = load_dataset('benjamin-paine/imagenet-1k-128x128', split='validation', streaming=True)
images = []
for i, sample in enumerate(ds):
    img = sample['image'].convert('RGB')
    images.append(transform(img))
    if i >= 3:
        break

batch = torch.stack(images).to(DEVICE)
print(f"  Batch: {batch.shape}")

# ── Full reconstruction ──
with torch.no_grad():
    output = model(batch)

recon = output["recon"]
latent = output["latent"]
mse = F.mse_loss(recon, batch).item()
print(f"\n  Recon MSE: {mse:.6f} ({(1-mse)*100:.3f}% fidelity)")
print(f"  Latent: {latent.shape} β€” {batch.numel()//latent.numel()}:1 compression")

# ── Encode omega tokens ──
with torch.no_grad():
    omega = model.encode(batch)
print(f"  Omega: {omega.shape}, mean={omega.mean():.3f}, std={omega.std():.3f}")

# ── Full SVD ──
with torch.no_grad():
    svd = model.encode_full(batch)
S = svd['S'][0].mean(0)
print(f"\n  Spectrum (mean over patches):")
for i in range(len(S)):
    print(f"    S[{i:2d}]: {S[i]:.4f}  {'#' * int(S[i].item() * 8)}")

# ── Lossless round-trip ──
with torch.no_grad():
    lossless = model.decode(latent, U=svd['U'], Vt=svd['Vt'])
print(f"\n  Lossless MSE: {F.mse_loss(lossless, batch).item():.6f}")

# ── Visualize ──
n = len(images)
fig, axes = plt.subplots(n, 4, figsize=(12, 3*n))
for i in range(n):
    axes[i,0].imshow(denorm(batch[i:i+1])[0].permute(1,2,0).numpy())
    axes[i,1].imshow(denorm(recon[i:i+1])[0].permute(1,2,0).numpy())
    axes[i,2].imshow((denorm(batch[i:i+1])-denorm(recon[i:i+1])).abs()[0].permute(1,2,0).numpy()*10)
    omega_vis = omega[i,:3].cpu()
    omega_vis = (omega_vis - omega_vis.min()) / (omega_vis.max() - omega_vis.min() + 1e-8)
    axes[i,3].imshow(omega_vis.permute(1,2,0).numpy())
    for j in range(4):
        axes[i,j].axis('off')
axes[0,0].set_title('Original')
axes[0,1].set_title('Recon')
axes[0,2].set_title('|Err|Γ—10')
axes[0,3].set_title('Omega (ch0-2)')
plt.suptitle(f"Fresnel 128Γ—128 β€” MSE={mse:.6f}", y=1.02)
plt.tight_layout()
plt.savefig('fresnel_inference.png', dpi=150, bbox_inches='tight')
plt.show()

Geometric Properties

The model exhibits structural geometric attractors:

  • Sβ‚€ β‰ˆ 5.16: Dominant singular value, constant across all classes
  • Ratio β‰ˆ 1.58: Sβ‚€/S_D spectral contrast
  • Erank β‰ˆ 15.9: Effective rank (of 16), all modes active
  • CV β†’ 0.29: Row coefficient of variation drifts toward the binding constant

These properties are determined by the (V=256, D=16) matrix structure on S^15, not by the data. The data inhabits the geometry; it doesn't define it.
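These attractor statistics can be computed from the decomposition itself. Below is a hedged sketch, assuming the common entropy-based definition of effective rank and a per-row std/mean definition of CV; the model card may use slightly different formulas:

```python
import torch

def effective_rank(S: torch.Tensor) -> torch.Tensor:
    # exp of the Shannon entropy of the normalized spectrum
    # (one common definition of "effective rank").
    p = S / S.sum()
    return torch.exp(-(p * p.clamp_min(1e-12).log()).sum())

def row_cv(M: torch.Tensor) -> torch.Tensor:
    # coefficient of variation (std/mean) of |entries| per row, averaged.
    A = M.abs()
    return (A.std(dim=-1) / A.mean(dim=-1).clamp_min(1e-12)).mean()

flat = torch.ones(16)              # perfectly flat spectrum
print(effective_rank(flat))        # -> 16.0: all 16 modes equally active
```

An erank near 15.9 out of 16, as reported above, means the spectrum is almost flat: no mode has collapsed.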

Omega Tokens

The spectral vectors S from each patch constitute omega tokens: modality-agnostic, geometrically structured representations suitable for:

  • Diffusion models: Latent (16, 8, 8) is 16Γ— smaller than SD1.5's (4, 64, 64)
  • Cross-modal alignment: Same (V, D) structure works for any modality
  • Classification: Spectrum encodes content, directions encode structure
  • Retrieval: Geometric distance in spectral space
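The retrieval use case above can be sketched with plain cosine similarity over flattened omega tokens. The function and tensor names here are illustrative, not part of the model's API:

```python
import torch
import torch.nn.functional as F

def spectral_retrieve(query: torch.Tensor, gallery: torch.Tensor, k: int = 5):
    # Rank gallery items by cosine similarity to the query in
    # flattened omega-token space (B, 16, 8, 8) -> (B, 1024).
    q = F.normalize(query.flatten(1), dim=-1)     # (1, 1024)
    g = F.normalize(gallery.flatten(1), dim=-1)   # (N, 1024)
    sims = g @ q.T                                # (N, 1)
    return sims.squeeze(1).topk(k)

# Stand-in omega tokens; in practice these come from model.encode(images)
gallery = torch.randn(100, 16, 8, 8)
query = gallery[7:8] + 0.01 * torch.randn(1, 16, 8, 8)
vals, idx = spectral_retrieve(query, gallery, k=3)
print(idx[0].item())  # -> 7: the near-duplicate ranks first
```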

Training

Trained with:

  • Soft hand loss: Adaptive reweighting that boosts reconstruction near target CV
  • Sphere normalization: F.normalize(M, dim=-1) β€” the single most important line
  • fp64 SVD: Gram matrix + eigendecomposition in float64 for numerical stability
  • Gradient clipping: Cross-attention only, max_norm=0.5
  • Adam: lr=1e-4, cosine annealing
  • No data augmentation
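The optimizer recipe above can be sketched as follows. This is a minimal stand-in, assuming cross-attention parameters are identifiable by name; the actual training script may be organized differently:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 16))  # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

# Select only the cross-attention parameters for clipping (name-based
# filter is an assumption; adapt to the real module names).
xattn_params = [p for n, p in model.named_parameters() if "cross_attn" in n]

loss = model(torch.randn(4, 16)).pow(2).mean()
loss.backward()
if xattn_params:
    # per the recipe: clip cross-attention gradients only, max_norm=0.5
    torch.nn.utils.clip_grad_norm_(xattn_params, max_norm=0.5)
optimizer.step()
scheduler.step()
```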

Citation

@misc{patchsvae2026,
  title={The Geometric Engine: Structural Attractors in Neural Network Weight Space},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/svae-fresnel-128}
}

License

Apache 2.0
