PatchSVAE β€” Fresnel (128Γ—128)

The Image Geometric Compression Engine: A patch-based SVD autoencoder that achieves 99.993% reconstruction fidelity on ImageNet-1K using pure deterministic geometry. Named for Augustin-Jean Fresnel's lighthouse lens β€” massive optical power compressed into thin concentric rings.

The Fresnel SVAE line is dedicated specifically to images: autoencoded decomposition and reconstruction.

Key Results

| Dataset | Resolution | MSE | Fidelity | Params | Epochs |
|---|---|---|---|---|---|
| ImageNet-1K | 128×128 | 0.0000734 | 99.993% | 17M | 50 |
| ImageNet-1K | 128×128 | 0.000206 | 99.98% | 17M | 12 |
| TinyImageNet | 64×64 | 0.000478 | 99.95% | 17M | 200 |

No KL divergence. No perceptual loss. No adversarial training.

Six linear layers, one F.normalize, one eigendecomposition, one convolution for stitching only, and a 2,272-parameter cross-attention.

Architecture

Image (B, 3, 128, 128)
  β†’ 64 patches of 16Γ—16
  β†’ shared MLP encoder per patch (4 residual blocks, hidden=768)
  β†’ (256, 16) matrix per patch
  β†’ F.normalize(M, dim=-1)  β€” rows to S^15
  β†’ SVD via fp64 Gram + eigh
  β†’ 64 spectral vectors S ∈ ℝ^16
  β†’ 2-layer spectral cross-attention (learned per-mode Ξ±)
  β†’ coordinated S + per-patch U, Vt
  β†’ shared MLP decoder per patch (4 residual blocks)
  β†’ stitch patches β†’ boundary smooth
  β†’ Reconstructed image (B, 3, 128, 128)

Latent shape: (B, 16, 8, 8) = 1,024 values β€” 48:1 compression, nearly lossless.
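The "SVD via fp64 Gram + eigh" step above can be sketched as follows. This is a minimal illustration of the technique (decompose a tall matrix through its small D×D Gram matrix in float64), not the model's actual code, which may differ in detail:

```python
import torch

def gram_svd(M: torch.Tensor):
    """SVD of a tall (V, D) matrix via its D x D Gram matrix in float64.

    Sketch of the fp64 Gram + eigh step; rows of M are assumed unit-norm
    (after F.normalize(M, dim=-1)).
    """
    M64 = M.double()
    G = M64.T @ M64                            # (D, D) Gram matrix
    evals, V = torch.linalg.eigh(G)            # eigenvalues ascending
    idx = torch.argsort(evals, descending=True)
    S = evals[idx].clamp_min(0).sqrt()         # singular values, descending
    V = V[:, idx]
    U = M64 @ V / S.clamp_min(1e-12)           # recover left singular vectors
    return U.float(), S.float(), V.T.float()

# One patch matrix: 256 rows normalized to the unit sphere S^15
M = torch.nn.functional.normalize(torch.randn(256, 16), dim=-1)
U, S, Vt = gram_svd(M)
recon = U @ torch.diag(S) @ Vt
print(torch.allclose(recon, M, atol=1e-4))  # -> True
```

Working on the 16×16 Gram matrix instead of the full 256×16 matrix keeps the eigendecomposition small, and doing it in float64 avoids the precision loss that squaring the matrix would otherwise introduce.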

Usage

Fresnel uses the geolip-core repo for eigh-based SVD optimization; the Triton SVD optimizations are not supported yet.

!pip install "git+https://github.com/AbstractEyes/geolip-core.git"
from transformers import AutoModel
import torch

model = AutoModel.from_pretrained("AbstractPhil/svae-fresnel-128", trust_remote_code=True)

# Full reconstruction
output = model(images)
recon = output.recon          # (B, 3, 128, 128)
latent = output.latent        # (B, 16, 8, 8) β€” omega tokens

# Encode only (for downstream tasks)
omega_tokens = model.encode(images)  # (B, 16, 8, 8)

# Full SVD decomposition
svd = model.encode_full(images)
# svd['U'], svd['S'], svd['Vt'], svd['M'] per patch

# Decode from omega tokens (requires U, Vt for lossless)
recon = model.decode(omega_tokens, U=svd['U'], Vt=svd['Vt'])

Example Script

This script works in practice; the shorter usage snippet above may not.

"""Fresnel 128Γ—128 β€” AutoModel Inference Test"""

import torch
import torch.nn.functional as F
import torchvision.transforms as T
from transformers import AutoModel
from datasets import load_dataset
import matplotlib.pyplot as plt
import numpy as np

REPO = "AbstractPhil/svae-fresnel-128"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# ── Load model ──
print(f"Loading Fresnel from {REPO}...")
model = AutoModel.from_pretrained(REPO, trust_remote_code=True).to(DEVICE).eval()
config = model.config
print(f"  Params: {sum(p.numel() for p in model.parameters()):,}")
print(f"  Latent: ({config.latent_channels}, {config.latent_size}, {config.latent_size})")

# ── Grab 4 images via streaming ──
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
denorm_mean = torch.tensor([0.485, 0.456, 0.406]).reshape(1, 3, 1, 1).to(DEVICE)
denorm_std = torch.tensor([0.229, 0.224, 0.225]).reshape(1, 3, 1, 1).to(DEVICE)
def denorm(t):
    return (t * denorm_std + denorm_mean).clamp(0, 1).cpu()

ds = load_dataset('benjamin-paine/imagenet-1k-128x128', split='validation', streaming=True)
images = []
for i, sample in enumerate(ds):
    img = sample['image'].convert('RGB')
    images.append(transform(img))
    if i >= 3:
        break

batch = torch.stack(images).to(DEVICE)
print(f"  Batch: {batch.shape}")

# ── Full reconstruction ──
with torch.no_grad():
    output = model(batch)

recon = output["recon"]
latent = output["latent"]
mse = F.mse_loss(recon, batch).item()
print(f"\n  Recon MSE: {mse:.6f} ({(1-mse)*100:.3f}% fidelity)")
print(f"  Latent: {latent.shape} β€” {batch.numel()//latent.numel()}:1 compression")

# ── Encode omega tokens ──
with torch.no_grad():
    omega = model.encode(batch)
print(f"  Omega: {omega.shape}, mean={omega.mean():.3f}, std={omega.std():.3f}")

# ── Full SVD ──
with torch.no_grad():
    svd = model.encode_full(batch)
S = svd['S'][0].mean(0)
print(f"\n  Spectrum (mean over patches):")
for i in range(len(S)):
    print(f"    S[{i:2d}]: {S[i]:.4f}  {'#' * int(S[i].item() * 8)}")

# ── Lossless round-trip ──
with torch.no_grad():
    lossless = model.decode(latent, U=svd['U'], Vt=svd['Vt'])
print(f"\n  Lossless MSE: {F.mse_loss(lossless, batch).item():.6f}")

# ── Visualize ──
n = len(images)
fig, axes = plt.subplots(n, 4, figsize=(12, 3*n))
for i in range(n):
    axes[i,0].imshow(denorm(batch[i:i+1])[0].permute(1,2,0).numpy())
    axes[i,1].imshow(denorm(recon[i:i+1])[0].permute(1,2,0).numpy())
    axes[i,2].imshow((denorm(batch[i:i+1])-denorm(recon[i:i+1])).abs()[0].permute(1,2,0).numpy()*10)
    omega_vis = omega[i,:3].cpu()
    omega_vis = (omega_vis - omega_vis.min()) / (omega_vis.max() - omega_vis.min() + 1e-8)
    axes[i,3].imshow(omega_vis.permute(1,2,0).numpy())
    for j in range(4):
        axes[i,j].axis('off')
axes[0,0].set_title('Original')
axes[0,1].set_title('Recon')
axes[0,2].set_title('|Err|Γ—10')
axes[0,3].set_title('Omega (ch0-2)')
plt.suptitle(f"Fresnel 128Γ—128 β€” MSE={mse:.6f}", y=1.02)
plt.tight_layout()
plt.savefig('fresnel_inference.png', dpi=150, bbox_inches='tight')
plt.show()

Geometric Properties

The model exhibits structural geometric attractors:

  • Sβ‚€ β‰ˆ 5.16: Dominant singular value, constant across all classes
  • Ratio β‰ˆ 1.58: Sβ‚€/S_D spectral contrast
  • Erank β‰ˆ 15.9: Effective rank (of 16), all modes active
  • CV β†’ 0.29: Row coefficient of variation drifts toward the binding constant

These properties are determined by the (V=256, D=16) matrix structure on S^15, not by the data. The data inhabits the geometry; it doesn't define it.
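These attractor statistics can be computed from the decomposition itself. Below is a hedged sketch, assuming the common entropy-based definition of effective rank and a per-row std/mean definition of CV; the model card may use slightly different formulas:

```python
import torch

def effective_rank(S: torch.Tensor) -> torch.Tensor:
    # exp of the Shannon entropy of the normalized spectrum
    # (one common definition of "effective rank").
    p = S / S.sum()
    return torch.exp(-(p * p.clamp_min(1e-12).log()).sum())

def row_cv(M: torch.Tensor) -> torch.Tensor:
    # coefficient of variation (std/mean) of |entries| per row, averaged.
    A = M.abs()
    return (A.std(dim=-1) / A.mean(dim=-1).clamp_min(1e-12)).mean()

flat = torch.ones(16)              # perfectly flat spectrum
print(effective_rank(flat))        # -> 16.0: all 16 modes equally active
```

An erank near 15.9 out of 16, as reported above, means the spectrum is almost flat: no mode has collapsed.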

Omega Tokens

The spectral vectors S from each patch constitute omega tokens: modality-agnostic, geometrically structured representations suitable for:

  • Diffusion models: Latent (16, 8, 8) is 16Γ— smaller than SD1.5's (4, 64, 64)
  • Cross-modal alignment: Same (V, D) structure works for any modality
  • Classification: Spectrum encodes content, directions encode structure
  • Retrieval: Geometric distance in spectral space
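The retrieval use case above can be sketched with plain cosine similarity over flattened omega tokens. The function and tensor names here are illustrative, not part of the model's API:

```python
import torch
import torch.nn.functional as F

def spectral_retrieve(query: torch.Tensor, gallery: torch.Tensor, k: int = 5):
    # Rank gallery items by cosine similarity to the query in
    # flattened omega-token space (B, 16, 8, 8) -> (B, 1024).
    q = F.normalize(query.flatten(1), dim=-1)     # (1, 1024)
    g = F.normalize(gallery.flatten(1), dim=-1)   # (N, 1024)
    sims = g @ q.T                                # (N, 1)
    return sims.squeeze(1).topk(k)

# Stand-in omega tokens; in practice these come from model.encode(images)
gallery = torch.randn(100, 16, 8, 8)
query = gallery[7:8] + 0.01 * torch.randn(1, 16, 8, 8)
vals, idx = spectral_retrieve(query, gallery, k=3)
print(idx[0].item())  # -> 7: the near-duplicate ranks first
```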

Training

Trained with:

  • Soft hand loss: Adaptive reweighting that boosts reconstruction near target CV
  • Sphere normalization: F.normalize(M, dim=-1) β€” the single most important line
  • fp64 SVD: Gram matrix + eigendecomposition in float64 for numerical stability
  • Gradient clipping: Cross-attention only, max_norm=0.5
  • Adam: lr=1e-4, cosine annealing
  • No data augmentation
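The optimizer recipe above can be sketched as follows. This is a minimal stand-in, assuming cross-attention parameters are identifiable by name; the actual training script may be organized differently:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 16))  # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

# Select only the cross-attention parameters for clipping (name-based
# filter is an assumption; adapt to the real module names).
xattn_params = [p for n, p in model.named_parameters() if "cross_attn" in n]

loss = model(torch.randn(4, 16)).pow(2).mean()
loss.backward()
if xattn_params:
    # per the recipe: clip cross-attention gradients only, max_norm=0.5
    torch.nn.utils.clip_grad_norm_(xattn_params, max_norm=0.5)
optimizer.step()
scheduler.step()
```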

Citation

@misc{patchsvae2026,
  title={The Geometric Engine: Structural Attractors in Neural Network Weight Space},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/svae-fresnel-128}
}

License

Apache 2.0
