# PatchSVAE – Fresnel (128×128)
The Image Geometric Compression Engine: a patch-based SVD autoencoder that achieves 99.993% reconstruction fidelity on ImageNet-1K using pure deterministic geometry. Named for Augustin-Jean Fresnel's lighthouse lens: massive optical power compressed into thin concentric rings.
The Fresnel SVAE line is dedicated specifically to images: autoencoded decomposition and reconstruction.
## Key Results
| Dataset | Resolution | MSE | Fidelity | Params | Epochs |
|---|---|---|---|---|---|
| ImageNet-1K | 128×128 | 0.0000734 | 99.993% | 17M | 50 |
| ImageNet-1K | 128×128 | 0.000206 | 99.98% | 17M | 12 |
| TinyImageNet | 64×64 | 0.000478 | 99.95% | 17M | 200 |
No KL divergence. No perceptual loss. No adversarial training.
Six linear layers, one F.normalize, one eigendecomposition, one convolution for stitching only, and a 2,272-parameter cross-attention.
## Architecture
```
Image (B, 3, 128, 128)
  → 64 patches of 16×16
  → shared MLP encoder per patch (4 residual blocks, hidden=768)
  → (256, 16) matrix per patch
  → F.normalize(M, dim=-1): rows to S^15
  → SVD via fp64 Gram + eigh
  → 64 spectral vectors S ∈ ℝ^16
  → 2-layer spectral cross-attention (learned per-mode α)
  → coordinated S + per-patch U, Vt
  → shared MLP decoder per patch (4 residual blocks)
  → stitch patches → boundary smooth
  → Reconstructed image (B, 3, 128, 128)
```
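The "SVD via fp64 Gram + eigh" step can be illustrated in isolation. Below is a minimal NumPy sketch, assuming a (256, 16) row-normalized patch matrix as described above; it re-derives the math rather than reproducing the repo's actual code.

```python
import numpy as np

def gram_svd(M):
    """SVD of a row-normalized patch matrix via the fp64 Gram + eigh route.

    M: (V, D) with rows on the unit sphere S^{D-1}.
    Returns U (V, D), S (D,), Vt (D, D) with M ~= U @ diag(S) @ Vt.
    Illustrative re-derivation, not the repo's implementation.
    """
    M64 = M.astype(np.float64)
    G = M64.T @ M64                      # (D, D) Gram matrix
    evals, evecs = np.linalg.eigh(G)     # ascending eigenvalues
    idx = np.argsort(evals)[::-1]        # sort descending
    S = np.sqrt(np.clip(evals[idx], 0.0, None))
    V = evecs[:, idx]
    U = M64 @ V / np.maximum(S, 1e-12)   # recover left singular vectors
    return U, S, V.T

# Demo on one synthetic "patch matrix" with rows normalized to S^15
rng = np.random.default_rng(0)
M = rng.standard_normal((256, 16))
M /= np.linalg.norm(M, axis=-1, keepdims=True)  # F.normalize(M, dim=-1)
U, S, Vt = gram_svd(M)
print(np.abs(M - (U * S) @ Vt).max())  # reconstruction error, ~1e-15
```

Because every row has unit norm, the Gram trace (and hence the sum of squared singular values) is pinned to V=256 regardless of the data, which is part of why the spectrum behaves as a structural invariant.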
Latent shape: (B, 16, 8, 8) = 1,024 values → 48:1 compression, nearly lossless.
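For intuition on the patch layout, here is a minimal sketch of the 64-patch split and its inverse using plain reshapes, ignoring the learned boundary smoothing; the helper names are hypothetical, not the model's API.

```python
import numpy as np

def patchify(x, p=16):
    """(B, C, H, W) -> (B, (H//p)*(W//p), C, p, p) non-overlapping patches."""
    B, C, H, W = x.shape
    x = x.reshape(B, C, H // p, p, W // p, p)
    x = x.transpose(0, 2, 4, 1, 3, 5)   # (B, h, w, C, p, p)
    return x.reshape(B, (H // p) * (W // p), C, p, p)

def unpatchify(x, H=128, W=128, p=16):
    """Inverse of patchify: stitch patches back into the image grid."""
    B, N, C, _, _ = x.shape
    x = x.reshape(B, H // p, W // p, C, p, p)
    x = x.transpose(0, 3, 1, 4, 2, 5)   # (B, C, h, p, w, p)
    return x.reshape(B, C, H, W)

img = np.arange(3 * 128 * 128, dtype=np.float32).reshape(1, 3, 128, 128)
patches = patchify(img)
print(patches.shape)  # (1, 64, 3, 16, 16)
```

A 128×128 image yields an 8×8 grid of 16×16 patches, and each patch contributes one 16-value spectral vector, which is exactly how 64 patches fold into the (16, 8, 8) latent.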
## Usage
Fresnel uses the geolip-core repo for `eigh` optimization; the Triton SVD optimizations are not yet supported.
```shell
pip install "git+https://github.com/AbstractEyes/geolip-core.git"
```

```python
from transformers import AutoModel
import torch

model = AutoModel.from_pretrained("AbstractPhil/svae-fresnel-128", trust_remote_code=True)

# Full reconstruction
output = model(images)
recon = output.recon    # (B, 3, 128, 128)
latent = output.latent  # (B, 16, 8, 8) ← omega tokens

# Encode only (for downstream tasks)
omega_tokens = model.encode(images)  # (B, 16, 8, 8)

# Full SVD decomposition
svd = model.encode_full(images)
# svd['U'], svd['S'], svd['Vt'], svd['M'] per patch

# Decode from omega tokens (requires U, Vt for lossless)
recon = model.decode(omega_tokens, U=svd['U'], Vt=svd['Vt'])
```
## Example Script
The script below runs in practice; the shorter Usage snippet above may not.
"""Fresnel 128Γ128 β AutoModel Inference Test"""
import torch
import torch.nn.functional as F
import torchvision.transforms as T
from transformers import AutoModel
from datasets import load_dataset
import matplotlib.pyplot as plt
import numpy as np
REPO = "AbstractPhil/svae-fresnel-128"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
# ββ Load model ββ
print(f"Loading Fresnel from {REPO}...")
model = AutoModel.from_pretrained(REPO, trust_remote_code=True).to(DEVICE).eval()
config = model.config
print(f" Params: {sum(p.numel() for p in model.parameters()):,}")
print(f" Latent: ({config.latent_channels}, {config.latent_size}, {config.latent_size})")
# ββ Grab 4 images via streaming ββ
transform = T.Compose([
T.ToTensor(),
T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
denorm_mean = torch.tensor([0.485, 0.456, 0.406]).reshape(1, 3, 1, 1).to(DEVICE)
denorm_std = torch.tensor([0.229, 0.224, 0.225]).reshape(1, 3, 1, 1).to(DEVICE)
def denorm(t):
return (t * denorm_std + denorm_mean).clamp(0, 1).cpu()
ds = load_dataset('benjamin-paine/imagenet-1k-128x128', split='validation', streaming=True)
images = []
for i, sample in enumerate(ds):
img = sample['image'].convert('RGB')
images.append(transform(img))
if i >= 3:
break
batch = torch.stack(images).to(DEVICE)
print(f" Batch: {batch.shape}")
# ββ Full reconstruction ββ
with torch.no_grad():
output = model(batch)
recon = output["recon"]
latent = output["latent"]
mse = F.mse_loss(recon, batch).item()
print(f"\n Recon MSE: {mse:.6f} ({(1-mse)*100:.3f}% fidelity)")
print(f" Latent: {latent.shape} β {batch.numel()//latent.numel()}:1 compression")
# ββ Encode omega tokens ββ
with torch.no_grad():
omega = model.encode(batch)
print(f" Omega: {omega.shape}, mean={omega.mean():.3f}, std={omega.std():.3f}")
# ββ Full SVD ββ
with torch.no_grad():
svd = model.encode_full(batch)
S = svd['S'][0].mean(0)
print(f"\n Spectrum (mean over patches):")
for i in range(len(S)):
print(f" S[{i:2d}]: {S[i]:.4f} {'#' * int(S[i].item() * 8)}")
# ββ Lossless round-trip ββ
with torch.no_grad():
lossless = model.decode(latent, U=svd['U'], Vt=svd['Vt'])
print(f"\n Lossless MSE: {F.mse_loss(lossless, batch).item():.6f}")
# ββ Visualize ββ
n = len(images)
fig, axes = plt.subplots(n, 4, figsize=(12, 3*n))
for i in range(n):
axes[i,0].imshow(denorm(batch[i:i+1])[0].permute(1,2,0).numpy())
axes[i,1].imshow(denorm(recon[i:i+1])[0].permute(1,2,0).numpy())
axes[i,2].imshow((denorm(batch[i:i+1])-denorm(recon[i:i+1])).abs()[0].permute(1,2,0).numpy()*10)
omega_vis = omega[i,:3].cpu()
omega_vis = (omega_vis - omega_vis.min()) / (omega_vis.max() - omega_vis.min() + 1e-8)
axes[i,3].imshow(omega_vis.permute(1,2,0).numpy())
for j in range(4):
axes[i,j].axis('off')
axes[0,0].set_title('Original')
axes[0,1].set_title('Recon')
axes[0,2].set_title('|Err|Γ10')
axes[0,3].set_title('Omega (ch0-2)')
plt.suptitle(f"Fresnel 128Γ128 β MSE={mse:.6f}", y=1.02)
plt.tight_layout()
plt.savefig('fresnel_inference.png', dpi=150, bbox_inches='tight')
plt.show()
## Geometric Properties
The model exhibits structural geometric attractors:
- S₀ ≈ 5.16: Dominant singular value, constant across all classes
- Ratio ≈ 1.58: S₀/S_D spectral contrast
- Erank ≈ 15.9: Effective rank (of 16), all modes active
- CV ≈ 0.29: Row coefficient of variation drifts toward the binding constant
These properties are determined by the (V=256, D=16) matrix structure on S^15, not by the data. The data inhabits the geometry; it doesn't define it.
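The Erank figure can be reproduced from a spectrum. Below is a sketch using one common definition, the exponential of the spectral entropy (Roy & Vetterli); the card does not state which variant Fresnel reports, so treat this as an assumption.

```python
import numpy as np

def effective_rank(S, eps=1e-12):
    """Effective rank as exp(entropy) of the normalized spectrum.

    One common definition; assumed here, not confirmed by the model card.
    """
    p = S / (S.sum() + eps)               # normalize singular values
    return float(np.exp(-np.sum(p * np.log(p + eps))))

flat = np.ones(16)                        # all 16 modes equally active
spiked = np.array([5.16] + [1.0] * 15)    # one dominant mode
print(effective_rank(flat))               # 16.0 (up to float error)
print(effective_rank(spiked))             # lower: the spike concentrates mass
```

An Erank near 15.9 out of 16 means the spectrum is nearly flat apart from the dominant S₀ mode, i.e. all modes carry signal.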
## Omega Tokens
The spectral vectors S from each patch constitute omega tokens: modality-agnostic, geometrically structured representations suitable for:
- Diffusion models: Latent (16, 8, 8) is 16× smaller than SD1.5's (4, 64, 64)
- Cross-modal alignment: Same (V, D) structure works for any modality
- Classification: Spectrum encodes content, directions encode structure
- Retrieval: Geometric distance in spectral space
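As a sketch of the retrieval use, one can compare flattened omega tokens directly; the card does not fix a metric, so the cosine distance below is an assumption.

```python
import numpy as np

def spectral_distance(omega_a, omega_b):
    """Cosine distance between two (16, 8, 8) omega tokens.

    Assumed metric for illustration; not specified by the model card.
    """
    a, b = omega_a.ravel(), omega_b.ravel()
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return 1.0 - float(cos)

rng = np.random.default_rng(0)
query = rng.standard_normal((16, 8, 8))       # stand-in for model.encode(...)
gallery = [rng.standard_normal((16, 8, 8)) for _ in range(4)] + [query]
ranked = sorted(range(len(gallery)), key=lambda i: spectral_distance(query, gallery[i]))
print(ranked[0])  # the identical token ranks first
```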
## Training
Trained with:
- Soft hand loss: Adaptive reweighting that boosts reconstruction near target CV
- Sphere normalization: `F.normalize(M, dim=-1)`, the single most important line
- fp64 SVD: Gram matrix + eigendecomposition in float64 for numerical stability
- Gradient clipping: Cross-attention only, max_norm=0.5
- Adam: lr=1e-4, cosine annealing
- No data augmentation
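The optimizer plumbing above can be sketched in a few lines; `attn` and `rest` are hypothetical stand-ins for the spectral cross-attention and the MLP stacks, not the repo's module names.

```python
import torch

# Stand-in modules (hypothetical; the real model's modules differ)
attn = torch.nn.MultiheadAttention(embed_dim=16, num_heads=2)  # "cross-attention"
rest = torch.nn.Linear(16, 16)                                 # "everything else"

opt = torch.optim.Adam(
    list(attn.parameters()) + list(rest.parameters()), lr=1e-4
)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=50)

# One dummy step
x = torch.randn(8, 4, 16)
out, _ = attn(x, x, x)
loss = torch.nn.functional.mse_loss(rest(out), x)
loss.backward()
# Clip gradients of the cross-attention only, as described above
torch.nn.utils.clip_grad_norm_(attn.parameters(), max_norm=0.5)
opt.step()
sched.step()
```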
## Citation
```bibtex
@misc{patchsvae2026,
  title={The Geometric Engine: Structural Attractors in Neural Network Weight Space},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/svae-fresnel-128}
}
```
## License
Apache 2.0