elijahcole and Saarthak-GenBio-AI committed on
Commit 6eab2a4 · 1 Parent(s): 586b282

Update README.md (#7)


- Update README.md (4a48e83d89feea890c2bf429a588b5377e34b6e8)


Co-authored-by: Saarthak Kapse <Saarthak-GenBio-AI@users.noreply.huggingface.co>

Files changed (1): README.md (+35 −0)
README.md CHANGED
@@ -24,6 +24,41 @@ For more details:
 * [GenBio AI Blog Post](https://genbio.ai/genbio-pathfm)
 * [Paper](https://www.biorxiv.org/content/10.64898/2026.03.17.712534v1)

+ ## Usage
+
+ GenBio-PathFM can be loaded directly via HuggingFace `AutoModel` (tested on `transformers==4.57.1`):
+
+ ```python
+ import torch
+ from PIL import Image
+ from torchvision import transforms
+ from transformers import AutoModel
+
+ # Load the model
+ model = AutoModel.from_pretrained("genbio-ai/genbio-pathfm", trust_remote_code=True)
+ model.eval()
+
+ # Preprocessing: resize, convert to a tensor, and normalize with the model's channel statistics
+ transform = transforms.Compose([
+     transforms.Resize((224, 224)),
+     transforms.ToTensor(),
+     transforms.Normalize(
+         mean=(0.697, 0.575, 0.728),
+         std=(0.188, 0.240, 0.187),
+     ),
+ ])
+
+ # Inference
+ image = Image.open("path/to/image.png").convert("RGB")
+ x = transform(image).unsqueeze(0)  # [1, 3, 224, 224]
+
+ with torch.no_grad():
+     cls_features = model(x)  # [1, 4608]
+     # Or, to also obtain patch tokens:
+     cls_features, patch_features = model.forward_with_patches(x)  # [1, 4608], [1, 196, 4608]
+ ```
+
 ## Abstract

 Recent advancements in histopathology foundation models (FMs) have largely been driven by scaling the training data, often utilizing massive proprietary datasets. However, the long-tailed distribution of morphological features in whole-slide images (WSIs) makes simple scaling inefficient, as common morphologies dominate the learning signal. We introduce GenBio-PathFM, a 1.1B-parameter FM that achieves state-of-the-art performance on public benchmarks while using a fraction of the training data required by current leading models. The efficiency of GenBio-PathFM is underpinned by two primary innovations: an automated data curation pipeline that prioritizes morphological diversity, and a novel dual-stage learning strategy that we term JEDI (**JE**PA + **DI**NO). Across the THUNDER, HEST, and PathoROB benchmarks, GenBio-PathFM demonstrates state-of-the-art accuracy and robustness. GenBio-PathFM is the strongest open-weight model to date and the only state-of-the-art model trained exclusively on public data.
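For context on the preprocessing in the added snippet: `ToTensor` scales uint8 pixel values to [0, 1], and `Normalize` then applies (x − mean) / std per channel using the statistics above. A minimal pure-Python sketch of that arithmetic for a single pixel (`normalize_pixel` is a hypothetical helper written here for illustration, not part of the model's API):

```python
# Channel statistics from the README's transform (assumed RGB order).
MEAN = (0.697, 0.575, 0.728)
STD = (0.188, 0.240, 0.187)

def normalize_pixel(rgb):
    """Apply the ToTensor + Normalize arithmetic to one (R, G, B) uint8 pixel:
    scale to [0, 1], subtract the channel mean, divide by the channel std."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, MEAN, STD))

# A pure-white pixel lands well above zero on every channel,
# while a pixel close to the dataset mean lands near zero.
white = normalize_pixel((255, 255, 255))
near_mean = normalize_pixel((178, 147, 186))
print(white, near_mean)
```

The relatively high means (~0.6–0.7 per channel) reflect the bright background typical of H&E-stained pathology tiles, which is why using these statistics rather than ImageNet defaults matters for this model.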