Commit 6eab2a4 · Parent(s): 586b282

Update README.md (#7)

Co-authored-by: Saarthak Kapse <Saarthak-GenBio-AI@users.noreply.huggingface.co>

README.md (CHANGED)
* [GenBio AI Blog Post](https://genbio.ai/genbio-pathfm)
* [Paper](https://www.biorxiv.org/content/10.64898/2026.03.17.712534v1)

## Usage

GenBio-PathFM can be loaded directly via HuggingFace `AutoModel` (tested on `transformers==4.57.1`):

```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModel

# Load model
model = AutoModel.from_pretrained("genbio-ai/genbio-pathfm", trust_remote_code=True)
model.eval()

# Preprocessing transform
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=(0.697, 0.575, 0.728),
        std=(0.188, 0.240, 0.187),
    ),
])

# Inference
image = Image.open("path/to/image.png").convert("RGB")
x = transform(image).unsqueeze(0)  # [1, 3, 224, 224]

with torch.no_grad():
    cls_features = model(x)  # [1, 4608]
    # Or with patch tokens:
    cls_features, patch_features = model.forward_with_patches(x)  # [1, 4608], [1, 196, 4608]
```

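The extracted 4608-dimensional CLS features can be used directly for downstream tasks such as linear probing. A minimal sketch, using random arrays as stand-ins for real extracted features and labels (scikit-learn is an assumption here, not a stated dependency of this repository):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for CLS features extracted from GenBio-PathFM ([N, 4608]) and binary labels
features = rng.standard_normal((100, 4608)).astype(np.float32)
labels = rng.integers(0, 2, size=100)

# Fit a linear probe on the frozen features
probe = LogisticRegression(max_iter=1000)
probe.fit(features, labels)

preds = probe.predict(features)  # shape (100,)
```

In practice, features would be extracted once with `model(x)` over a labeled dataset and cached, so only the lightweight probe is trained.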
## Abstract

Recent advancements in histopathology foundation models (FMs) have largely been driven by scaling the training data, often utilizing massive proprietary datasets. However, the long-tailed distribution of morphological features in whole-slide images (WSIs) makes simple scaling inefficient, as common morphologies dominate the learning signal. We introduce GenBio-PathFM, a 1.1B-parameter FM that achieves state-of-the-art performance on public benchmarks while using a fraction of the training data required by current leading models. The efficiency of GenBio-PathFM is underpinned by two primary innovations: an automated data curation pipeline that prioritizes morphological diversity, and a novel dual-stage learning strategy we term JEDI (**JE**PA + **DI**NO). Across the THUNDER, HEST, and PathoROB benchmarks, GenBio-PathFM demonstrates state-of-the-art accuracy and robustness. GenBio-PathFM is the strongest open-weight model to date and the only state-of-the-art model trained exclusively on public data.