## At a glance
- 9 tasks · 4 kingdoms · 6 imaging modalities
- 3.1 M images · 337 GB total
- Evaluation: frozen backbone → simple scikit-learn-style classifier → macro-F1
- Leaderboard
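The evaluation protocol above can be sketched in a few lines. This is an illustrative stand-in, not BioBench's actual code: the random arrays play the role of frozen-backbone embeddings and task labels, and `LogisticRegression` stands in for the simple scikit-learn-style classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
# Stand-ins for embeddings from a frozen backbone (e.g. ViT CLS tokens) and labels.
train_x, train_y = rng.normal(size=(200, 8)), rng.integers(0, 3, 200)
test_x, test_y = rng.normal(size=(50, 8)), rng.integers(0, 3, 50)

# Only the lightweight classifier head is trained; the backbone stays frozen.
clf = LogisticRegression(max_iter=1000).fit(train_x, train_y)
score = f1_score(test_y, clf.predict(test_x), average="macro")
print(f"macro-F1: {score:.3f}")
```

Macro-F1 averages per-class F1 scores with equal weight, so rare classes count as much as common ones.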
## Why this exists
Web-photo benchmarks reward features that don’t transfer to camera traps, drone RGB, microscope micrographs, or specimen shots. Above 75 % ImageNet top-1, model rankings on ecology tasks become noise. BioBench replaces proxy metrics with direct measurement.
## How to reproduce
```sh
git clone https://github.com/samuelstevens/biobench.git
uv run benchmark.py --cfgs configs/all-models.toml
uv run report.py
```
A ViT-L sweep runs in about 1 hour on a single A6000. Results are saved to a SQLite database, and `report.py` converts them into a statistically validated JSON file for easy analysis.
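A minimal sketch of the SQLite-to-JSON step, using Python's standard library. The table and column names here (`results`, `model`, `task`, `macro_f1`) are assumptions for illustration; the actual schema written by `benchmark.py` may differ.

```python
import json
import sqlite3

# Build a toy results database in memory; the real one is written by benchmark.py.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (model TEXT, task TEXT, macro_f1 REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("vit-l", "beetles", 0.81), ("vit-l", "plankton", 0.74)],  # hypothetical rows
)

# Dump every row as a list of JSON objects for downstream analysis.
rows = conn.execute("SELECT model, task, macro_f1 FROM results").fetchall()
report = [dict(zip(("model", "task", "macro_f1"), row)) for row in rows]
print(json.dumps(report, indent=2))
```

SQLite keeps per-run results append-only and crash-safe; flattening to JSON at report time makes the numbers easy to load into pandas or plotting scripts.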
## How to cite
```bibtex
@software{stevens2025biobench,
  author  = {Stevens, Samuel and Gu, Jianyang},
  license = {MIT},
  title   = {{BioBench}},
  url     = {https://github.com/samuelstevens/biobench/}
}
```