🔬 BioBench Leaderboard

GitHub arXiv (soon)

ImageNet-1K scores no longer predict how models behave on real-world ecology tasks. BioBench measures what actually matters: 9 field tasks, 6 imaging modalities, and 3.1M images, with the leaderboard continuously updated as new models are added.

At a glance

Leaderboard

Why this exists

Web-photo benchmarks reward features that don't transfer to camera traps, drone RGB, micrographs, or specimen shots. Once models exceed 75% ImageNet top-1 accuracy, their ImageNet score says little about how they rank on ecology tasks: the ordering becomes noise. BioBench replaces proxy metrics with direct measurement.

How to reproduce

  1. git clone https://github.com/samuelstevens/biobench.git && cd biobench
  2. uv run benchmark.py --cfgs configs/all-models.toml
  3. uv run report.py

ViT-L runs in about 1 hour on a single A6000. Results are saved to a SQLite database, and report.py converts them into a statistically validated JSON file for easy analysis.
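Before running report.py, it can be useful to check what the benchmark actually wrote. The sketch below lists every table in a SQLite file along with its row count. It makes no assumption about BioBench's schema, which is not described here. The function name `summarize_tables` is hypothetical, and the database path has to be supplied by you, since the output location is not stated above.

```python
import sqlite3


def summarize_tables(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in a SQLite file.

    Hypothetical helper for illustration; it is not part of BioBench.
    Note that sqlite3.connect() creates an empty file if db_path does not exist.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        # Table names come from the database itself; quote them as identifiers.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling `summarize_tables("path/to/results.sqlite")` (a placeholder path) returns a dictionary you can print or inspect. An empty dictionary means the file contains no tables, which usually indicates a wrong path.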

How to cite

@software{stevens2025biobench,
  author = {Stevens, Samuel and Gu, Jianyang},
  license = {MIT},
  title = {{BioBench}},
  url = {https://github.com/samuelstevens/biobench/}
}