Benchmarks

Explore benchmarks used to evaluate and compare ML models and AI-powered tools. Understanding how benchmarks are constructed — and their limitations — is critical for interpreting leaderboard results and making informed decisions about model selection.

Also on Nexus

  • Data: INQUIRE: A text-to-image retrieval benchmark with 250 expert-level ecological queries over 5M iNaturalist images.
  • Talk: LabelBench: A framework for benchmarking label-efficient learning that combines active learning, semi-supervised learning, and transfer learning.