Explore popular datasets you can use for your next machine learning project.
A curated dataset of 161M+ images across 324K species for biodiversity AI, built from iNaturalist research-grade observations.
A massive citizen-science biodiversity dataset with millions of species photos, rich spatial/temporal metadata, and fine-grained labels for computer vision research.
A benchmark for evaluating text-to-image retrieval on expert-level ecological queries, built from iNaturalist data with domain-specific relevance judgments.
A classic image classification benchmark of 60,000 32×32 color images across 10 classes, widely used for evaluating computer vision models.
High-resolution leaf images with pixel-wise vein annotations across 36 leaf types, designed for segmentation and plant phenotyping research.
A large open collection of free eBooks and audiobooks, useful for NLP and speech tasks such as language modeling, text classification, and speech synthesis.