MaveDB: Protein Variant Effect Prediction

Categories: Projects, ML Marathon, MLM25, Deep learning, Protein language models, Bioinformatics, Foundation models
Author: Chris Endemann

Published: September 11, 2025

The MaveDB challenge was featured in the 2025 Machine Learning Marathon (MLM25). Participants explored protein language models and other ML methods to predict variant effects using data from MaveDB, an open-source database of multiplexed assays of variant effect (MAVEs) containing over 7 million variant effect measurements.

Challenge design

  • Task: Predict the functional impact of protein variants using deep mutational scanning data.
  • Domain: Computational biology – understanding how single amino acid changes affect protein function is critical for clinical variant interpretation and protein engineering.
  • Methods: Protein language models (e.g., ESM), fine-tuning strategies, and variant effect predictors.

Questions

If you have any lingering questions about this project, please feel free to post to the Nexus Q&A on GitHub. We will improve materials on this website as additional questions come in.