MaveDB: Protein Variant Effect Prediction

Projects
ML Marathon
MLM25
Deep learning
Protein language models
Foundation models
Author

Chris Endemann

Published

September 11, 2025

The MaveDB challenge was featured in the 2025 Machine Learning Marathon (MLM25). Participants applied protein language models and other ML methods to predict variant effects from MaveDB data. MaveDB is an open-source database of multiplexed assays of variant effect (MAVEs) that contains over 7 million variant effect measurements.

Challenge design

  • Task: Predict the functional impact of protein variants using deep mutational scanning data.
  • Domain: Computational biology – understanding how single amino acid changes affect protein function is critical for clinical variant interpretation and protein engineering.
  • Methods: Protein language models (e.g., ESM), fine-tuning strategies, and variant effect predictors.
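
As a rough illustration of the task, the sketch below parses single amino acid substitutions and scores a predictor against measured assay values using Spearman rank correlation, a common (but not the only) way to compare variant effect predictions with deep mutational scanning data. It assumes variants are written in HGVS protein notation such as `p.Ala2Val`; the helper names (`parse_substitution`, `average_ranks`, `spearman`) and the toy numbers are hypothetical and are not taken from the challenge materials.

```python
import re
from statistics import mean

# Three-letter to one-letter amino acid codes (the 20 standard residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Assumption: variants look like "p.Ala2Val" (reference, position, alternate).
_SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_substitution(hgvs_pro):
    """Return (ref, position, alt) for a missense variant, else None."""
    match = _SUBSTITUTION.match(hgvs_pro)
    if match is None:
        return None  # e.g. synonymous "p.Ala2=" or multi-residue variants
    ref, pos, alt = match.groups()
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        return None  # e.g. stop codons written as "Ter"
    return THREE_TO_ONE[ref], int(pos), THREE_TO_ONE[alt]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data (made up for illustration): measured scores and model predictions.
measured = {"p.Ala2Val": 0.9, "p.Gly5Asp": -1.2, "p.Leu7Pro": -2.0, "p.Ala2=": 1.0}
predicted = {"p.Ala2Val": -0.5, "p.Gly5Asp": -3.1, "p.Leu7Pro": -6.4, "p.Ala2=": 0.0}

# Keep only single amino acid substitutions before scoring.
missense = [v for v in measured if parse_substitution(v) is not None]
rho = spearman([measured[v] for v in missense], [predicted[v] for v in missense])
print(f"{len(missense)} missense variants, Spearman rho = {rho:.2f}")
```

Rank correlation is used here because assay scores and model outputs (for example, log-likelihood ratios from a protein language model) are usually on different scales, so only the ordering of variants is compared.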
