

Welcome to ROBERT's documentation!

ROBERT is an ensemble of automated machine learning protocols that can be run sequentially through a single command line or a graphical user interface. The program works for regression and classification problems. Comprehensive workflows have been designed to meet state-of-the-art standards for cheminformatics studies, including:

  • Atomic and molecular descriptor generation from SMILES, including RDKit conformer sampling and the generation of 200+ steric, electronic and structural descriptors using RDKit, xTB and ᴍᴏʀғᴇᴜs. This step requires the AQME program.

  • Data curation, including filters for correlated descriptors, noise, and duplicates, as well as conversion of categorical descriptors.

  • Model selection, including the comparison of multiple hyperoptimized models using multiple cross-validation techniques. This approach mitigates overfitting in low-data regimes.

  • Prediction of external test sets, as well as SHAP and PFI feature analysis.

  • VERIFY tests to assess the predictive ability of the models, including y-shuffle, y-mean, and one-hot encoding tests.
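
The y-shuffle and y-mean checks mentioned in the last item can be illustrated with a small NumPy sketch. This is not ROBERT's implementation: the function names (`r2_score`, `fit_predict_linear`, `baseline_checks`), the synthetic data, and the ordinary least-squares model are assumptions chosen purely for illustration. The idea is that a model with real predictive ability should clearly outperform both a model trained on shuffled y values and a constant prediction of the training-set mean.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination (R2) of predictions against y_true."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_predict_linear(X_train, y_train, X_test):
    """Ordinary least squares with an intercept (a stand-in for any model)."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def baseline_checks(X, y, n_train, seed=0):
    """Compare a fitted model with y-shuffle and y-mean baselines (test-set R2)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    train, test = order[:n_train], order[n_train:]

    # Model trained on the real y values.
    model_pred = fit_predict_linear(X[train], y[train], X[test])

    # y-shuffle: same descriptors, but the training targets are permuted,
    # which destroys any real descriptor-target relationship.
    y_shuffled = rng.permutation(y[train])
    shuffle_pred = fit_predict_linear(X[train], y_shuffled, X[test])

    # y-mean: every test point is predicted as the training-set mean.
    mean_pred = np.full(len(test), np.mean(y[train]))

    return {
        "model": r2_score(y[test], model_pred),
        "y_shuffle": r2_score(y[test], shuffle_pred),
        "y_mean": r2_score(y[test], mean_pred),
    }


# Synthetic data (hypothetical): three descriptors, linear target, small noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=120)

results = baseline_checks(X, y, n_train=90)
for name, score in results.items():
    print(f"{name}: R2 = {score:.3f}")
```

With data like this, the model's test-set R2 should sit close to 1, while both baselines should be near or below 0. A model whose score is not clearly above these baselines is probably not learning a real descriptor-target relationship.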

The code has been designed for:

  • Inexperienced researchers in the field of ML. ROBERT workflows are fully automated and provide users with comprehensive explanations of the resulting models and their prediction reliability. Moreover, ready-to-use examples and tutorials can be accessed on Read the Docs and YouTube.

  • ML experts aiming to automate workflows, enhance reproducibility, or save time. Entire workflows can be executed using a single command line while following modern standards of reproducibility and transparency. Additionally, individual ROBERT modules can be integrated into customized ML workflows.

Don't miss the latest hands-on tutorials on our YouTube channel.

How to cite ROBERT

If you use any of the ROBERT modules, please include this citation:

  • Dalmau, D.; Alegre Requena, J. V. ROBERT: Bridging the Gap between Machine Learning and Chemistry. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2024, 14, e1733.

If you use the AQME module, please include this citation:

  • Alegre-Requena et al., AQME: Automated Quantum Mechanical Environments for Researchers and Educators. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2023, 13, e1663.

Additionally, please include the corresponding references for Scikit-learn, SHAP and BayesianOptimization:

  • Pedregosa et al., Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825-2830.

  • Lundberg et al., From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67.

  • Fernando Nogueira, Bayesian Optimization: Open source constrained global optimization tool for Python, 2014, https://github.com/bayesian-optimization/BayesianOptimization

Special acknowledgments

J.V.A.R. - The acronym ROBERT is dedicated to ROBERT Paton, who was a mentor to me throughout my years at Colorado State University and who introduced me to the field of cheminformatics. Cheers mate!

D.D.G. - The style of the ROBERT_report.pdf file was created with the help of Oliver Lee (2023, Zysman-Colman group at University of St Andrews).

J.V.A.R. and D.D.G. - The improvements from v1.0 to v1.2 are largely the result of insightful discussions with Matthew Sigman and his students, Jamie Cadge and Simone Gallarati (2024, University of Utah).

We really THANK all the testers for their feedback and for participating in the reproducibility tests, including:

  • David Valiente (2022-2023, Universidad Miguel Hernández)

  • Heidi Klem (2023, Paton group at Colorado State University)

  • Iñigo Iribarren (2023, Trujillo group at Trinity College Dublin)

  • Guilian Luchini (2023, Paton group at Colorado State University)

  • Alex Platt (2023, Paton group at Colorado State University)

  • Oliver Lee (2023, Zysman-Colman group at University of St Andrews)

  • Xinchun Ran (2023, Yang group at Vanderbilt University)
