Welcome to ROBERT's documentation!

ROBERT is an ensemble of automated machine learning protocols that can be run sequentially through a single command line. The program works for regression and classification problems. Comprehensive workflows have been designed to meet state-of-the-art standards for cheminformatics studies, including:

  • Atomic and molecular descriptor generation from SMILES, including RDKit conformer sampling and the generation of 200+ steric, electronic and structural descriptors using RDKit, xTB and DBSTEP. This step requires the AQME program.

  • Data curation, including filters for correlated descriptors, noise, and duplicates, as well as conversion of categorical descriptors.

  • Model selection, including comparison of multiple hyperoptimized scikit-learn models and of different training sizes.

  • Prediction of external test sets, as well as SHAP and PFI feature analysis.

  • VERIFY tests to assess the predictive ability of the models, including y-shuffle and y-mean tests, k-fold cross-validation, and predictions with one-hot features.
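
To give a feel for what the y-shuffle and y-mean tests check, here is a minimal, self-contained sketch. It is not ROBERT's implementation: the function names (`fit_line`, `rmse`, `verify_sketch`), the one-descriptor linear model, and the synthetic data are all assumptions made for illustration. The idea is that a model that has learned a real relationship should show a clearly lower error than a model trained on shuffled targets or a baseline that always predicts the training mean.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept (one descriptor)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def verify_sketch(x_train, y_train, x_test, y_test, seed=0):
    """Hypothetical helper: compare a model against y-shuffle and y-mean baselines."""
    rng = random.Random(seed)

    # Reference model trained on the real targets.
    slope, intercept = fit_line(x_train, y_train)
    original = rmse(y_test, [slope * xi + intercept for xi in x_test])

    # y-shuffle: retrain after permuting the targets, which breaks the x-y link.
    y_shuffled = list(y_train)
    rng.shuffle(y_shuffled)
    s_slope, s_intercept = fit_line(x_train, y_shuffled)
    shuffled = rmse(y_test, [s_slope * xi + s_intercept for xi in x_test])

    # y-mean: always predict the mean of the training targets.
    train_mean = statistics.fmean(y_train)
    y_mean = rmse(y_test, [train_mean] * len(y_test))

    return {"original": original, "y_shuffle": shuffled, "y_mean": y_mean}


# Synthetic data (assumption): y depends linearly on x, plus a little noise.
data_rng = random.Random(42)
x = [float(i) for i in range(30)]
y = [2.0 * xi + 1.0 + data_rng.gauss(0.0, 0.5) for xi in x]

# Simple interleaved split: even indices for training, odd indices for testing.
scores = verify_sketch(x[::2], y[::2], x[1::2], y[1::2])
for name, value in scores.items():
    print(f"{name}: RMSE = {value:.2f}")
```

If the original error were close to the y-shuffle or y-mean error, that would suggest the model is not capturing a meaningful relationship between descriptors and target.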

The code has been designed for:

  • Inexperienced researchers in the field of ML. ROBERT workflows are fully automated and provide users with comprehensive explanations of the resulting models and their prediction reliability. Moreover, ready-to-use examples and tutorials can be accessed on Read the Docs and YouTube.

  • ML experts aiming to automate workflows, enhance reproducibility, or save time. Entire workflows can be executed using a single command line while following modern standards of reproducibility and transparency. Additionally, individual ROBERT modules can be integrated into customized ML workflows.

Don't miss the latest hands-on tutorials on our YouTube channel.

How to cite ROBERT

If you use any of the ROBERT modules, please include this citation:

  • Dalmau, D.; Alegre-Requena, J. V. ChemRxiv, 2023, DOI: 10.26434/chemrxiv-2023-k994h.

If you use the AQME module, please include this citation:

  • Alegre-Requena et al., AQME: Automated Quantum Mechanical Environments for Researchers and Educators. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2023, 13, e1663.

Additionally, please include the corresponding references for scikit-learn and SHAP:

  • Pedregosa et al., Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res. 2011, 12, 2825-2830.

  • Lundberg et al., From local explanations to global understanding with explainable AI for trees, Nat. Mach. Intell. 2020, 2, 56–67.

Special acknowledgments

J.V.A.R. - The acronym ROBERT is dedicated to ROBERT Paton, who was a mentor to me throughout my years at Colorado State University and who introduced me to the field of cheminformatics. Cheers mate!

D.D.G. - The style of the ROBERT_report.pdf file was created with the help of Oliver Lee (2023, Zysman-Colman group at University of St Andrews).

We sincerely thank all the testers for their feedback and for participating in the reproducibility tests, including:

  • David Valiente (2022-2023, Universidad Miguel Hernández)

  • Heidi Klem (2023, Paton group at Colorado State University)

  • Iñigo Iribarren (2023, Trujillo group at Trinity College Dublin)

  • Guilian Luchini (2023, Paton group at Colorado State University)

  • Alex Platt (2023, Paton group at Colorado State University)

  • Oliver Lee (2023, Zysman-Colman group at University of St Andrews)

  • Xinchun Ran (2023, Yang group at Vanderbilt University)

Video Tutorials

API Reference