Welcome to ROBERT's documentation!
ROBERT is an ensemble of automated machine learning protocols that can be run sequentially through a single command line. The program works for regression and classification problems. Comprehensive workflows have been designed to meet state-of-the-art standards for cheminformatics studies, including:
Atomic and molecular descriptor generation from SMILES, including RDKit conformer sampling and the generation of 200+ steric, electronic and structural descriptors using RDKit, xTB and DBSTEP. Requires the AQME program.
Data curation, including filters for correlated descriptors, noise, and duplicates, as well as conversion of categorical descriptors.
Model selection, including comparison of multiple hyperoptimized scikit-learn models and of different training set sizes.
Prediction of external test sets, as well as SHAP and PFI feature analysis.
VERIFY tests to assess the predictive ability of the models, including y-shuffle and y-mean tests, k-fold cross-validation, and predictions with one-hot features.
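To make the idea behind the y-shuffle, y-mean and k-fold checks listed above more concrete, here is a minimal sketch written directly with scikit-learn. It is not ROBERT's implementation and does not reproduce its metrics or pass/fail thresholds: the function name `verify_sketch`, the synthetic dataset and the choice of `LinearRegression` are assumptions made only for illustration. The intent is simply to show that a useful model should score clearly better than the same model trained on shuffled targets and better than a baseline that always predicts the mean.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def verify_sketch(X, y, model, n_splits=5, seed=0):
    """Hypothetical helper: mean k-fold R2 for three sanity checks."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rng = np.random.default_rng(seed)

    # k-fold cross-validation with the original target values
    r2_model = cross_val_score(model, X, y, cv=cv, scoring="r2").mean()

    # y-shuffle: permuting the targets breaks any real X-y relationship,
    # so a model that still scores well here is likely fitting noise
    y_shuffled = rng.permutation(y)
    r2_shuffle = cross_val_score(model, X, y_shuffled, cv=cv, scoring="r2").mean()

    # y-mean: baseline that always predicts the mean of the training targets
    baseline = DummyRegressor(strategy="mean")
    r2_mean = cross_val_score(baseline, X, y, cv=cv, scoring="r2").mean()

    return {"model": r2_model, "y_shuffle": r2_shuffle, "y_mean": r2_mean}


# Synthetic data stands in for a real descriptor table (assumption)
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=5, noise=5.0, random_state=0
)
results = verify_sketch(X, y, LinearRegression())
for name, score in results.items():
    print(f"{name}: R2 = {score:.2f}")
```

In this sketch, any scikit-learn regressor can be passed in place of `LinearRegression`. A large gap between the `model` score and the other two scores suggests the model has learned a real relationship; similar scores would be a warning sign.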
The code has been designed for:
Inexperienced researchers in the field of ML. ROBERT workflows are fully automated and provide users with comprehensive explanations of the resulting models and their prediction reliability. Moreover, ready-to-use examples and tutorials can be accessed on Read the Docs and YouTube.
ML experts aiming to automate workflows, enhance reproducibility, or save time. Entire workflows can be executed using a single command line while following modern standards of reproducibility and transparency. Additionally, individual ROBERT modules can be integrated into customized ML workflows.
Don't miss the latest hands-on tutorials on our YouTube channel.
How to cite ROBERT
If you use any of the ROBERT modules, please include this citation:
Dalmau, D.; Alegre Requena, J. V. ChemRxiv, 2023, DOI: 10.26434/chemrxiv-2023-k994h.
If you use the AQME module, please include this citation:
Alegre-Requena et al., AQME: Automated Quantum Mechanical Environments for Researchers and Educators. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2023, 13, e1663.
Additionally, please include the corresponding references for scikit-learn and SHAP:
Pedregosa et al., Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res. 2011, 12, 2825-2830.
Lundberg et al., From local explanations to global understanding with explainable AI for trees, Nat. Mach. Intell. 2020, 2, 56–67.
Special acknowledgments
J.V.A.R. - The acronym ROBERT is dedicated to ROBERT Paton, who was a mentor to me throughout my years at Colorado State University and who introduced me to the field of cheminformatics. Cheers mate!
D.D.G. - The style of the ROBERT_report.pdf file was created with the help of Oliver Lee (2023, Zysman-Colman group at University of St Andrews).
We sincerely thank all the testers for their feedback and for participating in the reproducibility tests, including:
David Valiente (2022-2023, Universidad Miguel Hernández)
Heidi Klem (2023, Paton group at Colorado State University)
Iñigo Iribarren (2023, Trujillo group at Trinity College Dublin)
Guilian Luchini (2023, Paton group at Colorado State University)
Alex Platt (2023, Paton group at Colorado State University)
Oliver Lee (2023, Zysman-Colman group at University of St Andrews)
Xinchun Ran (2023, Yang group at Vanderbilt University)