From SMILES to predictors

Overview of the AQME module

Input required

This module takes a CSV file in the same format as an input file for the AQME program. The CSV file must include at least these three columns:

code_name
smiles
y_NAME (name of the column containing the target y values, e.g., solubility)
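
As a rough illustration of the column requirements listed above, the following sketch writes a small CSV in memory and checks its header. The target column name `solubility`, the `mol_N` labels, and the helper `missing_columns` are assumptions made for this example and are not part of ROBERT or AQME. The y values are placeholders, not measured data.

```python
import csv
import io

# Columns that the text above lists as mandatory, besides the target column.
REQUIRED = ("code_name", "smiles")


def missing_columns(header, y_name):
    """Return the required columns that are absent from a CSV header."""
    needed = list(REQUIRED) + [y_name]
    return [col for col in needed if col not in header]


# Placeholder rows: the SMILES strings are valid, the y values are dummies.
rows = [
    {"code_name": "mol_1", "smiles": "CCO", "solubility": 0.0},
    {"code_name": "mol_2", "smiles": "c1ccccc1", "solubility": 0.0},
    {"code_name": "mol_3", "smiles": "CC(=O)O", "solubility": 0.0},
]

# Write the CSV to an in-memory buffer instead of touching the disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["code_name", "smiles", "solubility"])
writer.writeheader()
writer.writerows(rows)

# Read the header back and report any missing required column.
header = buffer.getvalue().splitlines()[0].split(",")
print(missing_columns(header, "solubility"))  # prints []
```

Running a check like this before launching the workflow catches a misspelled column name early, before any conformer sampling starts.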

Any additional variables in the CSV file are kept throughout the ROBERT workflow. By default, the workflow sets:

--ignore "[code_name,smiles]" (variables ignored in the model)

--names code_name (name of the column containing the names of the datapoints)
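
The effect of these defaults can be pictured with the sketch below, which splits a CSV header into ignored columns and candidate descriptors. It is only an illustration of the idea, not ROBERT's internal code. The helper `split_columns`, the target name `solubility`, and the extra column `temperature` are hypothetical.

```python
def split_columns(header, y_name, ignore=("code_name", "smiles")):
    """Split a CSV header into ignored columns and candidate descriptors.

    Ignored columns stay in the data but are not used as model variables.
    The target column (y_name) is neither ignored nor a descriptor.
    """
    ignored = [col for col in header if col in ignore]
    descriptors = [col for col in header if col not in ignore and col != y_name]
    return ignored, descriptors


# Hypothetical header with one additional variable supplied by the user.
header = ["code_name", "smiles", "solubility", "temperature"]
ignored, descriptors = split_columns(header, "solubility")
print(ignored)      # prints ['code_name', 'smiles']
print(descriptors)  # prints ['temperature']
```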

Automated protocols

When the command python -m robert --aqme [OPTIONS] is executed, ROBERT's AQME module calls the AQME program to perform an initial CSEARCH-RDKit conformer sampling with the following options:

python -m aqme --csearch --program rdkit --input CSV_NAME.csv --sample 5

Then, the AQME program is run again to generate more than 200 RDKit and xTB Boltzmann-averaged molecular descriptors with QDESCP, using the following options:

python -m aqme --qdescp --files "CSEARCH/*.sdf" --program xtb --csv_name CSV_NAME.csv
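
For readers who want to reproduce the two steps by hand, the sketch below assembles the two commands quoted above as argument lists. It only builds and prints the commands. It does not claim to mirror how ROBERT invokes AQME internally. The helper `build_aqme_commands` and the file name `solubility.csv` are hypothetical.

```python
import shlex


def build_aqme_commands(csv_name):
    """Return the CSEARCH and QDESCP commands from the text as argument lists."""
    csearch = [
        "python", "-m", "aqme", "--csearch",
        "--program", "rdkit",
        "--input", csv_name,
        "--sample", "5",
    ]
    qdescp = [
        "python", "-m", "aqme", "--qdescp",
        # Passed as a literal pattern, matching the quoted form in the text.
        "--files", "CSEARCH/*.sdf",
        "--program", "xtb",
        "--csv_name", csv_name,
    ]
    return [csearch, qdescp]


# Print the commands in a shell-safe form without executing them.
for command in build_aqme_commands("solubility.csv"):
    print(shlex.join(command))
```

Each list could be passed to `subprocess.run(command, check=True)` in an environment where AQME and xTB are installed, so that the second step only starts if the first one succeeds.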

A CSV file called AQME-ROBERT_CSV_NAME.csv is created in the folder where the command was executed. ROBERT then uses this new CSV file to start a full workflow.
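
The naming rule above can be sketched as a small helper that predicts where the generated file should appear. It assumes that CSV_NAME stands for the base name of the input file. The function `expected_output_csv` and the path `data/solubility.csv` are hypothetical.

```python
from pathlib import Path


def expected_output_csv(input_csv, run_dir="."):
    """Return the path where the AQME-ROBERT CSV is expected.

    Assumes the file is named after the input file's base name and is
    written to the folder where the command was executed (run_dir).
    """
    return Path(run_dir) / f"AQME-ROBERT_{Path(input_csv).name}"


output = expected_output_csv("data/solubility.csv")
print(output.name)      # prints AQME-ROBERT_solubility.csv
print(output.exists())  # True only after the AQME step has produced the file
```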

Example

An example is available in Examples/Full workflow from SMILES.