PREDICT (predict external test set and feature importance analysis)
Overview
Required inputs
Previous folder from a GENERATE job.
Robert_example_test.csv: CSV file with the data to use as the external test set. The full CSV file can be found in the Examples folder of the ROBERT repository. An excerpt:

| Name | Target_values | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 38 | 1.854766065 | 12 | 110.9270401 | 70.8240401 | Csub-H | 89.87553406 | 49.77253406 | 1 | 0 | 0 | 0 | 1 |
| 39 | 2.034511341 | 11.7 | 110.6553116 | 70.25231158 | Csub-Csub | 78.65235138 | 55.53135138 | 1 | 0 | 0 | 0 | 1 |
| ... | | | | | | | | | | | | |
| 45 | 0.329517076 | -101.6 | 115.2292938 | -38.47370618 | Csub-O | 70.45233154 | -65.96866846 | 0 | 2 | 1 | 3 | 3 |
| 46 | 1.902644865 | 4.29 | 110.7536316 | 62.94063159 | Csub-H | 89.6808548 | 41.8678548 | 2 | 0 | 0 | 0 | 2 |
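Before running PREDICT, it can help to confirm that the test CSV is structurally sound. The following is a minimal sketch, not part of ROBERT: `check_test_csv` is a hypothetical helper that only verifies that the names column (here `Name`, as in the table above) is present in the header and that every row has as many fields as the header. It assumes a comma-separated file with a single header row.

```python
import csv


def check_test_csv(path, names_column="Name"):
    """Return (header, n_rows) after basic structural checks on a CSV file.

    Hypothetical helper for illustration; ROBERT does not provide it.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        if names_column not in header:
            raise ValueError(f"column {names_column!r} not found in {path}")
        n_rows = 0
        for row in reader:
            if not row:
                # Skip completely blank lines.
                continue
            if len(row) != len(header):
                raise ValueError(
                    f"line {reader.line_num}: expected {len(header)} fields, "
                    f"found {len(row)}"
                )
            n_rows += 1
    return header, n_rows
```

Calling `check_test_csv("Robert_example_test.csv")` from the folder that contains the file returns the column names and the number of data rows, or raises `ValueError` at the first malformed row.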
Executing the job
Instructions:
First, in your terminal, go to the folder where GENERATE was previously run. It should contain a folder called GENERATE.
Run the following command line:
python -m robert --csv_test Robert_example_test.csv --names Name --predict
Options used:
--csv_test Robert_example_test.csv: CSV with the external test set.
--names Name: Name of the column containing the names of the datapoints. This allows ROBERT to print the names of the outlier points (if any).
--predict: Use only the PREDICT module.
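The same command can also be launched from a Python script, for example when PREDICT is one step of a larger workflow. The sketch below is an illustration under stated assumptions rather than a ROBERT API: `build_predict_command` and `run_predict` are hypothetical helpers, the command uses only the three options listed above, and `sys.executable` is assumed to be the interpreter where ROBERT is installed. The check for the GENERATE folder mirrors the first instruction step.

```python
import subprocess
import sys
from pathlib import Path


def build_predict_command(csv_test, names_column):
    """Build the argument list for the PREDICT command shown above."""
    return [
        sys.executable, "-m", "robert",
        "--csv_test", str(csv_test),
        "--names", names_column,
        "--predict",
    ]


def run_predict(workdir, csv_test="Robert_example_test.csv", names_column="Name"):
    """Run PREDICT inside workdir, which must hold a previous GENERATE folder."""
    workdir = Path(workdir)
    if not (workdir / "GENERATE").is_dir():
        raise FileNotFoundError(f"no GENERATE folder found in {workdir}")
    # check=True raises CalledProcessError if the command exits with an error.
    return subprocess.run(
        build_predict_command(csv_test, names_column),
        cwd=workdir,
        check=True,
    )
```

With these helpers, `run_predict(".")` is equivalent to typing the command line above in the current folder.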
Execution time
Time: ~10 seconds
System: 4 processors (Intel Xeon Ice Lake 8352Y) using 8.0 GB of RAM
Results
Two graphs, one for No_PFI and one for PFI (in /PREDICT), with: representation of predictions, SHAP feature analysis, PFI feature analysis and outlier analysis.
Six CSV files with the predictions of each set, for No_PFI and for PFI (in /PREDICT).
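To confirm that the run produced its outputs, the CSV files under the PREDICT folder can be listed with a few lines of Python. This is a minimal sketch: `list_predict_csvs` is a hypothetical helper, and because the exact file names and subfolder layout are not given here, it simply searches the PREDICT folder recursively.

```python
from pathlib import Path


def list_predict_csvs(workdir="."):
    """Return the sorted paths of all CSV files found under workdir/PREDICT."""
    predict_dir = Path(workdir) / "PREDICT"
    if not predict_dir.is_dir():
        raise FileNotFoundError(f"no PREDICT folder found in {Path(workdir)}")
    # Search recursively, since the internal layout is an unknown here.
    return sorted(p for p in predict_dir.rglob("*.csv") if p.is_file())
```

After a run like the one described above, `len(list_predict_csvs("."))` should match the number of CSV files reported in the results.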
