PREDICT (predict external test set and feature importance analysis)

Overview

Folder from a previous GENERATE job.
Robert_example_test.csv: CSV file with the data to use as the external test set. The full CSV file can be found in the Examples folder of the ROBERT repository. An excerpt:

Name	Target_values	x1	x2	x3	x4	x5	x6	x7	x8	x9	x10	x11
38	1.854766065	12	110.9270401	70.8240401	Csub-H	89.87553406	49.77253406	1	0	0	0	1
39	2.034511341	11.7	110.6553116	70.25231158	Csub-Csub	78.65235138	55.53135138	1	0	0	0	1
...
45	0.329517076	-101.6	115.2292938	-38.47370618	Csub-O	70.45233154	-65.96866846	0	2	1	3	3
46	1.902644865	4.29	110.7536316	62.94063159	Csub-H	89.6808548	41.8678548	2	0	0	0	2
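Before running PREDICT, it can help to confirm that the test CSV has the expected header. The following is a minimal sketch, not part of ROBERT: the helper name check_test_csv is hypothetical, and it assumes the file on disk is comma-separated and contains the Name and Target_values columns shown in the excerpt above.

```python
import csv


def check_test_csv(path, required=("Name", "Target_values")):
    """Hypothetical helper: verify the header and count the data rows.

    Assumes a comma-separated file; returns the remaining (descriptor)
    column names and the number of data rows.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [col for col in required if col not in header]
        if missing:
            raise ValueError(f"missing column(s): {', '.join(missing)}")
        n_rows = sum(1 for _ in reader)
    descriptors = [col for col in header if col not in required]
    return descriptors, n_rows
```

For example, check_test_csv("Robert_example_test.csv") would return the descriptor columns (x1 to x11 in the excerpt) together with the number of datapoints.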

Instructions:

First, in your terminal, go to the folder where GENERATE was previously run. It should contain a folder called GENERATE.
Then run the following command:

python -m robert --csv_test Robert_example_test.csv --names Name --predict

Options used:

--csv_test Robert_example_test.csv: CSV with the external test set.
--names Name: Name of the column containing the names of the datapoints. This allows the names of the outlier points (if any) to be printed.
--predict: Use only the PREDICT module.
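The same command can be launched from a Python script. The sketch below is only an illustration under stated assumptions: build_predict_command and run_predict are hypothetical helper names, the flags are exactly the ones listed above, and python is assumed to resolve to the environment where ROBERT is installed.

```python
import subprocess
from pathlib import Path


def build_predict_command(csv_test, names_col):
    """Assemble the PREDICT command using only the options described above."""
    return [
        "python", "-m", "robert",
        "--csv_test", str(csv_test),
        "--names", names_col,
        "--predict",
    ]


def run_predict(workdir, csv_test="Robert_example_test.csv", names_col="Name"):
    """Hypothetical wrapper: run PREDICT inside the folder used for GENERATE."""
    workdir = Path(workdir)
    # The instructions expect a GENERATE folder from the previous job.
    if not (workdir / "GENERATE").is_dir():
        raise FileNotFoundError(f"no GENERATE folder found in {workdir}")
    command = build_predict_command(csv_test, names_col)
    # check=True raises CalledProcessError if the command exits with an error.
    return subprocess.run(command, cwd=workdir, check=True)
```

Checking for the GENERATE folder first gives a clearer error than letting the command fail when it is started from the wrong directory.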

Time: ~10 seconds

System: 4 processors (Intel Xeon Ice Lake 8352Y) using 8.0 GB of RAM

Two graphs, one for No_PFI and one for PFI (in /PREDICT), each with a representation of the predictions, SHAP feature analysis, PFI feature analysis and outlier analysis.
Six CSV files with the predictions of each set, for No_PFI and for PFI (in /PREDICT).
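To confirm that the prediction files were written, the PREDICT folder can be scanned for CSV files. This is a minimal sketch with a hypothetical helper name (list_predict_csvs); it assumes the PREDICT folder sits inside the working directory and makes no assumption about the individual file names.

```python
from pathlib import Path


def list_predict_csvs(workdir="."):
    """Hypothetical helper: return all CSV files found under PREDICT, sorted."""
    predict_dir = Path(workdir) / "PREDICT"
    if not predict_dir.is_dir():
        raise FileNotFoundError(f"no PREDICT folder found in {workdir}")
    # rglob also searches subfolders, in case the files are nested.
    return sorted(predict_dir.rglob("*.csv"))
```

After a successful run, len(list_predict_csvs()) should match the number of CSV files described above.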