PREDICT (predict external test set and feature importance analysis)

Overview

[Figure: overview of the PREDICT module]

Required inputs

  • The GENERATE folder created by a previous GENERATE job.

  • Robert_example_test.csv: CSV file with data to use as the external test set. The full CSV file can be found in the Examples folder of the ROBERT repository or downloaded here: csv_FW_test

Name  Target_values  x1      x2           x3            x4         x5           x6            x7  x8  x9  x10  x11
38    1.854766065    12      110.9270401  70.8240401    Csub-H     89.87553406  49.77253406   1   0   0   0    1
39    2.034511341    11.7    110.6553116  70.25231158   Csub-Csub  78.65235138  55.53135138   1   0   0   0    1
...
45    0.329517076    -101.6  115.2292938  -38.47370618  Csub-O     70.45233154  -65.96866846  0   2   1   3    3
46    1.902644865    4.29    110.7536316  62.94063159   Csub-H     89.6808548   41.8678548    2   0   0   0    2
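Before launching the job, it can be useful to confirm that the test CSV has the columns shown in the example table above. The following is a minimal Python sketch, not part of ROBERT: the helper missing_columns and the constant EXPECTED_COLUMNS are hypothetical names, a comma delimiter is assumed, and whether PREDICT itself requires exactly this set of columns is an assumption.

```python
import csv
import io

# Column names copied from the example table above; whether PREDICT itself
# enforces this exact set is an assumption, not documented behaviour.
EXPECTED_COLUMNS = ["Name", "Target_values"] + [f"x{i}" for i in range(1, 12)]


def missing_columns(csv_stream, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.reader(csv_stream)  # assumes a comma-separated file
    header = next(reader, [])        # empty file -> empty header
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]


# Small in-memory stand-in for Robert_example_test.csv (first row of the table).
sample = io.StringIO(
    "Name,Target_values,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11\n"
    "38,1.854766065,12,110.9270401,70.8240401,Csub-H,"
    "89.87553406,49.77253406,1,0,0,0,1\n"
)
print(missing_columns(sample))  # an empty list means no expected column is missing
```

For a real file, the same helper can be called on an open file handle, for example with open("Robert_example_test.csv", newline="") as fh: missing_columns(fh).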

Executing the job

Instructions:

  1. In your terminal, go to the folder where GENERATE was previously run. It should contain a folder called GENERATE.

  2. Run the following command line:

python -m robert --csv_test Robert_example_test.csv --names Name --predict

Options used:

  • --csv_test Robert_example_test.csv: CSV with the external test set.

  • --names Name: Name of the column containing the names of the datapoints. This allows the names of the outlier points (if any) to be printed.

  • --predict: Use only the PREDICT module.
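The same command can also be launched from a Python script, which makes it easier to check the required inputs first. This is a minimal sketch rather than an official ROBERT interface: build_predict_command and run_predict are hypothetical helper names, the flags are exactly those listed above, sys.executable stands in for the python executable, and the test CSV is assumed to sit in the same folder as GENERATE.

```python
import subprocess
import sys
from pathlib import Path


def build_predict_command(csv_test="Robert_example_test.csv", names="Name"):
    """Assemble the PREDICT command line shown above as an argument list."""
    return [sys.executable, "-m", "robert",
            "--csv_test", csv_test, "--names", names, "--predict"]


def run_predict(workdir=".", csv_test="Robert_example_test.csv", names="Name"):
    """Run PREDICT inside workdir after checking that the inputs exist."""
    workdir = Path(workdir)
    # PREDICT needs the folder left behind by a previous GENERATE job.
    if not (workdir / "GENERATE").is_dir():
        raise FileNotFoundError(f"No GENERATE folder found in {workdir.resolve()}")
    # Assumption: the test CSV is stored next to the GENERATE folder.
    if not (workdir / csv_test).is_file():
        raise FileNotFoundError(f"{csv_test} not found in {workdir.resolve()}")
    # check=True raises CalledProcessError if the command exits with an error.
    return subprocess.run(build_predict_command(csv_test, names),
                          cwd=workdir, check=True)
```

Calling run_predict() from the folder where GENERATE was run is then equivalent to typing the command in the terminal, with an early error if either input is missing.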

Execution time

Time: ~10 seconds

System: 4 processors (Intel Xeon Ice Lake 8352Y) using 8.0 GB of RAM

Results

  • Two graphs, for No_PFI and for PFI (in /PREDICT), with a representation of predictions, SHAP feature analysis, PFI feature analysis and outlier analysis.

  • Six CSV files with the predictions of each set, for No_PFI and for PFI (in /PREDICT).
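Once the job finishes, the contents of the PREDICT folder can be compared against the list above. The sketch below simply groups whatever files it finds by extension; list_predict_outputs is a hypothetical helper, and no particular file names or subfolder layout inside PREDICT are assumed.

```python
from pathlib import Path


def list_predict_outputs(predict_dir="PREDICT"):
    """Group all files found under predict_dir by lower-cased file extension."""
    outputs = {}
    # rglob("*") walks the folder recursively, so files in subfolders are included.
    for path in sorted(Path(predict_dir).rglob("*")):
        if path.is_file():
            outputs.setdefault(path.suffix.lower(), []).append(path)
    return outputs
```

For example, list_predict_outputs().get(".csv", []) returns the CSV files with the predictions, which can then be opened with any spreadsheet or data analysis tool.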