CURATE (data curation)
Overview
Required inputs
Robert_example.csv: CSV file with data to curate. The full CSV file can be found in the Examples folder of the ROBERT repository. An excerpt:

| Name | Target_values | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | x11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1.854766065 | 12 | 110.9270401 | 70.8240401 | Csub-H | 89.87553406 | 49.77253406 | 1 | 0 | 0 | 0 | 1 |
| 2 | 2.034511341 | 11.7 | 110.6553116 | 70.25231158 | Csub-Csub | 78.65235138 | 55.53135138 | 1 | 0 | 0 | 0 | 1 |
| ... | | | | | | | | | | | | |
| 36 | 0.321084552 | -101.6 | 110.7593079 | -42.94369214 | Csub-O | 59.81459808 | -76.60640192 | 0 | 2 | 1 | 3 | 3 |
| 37 | 0.329517076 | -101.6 | 115.2292938 | -38.47370618 | Csub-O | 70.45233154 | -65.96866846 | 0 | 2 | 1 | 3 | 3 |
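Before running the job, it can help to confirm that the CSV actually contains the columns that will be passed as the names and response columns. The following is a minimal sketch using only the Python standard library; the helper name `missing_columns` is hypothetical and not part of ROBERT.

```python
import csv


def missing_columns(csv_path, required):
    """Return the required column names that are absent from the CSV header.

    Hypothetical helper for illustration; it is not part of ROBERT.
    """
    with open(csv_path, newline="") as handle:
        # Read only the first row, which is assumed to be the header.
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]
```

For the example file, `missing_columns("Robert_example.csv", ["Name", "Target_values"])` should return an empty list if both columns are present.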
Executing the job
Instructions:
First, go to the folder containing the CSV files in your terminal.
Run the following command line:
python -m robert --names Name --y Target_values --csv_name Robert_example.csv --curate
Options used:
--names Name: Name of the column containing the names of the datapoints.
--y Target_values: Name of the column containing the response y values.
--csv_name Robert_example.csv: CSV with the data to curate.
--curate: Use only the CURATE module.
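If the job needs to be launched from a script rather than typed by hand, the same command line can be assembled and run with `subprocess`. This is a sketch under the assumption that ROBERT is installed in the current Python environment; the function names `build_curate_command` and `run_curate` are hypothetical, and only the options listed above are used.

```python
import subprocess
import sys


def build_curate_command(csv_name, names, y):
    """Assemble the CURATE command line shown above as an argument list."""
    return [
        sys.executable, "-m", "robert",
        "--names", names,        # column with the datapoint names
        "--y", y,                # column with the response y values
        "--csv_name", csv_name,  # CSV with the data to curate
        "--curate",              # use only the CURATE module
    ]


def run_curate(csv_name, names, y, workdir="."):
    """Run the command from workdir (assumed to contain the CSV file)."""
    command = build_curate_command(csv_name, names, y)
    # check=True raises CalledProcessError if the process exits with an error.
    return subprocess.run(command, cwd=workdir, check=True)
```

Calling `run_curate("Robert_example.csv", "Name", "Target_values")` from the folder containing the CSV is then equivalent to the command above.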
Execution time
Time: ~5 seconds
System: 4 processors (Intel Xeon Ice Lake 8352Y) with 8.0 GB of RAM
Results
A CSV file containing the curated database (Robert_example_CURATE.csv) should be created inside the CURATE folder.
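One way to see what changed is to compare the column headers of the input file and the curated file. The sketch below uses only the standard library; `compare_headers` is a hypothetical helper, and the path `CURATE/Robert_example_CURATE.csv` assumes the CURATE folder is created in the folder where the command was run.

```python
import csv


def read_header(csv_path):
    """Return the column names from the first row of a CSV file."""
    with open(csv_path, newline="") as handle:
        return [name.strip() for name in next(csv.reader(handle), [])]


def compare_headers(original_path, curated_path):
    """Return (removed, added) column names between the two files.

    Hypothetical helper for illustration; it is not part of ROBERT.
    """
    original = read_header(original_path)
    curated = read_header(curated_path)
    removed = [name for name in original if name not in curated]
    added = [name for name in curated if name not in original]
    return removed, added
```

For example, `compare_headers("Robert_example.csv", "CURATE/Robert_example_CURATE.csv")` returns the columns present only in the input and the columns present only in the curated output.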
