CURATE (data curation)

Overview

[Figure: CURATE module overview]

Required inputs

Name  Target_values  x1      x2           x3            x4         x5           x6            x7  x8  x9  x10  x11
1     1.854766065    12      110.9270401  70.8240401    Csub-H     89.87553406  49.77253406   1   0   0   0    1
2     2.034511341    11.7    110.6553116  70.25231158   Csub-Csub  78.65235138  55.53135138   1   0   0   0    1
...
36    0.321084552    -101.6  110.7593079  -42.94369214  Csub-O     59.81459808  -76.60640192  0   2   1   3    3
37    0.329517076    -101.6  115.2292938  -38.47370618  Csub-O     70.45233154  -65.96866846  0   2   1   3    3
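Before running CURATE, it can help to check that the CSV has the expected layout. The sketch below loads the rows shown in the table with pandas and checks that the name and target columns exist. It also flags descriptor columns that hold text instead of numbers, such as x4 (Csub-H, Csub-Csub, Csub-O). This is a minimal illustration, not part of ROBERT. The helper `check_input` is hypothetical, and its default column names simply match the `--names` and `--y` values used in this example.

```python
import io

import pandas as pd

# The rows shown in the table, as they would appear in Robert_example.csv
# (the intermediate rows are omitted here, as in the table).
CSV_TEXT = """Name,Target_values,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11
1,1.854766065,12,110.9270401,70.8240401,Csub-H,89.87553406,49.77253406,1,0,0,0,1
2,2.034511341,11.7,110.6553116,70.25231158,Csub-Csub,78.65235138,55.53135138,1,0,0,0,1
36,0.321084552,-101.6,110.7593079,-42.94369214,Csub-O,59.81459808,-76.60640192,0,2,1,3,3
37,0.329517076,-101.6,115.2292938,-38.47370618,Csub-O,70.45233154,-65.96866846,0,2,1,3,3
"""


def check_input(df: pd.DataFrame, names_col: str = "Name", y_col: str = "Target_values") -> list[str]:
    """Hypothetical helper: verify required columns, return non-numeric descriptor columns."""
    missing = [c for c in (names_col, y_col) if c not in df.columns]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    # Every column other than the name and target columns is treated as a descriptor.
    descriptors = df.drop(columns=[names_col, y_col])
    return [c for c in descriptors.columns if not pd.api.types.is_numeric_dtype(descriptors[c])]


df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.shape)          # (4, 13)
print(check_input(df))   # ['x4']
```

To check a real file, replace `io.StringIO(CSV_TEXT)` with the path to the CSV.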

Executing the job

Instructions:

  1. In a terminal, go to the folder that contains the CSV file (Robert_example.csv).

  2. Run the following command:

python -m robert --names Name --y Target_values --csv_name Robert_example.csv --curate

Options used:

  • --names Name: Name of the column containing the names of the datapoints.

  • --y Target_values: Name of the column containing the response y values.

  • --csv_name Robert_example.csv: CSV with the data to curate.

  • --curate: Use only the CURATE module.
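The same command can also be launched from a Python script with the standard-library subprocess module. This is a minimal sketch, not an official ROBERT interface. `build_curate_command` and `run_curate` are hypothetical helpers that only use the options listed above, and calling `run_curate` requires ROBERT to be installed in the active Python environment.

```python
import subprocess
import sys


def build_curate_command(csv_name: str, names: str = "Name", y: str = "Target_values") -> list[str]:
    """Hypothetical helper: assemble the CURATE command from this example as an argument list."""
    return [
        sys.executable, "-m", "robert",
        "--names", names,
        "--y", y,
        "--csv_name", csv_name,
        "--curate",
    ]


def run_curate(csv_name: str, workdir: str = ".") -> subprocess.CompletedProcess:
    """Hypothetical helper: run CURATE from the folder that contains the CSV file."""
    # check=True raises CalledProcessError if the command exits with a nonzero status.
    return subprocess.run(build_curate_command(csv_name), cwd=workdir, check=True)


# Usage (requires ROBERT to be installed):
# run_curate("Robert_example.csv")
```

Using `sys.executable` instead of a bare `python` runs ROBERT with the same interpreter as the calling script, which avoids picking up a different environment by accident.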

Execution time

Time: ~5 seconds

System: 4 processors (Intel Xeon Ice Lake 8352Y) with 8.0 GB of RAM

Results

  • A CSV file containing the curated database (Robert_example_CURATE.csv) should be created inside the CURATE folder.
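To see which columns changed during curation, the curated file can be compared with the original one. This sketch is not part of ROBERT: `compare_columns` is a hypothetical helper, and the path CURATE/Robert_example_CURATE.csv assumes the CURATE folder is created in the folder where the command was run.

```python
import pandas as pd


def compare_columns(original: pd.DataFrame, curated: pd.DataFrame) -> dict[str, list[str]]:
    """Hypothetical helper: list columns removed from or added to the original CSV."""
    before, after = list(original.columns), list(curated.columns)
    return {
        "removed": [c for c in before if c not in after],
        "added": [c for c in after if c not in before],
    }


# Usage, assuming CURATE was run in the current folder:
# original = pd.read_csv("Robert_example.csv")
# curated = pd.read_csv("CURATE/Robert_example_CURATE.csv")
# print(compare_columns(original, curated))
```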