Python API (sklearn-style)

RobertModel runs the full ROBERT workflow (CURATE, GENERATE, VERIFY, PREDICT) and exposes fit / predict / score on pandas.DataFrame or numpy.ndarray inputs. PREDICT writes CSV columns aligned with the pipeline:

  • {y}_pred: point prediction from the selected estimator refit on all training data (deployment-style mean).

  • {y}_pred_sd: per-row standard deviation across repeated cross-validation predictions (disagreement between refits on overlapping training folds; related to epistemic instability, not a calibrated predictive distribution).

  • {y}_pred_conformal_hw (regression only): a single symmetric interval half-width from split-style conformal calibration (absolute residuals on a held-out calibration slice of the training set when large enough, otherwise residuals vs CV out-of-fold means). Because the reported point predictor is refit on all training data, finite-sample coverage is approximate at the nominal conformal_coverage (default 0.9); tune with conformal_enable, conformal_calib_frac, and conformal_coverage in RobertModel kwargs. For classification, this column is present but filled with NaN; {y}_pred_sd reflects vote spread across CV refits, not class probabilities.
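As a rough illustration of the split-style calibration described for {y}_pred_conformal_hw, the sketch below derives one symmetric half-width from absolute residuals on a calibration slice. It is not ROBERT's implementation: the helper name conformal_half_width, the synthetic data, and the finite-sample rank rule ceil((n + 1) * coverage) are assumptions that follow the standard split-conformal recipe.

```python
import math

import numpy as np


def conformal_half_width(y_true, y_pred, coverage=0.9):
    """Symmetric half-width from absolute calibration residuals (hypothetical helper)."""
    residuals = np.sort(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)))
    n = residuals.size
    # Finite-sample rank: take the ceil((n + 1) * coverage)-th smallest residual.
    rank = math.ceil((n + 1) * coverage)
    if rank > n:
        # Too few calibration points to support the requested coverage.
        return float("inf")
    return float(residuals[rank - 1])


# Synthetic calibration slice: targets and noisy predictions (illustrative only).
rng = np.random.default_rng(0)
y_calib = rng.normal(size=50)
pred_calib = y_calib + rng.normal(scale=0.3, size=50)

half_width = conformal_half_width(y_calib, pred_calib, coverage=0.9)
# Fraction of calibration rows whose residual falls inside the half-width.
covered = float(np.mean(np.abs(y_calib - pred_calib) <= half_width))
```

The same half-width is then applied to every prediction row, which is why the column holds a single value. With very small calibration slices the rank can exceed the number of residuals, one reason a fallback such as out-of-fold residuals is useful.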

predict returns {y}_pred values aligned to input rows. Uncertainty:

  • return_std=True is equivalent to return_uncertainty="cv_sd" and returns (y, sd_cv).

  • return_uncertainty="conformal" (regression only) returns (y, half_width).

  • return_uncertainty="both" returns (y, sd_cv, half_width).

  • If both return_std and return_uncertainty are set, return_uncertainty wins and a warning is issued.
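The precedence rules above can be summarized in a small sketch. The helper resolve_uncertainty_mode is hypothetical and only mirrors the documented behaviour; it is not taken from ROBERT's source, and the warning class and message are assumptions.

```python
import warnings


def resolve_uncertainty_mode(return_std=False, return_uncertainty=None):
    """Return None, 'cv_sd', 'conformal', or 'both' (hypothetical helper)."""
    allowed = {None, "cv_sd", "conformal", "both"}
    if return_uncertainty not in allowed:
        raise ValueError(f"unknown return_uncertainty: {return_uncertainty!r}")
    if return_uncertainty is not None:
        if return_std:
            # Both flags were given: return_uncertainty wins, with a warning.
            warnings.warn("return_uncertainty takes precedence over return_std", UserWarning)
        return return_uncertainty
    # Only return_std was (possibly) given: it maps to the CV standard deviation.
    return "cv_sd" if return_std else None


mode_plain = resolve_uncertainty_mode()
mode_std = resolve_uncertainty_mode(return_std=True)
mode_conformal = resolve_uncertainty_mode(return_uncertainty="conformal")
```

The resolved mode then determines the shape of the return value: a bare array for None, a 2-tuple for "cv_sd" or "conformal", and a 3-tuple for "both".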

Pipeline semantics

  • Single high-level estimator. Encoding and curation happen in CURATE; training matrices are scaled inside ROBERT (StandardScaler on the design matrix in prepare_sets). This is not a composable sklearn Pipeline of separate TransformerMixin steps on RobertModel itself.

  • Do not stack another StandardScaler (or similar) in front of the same raw descriptor table unless you know exactly how it interacts with CURATE outputs; you would usually double-scale or break column semantics.

  • Row order. predict returns one value per input row, aligned to X even if ROBERT writes prediction CSVs in a different row order (alignment uses the names column from CURATE).
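The row-order guarantee can be pictured with a small pandas sketch. The column names (Name, y_pred) and the data are hypothetical; this only illustrates name-based realignment and is not ROBERT's internal code.

```python
import pandas as pd

# Input rows in the caller's order, identified by a names column.
X = pd.DataFrame({"Name": ["mol_c", "mol_a", "mol_b"], "x1": [0.3, 0.1, 0.2]})

# Predictions as they might appear in a CSV written in a different row order.
csv_rows = pd.DataFrame({"Name": ["mol_a", "mol_b", "mol_c"], "y_pred": [1.0, 2.0, 3.0]})

# Reindex by name so that output i corresponds to input row i.
aligned = csv_rows.set_index("Name")["y_pred"].reindex(X["Name"]).to_numpy()
```

This kind of lookup needs unique row identifiers; duplicated names would make the mapping ambiguous.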

Matplotlib

During fit and predict, Matplotlib is switched to the non-interactive Agg backend so plotting does not require a GUI; the prior backend is restored afterward. Figures are still written under workdir like the CLI workflow.
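A minimal sketch of the switch-and-restore pattern, assuming Matplotlib is installed. The context manager agg_backend is a hypothetical helper written for illustration, not ROBERT's own code; it relies only on matplotlib.get_backend and matplotlib.use.

```python
import contextlib
import io

import matplotlib


@contextlib.contextmanager
def agg_backend():
    """Temporarily switch Matplotlib to Agg, then restore the previous backend."""
    previous = matplotlib.get_backend()
    matplotlib.use("Agg", force=True)
    try:
        yield
    finally:
        # Restore whatever backend was active before entering the block.
        matplotlib.use(previous, force=True)


buffer = io.BytesIO()
with agg_backend():
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1])
    # Agg renders off-screen, so saving works without a display.
    fig.savefig(buffer, format="png")
    plt.close(fig)
```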

Example

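from robert import RobertModel
import pandas as pd

df = pd.read_csv("Robert_example.csv")
X = df.drop(columns=["Target_values"])
y = df["Target_values"]

model = RobertModel(
    problem_type="reg",
    workdir="./robert_run",
    model=["RF"],
    n_iter=2,
    init_points=2,
)

# Fit on the first 25 rows; the remaining rows are held out.
model.fit(X.iloc[:25], y.iloc[:25])

# Point predictions only.
preds = model.predict(X.iloc[25:])
# Point predictions plus the CV standard deviation.
preds, sd_cv = model.predict(X.iloc[25:], return_std=True)
# Point predictions plus the conformal half-width (regression only).
preds2, hw = model.predict(X.iloc[25:], return_uncertainty="conformal")
# Point predictions plus both uncertainty estimates.
preds3, sd_cv2, hw2 = model.predict(X.iloc[25:], return_uncertainty="both")

# Score on the held-out rows.
r2 = model.score(X.iloc[25:], y.iloc[25:])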
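
Given outputs shaped like the (preds, hw) pair in the example, a symmetric interval and its empirical coverage can be computed as sketched here. The arrays are synthetic stand-ins rather than ROBERT outputs, and the half-width is assumed to be an array holding the same value for every row.

```python
import numpy as np

preds = np.array([1.0, 2.0, 3.0, 4.0])   # stand-in for predict output
hw = np.full_like(preds, 0.5)            # stand-in for the conformal half-width
y_true = np.array([1.2, 2.6, 2.9, 4.4])  # stand-in for held-out targets

# Symmetric interval around each point prediction.
lower = preds - hw
upper = preds + hw

# Fraction of held-out targets that fall inside their interval.
inside = (y_true >= lower) & (y_true <= upper)
coverage = float(inside.mean())
```

Comparing this empirical coverage with the nominal conformal_coverage on held-out data is a simple check of how approximate the interval is for a given dataset.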

class RobertModel(problem_type: Literal['reg', 'clas'] = 'reg', filter_mode: Literal['pfi', 'no_pfi'] = 'pfi', workdir: str | Path | None = None, names: str | None = None, report: bool = False, seed: int | None = None, y_column: str | None = None, **kwargs: Any)

High-level ROBERT workflow with fit, predict, and score.

Subclasses BaseEstimator for get_params and set_params. Preprocessing (CURATE encoding, ROBERT-internal StandardScaler on the design matrix) runs inside ROBERT; do not stack a separate StandardScaler on the same raw descriptors. Refitting with the same workdir leaves prior outputs on disk. fit and predict change the process working directory and are not safe to run concurrently on multiple instances in one process.

Parameters:
  • problem_type -- "reg" or "clas".

  • filter_mode -- "pfi" or "no_pfi"; which GENERATE/Best_model variant predict loads.

  • workdir -- Directory for ROBERT outputs, or None for a managed temporary directory.

  • names -- Column in X used as row identifiers (CURATE names), or None.

  • report -- If True, run REPORT after PREDICT during fit.

  • seed -- Random seed forwarded to ROBERT, or None for package default.

  • y_column -- If fit(X) is called without y (y is None), name of the target column in X (DataFrame only).

  • kwargs -- Additional ROBERT options (keys in robert.argument_parser.var_dict), e.g. model, n_iter. Regression uncertainty tuning includes conformal_enable, conformal_calib_frac, and conformal_coverage.

cleanup() -> None

Remove managed temporary workdir, if any. Safe to call multiple times.
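One way a managed temporary workdir with an idempotent cleanup could be structured is sketched below. The class ManagedWorkdir and its attributes are hypothetical and shown only to illustrate the "safe to call multiple times" contract; ROBERT's actual implementation may differ.

```python
import shutil
import tempfile
from pathlib import Path


class ManagedWorkdir:
    """Own a temporary directory only when the caller did not supply one."""

    def __init__(self, workdir=None):
        self._managed = workdir is None
        if self._managed:
            self.path = Path(tempfile.mkdtemp(prefix="robert_"))
        else:
            self.path = Path(workdir)

    def cleanup(self):
        # Only remove directories this object created; repeated calls are no-ops.
        if self._managed and self.path.exists():
            shutil.rmtree(self.path, ignore_errors=True)


managed = ManagedWorkdir()
created = managed.path.is_dir()
managed.cleanup()
managed.cleanup()  # second call does nothing
```

A directory passed in by the caller is never deleted in this sketch, matching the idea that only the managed temporary directory is removed.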

get_params(deep: bool = True) -> dict[str, Any]

Estimator and ROBERT option names from __init__ (no nested estimators).

set_fit_request(*, names: bool | None | str = '$UNCHANGED$') -> RobertModel

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters

names : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for names parameter in fit.

Returns

self : object

The updated object.

set_params(**params: Any) -> RobertModel

Update RobertModel fields and ROBERT keys accepted by var_dict.

set_predict_request(*, return_std: bool | None | str = '$UNCHANGED$', return_uncertainty: bool | None | str = '$UNCHANGED$') -> RobertModel

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters

return_std : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for return_std parameter in predict.

return_uncertainty : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for return_uncertainty parameter in predict.

Returns

self : object

The updated object.