Python API (sklearn-style)
RobertModel runs the full ROBERT workflow (CURATE, GENERATE,
VERIFY, PREDICT) and exposes fit / predict / score on
pandas.DataFrame or numpy.ndarray inputs. PREDICT writes CSV
columns aligned with the pipeline:
- {y}_pred: point prediction from the selected estimator refit on all training data (deployment-style mean).
- {y}_pred_sd: per-row standard deviation across repeated cross-validation predictions (disagreement between refits on overlapping training folds; related to epistemic instability, not a calibrated predictive distribution).
- {y}_pred_conformal_hw (regression only): a single symmetric interval half-width from split-style conformal calibration (absolute residuals on a held-out calibration slice of the training set when large enough, otherwise residuals vs CV out-of-fold means). Because the reported point predictor is refit on all training data, finite-sample coverage is approximate at the nominal conformal_coverage (default 0.9); tune with conformal_enable, conformal_calib_frac, and conformal_coverage in RobertModel kwargs.
For classification, the {y}_pred_conformal_hw column is present but filled with NaN; {y}_pred_sd reflects vote spread across CV refits, not class probabilities.
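The half-width idea can be sketched in a few lines. This is a minimal illustration of symmetric split-conformal calibration, not ROBERT's implementation: the helper name conformal_half_width and the toy residuals are hypothetical, and the rank ceil((n + 1) * coverage) is the textbook finite-sample choice, which ROBERT may or may not use.

```python
import math


def conformal_half_width(abs_residuals, coverage=0.9):
    """Hypothetical helper: symmetric split-conformal half-width.

    abs_residuals are |y_true - y_pred| on calibration rows that the
    calibrated model did not train on.
    """
    n = len(abs_residuals)
    if n == 0:
        raise ValueError("need at least one calibration residual")
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be in (0, 1)")
    # Textbook finite-sample rank; the small epsilon guards against
    # float round-off pushing an exact integer up by one.
    k = math.ceil((n + 1) * coverage - 1e-12)
    if k > n:
        # Too few calibration points for the requested coverage.
        return math.inf
    return sorted(abs_residuals)[k - 1]


# Toy calibration residuals (made up for illustration).
residuals = [0.5, 0.1, 0.3, 0.2, 0.4, 0.9, 0.7, 0.6, 0.8]
hw = conformal_half_width(residuals, coverage=0.8)

# The same half-width is applied to every point prediction.
point_predictions = [2.0, 3.5]
intervals = [(p - hw, p + hw) for p in point_predictions]
```

Every row gets the same interval width because the half-width is a single number. This is why the per-row {y}_pred_sd column carries different information from the conformal column.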
predict returns {y}_pred values aligned to input rows. Uncertainty:
- return_std=True is equivalent to return_uncertainty="cv_sd" and returns (y, sd_cv).
- return_uncertainty="conformal" (regression only) returns (y, half_width).
- return_uncertainty="both" returns (y, sd_cv, half_width).
If both return_std and return_uncertainty are set, return_uncertainty wins and a warning is issued.
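The precedence rule can be written out as a small resolver. This is a sketch of the documented behavior, not code from ROBERT: resolve_uncertainty_mode and n_outputs are hypothetical names, and the defaults (False and None), the UserWarning category, and the ValueError on unknown modes are assumptions.

```python
import warnings

_VALID_MODES = ("cv_sd", "conformal", "both")


def resolve_uncertainty_mode(return_std=False, return_uncertainty=None):
    """Hypothetical helper mirroring the documented precedence rule."""
    if return_uncertainty is not None:
        if return_uncertainty not in _VALID_MODES:
            raise ValueError(f"unknown return_uncertainty: {return_uncertainty!r}")
        if return_std:
            # Both were set: return_uncertainty wins, with a warning.
            warnings.warn(
                "return_std and return_uncertainty both set; "
                "using return_uncertainty",
                UserWarning,
                stacklevel=2,
            )
        return return_uncertainty
    # Only return_std (or neither) was given.
    return "cv_sd" if return_std else None


def n_outputs(mode):
    """Number of arrays predict would return for a resolved mode."""
    return {None: 1, "cv_sd": 2, "conformal": 2, "both": 3}[mode]
```

Since the number of returned arrays depends on the mode, calling code that unpacks the result should pass exactly one of the two arguments.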
Pipeline semantics
- Single high-level estimator. Encoding and curation happen in CURATE; training matrices are scaled inside ROBERT (StandardScaler on the design matrix in prepare_sets). This is not a composable sklearn Pipeline of separate TransformerMixin steps on RobertModel itself.
- Do not stack another StandardScaler (or similar) in front of the same raw descriptor table unless you know exactly how it interacts with CURATE outputs; you would usually double-scale or break column semantics.
- Row order. predict returns one value per input row, aligned to X even if ROBERT writes prediction CSVs in a different row order (alignment uses the names column from CURATE).
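The row-order guarantee amounts to a lookup by row identifier. The sketch below shows the idea on plain dictionaries; align_to_input, the "Name" and "y_pred" keys, and the toy rows are hypothetical and do not reflect ROBERT's internal code or CSV layout.

```python
def align_to_input(input_names, csv_rows, name_key="Name", pred_key="y_pred"):
    """Hypothetical helper: reorder CSV predictions to match input rows."""
    by_name = {}
    for row in csv_rows:
        name = row[name_key]
        if name in by_name:
            raise ValueError(f"duplicate row identifier: {name!r}")
        by_name[name] = row[pred_key]
    missing = [n for n in input_names if n not in by_name]
    if missing:
        raise KeyError(f"no prediction found for: {missing}")
    # Emit predictions in the order of the input rows, not the CSV.
    return [by_name[n] for n in input_names]


# Rows as they might appear in a CSV written in a different order.
csv_rows = [
    {"Name": "mol_3", "y_pred": 0.7},
    {"Name": "mol_1", "y_pred": 1.9},
    {"Name": "mol_2", "y_pred": 1.2},
]
aligned = align_to_input(["mol_1", "mol_2", "mol_3"], csv_rows)
```

A lookup like this only works when identifiers are unique, which is why the sketch rejects duplicates instead of silently keeping the last one.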
Matplotlib
During fit and predict, Matplotlib is switched to the non-interactive Agg
backend so plotting does not require a GUI; the prior backend is restored afterward.
Figures are still written under workdir like the CLI workflow.
Example
from robert import RobertModel
import pandas as pd

df = pd.read_csv("Robert_example.csv")
X = df.drop(columns=["Target_values"])
y = df["Target_values"]

model = RobertModel(
    problem_type="reg",
    workdir="./robert_run",
    model=["RF"],
    n_iter=2,
    init_points=2,
)

# Fit on the first 25 rows, then predict the remaining rows.
model.fit(X.iloc[:25], y.iloc[:25])
preds = model.predict(X.iloc[25:])

# Point predictions plus the per-row CV standard deviation.
preds, sd_cv = model.predict(X.iloc[25:], return_std=True)
# Point predictions plus the conformal half-width (regression only).
preds2, hw = model.predict(X.iloc[25:], return_uncertainty="conformal")
# Point predictions plus both uncertainty outputs.
preds3, sd_cv2, hw2 = model.predict(X.iloc[25:], return_uncertainty="both")

r2 = model.score(X.iloc[25:], y.iloc[25:])
- class RobertModel(problem_type: Literal['reg', 'clas'] = 'reg', filter_mode: Literal['pfi', 'no_pfi'] = 'pfi', workdir: str | Path | None = None, names: str | None = None, report: bool = False, seed: int | None = None, y_column: str | None = None, **kwargs: Any)
High-level ROBERT workflow with fit, predict, and score.
Subclasses BaseEstimator for get_params and set_params. Preprocessing (CURATE encoding, ROBERT-internal StandardScaler on the design matrix) runs inside ROBERT; do not stack a separate StandardScaler on the same raw descriptors. Refitting with the same workdir leaves prior outputs on disk. fit and predict change the process working directory and are not safe to run concurrently on multiple instances in one process.
- Parameters:
  - problem_type -- "reg" or "clas".
  - filter_mode -- "pfi" or "no_pfi"; which GENERATE/Best_model variant predict loads.
  - workdir -- Directory for ROBERT outputs, or None for a managed temp dir.
  - names -- Column in X used as row identifiers (CURATE names), or None.
  - report -- If True, run REPORT after PREDICT during fit.
  - seed -- Random seed forwarded to ROBERT, or None for package default.
  - y_column -- If fit(X) is called with y is None, name of the target column in X (DataFrame only).
  - kwargs -- Additional ROBERT options (keys in robert.argument_parser.var_dict), e.g. model, n_iter. Regression uncertainty tuning includes conformal_enable, conformal_calib_frac, and conformal_coverage.
- cleanup() None
Remove managed temporary workdir, if any. Safe to call multiple times.
- get_params(deep: bool = True) dict[str, Any]
Estimator and ROBERT option names from __init__ (no nested estimators).
- set_fit_request(*, names: bool | None | str = '$UNCHANGED$') RobertModel
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Parameters
- names : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for names parameter in fit.
Returns
- self : object
The updated object.
- set_params(**params: Any) RobertModel
Update RobertModel fields and ROBERT keys accepted by var_dict.
- set_predict_request(*, return_std: bool | None | str = '$UNCHANGED$', return_uncertainty: bool | None | str = '$UNCHANGED$') RobertModel
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Parameters
- return_std : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for return_std parameter in predict.
- return_uncertainty : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for return_uncertainty parameter in predict.
Returns
- self : object
The updated object.