Grouping and other changes #273
Conversation
self.z_transformer = None

if self._n_splits == 1:  # special case, no cross validation
    folds = None
else:
    splitter = check_cv(self._n_splits, [0], classifier=stratify)
    # if check_cv produced a new KFold or StratifiedKFold object, we need to set shuffle and random_state
    # TODO: ideally, we'd also infer whether we need a GroupKFold (if groups are passed)
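For reference, a minimal sketch of the post-processing the inline comment describes, using scikit-learn's public API (the fold count, labels, and random_state below are illustrative, not the PR's actual code):

```python
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

# check_cv turns an integer fold count into a KFold/StratifiedKFold instance
# (StratifiedKFold when classifier=True and y looks like class labels).
splitter = check_cv(3, [0], classifier=True)

# The returned object is built with shuffle=False and no random_state,
# so those attributes have to be set after the fact for reproducible shuffling.
if isinstance(splitter, (KFold, StratifiedKFold)):
    splitter.shuffle = True
    splitter.random_state = 123
```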
Grouping is more important than stratification for valid inference, so I would prioritize grouping over stratification here: if groups are enabled, use GroupKFold; otherwise use StratifiedKFold if strata is not None, else KFold.
We should also most probably raise a warning that "cross fitting performed without treatment stratification because grouping was enabled."
Ultimately I feel we should just add our own stratified group k-fold that stratifies within each group, so that we really deliver the full version of our API.
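A minimal sketch of the selection logic proposed above (the helper name and the `strata`/`groups` parameters are illustrative, not the PR's API):

```python
import warnings

from sklearn.model_selection import GroupKFold, KFold, StratifiedKFold


def choose_splitter(n_splits, strata=None, groups=None, random_state=None):
    """Prioritize grouping over stratification when choosing a CV splitter."""
    if groups is not None:
        if strata is not None:
            # Grouping wins, so let the user know stratification was dropped.
            warnings.warn("cross fitting performed without treatment stratification "
                          "because grouping was enabled.")
        return GroupKFold(n_splits=n_splits)
    if strata is not None:
        return StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
```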
With small sample sizes, failing to stratify can cause first-stage model prediction to fail if no examples from one stratum make it into a training fold. I agree, though, that we ought to have a mechanism that supports both simultaneously; there is work in progress to add such a feature to sklearn natively.
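For context, the scikit-learn feature alluded to appears to be what later shipped as StratifiedGroupKFold (available since scikit-learn 1.0); a usage sketch under that assumption, with toy data:

```python
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

# Toy data: binary strata (e.g. treatment) and group ids that must stay within a single fold.
X = np.arange(12).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
groups = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6])

cv = StratifiedGroupKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y, groups):
    # No group is split across train and test, and class balance is approximately preserved.
    assert set(groups[train_idx]).isdisjoint(set(groups[test_idx]))
```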
Enabling grouping and robust linear final models.