Grouping and other changes #273

Merged
merged 3 commits into master from kebatt/robust on Nov 11, 2020

Conversation

kbattocchi (Collaborator)

Enabling grouping and robust linear final models.

@kbattocchi kbattocchi force-pushed the kebatt/robust branch 3 times, most recently from 1e9b728 to b57fe03 Compare August 28, 2020 16:31
@kbattocchi kbattocchi requested a review from heimengqi August 28, 2020 19:32
@kbattocchi kbattocchi marked this pull request as ready for review October 31, 2020 05:22
@kbattocchi kbattocchi force-pushed the kebatt/robust branch 2 times, most recently from 1a07974 to 9d29e1b Compare October 31, 2020 07:13
@moprescu moprescu self-requested a review November 3, 2020 16:12
self.z_transformer = None

if self._n_splits == 1:  # special case, no cross validation
    folds = None
else:
    splitter = check_cv(self._n_splits, [0], classifier=stratify)
    # if check_cv produced a new KFold or StratifiedKFold object, we need to set shuffle and random_state
    # TODO: ideally, we'd also infer whether we need a GroupKFold (if groups are passed)
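For context, a small standalone sketch of how sklearn's `check_cv` behaves (an illustration of the behavior the comment above relies on, not part of the diff): it returns a `StratifiedKFold` when `classifier=True` and a discrete `y` is supplied, otherwise a `KFold`, and neither shuffles by default, which is why the returned object's `shuffle`/`random_state` must be set afterwards.

```python
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

y = [0, 0, 0, 1, 1, 1]

# classifier=True with a discrete y yields a StratifiedKFold
cv = check_cv(3, y, classifier=True)
print(type(cv).__name__)  # StratifiedKFold

# classifier=False yields a plain KFold
cv = check_cv(3, y, classifier=False)
print(type(cv).__name__)  # KFold

# a freshly constructed splitter can still be configured after the fact,
# as the snippet above does for shuffle and random_state
cv.shuffle = True
cv.random_state = 123
```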
Collaborator

Grouping is more important than stratification for valid inference, so I would prioritize grouping over stratification here: if groups are enabled, use GroupKFold; otherwise use StratifiedKFold if strata is not None, else KFold.

Also, we should most probably raise a warning that "cross fitting performed without treatment stratification because grouping was enabled."

Ultimately I feel we should just add our own stratified group k-fold that stratifies within each group, so that we really deliver the full version of our API.
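The priority suggested above could be sketched as follows (`make_splitter` is a hypothetical helper for illustration, not code from the PR):

```python
import warnings
from sklearn.model_selection import GroupKFold, KFold, StratifiedKFold

def make_splitter(n_splits, groups=None, strata=None):
    # Hypothetical helper illustrating the suggested priority:
    # grouping first, then stratification, then plain k-fold.
    if groups is not None:
        if strata is not None:
            # grouping wins, but the caller asked for stratification too
            warnings.warn("cross fitting performed without treatment "
                          "stratification because grouping was enabled")
        return GroupKFold(n_splits)
    if strata is not None:
        return StratifiedKFold(n_splits)
    return KFold(n_splits)
```

With both groups and strata supplied this yields a GroupKFold plus the warning; with only strata, a StratifiedKFold; otherwise a plain KFold.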

kbattocchi (Collaborator, Author)

With small sample sizes, failure to stratify can cause first-stage model prediction to fail if no examples from one stratum make it into a training fold. I agree, though, that we ought to have a mechanism that supports both simultaneously; there is work in progress to add such a feature to sklearn natively.

@kbattocchi kbattocchi force-pushed the kebatt/robust branch 2 times, most recently from 3e890fd to a81342d Compare November 10, 2020 15:07
@kbattocchi kbattocchi merged commit 9cc7b2e into master Nov 11, 2020
@kbattocchi kbattocchi deleted the kebatt/robust branch November 11, 2020 14:30
@vsyrgkanis vsyrgkanis added the enhancement New feature or request label Nov 11, 2020