Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CV macro with stratification column doesn't work #213

Merged
merged 2 commits into from
May 23, 2018

Conversation

yaeldekel
Copy link

@yaeldekel yaeldekel commented May 23, 2018

The stratification column is being hashed to too many hash bits, and the RangeFilter that does the stratified split can't do the split. We reduce number of hash bits in stratification column so that the RangeFilter can split the data.
Fixes #182 .

@yaeldekel yaeldekel requested review from codemzs and TomFinley May 23, 2018 15:50
var importInput = new ML.Data.TextLoader(dataPath);
importInput.Arguments.Column = new ML.Data.TextLoaderColumn[]
{
new ML.Data.TextLoaderColumn { Name = "Label", Source = new[] { new ML.Data.TextLoaderRange { Min = 0, Max = 0 } }, Type = DataKind.R4 },
Copy link
Contributor

@TomFinley TomFinley May 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ML.Data.TextLoaderRange [](start = 88, length = 23)

I think for these scalar columns you can just use the simpler new ML.Data.TextLoaderRange(0) constructor. #Resolved

@TomFinley
Copy link
Contributor

TomFinley commented May 23, 2018

I see why this is a necessary change, but is there an issue we could link, possibly with more details of the problem?

Otherwise looks good.

Edit: Now referring to an issue thanks Yael! #Resolved

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Member

@codemzs codemzs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Member

@codemzs codemzs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@yaeldekel yaeldekel merged commit 73d894b into dotnet:master May 23, 2018
@yaeldekel yaeldekel deleted the cvstratification branch May 23, 2018 18:30
eerhardt pushed a commit to eerhardt/machinelearning that referenced this pull request Jul 27, 2018
* Reduce number of hash bits in stratification column and add a unit test.

* Address PR comments.
@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants