CV macro with stratification column doesn't work #213
var importInput = new ML.Data.TextLoader(dataPath);
importInput.Arguments.Column = new ML.Data.TextLoaderColumn[]
{
    new ML.Data.TextLoaderColumn { Name = "Label", Source = new[] { new ML.Data.TextLoaderRange { Min = 0, Max = 0 } }, Type = DataKind.R4 },
ML.Data.TextLoaderRange: I think for these scalar columns you can just use the simpler new ML.Data.TextLoaderRange(0) constructor. #Resolved
I see why this is a necessary change, but is there an issue we could link, possibly with more details of the problem? Otherwise looks good. Edit: now referring to an issue, thanks Yael! #Resolved
LGTM.
* Reduce number of hash bits in stratification column and add a unit test.
* Address PR comments.
The stratification column is hashed with too many hash bits, so the RangeFilter that performs the stratified split cannot split the data. This change reduces the number of hash bits used for the stratification column so that the RangeFilter can split the data.
Fixes #182.
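For readers unfamiliar with the hash-then-range-filter idea described above, here is a minimal, self-contained sketch of it. It is not the ML.NET implementation and does not reproduce the failure from #182: the class and method names are hypothetical, FNV-1a is swapped in as a simple deterministic hash, and the scaling into [0, 1) is an assumption made for illustration. The number of hash bits is a parameter, which is the knob this change adjusts.

```csharp
using System;
using System.Text;

// Hypothetical illustration only; none of these names exist in ML.NET.
public static class StratifiedSplitSketch
{
    // FNV-1a (32-bit): a simple deterministic hash, used here because
    // string.GetHashCode can differ between processes.
    public static uint Fnv1a(string value)
    {
        uint hash = 2166136261;
        foreach (byte b in Encoding.UTF8.GetBytes(value))
        {
            unchecked
            {
                hash ^= b;
                hash *= 16777619; // wraps around on overflow by design
            }
        }
        return hash;
    }

    // Keep only the low `hashBits` bits of the hash, then scale into [0, 1).
    public static double HashToUnitInterval(string stratum, int hashBits)
    {
        if (hashBits < 1 || hashBits > 31)
            throw new ArgumentOutOfRangeException(nameof(hashBits));

        uint bucketCount = 1u << hashBits;
        uint bucket = Fnv1a(stratum) & (bucketCount - 1);
        return bucket / (double)bucketCount;
    }

    // Range filter: fold k accepts values in [k / numFolds, (k + 1) / numFolds).
    public static int FoldOf(string stratum, int hashBits, int numFolds)
    {
        if (numFolds < 1)
            throw new ArgumentOutOfRangeException(nameof(numFolds));

        double u = HashToUnitInterval(stratum, hashBits);
        return (int)(u * numFolds);
    }
}
```

Because the fold depends only on the stratification value, every row that shares a value (for example, `StratifiedSplitSketch.FoldOf("user-1", 10, 5)`) lands in the same fold, which is the property a stratified split needs.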