-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify API for FeaturizeText ? #2460
Comments
The TextFeaturizingEstimator was modeled like the learners. IMO should follow the same pattern of having Options.. |
@sfilipi . Just to clarify .. we do follow the pattern of passing The issue was created to discuss your comment of whether we should use |
Last I looked, we also couldn't change the ngram/chargram lengths, which is quite odd as this is the main text transform hyperparameter which shows gains on text problems. We should be encouraging users to modify this hyperparameter. You can see a good chart of the sensitivity to ngram/chargran lengths in #2305:
|
#838 related? |
@shauheen . This is not related to #838 . Both #838 and Justin's comment above are slightly orthogonal to the original issue. The In one of the recent PRs, @sfilipi was curious if we should change the API signature to take in the inputColumnNames as I suggest we keep the API as-is .
@sfilipi . Should I close this issue ? |
@abgoswam 's suggestion sounds reasonable to me. Making it |
@wschin .. could you give an example. not sure what you mean by output names being a |
There are one-to-one mapping, many-to-one mapping, and many-to-many maaping. If we use |
In the MLContext for the text featurizer, the input column names are taken as a IEnumerable
machinelearning/src/Microsoft.ML.Transforms/Text/TextCatalog.cs
Lines 43 to 46 in 834e471
#2394 (comment) recommends making them params instead.
Should we modify this API ?
@sfilipi
The text was updated successfully, but these errors were encountered: