docs: Example notebook of VW vs LightGBM #641
Regression problem example comparing Vowpal Wabbit vs LightGBM vs Linear Regressor (Spark MLlib)
💖 Thanks for opening your first pull request! 💖 We use semantic commit messages to streamline the release process. Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix. This helps us to create release messages and credit you for your hard work!
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report
@@            Coverage Diff            @@
##           master    #641     +/-   ##
=========================================
+ Coverage    70.1%   79.77%   +9.67%
=========================================
  Files         229      229
  Lines        9154     9154
  Branches      478      478
=========================================
+ Hits         6417     7303     +886
+ Misses       2737     1851     -886
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
@loomlike @eisber It's hard to comment directly on the notebook, so I'll list my comments here:
Can you move to the DataFrame API explicitly, without the flatMaps? Generally you shouldn't need to use that soon-to-be-deprecated API.
No need for matplotlib inline.
Can you add the image of the results in markdown so that readers can compare?
- Remove rdd operations
- Remove inline plot command
- Add plot screenshot to markdown cell
@mhamilton723 Thank you for the review. I updated the notebook accordingly.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
" for icol in range(ncols):\n", | ||
" try:\n", | ||
" feat = features[irow*ncols + icol]\n", | ||
" xx = [r[feat] for r in train_data.select(feat).collect()]\n", |
It's much more efficient to select the columns in Spark and then call .toPandas on the resulting DataFrame. That also puts the data in a convenient form for plotting.
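The suggestion above could look roughly like the sketch below. It is only an illustration, not the code that was actually committed: the helper name `features_to_pandas` is hypothetical, and it assumes `train_data` is a Spark DataFrame and `features` is a list of column names, as in the quoted hunk. The point is that one `select(...).toPandas()` call replaces a separate `collect()` per feature.

```python
def features_to_pandas(spark_df, feature_names):
    """Select only the needed columns in Spark, then convert once to pandas.

    Hypothetical helper for illustration. `spark_df` is assumed to be a
    pyspark.sql.DataFrame and `feature_names` a list of column names.
    """
    # A single select + toPandas runs one Spark job for all columns,
    # instead of one collect() per feature inside the plotting loop.
    return spark_df.select(*feature_names).toPandas()


# Assumed usage inside the notebook's plotting loop (names taken from the hunk):
#   pdf = features_to_pandas(train_data, features)
#   for irow in range(nrows):
#       for icol in range(ncols):
#           feat = features[irow * ncols + icol]
#           xx = pdf[feat]  # pandas Series, ready to pass to matplotlib
```

Because the conversion happens once, outside the loop, the per-subplot work becomes a plain pandas column lookup. Note that `toPandas` brings the selected columns to the driver, so this assumes they fit in driver memory.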
Hey, there's one more small perf thing I found. No worries if you don't want to fix it, though. Tag me when it's ready to merge (and poke me on Teams for faster merging :))
@mhamilton723 Good catch! yeah that code did |
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
@loomlike Would you be able to update the PR to the latest master? The LightGBM bug should be fixed now.
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
LGTM
Congrats on merging your first pull request, we appreciate your support! 🎉🎉🎉
Regression example of Vowpal Wabbit, comparing with MMLSpark LightGBM and Spark MLlib Linear Regressor.