docs: Example notebook of VW vs LightGBM #641

loomlike · 2019-08-05T19:37:00Z

Regression example of Vowpal Wabbit, comparing with MMLSpark LightGBM and Spark MLlib Linear Regressor.

Regression problem example comparing Vowpal Wabbit vs LightGBM vs Linear Regressor (Spark MLlib)

welcome · 2019-08-05T19:37:04Z

💖 Thanks for opening your first pull request! 💖 We use semantic commit messages to streamline the release process. Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix. This helps us to create release messages and credit you for your hard work!
Examples of commit messages with semantic prefixes:
- fix: Fix LightGBM crashes with empty partitions
- feat: Make HTTP on Spark back-offs configurable
- docs: Update Spark Serving usage
- build: Add codecov support
- perf: improve LightGBM memory usage
- refactor: make python code generation rely on classes
- style: Remove nulls from CNTKModel
- test: Add test coverage for CNTKModel
Make sure to check out the developer guide for guidance on testing your change.

eisber · 2019-08-05T20:07:28Z

/azp run

azure-pipelines · 2019-08-05T20:07:39Z

Azure Pipelines successfully started running 1 pipeline(s).

codecov · 2019-08-05T20:18:09Z

Codecov Report

Merging #641 into master will increase coverage by 9.67%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #641      +/-   ##
==========================================
+ Coverage    70.1%   79.77%   +9.67%     
==========================================
  Files         229      229              
  Lines        9154     9154              
  Branches      478      478              
==========================================
+ Hits         6417     7303     +886     
+ Misses       2737     1851     -886

Impacted Files	Coverage Δ
.../execution/streaming/continuous/HTTPSourceV2.scala	`93.04% <0%> (-0.37%)`	⬇️
...scala/com/microsoft/ml/spark/io/http/Parsers.scala	`75% <0%> (+1.04%)`	⬆️
.../microsoft/ml/spark/core/schema/Categoricals.scala	`86.45% <0%> (+3.12%)`	⬆️
...com/microsoft/ml/spark/core/contracts/Params.scala	`91.48% <0%> (+4.25%)`	⬆️
...la/com/microsoft/ml/spark/io/http/HTTPSchema.scala	`88.48% <0%> (+4.31%)`	⬆️
...a/com/microsoft/ml/spark/io/http/HTTPClients.scala	`54.71% <0%> (+5.66%)`	⬆️
...om/microsoft/ml/spark/featurize/ValueIndexer.scala	`77.61% <0%> (+5.97%)`	⬆️
...a/com/microsoft/ml/spark/featurize/Featurize.scala	`96.96% <0%> (+6.06%)`	⬆️
...osoft/ml/spark/io/http/SimpleHTTPTransformer.scala	`91.93% <0%> (+8.06%)`	⬆️
...n/scala/org/apache/spark/ml/param/ArrayParam.scala	`70% <0%> (+20%)`	⬆️
... and 23 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 163dead...fd3c057.

mhamilton723 · 2019-08-28T17:28:59Z

/azp run

azure-pipelines · 2019-08-28T17:29:12Z

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723

@loomlike @eisber It's hard to comment directly on the notebook, so I'll list my comments here:

Can you move to the DataFrame API explicitly, without the flatMaps? Generally you shouldn't need to use that soon-to-be-deprecated API.

No need for matplotlib inline.

Can you add the image of the results in a markdown cell so readers can compare?

- Remove rdd operations
- Remove inline plot command
- Add plot screenshot to markdown cell

loomlike · 2019-09-01T05:15:48Z

@mhamilton723 Thank you for the review. I updated the notebook accordingly.

mhamilton723 · 2019-09-04T15:06:56Z

/azp run

azure-pipelines · 2019-09-04T15:07:11Z

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 · 2019-09-05T13:29:04Z

notebooks/samples/Regression - Vowpal Wabbit vs. LightGBM vs. Linear Regressor.ipynb

+    "    for icol in range(ncols):\n",
+    "        try:\n",
+    "            feat = features[irow*ncols + icol]\n",
+    "            xx = [r[feat] for r in train_data.select(feat).collect()]\n",


It's much more efficient to select the columns in Spark, then call .toPandas() on the resulting DataFrame. That puts the data in a nice form for plotting.
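A minimal sketch of this suggestion, not the code that was eventually merged. The helper names `collect_feature_columns` and `feature_grid` are hypothetical; `spark_df` stands in for the notebook's `train_data`, and `feature_names` for its `features` list. It relies only on the PySpark `DataFrame.select` and `DataFrame.toPandas` methods.

```python
def collect_feature_columns(spark_df, feature_names):
    """Return the requested columns as a single pandas DataFrame.

    One select() followed by one toPandas() runs a single Spark job,
    instead of one collect() per feature as in the diff above.
    """
    # select() is lazy; toPandas() triggers the job and brings the
    # projected columns to the driver in one pass.
    return spark_df.select(*feature_names).toPandas()


def feature_grid(feature_names, ncols):
    """Yield (row, col, feature) positions for a subplot grid with ncols columns."""
    for idx, feat in enumerate(feature_names):
        yield idx // ncols, idx % ncols, feat
```

In the notebook this could be used as `pdf = collect_feature_columns(train_data, features)`, after which `pdf[feat]` gives the values for each subplot. Because toPandas() pulls everything to the driver, sampling the DataFrame first may be worth considering for large datasets.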

mhamilton723

Hey, there's one more small perf thing I found. No worries if you don't want to fix it, though. Tag me when ready to merge (and poke me on Teams for faster merging :))

loomlike · 2019-09-06T03:31:42Z

@mhamilton723 Good catch! Yes, that code called collect() for every feature, which was not necessary. Thanks for the review. I just made the changes and pushed them.

mhamilton723 · 2019-09-06T14:50:13Z

/azp run

azure-pipelines · 2019-09-06T14:50:28Z

Azure Pipelines successfully started running 1 pipeline(s).

mhamilton723 · 2019-09-08T20:10:18Z

/azp run

azure-pipelines · 2019-09-08T20:10:31Z

Azure Pipelines successfully started running 1 pipeline(s).

eisber · 2019-09-09T10:11:03Z

/azp run

azure-pipelines · 2019-09-09T10:11:15Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft · 2019-10-11T04:01:44Z

@loomlike Would you be able to update the PR to the latest? The LightGBM bug should be fixed now.

imatiach-msft · 2019-10-11T04:02:07Z

/azp run

azure-pipelines · 2019-10-11T04:02:17Z

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft

LGTM

welcome · 2019-10-11T15:00:19Z

Congrats on merging your first pull request, we appreciate your support! 🎉🎉🎉

loomlike added 2 commits August 5, 2019 18:19

Example notebook of VW vs LightGBM

f12deee

Regression problem example comparing Vowpal Wabbit vs LightGBM vs Linear Regressor (Spark MLlib)

Cleanup cells

6d3afc8

loomlike requested review from drdarshan and mhamilton723 as code owners August 5, 2019 19:37

Merge branch 'master' into loomlike/vw-regression

f4204b5

Merge branch 'master' into loomlike/vw-regression

c8189ab

mhamilton723 requested changes Aug 28, 2019


loomlike added 2 commits September 1, 2019 02:51

Merge remote-tracking branch 'upstream/master'

21cc66f

Merge branch 'master' into loomlike/vw-regression

e78120f

loomlike mentioned this pull request Sep 1, 2019

LightGBMRegressor predicts the same value for all samples on Databricks 5.5 LTS ML #680

Closed

Address comments

05ed02f

- Remove rdd operations
- Remove inline plot command
- Add plot screenshot to markdown cell

Merge branch 'master' into loomlike/vw-regression

7ffee21

mhamilton723 reviewed Sep 5, 2019


mhamilton723 requested changes Sep 5, 2019


Simplify dataframe conversion into array

8918b25

imatiach-msft assigned imatiach-msft and unassigned imatiach-msft Sep 12, 2019

Merge branch 'master' into loomlike/vw-regression

fd3c057

imatiach-msft changed the title ~~Example notebook of VW vs LightGBM~~ docs: Example notebook of VW vs LightGBM Oct 11, 2019

imatiach-msft requested a review from mhamilton723 October 11, 2019 14:27

imatiach-msft approved these changes Oct 11, 2019


mhamilton723 merged commit 6b07829 into microsoft:master Oct 11, 2019
