
LightGBMRegressor predicts the same value for all samples on Databricks 5.5 LTS ML #680

Closed
loomlike opened this issue Sep 1, 2019 · 1 comment


loomlike commented Sep 1, 2019

Describe the bug
On Databricks 5.5 LTS ML (Spark 2.4.3, Scala 2.11), LightGBMRegressor returns the same prediction for every sample, for example:

+------+--------------------+------------------+
|target|            features|        prediction|
+------+--------------------+------------------+
|  24.0|[0.00632,18.0,2.3...|18.228695330412492|
|  32.2|[0.00906,90.0,2.9...|18.228695330412492|
|  22.0|[0.01096,55.0,2.2...|18.228695330412492|
|  32.7|[0.01301,35.0,1.5...|18.228695330412492|
|  35.4|[0.01311,90.0,1.2...|18.228695330412492|
|  18.9|[0.0136,75.0,4.0,...|18.228695330412492|
|  50.0|[0.01381,80.0,0.4...|18.228695330412492|
|  31.6|[0.01432,100.0,1....|18.228695330412492|

If I run exactly the same code on Databricks 5.3 (Spark 2.4.0, Scala 2.11), it returns correct predictions:

+------+--------------------+------------------+
|target|            features|        prediction|
+------+--------------------+------------------+
|  24.0|[0.00632,18.0,2.3...| 24.17294548841416|
|  32.2|[0.00906,90.0,2.9...| 30.26500328960329|
|  22.0|[0.01096,55.0,2.2...|22.371702836613604|
|  32.7|[0.01301,35.0,1.5...| 32.76094906901519|
|  35.4|[0.01311,90.0,1.2...| 35.57861891896448|
|  18.9|[0.0136,75.0,4.0,...|18.686677511038454|
|  50.0|[0.01381,80.0,0.4...|44.583744699278974|
|  31.6|[0.01432,100.0,1....| 30.66843269977932|
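
A quick way to flag this symptom is to count the distinct prediction values. The sketch below is plain Python over the values shown in the two tables above. The helper name `distinct_predictions` is hypothetical and not part of MMLSpark; on a Spark DataFrame the analogous check would be a distinct count on the `prediction` column.

```python
def distinct_predictions(preds, ndigits=9):
    """Return the number of distinct prediction values after rounding.

    Hypothetical helper for illustration only.
    """
    return len({round(p, ndigits) for p in preds})


# Predictions copied from the Databricks 5.5 LTS ML output above.
preds_55 = [18.228695330412492] * 8

# Predictions copied from the Databricks 5.3 output above.
preds_53 = [
    24.17294548841416,
    30.26500328960329,
    22.371702836613604,
    32.76094906901519,
    35.57861891896448,
    18.686677511038454,
    44.583744699278974,
    30.66843269977932,
]

# A single distinct value across all rows indicates the degenerate case.
print(distinct_predictions(preds_55))  # 1
print(distinct_predictions(preds_53))  # 8
```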

FYI, I'm using mmlspark_2.11-0.18.1.

To Reproduce
Here is the code I used to initialize LightGBMRegressor:

from mmlspark.lightgbm import LightGBMRegressor  # import path in mmlspark 0.18.x

lgr = LightGBMRegressor(
  objective='quantile',
  alpha=0.2,
  learningRate=0.3,
  numLeaves=31,
  labelCol='target',
  numIterations=100,
)
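
For context on the settings above: with `objective='quantile'` and `alpha=0.2`, the model targets the 0.2 quantile of the label rather than its mean, by minimizing the pinball (quantile) loss. The following is a minimal plain-Python sketch of that loss, not LightGBM's or MMLSpark's implementation. The helper name `pinball_loss` is hypothetical, and the sample values are the eight rows shown in the tables above.

```python
def pinball_loss(y_true, y_pred, alpha=0.2):
    """Mean pinball (quantile) loss for quantile level `alpha`.

    Under-predictions are weighted by alpha, over-predictions by (1 - alpha).
    Hypothetical helper for illustration only.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += alpha * diff if diff >= 0 else (alpha - 1.0) * diff
    return total / len(y_true)


targets = [24.0, 32.2, 22.0, 32.7, 35.4, 18.9, 50.0, 31.6]

# Constant output observed on Databricks 5.5 LTS ML.
constant = [18.228695330412492] * len(targets)

# Per-sample output observed on Databricks 5.3.
varied = [
    24.17294548841416,
    30.26500328960329,
    22.371702836613604,
    32.76094906901519,
    35.57861891896448,
    18.686677511038454,
    44.583744699278974,
    30.66843269977932,
]

# On these rows the constant output has a higher loss than the varied one.
print(pinball_loss(targets, constant) > pinball_loss(targets, varied))  # True
```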

Expected behavior
LightGBMRegressor should return correct per-sample predictions on Databricks 5.5 LTS ML, as it does on Databricks 5.3.

Info (please complete the following information):

  • MMLSpark Version: mmlspark_2.11-0.18.1
  • Spark Version: 2.4.3
  • Spark Platform: Databricks 5.5 LTS ML

Additional context
Found this bug while testing the notebook from #641.

@imatiach-msft

Fixed in latest. I validated that I could reproduce the problem with version 0.18.1 and that it is fixed on the latest master. It seems to be due to the caching issue.
