Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add featuresShapCol to LightGBMClassifierModel #863

Merged
merged 9 commits into from
May 25, 2020
Merged

feat: add featuresShapCol to LightGBMClassifierModel #863

merged 9 commits into from
May 25, 2020

Conversation

ocworld
Copy link
Contributor

@ocworld ocworld commented May 19, 2020

In this pr, featuresShapCol is added to LightGBMClassifierModel.

@ocworld
Copy link
Contributor Author

ocworld commented May 19, 2020

@AhnLab-OSS

@ekerazha
Copy link

So we have SHAP for LightGBM Ranker and Classifier? Is it now missing only for Regressor?

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft
imatiach-msft previously approved these changes May 19, 2020
Copy link
Contributor

@imatiach-msft imatiach-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great except for a few minor changes!

@ocworld
Copy link
Contributor Author

ocworld commented May 19, 2020

@ekerazha I assume that except regression SHAP is supported.
Please note that it has different usages for getting SHAP between LightGBM Ranker and Classifier in this pr.

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@codecov
Copy link

codecov bot commented May 19, 2020

Codecov Report

Merging #863 into master will increase coverage by 0.03%.
The diff coverage is 88.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #863      +/-   ##
==========================================
+ Coverage   85.12%   85.16%   +0.03%     
==========================================
  Files         187      187              
  Lines        8668     8681      +13     
  Branches      520      507      -13     
==========================================
+ Hits         7379     7393      +14     
+ Misses       1289     1288       -1     
Impacted Files Coverage Δ
...crosoft/ml/spark/lightgbm/LightGBMClassifier.scala 90.09% <87.50%> (+1.46%) ⬆️
.../microsoft/ml/spark/lightgbm/LightGBMBooster.scala 87.87% <100.00%> (+1.51%) ⬆️
.../execution/streaming/continuous/HTTPSourceV2.scala 93.04% <0.00%> (-0.37%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 82e7a8e...170e187. Read the comment docs.

@imatiach-msft
Copy link
Contributor

hmm seems the test is failing in the native code:

[info] + Test Verify LightGBM Classifier features shap took 0.583s 
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f4ea7779ef2, pid=7254, tid=0x00007f4ea4263700
#
# JRE version: OpenJDK Runtime Environment () (8.0_252-b14) (build 1.8.0_252-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.252-b14 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x7fef2]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/vsts/work/1/s/hs_err_pid7254.log
[thread 139974389262080 also had an error]
#
# If you would like to submit a bug report, please visit:
#   http://www.azulsystems.com/support/
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
timeout: the monitored command dumped core
/home/vsts/work/_temp/azureclitaskscript1589903456723.sh: line 6:  7253 Aborted

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@imatiach-msft
Copy link
Contributor

running another build again, will try to run this branch locally also

@imatiach-msft
Copy link
Contributor

@ocworld great news, I was able to run your code locally and I discovered the issue - SHAP returns the expected value in addition to the feature contributions, which led to memory corruption. Although I was not able to get the SHAP test to fail, I did see another test fail due to memory corruption from the SHAP test, and once I fixed the issue I validated that the memory corruption error is no longer occurring. I pushed the fix directly to your branch, hope you don't mind.

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft
imatiach-msft previously approved these changes May 25, 2020
@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

imatiach-msft
imatiach-msft previously approved these changes May 25, 2020
@imatiach-msft
Copy link
Contributor

missed updating another test in ranker which was checking the number of features, fixed

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@imatiach-msft
Copy link
Contributor

found another minor issue where the new column name was missing in the objects to save which caused a serialization failure, fixed, hopefully build should be green now

@imatiach-msft
Copy link
Contributor

once PR is merged I'll send a corresponding PR for
1.) pyspark API
2.) lightgbm regressor
to make it consistent across codebase

@imatiach-msft
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723 mhamilton723 merged commit 6bb4a45 into microsoft:master May 25, 2020
ocworld pushed a commit to AhnLab-OSS/mmlspark that referenced this pull request May 25, 2020
feat: add featuresShapCol to LightGBMClassifierModel (microsoft#863)

Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Ilya Matiach <ilmat@microsoft.com>
@imatiach-msft
Copy link
Contributor

@ocworld I noticed this PR won't work for multiclass case, since multiclass shap values are of shape (# features + 1 expected_value) * # classes, but we currently aren't making the array the size of # classes. I can fix this too when I send out the regression PR.

@ocworld
Copy link
Contributor Author

ocworld commented May 27, 2020

@imatiach-msft Thanks for your notice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants