-
Notifications
You must be signed in to change notification settings - Fork 842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add featuresShapCol to LightGBMClassifierModel #863
Conversation
src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMClassifier.scala
Outdated
Show resolved
Hide resolved
So we have SHAP for LightGBM Ranker and Classifier? Is it now missing only for Regressor? |
src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMClassifier.scala
Outdated
Show resolved
Hide resolved
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great except for a few minor changes!
@ekerazha I assume that except regression SHAP is supported. |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Codecov Report
@@ Coverage Diff @@
## master #863 +/- ##
==========================================
+ Coverage 85.12% 85.16% +0.03%
==========================================
Files 187 187
Lines 8668 8681 +13
Branches 520 507 -13
==========================================
+ Hits 7379 7393 +14
+ Misses 1289 1288 -1
Continue to review full report at Codecov.
|
hmm seems the test is failing in the native code:
|
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
running another build again, will try to run this branch locally also |
@ocworld great news, I was able to run your code locally and I discovered the issue - SHAP returns the expected value in addition to the feature contributions, which led to memory corruption. Although I was not able to get the SHAP test to fail, I did see another test fail due to memory corruption from the SHAP test, and once I fixed the issue I validated that the memory corruption error is no longer occurring. I pushed the fix directly to your branch, hope you don't mind. |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
missed updating another test in ranker which was checking the number of features, fixed |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
found another minor issue where the new column name was missing in the objects to save which caused a serialization failure, fixed, hopefully build should be green now |
once PR is merged I'll send a corresponding PR for |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
feat: add featuresShapCol to LightGBMClassifierModel (microsoft#863) Co-authored-by: Fokko Driesprong <fokko@apache.org> Co-authored-by: Ilya Matiach <ilmat@microsoft.com>
@ocworld I noticed this PR won't work for multiclass case, since multiclass shap values are of shape (# features + 1 expected_value) * # classes, but we currently aren't making the array the size of # classes. I can fix this too when I send out the regression PR. |
@imatiach-msft Thanks for your notice |
In this pr, featuresShapCol is added to LightGBMClassifierModel.