16 Nov 05:19

e6da4d5

SynapseML v0.9.4

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that’s usable across Python, R, Scala, and Java.

Highlights


General Availability on Synapse: We are ready to help you productionalize on Azure Synapse Analytics
ONNX on Spark: Distributed and hardware accelerated model inference on Spark
Responsible AI: Understand opaque-box models, measure dataset biases, Explainable Boosting Machines
Form Recognition and Translation: Parse PDFs and translate dataframes between over 100 languages
Reinforcement Learning: Contextual Bandit Reinforcement Learning with Vowpal Wabbit

New Features

General ✨

Renamed and rebranded! Microsoft ML for Apache Spark is now SynapseML
New modular library sub-packages for standalone install of each major set of features
Support Spark 3.1.2 and Scala 2.12
Support pip install synapseml for Python bindings
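
As a minimal sketch of what the pip support above enables, the snippet below checks whether the `synapseml` distribution named in the install command is present in the current Python environment and reports its version. The helper name `installed_version` is hypothetical and not part of SynapseML; it only uses the standard library's `importlib.metadata` (Python 3.8+).

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of SynapseML.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "synapseml" is the distribution name from the pip install command above.
version = installed_version("synapseml")
if version is None:
    print("synapseml is not installed; try: pip install synapseml")
else:
    print(f"synapseml {version} is installed")
```

Note that the Python bindings still need a Spark session with the matching JVM package available; this check only covers the Python side.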

ONNX on Spark 🕸

ONNX model inference on Spark (#1152)
Add documentation and notebooks for ONNXModel evaluation (#1164)

Cognitive Services for Big Data🧠

Added Multilingual Translation APIs (#1108) (Tutorial)
Added FormRecognition APIs (Invoice, IDs, BusinessCards, Layouts, Custom Models) (#1099) (Tutorial)
Added the FormOntologyLearner to extract meaningful "ontologies" of objects from collections of forms
Add notebook to Create a Multilingual Search Engine from Forms
Updated Text Analytics API to V3.1 (#1193)
Add redactedText to PIIV3 (#1247)
Added Personally Identifying Information (PII) identification
Added Read API
Added Conversation Transcription API
Cognitive Services now support data exfiltration protected (DEP) VNETs, allowing for individualized security solutions on Synapse Analytics (Learn More)
Added support for the m4a codec in Speech to Text models
Added predictive maintenance notebook
Added Cognitive Service overview notebook
Added support for linked service authentication in Synapse Analytics
Simple no-code support in Synapse Analytics

Responsible AI at Scale 😇

Added Additive Shapley Explanations (SHAP) for understanding the predictions of opaque-box models (#1077)
New API for Locally Interpretable Model-Agnostic Explanations (LIME), which now supports background distributions and text models and has the same API as SHAP (#1077)
Added Measure transformers for Data Balance Analysis (#1218)
Add more notebook samples for documentation (#1043)
Documentation and notebooks for Interpretability on Spark
Introduce Responsible AI section on website (Interpretability + DataBalanceAnalysis) (#1241)
Adding document and notebook for Data Balance Analysis (#1226)
Explainable Boosting Machines for performant and interpretable ML (Private preview on Synapse Analytics only)

Vowpal Wabbit 🐇

Added ContextualBandit reinforcement learning (#896)
Added Vowpal Wabbit Overview Notebook

LightGBM 🌳

Added matrix type parameter and improved logic to automatically infer dataset sparsity (#1052)
Added several parameters related to dart boosting type (#1045)
Added chunk size parameter for copying java data to native (#1041)
Added number of threads parameter (#1055)
Added custom objective function to LightGBM learners (#1054)
Added singleton dataset mode for faster performance and reduced memory usage (#1066)
Add num iteration and start iteration parameters to LightGBM model (#1024)
Added the average precision metric (#1034)
Added overview notebook for LightGBM
Moved to new streaming API for dense data to reduce memory usage
Tuned chunking code for faster performance

Build and Infrastructure Improvements 🏭

New Docusaurus website generation system
E2E Tests on Synapse Analytics (#1014)
Split library into separately installable subprojects (#1073)
Added a unified logging and telemetry system (#1019)
Modernized R wrapper generation
New Automated Python test generation (#998)
New extensible code generation system
New two-tiered security for build secrets
Update ubuntu version to 18.04
Automated back-up ACR images

Additional Updates

Bug Fixes 🐞

Enable backwards compatibility for mmlspark python namespace imports (#1244)
Fix publishing to maven and pypi (#1242)
Fix broken link to notebook in Data Balance Analysis doc (#1240)
min_data_in_leaf missing from dataset parameters in lightgbm (#1239)
Fix performance issue in interpretability notebooks (#1238)
Fixed cognitive service errors (#1176)
Fixed flaky tests
Rename NERPii to PII
Fixed cog service test flakes
Fixed setLinkedService issues in Synapse (#1177)
Improved LGBM error message for invalid slot names (#1160)
Fixed generated python code (#1121)
Updated notebookUtils class path (#1118)
Fixed LIME NaN weight output (#1117, #1112)
Fixed Guava version issue in Azure Synapse and Databricks (#1103)
Fixed flakiness in spark session stopping
Fixed result parsing for forms
Fixed explainers returning wrong results when targetClassesCol is specified
Fixed CNTKModel issue due to catalyst bug on databricks (#1076)
Fixed null handling in bing image response (#1067)
Avoided strange issue with databricks json parser
Fixed dependency exclusions and build secret querying
Fixed issue in tabular lime sampler (#1058)
Updated Bing search URLs (#1048)
Refactored python wrappers to use common class (#758)
Updated java params patch (#1027)
Added missing returns in new python lightGBM model methods
Stop R binding generation from failing silently
Fixed conversation transcription participant column functionality
Reduce verbosity to...
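
The backwards-compatibility fix for `mmlspark` namespace imports (#1244) matters mostly for code that has to run against both old and new installs. A minimal sketch of how calling code might detect which namespace is importable is shown below. The helper `first_available` is hypothetical, and the new namespace name `synapse.ml` is an assumption that these release notes do not state explicitly.

```python
import importlib.util


def first_available(candidates):
    """Return the first importable module name in candidates, or None.

    Hypothetical helper for illustration; not part of SynapseML.
    """
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # find_spec raises ModuleNotFoundError when a parent package
            # (e.g. "synapse" for "synapse.ml") is missing.
            continue
    return None


# Prefer the new namespace (assumed to be "synapse.ml"), then the legacy one.
namespace = first_available(["synapse.ml", "mmlspark"])
print(namespace or "neither namespace is installed")
```

Probing with `find_spec` avoids importing the package's submodules just to learn whether it exists.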


03 Nov 03:11

mhamilton723

v0.9.2

81f5f80

SynapseML v0.9.2

v0.9.2

Bug Fixes 🐞

fix publish to central maven (#1233)
fix website (#1234)
fix typo in sbt install
lightgbm default params should not be specified if optional (#1232)
fix website broken links (#1230)
improve azure search writer error message in Array[Array[]] case
update baseUrl and fix static images (#1217)
Fixing flaky unit tests (#1215)
Docker image should install openjdk-8-jre as opposed to default-… (#1211)
Fixing flaky test

Documentation 📘

add explanation dashboard integration example notebook (#1236)
fix links to developer readme and R setup (#1229)

Feat

Build our new website (#1190)

Features 🌈

support direct pip install (#1223)
Measure transformers for Data Balance Analysis (#1218)
Add the FormOntologyLearner

Maintenance 🔧

release synapseml 0.9.2 (#1237)

Performance Improvements 🚀

website enhancement (#1221)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

81f5f80 chore: release synapseml 0.9.2 (#1237)
127c70a docs: add explanation dashboard integration example notebook (#1236)
9b9c2fb fix: fix publish to central maven (#1233)
7059573 fix: fix website (#1234)
d47f014 fix: fix typo in sbt install
336eff5 fix: lightgbm default params should not be specified if optional (#1232)
3d92dd7 feat: support direct pip install (#1223)
2771853 docs: fix links to developer readme and R setup (#1229)
ea91189 fix: fix website broken links (#1230)
bbd8744 perf: website enhancement (#1221)


c5e1742 feat: Measure transformers for Data Balance Analysis (#1218)
73c6a65 fix: improve azure search writer error message in Array[Array[]] case
d8344c5 feat: Add the FormOntologyLearner
2d81b50 fix: update baseUrl and fix static images (#1217)
e23041f fix: Fixing flaky unit tests (#1215)
5d31e3e fix: Docker image should install openjdk-8-jre as opposed to default-… (#1211)
9623b3e Feat: Build our new website (#1190)
3f74133 fix: Fixing flaky test

This list of changes was auto generated.
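
The section headings in these notes appear to follow from the prefix on each commit line (`fix:`, `feat:`, `docs:`, `chore:`, `perf:`). The sketch below groups commit lines that way. It is an assumption-laden illustration, not the project's actual release tooling: the `HEADINGS` mapping and the capitalized fallback for unknown prefixes are inferred from the headings visible on this page, and the function name `group_commits` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed prefix-to-heading mapping, inferred from the headings on this page.
HEADINGS = {
    "fix": "Bug Fixes 🐞",
    "feat": "Features 🌈",
    "docs": "Documentation 📘",
    "chore": "Maintenance 🔧",
    "perf": "Performance Improvements 🚀",
}

# "<7-char sha> <prefix>: <message>", as in the "Changes:" lists.
LINE = re.compile(r"^(?P<sha>[0-9a-f]{7})\s+(?P<kind>\w+):\s*(?P<msg>.+)$")


def group_commits(lines):
    """Group commit messages under a heading chosen by their prefix."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected shape
        kind = match.group("kind")
        # Lookup is case-sensitive; an unmapped prefix becomes its own heading.
        heading = HEADINGS.get(kind, kind.capitalize())
        groups[heading].append(match.group("msg"))
    return dict(groups)


sample = [
    "81f5f80 chore: release synapseml 0.9.2 (#1237)",
    "9b9c2fb fix: fix publish to central maven (#1233)",
    "9623b3e Feat: Build our new website (#1190)",
]
grouped = group_commits(sample)
for heading, messages in grouped.items():
    print(heading, messages)
```

Under these assumptions, a case-sensitive lookup would explain why the commit prefixed `Feat:` lands under its own "Feat" heading rather than under "Features 🌈".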


15 Oct 20:14

mhamilton723

v0.9.1

6b81426

SynapseML v0.9.1

v0.9.1

Bug Fixes 🐞

fix readme badge

Maintenance 🔧

Bump version to 0.9.1

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

6b81426 chore: Bump version to 0.9.1
274b110 fix:fix doc publishing
600bc6e fix: fix readme badge

This list of changes was auto generated.


15 Oct 05:01

mhamilton723

v0.9.0

a6c7fea

SynapseML v0.9.0

v0.9.0

Bug Fixes 🐞

don't crash on fallback storage location (#1183)

Chore

rename mmlspark to synapseml (#1204)

Features 🌈

update versions in README.md (#1205)

Maintenance 🔧

release synapseml 0.9.0 (#1206)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Changes:

a6c7fea chore: release synapseml 0.9.0 (#1206)
383cb95 Chore: rename mmlspark to synapseml (#1204)
ecc6868 fix: don't crash on fallback storage location (#1183)
661e3e5 feat: updata versions in README.md (#1205)

This list of changes was auto generated.


18 Jul 02:16

mhamilton723

mmlspark-v1.0.0-rc4

5fc65ab

MMLSpark v1.0.0-rc4

v1.0.0-rc4

Bug Fixes 🐞

fix setLinkedService in Synapse
fix cognitive service errors (#1176)
fix anomaly detector test cases
rename NERPii to PII
fix scala style error
fix cog service test flakes
fix setLinkedService issues in Synapse (#1177)
improve LGBM error message for invalid slot names (#1160)
flaky lime test
fix flaky conversation transcription test
fix SpeechToTextSDK setLinkedService (#1138)
fix generated python code (#1121)
update notebookUtils class path (#1118)
LIME returns NaN weight if a feature contains a single value or when the sampler cannot obtain a different state for a feature due to data skew. It returns zero weights for all other features. (#1117)
fix Guava version issue in Azure Synapse and Databricks (#1103)
fix flakiness in spark session stopping
Fix result parsing for forms
LIME sometimes returns NaN weights (#1112)
reformat code
explainers return wrong results when targetClassesCol is specified
Unit test OOM error (#1093)
Update codeowners (#1092)
BingImageSearch fails randomly in E2E test (#1082)
[Workaround] CNTKModel does not output correct result (#1076)
small issue with null in bing image response (#1067)
fix flaky conversation transcription test
avoid strange issue with databricks json parser
fix dependency exclusions and build secret querying
Fix issue in tabular lime sampler (#1058)
Bing search URL update (#1048)
early stopping test and average precision metric (#1034)
refactor python wrappers to use common class (#758)
java params patch (#1027)
missing returns in new python lightgbm model methods
fix issue with r bindings silently failing
fix conversation transcription participant column functionality
reduce verbosity to prevent RPC disassociated errors
Fix performance slip in Featurize
add timeout for stt
update subscription in build secrets
Add ffmpeg time limit enforcing for flaky streams (#1001)
fix upload python whl file to blob (#1000)
adding more recommendation code owners (#996)
cleanup python tests (#994)
Fix read schemas (#988)
fix issue with NER suite test
make concurrent timeout infinite
Make rate limiting retry indefinitely
Recommender Patch for Spark 3 Update (#982)
fix typo in text sentiment schema
change ints to longs for offset and duration in STT
fix python tests in build
fix processing sparse vector size
Fix Double User agent setting bug

Build 🏭

add two-tiered security for build secrets
Fixing build warnings (#1080)
update ubuntu version to 18.04
fix build for new intellij
fix livy dependency resolution

Doc

add predictive maintenance notebook
Add CyberML link to README.md (#989)
Add example cyberML notebook (#958)

Documentation 📘

Adding document and notebooks for ONNXModel (#1164)
Documentation and notebooks for Interpretability on Spark
Add explicit pointer to HDI install
fix typo (#990)
Bump python install to top to make it clearer

Features 🌈

Update Text Analytics API to V3.1 (#1193)
add NERPii
Add Infrastructure to Run Tests on Synapse (#1014)
rename Read to ReadImage (#1163)
ONNX model inference on Spark (#1152)
update DocumentTranslator to support setLinkedService in Synapse (#1151)
add setLinkedService (#1136)
add translator (#1108)
add singleton dataset mode for faster performance and use old sparse dataset create method to reduce memory usage (#1066)
add form recognizer support (#1099)
split library into subprojects (#1073)
new LIME and KernelSHAP explainers (#1077)
refactor to have separate dataset utils and partition processor (#1089)
refactoring of lightgbm code in preparation for single dataset mode (#1088)
move partition consolidator and add LocalAggregator API (#1071)
add number of threads parameter (#1055)
add custom objective function to lightgbm learners (#1054)
Add more notebook samples for documentation (#1043)
add matrix type parameter and improve auto logic (#1052)
add several parameters related to dart boosting type (#1045)
added chunk size parameter for copying java data to native (#1041)
Add MMLSpark logging infrastructure (#1019)
Add R wrapper gen
add num iteration and start iteration to lightgbm model (#1024)
Refactor code generation system
add automated python test generation infrastructure (#998)
add TextLIME
Add ReadAPI
add conversation transcription
add m4a codec

Maintenance 🔧

bump version numbers (#1203)
Fix pom for sbt dependencies (#1202)
Add script to clean and back up ACR
fix bug in testgen parallelism
testing new build
disable failing synapse e2e tests
fix flaky serialization fuzzing test
disable failing doc translator test
fix flakiness in python tests (#1144)
auto-update packages in docker
fix flaky notebook
remove unused code
fix codecov logging of wrapper generation (#1098)
update to lightgbm 3.2.110
fix badge publishing
upgrade lightgbm to 3.2.100
update build to new subscription (#991)
fix Detect face suite (#968)
remove issue in scalastyle file for new IJ
lower threshold for STT tests

Performance Improvements 🚀

tune chunking code, fix memory leak
moving to new streaming API for dense data to reduce memory usage

Update

reformat code
update setLocation
remove parens
use HasSetLinkedService trait
add more cognitive service
remove test code
add test code
remove testing code
add sample code for test
add reflection
remove example in test files
add class path
add reflection
notebook
update spark version to 3.1.2 (#1086)

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark.

Changes:

5fc65ab chore: bump version numbers (#1203)
993da81 chore: Fix pom for sbt dependencies (#1202)
327be83 feat: Update Text Analytics API to V3.1 (#1193)
6610577 fix: fix setLinkedService in Synapse
e08a8e2 chore: Add script to clean and back up ACR
d85aae8 fix: fix cognitive service errors (#1176)
c6925db fix: fix anomaly detector test cases
b52c361 fix: rename NERPii to PII
2ce1ba6...


18 Jul 02:16

mhamilton723

mmlspark-v1.0.0-rc3

67891a6

MMLSpark v1.0.0-rc3

v1.0.0-rc3

Bug Fixes 🐞

fix broken test link
Fix incorrect indexing for determining eval prob in CB (#922)
Update DBC path

Features 🌈

Add Env variable parametrized UserAgent header
Add support for ContextualBandit in the VW module (#896)
Update text analytics api to v3 (#916)

Maintenance 🔧

bump version to 1.0.0-rc3

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark.

@jackgerrits @rohit21agrawal

Contributors

rohit21agrawal and jackgerrits


18 Jul 02:16

mmlspark-bot

mmlspark-v1.0.0-rc2

81e73a2

MMLSpark v1.0.0-rc2

Highlights


Isolation Forest on Spark: Distributed Nonlinear Outlier Detection
CyberML: Machine Learning Tools for Cyber Security
Speech To Text: Custom Speech to Text with Streaming Support
Conditional KNN: Scalable KNN Models with Conditional Queries
LightGBM + SHAP: Interpret LightGBM Models using Additive Shapley Explanations