Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update documentation for LightGBM and add missing binary references to console app. #452

Merged
merged 10 commits into from
Jul 2, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 28 additions & 0 deletions Directory.Build.targets
Original file line number Diff line number Diff line change
Expand Up @@ -5,5 +5,33 @@
Text="The tools directory [$(ToolsDir)] does not exist. Please run build in the root of the repo to ensure the tools are installed before attempting to build an individual project." />
</Target>

<Target Name="CopyNativeAssemblies"
BeforeTargets="PrepareForRun">

<PropertyGroup>
<LibPrefix Condition="'$(OS)' != 'Windows_NT'">lib</LibPrefix>
<LibExtension Condition="'$(OS)' == 'Windows_NT'">.dll</LibExtension>
<LibExtension Condition="'$(OS)' != 'Windows_NT'">.so</LibExtension>
<LibExtension Condition="$([MSBuild]::IsOSPlatform('osx'))">.dylib</LibExtension>
</PropertyGroup>

<ItemGroup>
<NativeAssemblyReference>
<FullAssemblyPath>$(NativeOutputPath)$(LibPrefix)%(NativeAssemblyReference.Identity)$(LibExtension)</FullAssemblyPath>
</NativeAssemblyReference>
</ItemGroup>

<Copy SourceFiles = "@(NativeAssemblyReference->'%(FullAssemblyPath)')"
DestinationFolder="$(OutputPath)"
OverwriteReadOnlyFiles="$(OverwriteReadOnlyFiles)"
Retries="$(CopyRetryCount)"
RetryDelayMilliseconds="$(CopyRetryDelayMilliseconds)"
UseHardlinksIfPossible="$(CreateHardLinksForPublishFilesIfPossible)"
UseSymboliclinksIfPossible="$(CreateSymbolicLinksForPublishFilesIfPossible)">
<Output TaskParameter="DestinationFiles" ItemName="FileWrites"/>
</Copy>

</Target>

<Import Project="$(ToolsDir)/versioning.targets" Condition="Exists('$(ToolsDir)/versioning.targets')" />
</Project>
2 changes: 1 addition & 1 deletion src/Microsoft.ML.Console/Console.cs
Original file line number Diff line number Diff line change
Expand Up @@ -8,4 +8,4 @@ public static class Console
{
public static int Main(string[] args) => Maml.Main(args);
}
}
}
27 changes: 22 additions & 5 deletions src/Microsoft.ML.Console/Microsoft.ML.Console.csproj
Original file line number Diff line number Diff line change
Expand Up @@ -3,17 +3,34 @@
<PropertyGroup>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
<DefineConstants>CORECLR</DefineConstants>
<TargetFramework>netcoreapp2.0</TargetFramework>
<OutputType>Exe</OutputType>
<AssemblyName>MML</AssemblyName>
<StartupObject>Microsoft.ML.Runtime.Tools.Console.Console</StartupObject>
<TargetFramework>netcoreapp2.0</TargetFramework>
<OutputType>Exe</OutputType>
<AssemblyName>MML</AssemblyName>
<StartupObject>Microsoft.ML.Runtime.Tools.Console.Console</StartupObject>
</PropertyGroup>

<ItemGroup>
<ProjectReference Include="..\Microsoft.ML.Core\Microsoft.ML.Core.csproj" />
<ProjectReference Include="..\Microsoft.ML.CpuMath\Microsoft.ML.CpuMath.csproj" />
<ProjectReference Include="..\Microsoft.ML.Data\Microsoft.ML.Data.csproj" />
<ProjectReference Include="..\Microsoft.ML.Ensemble\Microsoft.ML.Ensemble.csproj" />
<ProjectReference Include="..\Microsoft.ML.FastTree\Microsoft.ML.FastTree.csproj" />
<ProjectReference Include="..\Microsoft.ML.InternalStreams\Microsoft.ML.InternalStreams.csproj" />
<ProjectReference Include="..\Microsoft.ML.KMeansClustering\Microsoft.ML.KMeansClustering.csproj" />
<ProjectReference Include="..\Microsoft.ML.LightGBM\Microsoft.ML.LightGBM.csproj" />
<ProjectReference Include="..\Microsoft.ML.Maml\Microsoft.ML.Maml.csproj" />
<ProjectReference Include="..\Microsoft.ML.PCA\Microsoft.ML.PCA.csproj" />
<ProjectReference Include="..\Microsoft.ML.PipelineInference\Microsoft.ML.PipelineInference.csproj" />
</ItemGroup>
<ProjectReference Include="..\Microsoft.ML.ResultProcessor\Microsoft.ML.ResultProcessor.csproj" />
<ProjectReference Include="..\Microsoft.ML.StandardLearners\Microsoft.ML.StandardLearners.csproj" />
<ProjectReference Include="..\Microsoft.ML.Sweeper\Microsoft.ML.Sweeper.csproj" />
<ProjectReference Include="..\Microsoft.ML.Transforms\Microsoft.ML.Transforms.csproj" />
<ProjectReference Include="..\Microsoft.ML.UniversalModelFormat\Microsoft.ML.UniversalModelFormat.csproj" />

<NativeAssemblyReference Include="FastTreeNative" />
<NativeAssemblyReference Include="CpuMathNative" />
<NativeAssemblyReference Include="FactorizationMachineNative" />
Copy link
Member

@eerhardt eerhardt Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about LdaNative? https://github.com/dotnet/machinelearning/tree/master/src/Native/LdaNative. Is that necessary? #Resolved

Copy link
Member Author

@codemzs codemzs Jun 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. We will need that if someone uses that via command line.


In reply to: 199242696 [](ancestors = 199242696)

<NativeAssemblyReference Include="LdaNative" />
</ItemGroup>

</Project>
2 changes: 1 addition & 1 deletion src/Microsoft.ML.LightGBM/LightGbmBinaryTrainer.cs
Original file line number Diff line number Diff line change
Expand Up @@ -128,7 +128,7 @@ public static partial class LightGbm
{
[TlcModule.EntryPoint(
Name = "Trainers.LightGbmBinaryClassifier",
Desc = "Train an LightGBM binary class model",
Desc = "Train a LightGBM binary class model.",
UserName = LightGbmBinaryTrainer.Summary,
ShortName = LightGbmBinaryTrainer.ShortName)]
public static CommonOutputs.BinaryClassificationOutput TrainBinary(IHostEnvironment env, LightGbmArguments input)
Expand Down
2 changes: 1 addition & 1 deletion src/Microsoft.ML.LightGBM/LightGbmMulticlassTrainer.cs
Original file line number Diff line number Diff line change
Expand Up @@ -180,7 +180,7 @@ public static partial class LightGbm
{
[TlcModule.EntryPoint(
Name = "Trainers.LightGbmClassifier",
Desc = "Train an LightGBM multi class model",
Desc = "Train a LightGBM multi class model.",
UserName = LightGbmMulticlassTrainer.Summary,
ShortName = LightGbmMulticlassTrainer.ShortName)]
public static CommonOutputs.MulticlassClassificationOutput TrainMultiClass(IHostEnvironment env, LightGbmArguments input)
Expand Down
6 changes: 5 additions & 1 deletion src/Microsoft.ML.LightGBM/LightGbmRankingTrainer.cs
Original file line number Diff line number Diff line change
Expand Up @@ -127,7 +127,11 @@ protected override void CheckAndUpdateParametersBeforeTraining(IChannel ch, Role
/// </summary>
public static partial class LightGbm
{
[TlcModule.EntryPoint(Name = "Trainers.LightGbmRanker", Desc = "Train an LightGBM ranking model", UserName = LightGbmRankingTrainer.Summary, ShortName = LightGbmRankingTrainer.ShortName)]
[TlcModule.EntryPoint(
Name = "Trainers.LightGbmRanker",
Desc = "Train a LightGBM ranking model.",
UserName = LightGbmRankingTrainer.Summary,
ShortName = LightGbmRankingTrainer.ShortName)]
public static CommonOutputs.RankingOutput TrainRanking(IHostEnvironment env, LightGbmArguments input)
{
Contracts.CheckValue(env, nameof(env));
Expand Down
6 changes: 5 additions & 1 deletion src/Microsoft.ML.LightGBM/LightGbmRegressionTrainer.cs
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,11 @@ protected override void CheckAndUpdateParametersBeforeTraining(IChannel ch, Role
/// </summary>
public static partial class LightGbm
{
[TlcModule.EntryPoint(Name = "Trainers.LightGbmRegressor", Desc = LightGbmRegressorTrainer.Summary, UserName = LightGbmRegressorTrainer.UserNameValue, ShortName = LightGbmRegressorTrainer.ShortName)]
[TlcModule.EntryPoint(
Name = "Trainers.LightGbmRegressor",
Desc = LightGbmRegressorTrainer.Summary,
UserName = LightGbmRegressorTrainer.UserNameValue,
ShortName = LightGbmRegressorTrainer.ShortName)]
public static CommonOutputs.RegressionOutput TrainRegression(IHostEnvironment env, LightGbmArguments input)
{
Contracts.CheckValue(env, nameof(env));
Expand Down
6 changes: 3 additions & 3 deletions src/Microsoft.ML/CSharpApi.cs
Original file line number Diff line number Diff line change
Expand Up @@ -7320,7 +7320,7 @@ public enum LightGbmArgumentsEvalMetricType


/// <summary>
/// Train an LightGBM binary class model
/// Train a LightGBM binary class model.
/// </summary>
public sealed partial class LightGbmBinaryClassifier : Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInputWithGroupId, Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInputWithWeight, Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInputWithLabel, Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInput, Microsoft.ML.ILearningPipelineItem
{
Expand Down Expand Up @@ -7525,7 +7525,7 @@ namespace Trainers
{

/// <summary>
/// Train an LightGBM multi class model
/// Train a LightGBM multi class model.
/// </summary>
public sealed partial class LightGbmClassifier : Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInputWithGroupId, Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInputWithWeight, Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInputWithLabel, Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInput, Microsoft.ML.ILearningPipelineItem
{
Expand Down Expand Up @@ -7730,7 +7730,7 @@ namespace Trainers
{

/// <summary>
/// Train an LightGBM ranking model
/// Train a LightGBM ranking model.
/// </summary>
public sealed partial class LightGbmRanker : Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInputWithGroupId, Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInputWithWeight, Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInputWithLabel, Microsoft.ML.Runtime.EntryPoints.CommonInputs.ITrainerInput, Microsoft.ML.ILearningPipelineItem
{
Expand Down
58 changes: 58 additions & 0 deletions src/Microsoft.ML/Trainers/LightGBM.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

namespace Microsoft.ML.Trainers
{
/// <summary>
/// This API requires Microsoft.ML.LightGBM nuget.
/// </summary>
/// <example>
/// <code>
/// pipeline.Add(new LightGbmBinaryClassifier() { NumLeaves = 5, NumBoostRound = 5, MinDataPerLeaf = 2 })
/// </code>
/// </example>
public sealed partial class LightGbmBinaryClassifier
{

}

/// <summary>
/// This API requires Microsoft.ML.LightGBM nuget.
/// </summary>
/// <example>
/// <code>
/// pipeline.Add(new LightGbmClassifier() { NumLeaves = 5, NumBoostRound = 5, MinDataPerLeaf = 2 })
/// </code>
/// </example>
public sealed partial class LightGbmClassifier
{

}

/// <summary>
/// This API requires Microsoft.ML.LightGBM nuget.
/// </summary>
/// <example>
/// <code>
/// pipeline.Add(new LightGbmRanker() { NumLeaves = 5, NumBoostRound = 5, MinDataPerLeaf = 2 })
/// </code>
/// </example>
public sealed partial class LightGbmRanker
{

}

/// <summary>
/// This API requires Microsoft.ML.LightGBM nuget.
/// </summary>
/// <example>
/// <code>
/// pipeline.Add(new LightGbmRegressor() { NumLeaves = 5, NumBoostRound = 5, MinDataPerLeaf = 2 })
/// </code>
/// </example>
public sealed partial class LightGbmRegressor
{

}
}
4 changes: 4 additions & 0 deletions test/BaselineOutput/Common/EntryPoints/core_ep-list.tsv
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,10 @@ Trainers.FieldAwareFactorizationMachineBinaryClassifier Train a field-aware fact
Trainers.GeneralizedAdditiveModelBinaryClassifier Trains a gradient boosted stump per feature, on all features simultaneously, to fit target values using least-squares. It mantains no interactions between features. Microsoft.ML.Runtime.FastTree.Gam TrainBinary Microsoft.ML.Runtime.FastTree.BinaryClassificationGamTrainer+Arguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+BinaryClassificationOutput
Trainers.GeneralizedAdditiveModelRegressor Trains a gradient boosted stump per feature, on all features simultaneously, to fit target values using least-squares. It mantains no interactions between features. Microsoft.ML.Runtime.FastTree.Gam TrainRegression Microsoft.ML.Runtime.FastTree.RegressionGamTrainer+Arguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+RegressionOutput
Trainers.KMeansPlusPlusClusterer K-means is a popular clustering algorithm. With K-means, the data is clustered into a specified number of clusters in order to minimize the within-cluster sum of squares. K-means++ improves upon K-means by using a better method for choosing the initial cluster centers. Microsoft.ML.Runtime.KMeans.KMeansPlusPlusTrainer TrainKMeans Microsoft.ML.Runtime.KMeans.KMeansPlusPlusTrainer+Arguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+ClusteringOutput
Trainers.LightGbmBinaryClassifier Train a LightGBM binary class model. Microsoft.ML.Runtime.LightGBM.LightGbm TrainBinary Microsoft.ML.Runtime.LightGBM.LightGbmArguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+BinaryClassificationOutput
Trainers.LightGbmClassifier Train a LightGBM multi class model. Microsoft.ML.Runtime.LightGBM.LightGbm TrainMultiClass Microsoft.ML.Runtime.LightGBM.LightGbmArguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+MulticlassClassificationOutput
Trainers.LightGbmRanker Train a LightGBM ranking model. Microsoft.ML.Runtime.LightGBM.LightGbm TrainRanking Microsoft.ML.Runtime.LightGBM.LightGbmArguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+RankingOutput
Trainers.LightGbmRegressor LightGBM Regression Microsoft.ML.Runtime.LightGBM.LightGbm TrainRegression Microsoft.ML.Runtime.LightGBM.LightGbmArguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+RegressionOutput
Trainers.LinearSvmBinaryClassifier Train a linear SVM. Microsoft.ML.Runtime.Learners.LinearSvm TrainLinearSvm Microsoft.ML.Runtime.Learners.LinearSvm+Arguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+BinaryClassificationOutput
Trainers.LogisticRegressionBinaryClassifier Logistic Regression is a classification method used to predict the value of a categorical dependent variable from its relationship to one or more independent variables assumed to have a logistic distribution. If the dependent variable has only two possible values (success/failure), then the logistic regression is binary. If the dependent variable has more than two possible values (blood type given diagnostic test results), then the logistic regression is multinomial.The optimization technique used for LogisticRegressionBinaryClassifier is the limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS). Both the L-BFGS and regular BFGS algorithms use quasi-Newtonian methods to estimate the computationally intensive Hessian matrix in the equation used by Newton's method to calculate steps. But the L-BFGS approximation uses only a limited amount of memory to compute the next step direction, so that it is especially suited for problems with a large number of variables. The memory_size parameter specifies the number of past positions and gradients to store for use in the computation of the next step.This learner can use elastic net regularization: a linear combination of L1 (lasso) and L2 (ridge) regularizations. Regularization is a method that can render an ill-posed problem more tractable by imposing constraints that provide information to supplement the data and that prevents overfitting by penalizing models with extreme coefficient values. This can improve the generalization of the model learned by selecting the optimal complexity in the bias-variance tradeoff. Regularization works by adding the penalty that is associated with coefficient values to the error of the hypothesis. An accurate model with extreme coefficient values would be penalized more, but a less accurate model with more conservative values would be penalized less. L1 and L2 regularization have different effects and uses that are complementary in certain respects.l1_weight: can be applied to sparse models, when working with high-dimensional data. It pulls small weights associated features that are relatively unimportant towards 0. l2_weight: is preferable for data that is not sparse. It pulls large weights towards zero. Adding the ridge penalty to the regularization overcomes some of lasso's limitations. It can improve its predictive accuracy, for example, when the number of predictors is greater than the sample size. If x = l1_weight and y = l2_weight, ax + by = c defines the linear span of the regularization terms. The default values of x and y are both 1. An agressive regularization can harm predictive capacity by excluding important variables out of the model. So choosing the optimal values for the regularization parameters is important for the performance of the logistic regression model.<see href='http://en.wikipedia.org/wiki/L-BFGS'>Wikipedia: L-BFGS</see>.<see href='http://en.wikipedia.org/wiki/Logistic_regression'>Wikipedia: Logistic regression</see>.<see href='http://research.microsoft.com/apps/pubs/default.aspx?id=78900'>Scalable Training of L1-Regularized Log-Linear Models</see>.<see href='https://msdn.microsoft.com/en-us/magazine/dn904675.aspx'>Test Run - L1 and L2 Regularization for Machine Learning</see>. Microsoft.ML.Runtime.Learners.LogisticRegression TrainBinary Microsoft.ML.Runtime.Learners.LogisticRegression+Arguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+BinaryClassificationOutput
Trainers.LogisticRegressionClassifier Logistic Regression is a classification method used to predict the value of a categorical dependent variable from its relationship to one or more independent variables assumed to have a logistic distribution. If the dependent variable has only two possible values (success/failure), then the logistic regression is binary. If the dependent variable has more than two possible values (blood type given diagnostic test results), then the logistic regression is multinomial.The optimization technique used for LogisticRegressionBinaryClassifier is the limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS). Both the L-BFGS and regular BFGS algorithms use quasi-Newtonian methods to estimate the computationally intensive Hessian matrix in the equation used by Newton's method to calculate steps. But the L-BFGS approximation uses only a limited amount of memory to compute the next step direction, so that it is especially suited for problems with a large number of variables. The memory_size parameter specifies the number of past positions and gradients to store for use in the computation of the next step.This learner can use elastic net regularization: a linear combination of L1 (lasso) and L2 (ridge) regularizations. Regularization is a method that can render an ill-posed problem more tractable by imposing constraints that provide information to supplement the data and that prevents overfitting by penalizing models with extreme coefficient values. This can improve the generalization of the model learned by selecting the optimal complexity in the bias-variance tradeoff. Regularization works by adding the penalty that is associated with coefficient values to the error of the hypothesis. An accurate model with extreme coefficient values would be penalized more, but a less accurate model with more conservative values would be penalized less. L1 and L2 regularization have different effects and uses that are complementary in certain respects.l1_weight: can be applied to sparse models, when working with high-dimensional data. It pulls small weights associated features that are relatively unimportant towards 0. l2_weight: is preferable for data that is not sparse. It pulls large weights towards zero. Adding the ridge penalty to the regularization overcomes some of lasso's limitations. It can improve its predictive accuracy, for example, when the number of predictors is greater than the sample size. If x = l1_weight and y = l2_weight, ax + by = c defines the linear span of the regularization terms. The default values of x and y are both 1. An agressive regularization can harm predictive capacity by excluding important variables out of the model. So choosing the optimal values for the regularization parameters is important for the performance of the logistic regression model.<see href='http://en.wikipedia.org/wiki/L-BFGS'>Wikipedia: L-BFGS</see>.<see href='http://en.wikipedia.org/wiki/Logistic_regression'>Wikipedia: Logistic regression</see>.<see href='http://research.microsoft.com/apps/pubs/default.aspx?id=78900'>Scalable Training of L1-Regularized Log-Linear Models</see>.<see href='https://msdn.microsoft.com/en-us/magazine/dn904675.aspx'>Test Run - L1 and L2 Regularization for Machine Learning</see>. Microsoft.ML.Runtime.Learners.LogisticRegression TrainMultiClass Microsoft.ML.Runtime.Learners.MulticlassLogisticRegression+Arguments Microsoft.ML.Runtime.EntryPoints.CommonOutputs+MulticlassClassificationOutput
Expand Down
Loading