Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scores to Label mapping #239

Merged
merged 7 commits into from
May 25, 2018
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions src/Microsoft.ML.Core/Data/ITransformModel.cs
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,10 @@ public interface ITransformModel
/// </summary>
ISchema InputSchema { get; }

/// <summary>
/// Schema of the transform model.
Copy link
Contributor

@TomFinley TomFinley May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Schema of the transform model. [](start = 12, length = 30)

This comment, even if the thing were fixed to return an ISchema, is rather odd because models don't have schema. They only have schema once applied to model. You will need something like the above description of InputSchema, that clarifies the status of this. "Schema of the transform model" is not a meaningful statement. #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll also probably want to name it OutputSchema, since it serves a similar purpose to InputSchema.


In reply to: 190936953 [](ancestors = 190936953)

/// </summary>
IDataView Schema { get; }
Copy link
Contributor

@zeahmed zeahmed May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name Schema here seems bit misleading. The type here is IDataView. This is against InputSchema defined above which has type ISchema.
#Resolved

/// <summary>
/// Apply the transform(s) in the model to the given input data.
/// </summary>
Expand Down
8 changes: 8 additions & 0 deletions src/Microsoft.ML.Data/EntryPoints/TransformModel.cs
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,14 @@ public ISchema InputSchema
get { return _schemaRoot; }
}

/// <summary>
/// Schema of the transform model.
/// </summary>
public IDataView Schema
Copy link
Contributor

@TomFinley TomFinley May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDataView Schema [](start = 15, length = 16)

A data view is not a schema. #Closed

{
get { return _chain; }
}

Copy link
Contributor

@TomFinley TomFinley May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember we now have C# 7.x's niceties available to us. #Closed

/// <summary>
/// Create a TransformModel containing the transforms from "result" back to "input".
/// </summary>
Expand Down
27 changes: 27 additions & 0 deletions src/Microsoft.ML/PredictionModel.cs
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,33 @@ internal Runtime.EntryPoints.TransformModel PredictorModel
get { return _predictorModel; }
}

/// <summary>
/// Returns labels that correspond to indices of the score array in the case of
/// multi-class classification problem.
/// </summary>
/// <param name="mapping">Label to score mapping</param>
/// <param name="scoreColumnName">Name of the score column</param>
/// <returns></returns>
public bool TryGetScoreLabelMapping(out string[] mapping, string scoreColumnName = "Score")
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Score" [](start = 91, length = 7)

DefaultColumnNames.Score maybe? #Resolved

{
mapping = null;
IDataView idv = _predictorModel.Schema;
int colIndex = -1;
if (!idv.Schema.TryGetColumnIndex(scoreColumnName, out colIndex))
return false;

VBuffer<DvText> labels = default(VBuffer<DvText>);
Copy link
Contributor

@TomFinley TomFinley May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default(VBuffer); [](start = 37, length = 25)

Can be just default since you were explicit with the type declaration. (New in some version of C# 7.x.) Or maybe just have this be var. #Closed

idv.Schema.GetMetadata(MetadataUtils.Kinds.SlotNames, colIndex, ref labels);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetMetadata [](start = 23, length = 11)

TryGetMetadata? #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, because this is a Try the method should succeed even if the data isn't there, is of a different type, or anything like that.


In reply to: 190754868 [](ancestors = 190754868)


Contracts.Assert(labels.IsDense);
Copy link
Contributor

@TomFinley TomFinley May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contracts.Assert(labels.IsDense); [](start = 12, length = 33)

You can't assume that it is dense, and a VBuffer being sparse isn't an error condition. #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.DenseValues perhaps.


In reply to: 190781068 [](ancestors = 190781068)


mapping = new string[labels.Count];
Copy link
Contributor

@TomFinley TomFinley May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

labels.Count [](start = 33, length = 12)

Length. #Closed

for (int index = 0; index < labels.Count; index++)
mapping[index] = labels.Values[index].ToString();
Copy link
Contributor

@TomFinley TomFinley May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrase as enumeration over DenseValues here, in case it is not clear. #Closed


Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra lines #Resolved

return true;
}

/// <summary>
/// Read model from file asynchronously.
/// </summary>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,14 @@ public void TrainAndPredictIrisModelWithStringLabelTest()
pipeline.Add(new StochasticDualCoordinateAscentClassifier());

PredictionModel<IrisDataWithStringLabel, IrisPrediction> model = pipeline.Train<IrisDataWithStringLabel, IrisPrediction>();
string[] scoreLabels;
model.TryGetScoreLabelMapping(out scoreLabels);
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

model.TryGetScoreLabelMapping(out scoreLabels); [](start = 12, length = 47)

Why I can't just define
public class IrisPrediction
{
[ColumnName("Score")]
public float[] PredictedScores;

        [ColumnName("OriginalLabels")]
        public string[] OriginalLabels;
    }

and fill it automatically if it's possible? #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IrisPrediction is a per-row structure. This is metadata, which is a property of the schema itself.


In reply to: 190754719 [](ancestors = 190754719)


Assert.NotNull(scoreLabels);
Assert.Equal(3, scoreLabels.Length);
Assert.True(scoreLabels[0] == "Iris-setosa");
Assert.True(scoreLabels[1] == "Iris-versicolor");
Assert.True(scoreLabels[2] == "Iris-virginica");
Copy link
Contributor

@TomFinley TomFinley May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you know about Assert.Equal since you used it in the line above. Maybe these three equality conditions should be Assert.Equal. #Closed


IrisPrediction prediction = model.Predict(new IrisDataWithStringLabel()
{
Expand Down