Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for training metrics in PipelineSweeperMacro + new graph variable outputs #152

Merged
merged 15 commits into from
May 23, 2018
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 12 additions & 9 deletions src/Microsoft.ML.PipelineInference/AutoInference.cs
Original file line number Diff line number Diff line change
Expand Up @@ -56,13 +56,13 @@ public class DependencyMap : Dictionary<int, LevelDependencyMap> { }
/// </summary>
public sealed class SupportedMetric
{
public static readonly SupportedMetric Auc = new SupportedMetric(FieldNames.PipelineSweeperSupportedMetrics.Auc, true);
public static readonly SupportedMetric Auc = new SupportedMetric(FieldNames.PipelineSweeperSupportedMetrics.Auc.ToUpper(), true);
public static readonly SupportedMetric AccuracyMicro = new SupportedMetric(FieldNames.PipelineSweeperSupportedMetrics.AccuracyMicro, true);
public static readonly SupportedMetric AccuracyMacro = new SupportedMetric(FieldNames.PipelineSweeperSupportedMetrics.AccuracyMacro, true);
public static readonly SupportedMetric L1 = new SupportedMetric(FieldNames.PipelineSweeperSupportedMetrics.L1, false);
public static readonly SupportedMetric L2 = new SupportedMetric(FieldNames.PipelineSweeperSupportedMetrics.L2, false);
public static readonly SupportedMetric F1 = new SupportedMetric(FieldNames.PipelineSweeperSupportedMetrics.F1, true);
public static readonly SupportedMetric AuPrc = new SupportedMetric(FieldNames.PipelineSweeperSupportedMetrics.AuPrc, true);
public static readonly SupportedMetric AuPrc = new SupportedMetric(FieldNames.PipelineSweeperSupportedMetrics.AuPrc.ToUpper(), true);
public static readonly SupportedMetric TopKAccuracy = new SupportedMetric(FieldNames.PipelineSweeperSupportedMetrics.TopKAccuracy, true);
public static readonly SupportedMetric Rms = new SupportedMetric(FieldNames.PipelineSweeperSupportedMetrics.Rms, false);
Copy link
Contributor

@TomFinley TomFinley May 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rms [](start = 120, length = 3)

This? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of a good way to do this, but am open to suggestions. These show up as all caps in the metrics IDV's, which was causing a failure (since it was looking for "Auc", the name given, rather than "AUC", what it was in the IDV). The failure wasn't seen previously, since a bug was causing the first column to always get picked up (which coincidentally was the AUC column), so fixing the bug exposed this problem. We want the name of the supported metric to match what it will be in the IDV when looking to extract it. Hence the upper case. Should I just hard-code to the string "AUC" / "AUPRC" here instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. So you are saying that something is producing columns with AUC in the name, but we were looking for Auc because FieldNames.PipelineSweeperSupportedMetrics defines it that way, and you fix it this way?

Is there a reason why we would not just either fix FieldNames.PipelineSweeperSupportedMetrics to be AUC, or fix whatever is producing a data view with column AUC to produce Auc?

That is: my impression is that this seems like a bandaid on a more serious inconsistency between two pieces of code, is that impression accurate or am I wrong? If I'm right I'd rather fix that inconsistency.


In reply to: 189666725 [](ancestors = 189666725)

Copy link
Contributor Author

@george-microsoft george-microsoft May 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FieldNames.PipelineSweeperSupportedMetrics string is defined to be whatever we want. Since these are shown to the user, I was wary of changing them to be AUC, etc., unilaterally. So the change is easy (I can just change them there) if no one has a problem with having them all caps. (It is weird to worry about capitalization, but that is also a big deal in the codebase, getting capitalization right and consistent). With this way, I was only changing it on my end, where I'd be using it, so it seemed like the least intrusive of the options. But your reply makes me think it wasn't the way to go.

public static readonly SupportedMetric LossFn = new SupportedMetric(FieldNames.PipelineSweeperSupportedMetrics.LossFn, false);
Expand Down Expand Up @@ -172,12 +172,14 @@ private bool GetDataVariableName(IExceptionContext ectx, string nameOfData, JTok
public sealed class RunSummary
{
public double MetricValue { get; }
public double TrainingMetricValue { get; }
public int NumRowsInTraining { get; }
public long RunTimeMilliseconds { get; }

public RunSummary(double metricValue, int numRows, long runTimeMilliseconds)
public RunSummary(double metricValue, int numRows, long runTimeMilliseconds, double trainingMetricValue)
{
MetricValue = metricValue;
TrainingMetricValue = trainingMetricValue;
NumRowsInTraining = numRows;
RunTimeMilliseconds = runTimeMilliseconds;
}
Expand Down Expand Up @@ -303,7 +305,7 @@ private void MainLearningLoop(int batchSize, int numOfTrainingRows)
var stopwatch = new Stopwatch();
var probabilityUtils = new Sweeper.Algorithms.SweeperProbabilityUtils(_host);

while (!_terminator.ShouldTerminate(_history))
while (!_terminator.ShouldTerminate(_history))
{
// Get next set of candidates
var currentBatchSize = batchSize;
Expand Down Expand Up @@ -341,16 +343,17 @@ private void ProcessPipeline(Sweeper.Algorithms.SweeperProbabilityUtils utils, S

// Run pipeline, and time how long it takes
stopwatch.Restart();
double d = candidate.RunTrainTestExperiment(_trainData.Take(randomizedNumberOfRows),
_testData, Metric, TrainerKind);
candidate.RunTrainTestExperiment(_trainData.Take(randomizedNumberOfRows),
_testData, Metric, TrainerKind, out var testMetricVal, out var trainMetricVal);
stopwatch.Stop();

// Handle key collisions on sorted list
while (_sortedSampledElements.ContainsKey(d))
d += 1e-10;
while (_sortedSampledElements.ContainsKey(testMetricVal))
testMetricVal += 1e-10;

// Save performance score
candidate.PerformanceSummary = new RunSummary(d, randomizedNumberOfRows, stopwatch.ElapsedMilliseconds);
candidate.PerformanceSummary =
new RunSummary(testMetricVal, randomizedNumberOfRows, stopwatch.ElapsedMilliseconds, trainMetricVal);
_sortedSampledElements.Add(candidate.PerformanceSummary.MetricValue, candidate);
_history.Add(candidate);
}
Expand Down
35 changes: 27 additions & 8 deletions src/Microsoft.ML.PipelineInference/AutoMlUtils.cs
Original file line number Diff line number Diff line change
Expand Up @@ -15,21 +15,38 @@ namespace Microsoft.ML.Runtime.PipelineInference
{
public static class AutoMlUtils
{
public static AutoInference.RunSummary ExtractRunSummary(IHostEnvironment env, IDataView data, string metricColumnName)
public static AutoInference.RunSummary ExtractRunSummary(IHostEnvironment env, IDataView result, string metricColumnName, IDataView trainResult = null)
{
double metricValue = 0;
double testingMetricValue = 0;
double trainingMetricValue = -1d;
int numRows = 0;
var schema = data.Schema;
schema.TryGetColumnIndex(metricColumnName, out var metricCol);
var schema = result.Schema;
bool hasIndex = schema.TryGetColumnIndex(metricColumnName, out var metricCol);
env.Check(hasIndex);

using (var cursor = data.GetRowCursor(col => col == metricCol))
using (var cursor = result.GetRowCursor(col => col == metricCol))
{
var getter = cursor.GetGetter<double>(metricCol);
cursor.MoveNext();
getter(ref metricValue);
bool moved = cursor.MoveNext();
env.Check(moved);
getter(ref testingMetricValue);
}

return new AutoInference.RunSummary(metricValue, numRows, 0);
if (trainResult != null)
{
var trainSchema = trainResult.Schema;
env.Check(trainSchema.TryGetColumnIndex(metricColumnName, out var trainingMetricCol));

using (var cursor = trainResult.GetRowCursor(col => col == trainingMetricCol))
{
var getter = cursor.GetGetter<double>(trainingMetricCol);
bool moved = cursor.MoveNext();
env.Check(moved);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

env.Check(moved); [](start = 20, length = 17)

We might want to make this error exception a little more specific, since the default message here, "The operation failed due to the current state of the object" (IIRC) is, while not strictly incorrect, perhaps not helpful from the perspective of allowing the user to diagnose the problem.

Copy link
Contributor Author

@george-microsoft george-microsoft May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What error message would you suggest? Or are you saying trying to catch a more specific error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI George, so I think the idea is that you're expecting that the the input trainResult, if present, will have exactly one row. (Indeed, this seems to be the case in many of these methods... is it possible to just use an IRow rather than deal with an IDataView? We sometimes use an IRow for such purposes, rather than deal with the bother of having to open cursors, etc.

So I'd suggest something along those lines, than what I see currently... "could not move cursor forward" is certainly describing what's going on sort of, but the real error isn't that it can't move the cursor, or in the second case that you can, but rather that you expected an IDataView with exactly one row.


In reply to: 188125581 [](ancestors = 188125581)

Copy link
Contributor

@TomFinley TomFinley May 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're doing this a lot in this and other similar functions I see. Might want to factor out all this repeated logic into something resembling a utility. Maybe. Unless of course you can reprhase this code to just operate over IRow in which case it becomes a non-issue.


In reply to: 189667233 [](ancestors = 189667233,188125581)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made some changes to this, following your suggestion to extract the code to a separate utility function. I'll push the changes shortly.

getter(ref trainingMetricValue);
}
}

return new AutoInference.RunSummary(testingMetricValue, numRows, 0, trainingMetricValue);
}

public static CommonInputs.IEvaluatorInput CloneEvaluatorInstance(CommonInputs.IEvaluatorInput evalInput) =>
Expand Down Expand Up @@ -618,5 +635,7 @@ public static Tuple<string, string[]>[] ConvertToSweepArgumentStrings(TlcModule.
}
return results;
}

public static string GenerateOverallTrainingMetricVarName(Guid id) => $"Var_Training_OM_{id:N}";
}
}
19 changes: 14 additions & 5 deletions src/Microsoft.ML.PipelineInference/Macros/PipelineSweeperMacro.cs
Original file line number Diff line number Diff line change
Expand Up @@ -65,18 +65,24 @@ public static Output ExtractSweepResult(IHostEnvironment env, ResultInput input)
var col1 = new KeyValuePair<string, ColumnType>("Graph", TextType.Instance);
var col2 = new KeyValuePair<string, ColumnType>("MetricValue", PrimitiveType.FromKind(DataKind.R8));
var col3 = new KeyValuePair<string, ColumnType>("PipelineId", TextType.Instance);
var col4 = new KeyValuePair<string, ColumnType>("TrainingMetricValue", PrimitiveType.FromKind(DataKind.R8));
var col5 = new KeyValuePair<string, ColumnType>("FirstInput", TextType.Instance);
var col6 = new KeyValuePair<string, ColumnType>("PredictorModel", TextType.Instance);

if (rows.Count == 0)
{
var host = env.Register("ExtractSweepResult");
outputView = new EmptyDataView(host, new SimpleSchema(host, col1, col2, col3));
outputView = new EmptyDataView(host, new SimpleSchema(host, col1, col2, col3, col4, col5, col6));
}
else
{
var builder = new ArrayDataViewBuilder(env);
builder.AddColumn(col1.Key, (PrimitiveType)col1.Value, rows.Select(r => new DvText(r.GraphJson)).ToArray());
builder.AddColumn(col2.Key, (PrimitiveType)col2.Value, rows.Select(r => r.MetricValue).ToArray());
builder.AddColumn(col3.Key, (PrimitiveType)col3.Value, rows.Select(r => new DvText(r.PipelineId)).ToArray());
builder.AddColumn(col4.Key, (PrimitiveType)col4.Value, rows.Select(r => r.TrainingMetricValue).ToArray());
builder.AddColumn(col5.Key, (PrimitiveType)col5.Value, rows.Select(r => new DvText(r.FirstInput)).ToArray());
builder.AddColumn(col6.Key, (PrimitiveType)col6.Value, rows.Select(r => new DvText(r.PredictorModel)).ToArray());
outputView = builder.GetDataView();
}
return new Output { Results = outputView, State = autoMlState };
Expand Down Expand Up @@ -132,11 +138,11 @@ public static CommonOutputs.MacroOutput<Output> PipelineSweep(
// Extract performance summaries and assign to previous candidate pipelines.
foreach (var pipeline in autoMlState.BatchCandidates)
{
if (node.Context.TryGetVariable(ExperimentUtils.GenerateOverallMetricVarName(pipeline.UniqueId),
out var v))
if (node.Context.TryGetVariable(ExperimentUtils.GenerateOverallMetricVarName(pipeline.UniqueId), out var v) &&
node.Context.TryGetVariable(AutoMlUtils.GenerateOverallTrainingMetricVarName(pipeline.UniqueId), out var v2))
{
pipeline.PerformanceSummary =
AutoMlUtils.ExtractRunSummary(env, (IDataView)v.Value, autoMlState.Metric.Name);
AutoMlUtils.ExtractRunSummary(env, (IDataView)v.Value, autoMlState.Metric.Name, (IDataView)v2.Value);
autoMlState.AddEvaluated(pipeline);
}
}
Expand Down Expand Up @@ -168,14 +174,17 @@ public static CommonOutputs.MacroOutput<Output> PipelineSweep(
{
// Add train test experiments to current graph for candidate pipeline
var subgraph = new Experiment(env);
var trainTestOutput = p.AddAsTrainTest(training, testing, autoMlState.TrainerKind, subgraph);
var trainTestOutput = p.AddAsTrainTest(training, testing, autoMlState.TrainerKind, subgraph, true);

// Change variable name to reference pipeline ID in output map, context and entrypoint output.
var uniqueName = ExperimentUtils.GenerateOverallMetricVarName(p.UniqueId);
var uniqueNameTraining = AutoMlUtils.GenerateOverallTrainingMetricVarName(p.UniqueId);
var sgNode = EntryPointNode.ValidateNodes(env, node.Context,
new JArray(subgraph.GetNodes().Last()), node.Catalog).Last();
sgNode.RenameOutputVariable(trainTestOutput.OverallMetrics.VarName, uniqueName, cascadeChanges: true);
sgNode.RenameOutputVariable(trainTestOutput.TrainingOverallMetrics.VarName, uniqueNameTraining, cascadeChanges: true);
trainTestOutput.OverallMetrics.VarName = uniqueName;
trainTestOutput.TrainingOverallMetrics.VarName = uniqueNameTraining;
expNodes.Add(sgNode);

// Store indicators, to pass to next iteration of macro.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
<ProjectReference Include="..\Microsoft.ML.Core\Microsoft.ML.Core.csproj" />
<ProjectReference Include="..\Microsoft.ML.StandardLearners\Microsoft.ML.StandardLearners.csproj" />
<ProjectReference Include="..\Microsoft.ML.Sweeper\Microsoft.ML.Sweeper.csproj" />
<ProjectReference Include="..\Microsoft.ML\Microsoft.ML.csproj" />
</ItemGroup>

</Project>
Loading