-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding support for training metrics in PipelineSweeperMacro + new graph variable outputs #152
Changes from 2 commits
05482eb
b789159
76dd270
483a50b
d42a522
fbee51c
0285c87
6f92693
e006ee1
90d96a3
1d1c303
ce9792f
a828266
c642f68
e066db0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,21 +15,38 @@ namespace Microsoft.ML.Runtime.PipelineInference | |
{ | ||
public static class AutoMlUtils | ||
{ | ||
public static AutoInference.RunSummary ExtractRunSummary(IHostEnvironment env, IDataView data, string metricColumnName) | ||
public static AutoInference.RunSummary ExtractRunSummary(IHostEnvironment env, IDataView result, string metricColumnName, IDataView trainResult = null) | ||
{ | ||
double metricValue = 0; | ||
double testingMetricValue = 0; | ||
double trainingMetricValue = -1d; | ||
int numRows = 0; | ||
var schema = data.Schema; | ||
schema.TryGetColumnIndex(metricColumnName, out var metricCol); | ||
var schema = result.Schema; | ||
bool hasIndex = schema.TryGetColumnIndex(metricColumnName, out var metricCol); | ||
env.Check(hasIndex); | ||
|
||
using (var cursor = data.GetRowCursor(col => col == metricCol)) | ||
using (var cursor = result.GetRowCursor(col => col == metricCol)) | ||
{ | ||
var getter = cursor.GetGetter<double>(metricCol); | ||
cursor.MoveNext(); | ||
getter(ref metricValue); | ||
bool moved = cursor.MoveNext(); | ||
env.Check(moved); | ||
getter(ref testingMetricValue); | ||
} | ||
|
||
return new AutoInference.RunSummary(metricValue, numRows, 0); | ||
if (trainResult != null) | ||
{ | ||
var trainSchema = trainResult.Schema; | ||
env.Check(trainSchema.TryGetColumnIndex(metricColumnName, out var trainingMetricCol)); | ||
|
||
using (var cursor = trainResult.GetRowCursor(col => col == trainingMetricCol)) | ||
{ | ||
var getter = cursor.GetGetter<double>(trainingMetricCol); | ||
bool moved = cursor.MoveNext(); | ||
env.Check(moved); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We might want to make this error exception a little more specific, since the default message here, "The operation failed due to the current state of the object" (IIRC) is, while not strictly incorrect, perhaps not helpful from the perspective of allowing the user to diagnose the problem. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What error message would you suggest? Or are you saying trying to catch a more specific error? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. HI George, so I think the idea is that you're expecting that the the input So I'd suggest something along those lines, than what I see currently... "could not move cursor forward" is certainly describing what's going on sort of, but the real error isn't that it can't move the cursor, or in the second case that you can, but rather that you expected an In reply to: 188125581 [](ancestors = 188125581) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're doing this a lot in this and other similar functions I see. Might want to factor out all this repeated logic into something resembling a utility. Maybe. Unless of course you can reprhase this code to just operate over In reply to: 189667233 [](ancestors = 189667233,188125581) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Made some changes to this, following your suggestion to extract the code to a separate utility function. I'll push the changes shortly. |
||
getter(ref trainingMetricValue); | ||
} | ||
} | ||
|
||
return new AutoInference.RunSummary(testingMetricValue, numRows, 0, trainingMetricValue); | ||
} | ||
|
||
public static CommonInputs.IEvaluatorInput CloneEvaluatorInstance(CommonInputs.IEvaluatorInput evalInput) => | ||
|
@@ -618,5 +635,7 @@ public static Tuple<string, string[]>[] ConvertToSweepArgumentStrings(TlcModule. | |
} | ||
return results; | ||
} | ||
|
||
public static string GenerateOverallTrainingMetricVarName(Guid id) => $"Var_Training_OM_{id:N}"; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure of a good way to do this, but am open to suggestions. These show up as all caps in the metrics IDV's, which was causing a failure (since it was looking for "Auc", the name given, rather than "AUC", what it was in the IDV). The failure wasn't seen previously, since a bug was causing the first column to always get picked up (which coincidentally was the AUC column), so fixing the bug exposed this problem. We want the name of the supported metric to match what it will be in the IDV when looking to extract it. Hence the upper case. Should I just hard-code to the string "AUC" / "AUPRC" here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. So you are saying that something is producing columns with
AUC
in the name, but we were looking forAuc
becauseFieldNames.PipelineSweeperSupportedMetrics
defines it that way, and you fix it this way?Is there a reason why we would not just either fix
FieldNames.PipelineSweeperSupportedMetrics
to beAUC
, or fix whatever is producing a data view with columnAUC
to produceAuc
?That is: my impression is that this seems like a bandaid on a more serious inconsistency between two pieces of code, is that impression accurate or am I wrong? If I'm right I'd rather fix that inconsistency.
In reply to: 189666725 [](ancestors = 189666725)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FieldNames.PipelineSweeperSupportedMetrics string is defined to be whatever we want. Since these are shown to the user, I was wary of changing them to be AUC, etc., unilaterally. So the change is easy (I can just change them there) if no one has a problem with having them all caps. (It is weird to worry about capitalization, but that is also a big deal in the codebase, getting capitalization right and consistent). With this way, I was only changing it on my end, where I'd be using it, so it seemed like the least intrusive of the options. But your reply makes me think it wasn't the way to go.