Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AutoML aggregate exception #5631

Merged
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions src/Microsoft.ML.AutoML/Experiment/Experiment.cs
Original file line number Diff line number Diff line change
Expand Up @@ -196,6 +196,22 @@ public IList<TRunDetail> Execute()
"was reached, and the running MLContext was stopped. Details: {0}", e.Message);
return iterationResults;
}
catch (AggregateException e)
{
// This exception is thrown when the IHost/MLContext of the trainer is canceled due to
// reaching maximum experiment time. Simply catch this exception and return finished
// iteration results. For some trainers, like FastTree, because training is done in parallel
// in can throw multiple OperationCancelledExceptions. This causes them to be returned as an
// AggregateException and misses the first catch block. This is to handle that case.
if (e.InnerExceptions.All( exception => exception is OperationCanceledException))
{
_logger.Warning("OperationCanceledException has been caught after maximum experiment time" +
"was reached, and the running MLContext was stopped. Details: {0}", e.Message);
return iterationResults;
}

throw;
}
} while (_history.Count < _experimentSettings.MaxModels &&
!_experimentSettings.CancellationToken.IsCancellationRequested &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure how to really add tests for this, It was brought up by a user and you can repro it locally though its not consistent. If any of you have ideas for good tests let me know.

To add a unit test focusing on OperationCanceledException, you could set the AutoML setting to use only FastTree:

var experimentSettings = new RegressionExperimentSettings { MaxExperimentTimeInSeconds = 1 };
experimentSettings.Trainers.Clear();
experimentSettings.Trainers.Add(RegressionTrainer.FastTree);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried that, but it only throws this exception in 1 specific method and this wasn't a stable repro. Not stable enough to try and have a consistent test for it anyways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. After discussing, even restricting to only FastTree will not make for a stable unit test.

If the timeout occurs in certain steps of FastTree, AggregateException is thrown; while at other steps, OperationCanceledException is thrown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you force it with a test-specific Trainer, or set of trainers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that. The problem is that there is only 1 specific place that causes this issue when a timeout occurs, and I haven't found a way to get it to repro consistently without modifying the method to always throw there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking more along the lines of a test-only mock trainer. Ignore if we don't already have a framework / pattern for this, that'd be more of a future-looking test strategy piece of feedback.

!_experimentTimerExpired);
Expand Down