-
Notifications
You must be signed in to change notification settings - Fork 621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
api: Use TaskStatus Err field for non-terminal errors #2287
Conversation
There are some cases when a task can't advance from a particular state because preconditions are not met. For example, if no nodes meet its constraints, it will not advance to "assigned". Currently, we put a note about this in the Message field of TaskStatus, but this is not surfaced to the user. It wouldn't make sense to expose Message prominently because it usually contains uninteresting notes about how the task reached that state. Its presence does not indicate that something is wrong. Expand the scope of Err to also cover non-terminal errors that are blocking the task from progressing, and use it in those cases. Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
3d878f6
to
fc6ef74
Compare
Codecov Report
@@ Coverage Diff @@
## master #2287 +/- ##
=========================================
+ Coverage 61.06% 61.1% +0.03%
=========================================
Files 128 128
Lines 20532 20538 +6
=========================================
+ Hits 12538 12549 +11
+ Misses 6611 6602 -9
- Partials 1383 1387 +4 |
@aaronlehmann Why don't consumers just print the message field correctly? |
It is unclear when the |
The use case that is being described here is exactly what the message field is designed for:
Do we need something to indicate that a task is stalled? |
In this case, tasks are stalled indefinitely - for instance not all constraints were met or we reached network exhaustion. The operator needs to have this information surfaced somehow since state convergence cannot go forward until action is taken. Currently, Long story short, this needs to be blinking red in some ops' dashboard. |
@aluzzardi I get what you are trying to do, but I am asking if there is a better way to do it. Overloading the If a task is stalled, do we have an indicator of that state? For example, a boolean field, indicating that the task is stalled and to check the message field would make sense here. |
What is the status of this PR? Are we planning on moving this forward? It would be very helpful for users to be able to see when their tasks are in a state where the user must take action for them to be scheduled (e.g. network address exhaustion). |
LGTM Looking at this again, I think we can go forward with it. My concerns are little abstract, but I think that this will work. @aluzzardi How should we move this forward? It looks like it will merge cleanly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -159,7 +159,8 @@ loop: | |||
// restarting the task on another node | |||
// (if applicable). | |||
t.Status.State = api.TaskStateRejected | |||
t.Status.Message = "assigned node no longer meets constraints" | |||
t.Status.Message = "task rejected by constraint enforcer" | |||
t.Status.Err = "assigned node no longer meets constraints" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel the Message/Err should be swapped here. The error on the task should be related to the task, so "task rejected by constraint enforcer" seems more relevant.
Also, does it make sense to be more specific around what constraints were failed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that we don't display Err
on the command line interface. Message
is used to provided better reporting to the user.
Looks good, one question/comment, which can probably be addressed later if we need to merge it right away @nishanttotla |
There are some cases when a task can't advance from a particular state because preconditions are not met. For example, if no nodes meet its constraints, it will not advance to "assigned".
Currently, we put a note about this in the
Message
field ofTaskStatus
, but this is not surfaced to the user. It wouldn't make sense to exposeMessage
prominently because it usually contains uninteresting notes about how the task reached that state. Its presence does not indicate that something is wrong.Expand the scope of
Err
to also cover non-terminal errors that are blocking the task from progressing, and use it in those cases.Ideally we would also use this to surface IP allocation failures that block a task in "new", but the current design of the allocator would loop if we updated a task on failure. This might be something for a followup.
cc @aluzzardi @stevvooe