-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Temporal client method to restart failed workflow #15051
Temporal client method to restart failed workflow #15051
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not personally too worried about the static methods, but I am worried about a very large set of failing jobs that need to be restarted.
@@ -446,6 +451,79 @@ public ManualOperationResult synchronousResetConnection(final UUID connectionId, | |||
Optional.of(resetJobId), Optional.empty()); | |||
} | |||
|
|||
public void restartWorkflowByStatus(final WorkflowExecutionStatus executionStatus) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on executionStatus
being an argument
UUID.fromString(stringUUID)); | ||
} | ||
|
||
Set<String> fetchWorkflowByStatus(final WorkflowExecutionStatus executionStatus) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set<String> fetchWorkflowByStatus(final WorkflowExecutionStatus executionStatus) { | |
Set<String> fetchWorkflowsByStatus(final WorkflowExecutionStatus executionStatus) { |
This is a plural method, that returns a set
final Set<String> workflowExecutionInfos = fetchWorkflowByStatus(executionStatus); | ||
|
||
final Set<String> nonRunningWorkflow = filterOutRunningWorkspaceId(workflowExecutionInfos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How large can these sets be? Are we worried about running out of memory? Should we be iterating in batches?
Maybe the method signature could be fetchWorkflowByStatus(executionStatus, limit=100)
and we keep calling it until we get 0 results. In the method below, you are already paging though results with nextPagetoken
- perhaps we don't need to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The limiting number here is number of connection ID. I don't expect it to be a concern in the short term.
In order to limit the memory consumption, I will change this to use UUID instead of String.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still concerned about this method returning quite a lot of data once we have 1000s of workspaces and connections, but I agree for now this is probably OK.
+ "unreachable state before starting a new workflow for this connection"); | ||
ConnectionManagerUtils.startConnectionManagerNoSignal(client, maybeConnectionId.get()); | ||
} else { | ||
log.error("Malformated workflow id, won't terminate of restart"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gramatical Nit:
log.error("Malformated workflow id, won't terminate of restart"); | |
log.error("Malformated workflow id, won't terminate or restart"); |
System.err.println(workflowId); | ||
System.err.println(StringUtils.removeStart(workflowId, "connection_manager_")); | ||
System.err.flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use log()
elsewhere... can the logger not accept a log-level to print to stdout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the miss-understanding. This is a draft and is not meant to be merge, so it contains debug print. This will be clean up. The main thing to look at is the test and the static way the mock are done.
…poral-client-restart-by-status
…poral-client-restart-by-status
…poral-client-restart-by-status
…poral-client-restart-by-status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Looks like you sorted out the instance/static method problem too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When trying to understand the scope of the ConnectionManagerUtils
, it feels like we should look a bit closer at the scope of the ConnectorManagerWorkflow
and the WorkflowClient
. It is not super clear what's the responsibility of each object, this will be helpful in order to get a cleaner dependency injection.
…poral-client-restart-by-status
What
Described in #14970
Add a method in
TemporalClient
that will allow to restart all the failed temporal workflow.It also make
ConnectionManagerUtils
to be injectable (i.e. remove the statics).