Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unload taskQueueManager by instance, not queue ID #1917

Merged
merged 1 commit into from
Sep 13, 2021
Merged

Unload taskQueueManager by instance, not queue ID #1917

merged 1 commit into from
Sep 13, 2021

Conversation

mmcshane
Copy link
Contributor

What changed?
Unload taskQueueManager by instance, not queue ID

Why?
Matching engine can incorrectly unload a taskQueueManager instance
because the fatal signals from a previous instance are still being
delivered and reference the same queue ID. To address we unload by
taskQueueManager object instance rather than simply by queue ID.

How did you test it?
New unit test

Potential risks
Minimal - unload occurs in strictly fewer cases

Is hotfix candidate?
no

@mmcshane mmcshane requested a review from a team September 13, 2021 21:57
@@ -201,7 +201,7 @@ func (w *taskWriter) appendTasks(
return resp, nil

case *persistence.ConditionFailedError:
w.tlMgr.signalFatalProblem(w.tlMgr.taskQueueID)
w.tlMgr.signalFatalProblem(w.tlMgr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, if we were just call w.tlMgr.Unload() here we won't have to deal with this bug.

Copy link
Contributor Author

@mmcshane mmcshane Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean - taskQueueManager doesn't have an Unload?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tlMgr.Stop maybe? that would work because it dedups on a per-taskQueueManager instance basis.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meat tlm.Stop(), but yes we still need to remove it from the collection of matching engine, which is keyed on taskQueueID, and dedupe would still needed. So, maybe it does not matter.

Matching engine can incorrectly unload a taskQueueManager instance
because the fatal signals from a previous instance are still being
delivered and reference the same queue ID. To address we unload by
taskQueueManager object instance rather than simply by queue ID.
@mmcshane mmcshane enabled auto-merge (squash) September 13, 2021 22:05
@mmcshane mmcshane merged commit 96de6fa into temporalio:master Sep 13, 2021
yiminc pushed a commit that referenced this pull request Sep 13, 2021
Matching engine can incorrectly unload a taskQueueManager instance
because the fatal signals from a previous instance are still being
delivered and reference the same queue ID. To address we unload by
taskQueueManager object instance rather than simply by queue ID.
@mmcshane mmcshane deleted the mpm/tqm-storm2 branch September 13, 2021 23:10
@alexshtin
Copy link
Member

Russian Revolution PR: https://en.wikipedia.org/wiki/Russian_Revolution

@mmcshane
Copy link
Contributor Author

☮️ 🏞️ 🍞

alexshtin pushed a commit that referenced this pull request Sep 16, 2021
Matching engine can incorrectly unload a taskQueueManager instance
because the fatal signals from a previous instance are still being
delivered and reference the same queue ID. To address we unload by
taskQueueManager object instance rather than simply by queue ID.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants