What node failure scenario can lead to lost bindings? #12783
-
Describe the bug
When the physical machine hosting one of the nodes fails and cannot be restarted, the RabbitMQ server loses binding relationships. The bindings in question are broadcast bindings, and the loss affects only some of the bindings of a single broadcast.

Reproduction steps
I have tried actions such as shutting down nodes, but I could not reproduce the issue. Since I cannot simulate a physical machine crash, these experiments are the only way I have to try to reproduce the problem.

Expected behavior
Has anyone experienced a similar issue? How can I resolve this problem?

Additional context
My queue metadata contains older data that was imported from a historical version. I'm not sure whether this could be causing the issue.
Replies: 2 comments
-
@wangliguang111 you haven't provided any evidence of a bug or reproduction details, not even which version of RabbitMQ is used. We do not guess in this community. There is one rare-to-hit and very difficult to reproduce case where a combination of
RabbitMQ cannot do much about this race condition: clients are connected to different nodes and
This problem should be largely or completely addressed in 4.0 with Khepri because its data model is completely different and bindings are not deleted in batches. Another approach is to use a stable durable topology where even mass client disconnect won't
Transient queues bound to durable exchanges are a combination that invites the aforementioned distributed race conditions between binding deletion and recovery when clients lose connections, nodes fail or are stopped, and so on.
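For readers who want to see what a "stable durable topology" could look like from the client side, here is a minimal Python sketch. It is one reading of that advice, not something spelled out in the thread: the exchange and queue names, the choice of quorum queues, and the helper `declare_durable_fanout` are all assumptions made for illustration. The function expects a pika-style channel exposing `exchange_declare`, `queue_declare` and `queue_bind`; the `RecordingChannel` stand-in only exists so the sketch runs without a broker.

```python
"""Sketch: declare a durable fanout topology so bindings outlive client connections.

Assumptions (not from the thread): exchange/queue names, quorum queues,
and a pika-style channel object passed in by the caller.
"""


def declare_durable_fanout(channel, exchange, queue_names):
    """Declare a durable fanout exchange and bind one durable queue per consumer."""
    channel.exchange_declare(exchange=exchange, exchange_type="fanout", durable=True)
    for name in queue_names:
        # durable, non-exclusive, non-auto-delete: the queue is not removed when
        # its consumer disconnects, so its binding is not removed with it.
        channel.queue_declare(
            queue=name,
            durable=True,
            exclusive=False,
            auto_delete=False,
            arguments={"x-queue-type": "quorum"},  # assumption: replicated queue type
        )
        # Fanout exchanges ignore the routing key, so an empty one is used.
        channel.queue_bind(queue=name, exchange=exchange, routing_key="")


class RecordingChannel:
    """Hypothetical stand-in for a real channel; records calls for a dry run."""

    def __init__(self):
        self.calls = []

    def exchange_declare(self, **kwargs):
        self.calls.append(("exchange_declare", kwargs))

    def queue_declare(self, **kwargs):
        self.calls.append(("queue_declare", kwargs))

    def queue_bind(self, **kwargs):
        self.calls.append(("queue_bind", kwargs))


# Dry run: show which declarations would be sent to the broker.
dry_run = RecordingChannel()
declare_durable_fanout(
    dry_run, "events.broadcast", ["events.service-a", "events.service-b"]
)
for method, kwargs in dry_run.calls:
    print(method, kwargs)
```

Against a real broker, the `channel` argument would typically come from something like `pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()`. The point of the design is that no binding depends on a queue that disappears when a connection drops, which is the situation the reply describes as prone to races.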
-
I have added a note to the 4.0 release notes that explains the fundamental problem, how Khepri addresses it, and what those who are still on Mnesia or 3.x versions can do to work around it.