-
Versions: |
-
is the line you are looking for. A filesystem operation has failed with

Queue replicas and channels are completely independent of each other, although channels can monitor queues. Channels do retry some operations with a delay when a queue does not have an elected leader (in the case of quorum queues and streams), so for non-transient data, use one of those two.

For transient data (specifically exclusive queues, since non-exclusive non-durable queues are going away later in the 4.x series), there is already #12949, which is not as trivial to address as it may sound.
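As a rough illustration of the "use one of those two" advice: a queue is declared as a quorum queue by passing the `x-queue-type` optional argument at declaration time, and quorum queues must be durable and non-exclusive. The helper name below is hypothetical, and the commented-out call assumes the pika Python client; other clients accept the same argument through their own declare call.

```python
def quorum_queue_declare_kwargs(queue_name):
    """Build keyword arguments for declaring a quorum queue.

    Hypothetical helper for illustration; it only assembles the arguments
    and does not talk to a broker.
    """
    if not queue_name:
        raise ValueError("queue_name must be non-empty")
    return {
        "queue": queue_name,
        "durable": True,        # quorum queues are always durable
        "exclusive": False,     # and cannot be exclusive
        "auto_delete": False,
        "arguments": {"x-queue-type": "quorum"},
    }


# Usage with pika (assumed client, requires a reachable broker):
#   channel.queue_declare(**quorum_queue_declare_kwargs("queue_1"))
```

An existing classic queue cannot be converted in place this way; the queue type is fixed when the queue is first declared.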
-
Describe the bug
We are noticing some odd behavior when our queue processes crash. The reason for the crashes is hard to identify; we assume some Windows security software is interacting poorly with RabbitMQ.
When the queues do crash, the consumers tied to those queues stop receiving messages. When the queues automatically restart, the consumers still do not receive messages. The channels used by these consumers do not raise a ChannelShutdown event, so our software does not detect that anything has gone wrong with the RabbitMQ process.
Snippet of Queue crash stack:
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> ** Reason for termination ==
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> ** {{badmatch,{error,eacces}},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{rabbit_classic_queue_index_v2,new_segment_file,3,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_classic_queue_index_v2.erl"},{line,594}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_classic_queue_index_v2,publish,8,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_classic_queue_index_v2.erl"},{line,583}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_variable_queue,maybe_write_index_to_disk,3,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_variable_queue.erl"},{line,1841}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_variable_queue,publish_delivered1,5,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_variable_queue.erl"},{line,1736}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_variable_queue,publish_delivered,4,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_variable_queue.erl"},{line,542}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_priority_queue,publish_delivered,4,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_priority_queue.erl"},{line,219}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_amqqueue_process,'-attempt_delivery/4-fun-0-',10,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_amqqueue_process.erl"},{line,680}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_queue_consumers,deliver_to_consumer,4,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_queue_consumers.erl"},{line,344}]}]}
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0>
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> crasher:
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> initial call: rabbit_amqqueue_process:init/1
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> pid: <0.1774.0>
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> registered_name: []
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> exception exit: {{badmatch,{error,eacces}},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{rabbit_classic_queue_index_v2,new_segment_file,3,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_classic_queue_index_v2.erl"},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {line,594}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_classic_queue_index_v2,publish,8,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_classic_queue_index_v2.erl"},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {line,583}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_variable_queue,maybe_write_index_to_disk,3,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_variable_queue.erl"},{line,1841}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_variable_queue,publish_delivered1,5,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_variable_queue.erl"},{line,1736}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_variable_queue,publish_delivered,4,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_variable_queue.erl"},{line,542}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_priority_queue,publish_delivered,4,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_priority_queue.erl"},{line,219}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_amqqueue_process,'-attempt_delivery/4-fun-0-',
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> 10,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_amqqueue_process.erl"},{line,680}]},
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> {rabbit_queue_consumers,deliver_to_consumer,4,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> [{file,"rabbit_queue_consumers.erl"},{line,344}]}]}
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> in function gen_server2:terminate/3 (gen_server2.erl, line 1172)
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> ancestors: [<0.1773.0>,<0.582.0>,<0.560.0>,<0.559.0>,
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> rabbit_vhost_sup_sup,rabbit_sup,<0.211.0>]
2025-01-22 12:01:04.367000-06:00 [error] <0.1774.0> message_queue_len: 186
Snippet of Queue Restart
2025-01-22 12:01:04.538000-06:00 [error] <0.3464.0> Restarting crashed queue 'queue_1' in vhost '/'.
2025-01-22 12:01:04.538000-06:00 [warning] <0.3464.0> Queue queue_1 in vhost / dropped 0/0/0 persistent messages and 0 transient messages after unclean shutdown
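Since the client side sees no shutdown event, one stopgap is to watch the broker log for the restart message itself. The sketch below is only an illustration: the function name is hypothetical, and the pattern is taken from the single log line quoted above, so other RabbitMQ versions may word the message differently.

```python
import re

# Pattern derived from the restart log line quoted in this report.
RESTART_RE = re.compile(
    r"Restarting crashed queue '(?P<queue>[^']*)' in vhost '(?P<vhost>[^']*)'"
)


def find_restarted_queues(log_lines):
    """Return (vhost, queue) pairs for each 'Restarting crashed queue' entry."""
    found = []
    for line in log_lines:
        match = RESTART_RE.search(line)
        if match:
            found.append((match.group("vhost"), match.group("queue")))
    return found


sample = [
    "2025-01-22 12:01:04.538000-06:00 [error] <0.3464.0> "
    "Restarting crashed queue 'queue_1' in vhost '/'.",
    "2025-01-22 12:01:04.538000-06:00 [warning] <0.3464.0> "
    "Queue queue_1 in vhost / dropped 0/0/0 persistent messages "
    "and 0 transient messages after unclean shutdown",
]
print(find_restarted_queues(sample))  # [('/', 'queue_1')]
```

A monitoring job could feed the result into an alert or into whatever mechanism the application uses to recycle its channels.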
The link below seems to be a related issue, where the guidance was to check the security software. We have done this to the best of our ability by disabling the security software that was exposed to us, but we don't fundamentally control the host, so it's possible that other security software is interacting with RabbitMQ.
Link to seemingly related issue
Any guidance on:
Reproduction steps
1. Have a consumer on a channel that is consuming from Queue A
2. Cause Queue A to have an unclean shutdown
3. Observe that no channel shutdown event is fired
4. Allow Queue A to restart
5. Observe that the consumer fails to process messages from Queue A until the connection/channel is destroyed and recreated
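Because step 5 shows that recreating the channel restores delivery, a client-side workaround is to detect a stalled consumer and recycle the channel. The following is a minimal sketch under stated assumptions, not a RabbitMQ client API: `ConsumerWatchdog` is a hypothetical class, and the queue depth is assumed to come from elsewhere (for example a passive queue declare or the management API).

```python
import time


class ConsumerWatchdog:
    """Flags a consumer as stalled when messages wait but none are delivered.

    Hypothetical helper: it only tracks timestamps; the caller decides how
    to obtain the queue depth and how to recreate the channel.
    """

    def __init__(self, idle_timeout_s, clock=time.monotonic):
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._last_delivery = clock()

    def record_delivery(self):
        # Call this from the consumer's delivery callback.
        self._last_delivery = self._clock()

    def is_stalled(self, queue_depth):
        # Stalled means: the queue has ready messages, yet nothing has
        # been delivered for at least idle_timeout_s seconds.
        idle = self._clock() - self._last_delivery
        return queue_depth > 0 and idle >= self._idle_timeout_s
```

A periodic task would call `is_stalled(depth)` and, when it returns `True`, close the channel, open a new one, and re-subscribe. The timeout has to be longer than any legitimate quiet period, otherwise healthy consumers get recycled.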
Expected behavior
Additional context