-
You deploy 2 instances of the same application, is that correct?
-
I could not reproduce the issue. Can you provide a step-by-step procedure? Here is the procedure I used; we would expect something similar from your end to help us diagnose the issue (a sketch of the corresponding commands follows this list):

1. Start a 3-node cluster. Using the Docker setup from the stream Java client is convenient (it also provides the command to stop the cluster later).
2. In another terminal tab, get Stream PerfTest and run it to simulate a super stream consumer (it creates the super stream as well).
3. In yet another terminal tab, run another consumer.
4. List the consumers of the group for one of the partitions.
5. List the Java processes.
6. Kill one of the Stream PerfTest processes.
7. List the consumers on the same partition again.

There is still a consumer: it was inactive before and has been promoted to active.
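A hedged sketch of the kind of commands behind these steps, assuming the 3-node cluster Docker Compose setup shipped with the stream Java client repository and the documented Stream PerfTest and rabbitmq-streams options; the stream, partition, and consumer names are illustrative:

```shell
# start the 3-node cluster (Compose file from the stream Java client repo; exact path may differ)
docker compose up
# to stop the cluster later:
docker compose down

# simulate a super stream consumer (this also creates the super stream)
java -jar stream-perf-test.jar --super-streams --single-active-consumer --consumer-names my-app

# in another tab, run a second consumer of the same group, without a producer
java -jar stream-perf-test.jar --super-streams --single-active-consumer --consumer-names my-app --producers 0

# list the consumers of the group for one of the partitions
rabbitmq-streams list_stream_group_consumers --stream my-super-stream-0 --reference my-app

# list the Java processes and kill one of the Stream PerfTest instances
jps
kill -9 <pid-of-one-stream-perf-test-process>

# list the consumers on the same partition again
rabbitmq-streams list_stream_group_consumers --stream my-super-stream-0 --reference my-app
```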
-
Community Support Policy
RabbitMQ version used
4.0.5
Erlang version used
27.2.x
Operating system (distribution) used
Linux version 5.15.153.1-microsoft-standard-WSL2
How is RabbitMQ deployed?
Community Docker image
rabbitmq-diagnostics status output
See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics
Logs from node 1 (with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 2 (if applicable, with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 3 (if applicable, with sensitive values edited out)
See https://www.rabbitmq.com/docs/logging to learn how to collect logs
rabbitmq.conf
See https://www.rabbitmq.com/docs/configure#config-location to learn how to find rabbitmq.conf file location
Steps to deploy RabbitMQ cluster
docker-compose up
glrabbitmq:
  restart: always
  hostname: acm_rabbit_node_1
  image: rabbitmq:4.0.6-management
  volumes:
    - rabbitmq:/var/lib/rabbitmq
    - ./rabbit/enabled_plugins:/etc/rabbitmq/enabled_plugins
    - ./rabbit/rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
  ports:
    - ${LISTEN_IP}:5672:5672
    - ${LISTEN_IP}:5552:5552
    - ${LISTEN_IP}:25672:25672
    - ${LISTEN_IP}:15672:15672
  networks:
    - glnet
Steps to reproduce the behavior in question
A super stream 'CmmnEvents' is created with 8 partitions.
On 2 different servers we deploy an application that uses the Java stream client 0.22 to create one Environment (connection) with 2 single active consumers (with the same name) on the super stream; a sketch of this setup is shown after these steps.
When the clients start up, we can see that each of the 8 partitions of the super stream gets a single active consumer, nicely balanced between the 2 connections. When one server is shut down cleanly, the application calls the close methods on the consumers and the other connection takes over all the partitions with a single active consumer, exactly as expected. But when we simulate a crash (kill) of one client, the other connection does not take over every partition: it always takes over only 2 of them, so the other 2 are left without a single active consumer and our application fails to consume messages from those partitions.
When only one consumer is defined per client application, it all works fine.
From the RabbitMQ server log it seems that during this crash scenario the server does not send RPC calls for the 2 failing partitions, while it does for the other 6.
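For context, a minimal sketch of the consumer setup described above, using the RabbitMQ stream Java client; the URI, consumer group name, and message handler are illustrative assumptions, not the actual application code:

```java
import com.rabbitmq.stream.Consumer;
import com.rabbitmq.stream.Environment;

import java.util.ArrayList;
import java.util.List;

public class SuperStreamSacSetup {

    public static void main(String[] args) throws InterruptedException {
        // One Environment (i.e. one connection) per application instance; the URI is illustrative
        Environment environment = Environment.builder()
                .uri("rabbitmq-stream://localhost:5552")
                .build();

        // Two single active consumers with the same name on the 8-partition super stream
        List<Consumer> consumers = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            consumers.add(environment.consumerBuilder()
                    .superStream("CmmnEvents")     // super stream from the report
                    .name("cmmn-events-group")     // same consumer name on both servers (illustrative)
                    .singleActiveConsumer()
                    .messageHandler((context, message) ->
                            System.out.println("message from partition " + context.stream()))
                    .build());
        }

        Thread.sleep(Long.MAX_VALUE); // keep consuming

        // On a clean shutdown the application closes each consumer and the environment,
        // which is the case reported to rebalance correctly; a crash/kill skips this.
        for (Consumer consumer : consumers) {
            consumer.close();
        }
        environment.close();
    }
}
```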
advanced.config
See https://www.rabbitmq.com/docs/configure#config-location to learn how to find advanced.config file location
Application code
# PASTE CODE HERE, BETWEEN BACKTICKS
Kubernetes deployment file
What problem are you trying to solve?
When multiple client applications each define multiple single active consumers (with the same name) on the same partitioned super stream and one client crashes, we expect the remaining client to take over consumption of all the partitions. Instead, this scenario always ends with 2 of the 8 partitions left unconsumed.