"Specified group generation id is not valid" after broker maintenance, consumer stops receiving events #1466
Comments
I have pretty much the same issue. I'm connected to 11 topics, and when I start receiving messages I see errors in the logs, followed by a message that the consumer has stopped. Increasing the heartbeat interval and sessionTimeout didn't help.
Same thing for us.
After that, it just hangs until it is manually restarted. It happened at the end of (or right after) the AWS Kafka "Heal cluster" maintenance.
Ran into this as well. Proposed fix: #1474
I've also encountered this. Rejoining the group should be the correct behavior in this case.
We are seeing the same thing after a GKE update. Does anyone know a workaround while we wait?
@ErlendFax have you found a workaround other than manually restarting the consumer?
@h0od when you say rejoin, should the library handle it, or should it be done within the consumer code? Thanks 🙏
We have not. Just hoping it won't fail again. I'm interested in a workaround or solution as well.
The library should try to rejoin, exactly like it does when the group is rebalancing. |
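For illustration, here is a minimal sketch of the policy being proposed: treat the group-membership errors that a rebalance produces as a signal to rejoin rather than to crash. The error names and codes come from the Kafka protocol (`ILLEGAL_GENERATION` is code 22, whose message is the "Specified group generation id is not valid" seen in this issue). It is an assumption that the error object carries the protocol error name in a `type` field, as kafkajs protocol errors do, and `classifyGroupError` is a hypothetical helper, not a kafkajs API.

```javascript
// Kafka protocol errors meaning "this member's view of the group is out of date".
// Hypothetical policy: handle them like a rebalance (rejoin) instead of crashing.
const REJOIN_ERRORS = new Set([
  "ILLEGAL_GENERATION",    // code 22: "Specified group generation id is not valid"
  "UNKNOWN_MEMBER_ID",     // code 25: the coordinator no longer knows this member
  "REBALANCE_IN_PROGRESS", // code 27: the group is rebalancing
]);

// Hypothetical helper: decide what a consumer loop should do with an error.
// Assumes `error.type` holds the Kafka protocol error name.
function classifyGroupError(error) {
  if (error && REJOIN_ERRORS.has(error.type)) {
    return "rejoin"; // rejoin the group, then resume fetching
  }
  return "crash"; // unknown or fatal: surface the error to the application
}
```

The point of the sketch is that `ILLEGAL_GENERATION` belongs in the same bucket as the rebalance errors, since all three mean the member must rejoin to get a fresh generation id.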
Same here as well: nodes are being rotated, and then the consumer just stops consuming.
I think
As a workaround, one could try something like this:

```js
// `consumer` is a kafkajs Consumer, e.g. created with kafka.consumer({ groupId })
consumer.on(consumer.events.CRASH, (event) => {
  if (event.payload.error.name === "KafkaJSNonRetriableError") {
    process.exit(1); // will initiate a k8s restart
    // ... or do something else, like reconnecting and calling `run` again ...
  }
});
```
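A slightly more defensive variant of the workaround above tries to recover in-process first and only exits if that fails. It is a sketch, assuming a kafkajs version whose crash payload has the shape `{ error, groupId, restart }` (where `restart` tells you whether kafkajs will restart the consumer on its own). `makeCrashHandler` and the `restartConsumer` callback are hypothetical names; what "restart" means (reconnecting and calling `run` again, or creating a fresh consumer) is left to the application.

```javascript
// Hypothetical crash-handler factory. Assumes the crash payload has the shape
// { error, groupId, restart }; older kafkajs versions may lack `restart`.
// `restartConsumer` is an application-supplied async callback.
function makeCrashHandler(restartConsumer, exit = (code) => process.exit(code)) {
  let restarting = false; // avoid overlapping restarts if several crashes arrive

  return async function onCrash(event) {
    const { error, restart } = event.payload;
    if (restart) {
      return; // kafkajs will restart the consumer by itself
    }
    if (error && error.name === "KafkaJSNonRetriableError" && !restarting) {
      restarting = true;
      try {
        await restartConsumer();
      } catch (err) {
        console.error("consumer restart failed", err);
        exit(1); // give up and let the orchestrator (e.g. k8s) restart the process
      } finally {
        restarting = false;
      }
    }
  };
}
```

It would be registered the same way as the original snippet, e.g. `consumer.on(consumer.events.CRASH, makeCrashHandler(restartConsumer))`. Injecting `exit` keeps the handler testable without killing the process.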
Hi, we are having an issue similar to #1009, but it happens after broker maintenance.
We have consumers running in parallel on different machines, with a heartbeat triggered in eachBatch.
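For readers unfamiliar with the setup, a heartbeat inside `eachBatch` usually looks like the pattern from the kafkajs documentation, sketched below. It assumes the `eachBatch` payload fields `batch`, `resolveOffset`, `heartbeat`, `isRunning` and `isStale`; `makeEachBatch` and `processMessage` are hypothetical names for this example.

```javascript
// Sketch of an eachBatch handler that heartbeats after every message.
// Assumes kafkajs eachBatch payload fields; `processMessage` is a
// hypothetical application callback.
function makeEachBatch(processMessage) {
  return async ({ batch, resolveOffset, heartbeat, isRunning, isStale }) => {
    for (const message of batch.messages) {
      // Stop if the consumer is shutting down or the batch went stale
      // (e.g. after a rebalance), so we don't process messages we no longer own.
      if (!isRunning() || isStale()) break;
      await processMessage(message);
      resolveOffset(message.offset); // mark this message as processed
      await heartbeat(); // keep group membership alive during long batches
    }
  };
}
```

It would be passed to `consumer.run({ eachBatch: makeEachBatch(handleMessage) })`. Note that in the failure described here, it is the heartbeat itself that stops firing once the instance goes stale.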
We consume multiple topics, with a specific instance of our service per topic.
All of this works fine, but we have had issues (twice already) when the brokers go into maintenance.
Some of the instances (and thus some of the topics) stop consuming events, but they neither throw errors nor crash (if they crashed, we would respawn them and everything would be fine).
We do see the error message:
[Consumer] Crash: KafkaJSNonRetriableError: Specified group generation id is not valid
However, it doesn't actually crash: the instance goes stale and won't consume any new messages or send heartbeats. If we restart the instance, it consumes all pending traffic (provided the offset is still current).
The odd thing is that some of the topics keep working fine after the maintenance, so the overall system appears to be up unless we check each specific topic.
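Because the stale instance neither consumes nor crashes, one way to turn the silent stall into a restart is a liveness watchdog. This is a hypothetical sketch, not something from kafkajs: the application calls `touch()` whenever the consumer shows signs of life and checks `isStalled()` periodically. Hooking it to the kafkajs `consumer.events.HEARTBEAT` instrumentation event, and the 60-second threshold, are assumptions; it also assumes heartbeats keep flowing while a topic is idle, otherwise quiet topics would trip it.

```javascript
// Hypothetical liveness watchdog. `now` is injectable so it can be tested
// without real timers.
function createWatchdog(maxSilenceMs, now = () => Date.now()) {
  let lastSeen = now();
  return {
    // Record a sign of life (heartbeat, finished batch, ...).
    touch() {
      lastSeen = now();
    },
    // True once no sign of life has been seen for longer than maxSilenceMs.
    isStalled() {
      return now() - lastSeen > maxSilenceMs;
    },
  };
}

// Example wiring (kafkajs event name assumed, threshold chosen arbitrarily):
// const watchdog = createWatchdog(60000);
// consumer.on(consumer.events.HEARTBEAT, () => watchdog.touch());
// setInterval(() => { if (watchdog.isStalled()) process.exit(1); }, 10000);
```

The same `isStalled()` check could instead back a Kubernetes liveness probe, so the orchestrator restarts the pod rather than the process exiting itself.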