Messages get lost when using gossipsub #197
Comments
@MBakhshi96 Can you be more precise about what you mean by "losing messages"? Are your nodes actually connected to each other, and have they completed their initial handshakes?
@aschmahmann I mean that published messages are not received by all of the other nodes. The nodes are connected; I also tried a fully connected configuration, but that did not help.
@MBakhshi96 we are not aware of any issues that could cause this. Can you post a test case showing the issue in a GitHub repo? It needs to be reproduced in order to help you. Thanks.
@raulk I've added a test case showing the problem here.
We start with n = 10 nodes. If everything works correctly, every node should receive n*n + n messages and then the execution terminates, but in this example the execution never stops. You can check the number of acks for every message in the output and you'll see that not all of the acks are received by the nodes.
Are there any logs about dropped messages?
@vyzo Where can I find logs for this execution? There is no log in the output, but that may be because of the logging level used in the pubsub code.
also, what is your topology?
@vyzo My topology is a simple ring, but I've also tested it with a fully connected topology.
I don't know what causes this problem or why these messages don't get retransmitted.
this log tells you that the pubsub subsystem is dropping messages at subscription delivery; you are simply not consuming the messages fast enough.
note that there is no retransmission whatsoever in pubsub; also note that the messages are propagated normally, they are just dropped at delivery.
@vyzo What do you mean by not consuming fast enough? I'm receiving messages inside a for loop, which simply waits for a message and then prints it to the output. How can I consume them faster?
Are you running the receiver in separate goroutines?
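A minimal sketch of the pattern being suggested here, assuming a go-libp2p-pubsub `*Subscription` obtained elsewhere: one goroutine drains `sub.Next` as quickly as possible into an application-level buffered channel, and a second goroutine does the slow work (printing, ack bookkeeping). The `consume` helper and the buffer size are illustrative, not part of the library:

```go
package main

import (
	"context"
	"fmt"

	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

// consume drains a subscription in a dedicated goroutine so that slow
// processing never blocks pubsub's internal delivery queue. If msgBuf
// also fills up, the reader blocks again, but with far more slack than
// the library's small default subscription buffer.
func consume(ctx context.Context, sub *pubsub.Subscription) {
	msgBuf := make(chan *pubsub.Message, 1024) // application-level buffer (size is arbitrary)

	// Reader: pull messages off the subscription as fast as possible.
	go func() {
		defer close(msgBuf)
		for {
			msg, err := sub.Next(ctx)
			if err != nil {
				return // subscription cancelled or context done
			}
			msgBuf <- msg
		}
	}()

	// Worker: do the slow work off the delivery path.
	go func() {
		for msg := range msgBuf {
			fmt.Printf("received: %s\n", msg.Data)
		}
	}()
}
```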
what is your message rate? it may be that your computer is too slow.
@vyzo Actually, I don't know my message rate. In the provided example, every node publishes only 1+10 messages, but I don't know how long it takes to publish these messages. Also, even if my PC is too slow, which it is not, I don't think it's acceptable to lose messages. There must be a way to ensure reliable message delivery.
there might be something else at play, are you receiving any messages?
Also, re: dropped messages: there has to be a throttle somewhere, we can't buffer an infinite number of messages.
@vyzo Most of the messages get delivered, I only lose a few messages.
there is currently no way to specify the subscription buffer size.
@vyzo So what do you suggest? How can I work around this problem, since I need a reliable broadcast scheme?
You can perhaps make a PR to make the buffer capacity configurable, but that is not the long-term solution. How many nodes are you running on a single computer?
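For reference, later releases of go-libp2p-pubsub added a `WithBufferSize` subscription option along these lines. A sketch, assuming your version has that option and the `Topic` handle API; a larger buffer only buys headroom, it does not make delivery reliable, and the topic name is made up:

```go
package main

import (
	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

// subscribeWithHeadroom enlarges this subscription's delivery buffer so
// bursts are less likely to overflow it. Delivery is still best-effort.
func subscribeWithHeadroom(ps *pubsub.PubSub) (*pubsub.Subscription, error) {
	topic, err := ps.Join("acks") // "acks" is an illustrative topic name
	if err != nil {
		return nil, err
	}
	return topic.Subscribe(pubsub.WithBufferSize(1024))
}
```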
@vyzo Between 10 and 20.
that's weird, it's not a lot of nodes.
is there any delay between message transmissions, or are you sending as fast as you can?
@vyzo There is no delay between reception and transmission.
can you add a small delay before transmitting consecutive messages?
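A minimal sketch of what pacing the publishes could look like, using a ticker so consecutive messages are not sent back-to-back. The topic handle, the 100 ms interval, and the message count (1+10 per node, as in the test above) are illustrative:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

// publishPaced waits a fixed interval between consecutive publishes
// instead of sending as fast as possible.
func publishPaced(ctx context.Context, topic *pubsub.Topic) {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	for i := 0; i < 11; i++ {
		<-ticker.C // pause before each publish
		payload := []byte(fmt.Sprintf("msg-%d", i))
		if err := topic.Publish(ctx, payload); err != nil {
			log.Printf("publish failed: %v", err)
		}
	}
}
```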
@vyzo I tried adding 100 milliseconds of delay before publishing to pubsub, but the problem still persists and it has gotten even worse!
are you blocking the receive loop with that delay? that could explain it getting worse.
@vyzo No. I run it in another goroutine.
btw, are you maxing out the CPUs in your computer?
@vyzo No!
this is very weird, you are not maxing out your CPUs and yet you are too slow to receive the messages!
@vyzo It gets better when I add random delays on the order of a second, but that is too much delay for 10 nodes!
No news in this thread?
A few things about this issue:
@aschmahmann making the queue size configurable would be good, so everybody can change the limit based on their implementation's needs.
I don't want to store anything on top of pubsub; the only thing I want is to have reliable broadcasts using pubsub.
That would be really helpful for me, since I would then be able to retransmit lost messages.
@MBakhshi96 it really sounds like there's some shared state you're trying to track. Take this example, where rebroadcasting and/or a persistence layer is the only way to deal with lost messages. A-B-C are connected in a line. A sends a message to B, but B doesn't send it to C (maybe B crashed, maybe it blacklisted C, etc.). Even though A wanted to send a message to C and successfully sent the message to B, there's no way for A to know that C received the message (or even that C exists). Note that even if A and C connect directly afterwards, the message A initially sent will not be automatically rebroadcast. Even your demo has this same property: all the nodes are implicitly aware of the other nodes and are trying to operate on the shared state.
Recall that, as in the above example, there are some messages you won't know are lost. Adding this new event would be an optimization that would allow us to retransmit the state less frequently, but it's not strictly necessary in order to layer persistence on top of pubsub.
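One rough sketch of layering retransmission on top of pubsub along these lines, assuming the application assigns its own message IDs and receives acks over pubsub (or any other channel): the sender keeps every message it published and periodically rebroadcasts anything that has not been acknowledged by all expected peers. Every name here (`rebroadcaster`, `expectedAcks`, the 2-second interval) is illustrative, not part of the library:

```go
package main

import (
	"context"
	"sync"
	"time"

	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

// outstanding tracks a published message that has not yet been acked by
// every expected peer.
type outstanding struct {
	data []byte
	acks map[string]bool // peer ID -> acked
}

// rebroadcaster remembers what it published and rebroadcasts anything
// that is still missing acks.
type rebroadcaster struct {
	mu           sync.Mutex
	topic        *pubsub.Topic
	pending      map[string]*outstanding // message ID -> state
	expectedAcks int                     // how many peers must ack (illustrative)
}

// publish records the message before sending it, so it can be retried.
func (r *rebroadcaster) publish(ctx context.Context, id string, data []byte) error {
	r.mu.Lock()
	r.pending[id] = &outstanding{data: data, acks: make(map[string]bool)}
	r.mu.Unlock()
	return r.topic.Publish(ctx, data)
}

// ack records an acknowledgement; once every expected peer has acked,
// the message no longer needs retransmitting.
func (r *rebroadcaster) ack(id, fromPeer string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if o, ok := r.pending[id]; ok {
		o.acks[fromPeer] = true
		if len(o.acks) >= r.expectedAcks {
			delete(r.pending, id)
		}
	}
}

// loop periodically rebroadcasts everything that is still unacked.
func (r *rebroadcaster) loop(ctx context.Context) {
	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			r.mu.Lock()
			for _, o := range r.pending {
				_ = r.topic.Publish(ctx, o.data) // best-effort retransmit
			}
			r.mu.Unlock()
		}
	}
}
```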
I also lose messages when using gossipsub in a standalone project (numerous messages at the same peer and the CPU is slow).
Pubsub is supposed to be reliable, but I lose messages when using gossipsub. The problem is that when I use around 10 nodes and all nodes broadcast messages to a topic in the pubsub, not all of the messages get delivered and receiving nodes miss some of them. I have also tried floodsub, but the problem persists.
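For context, a minimal sketch of the kind of setup described here, assuming an existing libp2p host `h` that is already connected to the other nodes and a go-libp2p-pubsub version with the `Topic` handle API; the topic name and message contents are made up, and the `host` import path varies across libp2p versions:

```go
package main

import (
	"context"
	"fmt"

	"github.com/libp2p/go-libp2p-core/host" // path depends on your libp2p version
	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

// broadcastDemo: every node joins the same topic, publishes a handful of
// messages, and reads everything that arrives. Under load, messages can
// be dropped at subscription delivery (see the discussion above).
func broadcastDemo(ctx context.Context, h host.Host) error {
	ps, err := pubsub.NewGossipSub(ctx, h)
	if err != nil {
		return err
	}
	topic, err := ps.Join("demo-topic") // illustrative topic name
	if err != nil {
		return err
	}
	sub, err := topic.Subscribe()
	if err != nil {
		return err
	}

	// Publisher: broadcast a handful of messages.
	go func() {
		for i := 0; i < 10; i++ {
			_ = topic.Publish(ctx, []byte(fmt.Sprintf("hello %d", i)))
		}
	}()

	// Receiver: ideally every node sees every published message.
	for {
		msg, err := sub.Next(ctx)
		if err != nil {
			return err
		}
		fmt.Printf("received: %s\n", msg.Data)
	}
}
```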