Unknown origin of USER TRAFFIC packets #1146
Comments
Hi @omertal88, let me start with the self-help section of the answer:
Yes. Each DATA(_FRAG) message (or really any message, but these are the ones that triggered it) contains a writer GUID. You have to look at the dissection of the packet; the full GUID is formed by combining the GUID prefix from the RTPS header with the writer entity id in the submessage. Now, I assume you mean this one (there are many more following, I think this is the first), and so for the selected message the full writer GUID is that prefix plus the entity id.

Wireshark tries to map GUIDs to topic names, but it can only do that for the "built-in ones" (because they have hard-coded entity ids). For those it shows the well-known names, as well as the topic and type names and a reference to the SEDP message in the dissection. For the messages you are asking about, the writer's SEDP message is clearly not in the capture and hence Wireshark won't help. The Cyclone trace always starts at the beginning and so it always has the information (not doing log rotation can be an advantage). It also includes the topic name for each message, because that makes life much more convenient while adding (relatively) little text to the (already gigantic) trace files. You may also want to have a look at this if you intend to dive into the Cyclone DDS trace: https://gist.github.com/eboasson/56aa337c558151c7d9598214418f2fed which I started doing and meant to extend a bit further before turning it into a PR on Cyclone — I probably should just add it as it is ...

Sometimes you can make an informed guess, and this is one of those cases. ROS 2 used to map each ROS 2 node to a DDS DomainParticipant, but most DDS implementations have horrendous overhead when you do that. Cyclone is a notable exception, but I failed to convince them not to change it, and so ROS 2 now uses a single DDS DomainParticipant per process (or more accurately per "context", but they are almost synonymous). (ros2/design#250 is a good starting point if you're curious.) For the most part this was a trivial change, but not for introspecting the ROS 2 graph: it used to be that you could derive all the necessary information from the DDS discovery data, but with that change the mapping between readers/writers and ROS 2 nodes no longer exists in the discovery data. This was solved by adding a transient-local, keep-last-1 "ros_discovery_info" topic containing a list of all ROS 2 nodes and how the various endpoints relate to them. In short, it contains tons of GUIDs — for some reason stored as 24-byte arrays instead of 16-byte arrays ...

The writer of this topic is one of the first writers created when a ROS 2 application starts, and so it has an easily recognizable writer entity id. This topic gets published every time the application creates or deletes a ROS 2 subscription or publisher, and if you look at the other traffic around the time this data is being published in the trace, you can see it correlates with those create/delete events.

The end result of trying to work around a nasty performance problem in some DDS implementations by changing the ROS 2 node-to-DDS DomainParticipant mapping is therefore a significant increase in the data associated with discovery, and that's what you're seeing here. As predicted. And your case is actually a mild case ...

(*) The protocol allows data from multiple DomainParticipants to be combined in a single packet, which is quite common in OpenSplice, but not with most implementations, and certainly not with ROS 2 because it only uses a single DDS DomainParticipant. There exists a special …
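Coming back to that ros_discovery_info topic: for concreteness, its payload is the ParticipantEntitiesInfo message from the rmw_dds_common package. Roughly (this is a sketch from memory, so treat the exact field names as approximate, not authoritative) it looks like this:

```
# rmw_dds_common/msg/ParticipantEntitiesInfo (sketch)
Gid gid                                    # GUID of the DDS DomainParticipant (the process)
NodeEntitiesInfo[] node_entities_info_seq  # one entry per ROS 2 node in that process

# rmw_dds_common/msg/NodeEntitiesInfo (sketch)
string node_namespace
string node_name
Gid[] reader_gid_seq                       # GUIDs of the node's subscriptions
Gid[] writer_gid_seq                       # GUIDs of the node's publishers

# rmw_dds_common/msg/Gid (sketch)
uint8[24] data                             # the 24-byte GUID storage mentioned above
```

Every node contributes its name plus one 24-byte GUID per endpoint, so the sample grows with every publisher or subscription created in the process.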
Hello @eboasson,
First allow me to show my deepest appreciation for your detailed answer. Secondly, what did you mean by …
Since I'm using a wireless channel I can't use multicast discovery; would you say a better alternative is to use SPDP and just set MaxAutoParticipants to a reasonable number? I feel that is wasteful, because most nodes don't need to interact with all the other nodes in my system. Thirdly, …
Thank you again for your help.
If you can't do multicast discovery, then what you're doing is the most efficient way. With the MaxAutoParticipants you indeed get a burst of traffic, which is usually more than you need.
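For reference, unicast (peer-list) discovery in Cyclone is configured along these lines. This is only a sketch: the peer addresses are simply the two hosts visible in your capture and the index limit is an arbitrary illustrative value, so adjust both to your system.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS>
  <Domain Id="any">
    <Discovery>
      <!-- explicitly listed peers instead of multicast discovery -->
      <Peers>
        <Peer address="192.0.5.35"/>
        <Peer address="192.0.5.75"/>
      </Peers>
      <!-- how many participant indices (ports) get probed per peer; keeping this
           modest limits the burst of discovery traffic mentioned above -->
      <ParticipantIndex>auto</ParticipantIndex>
      <MaxAutoParticipantIndex>30</MaxAutoParticipantIndex>
    </Discovery>
  </Domain>
</CycloneDDS>
```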
If it had been what I meant ... I probably would have been able to tell from looking at the contents of the SPDP message (…).
If you filter on … If you see many in a message (e.g., 59633), chances are they are retransmits, more likely than not because a new remote node was discovered (or an existing one rediscovered; if you have moving robots, that's perfectly reasonable). The time stamp inside the messages tends to be Feb 13, 2022 13:35:57.641412128 UTC for the ones that are definitely being retransmitted, which have low sequence numbers. The very last one has a time stamp of Feb 13, 2022 14:45:16.716417248 UTC and sequence number 1168. I haven't studied them all, but it sure suggests something is being created/deleted. E.g., 66624 is a new reader for topic … You see them go out in bursts of 4, two to .35 and two to .75, but that must be because both nodes have two processes.
I wrote what's below before you added the excerpt from the Cyclone trace. Thanks for sharing that; it confirms some of the things I could only suspect before. Now we have to figure out whether my suspicions are correct :) I have a suspicion. There are basically four cases in requesting retransmits: …

Naturally, some or all of the packets in the response can be lost in transmission. There are recovery mechanisms to deal with that, and they start with a … From packets 166835+166837 and 166843+166846, and knowing Cyclone, I suspect you may be looking at case 3. When a remote reader requests a retransmission of a large sample, Cyclone only retransmits the first fragment and relies on the reader to request the remaining fragments via NACKFRAG. The little bit I can see fits with this pattern. This is all very much a grey area in the specs; it doesn't say anywhere that you must use … Two things are interesting now: …
If it is what I suspect, a quick work-around that should at least work for a while is to raise the maximum message size and maximum fragment size in Cyclone. In the capture you shared everything is <16 kB, so setting both limits above that should do it (see the sketch at the end of this comment). But I'd be grateful if you could have a look into those two questions as well. Work-arounds are not solutions, and this one falls apart for anything over 62 kB.
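Concretely, something like the following in the Cyclone XML configuration. This is a sketch only, and the values are illustrative: they just need to exceed the ~16 kB samples seen in the capture while staying below the ~64 kB UDP datagram limit, which is also why the work-around stops helping above roughly 62 kB.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS>
  <Domain Id="any">
    <General>
      <!-- let a whole sample go out as a single RTPS message ... -->
      <MaxMessageSize>65500B</MaxMessageSize>
      <!-- ... and raise the fragment size so samples of this size are not
           DDSI-RTPS-fragmented (illustrative value) -->
      <FragmentSize>65000B</FragmentSize>
    </General>
  </Domain>
</CycloneDDS>
```

Pointing CYCLONEDDS_URI at a file like this on both machines should be enough to try it.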
Hey again,
I'm not familiar with Fast-RTPS logs, so I'll need some time to dig into them. One more observation that I noticed... Also, I find it weird that both clients lose the same sample and both get stuck in that loop. This must be an issue on the sender's side.
Edit: …
Thank you once again :)
I think I may have a theory... Will update on the results...
Edit: …
@eboasson, Thanks again.
When you create a node, several subscriptions and publishers (for parameters and the like) get created, and this fits with that pattern. I did a quick test by just starting (a derivative of) "listener" and stopping it again (it stops cleanly when it receives ^C), and that looks perfectly normal: first you see all the entities get created (with a matching ros_discovery_info sample each time, the samples growing ever larger), then you see them getting destroyed (again with ros_discovery_info samples, but now getting smaller again).
tail filtered.txt:
It gets big, yes, but not too big to be sent. It does get too big to be sent in one Ethernet frame, and then too big for what Cyclone by default uses as the limit before it starts fragmenting, &c., but the ones in the Wireshark capture would still fit in a single UDP datagram. Raising the message size limit in Cyclone would avoid DDSI-RTPS fragmentation, and that, I suspect, will work around the problem. In theory all DDS implementations should interoperate perfectly with very large messages and fragmentation, but that's where I think you're running into some problem that hadn't surfaced before.
Hello @eboasson,
If I understand it correctly, the ros_discovery_info message reached a size of 24672 bytes... So if I'm not wrong, in about 30 minutes from now it will start the loop again. I did try to restart my node and I noticed the message size was reset. Instead, if it's not too much trouble, can you please try to create a subscriber / publisher, and then do …
Thank you
I don't think it really has anything to do with how you delete things, but I did reproduce it and from there onwards it was quite trivial. If you could give ros2/rmw_cyclonedds#373 a try ... Thanks for catching that!
Thank you for all of your help... Do you know when / if it will be part of galactic?
Definitely. (I chose wine because squashing a bug leaves a red stain 😜) I don't know yet when/if it will get backported (really: cherry-picked) to galactic, but I think it should be. Perhaps it would be a good idea to ask for that on the PR; I am sure everyone likes feedback from users that fixing stuff in patch releases is worth the effort.
Hehe :)
Hello,
I'm using CycloneDDS as the RMW for the ROS 2 nodes in my system. I have an issue where, after my system runs for a long time, large USER TRAFFIC packets are sent to another node (on a different machine, running the micro ROS Agent).
Looking at the code, I can't find any reason why these packets would be sent. Also, when I take the agent down, the traffic goes back to normal.
I have a Wireshark capture where you can see how the traffic starts to climb up. Is there a way I can use Wireshark and the trace log to find the publisher (or topic name) which creates this traffic?
Please see the attached wireshark capture. You can see the spike in RTPS traffic at around time=350.
wireshark-capture.zip
Another observation worth mentioning: I looked at one of the big packets sent from 192.0.5.15 to 192.0.5.35. In the serialized data section I expected to see random characters (as it is serialized)... But instead I saw an interesting, yet repetitive pattern. For example: … You can see xtra_osd, which is probably the name of the node which sent the message, then a repeating pattern of ÅPæwgWg$ followed by all English alphabet characters, digits and keyboard symbols, one at a time.

Thank you.
Omer