Skip to content
This repository was archived by the owner on Jun 21, 2023. It is now read-only.

Disable non-standard TypeCode #288

Merged
merged 1 commit into from
May 9, 2018
Merged

Disable non-standard TypeCode #288

merged 1 commit into from
May 9, 2018

Conversation

sloretz
Copy link
Contributor

@sloretz sloretz commented May 9, 2018

This unblocks ros2/rcl_interfaces#32 and ros2/rclcpp#443. It's a workaround for ros2/rmw_fastrtps#194. In short it disables a nonstandard feature to reduce the size of a message sent during discovery by RTI connext.

To reproduce the issue this works around:

  1. Checkout rcl_interfaces branch paremeter_type_array
  2. RMW_IMPLEMENTATION=rmw_fastrtps_cpp ros2 run demo_nodes_cpp listener
  3. RMW_IMPLEMENTATION=rmw_connext_cpp ros2 run demo_nodes_cpp talker

Without this PR the node using rmw_fastrtps_cpp will spam the console with

[RTPS_HISTORY Error] Change payload size of '5100' bytes is larger than the history payload size of '5000' bytes [...]

These fail to communicate because RTI sends a very large message during discovery and FastRTPS seems to have a fixed size buffer (5000 bytes) that is too small. The reason RTI's message is so large has to do with its support for DDS-XTYPES.

RTI connext 5.x partially supports DDS-XTYPES, but versions 4.5f and lower implemented something similar but non-standard. The standard version is TypeObject and pre-standard is TypeCode. These contain full descriptions of the message, so more message fields means more data is sent. For backwards compatibility with 4.5f and below, 5.x versions sends both a TypeObject and TypeCode.

TypeObject and TypeCode have a default maximum serialized length of 3072 and 2048 bytes respectively. A message that almost maxes both of these out would exceed FastRTPS's buffer size. However, a message that is even larger causes connext to just not send the field and fall back to matching using the topic name. This means fastrtps and default configured connext will fail to communicate if TypeObject + TypeCode + (lots of smaller stuff) > 5000 bytes and TypeObject < 3072 and TypeCode < 2048. The parameter events message with array support falls into this sweet spot.

The fix here is to set the maximum serialized length of TypeCode to zero so that connext never sends it.

CI

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@sloretz sloretz added the in review Waiting for review (Kanban column) label May 9, 2018
@sloretz sloretz self-assigned this May 9, 2018
@sloretz sloretz requested a review from mikaelarguedas May 9, 2018 02:27
@mikaelarguedas
Copy link
Member

Sounds good to me.

I guess the assumption is that the issue @nuclearsandwich faced with Opensplice<=>FastRTPS communication is different? Or does opensplice also sends additional data not in the spec that can cause this to happen?
(unfortunately the job linked in ros2/rmw_fastrtps#194 has been deleted now)

Are there any "standard scenarios" where we could hit this 5000 bytes limit ?
If there are, we should follow-up by opening an issue on Fast-RTPS to provide a way to specify our own memory strategy for endpoint (like we do for participants in e.g. https://github.com/ros2/rmw_fastrtps/blob/a9c33de826339905b737b8ab01c4719b8a240aed/rmw_fastrtps_cpp/src/rmw_publisher.cpp#L107-L108), it is currently not exposed in Fast-RTPS and default initialized here: https://github.com/eProsima/Fast-RTPS/blob/095d657e117381fd7f6b611a0db216b7df942354/include/fastrtps/rtps/attributes/HistoryAttributes.h#L42

@sloretz
Copy link
Contributor Author

sloretz commented May 9, 2018

@mikaelarguedas eProsima/Fast-DDS#213

I couldn't find an upper limit to the size of TypeObject in the DDS-XTYPES 1.2 standard, though I don't think finding a limit would change anything for us. If this PR looks good to you then lets merge it and continue with implementing parameter arrays.

@sloretz
Copy link
Contributor Author

sloretz commented May 9, 2018

@mikaelarguedas I'm assuming the opensplice and fastrtps issue is different. How reproducible is it?

@mikaelarguedas
Copy link
Member

How reproducible is it?

I'm not sure actually that's why I focused the issue description to the reproducible Fast-RTPS <=> Connext communication with my rcl_interfaces changes. Maybe @nuclearsandwich has a better recollection. As we don't run opensplice nightly I cant really provide multiple examples of failures.

If this PR looks good to you then lets merge it and continue with implementing parameter arrays.

Yeah totally, I just wanted to make sure we reported it upstream to avoid future issues as this seems to be more of a workaround than a solution

Copy link
Member

@mikaelarguedas mikaelarguedas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@sloretz sloretz merged commit bbbc754 into master May 9, 2018
@sloretz sloretz deleted the disable_typecode_connext branch May 9, 2018 17:01
@sloretz sloretz removed the in review Waiting for review (Kanban column) label May 9, 2018
@nuclearsandwich
Copy link
Member

nuclearsandwich commented May 9, 2018

unfortunately the job linked in ros2/rmw_fastrtps#194 has been deleted now

(unfortunately the job linked in ros2/rmw_fastrtps#194 has been deleted now)

I still have the log file for this job. I can't upload or gist it due to file size but I will stick it
in Drive for Open Robotics staff (link)

@nuclearsandwich has a better recollection. As we don't run opensplice nightly I cant really provide multiple examples of failures.

As far as I remember I only saw this the once while testing for connectivity issues.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants