Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak that string not deleted. #224

Merged
merged 2 commits into from
Aug 26, 2020

Conversation

iuhilnehc-ynos
Copy link

Related to ros2/rcl#721

memory leak log

==23528== 29 bytes in 1 blocks are definitely lost in loss record 1 of 72
==23528==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23528==    by 0x1DAA6A: void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>(char const*, char const*, std::forward_iterator_tag) (basic_string.tcc:219)
==23528==    by 0x5D45786: cycdeser::deserialize(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (serdes.hpp:165)
==23528==    by 0x5D450F4: cycdeser::operator>>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) (serdes.hpp:113)
==23528==    by 0x5D4657B: void rmw_cyclonedds_cpp::deserialize_field<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(rosidl_typesupport_introspection_cpp::MessageMember const*, void*, cycdeser&, bool) (TypeSupport_impl.hpp:281)
==23528==    by 0x5D503D6: rmw_cyclonedds_cpp::TypeSupport<rosidl_typesupport_introspection_cpp::MessageMembers>::deserializeROSmessage(cycdeser&, rosidl_typesupport_introspection_cpp::MessageMembers const*, void*, bool) (TypeSupport_impl.hpp:514)
==23528==    by 0x5D5051A: rmw_cyclonedds_cpp::TypeSupport<rosidl_typesupport_introspection_cpp::MessageMembers>::deserializeROSmessage(cycdeser&, rosidl_typesupport_introspection_cpp::MessageMembers const*, void*, bool) (TypeSupport_impl.hpp:542)
==23528==    by 0x5D4AA30: rmw_cyclonedds_cpp::TypeSupport<rosidl_typesupport_introspection_cpp::MessageMembers>::deserializeROSmessage(cycdeser&, void*, std::function<void (cycdeser&)>) (TypeSupport_impl.hpp:669)
==23528==    by 0x5D9734E: serdata_rmw_to_sample(ddsi_serdata const*, void*, void**, void*) (serdata.cpp:288)
==23528==    by 0x80CD643: ddsi_serdata_to_sample (ddsi_serdata.h:230)
==23528==    by 0x814B813: read_take_to_sample (dds_rhc_default.c:1956)
==23528==    by 0x814BF68: take_w_qminv_inst (dds_rhc_default.c:2091)

The patch is to make sure deleting ParticipantEntitiesInfo after calling rmw_take each time.

Signed-off-by: Chen.Lihui lihui.chen@sony.com

Signed-off-by: Chen.Lihui <lihui.chen@sony.com>
Copy link
Collaborator

@eboasson eboasson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @iuhilnehc-ynos for figuring this out and proposing a fix. This looks good to me, the only thing I wonder about is the use of a unique_ptr like this — and that’s purely as a matter of style. I’m not a good judge of C++ style, so I’d like to wait for a review from one of the other people who regularly contribute here before I press merge.

@ivanpauno
Copy link
Member

I don't see how this fixes a leak.

Before ParticipantEntitiesInfo was allocated in the stack, and the destructor was cleaning up all its resources.
Allocating the same message in the heap isn't going to fix a leak.

@eboasson
Copy link
Collaborator

I don't see how this fixes a leak.

Before ParticipantEntitiesInfo was allocated in the stack, and the destructor was cleaning up all its resources.
Allocating the same message in the heap isn't going to fix a leak.

I figured that’s taken care of by the reset call on the unique pointer in the loop and its running out of scope. This is what I found when I googled it:

std::unique_ptr::reset
Destroys the object currently managed by the unique_ptr (if any) and takes ownership of p. If p is a null pointer (such as a default-initialized pointer), the unique_ptr becomes empty, managing no object after the call.

to me it looks like that means the destructor runs exactly once for each successful call to take and that eliminates the problem (from my understanding of the leak and admittedly limited knowledge of the vagaries of C++).

@ivanpauno
Copy link
Member

ivanpauno commented Aug 24, 2020

Using take with the same message repeatedly should not leak.
If that's happening, the typesupport implementation might be corrupting data (constructing a new string/vec without calling the destructor of the previous one).

I can try to dig in the current typesupport implementation, but the patch proposed here doesn't seem to be correct (it's hiding a leak in take by allocating the message in the heap and destructing it repeatedly).

@fujitatomoya
Copy link
Contributor

@iuhilnehc-ynos

as @ivanpauno mentioned previously, I do not see how this fixes the memory leak either. As far as i see here is to change allocation area into heap from stack.

@iuhilnehc-ynos
Copy link
Author

@ivanpauno @fujitatomoya

Let me try to explain what happened before.

  ParticipantEntitiesInfo msg;
  bool taken;
  while (rmw_take(impl->common.sub, &msg, &taken, nullptr) == RMW_RET_OK && taken) {
    // locally published data is filtered because of the subscription QoS
    impl->common.graph_cache.update_participant_entities(msg);
  }
  • step 1

ParticipantEntitiesInfo msg;

  • step 2

rmw_take(impl->common.sub, &msg, &taken, nullptr)

std::string node_name of NodeEntitiesInfo in ParticipantEntitiesInfo will be reallocated if the assigned buffer is long enough.

  • step 3

and then if there is another message need to take immediately, it means msg is reused

rmw_take(impl->common.sub, &msg, &taken, nullptr)

The node_name of NodeEntitiesInfo in ParticipantEntitiesInfo in step 2 is no opportunity to free.

So we need to call the destructor of ParticipantEntitiesInfo every time after update_participant_entities.

That's my understanding.

@iuhilnehc-ynos
Copy link
Author

@ivanpauno

(constructing a new string/vec without calling the destructor of the previous one).

  • Inside rmw_take, we don't know if the msg is reused or not (pass the address of msg into rmw_take, and there is no flag/member to identify), so I think we can't destruct the msg if it's reused.
  • To destruct it at the beginning every time also seems not sensible.

@fujitatomoya

As far as i see here is to change allocation area into heap from stack.

Actually, using stack is also working but it needs to change the original logic.
such as,

  do {
    ParticipantEntitiesInfo msg;
    if (rmw_take(impl->common.sub, &msg, &taken, nullptr) == RMW_RET_OK && taken) {
      impl->common.graph_cache.update_participant_entities(msg);
    } else {
      break;
    }
  } while (1);

@@ -551,11 +551,12 @@ static void handle_ParticipantEntitiesInfo(dds_entity_t reader, void * arg)
{
static_cast<void>(reader);
rmw_context_impl_t * impl = static_cast<rmw_context_impl_t *>(arg);
ParticipantEntitiesInfo msg;
auto msg = std::unique_ptr<ParticipantEntitiesInfo>(new ParticipantEntitiesInfo);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we agree on this kind of change finally, I'll update it with make_unique.

@fujitatomoya
Copy link
Contributor

std::string node_name of NodeEntitiesInfo in ParticipantEntitiesInfo will be reallocated if the assigned buffer is long enough.

while (dds_take(sub->enth, &ros_message, &info, 1, 1) == 1) {

call buffer size with 1.

https://github.com/eclipse-cyclonedds/cyclonedds/blob/eb12a8da00b403a1458513873faae8807e13fcd7/src/core/ddsc/src/dds_read.c#L65
check 1st pointer, and only if it is NULL, reallocation will be done. actually it assumes all or none.

static void sertopic_rmw_realloc_samples(
void ** ptrs, const struct ddsi_sertopic * d, void * old,
size_t oldcount, size_t count)
{
static_cast<void>(ptrs);
static_cast<void>(d);
static_cast<void>(old);
static_cast<void>(oldcount);
static_cast<void>(count);
/* Not using code paths that rely on this (loans, dispose, unregister with instance handle,
content filters) */
abort();
}

besides, sertopic_rmw_realloc_samples is not implemented yet.

I am not sure how reallocation happens in this implementation...i may be missing something.

@iuhilnehc-ynos
Copy link
Author

@fujitatomoya

string reallocated is happened at

x = std::string(data + pos, sz - 1);

  • message definition

rmw_dds_common/rmw_dds_common/msg

    ParticipantEntitiesInfo.msg

        Gid gid
        NodeEntitiesInfo[] node_entities_info_seq
        
        
    NodeEntitiesInfo.msg

        string<=256 node_namespace
        string<=256 node_name
        ...
  • message structure information

build/rmw_dds_common/rosidl_typesupport_introspection_cpp/rmw_dds_common/msg/detail/participant_entities_info__type_support.cpp

  static const ::rosidl_typesupport_introspection_cpp::MessageMember ParticipantEntitiesInfo_message_member_array[2] = {
   ...
  {
    "node_entities_info_seq",  // name
    ::rosidl_typesupport_introspection_cpp::ROS_TYPE_MESSAGE,  // type
    0,  // upper bound of string
    ::rosidl_typesupport_introspection_cpp::get_message_type_support_handle<rmw_dds_common::msg::NodeEntitiesInfo>(),  // members of sub message
    true,  // is array
    0,  // array size
    false,  // is upper bound
  • deserialize message data into ROS message

&msg -> ... -> TypeSupport::deserializeROSmessage() -> (msg is ROS_TYPE_MESSAGE, is array, array size is 0) -> get_submessage_array_deserialize ->

rmw_cyclonedds_cpp/include/rmw_cyclonedds_cpp/TypeSupport_impl.hpp

inline size_t get_submessage_array_deserialize(
  const rosidl_typesupport_introspection_cpp::MessageMember * member,
  cycdeser & deser,
  void * field,
  void * & subros_message,
  bool call_new,
  size_t sub_members_size,
  size_t max_align)
{
  ...
  auto vector = reinterpret_cast<std::vector<unsigned char> *>(field);
  if (call_new) {
    new(vector) std::vector<unsigned char>;
  }
  
  subros_message = reinterpret_cast<void *>(vector->data());            // it's used for NodeEntitiesInfo

-> continue to call deserializeROSmessage for NodeEntitiesInfo

rmw_cyclonedds_cpp/include/rmw_cyclonedds_cpp/TypeSupport_impl.hpp

template<>
inline void deserialize_field<std::string>(
  const rosidl_typesupport_introspection_cpp::MessageMember * member,
  void * field,
  cycdeser & deser,
  bool call_new)
{
  new(field) std::string();
  
  deser >> *static_cast<std::string *>(field);

->

rmw_cyclonedds_cpp/include/rmw_cyclonedds_cpp/serdes.hpp

    inline cycdeser & operator>>(std::string & x) {deserialize(x); return *this;}
    inline void deserialize(std::string & x)
    {
      x = std::string(data + pos, sz - 1);          // it will reallocated if data+pos is long enough. We can print the x.data() address both at front and at back. if it's reallocated, the std::string of NodeEntitiesInfo in ParticipantEntitiesInfo must be deleted. (Notice that reallocation happened inside std::string.)

The related code is so much, I just provided a little here. Hope these can make you understand more clearly.

@iuhilnehc-ynos
Copy link
Author

@fujitatomoya

It's not related to

while (dds_take(sub->enth, &ros_message, &info, 1, 1) == 1) {

Please refer to
static void handle_ParticipantEntitiesInfo(dds_entity_t reader, void * arg)
{
static_cast<void>(reader);
rmw_context_impl_t * impl = static_cast<rmw_context_impl_t *>(arg);
ParticipantEntitiesInfo msg;
bool taken;
while (rmw_take(impl->common.sub, &msg, &taken, nullptr) == RMW_RET_OK && taken) {
// locally published data is filtered because of the subscription QoS
impl->common.graph_cache.update_participant_entities(msg);
}
}

@fujitatomoya
Copy link
Contributor

@iuhilnehc-ynos

The related code is so much, I just provided a little here. Hope these can make you understand more clearly.

okay i was digging deeper 😄 this helps, thanks

@fujitatomoya
Copy link
Contributor

I wonder if reallocation is expected to happen here,

x = std::string(data + pos, sz - 1);

it seems that bounded strings are intentionally used,
https://github.com/ros2/rmw_dds_common/blob/0c7fe8f37105351bbd1b6cea159f5c086a11c61b/rmw_dds_common/msg/NodeEntitiesInfo.msg#L1-L2

@ivanpauno
Copy link
Member

Inside rmw_take, we don't know if the msg is reused or not (pass the address of msg into rmw_take, and there is no flag/member to identify), so I think we can't destruct the msg if it's reused.

It's known that the passed message is a valid object (i.e.: it was at least default constructed).
Knowing that, all cases can be handled correctly.

Actually, using stack is also working but it needs to change the original logic.

That example shows the real problem.

The issue is that:

MessageType msg;
rmw_take(subscription, &msg, &taken, nullptr);
rmw_take(subscription, &msg, &taken, nullptr);

is leaking when message is not a POD data structure (because of the typesupport corrupting the previous object).

@ivanpauno
Copy link
Member

I think that the problem is that the typesupport is calling placement new over the memory of an existing object:


and thus leaks.
(calling placement new should not be needed in the typesupport implementation)

@iuhilnehc-ynos
Copy link
Author

calling placement new should not be needed in the typesupport implementation

If not calling placement new for field, this field can't be assigned with a std::string instance.

@ivanpauno
Copy link
Member

If not calling placement new for field, this field can't be assigned with a std::string instance.

why?

@iuhilnehc-ynos
Copy link
Author

// Because std::string is a complex datatype, we need to make sure that
// the memory is initialized to something reasonable before eventually
// passing it as a reference.
is the reason.

this filed for std::string here is resized at

vector->resize(vsize * align_int_(max_align, sub_members_size));
subros_message = reinterpret_cast<void *>(vector->data());
which is not a pointer of std::string.

@ivanpauno
Copy link
Member

this filed for std::string here is resized at

the code doing the resizing is broken, and the code doing placement new is broken.
As commented above, the passed message is a valid object, calling placement new on its memory is wrong.

e.g.:

# message A
string data
# message B
MessageA[] data
MessageB msg;
msg.data.resize(10);
rmw_take(subscription, &msg, taken, null);  // here a leak, because placement new is corrupting the old memory

I see this was copied from rmw_fastrtps_dynamic_cpp 🤦‍♂️, I will try to fix both.

@ivanpauno
Copy link
Member

@iuhilnehc-ynos for the moment, we can merge a fix as the one you proposed above:

  do {
    ParticipantEntitiesInfo msg;
    if (rmw_take(impl->common.sub, &msg, &taken, nullptr) == RMW_RET_OK && taken) {
      impl->common.graph_cache.update_participant_entities(msg);
    } else {
      break;
    }
  } while (1);

We can follow-up in the broken typesupport implementation later.

@ivanpauno
Copy link
Member

Maybe, the current one is also fine. Just add a TODO explaining why the workaround is needed.

@iuhilnehc-ynos
Copy link
Author

@ivanpauno

Thanks for your support. I prefer to use unique_ptr version because of reducing the number of calling constructor.
I'll update it later (maybe tomorrow).

Signed-off-by: Chen.Lihui <lihui.chen@sony.com>
@iuhilnehc-ynos
Copy link
Author

I prefer to use unique_ptr

Sorry, there will be a problem(not calling constructor after reset()), so I updated it by using the latter workaround.

Copy link
Contributor

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@ivanpauno ivanpauno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @eboasson what do you think?

@eboasson
Copy link
Collaborator

@ivanpauno LGTM too!

@ivanpauno
Copy link
Member

ivanpauno commented Aug 26, 2020

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@ivanpauno ivanpauno merged commit 416c01d into ros2:master Aug 26, 2020
@ivanpauno
Copy link
Member

Thanks for the fix @iuhilnehc-ynos !

iuhilnehc-ynos pushed a commit to iuhilnehc-ynos/rmw_cyclonedds that referenced this pull request Aug 28, 2020
This reverts commit 416c01d.

Signed-off-by: Chen.Lihui <lihui.chen@sony.com>
iuhilnehc-ynos pushed a commit to iuhilnehc-ynos/rmw_cyclonedds that referenced this pull request Sep 1, 2020
This reverts commit 416c01d.

Signed-off-by: Chen.Lihui <lihui.chen@sony.com>
eboasson pushed a commit to eboasson/rmw_cyclonedds that referenced this pull request May 17, 2021
Signed-off-by: Chen.Lihui <lihui.chen@sony.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants