Skip to content
This repository was archived by the owner on Feb 3, 2025. It is now read-only.

Fix performance metrics deadlock #2917

Merged
merged 2 commits into from
Jan 21, 2021

Conversation

scpeters
Copy link
Member

As reported in #2902, there is a deadlock in SensorManager when subscribing to /gazebo/performance_metrics in a world that has sensors in more than one container (i.e. more than one category of sensors). This can be reproduced by loading the pr2.world with --lockstep and then viewing the /gazebo/performance_metrics topic in the Topic Viewer. I've added a regression test in a9e1ee3 that reproduces this deadlock.

To fix the deadlock, I noticed that the PublishPerformanceMetrics is called by each SensorContainer with protection by a mutex, which causes the deadlock. Since that function is not a member function and uses only global variables, it does not need to be run by separate threads. So I've moved it to SensorManager::Update in 0f48128, and that seems to fix the deadlock.

Load the pr2 world with --lockstep and then subscribe to
/gazebo/performance_metrics to reproduce the deadlock.

Signed-off-by: Steve Peters <scpeters@openrobotics.org>
Currently the PublishPerformanceMetrics function is called within
SensorManager.cc by each SensorContainer with protection by the
mutex of each SensorContainer. This leads to a deadlock when a
world has sensors in more than one container (i.e. more than one
category of sensors) as described in gazebosim#2902.

Since the PublishPerformanceMetrics function is not a member
function of SensorContainer and uses only global variables,
the function can be called by the SensorManager thread instead
of by each SensorContainer thread. This change will avoid the
deadlock.

Signed-off-by: Steve Peters <scpeters@openrobotics.org>
@scpeters scpeters merged commit ca1709d into gazebosim:gazebo9 Jan 21, 2021
@scpeters scpeters deleted the performance_metrics_deadlock branch January 21, 2021 07:53
scpeters added a commit to scpeters/gazebo that referenced this pull request Jan 26, 2021
* Add test for performance metrics deadlock

Load the pr2 world with --lockstep and then subscribe to
/gazebo/performance_metrics to reproduce the deadlock.

* Fix deadlock when publishing performance metrics

Currently the PublishPerformanceMetrics function is called within
SensorManager.cc by each SensorContainer with protection by the
mutex of each SensorContainer. This leads to a deadlock when a
world has sensors in more than one container (i.e. more than one
category of sensors) as described in gazebosim#2902.

Since the PublishPerformanceMetrics function is not a member
function of SensorContainer and uses only global variables,
the function can be called by the SensorManager thread instead
of by each SensorContainer thread. This change will avoid the
deadlock.

Signed-off-by: Steve Peters <scpeters@openrobotics.org>
scpeters added a commit to scpeters/gazebo that referenced this pull request Jan 26, 2021
* Add test for performance metrics deadlock

Load the pr2 world with --lockstep and then subscribe to
/gazebo/performance_metrics to reproduce the deadlock.

* Fix deadlock when publishing performance metrics

Currently the PublishPerformanceMetrics function is called within
SensorManager.cc by each SensorContainer with protection by the
mutex of each SensorContainer. This leads to a deadlock when a
world has sensors in more than one container (i.e. more than one
category of sensors) as described in gazebosim#2902.

Since the PublishPerformanceMetrics function is not a member
function of SensorContainer and uses only global variables,
the function can be called by the SensorManager thread instead
of by each SensorContainer thread. This change will avoid the
deadlock.

Signed-off-by: Steve Peters <scpeters@openrobotics.org>
scpeters added a commit that referenced this pull request Jan 28, 2021
* Add test for performance metrics deadlock

Load the pr2 world with --lockstep and then subscribe to
/gazebo/performance_metrics to reproduce the deadlock.

* Fix deadlock when publishing performance metrics

Currently the PublishPerformanceMetrics function is called within
SensorManager.cc by each SensorContainer with protection by the
mutex of each SensorContainer. This leads to a deadlock when a
world has sensors in more than one container (i.e. more than one
category of sensors) as described in #2902.

Since the PublishPerformanceMetrics function is not a member
function of SensorContainer and uses only global variables,
the function can be called by the SensorManager thread instead
of by each SensorContainer thread. This change will avoid the
deadlock.

Signed-off-by: Steve Peters <scpeters@openrobotics.org>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants