Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUM-1462 Fix thread safety warnings #2056

Merged
merged 3 commits into from
May 29, 2024
Merged

Conversation

xgouchet
Copy link
Member

What does this PR do?

A rule we used for a while now is using the Thread annotation (e.g.: @UIThread, @WorkerThread, …) to ensure operations that should happen in a specific context didn't cross thread expectations.

Unfortunately we had a large number of @Suppress annotations and thread boundaries crossings in our code base.

This PR provides two things :

  • enhance the rule to detekt some known thread switching (e.g. using executors) to avoid the need of legitimate @Suppress annotations
  • fix places where needed

@xgouchet xgouchet requested review from a team as code owners May 28, 2024 09:13
@codecov-commenter
Copy link

codecov-commenter commented May 28, 2024

Codecov Report

Attention: Patch coverage is 97.05882% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 82.99%. Comparing base (237641e) to head (421d25b).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2056      +/-   ##
===========================================
- Coverage    83.16%   82.99%   -0.17%     
===========================================
  Files          494      494              
  Lines        17735    17727       -8     
  Branches      2684     2682       -2     
===========================================
- Hits         14748    14712      -36     
- Misses        2252     2279      +27     
- Partials       735      736       +1     
Files Coverage Δ
...ore/src/main/kotlin/com/datadog/android/Datadog.kt 76.25% <ø> (ø)
...src/main/kotlin/com/datadog/android/api/SdkCore.kt 25.00% <ø> (ø)
...in/com/datadog/android/api/feature/FeatureScope.kt 0.00% <ø> (ø)
...n/com/datadog/android/core/internal/CoreFeature.kt 87.39% <ø> (ø)
...n/com/datadog/android/core/internal/DatadogCore.kt 82.06% <100.00%> (+0.08%) ⬆️
...in/com/datadog/android/core/internal/SdkFeature.kt 89.10% <ø> (ø)
...l/net/info/BroadcastReceiverNetworkInfoProvider.kt 93.94% <ø> (ø)
...droid/core/internal/persistence/AbstractStorage.kt 100.00% <ø> (ø)
...d/core/internal/persistence/ConsentAwareStorage.kt 98.72% <100.00%> (+0.02%) ⬆️
...ence/file/advanced/ConsentAwareFileOrchestrator.kt 96.67% <ø> (ø)
... and 41 more

... and 24 files with indirect coverage changes

@xgouchet xgouchet force-pushed the xgouchet/RUM-1462/thread_safety branch from ef5f721 to 43d5f42 Compare May 28, 2024 10:14
@xgouchet xgouchet force-pushed the xgouchet/RUM-1462/thread_safety branch from 43d5f42 to d727a7e Compare May 28, 2024 12:11
coreFeature.deleteLastViewEvent()
@Suppress("ThreadSafety") // removal of the data is done in synchronous manner
coreFeature.deleteLastFatalAnrSent()
getPersistenceExecutorService().submitSafe("Clear all data", internalLogger) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this going to create issues if we expect this method to return in the same stack immediately ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no such expectations, and it would mostly create issues if called from the main thread on a running app.

synchronized(lockedBatches) {
lockedBatches.forEach {
deleteBatch(it, RemovalReason.Flushed)
executorService.submitSafe("ConsentAwareStorage.dropAll", internalLogger) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question as above ? What if we call this from an unit test tearDown or setup method ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important

Honestly, if we need to call this from a test, we can adapt the test to this call being async. We should not make things synchronous at runtime because of testing constraints.

@@ -728,6 +729,13 @@ internal class DatadogCoreTest {
)
val mockCoreFeature = mock<CoreFeature>()
testedCore.coreFeature = mockCoreFeature
val mockExecutorService: FlushableExecutorService = mock()
whenever(mockCoreFeature.persistenceExecutorService) doReturn mockExecutorService
whenever(mockExecutorService.submit(any())) doAnswer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see so this is because we've made those methods async ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

Copy link
Member

@0xnm 0xnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but I'm curious about the behaviour of Datadog.clearAllData in iOS SDK.

@@ -22,6 +24,7 @@ interface FeatureScope {
* [DatadogContext] will have a state created at the moment this method is called, before the
* thread switch for the callback invocation.
*/
@AnyThread
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make this one as implicit in the detector in order to reduce the number of sites where we should put it? like if method is not annotated, then it is AnyThread by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, I mostly added it to make it explicit while checking all thread safety warnings

Comment on lines -185 to 195
// TODO RUM-1462 address Thread safety
@Suppress("ThreadSafety") // removal of the data is done in synchronous manner
coreFeature.deleteLastViewEvent()
@Suppress("ThreadSafety") // removal of the data is done in synchronous manner
coreFeature.deleteLastFatalAnrSent()
getPersistenceExecutorService().submitSafe("Clear all data", internalLogger) {
coreFeature.deleteLastViewEvent()
coreFeature.deleteLastFatalAnrSent()
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is done in sync way by design, so that the call on the caller side is blocking (even though it may happen on the main thread) and the same ideology as with features.values.forEach { it.clearAllData() } above. Is it blocking in iOS SDK?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked, on iOS the call is async

@@ -59,7 +69,7 @@ class ThreadSafety : Rule() {
}
}

private var parentFunGroup: ThreadGroup = ThreadGroup.UNKNOWN
private val parentFunGroupStack: MutableList<ThreadGroup> = mutableListOf(ThreadGroup.UNKNOWN)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: since we are treating it like a stack, we can use Stack type here (or Deque -> LinkedList) which will allows us simply use push/pop/peek methods instead of add(index, ...) and removeAt(index, ...)

telemetryEventHandler.handleEvent(event, writer)
// If an issue arise, use the unbound logger to prevent a cycle that would lead
// to a stack overflow
executorService.submitSafe("Telemetry event handling", InternalLogger.UNBOUND) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to create a dedicated thread for the telemetry processing? RUM monitor thread may already be busy and we don't need processing thread to be especially this one.

on another hand, even though TelemetryEventHandler.handleEvent is annotated with @WorkerThread, in reality this annotation may be redundant, because the amount of work done there before the switch to withWriteContext is very small.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, I'll change that.

@xgouchet xgouchet requested a review from 0xnm May 29, 2024 09:27
@xgouchet xgouchet merged commit 8e9ada1 into develop May 29, 2024
20 checks passed
@xgouchet xgouchet deleted the xgouchet/RUM-1462/thread_safety branch May 29, 2024 11:34
@xgouchet xgouchet added this to the 2.11.x milestone Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants