-
-
Notifications
You must be signed in to change notification settings - Fork 376
ref: Use RunLoop observer for watchdog detection #6237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
db41d1f to
c7cfc40
Compare
|
29cd58b to
418cf15
Compare
Performance metrics 🚀
|
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 1fecbb8 | 1242.78 ms | 1265.40 ms | 22.62 ms |
| d05d866 | 1211.78 ms | 1230.96 ms | 19.18 ms |
| 32e7197 | 1226.91 ms | 1245.48 ms | 18.56 ms |
| 0e9c5ae | 1226.10 ms | 1254.14 ms | 28.04 ms |
| 397b9c9 | 1230.23 ms | 1249.29 ms | 19.06 ms |
| 718c372 | 1220.09 ms | 1235.15 ms | 15.06 ms |
| 45482a6 | 1225.88 ms | 1254.27 ms | 28.39 ms |
| f97a070 | 1218.88 ms | 1253.12 ms | 34.24 ms |
| 5846fc5 | 1223.25 ms | 1248.79 ms | 25.54 ms |
| 43597ba | 1214.88 ms | 1243.52 ms | 28.65 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 1fecbb8 | 23.75 KiB | 969.28 KiB | 945.53 KiB |
| d05d866 | 23.75 KiB | 878.60 KiB | 854.85 KiB |
| 32e7197 | 23.75 KiB | 866.69 KiB | 842.94 KiB |
| 0e9c5ae | 23.75 KiB | 969.29 KiB | 945.54 KiB |
| 397b9c9 | 23.75 KiB | 959.44 KiB | 935.70 KiB |
| 718c372 | 23.75 KiB | 920.65 KiB | 896.90 KiB |
| 45482a6 | 23.75 KiB | 919.91 KiB | 896.16 KiB |
| f97a070 | 23.75 KiB | 858.68 KiB | 834.93 KiB |
| 5846fc5 | 23.75 KiB | 912.77 KiB | 889.02 KiB |
| 43597ba | 23.75 KiB | 880.32 KiB | 856.58 KiB |
Previous results on branch: hangTrackerAbstraction
Startup times
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ac854e2 | 1220.81 ms | 1250.20 ms | 29.39 ms |
| 1e7ad41 | 1245.79 ms | 1280.67 ms | 34.88 ms |
| 547ce2a | 1226.90 ms | 1257.82 ms | 30.92 ms |
| fee2eea | 1225.57 ms | 1243.82 ms | 18.25 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ac854e2 | 23.75 KiB | 995.72 KiB | 971.97 KiB |
| 1e7ad41 | 23.75 KiB | 994.33 KiB | 970.58 KiB |
| 547ce2a | 23.75 KiB | 995.83 KiB | 972.08 KiB |
| fee2eea | 23.75 KiB | 996.53 KiB | 972.78 KiB |
d2065b3 to
ec5403e
Compare
ec5403e to
915a740
Compare
| func removeLateRunLoopObserver(id: UUID) { | ||
| queue.async { [weak self] in | ||
| guard let self else { return } | ||
| lateRunLoop.removeValue(forKey: id) | ||
| if lateRunLoop.isEmpty { | ||
| DispatchQueue.main.async { [weak self] in | ||
| if self?.finishedRunLoop.isEmpty ?? false { | ||
| self?.stop() | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential bug: Inconsistent threading in observer management creates race conditions. addLateRunLoopObserver uses async dispatch while addFinishedRunLoopObserver is synchronous, leading to potential lifecycle issues.
-
Description: The
HangTrackerclass exhibits inconsistent threading patterns for managing observers. TheaddLateRunLoopObservermethod uses a background queue and then dispatches back to the main queue, whileaddFinishedRunLoopObservermodifies state directly on the main thread. This inconsistency, coupled with complex removal logic involving cross-queue state checks (e.g.,lateRunLoop.isEmpty,finishedRunLoop.isEmpty), creates race conditions. These races can lead to the runloop observer not being stopped when all observers are removed, causing resource leaks, or being stopped prematurely, which would disable hang detection. -
Suggested fix: Ensure all observer management operations are synchronized. Either perform all modifications to
lateRunLoopandfinishedRunLoopon the main thread directly, or consistently dispatch all add/remove operations to the privatequeue. This will eliminate race conditions caused by cross-queue state checks and inconsistent execution contexts.
severity: 0.7, confidence: 0.9
Did we get this right? 👍 / 👎 to inform future reviews.
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Race Conditions in HangTracker's Thread Safety
Race conditions exist in HangTracker due to inconsistent thread safety. add/removeFinishedRunLoopObserver access finishedRunLoop directly, which is unsafe as the dictionary is also accessed on the main thread. Additionally, the remove methods' logic for stopping the tracker involves multiple asynchronous dispatches to check finishedRunLoop and lateRunLoop emptiness, creating a race that can lead to incorrect start/stop decisions.
|
What's the status of this PR? I see it is outdated and has merge conflicts due to being open for a while now. |
philprime
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost LGTM, left some high priority comments
| self.removeObserver = removeObserver | ||
| self.queue = queue | ||
| #if canImport(UIKit) && !SENTRY_NO_UIKIT && (!swift(>=5.9) || !os(visionOS)) && !os(watchOS) | ||
| var maxFPS = 60.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m: On ProMotion displays we have higher frame rates than 60, e.g. when switching apps it can go up to 120. Do we need to consider that here?
| queue.async { [weak self] in | ||
| guard let self else { return } | ||
| lateRunLoop.removeValue(forKey: id) | ||
| if lateRunLoop.isEmpty { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
h: Seer already mentioned it, I believe the two nested async statements could allow a race-condition
| func removeFinishedRunLoopObserver(id: UUID) { | ||
| finishedRunLoop.removeValue(forKey: id) | ||
| if finishedRunLoop.isEmpty { | ||
| queue.async { [weak self] in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
h: Seer already mentioned it, I believe the two nested async statements could allow a race-condition
| } | ||
|
|
||
| @objc public func start() { | ||
| dispatchPrecondition(condition: .onQueue(.main)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
h: We do not use dispatchPrecondition in the SDK because it causes runtime crashes if used incorrectly. Instead we should do a main thread check and fail softly.
| } | ||
|
|
||
| @objc public func stop() { | ||
| dispatchPrecondition(condition: .onQueue(.main)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
h: See above wrt. not using dispatchPrecondition
| private var hangId = UUID() | ||
|
|
||
| private func waitForHang(semaphore: DispatchSemaphore, started: TimeInterval, isStarting: Bool) { | ||
| dispatchPrecondition(condition: .onQueue(queue)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
h: We do not use dispatchPrecondition in the SDK because it causes runtime crashes. Please use a soft or dispatching to main
| } | ||
|
|
||
| private func stop() { | ||
| dispatchPrecondition(condition: .onQueue(.main)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
h: We do not use dispatchPrecondition in the SDK because it causes runtime crashes. Please use a soft or dispatching to main
| @_spi(Private) import SentryTestUtils | ||
| import XCTest | ||
|
|
||
| struct TestRunLoopObserver: RunLoopObserver { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m: While I believe this to be a valid unit test, I am not 100% sure if I understand the mocking here. Do we have a test case using a real runloop to test this feature?
|
FYI @noahsmartin thanks for creating this draft and doing the heavy lifting. |
Semver Impact of This PR🟢 Patch (bug fixes) 📋 Changelog PreviewThis is how your changes will appear in the changelog. New Features ✨
Bug Fixes 🐛
Build / dependencies / internal 🔧Deps
Other
Other
🤖 This preview updates automatically when you update the PR. |
|
|
||
| let finished = tracker.addFinishedRunLoopObserver { [weak self] _ in | ||
| self?.hangStopped() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hangStopped called on every run loop iteration unnecessarily
Medium Severity
The hangStopped callback is invoked unconditionally on every run loop iteration completion, not just when a hang actually resolves. This triggers updateAppState to set isANROngoing = NO repeatedly, even during normal operation when no hang occurred. The callback should only be invoked after hangStarted was previously called. The currentHangId is tracked but not used to gate hangStopped calls, causing unnecessary state updates and semantic incorrectness.
| } | ||
|
|
||
| private func stop() { | ||
| dispatchPrecondition(condition: .onQueue(.main)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SDK prohibits dispatchPrecondition due to crash risk
Medium Severity · Bugbot Rules
The PR discussion explicitly states "We do not use dispatchPrecondition in the SDK because it causes runtime crashes if used incorrectly. Instead we should do a main thread check and fail softly." This pattern is used in 4 locations: stop(), waitForHang(), and both start() and stop() methods of the Swift integration class. Per SDK guidelines, these should use soft thread checks that log warnings rather than assertions that crash.
Additional Locations (2)
| } | ||
| removeObserver(CFRunLoopGetMain(), observer, .commonModes) | ||
| self.observer = nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Background hang detection loop never terminates after stop
Medium Severity
The stop() method removes the run loop observer but doesn't signal the semaphore to terminate waitForHang. Once started, waitForHang recursively calls itself, only exiting when the semaphore is signaled. After stop(), the observer is gone so .beforeWaiting never fires to signal. The result is an infinite loop on the background queue that keeps timing out and recursing, consuming CPU indefinitely even after the tracker is supposed to be stopped.
Additional Locations (1)
| initWithHangTrackerBridge:SentryDependencyContainer.sharedInstance.hangTracker | ||
| timeoutInterval:options.appHangTimeoutInterval | ||
| hangStarted:^{ [dispatchQueueWrapper dispatchAsyncWithBlock:^{ [self hangStarted]; }]; } | ||
| hangStopped:^{ [dispatchQueueWrapper dispatchAsyncWithBlock:^{ [self hangStopped]; }]; }]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retain cycle causes memory leak in integration
Medium Severity
The hangStarted and hangStopped blocks passed to swiftImpl capture self strongly via [self hangStarted] and [self hangStopped]. Since self holds a strong reference to swiftImpl, and swiftImpl holds these closures which capture self, this creates a retain cycle. In uninstall, swiftImpl is not set to nil, so the cycle is never broken and the integration object leaks when the SDK is closed.
This introduces a new
HangTrackerclass that observers the run loop and notifies anything registered with it when the runloop is late. The idea is to eventually replace all uses of CADisplayLink with this. They are not exactly the same, display link works on vsync events and a runloop observer hooks directly into the runloop lifecycle.In this PR I use the new HangTracker in the watchdog integration. This seemed like a pretty small use of display links and was a good opportunity to have a first crack at removing them. I also don't think this change requires any kind of new opt-in options, since it's all internal to how the integration works.
The SDK default behavior before this change is to enable the watchdog integration, which means by default the SDK spins the CPU in the background constantly waking up to process these vsync events. Since we are in the final stretch of quality quarter I wanted to get rid of this behavior and replace it with a passive observer, like this.
A few interesting implementation details:
I also manually tested in the swift sample app by making sure when the debugger wasn't attached watchdog events still get sent when I force-quite the app with Xcode. Also verified that if I simulate a hang with the sample app and then force quit while the hang is in progress, watchdog events are not sent.
Closes #6252