Skip to content

Conversation

@noahsmartin
Copy link
Contributor

@noahsmartin noahsmartin commented Sep 22, 2025

This introduces a new HangTracker class that observers the run loop and notifies anything registered with it when the runloop is late. The idea is to eventually replace all uses of CADisplayLink with this. They are not exactly the same, display link works on vsync events and a runloop observer hooks directly into the runloop lifecycle.

In this PR I use the new HangTracker in the watchdog integration. This seemed like a pretty small use of display links and was a good opportunity to have a first crack at removing them. I also don't think this change requires any kind of new opt-in options, since it's all internal to how the integration works.

The SDK default behavior before this change is to enable the watchdog integration, which means by default the SDK spins the CPU in the background constantly waking up to process these vsync events. Since we are in the final stretch of quality quarter I wanted to get rid of this behavior and replace it with a passive observer, like this.

A few interesting implementation details:

  • The hang tracker turns itself on and off by removing the runloop observer when nothing is subscribing to changes anymore.
  • The majority of the code is Swift only, with a small ObjcBridge class wrapping the main implementation
  • I used protocol based decency injection rather than subclassing, since this code can now be fully "swifty"

I also manually tested in the swift sample app by making sure when the debugger wasn't attached watchdog events still get sent when I force-quite the app with Xcode. Also verified that if I simulate a hang with the sample app and then force quit while the hang is in progress, watchdog events are not sent.

Closes #6252

@noahsmartin noahsmartin force-pushed the hangTrackerAbstraction branch from db41d1f to c7cfc40 Compare September 22, 2025 19:19
@codecov
Copy link

codecov bot commented Sep 22, 2025

⚠️ JUnit XML file not found

The CLI was unable to find any JUnit XML files to upload.
For more help, visit our troubleshooting guide.

@noahsmartin noahsmartin force-pushed the hangTrackerAbstraction branch 3 times, most recently from 29cd58b to 418cf15 Compare September 22, 2025 20:07
@github-actions
Copy link
Contributor

github-actions bot commented Sep 22, 2025

Performance metrics 🚀

  Plain With Sentry Diff
Startup time 1236.58 ms 1265.17 ms 28.59 ms
Size 23.75 KiB 996.49 KiB 972.75 KiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
1fecbb8 1242.78 ms 1265.40 ms 22.62 ms
d05d866 1211.78 ms 1230.96 ms 19.18 ms
32e7197 1226.91 ms 1245.48 ms 18.56 ms
0e9c5ae 1226.10 ms 1254.14 ms 28.04 ms
397b9c9 1230.23 ms 1249.29 ms 19.06 ms
718c372 1220.09 ms 1235.15 ms 15.06 ms
45482a6 1225.88 ms 1254.27 ms 28.39 ms
f97a070 1218.88 ms 1253.12 ms 34.24 ms
5846fc5 1223.25 ms 1248.79 ms 25.54 ms
43597ba 1214.88 ms 1243.52 ms 28.65 ms

App size

Revision Plain With Sentry Diff
1fecbb8 23.75 KiB 969.28 KiB 945.53 KiB
d05d866 23.75 KiB 878.60 KiB 854.85 KiB
32e7197 23.75 KiB 866.69 KiB 842.94 KiB
0e9c5ae 23.75 KiB 969.29 KiB 945.54 KiB
397b9c9 23.75 KiB 959.44 KiB 935.70 KiB
718c372 23.75 KiB 920.65 KiB 896.90 KiB
45482a6 23.75 KiB 919.91 KiB 896.16 KiB
f97a070 23.75 KiB 858.68 KiB 834.93 KiB
5846fc5 23.75 KiB 912.77 KiB 889.02 KiB
43597ba 23.75 KiB 880.32 KiB 856.58 KiB

Previous results on branch: hangTrackerAbstraction

Startup times

Revision Plain With Sentry Diff
ac854e2 1220.81 ms 1250.20 ms 29.39 ms
1e7ad41 1245.79 ms 1280.67 ms 34.88 ms
547ce2a 1226.90 ms 1257.82 ms 30.92 ms
fee2eea 1225.57 ms 1243.82 ms 18.25 ms

App size

Revision Plain With Sentry Diff
ac854e2 23.75 KiB 995.72 KiB 971.97 KiB
1e7ad41 23.75 KiB 994.33 KiB 970.58 KiB
547ce2a 23.75 KiB 995.83 KiB 972.08 KiB
fee2eea 23.75 KiB 996.53 KiB 972.78 KiB

@noahsmartin noahsmartin changed the title Hang tracker abstraction ref: Use RunLoop observer for watchdog detection Sep 23, 2025
@noahsmartin noahsmartin force-pushed the hangTrackerAbstraction branch 7 times, most recently from d2065b3 to ec5403e Compare September 23, 2025 23:06
Comment on lines +76 to +88
func removeLateRunLoopObserver(id: UUID) {
queue.async { [weak self] in
guard let self else { return }
lateRunLoop.removeValue(forKey: id)
if lateRunLoop.isEmpty {
DispatchQueue.main.async { [weak self] in
if self?.finishedRunLoop.isEmpty ?? false {
self?.stop()
}
}
}
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential bug: Inconsistent threading in observer management creates race conditions. addLateRunLoopObserver uses async dispatch while addFinishedRunLoopObserver is synchronous, leading to potential lifecycle issues.
  • Description: The HangTracker class exhibits inconsistent threading patterns for managing observers. The addLateRunLoopObserver method uses a background queue and then dispatches back to the main queue, while addFinishedRunLoopObserver modifies state directly on the main thread. This inconsistency, coupled with complex removal logic involving cross-queue state checks (e.g., lateRunLoop.isEmpty, finishedRunLoop.isEmpty), creates race conditions. These races can lead to the runloop observer not being stopped when all observers are removed, causing resource leaks, or being stopped prematurely, which would disable hang detection.

  • Suggested fix: Ensure all observer management operations are synchronized. Either perform all modifications to lateRunLoop and finishedRunLoop on the main thread directly, or consistently dispatch all add/remove operations to the private queue. This will eliminate race conditions caused by cross-queue state checks and inconsistent execution contexts.
    severity: 0.7, confidence: 0.9

Did we get this right? 👍 / 👎 to inform future reviews.

}
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Race Conditions in HangTracker's Thread Safety

Race conditions exist in HangTracker due to inconsistent thread safety. add/removeFinishedRunLoopObserver access finishedRunLoop directly, which is unsafe as the dictionary is also accessed on the main thread. Additionally, the remove methods' logic for stopping the tracker involves multiple asynchronous dispatches to check finishedRunLoop and lateRunLoop emptiness, creating a race that can lead to incorrect start/stop decisions.

Fix in Cursor Fix in Web

@philprime
Copy link
Member

What's the status of this PR? I see it is outdated and has merge conflicts due to being open for a while now.

Copy link
Member

@philprime philprime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost LGTM, left some high priority comments

self.removeObserver = removeObserver
self.queue = queue
#if canImport(UIKit) && !SENTRY_NO_UIKIT && (!swift(>=5.9) || !os(visionOS)) && !os(watchOS)
var maxFPS = 60.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m: On ProMotion displays we have higher frame rates than 60, e.g. when switching apps it can go up to 120. Do we need to consider that here?

queue.async { [weak self] in
guard let self else { return }
lateRunLoop.removeValue(forKey: id)
if lateRunLoop.isEmpty {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h: Seer already mentioned it, I believe the two nested async statements could allow a race-condition

func removeFinishedRunLoopObserver(id: UUID) {
finishedRunLoop.removeValue(forKey: id)
if finishedRunLoop.isEmpty {
queue.async { [weak self] in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h: Seer already mentioned it, I believe the two nested async statements could allow a race-condition

}

@objc public func start() {
dispatchPrecondition(condition: .onQueue(.main))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h: We do not use dispatchPrecondition in the SDK because it causes runtime crashes if used incorrectly. Instead we should do a main thread check and fail softly.

}

@objc public func stop() {
dispatchPrecondition(condition: .onQueue(.main))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h: See above wrt. not using dispatchPrecondition

private var hangId = UUID()

private func waitForHang(semaphore: DispatchSemaphore, started: TimeInterval, isStarting: Bool) {
dispatchPrecondition(condition: .onQueue(queue))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h: We do not use dispatchPrecondition in the SDK because it causes runtime crashes. Please use a soft or dispatching to main

}

private func stop() {
dispatchPrecondition(condition: .onQueue(.main))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

h: We do not use dispatchPrecondition in the SDK because it causes runtime crashes. Please use a soft or dispatching to main

@_spi(Private) import SentryTestUtils
import XCTest

struct TestRunLoopObserver: RunLoopObserver { }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m: While I believe this to be a valid unit test, I am not 100% sure if I understand the mocking here. Do we have a test case using a real runloop to test this feature?

@philprime philprime self-assigned this Jan 14, 2026
@philprime
Copy link
Member

FYI @noahsmartin thanks for creating this draft and doing the heavy lifting.
I am going to take over this PR to get it finished.

@github-actions
Copy link
Contributor

Semver Impact of This PR

🟢 Patch (bug fixes)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

  • Add isiOSAppOnVisionOS, isiOSAppOnMac, isMacCatalystApp to device context by philprime in #6939

Bug Fixes 🐛

  • (logs) Use sendDefaultPii and span_id for attributes by philprime in #7055
  • Fix incorrect variable assignment for 'sampled' key by xjshi in #7120
  • Mark dark theme deprecated by noahsmartin in #7114
  • Update raw_description in runtime context for Mac Catalyst App by philprime in #7082
  • Use correct parsing for stackframes by noahsmartin in #6908
  • Transport correctly handling 4xx and 5xx by dfed in #6618

Build / dependencies / internal 🔧

Deps

  • Bump ruby/setup-ruby from 1.279.0 to 1.281.0 by dependabot in #7160
  • Bump getsentry/craft from 2.18.1 to 2.18.3 by dependabot in #7161
  • Bump getsentry/craft/.github/workflows/changelog-preview.yml from 2.18.1 to 2.18.3 by dependabot in #7159
  • Bump ruby/setup-ruby from 1.276.0 to 1.279.0 by dependabot in #7117
  • Bump mikepenz/action-junit-report from 6.0.1 to 6.1.0 by dependabot in #7116
  • Update swiftlint version by github-actions in #7109
  • Bump ruby/setup-ruby from 1.275.0 to 1.276.0 by dependabot in #7103
  • Bump codecov/test-results-action from 1.1.1 to 1.2.1 by itaybre in #7087
  • Bump ruby/setup-ruby from 1.270.0 to 1.275.0 by itaybre in #7088
  • Bump peter-evans/create-pull-request from 7.0.11 to 8.0.0 by dependabot in #7084
  • Bump actions/download-artifact from 6 to 7 by dependabot in #7048
  • Bump aws-sdk-s3 from 1.205.0 to 1.208.0 by dependabot in #7074
  • Bump ruby/setup-ruby from 1.269.0 to 1.270.0 by dependabot in #7049
  • Update clang-format version by github-actions in #7056
  • Bump actions/cache from 4 to 5 by dependabot in #7052
  • Bump actions/upload-artifact from 5 to 6 by dependabot in #7050
  • Bump codecov/codecov-action from 5.5.1 to 5.5.2 by dependabot in #7051

Other

  • (dx) Add structured Makefile with usage description by philprime in #7129
  • (release) Switch from action-prepare-release to Craft (minimal) by BYK in #7153
  • Use RunLoop observer for watchdog detection by noahsmartin in #6237
  • Update iOS test destination OS version to 18.5 in fast-pr-checks workflow by itaybre in #7168
  • Fix typos in comments in multiple files v2 by philipphofmann in #7139
  • Run visionOS tests on Cirrus Runners + Boot simulator by itaybre in #7147
  • Skip jobs/steps that require secrets for non contributors by itaybre in #7124
  • Add attributable protocol for typed attribute values by philprime in #7077
  • Allow alpha releases on RNSentry.podspec for Cross Platform Test by itaybre in #7130
  • Remove swift5.9 checks by itaybre in #7098
  • Remove duplicate file in project by itaybre in #7093
  • Convert SentryMetricKitIntegration to Swift by noahsmartin in #7076
  • Removes HybridSDK subspec by itaybre in #7019
  • Move testRemoveImageFromTail to flaky plan by itaybre in #7041
  • Use at least xcode 16 for all jobs by itaybre in #7012
  • Cleanup file filter for required files modified by itaybre in #7031
  • Remove assembly workflow files from UI test filter by itaybre in #7030
  • Bumps macOS-14 runner to macOS-15 by itaybre in #7029
  • Ensure required simulators are loaded for all platforms by itaybre in #7022

Other

  • test: Add Options Documentation Sync Tests by philipphofmann in #7075

🤖 This preview updates automatically when you update the PR.

@philprime philprime marked this pull request as draft January 15, 2026 16:43

let finished = tracker.addFinishedRunLoopObserver { [weak self] _ in
self?.hangStopped()
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hangStopped called on every run loop iteration unnecessarily

Medium Severity

The hangStopped callback is invoked unconditionally on every run loop iteration completion, not just when a hang actually resolves. This triggers updateAppState to set isANROngoing = NO repeatedly, even during normal operation when no hang occurred. The callback should only be invoked after hangStarted was previously called. The currentHangId is tracked but not used to gate hangStopped calls, causing unnecessary state updates and semantic incorrectness.

Fix in Cursor Fix in Web

}

private func stop() {
dispatchPrecondition(condition: .onQueue(.main))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SDK prohibits dispatchPrecondition due to crash risk

Medium Severity · Bugbot Rules

The PR discussion explicitly states "We do not use dispatchPrecondition in the SDK because it causes runtime crashes if used incorrectly. Instead we should do a main thread check and fail softly." This pattern is used in 4 locations: stop(), waitForHang(), and both start() and stop() methods of the Swift integration class. Per SDK guidelines, these should use soft thread checks that log warnings rather than assertions that crash.

Additional Locations (2)

Fix in Cursor Fix in Web

}
removeObserver(CFRunLoopGetMain(), observer, .commonModes)
self.observer = nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Background hang detection loop never terminates after stop

Medium Severity

The stop() method removes the run loop observer but doesn't signal the semaphore to terminate waitForHang. Once started, waitForHang recursively calls itself, only exiting when the semaphore is signaled. After stop(), the observer is gone so .beforeWaiting never fires to signal. The result is an infinite loop on the background queue that keeps timing out and recursing, consuming CPU indefinitely even after the tracker is supposed to be stopped.

Additional Locations (1)

Fix in Cursor Fix in Web

initWithHangTrackerBridge:SentryDependencyContainer.sharedInstance.hangTracker
timeoutInterval:options.appHangTimeoutInterval
hangStarted:^{ [dispatchQueueWrapper dispatchAsyncWithBlock:^{ [self hangStarted]; }]; }
hangStopped:^{ [dispatchQueueWrapper dispatchAsyncWithBlock:^{ [self hangStopped]; }]; }];
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retain cycle causes memory leak in integration

Medium Severity

The hangStarted and hangStopped blocks passed to swiftImpl capture self strongly via [self hangStarted] and [self hangStopped]. Since self holds a strong reference to swiftImpl, and swiftImpl holds these closures which capture self, this creates a retain cycle. In uninstall, swiftImpl is not set to nil, so the cycle is never broken and the integration object leaks when the SDK is closed.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ref: Use RunLoop observer for watchdog detection

3 participants