conductor: add heartbeat monitor for background workers #1023

vrmiguel · 2024-10-21T16:54:53Z

No description provided.

vrmiguel · 2024-10-21T17:19:49Z

conductor/src/heartbeat_monitor.rs

+
+#[derive(Clone)]
+pub struct HeartbeatUpdater {
+    shared_heartbeat: Arc<AtomicU64>,


Went with a lock-free approach rather than something like Arc<RwLock<Instant>>. The issue with workers getting stuck might stem from lock contention, so adding another lock probably wouldn't help.
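For readers following along, here is a minimal sketch of what the lock-free approach could look like. It assumes the heartbeat is stored as Unix seconds. The body of current_timestamp, the constructor, and the last_heartbeat reader are hypothetical and not taken from the PR; only the struct field and the Relaxed store appear in the diff.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::{SystemTime, UNIX_EPOCH};

/// Assumed helper: seconds since the Unix epoch.
/// The PR's actual implementation is not shown in this thread.
fn current_timestamp() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

#[derive(Clone)]
pub struct HeartbeatUpdater {
    shared_heartbeat: Arc<AtomicU64>,
}

impl HeartbeatUpdater {
    /// Hypothetical constructor: starts with the current time as the first heartbeat.
    pub fn new() -> Self {
        Self {
            shared_heartbeat: Arc::new(AtomicU64::new(current_timestamp())),
        }
    }

    /// A single atomic store, so a worker can never block here
    /// the way it could while waiting on an RwLock write guard.
    pub fn update_heartbeat(&self) {
        self.shared_heartbeat
            .store(current_timestamp(), Ordering::Relaxed);
    }

    /// Hypothetical reader for whoever monitors the heartbeat.
    pub fn last_heartbeat(&self) -> u64 {
        self.shared_heartbeat.load(Ordering::Relaxed)
    }
}
```

Relaxed ordering seems sufficient in this sketch because the timestamp is the only shared value and no other memory is synchronized through it.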

vrmiguel · 2024-10-21T17:20:43Z

conductor/src/heartbeat_monitor.rs

+        self.shared_heartbeat
+            .store(current_timestamp(), Ordering::Relaxed);
+    }
+}


I think it's better to keep this as something updated manually by the workers rather than by yet another background thread.

ChuckHend · 2024-10-22T08:54:39Z

conductor/src/heartbeat_monitor.rs

+
+        if current_time >= last_update {
+            let elapsed = Duration::from_secs(current_time - last_update);
+            elapsed < self.update_interval * 2


What is the * 2 part for?

To check whether there's been an update within twice the expected update interval.

I figured that, but why 2x? Maybe that should just be part of the update interval config?

Keep in mind there is healthcheck config on the Kubernetes side too, like how many consecutive failed requests will restart the pod.

Maybe we could replace self.update_interval with timeout_interval and then just use elapsed < self.timeout_interval. What do you think?
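A rough sketch of how the check might read with that suggestion applied. The free function is_alive and its parameters are hypothetical, and timestamps are assumed to be Unix seconds. The diff does not show what happens when current_time is behind last_update, so treating that case as zero elapsed time is an assumption of this sketch.

```rust
use std::time::Duration;

/// Hypothetical standalone form of the liveness check.
/// `timeout_interval` comes straight from config, with no hidden multiplier.
pub fn is_alive(current_time: u64, last_update: u64, timeout_interval: Duration) -> bool {
    // Assumption: if the clock moved backwards, count it as zero elapsed time.
    let elapsed = Duration::from_secs(current_time.saturating_sub(last_update));
    elapsed < timeout_interval
}
```

With the multiplier gone, the tolerance for missed updates is a single configured value, and any retry tolerance can stay in the Kubernetes probe settings.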

conductor: add heartbeat monitor for background workers

81477ef

vrmiguel force-pushed the pro-2174 branch from 6985adb to 7a28de0 Compare October 21, 2024 17:43

vrmiguel marked this pull request as ready for review October 21, 2024 18:22

vrmiguel requested review from nhudson, ianstanton and ChuckHend as code owners October 21, 2024 18:22

Integrate heartbeat monitor into metrics reporter

d8fea11

vrmiguel force-pushed the pro-2174 branch from 7a28de0 to d8fea11 Compare October 21, 2024 19:16

ChuckHend reviewed Oct 22, 2024
