By using `read_line`, we handle stderr the same way as stdout.
Moreover, we can replicate the stdout functionality for stderr and retrieve
the full traceback for Python nodes. This way, Python errors are printed in the daemon logs,
so there is no need to look at the log files.
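
As a rough illustration of the line-based reading, here is a minimal synchronous sketch built on the standard library's `BufRead::read_line`. The helper name `collect_lines` is hypothetical and not part of dora, and the actual daemon code may use an async reader instead.

```rust
use std::io::{self, BufRead};

/// Reads `reader` line by line and returns every line, newline included.
/// `collect_lines` is a hypothetical helper used only for illustration.
pub fn collect_lines<R: BufRead>(mut reader: R) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    loop {
        let mut buf = String::new();
        // `read_line` appends everything up to and including the trailing
        // '\n' and returns Ok(0) once the end of the input is reached.
        if reader.read_line(&mut buf)? == 0 {
            break;
        }
        lines.push(buf);
    }
    Ok(lines)
}
```

Because `read_line` keeps the trailing newline, concatenating the collected lines reproduces the original multi-line traceback.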
The newline between each line in the logs seems to have been removed.
I have re-added it, so that logs look like this:
```bash
─────┬───────────────────────────────────────────────────────────────
│ Logs from webcam.
─────┬───────────────────────────────────────────────────────────────
1 │ Traceback (most recent call last):
2 │ File "<string>", line 1, in <module>
3 │ RuntimeError: Dora Runtime raised an error.
4 │
5 │ Caused by:
6 │ 0: main task failed
7 │ 1: failed to init an operator
8 │ 2: failed to init python operator
9 │ 3: Traceback (most recent call last):
10 │ File "/home/peter/abc_project/webcam.py", line 7, in <module>
11 │ import cv2
12 │
13 │
14 │ ModuleNotFoundError: No module named 'cv2'
15 │
16 │ Location:
17 │ binaries/runtime/src/operator/python.rs:28:9
─────┴───────────────────────────────────────────────────────────
```
instead of:
```bash
─────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ Logs from webcam.
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ Exception ignored in: <function Operator.__del__ at 0x7fa0339ff7f0>Traceback (most recent call last): File "/home/peter/abc_project/webcam.py", line 53, in __del__ self.video_capture.release()AttributeError: 'Operator' object has no attribute 'video_capture'Traceback (most recent call last): File "<string>", line 1, in <module>RuntimeError: Dora Runtime raised an error.Caused by: 0: main task failed 1: failed to init an operator 2: failed to init python operator 3: Traceback (most recent call last): File "<string>", line 1, in <module> File "/home/peter/abc_project/webcam.py", line 22, in __init__ self.video_capture = cv2.VideoCapture(0) AttributeError: module 'cv2' has no attribute 'VideoCapture'Location: binaries/runtime/src/operator/python.rs:28:9
─────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
```
The previous watchdog requests required waiting for a reply, which could slow down the system under load and also lead to false errors on slower systems (e.g. the CI). Because of this, this commit replaces the watchdog requests with asynchronous heartbeat messages, which don't require replies. Instead, we record the last heartbeat timestamp whenever we receive a heartbeat message. We then periodically check the time since the last received heartbeat message and consider the connection closed if too much time has passed.
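
The heartbeat bookkeeping described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual implementation: the type `HeartbeatTracker`, its method names, and the timeout value used by callers are all hypothetical.

```rust
use std::time::{Duration, Instant};

/// Tracks the most recent heartbeat of one connection.
/// Hypothetical type for illustration; not taken from the dora code base.
pub struct HeartbeatTracker {
    last_heartbeat: Instant,
    timeout: Duration,
}

impl HeartbeatTracker {
    /// Starts tracking at `now`, so a fresh connection counts as alive.
    pub fn new(now: Instant, timeout: Duration) -> Self {
        Self { last_heartbeat: now, timeout }
    }

    /// Called whenever a heartbeat message arrives. No reply is sent.
    pub fn record(&mut self, now: Instant) {
        self.last_heartbeat = now;
    }

    /// Called periodically. Returns false once more than `timeout` has
    /// passed since the last recorded heartbeat.
    pub fn is_alive(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_heartbeat) <= self.timeout
    }
}
```

Passing `now` in explicitly keeps the check deterministic in tests. The sender never blocks on a reply, so a slow receiver only delays the timestamp update instead of stalling the sender.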
We synchronize the start of nodes (even across machines) to avoid missed messages. This led to deadlock issues when nodes exited before they initialized the connection to the dora daemon. The reason was that the other nodes were still waiting for the stopped node to become ready.
This commit fixes this issue by properly handling node exits that occur before sending the subscribe message. If a node exits before subscribing, it is removed from the pending list and added to a new 'exited before init' list. Once all nodes have subscribed or exited, we answer the pending subscribe requests. If any node exited before subscribing, we send an error reply to the other nodes because a synchronized start is no longer possible.
To make this logic work across different machines too, we add a `success` field to the `AllNodesReady` message that the daemon sends to the coordinator. The coordinator forwards this flag to other daemons so that they can act accordingly.
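
The pending-node bookkeeping and the resulting `success` flag can be sketched like this. It is a minimal sketch under assumptions: `PendingNodes`, `StartStatus`, and the method names are hypothetical, and only the `success` field of `AllNodesReady` comes from the description above.

```rust
use std::collections::BTreeSet;

/// Result of updating the pending list (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum StartStatus {
    /// Some nodes have neither subscribed nor exited yet.
    Pending,
    /// Every node is accounted for. `success` is false if any node
    /// exited before sending its subscribe message.
    AllNodesReady { success: bool },
}

/// Tracks which nodes the synchronized start is still waiting for.
pub struct PendingNodes {
    waiting: BTreeSet<String>,
    exited_before_init: BTreeSet<String>,
}

impl PendingNodes {
    pub fn new<I: IntoIterator<Item = String>>(nodes: I) -> Self {
        Self {
            waiting: nodes.into_iter().collect(),
            exited_before_init: BTreeSet::new(),
        }
    }

    /// A node sent its subscribe message.
    pub fn handle_subscribe(&mut self, node: &str) -> StartStatus {
        self.waiting.remove(node);
        self.status()
    }

    /// A node exited. It only counts as a failed start if the node was
    /// still pending, i.e. it had not subscribed yet.
    pub fn handle_exit(&mut self, node: &str) -> StartStatus {
        if self.waiting.remove(node) {
            self.exited_before_init.insert(node.to_owned());
        }
        self.status()
    }

    fn status(&self) -> StartStatus {
        if self.waiting.is_empty() {
            StartStatus::AllNodesReady {
                success: self.exited_before_init.is_empty(),
            }
        } else {
            StartStatus::Pending
        }
    }
}
```

In this sketch, a caller would answer the pending subscribe requests once `AllNodesReady` is returned, replying with an error when `success` is false, and forward the same flag to the coordinator.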