While debugging something unrelated, I noticed that on Linux gmid’s server process would crash upon SIGHUP. I had never noticed it before because (unfortunately) ‘ulimit -c 0’ seems to be the default on various systems (i.e. no core files), and I had started testing on-the-fly reconfiguration only recently.

What was particularly strange was that I got no logging whatsoever. I have a compile-time switch that makes seccomp raise a catchable SIGSYS, so that a handler can dump the number of the forbidden system call and exit, but in this case my server processes were killed by a SIGSYS without any debugging info.
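For reference, a debugging handler of that kind could look roughly like the following. This is a minimal sketch, not gmid's actual code: the names fmt_uint, sigsys_handler and install_sigsys_handler are made up for illustration, and it assumes the filter returns SECCOMP_RET_TRAP for forbidden syscalls so that the kernel delivers a SIGSYS that can be caught.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Write the decimal form of n into buf and return its length.
 * Hypothetical helper: printf is avoided because it is not
 * async-signal-safe.
 */
static size_t
fmt_uint(unsigned int n, char *buf, size_t len)
{
	char tmp[16];
	size_t i = 0, j = 0;

	/* Collect the digits in reverse order. */
	do {
		tmp[i++] = '0' + (n % 10);
		n /= 10;
	} while (n != 0 && i < sizeof(tmp));

	/* Copy them back in the right order. */
	while (i > 0 && j < len)
		buf[j++] = tmp[--i];
	return j;
}

/* Print the number of the offending syscall and exit immediately. */
static void
sigsys_handler(int signo, siginfo_t *info, void *ctx)
{
	static const char msg[] = "unexpected system call: ";
	char num[16];
	size_t n;

	(void)signo;
	(void)ctx;

	/* For a seccomp SIGSYS, si_syscall holds the syscall number. */
	n = fmt_uint((unsigned int)info->si_syscall, num, sizeof(num) - 1);
	num[n++] = '\n';

	if (write(STDERR_FILENO, msg, sizeof(msg) - 1) == -1 ||
	    write(STDERR_FILENO, num, n) == -1)
		_exit(2);

	/* The handler never returns, it leaves via _exit. */
	_exit(1);
}

/* Register the handler; returns 0 on success, -1 on error. */
int
install_sigsys_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigsys_handler;
	sa.sa_flags = SA_SIGINFO;	/* needed to receive the siginfo_t */
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSYS, &sa, NULL);
}
```

install_sigsys_handler() would have to be called before the seccomp filter is loaded, and write(2) has to be allowed by the filter for the message to get out.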

My first theory was that an unwanted syscall was made during process shutdown (the server process shuts down gracefully after a SIGHUP), maybe after stderr had been flushed and closed, so that my signal handler wasn’t able to print anything. But that didn’t seem to be the case.

On OpenBSD I have used ktrace(1) in the past to trace the system calls made by a process, so I searched for something similar for Linux. It turns out that strace is quite flexible.

I attached strace to the server process:

```
-bash-5.1# strace -p 30232
strace: Process 30232 attached
epoll_pwait(6,
```

Good, the server process is waiting on epoll as it should. Let’s send it a SIGHUP:

```
-bash-5.1# strace -p 30232
strace: Process 30232 attached
epoll_pwait(6, 0x55724496a0, 32, -1, NULL, 8) = -1 EINTR (Interrupted system call)
--- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=30251, si_uid=1000} ---
write(8, "\1", 1) = 1
rt_sigreturn({mask=[]}) = ?
+++ killed by SIGSYS +++
```

Oh, what do we have here? rt_sigreturn(2)! (The write is libevent handling the signal.)

After a signal handler returns, Linux arranges for a call to rt_sigreturn to restore the program’s stack and context. If that syscall gets blocked by a BPF filter, the process fails to handle the SIGSYS caused by the filter itself and just crashes.
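The obvious remedy, I assume, is to let rt_sigreturn through the filter. Here is a minimal sketch of an allow-list filter that does so. It is not gmid's real filter: the ALLOW macro, the filter array, install_filter and the short list of syscalls are all illustrative, and the architecture check that a real filter should perform first is left out to keep the example short.

```c
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/*
 * Hypothetical helper macro: if the loaded syscall number equals nr,
 * fall through to the ALLOW return; otherwise skip over it.
 */
#define ALLOW(nr)							\
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1),		\
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

/* Illustrative allow-list; a real server needs many more entries. */
static struct sock_filter filter[] = {
	/* Load the syscall number from struct seccomp_data. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
	    offsetof(struct seccomp_data, nr)),

	/* Needed every time a signal handler returns. */
	ALLOW(SYS_rt_sigreturn),

	ALLOW(SYS_write),
	ALLOW(SYS_epoll_pwait),
	ALLOW(SYS_exit_group),

	/* Anything else raises a catchable SIGSYS. */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
};

/* Load the filter into the kernel; returns 0 on success, -1 on error. */
int
install_filter(void)
{
	struct sock_fprog prog = {
		.len = sizeof(filter) / sizeof(filter[0]),
		.filter = filter,
	};

	/* Required to install a filter without CAP_SYS_ADMIN. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
		return -1;
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

With rt_sigreturn in the list, a handler that returns normally (like the one libevent installs for SIGHUP) can hand control back to the interrupted code instead of tripping the filter.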

But why does it crash for SIGHUP and not for the catchable SIGSYS I was using for debugging? Well, the SIGSYS handler calls _exit directly, so we never reach the sigreturn.

This is just a daily reminder of how low-level seccomp is, and how it sometimes leaves you clueless, but it’s also a nice opportunity to learn how signals are implemented on Linux.