There's another way to intercept syscalls without going as far as a kernel module, using debugging API ptrace. There's a pretty neat article about how to implement custom syscalls using ptrace: https://nullprogram.com/blog/2018/06/23/
Yup! I discuss the pros and cons of using `ptrace` within that post.
It's all about the use case: if being constrained to inferior processes and adding 2-3x overhead per syscall doesn't matter, then `ptrace` is an excellent option. OTOH, if you want to instrument all processes and want to keep instrumentation overhead to a bare minimum, you more or less have to go into the kernel.
I've been looking for a while for a way to capture all file opens and network ops to profile unknown production workloads similar to proc. explorer on Windows, which I believe is implemented using ETW. Unfortunately strace seems to be out of the question purely because of the performance impact. Is the performance impact due to strace or ptrace itself?
It's ptrace itself: every traced syscall requires at least one (but usually 3-4) ptrace(2) calls, plus scattered wait(2)/waitpid(2) calls depending on the operation.
If you want to capture events like file opens and network traffic, I'd take a look at eBPF or the Linux Audit Framework.