Michael Schneider
This is the Hot New Kernel Feature
Because of the many additions to BPF, it is also referred to as eBPF, the ‘e’ standing for extended. For tracing tasks handled by the Linux kernel, BPF is more efficient than traditional event tracing methods because it allows for direct instrumentation of any kernel function; BPF programs can thus be attached to all kinds of event sources. BPF runs in a sandboxed kernel environment, which guarantees that a program does not cause any harm to the system through its application logic. This makes it well suited for monitoring events or debugging on production systems. A risk that still remains is an impact on performance if a large number of events is monitored.
An existing collection of tools using BPF can be found in the bpfcc-tools package for Debian systems. One useful tool is opensnoop, which allows for monitoring the files opened either system-wide or by a single program by tracing the do_sys_open kernel function.
Another useful BPF program is tcpretrans, which traces TCP retransmits. Without BPF, a common way to trace TCP retransmits would be to capture packets and analyze them with a tool such as tcpdump. With BPF, we can attach a kprobe to the tcp_retransmit_skb kernel function, which is a more elegant solution. From tcpretrans:
# initialize BPF
b = BPF(text=bpf_text)
b.attach_kprobe(event="tcp_retransmit_skb", fn_name="trace_retransmit")
The function trace_retransmit is a C function defined earlier in the source file. It gathers information on the retransmit and is executed every time a retransmit happens.
Writing programs directly in eBPF instructions is time-consuming and comparable to coding in assembly. There are, however, approachable front ends available with varying levels of abstraction and capabilities. In this article, bcc (BPF Compiler Collection) is used. Bcc offers front ends in both Python and Lua. While this first example using the Python interface is very minimalistic, it demonstrates with how little code the ptrace_attach kernel function can be monitored to detect a form of process injection.
#!/usr/bin/python
from bcc import BPF

BPF(text='int kprobe__ptrace_attach(void *ctx) { bpf_trace_printk("ptrace_attach called\\n"); return 0; }').trace_print()
This program generates the following output when ptrace_attach is called:
# ./ptrace.py
derusbi-6572 [003] .... 55527.716367: 0x00000001: ptrace_attach called
The quoted string assigned to the text parameter is restricted C code. kprobe__ is a special function prefix that creates a kprobe (dynamic tracing of a kernel function call) for the specified kernel function name. Our function is executed every time ptrace_attach is called.
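Bcc recognizes this convention by scanning the C source for functions whose names start with kprobe__ and attaching a kprobe to whatever follows the prefix. The naming rule itself can be sketched in a few lines of plain Python; this is an illustration of the convention, not bcc’s actual implementation:

```python
KPROBE_PREFIX = "kprobe__"

def probed_function(c_function_name):
    """Return the kernel function a kprobe__-prefixed handler attaches to,
    or None if the name does not follow the convention."""
    if c_function_name.startswith(KPROBE_PREFIX):
        # Everything after the prefix is the kernel function name.
        return c_function_name[len(KPROBE_PREFIX):]
    return None
```

Following this rule, a handler named kprobe__ptrace_attach is attached to ptrace_attach, while a function such as p_event carries no implicit attachment and must be wired up explicitly.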
bpf_trace_printk can be used as a convenient hack to write to the common /sys/kernel/debug/tracing/trace_pipe for debugging purposes, but shouldn’t be used otherwise because trace_pipe is globally shared. The exact output of bpf_trace_printk depends on the options set in /sys/kernel/debug/tracing/trace_options (see /sys/kernel/debug/tracing/README for more information). In this case, derusbi is the name and 6572 the PID of the current task. [003] is the number of the CPU the task is running on, followed by the IRQ options, a timestamp in seconds, a fake value used by BPF for the instruction pointer register, and our formatted message.
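To make the field layout concrete, here is a small sketch that picks such a line apart in plain Python. It assumes the default layout shown in the sample output above; since the format depends on trace_options, the regular expression is illustrative rather than general:

```python
import re

# Parse one line of trace_pipe output in the default layout:
#   TASK-PID  [CPU]  FLAGS  TIMESTAMP:  IP:  MESSAGE
TRACE_LINE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+"   # task name and PID
    r"\[(?P<cpu>\d{3})\]\s+"              # CPU number
    r"(?P<flags>\S{4})\s+"                # IRQ/preempt flags
    r"(?P<ts>\d+\.\d+):\s+"               # timestamp in seconds
    r"(?P<ip>0x[0-9a-f]+):\s+"            # fake instruction pointer value
    r"(?P<msg>.*)$"                       # formatted message
)

def parse_trace_line(line):
    """Split a trace_pipe line into its fields, or return None on mismatch."""
    m = TRACE_LINE.match(line)
    return m.groupdict() if m else None

fields = parse_trace_line(
    "derusbi-6572 [003] .... 55527.716367: 0x00000001: ptrace_attach called"
)
```

Applied to the sample line, this yields derusbi as the task, 6572 as the PID, 003 as the CPU and the formatted message at the end.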
We can also attach kprobes explicitly using attach_kprobe in Python, including attaching the same BPF function to multiple events. Also, using trace_fields() instead of trace_print() gives us more control over the values printed.
from bcc import BPF

# define BPF program
bpf_program = """
int p_event(void *ctx) {
    bpf_trace_printk("traced very meaningful event!\\n");
    return 0;
}
"""

# load BPF program
b = BPF(text=bpf_program)

# attach kprobes for sys_clone and execve
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="p_event")
b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="p_event")

while True:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    except ValueError:
        continue
    print("%f\t%d\t%s\t%s" % (ts, pid, task, msg))
This next example uses the BPF_PERF_OUTPUT() interface, which is the preferred method of pushing per-event data to user space.
#!/usr/bin/python
# Adapted example from https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md
# as well as https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md
from bcc import BPF

# define BPF program
bpf_program = """
#include <linux/sched.h>

struct omg_data {
    u64 gid_pid;
    u32 pid;
    u32 gid;
    u64 ts;                       // timestamp with nanosecond precision
    char procname[TASK_COMM_LEN]; // holds the name of the current process
};

BPF_PERF_OUTPUT(custom_event);

int get_thi_dete(struct pt_regs *ctx) {
    struct omg_data mdata;

    mdata.gid_pid = bpf_get_current_pid_tgid();
    mdata.pid = (u32) mdata.gid_pid;
    mdata.gid = mdata.gid_pid >> 32;
    mdata.ts = bpf_ktime_get_ns(); // gets timestamp in nanoseconds
    bpf_get_current_comm(&mdata.procname, sizeof(mdata.procname));

    custom_event.perf_submit(ctx, &mdata, sizeof(mdata));
    return 0;
}
"""

# load BPF program
b = BPF(text=bpf_program)
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="get_thi_dete")

# header
print("%-18s %-6s %-6s %-16s %s" % ("TIME(s)", "PID", "TID", "TASK", "MESSAGE"))

# process event
start = 0
def print_event(cpu, omg_data, size):
    global start
    event = b["custom_event"].event(omg_data)
    if start == 0:
        start = event.ts
    time_s = (float(event.ts - start)) / 1000000000
    print("%-18.9f %-6d %-6d %-16s %s" % (time_s, event.gid, event.pid,
                                          event.procname, "Traced sys_clone!"))

# loop with callback to print_event
b["custom_event"].open_perf_buffer(print_event)
while 1:
    b.perf_buffer_poll()
Because we don’t use bpf_trace_printk() here, we also don’t get the prepackaged information it provides. This means we need to gather event data ourselves. We use the omg_data struct to pass data from kernel to user space. BPF_PERF_OUTPUT(custom_event) creates a BPF table custom_event for pushing out custom event data to user space via a perf ring buffer. From bpf.h:
 * u64 bpf_get_current_pid_tgid(void)
 *	Return
 *		A 64-bit integer containing the current tgid and pid, and
 *		created as such:
 *		*current_task*\ **->tgid << 32 \|**
 *		*current_task*\ **->pid**.
Thus, to get the thread group ID and the PID, we just shift and mask the returned value appropriately.
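The same arithmetic can be reproduced in plain Python. This sketch, with made-up tgid and pid values, shows how the 64-bit return value of bpf_get_current_pid_tgid() is packed and taken apart again:

```python
# bpf_get_current_pid_tgid() packs two 32-bit values into one u64:
# the thread group ID in the upper 32 bits, the thread's pid in the lower.
# The tgid/pid values below are made up for illustration.
tgid, pid = 6572, 6573

packed = (tgid << 32) | pid   # what the helper returns

# Upper 32 bits: the thread group ID (the process PID in user-space terms).
gid = packed >> 32
# Lower 32 bits: the thread ID; the (u32) cast in the C code acts as this mask.
tid = packed & 0xFFFFFFFF
```

This mirrors the two lines in the BPF program above: the assignment to mdata.gid shifts, while the (u32) cast on mdata.pid truncates to the lower 32 bits.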
To get the name of the current task we call bpf_get_current_comm(). From bpf.h:
 * int bpf_get_current_comm(char *buf, u32 size_of_buf)
 *	Description
 *		Copy the **comm** attribute of the current task into *buf* of
 *		*size_of_buf*. The **comm** attribute contains the name of
 *		the executable (excluding the path) for the current task. The
 *		*size_of_buf* must be strictly positive. On success, the
 *		helper makes sure that the *buf* is NUL-terminated. On failure,
 *		it is filled with zeroes.
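On the Python side this means the process name arrives as a fixed-size, NUL-terminated buffer of TASK_COMM_LEN (16) bytes. A small sketch of turning such a raw buffer into a string; the sample buffer is made up for illustration:

```python
TASK_COMM_LEN = 16  # fixed size of the kernel's comm buffer

def comm_to_str(raw):
    """Decode a fixed-size, NUL-terminated comm buffer into a Python string."""
    # Keep only the bytes before the first NUL terminator.
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")

# Made-up example buffer: "bash" followed by NUL padding, as the helper
# would leave it on success.
raw = b"bash" + b"\x00" * (TASK_COMM_LEN - 4)
```

A buffer filled entirely with zeroes, as the helper leaves it on failure, decodes to an empty string with the same function.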
perf_submit submits the event to user space via a perf ring buffer. print_event is the Python function which handles reading events from the custom_event stream. b.perf_buffer_poll() waits for events; this call is blocking.
The new additions to BPF allow for compact, powerful and performant tracing programs. Now it’s just a matter of finding suitable use cases to harness this power. Despite touching on sensitive areas of the system, the risk of impacting other parts of the system is reduced, thanks to the sandboxed execution environment and the static code analysis performed by the kernel.