Enhancing Data Understanding
Rocco Gagliardi
When we use a piece of software, we make an act of faith, not only towards its developer but, above all, towards all the other developers who had a hand in the various parts of the complex system that runs it. When we acquire a piece of software, we tend to be cautious up to the Ethernet port, piling up firewalls and scanners. Once past the network, inside our own systems, we tend to trust the software blindly.
There are many ways to control almost every operation a piece of software performs. Among the security models used in computer security you will find the classic DAC, MAC and RBAC, along with more esoteric ones. On Linux, the traditional Discretionary Access Control (DAC) has been complemented by the SELinux and AppArmor kernel extensions, which implement Mandatory Access Control (MAC); in recent years, one or the other has been installed and activated by default in the major distributions. They can control a lot, but a process can still roam through most of the garden.
To fence in the garden, sandboxes were created. A sandbox provides a high degree of program separation: depending on the security model it is based on, it can separate the file system, the network and other resources, or use virtualization, with more or less effort and control.
The capabilities security model focuses on privileged users: it splits the set of permissions a superuser has into separate portions and assigns one or more of them to a process, granting it the ability to execute or access parts of the OS. Think of capabilities as bundles of accessible resources, not just syscalls. As an example, take a look at CAP_SYS_RAWIO (see man 7 capabilities):
CAP_SYS_RAWIO
  * Perform I/O port operations (iopl(2) and ioperm(2));
  * access /proc/kcore;
  * employ the FIBMAP ioctl(2) operation;
  * open devices for accessing x86 model-specific registers (MSRs, see msr(4));
  * update /proc/sys/vm/mmap_min_addr;
  * create memory mappings at addresses below the value specified by /proc/sys/vm/mmap_min_addr;
  * map files in /proc/bus/pci;
  * open /dev/mem and /dev/kmem;
  * perform various SCSI device commands;
  * perform certain operations on hpsa(4) and cciss(4) devices;
  * perform a range of device-specific operations on other devices.
seccomp-bpf is an extension to seccomp that allows filtering of syscalls using a configurable policy implemented with Berkeley Packet Filter rules. It is used by OpenSSH and vsftpd as well as by the Google Chrome/Chromium web browsers on Chrome OS and Linux. It is a kind of firewall that controls which kernel functionality (syscalls) a process can access, limiting the exposure of the kernel to a generic process.
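To make the blacklist/whitelist idea concrete, here is a toy Python model of the decision a seccomp-bpf filter takes for each syscall. It only mimics the logic: a real filter is a BPF program evaluated by the kernel on struct seccomp_data, and real filters can also return an error code instead of killing the process. The step names in the comments follow the entries Firejail prints with --seccomp.print; the function and constant names are hypothetical.

```python
"""Toy model of the per-syscall decision taken by a seccomp-bpf filter.

Illustration only: the real filter is a BPF program run by the kernel on
struct seccomp_data; this just mirrors its decision logic in Python.
"""

AUDIT_ARCH_X86_64 = 0xC000003E  # x86_64 arch id (shown as arch=c000003e in audit logs)
ALLOW, KILL = "ALLOW", "KILL"


def make_filter(mode, syscalls, arch=AUDIT_ARCH_X86_64):
    """Build a decision function for a blacklist or whitelist filter.

    mode="blacklist": listed syscall numbers are killed, all others allowed.
    mode="whitelist": listed syscall numbers are allowed, all others killed.
    """
    if mode not in ("blacklist", "whitelist"):
        raise ValueError(f"unknown mode: {mode!r}")
    listed = frozenset(syscalls)

    def decide(call_arch, nr):
        # VALIDATE_ARCHITECTURE: syscall numbers only make sense for one ABI,
        # so calls made through another architecture are rejected outright.
        if call_arch != arch:
            return KILL
        # EXAMINE_SYSCALL: compare the syscall number against the list.
        if mode == "blacklist":
            return KILL if nr in listed else ALLOW
        return ALLOW if nr in listed else KILL

    return decide


if __name__ == "__main__":
    # x86_64 numbers: 0 = read, 56 = clone, 101 = ptrace, 165 = mount
    blacklist = make_filter("blacklist", {165, 101})
    print(blacklist(AUDIT_ARCH_X86_64, 165))  # KILL
    print(blacklist(AUDIT_ARCH_X86_64, 0))    # ALLOW
    whitelist = make_filter("whitelist", {0})
    print(whitelist(AUDIT_ARCH_X86_64, 56))   # KILL
```

The whitelist variant is the stricter one: anything not explicitly listed, including syscalls added by future kernels, is denied.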
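The number of syscalls varies; on my system, 380 syscalls are supported for 32-bit and 332 for 64-bit (use man 2 syscalls and take a look at /usr/src/linux-headers-<ver>/include/uapi/asm-generic/unistd.h for a complete list). Note that x86_64 uses its own numbering (asm/unistd_64.h), which differs from the generic one; the syscall numbers in the outputs below are x86_64 numbers.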
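Since the filters work on syscall numbers, it helps to have a number-to-name map. A minimal sketch, assuming the simple `#define __NR_<name> <number>` form used in x86_64's asm/unistd_64.h; indirect definitions such as the __NR3264_* aliases in asm-generic/unistd.h are not resolved. The function names are hypothetical.

```python
import re

# Matches lines of the form "#define __NR_read 0".
_NR_DEFINE = re.compile(r"^\s*#define\s+__NR_(\w+)\s+(\d+)\s*$")


def parse_unistd(text):
    """Map syscall number -> name from the '#define __NR_<name> <nr>' lines of a header.

    Indirect definitions such as '#define __NR_fcntl __NR3264_fcntl' (used in
    asm-generic/unistd.h) are skipped, and so is the '__NR_syscalls' counter.
    """
    table = {}
    for line in text.splitlines():
        match = _NR_DEFINE.match(line)
        if match and match.group(1) != "syscalls":
            table[int(match.group(2))] = match.group(1)
    return table


def load_unistd(path):
    """Read and parse a header file, e.g. asm/unistd_64.h (path varies by distribution)."""
    with open(path, encoding="utf-8") as header:
        return parse_unistd(header.read())
```

On an x86_64 Debian/Ubuntu system, load_unistd("/usr/include/x86_64-linux-gnu/asm/unistd_64.h")[56] should return "clone"; that header path is an assumption and differs between distributions.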
Firejail uses technologies like Capabilities and Seccomp-bpf to sandbox applications.
Take the ping executable as an example and try to reduce its exposure using the Firejail sandbox. As the first step, we find out which resources ping uses by dumping a summary of its syscalls:
root@ubunthin:~# strace -qcf ping -c1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=46 time=24.1 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 24.110/24.110/24.110/0.000 ms
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         8           read
  0.00    0.000000           0         6           write
  0.00    0.000000           0        35        13 open
  0.00    0.000000           0        23           close
  0.00    0.000000           0        23           fstat
  0.00    0.000000           0        31           mmap
  0.00    0.000000           0        14           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           ioctl
  0.00    0.000000           0         8         8 access
  0.00    0.000000           0         1           setitimer
  0.00    0.000000           0         1           getpid
  0.00    0.000000           0         5         2 socket
  0.00    0.000000           0         1           connect
  0.00    0.000000           0         1           sendto
  0.00    0.000000           0         1           recvmsg
  0.00    0.000000           0         1           getsockname
  0.00    0.000000           0         7           setsockopt
  0.00    0.000000           0         1           getsockopt
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           getuid
  0.00    0.000000           0         1           setuid
  0.00    0.000000           0         1           geteuid
  0.00    0.000000           0         7           capget
  0.00    0.000000           0         3           capset
  0.00    0.000000           0         2           prctl
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                   195        23 total
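The strace summary is also the starting point for a whitelist. A small helper (hypothetical, not part of strace or Firejail) can pull the syscall names out of the table and join them into the comma-separated form that --seccomp.keep expects. It assumes the layout shown above: a header, a dashed line, one row per syscall with the name in the last column, and a closing dashed line before the total.

```python
def syscalls_from_strace_summary(text):
    """Return syscall names, in table order, from `strace -c` summary output."""
    names = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("------"):
            if in_table:  # second dashed line: the body is over, 'total' follows
                break
            in_table = True  # first dashed line: the body starts on the next line
            continue
        if in_table:
            names.append(fields[-1])  # the syscall name is always the last column
    return names
```

Lines before the first dashed line (ping's own output and the column header) are ignored, so the whole captured text can be passed in; ",".join(...) of the result gives the --seccomp.keep argument. Since strace only records the syscalls made during that particular run, the list is a starting point rather than a complete whitelist.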
We can simply start ping in Firejail without any specific profile or configuration, for a minimal security gain. By default, Firejail leaves all capabilities enabled but blacklists some syscalls:
root@ubunthin:/home/rcc# firejail --name=ping ping 8.8.8.8
root@ubunthin:/home/rcc# firejail --list
11971:rcc:firejail --name=ping ping 8.8.8.8
root@ubunthin:/home/rcc# firejail --seccomp.print=ping
SECCOMP Filter:
VALIDATE_ARCHITECTURE
EXAMINE_SYSCAL
UNKNOWN ENTRY!!!
UNKNOWN ENTRY!!!
UNKNOWN ENTRY!!!
BLACKLIST 165 mount
BLACKLIST 166 umount2
BLACKLIST 101 ptrace
BLACKLIST 246 kexec_load
BLACKLIST 320 kexec_file_load
BLACKLIST 304 open_by_handle_at
BLACKLIST 303 name_to_handle_at
BLACKLIST 175 init_module
BLACKLIST 313 finit_module
BLACKLIST 174 create_module
BLACKLIST 176 delete_module
BLACKLIST 172 iopl
BLACKLIST 173 ioperm
BLACKLIST 251 ioprio_set
BLACKLIST 167 swapon
BLACKLIST 168 swapoff
BLACKLIST 103 syslog
BLACKLIST 310 process_vm_readv
BLACKLIST 311 process_vm_writev
BLACKLIST 139 sysfs
BLACKLIST 156 _sysctl
BLACKLIST 159 adjtimex
BLACKLIST 305 clock_adjtime
BLACKLIST 212 lookup_dcookie
BLACKLIST 298 perf_event_open
BLACKLIST 300 fanotify_init
BLACKLIST 312 kcmp
BLACKLIST 248 add_key
BLACKLIST 249 request_key
BLACKLIST 250 keyctl
BLACKLIST 134 uselib
BLACKLIST 163 acct
BLACKLIST 154 modify_ldt
BLACKLIST 155 pivot_root
BLACKLIST 206 io_setup
BLACKLIST 207 io_destroy
BLACKLIST 208 io_getevents
BLACKLIST 209 io_submit
BLACKLIST 210 io_cancel
BLACKLIST 216 remap_file_pages
BLACKLIST 237 mbind
BLACKLIST 239 get_mempolicy
BLACKLIST 238 set_mempolicy
BLACKLIST 256 migrate_pages
BLACKLIST 279 move_pages
BLACKLIST 278 vmsplice
BLACKLIST 161 chroot
BLACKLIST 184 tuxcall
BLACKLIST 169 reboot
BLACKLIST 180 nfsservctl
BLACKLIST 177 get_kernel_syms
RETURN_ALLOW
root@ubunthin:/home/rcc# firejail --caps.print=ping
chown - enabled
dac_override - enabled
dac_read_search - enabled
fowner - enabled
fsetid - enabled
kill - enabled
setgid - enabled
setuid - enabled
setpcap - enabled
linux_immutable - enabled
net_bind_service - enabled
net_broadcast - enabled
net_admin - enabled
net_raw - enabled
ipc_lock - enabled
ipc_owner - enabled
sys_module - enabled
sys_rawio - enabled
sys_chroot - enabled
sys_ptrace - enabled
sys_pacct - enabled
sys_admin - enabled
sys_boot - enabled
sys_nice - enabled
sys_resource - enabled
sys_time - enabled
sys_tty_config - enabled
mknod - enabled
lease - enabled
audit_write - enabled
audit_control - enabled
setfcap - enabled
mac_override - enabled
mac_admin - enabled
syslog - enabled
wake_alarm - enabled
block_suspend - enabled
audit_read - enabled
Looking at the capabilities, why should ping have the permission to create a filesystem node (mknod)? We can drop all capabilities except CAP_NET_RAW, removing access to every unneeded resource and keeping only:
CAP_NET_RAW
  * use RAW and PACKET sockets;
  * bind to any address for transparent proxying.
Start Firejail, keeping only net_raw:
root@ubunthin:/home/rcc# firejail --name=ping --caps.keep=net_raw ping 8.8.8.8
root@ubunthin:/home/rcc# firejail --list
12121:rcc:firejail --name=ping --caps.keep=net_raw ping 8.8.8.8
root@ubunthin:/home/rcc# firejail --caps.print=ping
...
net_admin - disabled
net_raw - enabled
ipc_lock - disabled
...
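Internally, a capability set is a bit mask where bit n stands for capability number n; /proc/<pid>/status shows such masks in hex (CapEff, CapPrm, ...). A minimal decoding sketch with hypothetical helper names; the name list stops at audit_read (37), like the Firejail output in this article, so bits of newer capabilities are ignored. In the debug run of this article Firejail prints "Set caps filter 2000", which is consistent with keeping CAP_NET_RAW (bit 13) alone.

```python
# Capability names in bit order (CAP_CHOWN = 0 ... CAP_AUDIT_READ = 37), see linux/capability.h.
CAP_NAMES = [
    "chown", "dac_override", "dac_read_search", "fowner", "fsetid", "kill",
    "setgid", "setuid", "setpcap", "linux_immutable", "net_bind_service",
    "net_broadcast", "net_admin", "net_raw", "ipc_lock", "ipc_owner",
    "sys_module", "sys_rawio", "sys_chroot", "sys_ptrace", "sys_pacct",
    "sys_admin", "sys_boot", "sys_nice", "sys_resource", "sys_time",
    "sys_tty_config", "mknod", "lease", "audit_write", "audit_control",
    "setfcap", "mac_override", "mac_admin", "syslog", "wake_alarm",
    "block_suspend", "audit_read",
]


def caps_to_mask(names):
    """Bit mask with one bit set per capability name (bit n = capability number n)."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_NAMES.index(name)
    return mask


def mask_to_caps(mask):
    """Capability names whose bits are set in mask (unknown higher bits are ignored)."""
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


def effective_caps(status_path="/proc/self/status"):
    """Decode the CapEff line of a process status file (Linux only)."""
    with open(status_path, encoding="utf-8") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return mask_to_caps(int(line.split()[1], 16))
    return []


if __name__ == "__main__":
    print(hex(caps_to_mask(["net_raw"])))  # 0x2000
    print(mask_to_caps(0x2000))            # ['net_raw']
```

Pointing effective_caps at the status file of a sandboxed process is a quick way to cross-check what --caps.print reports.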
We dropped all capabilities except CAP_NET_RAW, which at the same time denies the privileged use of many syscalls. But since ping uses only 30 distinct syscalls, we are still overexposed. With seccomp-bpf we can refine the filter and reduce the syscalls available to ping to the ones it actually uses, switching from a blacklist to a whitelist.
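To whitelist the syscalls ping may use, start with the list of syscalls reported by strace, run the sandbox and monitor syslog for blocked calls. The command below already contains the complete list reached at the end of this process: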
root@ubunthin:/home/rcc# firejail --caps.keep=net_raw --shell=none --noprofile --debug --seccomp.keep=read,write,open,close,fstat,mmap,mprotect,munmap,brk,rt_sigaction,rt_sigprocmask,ioctl,access,setitimer,getpid,socket,connect,sendto,recvmsg,getsockname,setsockopt,getsockopt,execve,getuid,setuid,geteuid,capget,capset,prctl,arch_prctl,setresuid,setresgid,getgid,fcntl,clone,set_robust_list,stat,nanosleep,wait4,getdents,exit_group ping -c1 8.8.8.8
During one of the iterations, syslog reported the syscall that caused the problem (blocked by the filter):
Aug 7 21:09:57 ubunthin firejail[28419]: firejail --shell=none --noprofile --seccomp.keep=read,write,open,close,fstat,mmap,mprotect,munmap,brk,rt_sigaction,rt_sigprocmask,ioctl,access,setitimer,getpid,socket,connect,sendto,recvmsg,getsockname,setsockopt,getsockopt,execve,getuid,setuid,geteuid,capget,capset,prctl,arch_prctl,setresuid,setresgid,getgid,fcntl --debug ping -c1 8.8.8.8
Aug 7 21:09:58 ubunthin kernel: [98202.176765] audit: type=1326 audit(1502132998.041:55): auid=1000 uid=0 gid=0 ses=2 pid=28420 comm="firejail" exe="/usr/bin/firejail" sig=31 arch=c000003e syscall=56 compat=0 ip=0x7ff72e5ca40a code=0x0
Aug 7 21:09:58 ubunthin firejail[28419]: exiting...
In this case, the sandboxed process invoked syscall 56, which is not whitelisted, so the seccomp filter installed by Firejail killed it. To find out the syscall name from its number:
root@ubunthin:/home/rcc# firejail --debug-syscalls | grep 56
56 - clone
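The lookup can also be scripted. A minimal sketch, with hypothetical names, that extracts the syscall number from a seccomp audit record (type=1326; sig=31 corresponds to SIGSYS on x86_64) and resolves it through an exact dictionary lookup; a plain grep 56 can also match numbers such as 156 or 256. The table is only a small excerpt of the x86_64 numbering and would normally be built from asm/unistd_64.h.

```python
import re

AUDIT_SECCOMP = "1326"  # audit record type used for seccomp events
_KEY_VALUE = re.compile(r'(\w+)=("[^"]*"|\S+)')

# Small excerpt of the x86_64 syscall table (illustrative, not complete).
X86_64_SYSCALLS = {0: "read", 1: "write", 2: "open", 3: "close", 56: "clone", 231: "exit_group"}


def blocked_syscall(line, table=X86_64_SYSCALLS):
    """Return (number, name) for a seccomp audit record, or None for any other line."""
    fields = dict(_KEY_VALUE.findall(line))
    if fields.get("type") != AUDIT_SECCOMP or "syscall" not in fields:
        return None
    number = int(fields["syscall"])
    return number, table.get(number, f"unknown ({number})")
```

Feeding each syslog line to blocked_syscall and collecting the non-None results gives the names to add to the whitelist after each run.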
Add clone to the syscall whitelist and retry, repeating until ping runs without being killed. In the end, we added setresuid,setresgid,getgid,fcntl,clone,set_robust_list,stat,nanosleep,wait4,getdents,exit_group to the initial strace list:
root@ubunthin:/home/rcc# firejail --caps.keep=net_raw --shell=none --noprofile --debug --seccomp.keep=read,write,open,close,fstat,mmap,mprotect,munmap,brk,rt_sigaction,rt_sigprocmask,ioctl,access,setitimer,getpid,socket,connect,sendto,recvmsg,getsockname,setsockopt,getsockopt,execve,getuid,setuid,geteuid,capget,capset,prctl,arch_prctl,setresuid,setresgid,getgid,fcntl,clone,set_robust_list,stat,nanosleep,wait4,getdents,exit_group ping -c1 8.8.8.8
Command name #ping#
DISPLAY :0, 0
Enabling IPC namespace
Using the local network stack
Parent pid 28503, child pid 28504
The new log directory is /proc/28504/root/var/log
Initializing child process
Host network configured
PID namespace installed
Mounting tmpfs on /run/firejail/mnt directory
Mounting read-only /bin, /sbin, /lib, /lib32, /lib64, /usr, /etc, /var
Mounting tmpfs on /dev/shm
Mounting tmpfs on /var/lock
Mounting tmpfs on /var/tmp
Mounting tmpfs on /var/log
Mounting tmpfs on /var/lib/dhcp
Mounting tmpfs on /var/lib/snmp
Mounting tmpfs on /var/lib/sudo
Create the new utmp file
Mount the new utmp file
Remounting /proc and /proc/sys filesystems
Remounting /sys directory
Disable /sys/firmware
Disable /sys/hypervisor
Disable /sys/module
Disable /sys/power
Disable /sys/kernel/debug
Disable /sys/kernel/vmcoreinfo
Disable /sys/kernel/uevent_helper
Disable /proc/sys/fs/binfmt_misc
Disable /proc/sys/kernel/core_pattern
Disable /proc/sys/kernel/modprobe
Disable /proc/sysrq-trigger
Disable /proc/sys/kernel/hotplug
Disable /proc/sys/vm/panic_on_oom
Disable /proc/irq
Disable /proc/bus
Disable /proc/sched_debug
Disable /proc/timer_list
Disable /proc/timer_stats
Disable /proc/kcore
Disable /proc/kallsyms
Disable /lib/modules
Disable /usr/lib/debug
Disable /boot
Disable /dev/port
Disable /sys/fs
DISPLAY :0, 0
Set caps filter 2000
Ending syscall filter
SECCOMP Filter:
VALIDATE_ARCHITECTURE
EXAMINE_SYSCAL
UNKNOWN ENTRY!!!
UNKNOWN ENTRY!!!
UNKNOWN ENTRY!!!
WHITELIST 105 setuid
WHITELIST 106 setgid
WHITELIST 116 setgroups
WHITELIST 32 dup
WHITELIST 0 read
WHITELIST 1 write
WHITELIST 2 open
WHITELIST 3 close
WHITELIST 5 fstat
WHITELIST 9 mmap
WHITELIST 10 mprotect
WHITELIST 11 munmap
WHITELIST 12 brk
WHITELIST 13 rt_sigaction
WHITELIST 14 rt_sigprocmask
WHITELIST 16 ioctl
WHITELIST 21 access
WHITELIST 38 setitimer
WHITELIST 39 getpid
WHITELIST 41 socket
WHITELIST 42 connect
WHITELIST 44 sendto
WHITELIST 47 recvmsg
WHITELIST 51 getsockname
WHITELIST 54 setsockopt
WHITELIST 55 getsockopt
WHITELIST 59 execve
WHITELIST 102 getuid
WHITELIST 105 setuid
WHITELIST 107 geteuid
WHITELIST 125 capget
WHITELIST 126 capset
WHITELIST 157 prctl
WHITELIST 158 arch_prctl
WHITELIST 117 setresuid
WHITELIST 119 setresgid
WHITELIST 104 getgid
WHITELIST 72 fcntl
WHITELIST 56 clone
WHITELIST 273 set_robust_list
WHITELIST 4 stat
WHITELIST 35 nanosleep
WHITELIST 61 wait4
WHITELIST 78 getdents
WHITELIST 231 exit_group
KILL_PROCESS
Save seccomp filter, size 784 bytes
seccomp enabled
Username root, no supplementary groups
starting application
LD_PRELOAD=(null)
execvp argument 0: ping
execvp argument 1: -c1
execvp argument 2: 8.8.8.8
Child process initialized
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
monitoring pid 2
64 bytes from 8.8.8.8: icmp_seq=1 ttl=45 time=28.6 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 28.614/28.614/28.614/0.000 ms
Sandbox monitor: waitpid 2 retval 2 status 0
Parent is shutting down, bye...
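The iteration itself (run, read the blocked syscall from syslog, append it, rerun) can be kept reproducible with a small helper. A sketch with hypothetical names: it appends syscall names to the comma-separated list while preserving order and skipping duplicates, and builds the same argument vector as the command above. The two lists are copied from this article.

```python
# Lists copied from the commands in this article.
STRACE_LIST = ("read,write,open,close,fstat,mmap,mprotect,munmap,brk,rt_sigaction,"
               "rt_sigprocmask,ioctl,access,setitimer,getpid,socket,connect,sendto,"
               "recvmsg,getsockname,setsockopt,getsockopt,execve,getuid,setuid,"
               "geteuid,capget,capset,prctl,arch_prctl")
ADDED = ("setresuid,setresgid,getgid,fcntl,clone,set_robust_list,stat,"
         "nanosleep,wait4,getdents,exit_group")


def extend_whitelist(current, additions):
    """Append syscall names to a comma-separated whitelist, keeping order, skipping duplicates."""
    names = [n for n in current.split(",") if n]
    for name in additions.split(","):
        if name and name not in names:
            names.append(name)
    return ",".join(names)


def firejail_command(whitelist, target=("ping", "-c1", "8.8.8.8")):
    """Argument vector equivalent to the command used above."""
    return ["firejail", "--caps.keep=net_raw", "--shell=none", "--noprofile",
            "--debug", "--seccomp.keep=" + whitelist, *target]
```

extend_whitelist(STRACE_LIST, ADDED) reproduces the 41-entry list of the final run, and the vector returned by firejail_command can be handed to subprocess.run for the next iteration.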
Now we can create a profile for the ping application that minimizes its exposure to the system. If ping tries to use any syscall outside the whitelist, it is killed immediately, and privileged operations that require a capability other than CAP_NET_RAW are denied.
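A minimal sketch of that last step, assuming that Firejail profile files accept the same commands as the command-line options without the leading dashes (caps.keep, seccomp.keep; check firejail-profile(5) for your installed version) and that per-user profiles live in ~/.config/firejail/. The function names and the default path are hypothetical.

```python
from pathlib import Path


def make_profile(caps, syscalls, comment="ping sandbox (sketch)"):
    """Render a minimal Firejail profile keeping only the given capabilities and syscalls."""
    return "\n".join([
        f"# {comment}",
        "caps.keep " + ",".join(caps),
        "seccomp.keep " + ",".join(syscalls),
    ]) + "\n"


def save_profile(text, path=None):
    """Write the profile, by default to ~/.config/firejail/ping.profile (assumed location)."""
    target = Path(path) if path else Path.home() / ".config" / "firejail" / "ping.profile"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

The resulting file can be passed explicitly with firejail --profile=<file> ping -c1 8.8.8.8, which avoids depending on where Firejail searches for profiles.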
The defense-in-depth paradigm calls for multiple layers of security controls, so that if one fails, another can take over.
Firejail can be used as an additional line of defense, especially for internet browsers. It comes with many predefined profiles, and configuring additional applications is relatively easy.
I use Firejail for browsing, for software analysis, to apply specific temporary settings (like a specific DNS), to simulate installations, and so on.
The difference between sandboxes and containers can be reduced to the filesystem:
A sandbox works on the existing filesystem, while a container has a separate filesystem.
The security focus of the two technologies is also different:
Sandboxing is focused entirely on security, containers on virtualization and development.
In any case, since the underlying security mechanisms are the same, familiarity with seccomp-bpf and capabilities also helps a lot in understanding and securing container environments (such as Docker).
Securing a computer does not end at the Ethernet port. Once installed and trusted, a piece of code is basically free to access a lot of resources it does not really need (think of X).
This overexposure has become critical with container technology, where the border between real and virtual resources can be very blurry.
Firejail is a quick win for any application, especially for browsers without built-in sandboxing. It is easy to use and configure, and its impact on the system is very small.
Other tools to try: