Firejail

A Human Tool to Control Software

by Rocco Gagliardi

time to read: 15 minutes

When we use a piece of software, we make an act of faith: not only towards its developer but, above all, towards all the other developers who had a hand in the various parts of the complex system that runs it. When we acquire software, we tend to be cautious up to the Ethernet port, stacking up firewalls and scanners. Once past the network, in our inner space, we tend to trust the software blindly.

There are many ways to control almost every operation a piece of software performs: look at the security models used in computer security and you will find the classic DAC, MAC, and RBAC, along with more esoteric ones. On Linux, the traditional Discretionary Access Control (DAC) has been complemented by the SELinux and AppArmor kernel extensions, which implement Mandatory Access Control (MAC); in recent years, one or the other has been installed and enabled by default in the major distributions. They can control a lot, but a process can still roam through the whole garden.

To fence off the garden, sandboxes were created. A sandbox provides a high degree of program isolation: it can separate the file system, the network, and other resources, or it can rely on virtualization, with more or less effort and control depending on the security model it is based on.

Capabilities and seccomp Security Models

The capabilities security model focuses on privileged users: it splits the set of permissions held by the superuser into distinct units and assigns one or more of them to a process, granting it the ability to execute or access specific parts of the OS. Think of a capability as a bundle of accessible resources, not just syscalls. As an example, take a look at CAP_SYS_RAWIO (see man 7 capabilities):

CAP_SYS_RAWIO
   * Perform I/O port operations (iopl(2) and ioperm(2));
   * access /proc/kcore;
   * employ the FIBMAP ioctl(2) operation;
   * open devices for accessing x86 model-specific registers (MSRs, see msr(4));
   * update /proc/sys/vm/mmap_min_addr;
   * create memory mappings at addresses below the value specified by /proc/sys/vm/mmap_min_addr;
   * map files in /proc/bus/pci;
   * open /dev/mem and /dev/kmem;
   * perform various SCSI device commands;
   * perform certain operations on hpsa(4) and cciss(4) devices;
   * perform a range of device-specific operations on other devices.
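Internally, the kernel stores a process's capability sets as bitmasks with one bit per capability; they appear as hexadecimal values in the Cap* fields (for example CapEff) of /proc/<pid>/status. The minimal Python sketch below decodes such a mask into names. The name table assumes the bit order of linux/capability.h up to CAP_AUDIT_READ (bit 37), the same order firejail --caps.print uses later in this article; newer kernels define further capabilities, which the sketch reports only by number.

```python
"""Decode a Linux capability bitmask (e.g. CapEff from /proc/<pid>/status)."""

# Capability names indexed by bit number, as in linux/capability.h
# (CAP_CHOWN = 0 ... CAP_AUDIT_READ = 37). Newer kernels add more bits.
CAP_NAMES = [
    "chown", "dac_override", "dac_read_search", "fowner", "fsetid",
    "kill", "setgid", "setuid", "setpcap", "linux_immutable",
    "net_bind_service", "net_broadcast", "net_admin", "net_raw",
    "ipc_lock", "ipc_owner", "sys_module", "sys_rawio", "sys_chroot",
    "sys_ptrace", "sys_pacct", "sys_admin", "sys_boot", "sys_nice",
    "sys_resource", "sys_time", "sys_tty_config", "mknod", "lease",
    "audit_write", "audit_control", "setfcap", "mac_override",
    "mac_admin", "syslog", "wake_alarm", "block_suspend", "audit_read",
]


def decode_caps(mask: int) -> list[str]:
    """Return the names of all capabilities whose bit is set in mask."""
    names = []
    for bit in range(mask.bit_length()):
        if (mask >> bit) & 1:
            # Bits beyond this table get a numeric label instead of a name.
            names.append(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"cap_{bit}")
    return names


def effective_caps(pid: str = "self") -> list[str]:
    """Read and decode the effective capability set (CapEff) of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return decode_caps(int(line.split()[1], 16))
    return []


if __name__ == "__main__":
    print(decode_caps(1 << 17))  # prints: ['sys_rawio']
```

On a Linux host, effective_caps() shows what the current process may do; running it as root and as a normal user makes the difference between the two visible.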

seccomp-bpf is an extension to seccomp that allows filtering of syscalls using a configurable policy implemented with Berkeley Packet Filter rules. It is used by OpenSSH and vsftpd as well as by the Google Chrome/Chromium web browsers on Chrome OS and Linux. It is a kind of firewall that controls which kernel functionality (syscalls) a process can access, limiting the exposure of the kernel to an arbitrary process. The number of syscalls varies; on my system, 380 syscalls are supported for 32-bit and 332 for 64-bit processes (see man 2 syscalls and take a look into /usr/src/linux-headers-<ver>/include/uapi/asm-generic/unistd.h for a complete list).
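Conceptually, a seccomp-bpf filter is a small program the kernel evaluates on every syscall: it checks the architecture, examines the syscall number, and returns an action. The Python sketch below only models that decision logic; it does not install a real filter, which requires BPF bytecode loaded through prctl(2) or seccomp(2), or a library such as libseccomp. The constant 0xC000003E is AUDIT_ARCH_X86_64, the syscall numbers are x86_64 values, and the two actions are a simplification of what real filters can return.

```python
"""Model of the decision logic of a seccomp-bpf syscall filter (not a real filter)."""
from enum import Enum

AUDIT_ARCH_X86_64 = 0xC000003E  # architecture tag the filter validates first


class Action(Enum):
    ALLOW = "allow"
    KILL = "kill"  # the kernel delivers SIGSYS and terminates the process


def make_filter(syscalls: set[int], whitelist: bool, arch: int = AUDIT_ARCH_X86_64):
    """Return a function mapping (architecture, syscall number) to an Action.

    whitelist=True  -> only listed syscalls are allowed (default action: kill)
    whitelist=False -> listed syscalls are killed (default action: allow)
    """
    def decide(call_arch: int, nr: int) -> Action:
        # Reject other ABIs, whose syscall numbers mean different calls.
        if call_arch != arch:
            return Action.KILL
        listed = nr in syscalls
        if whitelist:
            return Action.ALLOW if listed else Action.KILL
        return Action.KILL if listed else Action.ALLOW
    return decide


if __name__ == "__main__":
    # x86_64 numbers: 0 = read, 56 = clone, 165 = mount
    blacklist = make_filter({165}, whitelist=False)
    allowlist = make_filter({0}, whitelist=True)
    print(blacklist(AUDIT_ARCH_X86_64, 0), allowlist(AUDIT_ARCH_X86_64, 56))
```

The two modes differ only in the default action: a blacklist allows everything it does not know, a whitelist kills everything it does not know, which is why the whitelist is the stricter choice.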

Reducing Exposure

Firejail uses technologies like Capabilities and Seccomp-bpf to sandbox applications.

Take the ping executable as an example and try to reduce its exposure using the Firejail sandbox. We will:

Find out the syscalls ping uses
Drop all unneeded capabilities
Switch Firejail from a syscall blacklist to a whitelist

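As the first step, we find out which resources ping accesses by dumping a summary of its syscalls with strace (-c counts the calls, -f follows child processes, -q suppresses attach/detach messages):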

root@ubunthin:~# strace -qcf ping -c1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=46 time=24.1 ms

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 24.110/24.110/24.110/0.000 ms
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         8           read
  0.00    0.000000           0         6           write
  0.00    0.000000           0        35        13 open
  0.00    0.000000           0        23           close
  0.00    0.000000           0        23           fstat
  0.00    0.000000           0        31           mmap
  0.00    0.000000           0        14           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         3           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           ioctl
  0.00    0.000000           0         8         8 access
  0.00    0.000000           0         1           setitimer
  0.00    0.000000           0         1           getpid
  0.00    0.000000           0         5         2 socket
  0.00    0.000000           0         1           connect
  0.00    0.000000           0         1           sendto
  0.00    0.000000           0         1           recvmsg
  0.00    0.000000           0         1           getsockname
  0.00    0.000000           0         7           setsockopt
  0.00    0.000000           0         1           getsockopt
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           getuid
  0.00    0.000000           0         1           setuid
  0.00    0.000000           0         1           geteuid
  0.00    0.000000           0         7           capget
  0.00    0.000000           0         3           capset
  0.00    0.000000           0         2           prctl
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                   195        23 total
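The syscall column of this summary is exactly what a whitelist needs. The Python helper below (strace_summary_to_list is a hypothetical name) extracts the names from the table between the two dashed separators and joins them into the comma-separated form used by Firejail's --seccomp.keep option. It assumes the column layout shown above, where the syscall name is always the last field and the optional errors column sits before it.

```python
"""Turn an `strace -c` summary into a --seccomp.keep argument."""


def strace_summary_to_list(summary: str) -> list[str]:
    """Return syscall names from the table between the two dashed separators."""
    names: list[str] = []
    inside = False
    for line in summary.splitlines():
        if line.startswith("------"):
            if inside:
                break  # second separator: only the "total" row follows
            inside = True
            continue
        if inside and line.strip():
            names.append(line.split()[-1])  # the syscall name is the last column
    return names


SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0         8           read
  0.00    0.000000           0        35        13 open
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    44        13 total
"""

if __name__ == "__main__":
    print("--seccomp.keep=" + ",".join(strace_summary_to_list(SAMPLE)))
    # prints: --seccomp.keep=read,open,arch_prctl
```

strace prints the summary on stderr; adding -o FILE writes it to a file instead, which keeps it separate from ping's own output before feeding it to the helper.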

We can simply start ping in Firejail without any specific profile or configuration to get a minimal security gain. By default, Firejail leaves all capabilities enabled but blacklists some syscalls:

root@ubunthin:/home/rcc# firejail --name=ping ping 8.8.8.8

root@ubunthin:/home/rcc# firejail --list
11971:rcc:firejail --name=ping ping 8.8.8.8

root@ubunthin:/home/rcc# firejail --seccomp.print=ping
SECCOMP Filter:
  VALIDATE_ARCHITECTURE
  EXAMINE_SYSCAL
  UNKNOWN ENTRY!!!
  UNKNOWN ENTRY!!!
  UNKNOWN ENTRY!!!
  BLACKLIST 165 mount
  BLACKLIST 166 umount2
  BLACKLIST 101 ptrace
  BLACKLIST 246 kexec_load
  BLACKLIST 320 kexec_file_load
  BLACKLIST 304 open_by_handle_at
  BLACKLIST 303 name_to_handle_at
  BLACKLIST 175 init_module
  BLACKLIST 313 finit_module
  BLACKLIST 174 create_module
  BLACKLIST 176 delete_module
  BLACKLIST 172 iopl
  BLACKLIST 173 ioperm
  BLACKLIST 251 ioprio_set
  BLACKLIST 167 swapon
  BLACKLIST 168 swapoff
  BLACKLIST 103 syslog
  BLACKLIST 310 process_vm_readv
  BLACKLIST 311 process_vm_writev
  BLACKLIST 139 sysfs
  BLACKLIST 156 _sysctl
  BLACKLIST 159 adjtimex
  BLACKLIST 305 clock_adjtime
  BLACKLIST 212 lookup_dcookie
  BLACKLIST 298 perf_event_open
  BLACKLIST 300 fanotify_init
  BLACKLIST 312 kcmp
  BLACKLIST 248 add_key
  BLACKLIST 249 request_key
  BLACKLIST 250 keyctl
  BLACKLIST 134 uselib
  BLACKLIST 163 acct
  BLACKLIST 154 modify_ldt
  BLACKLIST 155 pivot_root
  BLACKLIST 206 io_setup
  BLACKLIST 207 io_destroy
  BLACKLIST 208 io_getevents
  BLACKLIST 209 io_submit
  BLACKLIST 210 io_cancel
  BLACKLIST 216 remap_file_pages
  BLACKLIST 237 mbind
  BLACKLIST 239 get_mempolicy
  BLACKLIST 238 set_mempolicy
  BLACKLIST 256 migrate_pages
  BLACKLIST 279 move_pages
  BLACKLIST 278 vmsplice
  BLACKLIST 161 chroot
  BLACKLIST 184 tuxcall
  BLACKLIST 169 reboot
  BLACKLIST 180 nfsservctl
  BLACKLIST 177 get_kernel_syms
  RETURN_ALLOW
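A quick way to see why ping still runs under this default filter is to intersect the syscalls recorded by strace with the blacklisted ones: if the intersection is empty, the filter never triggers for ping. A minimal Python check, with both lists copied from the outputs above:

```python
"""Check whether any syscall used by ping is blocked by Firejail's default blacklist."""

# Syscalls reported by `strace -qcf ping -c1 8.8.8.8` above.
PING_SYSCALLS = set("""
read write open close fstat mmap mprotect munmap brk rt_sigaction
rt_sigprocmask ioctl access setitimer getpid socket connect sendto recvmsg
getsockname setsockopt getsockopt execve getuid setuid geteuid capget capset
prctl arch_prctl
""".split())

# Syscalls marked BLACKLIST by `firejail --seccomp.print=ping` above.
DEFAULT_BLACKLIST = set("""
mount umount2 ptrace kexec_load kexec_file_load open_by_handle_at
name_to_handle_at init_module finit_module create_module delete_module iopl
ioperm ioprio_set swapon swapoff syslog process_vm_readv process_vm_writev
sysfs _sysctl adjtimex clock_adjtime lookup_dcookie perf_event_open
fanotify_init kcmp add_key request_key keyctl uselib acct modify_ldt
pivot_root io_setup io_destroy io_getevents io_submit io_cancel
remap_file_pages mbind get_mempolicy set_mempolicy migrate_pages move_pages
vmsplice chroot tuxcall reboot nfsservctl get_kernel_syms
""".split())

if __name__ == "__main__":
    blocked = PING_SYSCALLS & DEFAULT_BLACKLIST
    print(f"{len(PING_SYSCALLS)} syscalls used, {len(blocked)} blocked: {sorted(blocked)}")
    # prints: 30 syscalls used, 0 blocked: []
```

The empty intersection shows the limit of a blacklist: it removes 51 dangerous syscalls that ping never needs, but leaves every other syscall the kernel offers available.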

root@ubunthin:/home/rcc# firejail --caps.print=ping
chown               - enabled
dac_override        - enabled
dac_read_search     - enabled
fowner              - enabled
fsetid              - enabled
kill                - enabled
setgid              - enabled
setuid              - enabled
setpcap             - enabled
linux_immutable     - enabled
net_bind_service    - enabled
net_broadcast       - enabled
net_admin           - enabled
net_raw             - enabled
ipc_lock            - enabled
ipc_owner           - enabled
sys_module          - enabled
sys_rawio           - enabled
sys_chroot          - enabled
sys_ptrace          - enabled
sys_pacct           - enabled
sys_admin           - enabled
sys_boot            - enabled
sys_nice            - enabled
sys_resource        - enabled
sys_time            - enabled
sys_tty_config      - enabled
mknod               - enabled
lease               - enabled
audit_write         - enabled
audit_control       - enabled
setfcap             - enabled
mac_override        - enabled
mac_admin           - enabled
syslog              - enabled
wake_alarm          - enabled
block_suspend       - enabled
audit_read          - enabled

Going through the capabilities: why should ping have permission to create a filesystem node (mknod)? We can drop all capabilities except CAP_NET_RAW, removing access to all unneeded resources and keeping only:

CAP_NET_RAW
  * use RAW and PACKET sockets;
  * bind to any address for transparent proxying.

Start Firejail keeping only net_raw:

root@ubunthin:/home/rcc# firejail --name=ping --caps.keep=net_raw ping 8.8.8.8

root@ubunthin:/home/rcc# firejail --list
12121:rcc:firejail --name=ping --caps.keep=net_raw ping 8.8.8.8

root@ubunthin:/home/rcc# firejail --caps.print=ping
...
net_admin           - disabled
net_raw             - enabled
ipc_lock            - disabled
...
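Under the hood, keeping a list of capabilities amounts to a bitmask with one bit per capability to keep. Assuming the kernel's bit numbering (CAP_NET_RAW is bit 13), keeping only net_raw gives 0x2000, which is consistent with the "Set caps filter 2000" line in the --debug output later in this article. A minimal Python sketch of that computation; the name table is a hypothetical subset covering only the capabilities used here:

```python
"""Compute the capability bitmask corresponding to a --caps.keep list."""

# Bit numbers from linux/capability.h (subset, enough for this example).
CAP_BITS = {
    "chown": 0, "dac_override": 1, "setgid": 6, "setuid": 7,
    "net_bind_service": 10, "net_admin": 12, "net_raw": 13,
    "sys_admin": 21, "mknod": 27,
}
CAP_LAST_CAP = 37  # CAP_AUDIT_READ, the last entry in the --caps.print output


def keep_mask(keep: list[str]) -> int:
    """Bitmask with one bit set for each capability to keep."""
    mask = 0
    for name in keep:
        mask |= 1 << CAP_BITS[name]
    return mask


def drop_mask(keep: list[str]) -> int:
    """Bitmask of every capability (bits 0..CAP_LAST_CAP) that is dropped."""
    all_caps = (1 << (CAP_LAST_CAP + 1)) - 1
    return all_caps & ~keep_mask(keep)


if __name__ == "__main__":
    print(hex(keep_mask(["net_raw"])))  # prints: 0x2000
    print(bin(drop_mask(["net_raw"])).count("1"), "capabilities dropped")
    # prints: 37 capabilities dropped
```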

We dropped all capabilities except CAP_NET_RAW, so many privileged operations (and the syscalls that depend on them) now fail. But ping uses only 30 distinct syscalls, out of the more than 300 the kernel offers, so we are still overexposed. With seccomp-bpf we can define a finer filter that restricts ping to the syscalls it actually uses, switching from a blacklist to a whitelist.

To build the whitelist, start with the list of syscalls reported by strace and monitor the syslog for blocked calls:

root@ubunthin:/home/rcc# firejail --caps.keep=net_raw --shell=none --noprofile --debug --seccomp.keep=read,write,open,close,fstat,mmap,mprotect,munmap,brk,rt_sigaction,rt_sigprocmask,ioctl,access,setitimer,getpid,socket,connect,sendto,recvmsg,getsockname,setsockopt,getsockopt,execve,getuid,setuid,geteuid,capget,capset,prctl,arch_prctl,setresuid,setresgid,getgid,fcntl,clone,set_robust_list,stat,nanosleep,wait4,getdents,exit_group ping -c1 8.8.8.8

The syslog reports the syscall that caused the problem, i.e. the one blocked by the seccomp filter:

Aug  7 21:09:57 ubunthin firejail[28419]: firejail --shell=none --noprofile --seccomp.keep=read,write,open,close,fstat,mmap,mprotect,munmap,brk,rt_sigaction,rt_sigprocmask,ioctl,access,setitimer,getpid,socket,connect,sendto,recvmsg,getsockname,setsockopt,getsockopt,execve,getuid,setuid,geteuid,capget,capset,prctl,arch_prctl,setresuid,setresgid,getgid,fcntl --debug ping -c1 8.8.8.8
Aug  7 21:09:58 ubunthin kernel: [98202.176765] audit: type=1326 audit(1502132998.041:55): auid=1000 uid=0 gid=0 ses=2 pid=28420 comm="firejail" exe="/usr/bin/firejail" sig=31 arch=c000003e syscall=56 compat=0 ip=0x7ff72e5ca40a code=0x0
Aug  7 21:09:58 ubunthin firejail[28419]: exiting...

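In this case, the sandboxed process called syscall 56, which is not whitelisted, so the seccomp filter killed it (sig=31 is SIGSYS on x86_64). Note that the record shows comm="firejail" and exe="/usr/bin/firejail": the call came from Firejail's own code, already running under the filter, rather than from ping itself, which is why it does not appear in an strace of ping alone. To find the syscall name from its number: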

root@ubunthin:/home/rcc# firejail --debug-syscalls | grep 56
56	- clone
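The relevant fields can also be extracted from the kernel audit record programmatically. The Python sketch below splits the record into key=value pairs and translates the syscall number with a lookup table; that table is a hypothetical minimal subset of x86_64 numbers for illustration, not a complete mapping (firejail --debug-syscalls or the kernel's syscall table remain the authoritative sources). The signal number 31 is assumed to be SIGSYS, as on Linux x86_64.

```python
"""Extract the blocked syscall from a kernel audit record of type 1326 (seccomp)."""

# Hypothetical minimal lookup table: a few x86_64 syscall numbers only.
X86_64_SYSCALLS = {0: "read", 1: "write", 56: "clone", 61: "wait4", 231: "exit_group"}
AUDIT_ARCH_X86_64 = "c000003e"
SIGSYS = 31  # SIGSYS on Linux x86_64: sent when seccomp kills the process


def parse_audit(record: str) -> dict[str, str]:
    """Return the key=value fields of an audit record, with quotes stripped."""
    fields = {}
    for token in record.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens such as "audit:" that carry no value
            fields[key] = value.strip('"')
    return fields


def describe(record: str) -> str:
    """Summarize which process was stopped on which syscall."""
    f = parse_audit(record)
    nr = int(f["syscall"])
    known = f["arch"] == AUDIT_ARCH_X86_64
    name = X86_64_SYSCALLS.get(nr, "unknown") if known else "unknown"
    how = "SIGSYS" if int(f["sig"]) == SIGSYS else f"signal {f['sig']}"
    return f"{f['comm']} (pid {f['pid']}) got {how} on syscall {nr} ({name})"


RECORD = ('audit: type=1326 audit(1502132998.041:55): auid=1000 uid=0 gid=0 ses=2 '
          'pid=28420 comm="firejail" exe="/usr/bin/firejail" sig=31 arch=c000003e '
          'syscall=56 compat=0 ip=0x7ff72e5ca40a code=0x0')

if __name__ == "__main__":
    print(describe(RECORD))
    # prints: firejail (pid 28420) got SIGSYS on syscall 56 (clone)
```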

Add clone to the whitelist and retry, repeating until the process runs. In the end, on top of the initial strace list, we added setresuid, setresgid, getgid, fcntl, clone, set_robust_list, stat, nanosleep, wait4, getdents, and exit_group:

root@ubunthin:/home/rcc# firejail --caps.keep=net_raw --shell=none --noprofile --debug --seccomp.keep=read,write,open,close,fstat,mmap,mprotect,munmap,brk,rt_sigaction,rt_sigprocmask,ioctl,access,setitimer,getpid,socket,connect,sendto,recvmsg,getsockname,setsockopt,getsockopt,execve,getuid,setuid,geteuid,capget,capset,prctl,arch_prctl,setresuid,setresgid,getgid,fcntl,clone,set_robust_list,stat,nanosleep,wait4,getdents,exit_group ping -c1 8.8.8.8
Command name #ping#
DISPLAY :0, 0
Enabling IPC namespace
Using the local network stack
Parent pid 28503, child pid 28504
The new log directory is /proc/28504/root/var/log
Initializing child process
Host network configured
PID namespace installed
Mounting tmpfs on /run/firejail/mnt directory
Mounting read-only /bin, /sbin, /lib, /lib32, /lib64, /usr, /etc, /var
Mounting tmpfs on /dev/shm
Mounting tmpfs on /var/lock
Mounting tmpfs on /var/tmp
Mounting tmpfs on /var/log
Mounting tmpfs on /var/lib/dhcp
Mounting tmpfs on /var/lib/snmp
Mounting tmpfs on /var/lib/sudo
Create the new utmp file
Mount the new utmp file
Remounting /proc and /proc/sys filesystems
Remounting /sys directory
Disable /sys/firmware
Disable /sys/hypervisor
Disable /sys/module
Disable /sys/power
Disable /sys/kernel/debug
Disable /sys/kernel/vmcoreinfo
Disable /sys/kernel/uevent_helper
Disable /proc/sys/fs/binfmt_misc
Disable /proc/sys/kernel/core_pattern
Disable /proc/sys/kernel/modprobe
Disable /proc/sysrq-trigger
Disable /proc/sys/kernel/hotplug
Disable /proc/sys/vm/panic_on_oom
Disable /proc/irq
Disable /proc/bus
Disable /proc/sched_debug
Disable /proc/timer_list
Disable /proc/timer_stats
Disable /proc/kcore
Disable /proc/kallsyms
Disable /lib/modules
Disable /usr/lib/debug
Disable /boot
Disable /dev/port
Disable /sys/fs
DISPLAY :0, 0
Set caps filter 2000
Ending syscall filter
SECCOMP Filter:
  VALIDATE_ARCHITECTURE
  EXAMINE_SYSCAL
  UNKNOWN ENTRY!!!
  UNKNOWN ENTRY!!!
  UNKNOWN ENTRY!!!
  WHITELIST 105 setuid
  WHITELIST 106 setgid
  WHITELIST 116 setgroups
  WHITELIST 32 dup
  WHITELIST 0 read
  WHITELIST 1 write
  WHITELIST 2 open
  WHITELIST 3 close
  WHITELIST 5 fstat
  WHITELIST 9 mmap
  WHITELIST 10 mprotect
  WHITELIST 11 munmap
  WHITELIST 12 brk
  WHITELIST 13 rt_sigaction
  WHITELIST 14 rt_sigprocmask
  WHITELIST 16 ioctl
  WHITELIST 21 access
  WHITELIST 38 setitimer
  WHITELIST 39 getpid
  WHITELIST 41 socket
  WHITELIST 42 connect
  WHITELIST 44 sendto
  WHITELIST 47 recvmsg
  WHITELIST 51 getsockname
  WHITELIST 54 setsockopt
  WHITELIST 55 getsockopt
  WHITELIST 59 execve
  WHITELIST 102 getuid
  WHITELIST 105 setuid
  WHITELIST 107 geteuid
  WHITELIST 125 capget
  WHITELIST 126 capset
  WHITELIST 157 prctl
  WHITELIST 158 arch_prctl
  WHITELIST 117 setresuid
  WHITELIST 119 setresgid
  WHITELIST 104 getgid
  WHITELIST 72 fcntl
  WHITELIST 56 clone
  WHITELIST 273 set_robust_list
  WHITELIST 4 stat
  WHITELIST 35 nanosleep
  WHITELIST 61 wait4
  WHITELIST 78 getdents
  WHITELIST 231 exit_group
  KILL_PROCESS
Save seccomp filter, size 784 bytes
seccomp enabled
Username root, no supplementary groups
starting application
LD_PRELOAD=(null)
execvp argument 0: ping
execvp argument 1: -c1
execvp argument 2: 8.8.8.8
Child process initialized
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
monitoring pid 2

64 bytes from 8.8.8.8: icmp_seq=1 ttl=45 time=28.6 ms

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 28.614/28.614/28.614/0.000 ms
Sandbox monitor: waitpid 2 retval 2 status 0

Parent is shutting down, bye...
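The trial-and-error procedure used above (run, read the blocked syscall from the log, extend the whitelist, retry) can be summarized as a loop. The Python sketch below simulates it: run_sandboxed is a hypothetical stand-in for launching Firejail and reading the audit log, modelled as a process that issues a fixed sequence of syscalls and stops at the first one that is not whitelisted. The syscall sequence in the example is illustrative only.

```python
"""Simulation of building a seccomp whitelist by trial and error."""
from typing import Optional


def run_sandboxed(calls: list[str], whitelist: set[str]) -> Optional[str]:
    """Hypothetical stand-in for one sandboxed run.

    Returns the first syscall not in the whitelist (the one the filter
    would kill the process on), or None if the run completes.
    """
    for call in calls:
        if call not in whitelist:
            return call
    return None


def build_whitelist(calls: list[str], initial: set[str],
                    max_rounds: int = 100) -> tuple[set[str], list[str]]:
    """Extend the whitelist one blocked syscall at a time until the run succeeds."""
    whitelist = set(initial)
    added: list[str] = []
    for _ in range(max_rounds):
        blocked = run_sandboxed(calls, whitelist)
        if blocked is None:
            return whitelist, added
        whitelist.add(blocked)  # in practice: map syscall=NN from the log to a name
        added.append(blocked)
    raise RuntimeError("whitelist did not converge")


if __name__ == "__main__":
    # Illustrative sequence: calls seen by strace plus calls made by the sandbox itself.
    trace = ["execve", "read", "clone", "socket", "wait4", "exit_group"]
    final, added = build_whitelist(trace, {"execve", "read", "socket"})
    print(added)  # prints: ['clone', 'wait4', 'exit_group']
```

Each round reveals only one missing syscall, because the filter kills the process at the first violation; hence the several iterations. A real program's syscall sequence can also depend on input and timing, so a whitelist built this way only covers the code paths exercised during the test runs.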

Now we can create a profile for ping that minimizes its system exposure. If ping makes any syscall outside the whitelist, it is killed immediately, and any operation that requires a dropped capability is denied.
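Firejail profiles largely reuse the command-line options as directives, written without the leading dashes (see firejail-profile(5)); assuming that holds for the installed version, the options found above become caps.keep and seccomp.keep lines. The Python sketch below prints such a profile; the function name and the file name ping.profile are illustrative, and the directory Firejail searches for profiles should be checked against the local installation.

```python
"""Print a Firejail profile restricting ping to the capabilities and syscalls found above."""

CAPS_KEEP = ["net_raw"]
# Final whitelist from the last --seccomp.keep command above.
SECCOMP_KEEP = (
    "read,write,open,close,fstat,mmap,mprotect,munmap,brk,rt_sigaction,"
    "rt_sigprocmask,ioctl,access,setitimer,getpid,socket,connect,sendto,recvmsg,"
    "getsockname,setsockopt,getsockopt,execve,getuid,setuid,geteuid,capget,capset,"
    "prctl,arch_prctl,setresuid,setresgid,getgid,fcntl,clone,set_robust_list,stat,"
    "nanosleep,wait4,getdents,exit_group"
).split(",")


def render_profile(caps: list[str], syscalls: list[str]) -> str:
    """Return the profile text: one directive per line, comma-separated values."""
    lines = [
        "# ping: keep only CAP_NET_RAW and whitelist the observed syscalls",
        "caps.keep " + ",".join(caps),
        "seccomp.keep " + ",".join(syscalls),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_profile(CAPS_KEEP, SECCOMP_KEEP), end="")
```

Redirect the output to a file (for example python3 make_profile.py > ping.profile) and start ping with firejail --profile=ping.profile ping -c1 8.8.8.8.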

Where and When to Use Firejail

The defense-in-depth paradigm requires multiple layers of security controls so that if one fails, another can take over.

Firejail can be used as an additional line of defense, especially for web browsers. It comes with a set of predefined profiles, and configuring additional applications is relatively easy.

I use Firejail for browsing, for software analysis, for temporary settings (such as a specific DNS server), to simulate installations, and so on.

Sandboxes vs. Containers

The difference between sandboxes and containers boils down to the filesystem:

A sandbox works on the existing filesystem; a container has a separate filesystem.

The focus of the two technologies also differs:

Sandboxing is entirely focused on security, while containers focus on virtualization and development.

In any case, since the underlying security mechanisms are the same, familiarity with seccomp-bpf and capabilities also helps a lot in understanding and securing container environments (such as Docker).

Summary

Securing a computer does not end at the Ethernet port. Once installed and trusted, a piece of code is basically free to access many resources it does not really need (think of X).

This overexposure has become critical with container technology, where the boundary between real and virtual resources can be very blurry.

Firejail is a quick win for any application, especially for browsers without built-in sandboxing. It is easy to use and configure, and its impact on the system is very small.

Tools

Other tools to try:

About the Author

Rocco Gagliardi has been working in IT since the 1980s and specialized in IT security in the 1990s. His main focus lies in security frameworks, network routing, firewalling and log management.
