### WHAT IS IT? NsJail is a process isolation tool for Linux. It makes use of the the namespacing, resource control, and seccomp-bpf syscall filter subsystems of the Linux kernel. It can help, among others, with: * Securing networking services (e.g. web, time, DNS), by isolating them from the rest of the OS * Hosting computer security challenges (so-called CTFs) * Containing invasive syscall-level OS fuzzers This is NOT an official Google product. ### WHAT TYPE OF ISOLATION DOES THIS TOOL PROVIDE? 1. Linux namespaces: UTS (hostname), MOUNT (chroot), PID (separate PID tree), IPC, NET (separate networking context), USER 2. FS constraints: chroot(), pivot_root(), RO-remounting 3. Resource limits (wall-time/CPU time limits, VM/mem address space limits, etc.) 4. Programmable seccomp-bpf syscall filters 5. Cloned and separated Ethernet interfaces 6. Cgroups for memory utilization control ### WHICH USE-CASES ARE SUPPORTED? #### Isolation of network services (inetd-style) + Server: ``` $ ./nsjail -Ml --port 9000 --chroot /chroot/ --user 99999 --group 99999 -- /bin/sh -i ``` + Client: ``` $ nc 127.0.0.1 9000 / $ ifconfig / $ ifconfig -a lo Link encap:Local Loopback LOOPBACK MTU:65536 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) / $ ps wuax PID USER COMMAND 1 99999 /bin/sh -i 3 99999 {busybox} ps wuax / $ ``` #### Isolation, with access to a private, cloned interface (requires euid==0) ``` $ sudo ./nsjail --user 9999 --group 9999 --iface eth0 --chroot /chroot/ -Mo --iface_vs_ip 192.168.0.44 --iface_vs_nm 255.255.255.0 --iface_vs_gw 192.168.0.1 -- /bin/sh -i / $ id uid=9999 gid=9999 / $ ip addr sh 1: lo: mtu 65536 qdisc noqueue link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: vs: mtu 1500 qdisc noqueue link/ether ca:a2:69:21:33:66 brd ff:ff:ff:ff:ff:ff inet 192.168.0.44/24 brd 192.168.0.255 scope global vs valid_lft forever preferred_lft forever inet6 fe80::c8a2:69ff:fe21:cd66/64 scope link valid_lft forever preferred_lft forever / $ nc 217.146.165.209 80 GET / HTTP/1.0 HTTP/1.0 302 Found Cache-Control: private Content-Type: text/html; charset=UTF-8 Location: http://www.google.ch/?gfe_rd=cr&ei=cEzWVrG2CeTI8ge88ofwDA Content-Length: 258 Date: Wed, 02 Mar 2016 02:14:08 GMT 302 Moved

302 Moved

The document has moved here. / $ ``` #### Isolation of local processes ``` $ ./nsjail -Mo --chroot /chroot/ --user 99999 --group 99999 -- /bin/sh -i / $ ifconfig -a lo Link encap:Local Loopback LOOPBACK MTU:65536 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) / $ id uid=99999 gid=99999 / $ ps wuax PID USER COMMAND 1 99999 /bin/sh -i 4 99999 {busybox} ps wuax / $exit $ ``` #### Isolation of local processes (and re-running them) ``` $ ./nsjail -Mr --chroot /chroot/ --user 99999 --group 99999 -- /bin/sh -i BusyBox v1.21.1 (Ubuntu 1:1.21.0-1ubuntu1) built-in shell (ash) Enter 'help' for a list of built-in commands. / $ ps wuax PID USER COMMAND 1 99999 /bin/sh -i 2 99999 {busybox} ps wuax / $ exit BusyBox v1.21.1 (Ubuntu 1:1.21.0-1ubuntu1) built-in shell (ash) Enter 'help' for a list of built-in commands. / $ ps wuax PID USER COMMAND 1 99999 /bin/sh -i 2 99999 {busybox} ps wuax / $ ``` #### Bash in a minimal file-system with uid==0 and access to /dev/urandom ``` $ ./nsjail -Mo --user 0 --group 99999 -R /bin/ -R /lib -R /lib64/ -R /usr/ -R /sbin/ -T /dev -R /dev/urandom --keep_caps -- /bin/bash -i bash-4.3# ls -l / total 40 drwxr-xr-x 2 65534 65534 12288 Jun 17 23:27 bin drwxrwxrwt 2 0 99999 60 Jun 19 12:31 dev drwxr-xr-x 25 65534 65534 4096 Jun 9 18:29 lib drwxr-xr-x 2 65534 65534 4096 Apr 15 22:27 lib64 dr-xr-xr-x 260 65534 65534 0 Jun 19 12:31 proc drwxr-xr-x 2 65534 65534 16384 Jun 11 21:03 sbin drwxr-xr-x 21 65534 65534 4096 Apr 24 16:13 usr bash-4.3# ls -l /dev/ total 0 crw-rw-rw- 1 65534 65534 1, 9 Jun 9 18:33 urandom bash-4.3# id uid=0 gid=99999 groups=99999,65534 ``` ### MORE INFO? To see the options, simply type: ``` ./nsjail --help ``` The commandline options should be reasonably well-documented ``` Usage: ./nsjail [options] -- path_to_command [args] Options: --help|-h Help plz.. --mode|-M VALUE Execution mode (default: o [MODE_STANDALONE_ONCE]): l: Wait for connections on a TCP port (specified with --port) [MODE_LISTEN_TCP] o: Immediately launch a single process on a console using clone/execve [MODE_STANDALONE_ONCE] e: Immediately launch a single process on a console using execve [MODE_STANDALONE_EXECVE] r: Immediately launch a single process on a console, keep doing it forever [MODE_STANDALONE_RERUN] --chroot|-c VALUE Directory containing / of the jail (default: none) --rw Mount / as RW (default: RO) --user|-u VALUE Username/uid of processess inside the jail (default: your current uid). You can also use inside_ns_uid:outside_ns_uid convention here --group|-g VALUE Groupname/gid of processess inside the jail (default: your current gid). You can also use inside_ns_gid:global_ns_gid convention here --hostname|-H VALUE UTS name (hostname) of the jail (default: 'NSJAIL') --cwd|-D VALUE Directory in the namespace the process will run (default: '/') --port|-p VALUE TCP port to bind to (enables MODE_LISTEN_TCP) (default: 0) --bindhost VALUE IP address port to bind to (only in [MODE_LISTEN_TCP]), '::ffff:127.0.0.1' for locahost (default: '::') --max_conns_per_ip|-i VALUE Maximum number of connections per one IP (default: 0 (unlimited)) --log|-l VALUE Log file (default: /proc/self/fd/2) --time_limit|-t VALUE Maximum time that a jail can exist, in seconds (default: 600) --daemon|-d Daemonize after start --verbose|-v Verbose output --keep_env|-e Should all environment variables be passed to the child? --env|-E VALUE Environment variable (can be used multiple times) --keep_caps Don't drop capabilities (DANGEROUS) --silent Redirect child's fd:0/1/2 to /dev/null --skip_setsid Don't call setsid(), allows for terminal signal handling in the sandboxed process --pass_fd VALUE Don't close this FD before executing child (can be specified multiple times), by default: 0/1/2 are kept open --pivot_root_only Only perform pivot_root, no chroot. This will enable nested namespaces --disable_no_new_privs Don't set the prctl(NO_NEW_PRIVS, 1) (DANGEROUS) --rlimit_as VALUE RLIMIT_AS in MB, 'max' for RLIM_INFINITY, 'def' for the current value (default: 512) --rlimit_core VALUE RLIMIT_CORE in MB, 'max' for RLIM_INFINITY, 'def' for the current value (default: 0) --rlimit_cpu VALUE RLIMIT_CPU, 'max' for RLIM_INFINITY, 'def' for the current value (default: 600) --rlimit_fsize VALUE RLIMIT_FSIZE in MB, 'max' for RLIM_INFINITY, 'def' for the current value (default: 1) --rlimit_nofile VALUE RLIMIT_NOFILE, 'max' for RLIM_INFINITY, 'def' for the current value (default: 32) --rlimit_nproc VALUE RLIMIT_NPROC, 'max' for RLIM_INFINITY, 'def' for the current value (default: 'def') --rlimit_stack VALUE RLIMIT_STACK in MB, 'max' for RLIM_INFINITY, 'def' for the current value (default: 'def') --persona_addr_compat_layout personality(ADDR_COMPAT_LAYOUT) --persona_mmap_page_zero personality(MMAP_PAGE_ZERO) --persona_read_implies_exec personality(READ_IMPLIES_EXEC) --persona_addr_limit_3gb personality(ADDR_LIMIT_3GB) --persona_addr_no_randomize personality(ADDR_NO_RANDOMIZE) --disable_clone_newnet|-N Don't use CLONE_NEWNET. Enable networking inside the jail --disable_clone_newuser Don't use CLONE_NEWUSER. Requires euid==0 --disable_clone_newns Don't use CLONE_NEWNS --disable_clone_newpid Don't use CLONE_NEWPID --disable_clone_newipc Don't use CLONE_NEWIPC --disable_clone_newuts Don't use CLONE_NEWUTS --enable_clone_newcgroup Use CLONE_NEWCGROUP --uid_mapping|-U VALUE Add a custom uid mapping of the form inside_uid:outside_uid:count. Setting this requires newuidmap to be present --gid_mapping|-G VALUE Add a custom gid mapping of the form inside_gid:outside_gid:count. Setting this requires newuidmap to be present --bindmount_ro|-R VALUE List of mountpoints to be mounted --bind (ro) inside the container. Can be specified multiple times. Supports 'source' syntax, or 'source:dest' --bindmount|-B VALUE List of mountpoints to be mounted --bind (rw) inside the container. Can be specified multiple times. Supports 'source' syntax, or 'source:dest' --tmpfsmount|-T VALUE List of mountpoints to be mounted as RW/tmpfs inside the container. Can be specified multiple times. Supports 'dest' syntax --tmpfs_size VALUE Number of bytes to allocate for tmpfsmounts (default: 4194304) --disable_proc Disable mounting /proc in the jail --seccomp_policy|-P VALUE Path to file containing seccomp-bpf policy (see kafel/) --seccomp_string VALUE String with kafel seccomp-bpf policy (see kafel/) --cgroup_mem_max VALUE Maximum number of bytes to use in the group (default: '0' - disabled) --cgroup_mem_mount VALUE Location of memory cgroup FS (default: '/sys/fs/cgroup/memory') --cgroup_mem_parent VALUE Which pre-existing memory cgroup to use as a parent (default: 'NSJAIL') --iface_no_lo Don't bring up the 'lo' interface --iface|-I VALUE Interface which will be cloned (MACVLAN) and put inside the subprocess' namespace as 'vs' --iface_vs_ip VALUE IP of the 'vs' interface --iface_vs_nm VALUE Netmask of the 'vs' interface --iface_vs_gw VALUE Default GW for the 'vs' interface Examples: Wait on a port 31337 for connections, and run /bin/sh nsjail -Ml --port 31337 --chroot / -- /bin/sh -i Re-run echo command as a sub-process nsjail -Mr --chroot / -- /bin/echo "ABC" Run echo command once only, as a sub-process nsjail -Mo --chroot / -- /bin/echo "ABC" Execute echo command directly, without a supervising process nsjail -Me --chroot / --disable_proc -- /bin/echo "ABC" ```