### WHAT IS IT?

NsJail is a process isolation tool for Linux. It makes use of the namespacing, resource control, and seccomp-bpf syscall filter subsystems of the Linux kernel.

It can help with, among other things:

* Securing networking services (e.g. web, time, DNS), by isolating them from the rest of the OS
* Hosting computer security challenges (so-called CTFs)
* Containing invasive syscall-level OS fuzzers

This is NOT an official Google product.

### WHAT KIND OF ISOLATION DOES IT PROVIDE?

1. Linux namespaces: UTS (hostname), MOUNT (chroot), PID (separate PID tree), IPC, NET (separate networking context), USER
2. FS constraints: chroot(), pivot_root(), RO-remounting
3. Resource limits (wall-time/CPU time limits, VM/mem address space limits, etc.)
4. Programmable seccomp-bpf syscall filters

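
According to the `--help` text in this README, most of these layers are on by default (the `--disable_*` flags and `--rw` all default to off), so usually only the resource limits need to be spelled out. As a minimal sketch, the hypothetical helper `nsjail_limited_cmd` below prints a one-shot invocation with tighter limits. The specific values (30 s wall time, 128 MB address space, 10 s CPU time, 16 open files) are illustrative assumptions, not recommendations.

```shell
# Hypothetical helper: print an nsjail command line for one-shot mode (-Mo)
# with explicit resource limits. Only flags from the --help text are used.
# Arguments: the command (and its arguments) to run inside the jail.
nsjail_limited_cmd() {
  printf '%s' "./nsjail -Mo --chroot /chroot/ --user 99999 --group 99999"
  # Limit values below are assumptions chosen for illustration.
  printf '%s' " --time_limit 30 --rlimit_as 128 --rlimit_cpu 10 --rlimit_nofile 16 --"
  printf ' %s' "$@"
  printf '\n'
}

# Dry run: show the command instead of executing it.
nsjail_limited_cmd /bin/sh -i
```

Printing the command first makes it easy to review the flags; once it looks right, it can be pasted into a shell that has access to a populated `/chroot/`.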
### WHICH USE-CASES ARE COVERED?

#### Isolation of network servers (inetd-style)

+ Server:

```
$ ./nsjail -Ml --port 9000 --chroot /chroot/ --user 99999 --group 99999 -- /bin/sh -i
```

+ Client:

```
$ nc 127.0.0.1 9000
/ $ ifconfig
/ $ ifconfig -a
lo        Link encap:Local Loopback
          LOOPBACK MTU:65536 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
/ $ ps wuax
PID   USER     COMMAND
    1 99999    /bin/sh -i
    3 99999    {busybox} ps wuax
/ $

```

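
For a server exposed to untrusted clients, the listen mode shown above can be combined with the `--max_conns_per_ip`, `--time_limit`, and `--log` options from the `--help` text. The following is a sketch only: `nsjail_listen_cmd` is a hypothetical helper, and the limit of 4 connections per IP, the 60-second lifetime, and the log path `/tmp/nsjail.log` are assumptions made for illustration.

```shell
# Hypothetical helper: print an nsjail command line for listen mode (-Ml).
# Arguments: TCP port, then the command to run for each connection.
nsjail_listen_cmd() {
  port=$1
  shift
  printf '%s' "./nsjail -Ml --port $port --chroot /chroot/ --user 99999 --group 99999"
  # Per-IP limit, jail lifetime, and log path are illustrative assumptions.
  printf '%s' " --max_conns_per_ip 4 --time_limit 60 --log /tmp/nsjail.log --"
  printf ' %s' "$@"
  printf '\n'
}

# Dry run: show the command instead of executing it.
nsjail_listen_cmd 9000 /bin/sh -i
```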
#### Isolation of local processes

```
$ ./nsjail -Mo --chroot /chroot/ --user 99999 --group 99999 -- /bin/sh -i
/ $ ifconfig -a
lo        Link encap:Local Loopback
          LOOPBACK MTU:65536 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
/ $ id
uid=99999 gid=99999
/ $ ps wuax
PID   USER     COMMAND
    1 99999    /bin/sh -i
    4 99999    {busybox} ps wuax
/ $ exit
$
```

#### Isolation of local processes (and re-running them)

```
$ ./nsjail -Mr --chroot /chroot/ --user 99999 --group 99999 -- /bin/sh -i
BusyBox v1.21.1 (Ubuntu 1:1.21.0-1ubuntu1) built-in shell (ash)
Enter 'help' for a list of built-in commands.
/ $ ps wuax
PID   USER     COMMAND
    1 99999    /bin/sh -i
    2 99999    {busybox} ps wuax
/ $ exit
BusyBox v1.21.1 (Ubuntu 1:1.21.0-1ubuntu1) built-in shell (ash)
Enter 'help' for a list of built-in commands.
/ $ ps wuax
PID   USER     COMMAND
    1 99999    /bin/sh -i
    2 99999    {busybox} ps wuax
/ $
```

### MORE INFO?

Type:

```
./nsjail --help
```

The command-line options are reasonably well documented:

```
Usage: ./nsjail [options] -- path_to_command [args]
Options:
--help|-h
    Help plz..
--mode|-M [val]
    Execution mode (default: l [MODE_LISTEN_TCP]):
    l: Listen to connections on a TCP port (specified with --port) [MODE_LISTEN_TCP]
    o: Immediately launch a single process on a console [MODE_STANDALONE_ONCE]
    r: Immediately launch a single process on a console, keep doing it forever [MODE_STANDALONE_RERUN]
--chroot|-c [val]
    Directory containing / of the jail (default: '/chroot')
--user|-u [val]
    Username/uid of processess inside the jail (default: 'nobody')
--group|-g [val]
    Groupname/gid of processess inside the jail (default: 'nogroup')
--hostname|-H [val]
    UTS name (hostname) of the jail (default: 'NSJAIL')
--cwd|-D [val]
    Directory in the namespace the process will run (default: '/')
--port|-p [val]
    TCP port to bind to (only in [MODE_LISTEN_TCP]) (default: 31337)
--max_conns_per_ip|-i [val]
    Maximum number of connections per one IP (default: 0 (unlimited))
--log|-l [val]
    Log file (default: stderr)
--time_limit|-t [val]
    Maximum time that a jail can exist, in seconds (default: 600)
--daemon|-d
--verbose|-v
    Verbose output (default: false)
--keep_env|-e
    Should all environment variables be passed to the child? (default: false)
--keep_caps
    Don't drop capabilities (DANGEROUS) (default: false)
--rlimit_as [val]
    RLIMIT_AS in MB, 'max' for RLIM_INFINITY, 'def' for the current value (default: 512)
--rlimit_core [val]
    RLIMIT_CORE in MB, 'max' for RLIM_INFINITY, 'def' for the current value (default: 0)
--rlimit_cpu [val]
    RLIMIT_CPU, 'max' for RLIM_INFINITY, 'def' for the current value (default: 600)
--rlimit_fsize [val]
    RLIMIT_FSIZE in MB, 'max' for RLIM_INFINITY, 'def' for the current value (default: 1)
--rlimit_nofile [val]
    RLIMIT_NOFILE, 'max' for RLIM_INFINITY, 'def' for the current value (default: 32)
--rlimit_nproc [val]
    RLIMIT_NPROC, 'max' for RLIM_INFINITY, 'def' for the current value (default: 'def')
--rlimit_stack [val]
    RLIMIT_STACK in MB, 'max' for RLIM_INFINITY, 'def' for the current value (default: 'def')
--persona_addr_compat_layout
    personality(ADDR_COMPAT_LAYOUT) (default: false)
--persona_mmap_page_zero
    personality(MMAP_PAGE_ZERO) (default: false)
--persona_read_implies_exec
    personality(READ_IMPLIES_EXEC) (default: false)
--persona_addr_limit_3gb
    personality(ADDR_LIMIT_3GB) (default: false)
--persona_addr_no_randomize
    personality(ADDR_NO_RANDOMIZE) (default: false)
--disable_clone_newnet|-N
    Enable networking inside the jail (default: false)
--disable_clone_newuser
    Don't use CLONE_NEWUSER (default: false)
--disable_clone_newns
    Don't use CLONE_NEWNS (default: false)
--disable_clone_newpid
    Don't use CLONE_NEWPID (default: false)
--disable_clone_newipc
    Don't use CLONE_NEWIPC (default: false)
--disable_clone_newuts
    Don't use CLONE_NEWUTS (default: false)
--disable_sandbox
    Don't enable the seccomp-bpf sandboxing (default: false)
--rw
    Mount / as RW (default: RO)
--silent
    Redirect child's fd:0/1/2 to /dev/null (default: false)
--bindmount_ro [val]
    List of mountpoints to be mounted --bind (ro) inside the container. Can be specified multiple times. Supports 'source' syntax, or 'source:dest'. (default: none)
--bindmount|-B [val]
    List of mountpoints to be mounted --bind (rw) inside the container. Can be specified multiple times. Supports 'source' syntax, or 'source:dest'. (default: none)
--tmpfsmount|-T [val]
    List of mountpoints to be mounted as RW/tmpfs inside the container. Can be specified multiple times. Supports 'dest' syntax. (default: none)
--iface|-I [val]
    Interface which will be cloned (MACVTAP) and put inside the subprocess' namespace
--tmpfs_size [val]
    Number of bytes to allocate for tmpfsmounts in bytes (default: 4194304)
```
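
The mount options above are the ones most likely to need a worked example, because they mix two syntaxes (`source` and `source:dest`) and `--tmpfs_size` is given in bytes rather than MB. The sketch below uses a hypothetical helper, `nsjail_mounts_cmd`, and assumed paths (`/etc/resolv.conf`, `/srv/data`, `/data`, `/tmp`) purely for illustration. It prints the command rather than running it.

```shell
# --tmpfs_size takes bytes; 16 MiB here is an illustrative choice
# (the documented default is 4194304, i.e. 4 MiB).
TMPFS_BYTES=$((16 * 1024 * 1024))

# Hypothetical helper: print an nsjail command line with bind and tmpfs mounts.
# Arguments: the command (and its arguments) to run inside the jail.
nsjail_mounts_cmd() {
  printf '%s' "./nsjail -Mo --chroot /chroot/ --user 99999 --group 99999"
  # 'source' form for the read-only bind, 'source:dest' form for the rw bind.
  printf '%s' " --bindmount_ro /etc/resolv.conf --bindmount /srv/data:/data"
  printf '%s' " --tmpfsmount /tmp --tmpfs_size $TMPFS_BYTES --"
  printf ' %s' "$@"
  printf '\n'
}

# Dry run: show the command instead of executing it.
nsjail_mounts_cmd /bin/sh -i
```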