This commit adds extra setup when cgroupsv2 is enabled. In particular,
we make sure that the root namespace has setup cgroup.subtree_control
with the controllers we need.
If the necessary controller are not listed, we have to move all
processes out of the root namespace before we can change this
(the 'no internal processes' rule:
https://unix.stackexchange.com/a/713343). Currently we only
handle the case where the nsjail process is the only process in
the cgroup. It seems like this would be relatively rare, but since
nsjail is frequently the root process in a Docker container (e.g.
for hosting CTF challenges), I think this case is common enough to
make it worth implementing.
This also adds `--detect_cgroupv2`, which will attempt to detect
whether `--cgroupv2_mount` is a valid cgroupv2 mount, and if so
it will set `use_cgroupv2`. This is useful in containerized
environments where you may not know the kernel version ahead of time.
References:
https://github.com/redpwn/jail/blob/master/internal/cgroup/cgroup2.go
Currently, we always kill children by sending them a SIGKILL signal in
case we've got a fatal signal. This is rather inflexible and forbids
some usecases where e.g. child process listen for specific signals to
shut down gracefully.
Add a new command configuration `--forward_signals` that allows the user
to opt-in to forwarding fatal signals to the child process.
This commit fixes the incorrect cgroups_mem_max unit described in a config.proto comment.
We do not perform any calculations on this value and we don't specify the values unit (k/M/G) when writing to memory cgroup controller files, so the value is specified in bytes.
The default `rlimit_stack` value was set to 1048576. However, this value is in MiB and so is later multiplied by 1024*1024 in b3d544d155/config.cc (L161-L162) and it ends up as a limit of 1 TB for the stack size.
This PR changes it to 8 MB which is a more sane default or, at least I took it from my virtual machine's ulimits:
```
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31175
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 31175
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
```