This commit adds extra setup when cgroupsv2 is enabled. In particular,
we make sure that the root namespace has setup cgroup.subtree_control
with the controllers we need.
If the necessary controller are not listed, we have to move all
processes out of the root namespace before we can change this
(the 'no internal processes' rule:
https://unix.stackexchange.com/a/713343). Currently we only
handle the case where the nsjail process is the only process in
the cgroup. It seems like this would be relatively rare, but since
nsjail is frequently the root process in a Docker container (e.g.
for hosting CTF challenges), I think this case is common enough to
make it worth implementing.
This also adds `--detect_cgroupv2`, which will attempt to detect
whether `--cgroupv2_mount` is a valid cgroupv2 mount, and if so
it will set `use_cgroupv2`. This is useful in containerized
environments where you may not know the kernel version ahead of time.
References:
https://github.com/redpwn/jail/blob/master/internal/cgroup/cgroup2.go
Currently, we always kill children by sending them a SIGKILL signal in
case we've got a fatal signal. This is rather inflexible and forbids
some usecases where e.g. child process listen for specific signals to
shut down gracefully.
Add a new command configuration `--forward_signals` that allows the user
to opt-in to forwarding fatal signals to the child process.
In 9b8d91bd7f the default for rlimit_as
was increased to 4096 MB, but old default remained in the man page,
readme, etc. This patch corrects those spots with the right value.
These were found by external tooling while preparing the Debian package.
* Uknown -> Unknown
* Writting -> Writing
* commited -> committed
* processess -> processes
Signed-off-by: Christian Blichmann <mail@blichmann.eu>