shutdown busybox and start another PID1 process

Sun Aug 10 09:29:14 UTC 2014

On Sun, Aug 10, 2014 at 09:38 AM, Laurent Bercot said:
>  The *only* way to be 100% sure there are no remaining processes at
> some point in a system's lifetime is to kill them all - like aliens,
> or zombies. Trying to be smarter or more gentle than that is a
> losing battle: any process can sneakily fork while you're not
> looking and leave a child running somewhere.

Yes, there are some processes that do sneaky forks on shutdown,
creating children that need to be killed.  I certainly do try
to be gentle at first because some programs have a legitimate
need to clean up after themselves when they get the TERM signal.
I think I wait for up to 5 or 10 seconds and if processes are
still being stubborn then I get medieval on them.
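
A minimal sketch of that gentle-first approach, not the actual
shutdown script.  The function name term_then_kill and its
arguments are made up for illustration, and the grace period is
whatever the caller passes in.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are illustrative only).
# Usage: term_then_kill GRACE_SECONDS PID...
term_then_kill() {
    grace=$1
    shift
    # Ask nicely first so programs can clean up after themselves.
    kill -TERM "$@" 2>/dev/null || true
    waited=0
    while [ "$waited" -lt "$grace" ]; do
        still_alive=0
        for pid in "$@"; do
            # Signal 0 sends nothing; it only checks that the PID exists.
            if kill -0 "$pid" 2>/dev/null; then
                still_alive=1
            fi
        done
        if [ "$still_alive" -eq 0 ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Grace period is over: get medieval on whatever is left.
    kill -KILL "$@" 2>/dev/null || true
    return 0
}
```

Something like "term_then_kill 10 $pids" would then give every
listed process up to 10 seconds before the KILL signal goes out.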

I've been able to deal with processes that spawn children on
shutdown but it might be possible for someone to intentionally
build a fork bomb that I would have trouble with.  I believe that
if someone with root is malicious enough then they can defeat any
shutdown solution.  For example, all they need do is delete the
shutdown scripts.  My goal is to correctly handle all normal,
non-malicious situations.  I don't know of any other Linux distro
that does even that correctly.

You certainly don't want to kill your own process because there
is more work to be done after you've killed off most processes.
Also, sometimes processes are needed to keep filesystems mounted,
such as ntfs-3g processes.  This was one of the reasons I felt
compelled to write my own shutdown code.  IMO ntfs-3g and similar
processes should not be killed.  They will die off naturally
when you unmount the filesystem they are supporting.  So

    kill -9 -1

is a no-go for me.  Likewise killall5 (which IMO is better)
doesn't work for me.  Both defeat the purpose of my shutdown
script.  If I nuke an ntfs-3g process then any mounted file on
its filesystem (such as a persistence file) will not get
unmounted cleanly.  Usually killing off processes is done before
filesystems are remounted read-only.
This is actually a problem when running a live system with
persistence files on an ntfs filesystem.  We definitely want to
offer live persistence on Windows-only machines.  I want to be
able to unmount our persistence files cleanly and, even more
importantly, I don't want to risk damaging their Windows file
system.  IMO even non-live systems should unmount ntfs correctly,
not just by killing off the ntfs-3g process.  I don't know of any
non-live distro that handles this correctly.
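
As a rough sketch of selective killing, assuming a Linux /proc
with per-process comm files: collect every PID except init, the
script itself, kernel threads, and anything on a spare list.  The
names SPARE_NAMES, is_spared, find_victims and VICTIMS are
hypothetical, and the command names on the spare list are a guess
at how the ntfs-3g helper shows up on a given system.

```shell
#!/bin/sh
# Hypothetical spare list: command names that must outlive the kill phase.
SPARE_NAMES=${SPARE_NAMES:-"ntfs-3g mount.ntfs-3g"}

# Succeeds when the given command name is on the spare list.
is_spared() {
    for spared in $SPARE_NAMES; do
        if [ "$1" = "$spared" ]; then
            return 0
        fi
    done
    return 1
}

# Collect killable PIDs into the VICTIMS variable (space separated).
find_victims() {
    VICTIMS=""
    for dir in /proc/[0-9]*; do
        [ -d "$dir" ] || continue
        pid=${dir#/proc/}
        # Never kill init or the script that is doing the killing.
        if [ "$pid" -eq 1 ] || [ "$pid" -eq "$$" ]; then
            continue
        fi
        # Kernel threads have an empty cmdline; leave them alone.
        cmdline=$(tr '\0' ' ' 2>/dev/null < "$dir/cmdline") || continue
        if [ -z "$cmdline" ]; then
            continue
        fi
        # comm holds the short command name of the process.
        comm=$(cat "$dir/comm" 2>/dev/null) || continue
        if is_spared "$comm"; then
            continue
        fi
        VICTIMS="$VICTIMS $pid"
    done
}
```

After calling find_victims, the list in $VICTIMS can be handed
to kill, first with TERM and later with KILL, while the spared
processes stay up until their filesystems are unmounted.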

> Note that neither sysvinit (which busybox init is a variant of)
> nor runit will run your scripts as process 1, 

I've been using "telinit u" to get my own script to run with pid
1.  Works just fine on Debian and Gentoo.  I thought they were
both based on sysv.  I don't know if it works with the busybox
init.  If not, it should not be very hard to fix: just capture
the TERM or HUP signal and exec yourself.

This trick to grab pid 1 was the reason I posted in this thread
in the first place.  I do this after I pivot_root into a small
busybox system in tmpfs.  Then I put my own script in /sbin/init
on the tmpfs and call "telinit u".  If you are not going to
pivot_root then I think Sam's bind mount suggestion is great to
avoid deleting the original /sbin/init.

Peace, James