[Bug 13856] New: Wrong PID written by start-stop-daemon -S -b -m -p

bugzilla at busybox.net bugzilla at busybox.net
Tue Jun 15 16:32:38 UTC 2021


https://bugs.busybox.net/show_bug.cgi?id=13856

            Bug ID: 13856
           Summary: Wrong PID written by start-stop-daemon -S -b -m -p
           Product: Busybox
           Version: 1.33.x
          Hardware: Other
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Other
          Assignee: unassigned at busybox.net
          Reporter: mwadsten at digi.com
                CC: busybox-cvs at busybox.net
  Target Milestone: ---

Created attachment 9001
  --> https://bugs.busybox.net/attachment.cgi?id=9001&action=edit
.config file used in build

After updating an embedded Linux system (running Linux kernel 2.6.35) from
busybox 1.20.2 to 1.33.1, I found that certain init scripts using
start-stop-daemon now print messages like "start-stop-daemon: warning:
killing process XXX: No such process" to the console.

My investigation found that the PID file created by start-stop-daemon
-S -b -m -p PIDFILE contains the wrong PID: the value in the file is 2 less
than the PID of the process that was actually started.
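
The warning is presumably the visible symptom of that stale value: a later
stop request signals the PID taken from the file, and no such process exists.
A minimal sketch of a check for this condition follows. The helper name
pidfile_is_live is hypothetical and not part of busybox; it relies only on
"kill -0", which tests whether a process can be signalled without sending
anything.

```shell
#!/bin/sh
# pidfile_is_live PIDFILE   (hypothetical helper, not part of busybox)
# Returns 0 if the PID recorded in PIDFILE names a process we can signal,
# 1 if kill reports failure (no such process, or no permission),
# 2 if the file is unreadable or does not contain a plain number.
pidfile_is_live() {
    pid=$(cat "$1" 2>/dev/null) || return 2
    case "$pid" in
        ''|*[!0-9]*) return 2 ;;   # empty or non-numeric content
    esac
    # Signal 0 runs the existence and permission checks only.
    kill -0 "$pid" 2>/dev/null
}
```

On the affected system, running "pidfile_is_live /tmp/test.pid" right after
starting the daemon would be expected to fail, assuming the PID in the file
is not in use by some unrelated process.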

I am able to demonstrate the issue with the following:

    /tmp # cat ssd-test.sh
    #!/bin/ash

    exec >> /tmp/ssd-test.txt
    echo ssd-test: PID is $$

    sleep 5

    exit 0
    /tmp # start-stop-daemon -S -q -b -m -p /tmp/test.pid -x /tmp/ssd-test.sh ; ps ax|grep ash ; sleep 1; tail /tmp/ssd-test.txt ; cat /tmp/test.pid
     2478 root      0:00 {ssd-test.sh} /bin/ash /tmp/ssd-test.sh
     2480 root      0:00 grep ash
    ssd-test: PID is 2478
    2476
    /tmp #

As you can see, the process has PID 2478, but the PID file says 2476.
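
The manual comparison above could be automated roughly as follows. This is
only a sketch: the function name check_pidfile is hypothetical, and it assumes
the log line format written by ssd-test.sh ("ssd-test: PID is N"). Because the
test script appends to its log, only the last reported PID is used.

```shell
#!/bin/sh
# check_pidfile LOGFILE PIDFILE   (hypothetical helper for this reproduction)
# Compares the PID that the test script logged about itself with the PID
# that start-stop-daemon wrote to the PID file.
check_pidfile() {
    # Extract the number from the most recent "ssd-test: PID is N" line.
    logged=$(sed -n 's/^ssd-test: PID is \([0-9][0-9]*\)$/\1/p' "$1" | tail -n 1)
    recorded=$(cat "$2")
    if [ -n "$logged" ] && [ "$logged" = "$recorded" ]; then
        echo "OK: pid file matches ($recorded)"
    else
        echo "MISMATCH: script reported $logged, pid file has $recorded"
        return 1
    fi
}
```

With the values from the run above, "check_pidfile /tmp/ssd-test.txt
/tmp/test.pid" would print "MISMATCH: script reported 2478, pid file has 2476"
and return a non-zero status.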

I determined that the changes in this commit (specifically, switching to xvfork
on all systems) are the cause:
https://git.busybox.net/busybox/commit/?id=088fec36fedff2cd50437c95b7fb430abf8d303c
Changing the uses of xvfork in debianutils/start-stop-daemon.c to xfork
resolves the issue.

This might be a compatibility issue with the 2.6.35 kernel, or whatever version
of glibc is present on this platform. I am unable to reproduce this with a host
build of the same code, running under WSL2 (Ubuntu 20.04; Linux 5.4.72). I had
a coworker try my reproduction on another, newer Linux-based system (kernel
5.x?), and we were unable to reproduce it there either. (That system also uses
musl instead of glibc.)

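In any case, it is technically incorrect to use vfork the way that
debianutils/start-stop-daemon.c does: POSIX leaves the behavior undefined when
a vfork child calls functions other than _exit() or one of the exec family, so
there is no guarantee that getpid() in the grandchild process returns the
correct value.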

-- 
You are receiving this mail because:
You are on the CC list for the bug.

