suspected bug in timeout command

David Laight David.Laight at ACULAB.COM
Tue Mar 1 17:06:35 UTC 2022


From: Denys Vlasenko
> Sent: 01 March 2022 16:40
> 
> On Tue, Feb 15, 2022 at 12:31 PM Rob Landley <rob at landley.net> wrote:
> > On 2/14/22 10:09 AM, Roberto A. Foglietta wrote:
> > >  However, if this bug shows-up, probably it means that the system has
> > > a lot of processes running and a lot of processes created and
> > > destroyed compared to the max PID available. Thus, the system might be
> > > incorrectly configured compared with its typical usage which probably
> > > is the main reason why nobody complained before.
> >
> > Nah, a shell script can spin through an awful lot of PIDs pretty fast, and if
> > you're doing a -j 8 build that has a lot of script snippets (let alone parallel
> > autoconf etc) vs something with say a 10 second timeout?
> 
> I try the below and it seems to be able to spawn "only"
> ~1500 processes/second.
> 
> $ time sh -c 'i=30000; while test $((--i)) != 0; do sleep 0 & done 2>/dev/null'
> real    0m19.190s
> user    0m23.062s
> sys    0m6.732s
> 
> My memory is hazy on this, but IIRC kernel also actually has some
> defensive code to not immediately reuse pids which just died.

The Linux kernel only has protection for code inside the kernel.
Basically, to send the signal you need a reference to a ref-counted
structure, not the pid number itself.
I can't quite remember whether the pid itself can be reused even before
that structure is freed.

NetBSD does guarantee not to reuse a pid for a reasonable number
of forks after a process exits.

> It's interesting to find out why pids are reused that fast on the
> affected system.

They are allocated by a simple numeric scan for a free pid.
So it depends on where the scan is and the pid being freed.
It can be reused for the very next fork().
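
As a rough illustration of that, here is a toy model in Python. It is not
the kernel's code: the class name ToyPidAllocator, the cursor handling and
the tiny pid_max are all assumptions made for the sketch. It hands out the
first free pid at or after a moving cursor and wraps at pid_max.

```python
class ToyPidAllocator:
    """Toy model of a cyclic numeric scan for a free pid (not kernel code)."""

    def __init__(self, pid_max):
        self.pid_max = pid_max   # valid pids are 1 .. pid_max - 1
        self.cursor = 1          # where the next scan starts
        self.in_use = set()

    def fork(self):
        """Return the first free pid at or after the cursor, wrapping once."""
        span = self.pid_max - 1
        for step in range(span):
            pid = 1 + (self.cursor - 1 + step) % span
            if pid not in self.in_use:
                self.in_use.add(pid)
                self.cursor = 1 + pid % span   # next scan starts just after pid
                return pid
        raise RuntimeError("no free pid")

    def exit(self, pid):
        """Mark pid as free again."""
        self.in_use.discard(pid)


def freed_behind_cursor():
    a = ToyPidAllocator(pid_max=8)
    for _ in range(3):
        a.fork()        # pids 1, 2, 3; the cursor is now at 4
    a.exit(2)           # the freed pid is behind the cursor
    return a.fork()     # the scan starts at 4, so 2 is not reused yet


def freed_at_cursor():
    a = ToyPidAllocator(pid_max=8)
    for _ in range(7):
        a.fork()        # pids 1..7; the cursor has wrapped back to 1
    a.exit(1)           # the freed pid is exactly where the scan resumes
    return a.fork()     # the very next fork() gets pid 1 again


if __name__ == "__main__":
    print(freed_behind_cursor())   # 4
    print(freed_at_cursor())       # 1
```

In this model, whether a freed pid comes back immediately depends only on
where the cursor is relative to it, which is the point made above.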

> Meanwhile: what "timeout" is doing is it tries to get out
> of the way of the PROG to be launched so that timeout's parent
> sees PROG (not timeout) as a child. E.g. it can send signals
> to it, get waitpid notifications if PROG has been stopped
> with a signal, and such.

I also believe it is only checking the pid every (few?) seconds.

> And PROG also has no spurious "timeout" child.
> "timeout" exists as an orphaned grandchild.
> 
> Let's go with a solution with fd opened to /proc/PID?

I think you need to verify some part of the process state.
Especially for pre-pidfd kernels.
Probably the process start time.
If that changes, the pid has been reused.
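
For what it is worth, a minimal sketch of that check in Python, assuming
Linux procfs. The start time is field 22 of /proc/PID/stat (clock ticks
since boot). The helper names parse_starttime, read_starttime and
still_same_process are made up for the example and are not taken from the
busybox source.

```python
def parse_starttime(stat_line):
    """Extract starttime (field 22) from the contents of /proc/PID/stat."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split only what follows the LAST ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field 22 is rest[19].
    return int(rest[19])


def read_starttime(pid):
    """Return the start time of pid, or None if its stat file is unreadable
    (for example because the process no longer exists)."""
    try:
        with open("/proc/%d/stat" % pid) as f:
            return parse_starttime(f.read())
    except OSError:
        return None


def still_same_process(pid, remembered_starttime):
    """True only if pid exists and has the start time we remembered."""
    current = read_starttime(pid)
    return current is not None and current == remembered_starttime
```

The idea would be to call read_starttime() once when the watcher starts,
keep the value, and call still_same_process() before each signal.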

That narrows the timing window to the case where we checked it was the
right process, but the pid was reused before we could send the signal.
Hitting that window also requires the process to exit at exactly its
timeout.
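
To make the remaining window concrete, here is a sketch in Python with the
identity check and the signal delivery passed in as functions.
signal_if_unchanged, read_identity and send_signal are hypothetical names,
not an existing API. The gap between the two calls is the window: nothing
stops the pid from being recycled there.

```python
def signal_if_unchanged(pid, expected_identity, read_identity, send_signal):
    """Send a signal only if pid still has the identity we remembered.

    read_identity(pid) returns something stable per process (for example
    its start time) or None; send_signal(pid) delivers the signal. Both are
    supplied by the caller, so this sketch does not assume how they work.
    """
    if read_identity(pid) != expected_identity:
        return False      # pid is gone or already reused: do nothing
    # Race window: the process can exit and the pid can be reused here.
    send_signal(pid)
    return True


def demo_window():
    """Simulate a pid being recycled between the check and the signal."""
    owner = {100: "PROG"}          # toy process table: pid -> owner
    hit = []

    def read_identity(pid):
        identity = owner.get(pid)
        owner[pid] = "unrelated"   # pid is recycled right after the check
        return identity

    def send_signal(pid):
        hit.append(owner[pid])     # record who actually receives the signal

    signal_if_unchanged(100, "PROG", read_identity, send_signal)
    return hit


if __name__ == "__main__":
    print(demo_window())           # ['unrelated']
```

On kernels that do have pidfds, signalling through the fd avoids this,
since the fd refers to the process rather than to the pid number.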

	David
