RFD: Rework/extending functionality of mdev

Mon Mar 16 23:32:40 UTC 2015

On 16.03.2015 22:25, Didier Kryn wrote:
>      I had not caught the point that you wanted it a general purpose
> tool - sorry.

It's a lengthy and complex discussion, and it is easy to miss
something, so no trouble. Please ask when there are questions.

>> The netlink reader is a long lived daemon. It shall not exit, and
>> handle failures internally where possible, but if it fails, pure
>> restarting without intervening other action to control / correct the
>> failure reason, doesn't look as a good choice. So it needs any higher
>> instance to handle this, normally init or a different system
>> supervisor program (e.g. inittab respawn action).
>
>      OK, then this higher instance cannot be an ordinary supervisor,
> because it must watch two intimely related processes and re-spawn both
> if one of them dies. Hence, it is yet another application. This is why I
> thought fifosvd was a good candidate to do that. Also because it already
> contains some supervision logic to manage the operation handler.

Supervision is a system-dependent function; it differs with the
philosophy by which the init process works and handles such things. So
before we talk about which kind of supervision to use, we need to know
which type of supervision you are using, which mainly means which init
system you use.

>      So, if fifosvd is a general usable tool, it must come with a
> companion general usable tool, let's call it fifosvdsvd, designed to
> monitor pairs of pipe-connected daemons.

A pipe is a unidirectional thing. Writing a program that sits at the
read end of a pipe to watch the other side is a logical mixing of
functions, but ...

>> Where as the device operation handler (including conf parser) is
>> started on demand, when incoming events require this. The job of the
>> fifosvd is this on demand pipe handling, including failure management.
>>
>>
>>>      2) fifosvd would never close any end of the pipe because it could
>>> need them to re-spawn any of the other processes. Like this, no need for
>>> a named pipe as long as fifosvd lives.
>>
>> Dit you look at my pseudo code? It does *not* use a named pipe (fifo)
>> for netlink operation, but a normal private pipe (so pipesvd may fit
>> better it's purpose). Where as hotplug helper mechanism won't work
>> this way, and require a named pipe (different setup, by just doing
>> slight different startup).
>
>      Yes, but it cannot work if the two long-lived daemons are
> supervised by an ordinary supervisor. Because one end of the pipe is
> lost if one of the processes die, and this kind of supervisor will
> restart only the one which died.

... you are wrong. When the netlink process dies, the write end of the
pipe is automatically closed by the kernel. This first lets the handler
process detect end-of-file when waiting for more messages, so that
process exits. fifosvd then checks the pipe and gets an error telling it
that the pipe has shut down on the write end, so fifosvd does the
expected thing: it exits too.

Even if that exit is delayed somewhat, it does not matter when the
higher instance respawns the hotplug system due to the netlink exit. The
new pipe will be established in parallel, while the old pipe (including
its processes) vanishes after a short time.
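
The end-of-file behaviour described above can be checked with a minimal
sketch. It is written in Python only to show the kernel semantics; the
function name and the event strings are made up for illustration and are
not part of the proposed code. The forked child stands in for the netlink
process and dies without closing its write end:

```python
import os

def read_until_writer_dies():
    """Fork a writer that dies; report what the read side observes."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the netlink process (write side).
        os.close(r)
        os.write(w, b"add /devices/a\n")
        os.write(w, b"remove /devices/a\n")
        os._exit(1)              # dies without closing w; the kernel closes it
    # Parent: stands in for the handler (read side).
    os.close(w)                  # drop our own copy of the write end, else no EOF
    data = b""
    eof_seen = False
    while True:
        chunk = os.read(r, 4096)
        if not chunk:            # empty read: every write end is closed
            eof_seen = True
            break
        data += chunk
    os.close(r)
    _, status = os.waitpid(pid, 0)
    return data.decode().splitlines(), eof_seen, os.WEXITSTATUS(status)
```

Note that the reader only sees end-of-file because it closed its own
copy of the write end. A process that keeps both ends open would never
be told that the writer has gone.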

That is, your supervision chain is slightly different:

- netlink is supervised by the higher instance, but itself watches for
failures on the pipe (in case the read end dies unexpectedly)

- the supervision of the pipe read side is a bit more complex, as we use
an on-demand handler process, so we have two different cases, depending
on whether a handler process is currently active:

  - when no handler process is active, fifosvd detects a failure of the
  pipe's write end immediately and just exits. So there is no need for
  supervision; only some resources have to be freed

  - when there is an active handler process, this process is supervised
  by fifosvd, but itself checks for EOF on the pipe and exits. Meanwhile
  fifosvd waits for the exit of the handler process and checks the exit
  status (success or failure). Whichever way fifosvd takes, in the end
  it detects that the write end of the pipe has gone and exits too.
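
The two read-side cases can be sketched roughly as follows. This is
Python for illustration only, not the pseudo code from the earlier
message: supervise(), run_handler() and demo_supervise() are hypothetical
names, the handler's idle timeout is left out, and the forked writer
stands in for the netlink process.

```python
import os
import select

def run_handler(r):
    """Hypothetical handler: consume events until end-of-file, then exit."""
    status = 0
    try:
        while os.read(r, 4096):
            pass                 # a real handler would act on each event here
    except OSError:
        status = 1
    os._exit(status)

def supervise(r):
    """Hypothetical fifosvd-like loop on read end r; returns a log of steps."""
    log = []
    poller = select.poll()
    poller.register(r, select.POLLIN)    # POLLHUP/POLLERR are always reported
    while True:
        events = poller.poll()[0][1]     # block until something happens on r
        if events & select.POLLIN:
            # Data is waiting: start a handler on demand and wait for it.
            pid = os.fork()
            if pid == 0:
                run_handler(r)           # never returns
            log.append("spawn")
            _, status = os.waitpid(pid, 0)
            log.append("handler exit %d" % os.WEXITSTATUS(status))
        elif events & (select.POLLHUP | select.POLLERR):
            # No data left and the write end is gone: nothing more to do.
            log.append("hangup")
            return log

def demo_supervise(events):
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                         # stand-in for the netlink process
        os.close(r)
        for event in events:
            os.write(w, event)
        os._exit(0)
    os.close(w)                          # keep only the read end
    log = supervise(r)
    os.close(r)
    os.waitpid(pid, 0)
    return log
```

With one event the loop spawns a handler, collects its exit status and
then sees the hangup; with a writer that sends nothing it sees the
hangup at once and never spawns a handler.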

So the supervision chain is:

init -> netlink -> fifosvd -> handler
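
The first link of this chain, netlink noticing that the read end has
died, relies on the write failing once no reader is left. A minimal
sketch, again in Python and with hypothetical names (write_event() and
demo_broken_pipe() are not from the proposal):

```python
import os

def write_event(w, event):
    """Return True if the event was written, False if the read end is gone."""
    try:
        os.write(w, event)
        return True
    except BrokenPipeError:
        # EPIPE: no process holds the read end any more. Python ignores
        # SIGPIPE by default; a C daemon would have to ignore or handle it
        # to get this error instead of being killed by the signal.
        return False

def demo_broken_pipe():
    r, w = os.pipe()
    first = write_event(w, b"add sda\n")     # read end still open
    os.close(r)                              # read side dies unexpectedly
    second = write_event(w, b"add sdb\n")    # write now fails with EPIPE
    os.close(w)
    return first, second
```

What netlink does after such a failure (exit and let the higher instance
respawn it, or something else) is a policy decision this sketch does not
cover.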

>      At some point you considered that the operation handler might be
> either long-lived or dieing on timeout. I suggest that the supervision
> logic is identical in the two cases.

That was an alternative in the discussion, to show how I got to my
solution and picked up a solution Laurent mentioned. So the
alternatives show the steps by which my approach has improved.

I highly prefer this last one (netlink reader the Unix way). It is the
version with the most flexibility, and it even addresses the wish to use
a private pipe rather than a named pipe for netlink operation, without
adding extra overhead for that possibility.

Indeed, the alternatives are very similar: they do the same principal
operation but move some code around a bit, for different purposes, to
see which impact each alternative may have. They do not implement
anything new, and they are not intended to be implemented in parallel.
The alternatives in this message were all for the same usage: hotplug
handling with the netlink reader.

--
Harald