RFD: Rework/extending functionality of mdev

Mon Mar 16 21:25:04 UTC 2015

On 16/03/2015 19:18, Harald Becker wrote:
> On 16.03.2015 10:15, Didier Kryn wrote:
>
>>> 4) netlink reader the Unix way
>>>
>>> Why should our netlink reader care about where it sends the event
>>> messages? Just let it do its netlink reception job and forward the
>>> messages to stdout.
>>>
>>> netlink reader:
>>>    set stdout to non blocking I/O
>>>    establish netlink socket
>>>    wait for event messages
>>>      gather event information
>>>      write message to stdout
>>>
>>> hotplug startup code:
>>>    create a private pipe
>>>    spawn netlink reader, redirect stdout to write end of pipe
>>>    spawn fifosvd - xdev parser, redirect stdin from read end of pipe
>>>    close both pipe ends (write end open in netlink, read in fifosvd)
>
>
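     The startup steps quoted above can be sketched in Python as follows. This is only an illustration under assumptions: `start_pipeline`, `reader_cmd` and `handler_cmd` are hypothetical names standing in for the netlink reader and the fifosvd / parser side; nothing here is actual mdev or fifosvd code.

```python
import os
import subprocess


def start_pipeline(reader_cmd, handler_cmd):
    """Connect reader_cmd's stdout to handler_cmd's stdin via a private pipe.

    Both commands are placeholders (assumptions), e.g. a netlink reader
    and a fifosvd-like consumer.
    """
    read_end, write_end = os.pipe()
    try:
        # The reader only gets the write end, as its stdout.
        reader = subprocess.Popen(reader_cmd, stdout=write_end)
        # The handler only gets the read end, as its stdin.
        handler = subprocess.Popen(handler_cmd, stdin=read_end)
    finally:
        # The startup code steps away: it closes both ends, so only
        # the two children keep the pipe open.
        os.close(read_end)
        os.close(write_end)
    return reader, handler
```

Because the startup code closes both ends, the handler sees end-of-file as soon as the reader exits, and nobody is left who could reconnect a replacement process to the same pipe.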
>>      1) why not let fifosvd act as the startup code? It is anyway the
>> supervisor of processes at both ends of the pipe and in charge of
>> re-spawning them in case they die. Netlink receiver should be restarted
>> immediately to not miss events, while event handler should be restarted
>> on event (see comment below).
>
> This would make fifosvd specific to the netlink / hotplug 
> function. My intention is to get a generally usable tool.

     I had not caught that you wanted it to be a general-purpose 
tool - sorry.
>
> You would not gain anything otherwise, as the startup of the daemon has 
> to be done anyway. It does not matter whether you start fifosvd, which 
> then forks to go into the background and forks again to start the 
> netlink part, or do it slightly differently: start an initial code 
> snippet that creates the pipe and does the forks (starting the daemons 
> in the background), then steps away. This is the same operation, only 
> moved around a bit, but it may avoid blocking other usages.

     Sure it is the same. My point was about supervision.
>
> The netlink reader is a long-lived daemon. It shall not exit, and it 
> should handle failures internally where possible. If it does fail, 
> simply restarting it without any other action to control / correct the 
> cause of the failure does not look like a good choice. So it needs some 
> higher instance to handle this, normally init or a different system 
> supervisor program (e.g. inittab respawn action).

     OK, then this higher instance cannot be an ordinary supervisor, 
because it must watch two intimately related processes and re-spawn both 
if one of them dies. Hence, it is yet another application. This is why I 
thought fifosvd was a good candidate to do that. Also because it already 
contains some supervision logic to manage the operation handler.

     So, if fifosvd is a generally usable tool, it must come with a 
companion generally usable tool, let's call it fifosvdsvd, designed to 
monitor pairs of pipe-connected daemons.
>
> Whereas the device operation handler (including conf parser) is 
> started on demand, when incoming events require it. The job of 
> fifosvd is this on-demand pipe handling, including failure management.
>
>
>>      2) fifosvd would never close either end of the pipe because it could
>> need them to re-spawn either of the other processes. This way, there is
>> no need for a named pipe as long as fifosvd lives.
>
> Did you look at my pseudo code? It does *not* use a named pipe (fifo) 
> for netlink operation, but a normal private pipe (so pipesvd may fit 
> its purpose better). Whereas the hotplug helper mechanism won't work 
> this way, and requires a named pipe (a different setup, done with a 
> slightly different startup).

     Yes, but it cannot work if the two long-lived daemons are 
supervised by an ordinary supervisor. Because one end of the pipe is 
lost if one of the processes dies, and this kind of supervisor will 
restart only the one which died.
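     To illustrate the point, here is a minimal Python sketch of a supervisor that keeps both ends of the private pipe open, so that it can reconnect a replacement process on either side. `PipePair` and its methods are hypothetical names chosen for this example, and the commands passed in are placeholders; this is not the fifosvd implementation.

```python
import os
import subprocess


class PipePair:
    """Hold both ends of a private pipe so either side can be re-spawned."""

    def __init__(self):
        # The supervisor keeps these descriptors for its whole lifetime.
        self.read_end, self.write_end = os.pipe()

    def spawn_reader(self, cmd):
        # Producer side (e.g. a netlink reader): stdout goes into the pipe.
        return subprocess.Popen(cmd, stdout=self.write_end)

    def spawn_handler(self, cmd):
        # Consumer side (e.g. an event handler): stdin comes from the pipe.
        return subprocess.Popen(cmd, stdin=self.read_end)

    def close(self):
        os.close(self.read_end)
        os.close(self.write_end)
```

One consequence of this design: since the supervisor itself holds the write end, a handler never sees end-of-file when the reader dies, so the supervisor has to notice the death itself (for instance by waiting on its children).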
>
>
>>      And I have a suggestion for simplicity: let the
>> timeout/no-timeout feature be a parameter only for the event handler; it
>> does not need to change the behaviour of fifosvd. I think it is fine to
>> restart the event handler on event even when it dies unexpectedly.
>
> ???

     At some point you considered that the operation handler might be 
either long-lived or dying on timeout. I suggest that the supervision 
logic be identical in the two cases.
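     As a rough sketch of what I mean, assuming the supervisor still holds the read end of the pipe: it only checks whether a handler is alive and, if not, waits for pending data before starting one. Why the previous handler went away (timeout, clean exit or crash) plays no role. `ensure_handler` and its parameters are hypothetical names for this example only.

```python
import os
import select
import subprocess


def ensure_handler(read_end, handler_cmd, handler, timeout=None):
    """Return a live handler if one is needed, else None.

    read_end    -- pipe descriptor the supervisor keeps open
    handler_cmd -- placeholder command for the event handler
    handler     -- previous Popen object, or None
    """
    if handler is not None and handler.poll() is None:
        return handler  # still running, nothing to do
    # Handler absent or dead: timeout, clean exit and crash are all
    # treated the same way. Wait until an event is pending.
    readable, _, _ = select.select([read_end], [], [], timeout)
    if readable:
        return subprocess.Popen(handler_cmd, stdin=read_end)
    return None  # no pending event, so no handler is started
```

The timeout, if any, then lives entirely in the handler: it exits when idle, and the next event simply causes a new one to be started.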

     Didier