[PATCH 0/5] Fix ntpd to not poll frequently

Tue Sep 30 09:44:55 UTC 2014

On Mon, Sep 29, 2014 at 03:43:25PM +0200, Denys Vlasenko wrote:
> My view is that short poll intervals are okay to use only
> at startup. We should grow poll interval soon, and quickly grow it
> to somewhere around 5...20 minutes. This is critical to avoid
> NTP server overload.

Agreed.

> But when this is done, further increases in poll interval are
> *not as useful as they were before*. For example,
> going from a 30 second interval to a 20 minute interval
> reduces NTP traffic by a factor of 40.
> Raising interval further, even to huge values such as 12 hours,
> would make almost no change to NTP server load

Well, 12 hours is 36 times longer than 20 minutes. If you are saying
that it makes little difference for a typical public NTP server, as
it's used by other clients with much shorter polling intervals, then I
agree.
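
To put rough numbers on both views, here is a small sketch of the
per-client request rate at each interval. The helper name is
hypothetical, and it assumes exactly one request per poll (no bursts).

```python
SECONDS_PER_DAY = 24 * 60 * 60

def requests_per_day(poll_interval_s):
    """Requests one client sends per day, assuming one request per poll."""
    return SECONDS_PER_DAY / poll_interval_s

startup = requests_per_day(30)            # 30 second interval
moderate = requests_per_day(20 * 60)      # 20 minute interval
long_poll = requests_per_day(12 * 3600)   # 12 hour interval

# Relative reduction at each step
print(startup / moderate)     # 40.0
print(moderate / long_poll)   # 36.0

# Absolute reduction per client per day at each step
print(startup - moderate)     # 2808.0
print(moderate - long_poll)   # 70.0
```

The relative factor is about the same for both steps, but the second
step saves far fewer requests per client in absolute terms.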

> (it will anyway
> be dominated by traffic from machines which can't increase
> their poll interval because of bigger delay (poorer network
> connectivity)).

With bad network connectivity (large jitter), well-written clients will
actually increase their polling interval, as the measured offsets have
a large error and the clock needs a longer time to drift that much. A
shorter polling interval would not improve the offset, only increase
the frequency error. This is what the 5th patch in the set was trying
to implement.
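
As a rough illustration of that reasoning (this is not the code from
the 5th patch), one can estimate how long the clock needs to drift by
as much as the measurement error and pick a poll exponent from that.
The function names, the 4096 s maximum and the example error values
are assumptions made for the sketch.

```python
import math

MIN_POLL_EXP = 5    # 2**5 = 32 s, the minimum mentioned in this thread
MAX_POLL_EXP = 12   # 2**12 = 4096 s, assumed maximum for this sketch

def drift_time_s(offset_error_s, freq_error_ppm):
    """Seconds the clock needs to drift by as much as the offset error.

    Both arguments must be positive.
    """
    return offset_error_s / (freq_error_ppm * 1e-6)

def suggested_poll_exp(offset_error_s, freq_error_ppm):
    """Largest power-of-two poll (as an exponent) not exceeding the
    drift time, clamped to [MIN_POLL_EXP, MAX_POLL_EXP]."""
    t = drift_time_s(offset_error_s, freq_error_ppm)
    exp = math.floor(math.log2(t))
    return max(MIN_POLL_EXP, min(MAX_POLL_EXP, exp))

# Low jitter: 0.1 ms offset error, 1 ppm frequency error
print(suggested_poll_exp(0.0001, 1.0))
# High jitter: 10 ms offset error, same frequency error
print(suggested_poll_exp(0.01, 1.0))
```

With the same frequency error, the larger offset error yields the
longer interval, which is the opposite of polling more often on a bad
link.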

> I think the logic of changing interval should be *different*
> for short intervals and for long intervals. Up to some
> threshold value (say 10 mins), increase should be aggressive.
> Then it should become moderate. After another threshold,
> (say 1 hour), it should become aggressive in _dropping_
> interval.
> 
> This way, it is possible to avoid flooding NTP servers, yet
> keep users reasonably happy (avoid the situation
> where user complains "wtf, I have ntpd running,
> it's been three hours since it saw first packet with
> offset of 25 seconds and the damn thing still
> does hell knows what instead of fixing my clock error").

Yes, but it shouldn't drop the poll before the offset is known (i.e.
the server is reachable again).

> Do you object to the above logic? What, in your opinion,
> is wrong with it?

The logic is fine, we just don't seem to agree on the cases in which
the poll should be shortened.

> > The previous interval multiplied by a constant > 1.0, up to a maximum.
> > Rounding on the log scale is ok and the maximum could be shorter than
> > 4096 sec, but if it's already longer than that, I'd say it shouldn't
> > be shortened.
> 
> Such logic would treat very small poll intervals and very
> large ones the same. It will try to play nice with NTP servers
> even when poll interval is already so big that NTP server overload
> is very unlikely.

I'm not sure. I think it would be safer to follow the rule that if you
don't get a reply, you shouldn't ask again sooner than you would
otherwise.
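
A minimal sketch of that rule, assuming it runs each time a request
goes unanswered. The function name and the growth constant of 2.0 are
hypothetical, and it rounds up on the log scale so that the interval
always makes progress.

```python
import math

MIN_POLL_S = 32     # minimum interval mentioned in this thread
MAX_POLL_S = 4096   # cap for growth; a shorter cap would work the same way
GROWTH = 2.0        # hypothetical constant > 1.0

def next_poll_on_timeout(prev_poll_s, growth=GROWTH, max_poll_s=MAX_POLL_S):
    """Next poll interval (seconds) after a request got no reply."""
    if prev_poll_s >= max_poll_s:
        # Already at or above the cap: keep it, never shorten.
        return prev_poll_s
    grown = min(prev_poll_s * growth, max_poll_s)
    # Round up to a power of two on the log scale.
    rounded = 2 ** math.ceil(math.log2(grown))
    return min(rounded, max_poll_s)

poll = MIN_POLL_S
for _ in range(8):
    poll = next_poll_on_timeout(poll)
    print(poll)
# 64, 128, 256, 512, 1024, 2048, 4096, 4096
```

An interval that is already longer than the cap, for example 8192
seconds, is returned unchanged.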

At this point I'm mostly concerned with the problem that nothing
increases the interval from the minimum (32 seconds) when the server
is unreachable. What are your thoughts on that?

-- 
Miroslav Lichvar