[PATCH 0/5] Fix ntpd to not poll frequently

Wed Sep 24 14:13:14 UTC 2014

Hi Miroslav,

Thanks for testing bbox's ntpd!

On Thursday 18 September 2014 16:19, Miroslav Lichvar wrote:
> I was testing the busybox NTP implementation to see if it doesn't poll
> the servers too frequently. This is a big problem for public NTP
> servers, where a small percentage of bad clients can take most of the
> resources.
> 
> Unless a shorter polling interval is needed to keep the clock well
> synchronized, an NTP client should be normally always slowly
> increasing the polling interval up to a maximum, usually 1024 seconds
> or more.

I don't completely agree with this.
Not overloading the servers is a good goal, but it should be balanced
against the need to synchronize the local clock reasonably quickly.

My experience with ISC ntpd (admittedly somewhat dated) is that it
didn't try hard enough to do that. Somehow it seemed to its authors
that "we need several minutes to sync the clock" is reasonable.
It is not. Think about it. If you are setting a mechanical clock
by looking at another (presumably correct) clock, how long does it take?
A few seconds, not minutes.

Keeping this in mind, bbox ntpd currently does a few things to speed up
clock sync, such as "revert to the MINPOLL polling interval if we step
the clock". The rationale is that if ntpd discovers that a step is needed,
something unusual has happened. For example, my laptop hibernating:
apparently my CMOS clock is busted, it doesn't "tick".
So after hibernating, the clock is off by at least a few seconds,
sometimes much more. ntpd basically needs to start syncing anew.
If it did that with one request per 20 minutes, it wouldn't be
"reasonably fast", right?
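
To make the "revert to MINPOLL on step" idea concrete, here is a minimal
sketch in C. It is not the actual bbox ntpd code: struct poll_state,
poll_seconds() and update_poll() are made-up names, the MINPOLL value is
an assumption, and growing the interval by one step per good update is a
simplification. Intervals are kept as log2 of seconds, as is usual for NTP.

```c
#include <assert.h>

/* Illustrative values only. Poll intervals are stored as log2(seconds). */
#define MINPOLL 5   /* 2^5  = 32 s   (assumed for this sketch) */
#define MAXPOLL 12  /* 2^12 = 4096 s (the maximum mentioned in this thread) */

/* Hypothetical state, not the real ntpd globals. */
struct poll_state {
    int poll_exp;   /* current interval is 2^poll_exp seconds */
};

/* Seconds until the next query for a given exponent. */
static unsigned poll_seconds(int poll_exp)
{
    assert(poll_exp >= MINPOLL && poll_exp <= MAXPOLL);
    return 1u << poll_exp;
}

/*
 * Called after each clock update.
 * stepped != 0 means the clock had to be stepped, not just slewed.
 */
static void update_poll(struct poll_state *st, int stepped)
{
    if (stepped) {
        /* Something unusual happened: start syncing anew, poll fast. */
        st->poll_exp = MINPOLL;
        return;
    }
    /* Simplification: back off by one step per good update, up to MAXPOLL. */
    if (st->poll_exp < MAXPOLL)
        st->poll_exp++;
}
```

With these assumed values, a step at the maximum interval drops the wait
before the next query from 4096 seconds back to 32 seconds.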

> In my testing, the busybox ntpd is working mostly very well, the
> polling interval usually reaches the maximum of 4096 seconds, but it
> seems there are some cases where it can get stuck at a much shorter
> interval or decrease the interval unnecessarily.
> 
> This patch set is an attempt to handle these cases better. Testing was
> done in a simulator which is available here:
> 
> https://mlichvar.fedorapeople.org/clknetsim/
> 
> There are other improvements that could be made, like controlling the
> interval for each peer separately, but I thought this would be a step
> in the right direction.
> 
> Miroslav Lichvar (5):
>   ntpd: don't wait for good offset before disabling burst mode

Applied, thanks!

>   ntpd: don't reset polling interval unnecessarily

"""
Don't reset the polling interval to the minimum when all peers are
unreachable or the clock was stepped to avoid frequent polling.
"""

If all peers are unreachable, most likely it is a network problem.
Who knows how long it lasted? What if it lasted many hours?
I do want to synchronize my clock soon after the network problem is fixed,
not 20 minutes after that.
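
Roughly, the behaviour I want to keep looks like the sketch below. Again,
this is an illustration under assumptions, not the real code: struct peer,
its reach field (an 8-bit history of "did the last queries get a reply",
in the style of the NTP reachability register), the MINPOLL value and all
function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MINPOLL 5   /* log2 seconds, i.e. 32 s (assumed for this sketch) */

/* Hypothetical per-peer state: one bit per recent query, 1 = got a reply. */
struct peer {
    uint8_t reach;
};

/* Shift the history when a query is sent; the new low bit starts at 0. */
static void note_query_sent(struct peer *p)
{
    p->reach = (uint8_t)(p->reach << 1);
}

/* Mark the most recent query as answered. */
static void note_reply(struct peer *p)
{
    p->reach |= 1;
}

/* Nonzero if at least one peer answered any of its last 8 queries. */
static int any_reachable(const struct peer *peers, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (peers[i].reach != 0)
            return 1;
    return 0;
}

/*
 * Pick the poll exponent to use next. If nobody is reachable, drop to
 * MINPOLL so that we notice quickly when the network comes back.
 */
static int next_poll_exp(const struct peer *peers, size_t n, int poll_exp)
{
    return any_reachable(peers, n) ? poll_exp : MINPOLL;
}
```

The point of the design is that the cost of fast polling is only paid
while nothing answers; as soon as one peer replies, the current interval
is kept.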

>   ntpd: split out poll adjusting code

Applied, thanks.

>   ntpd: keep increasing polling interval

"""
Keep increasing the polling interval in the following situations:
- no replies are received from a peer
- no source can be selected
- peer claims to be unsynchronized (e.g. we are polling it too
  frequently)
- recv() returns with an error (e.g. the host doesn't exist or is not
  running an NTP service)
"""

I am not sure any of these conditions warrants increasing the poll interval.

Can you explain why you think it should be done?

>   ntpd: don't decrease polling interval with large jitters

This one won't apply because I didn't apply patches 2 and 4.

-- 
vda