n1.taur.dk

NTP tests 2010.04.02 (IPv6 access)

These are tests of ntpd 4.2.6, ntpd 4.2.6p1-RC6, and chrony-git on a pair of Atom machines.

In all cases a local GPSDO was used as the time server for the test machines. It has been verified to have less than 10 us error on packet timing.
A separate process tracks the local clock offset at a 1 Hz update rate. Graphs show seconds on Y, hours on X.
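The notes do not say how the tracking process measures the offset. One common way, shown here only as a sketch under that assumption, is the standard NTP calculation from the four timestamps of a single request/response exchange. The function name and argument names are mine, not taken from the test setup.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation (all times in seconds).

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    # Offset of the server clock relative to the client clock,
    # assuming the network path is symmetric.
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    # Round-trip delay with the server's processing time removed.
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


if __name__ == "__main__":
    # Made-up timestamps, purely for illustration.
    offset, delay = ntp_offset_and_delay(10.0, 10.5, 10.6, 10.3)
    print(f"offset={offset:.6f} s  delay={delay:.6f} s")
```

Sampling this once per second against the GPSDO would give a trace comparable to the graphs, though the actual tracker may work differently.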

My interest is not "will this work in any scenario over any latency link". Rather, it is "how does this perform with a high-quality, low-latency time source, for when I have multiple machines that need to be locked hard", and "has the PLL tau in the Linux kernel become deadly after they 'corrected' it".

below: ntpd reference implementation with kernel discipline (default), 4.2.6 on Linux 2.6.32-10 and 4.2.6p1-RC6 on 2.6.34-0.19.rc2.git. Poll is forced to be fast: poll=4, i.e. 2^4 = 16 seconds.

below: ntpd reference implementation, "disable kernel" in configuration. It appears that the non-kernel PLL in ntpd has a much longer time constant than the one in the kernel. ?: If the kernel tau is right, then ntpd is very heavily damped, and nowhere near the 6% overshoot Mills talks about. If ntpd's tau is right, then the kernel will oscillate at long update intervals.

below: ntpd reference implementation, with the default kernel loop, and poll=10. ?: This looks like the 6% overshoot Mills talks about, though what I see here is more like 20%. The loop gain shouldn't be upped any more. So the kernel PLL appears to be performing. Now I need to test whether it becomes unstable at long tau.
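To put a number on "6%" versus "20%", the overshoot can be read off a logged offset trace. This is a minimal sketch, assuming the trace starts at the initial error and settles toward zero; the function name and the sample values are hypothetical and are not measurements from these tests.

```python
def percent_overshoot(offsets):
    """Overshoot of a step response, as a percentage of the initial error.

    offsets: clock offsets in seconds, where offsets[0] is the initial
    error and the loop is expected to settle toward zero.
    """
    initial = offsets[0]
    if initial == 0:
        raise ValueError("trace must start with a non-zero offset")
    # Normalise so the initial error is +1; any overshoot then shows up
    # as an excursion below zero.
    normalised = [x / initial for x in offsets]
    deepest = min(normalised)
    return max(0.0, -deepest) * 100.0


if __name__ == "__main__":
    # Invented trace: starts 10 ms off, swings 2 ms past zero, then settles.
    trace = [0.010, 0.004, 0.0, -0.002, -0.001, 0.0]
    print(f"{percent_overshoot(trace):.1f}% overshoot")
```

With noisy 1 Hz data it would probably be wise to smooth the trace first, since a single noisy sample past zero would otherwise be counted as overshoot.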

below: ntpd reference implementation, with the default kernel loop, and poll=13 (2^13 = 8192 s, a bit over 2 hours between polls). This was chosen to emulate poll=10 with a 12% success rate. ?: No, it does not become unstable. In fact, the overshoot is the same. And as this test takes 24 hours to run, I'll wait a bit before I repeat it on precocious.
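The poll arithmetic behind that choice is short enough to check directly. The poll value is a power-of-two exponent in seconds, so poll=13 delivers one update in the time poll=10 would have attempted eight. The helper names below are mine.

```python
def poll_interval_seconds(poll_exponent):
    """NTP poll values are log2 of the interval in seconds."""
    return 2 ** poll_exponent


def equivalent_success_rate(short_poll, long_poll):
    """Fraction of short_poll attempts that would have to succeed to
    give the same update rate as polling at long_poll."""
    return poll_interval_seconds(short_poll) / poll_interval_seconds(long_poll)


if __name__ == "__main__":
    for poll in (4, 10, 13):
        seconds = poll_interval_seconds(poll)
        print(f"poll={poll}: {seconds} s ({seconds / 3600:.2f} h)")
    rate = equivalent_success_rate(10, 13)
    print(f"poll=13 ~ poll=10 with {rate:.1%} success")
```

This ignores that real packet loss is irregular while poll=13 is perfectly periodic, so it is an approximation of a lossy link rather than a simulation of one.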

below: chronyd with various settings. Settling time in this particular scenario (sync against a low latency high performance clock) is very fast. ?: It looks like chrony trusts its reference unconditionally. When the error is outside of the window of possibility, it will stop trying to be a PLL and forcibly correct. It seems a bit more noisy. However, I would not expect it to be more noisy in the period _between_ updates of the clock, and in particular it is more noisy when chronyd is not yet started. So I may need to retest without TV streaming across the network.
If you do not mind the network load of chrony set to 1Hz startup, then you get almost instant sub-200us sync. On my local network, that's fine, the GPSDO will handle 10k requests per second.

?: So ntpd will behave as a PLL unless the offset exceeds its reset threshold (default 0.128 s), whereas chronyd will behave as a PLL unless the offset exceeds the uncertainty interval of the reference.
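The difference I am guessing at can be written as two decision rules. This is only a sketch of the behaviour as it appears from the graphs, not a model of either daemon's actual code; the function names, the labels "pll" and "forced_correction", and the example numbers are all hypothetical.

```python
NTPD_RESET_THRESHOLD = 0.128  # seconds, the default quoted above


def ntpd_like_action(offset, threshold=NTPD_RESET_THRESHOLD):
    """Fixed threshold: stay in the PLL unless the offset is large."""
    return "forced_correction" if abs(offset) > threshold else "pll"


def chronyd_like_action(offset, reference_uncertainty):
    """Threshold tied to how much the reference itself could be wrong
    (conjectured from the observed behaviour, not from the source code)."""
    return "forced_correction" if abs(offset) > reference_uncertainty else "pll"


if __name__ == "__main__":
    offset = 0.005          # 5 ms initial error, invented for the example
    uncertainty = 0.0002    # 200 us reference uncertainty, also invented
    print("ntpd-like:   ", ntpd_like_action(offset))
    print("chronyd-like:", chronyd_like_action(offset, uncertainty))
```

Under these assumptions a 5 ms error against a tight local reference leaves the ntpd-like rule slewing through its PLL while the chronyd-like rule corrects at once, which would be consistent with the much faster settling seen above.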