Building a stratum 1 NTP server on EC2

I won’t even try to pretend I’m not a huge geek here…

I’ve run public NTP servers for ages, and I’ve been intrigued ever since a dedicated server of mine was misclassified as being in Brazil and fielded an enormous volume of traffic without any impact on performance. I have a DigitalOcean droplet in Singapore that’s serving a few terabytes a month of NTP queries; the pool tells me it handles about 3% of pool traffic for all of Singapore.

NTP servers are categorized into strata, essentially indicating how far down in the chain a clock is. A stratum 1 server gets its time directly from a (non-NTP) reference source, and a system syncing to a stratum 1 server becomes stratum 2, and so forth. (As an aside, strata don’t actually have enormous significance: a local stratum 3 is probably more accurate than a stratum 1 on the other side of the globe.)

For years, EC2 instances have had Time Sync available over a link-local address. This, incidentally, has proven the point about server stratum not being everything: the Time Sync server is generally at stratum 3, but I infer that everything up to stratum 1 happens at the availability-zone level, if not within the same data center. Given the link-local Time Sync address and some good stratum 1 clocks within the country, my servers running in AWS typically end up syncing to Time Sync. I have a couple of servers in the pool at stratum 4 as a result. It’s further down “the chain,” but it’s a particularly good chain.
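
For the curious, pointing an instance at Time Sync is basically a one-line affair. Here’s a sketch assuming chronyd as the NTP daemon; the address is the documented IPv4 link-local one, and the options are my own illustrative choices:

```
# /etc/chrony.conf (sketch): sync to the Amazon Time Sync Service over
# its link-local address. "prefer" and "iburst" are illustrative
# choices, not anything AWS mandates.
server 169.254.169.123 prefer iburst
```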

But more recently, Amazon has been making PTP available on certain instance types in certain regions. PTP can be succinctly described as a more precise version of NTP (it is, after all, the Precision Time Protocol): it uses hardware timestamping the whole way, and can therefore achieve sub-microsecond accuracy.

And so, I spun up an instance in Malaysia, where the pool had only 3 IPv4 servers, and added it to the pool. I’ve kept its speed at only 512 kbps because AWS bandwidth pricing is obscene. It has the PTP Hardware Clock (PHC) set up as a reference source, and it’s reporting offsets in the range of a handful of nanoseconds.
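
The configuration for the PHC reference is small. A sketch, again assuming chrony; the device path and tuning values are assumptions on my part, so check which PHC device your instance actually exposes:

```
# /etc/chrony.conf (sketch): use the instance's PTP Hardware Clock as a
# reference source. /dev/ptp0 and the poll/delay tuning are assumptions;
# adjust them for your instance.
refclock PHC /dev/ptp0 poll 0 delay 0.000010 prefer
```

Once it’s running, `chronyc tracking` and `chronyc sources -v` will show how the PHC is being selected and what the measured offsets look like.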

I went a step further and configured it for NTS. This required a hostname, and while using Porkbun to hunt for a novel domain name, I stumbled over the fact that ntpservers.org was available. Obviously, I went ahead and registered it.
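
Server-side NTS in chrony mostly boils down to handing it a certificate and key for that hostname. A sketch, where the Let’s Encrypt-style paths are an assumption about where the certificate lives:

```
# /etc/chrony.conf (sketch): serve NTS using a TLS certificate for the
# server's hostname. The paths assume a Let's Encrypt layout.
ntsservercert /etc/letsencrypt/live/malaysia-1.ntpservers.org/fullchain.pem
ntsserverkey /etc/letsencrypt/live/malaysia-1.ntpservers.org/privkey.pem
ntsdumpdir /var/lib/chrony
allow
```

Note that NTS-KE uses TCP port 4460 by default, so that needs to be reachable alongside the usual UDP 123.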

So now there is malaysia-1.ntpservers.org online, as a stratum 1 NTP server in Malaysia supporting NTS.
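
If you want to try it from a chrony client, enabling NTS is a single option on the server line:

```
# Client-side chrony.conf line: use malaysia-1 with NTS enabled
server malaysia-1.ntpservers.org iburst nts
```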

As an aside: I ended up bringing up malaysia-2.ntpservers.org as well, in a (seemingly paradoxical) attempt to save money. It’s a cheap instance at a provider that includes 2 TB of bandwidth for under $10/month (USD). Amusingly, its default config synced to the pool and had selected my stratum 1 server. I cleaned the config up slightly to point to some stratum 1 clocks in Japan and Singapore, and it’s been fielding a lot more traffic from the pool at a higher bandwidth setting.

I have not yet listed it anywhere beyond registering it in the pool, because it’s currently more of a novelty than a long-term commitment. I am tempted to eventually withdraw malaysia-1 from the pool but list it as an available stratum 1 server supporting NTS on the relevant sites, hopefully cutting down bandwidth costs and making it reasonable to maintain long-term.

Right now the other AWS regions supporting PTP instances are all areas well-served by stratum 1 clocks: Tokyo and the United States. It will be interesting to see if it becomes available in, say, India, where the available options are more limited.

It’s back!

While the blogs aren’t updated very often, at some point they stopped working and just displayed a 502 error.

If I’m being honest, I have no idea what happened. The old setup was incredibly convoluted: a CDN in front of an Amazon ALB, in front of an oversized EC2 instance, using EFS for storage and an RDS instance for the database. The site was extremely behind on updates, and relied on a couple of separate custom plugins to properly render everything. At some point, something in the chain stopped working.

Rather than debug it, I did something I’ve wanted to do for ages: I ripped everything apart and just moved it to a cheap shared hosting account at Dathorn. (Where “cheap” refers to price, not quality. I love Dathorn.) I kept Bunny in front as a CDN, in large part because the blog is basically static these days and because my prepaid $25 account will still last more than a year at current traffic levels. I even fixed (I think?) an issue where the fonts on my blog were being pulled from a defunct @font-face provider. Don’t expect a flurry of updates here, but with any luck, the site will stay online now!