It's About Time

Our awesome sysadmin team did some serious overtime over the weekend, thanks to a fun little leap-second bug. It took down a scary number of servers, though fortunately our most important public-facing services escaped largely unscathed (mostly thanks to a high level of redundancy). I too lost a server to this bug and had to spend a little while dealing with the fallout.

Things like this serve as an important reminder of the sometimes startling effects of invalid assumptions in software, e.g. the assumption that there are always 60 seconds in a minute (though in this particular case the actual kernel bug was far more complicated than that).
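To make the 60-seconds assumption concrete, here is a small Python sketch (nothing to do with the actual kernel bug, and the helper names are made up purely for illustration). The leap second in question was 2012-06-30 23:59:60 UTC, a perfectly valid timestamp. Python's `time.strptime` accepts it, while the `datetime` constructor rejects any second value above 59:

```python
import time
from datetime import datetime

# The leap second inserted at the end of June 2012, as a UTC timestamp.
LEAP_SECOND = "2012-06-30 23:59:60"


def parses_with_time_module(stamp: str) -> bool:
    """Return True if time.strptime accepts the timestamp (hypothetical helper)."""
    try:
        time.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        return True
    except ValueError:
        return False


def representable_as_datetime(year, month, day, hour, minute, second) -> bool:
    """Return True if datetime can hold these fields (hypothetical helper)."""
    try:
        datetime(year, month, day, hour, minute, second)
        return True
    except ValueError:
        # datetime only allows seconds in the range 0..59.
        return False


if __name__ == "__main__":
    print(parses_with_time_module(LEAP_SECOND))
    print(representable_as_datetime(2012, 6, 30, 23, 59, 60))
```

So two parts of the same standard library disagree about whether that second exists, which is exactly the sort of mismatch that bites when a timestamp crosses from one layer to another.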

Addendum: Bron Gondwana of our FastMail team has now written an excellent write-up of the leap-second incident.

Posted: 2012-07-02 00:20:50 UTC by Xiven




  Xiven (2012-07-02 15:11:33 UTC)

    Also had one Java installation go CPU crazy on one of our development servers.