Timestamp limitations when processing RFC3164 formatted logs

I've been working on a log viewing/searching application on and off for the last few months for sol1, and one of the things that's been bugging me is processing RFC3164 timestamps.

According to RFC3164, the TIMESTAMP field in the HEADER section of a syslog packet is in the format of "Mmm dd hh:mm:ss" (e.g. Oct 11 22:14:15). Astute readers will note that there is no year field in the timestamp.
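To make the missing year concrete, here is a minimal Python sketch of pulling the TIMESTAMP apart. The function name `parse_rfc3164_timestamp` and the regular expression are my own illustration, not anything defined by the RFC or by a syslog library. It assumes the line starts directly with the TIMESTAMP (no leading PRI part, as in a typical log file) and that single-digit days are padded with a space, as RFC3164 describes.

```python
import re

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# "Mmm dd hh:mm:ss"; single-digit days are space-padded ("Oct  1").
TIMESTAMP_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2}) (?P<day>[ \d]\d) "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})"
)


def parse_rfc3164_timestamp(line):
    """Return (month, day, hour, minute, second), or None if no match.

    There is deliberately no year in the result: the format does not
    carry one, so the caller has to supply it from somewhere else.
    """
    match = TIMESTAMP_RE.match(line)
    if match is None or match.group("mon") not in MONTHS:
        return None
    month = MONTHS.index(match.group("mon")) + 1
    return (
        month,
        int(match.group("day")),  # int() tolerates the padding space
        int(match.group("h")),
        int(match.group("m")),
        int(match.group("s")),
    )
```

Whatever you do with the tuple afterwards, the year has to come from outside the TIMESTAMP itself.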

This can be most annoying when you're processing years' worth of logs - especially when logs haven't been rotated and a single log file contains messages that span multiple years. To process that log file you end up messing with stat's reported atimes, mtimes, and ctimes, which can get very kludgy very quickly.

Of course, the writers of the RFC thought about this problem, and there are a number of ways for syslog protocol implementors (or whoever processes the logs) to work out the year:

  • Write the year out to the beginning of the CONTENT field in the syslog message. Something like
    Aug 24 05:34:00 2001 quasimodo postfix/smtpd[25077]: connect from rusty.slug.org.au[202.177.212.193]
    
  • Format the TIMESTAMP in ISO 8601. This breaks compatibility with other syslog daemons that choke on TIMESTAMP formats other than the RFC default.
  • Do post-logging processing of the file and apply some tricks to guess the year. All of the tricks are based on manipulating the output of stat, which forces you to either not touch the original file, or take a snapshot of stat output before moving it. This makes it kludgier if you want to do offline processing on a separate machine.
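
One version of the stat-based trick from the last option can be sketched in Python as follows. This is only an illustration under some loud assumptions: the lines are in chronological order, each starts with an RFC3164 TIMESTAMP, the file's mtime is close to when the last line was written, and the log never goes quiet for a year or more. The name `guess_years` is hypothetical, and the mtime is passed in as a `datetime` so that a saved snapshot of the stat output can be used just as easily as a live one.

```python
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), start=1)}


def guess_years(lines, mtime):
    """Pair each log line with a guessed year, as (year, line) tuples.

    Walk the file backwards starting from the year of `mtime`. Each time
    the month jumps upwards while going backwards (e.g. from Jan to Dec),
    assume a year boundary was crossed and step the year back by one.
    """
    year = mtime.year
    later_month = mtime.month  # month of the chronologically later entry
    guessed = []
    for line in reversed(lines):
        month = MONTHS[line[:3]]
        if month > later_month:
            year -= 1
        later_month = month
        guessed.append((year, line))
    guessed.reverse()  # restore the original file order
    return guessed
```

For a file on disk, the mtime could come from `datetime.fromtimestamp(os.stat(path).st_mtime)`. A gap of more than a year between two messages is invisible to this approach, which is part of why it is the least satisfying of the three options.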

The first two options are the more elegant solutions, but the default syslog daemon (sysklogd) on the more common Linux distros doesn't support either of them.

Fortunately there are other syslog implementations out there that do implement these features. Hooray for open source! The two most popular alternatives are syslog-ng and rsyslog. Rsyslog is my personal favourite, particularly because of the MySQL backend.

Awesomely, while reading the release notes for Fedora 8 I noticed Fedora is switching to rsyslog by default. Hopefully this is a catalyst for other distros to switch to a 21st century syslog implementation.