Aquarionics’ logging system was designed to work against mod_log_sql, a module that, er, logs to an SQL database. This worked until we upgraded to Apache 2, which log_sql didn’t support until recently. Since part of the logging system is the bit of AqCom that shows who linked here recently, I’d rather not convert it to run off plain text files (though I may be converting it to use Sqlite at some point), so I created a perl script that feeds the log into the database in log_sql’s format. It looks like this:

#!/usr/bin/perl

use DBD::mysql;

#Database options:
$dbUser = "user";
$dbPass = "password";
$dbName = "epistula";

$database = DBI->connect("dbi:mysql:$dbName:localhost:1114", $dbUser, $dbPass);

#204.95.98.252 - - [24/Dec/2003:15:23:38 +0000] "GET /archive/writing/2003/08/
	19 HTTP/1.0" 200 11873 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"

while (<>) {
  my ($client, $identuser, $authuser, $date, $method,
      $url, $protocol, $status, $bytes, $referer,$agent) =

/^(S+) (S+) (S+) [(.*?)] "(S+) (.*?) (S+)" (S+) (S+) "(.*?)" "(.*?)"$/;
  # ...
        #$database->quote($thisdir);
        $q = "insert into apachelogs (remote_host, remote_user, request_time,
			request_method, request_uri, request_protocol, status, bytes_sent, referer, agent)
        values
        (".$database->quote($client).", ".$database->quote($authuser).", '".$date."', "
			.$database->quote($method).", ".$database->quote($url).", "
			.$database->quote($protocol).", ".$database->quote($status).", "
			.$database->quote($bytes).", ".$database->quote($referer).", "
			.$database->quote($agent).")";

        #print $database->quote($url)."n";
        my $sth = $database->prepare($q);
        $sth->execute();

}

…and is run using this crontab line:

@reboot tail -f /var/log/apache2/www.aquarionics.com | $EPBIN/apache2db.pl &

Now, the important thing to remember is that this gets pretty big pretty quickly, since it logs every line. It’s vitally important that you don’t under any circumstances, forget that you commented out this crontab line:

@daily echo "delete from apachelogs where time_stamp < `date +%Y%m%d --date '1 month ago'`" | mysql epistula

Because otherwise you’ll discover that your daily database dumps start to hit 16Mb each… BZ compressed… 380Mb uncompressed… oh, lets say four months and twelve days later.

For example.

(I ran the above query, or one like it, just before I started this entry. It’s just stopped:

mysql> delete from apachelogs where time_stamp < 20040825;
Query OK, 913830 rows affected (21 min 44.87 sec)

Reformatting for the girlymen who don’t have 2000px wide displays and are reading the RSS feed. See? This is why I want to only do partial content, because that way when I do something like this it only fucks up in IE