Perl: Probably Exceptionally Ridiculous Language
Geek at Christmas
So, as is traditional, I spend my Christmas holidays playing with epistula. Now I have referer tracking working again.
The problem with referer tracking is extracting the data from log files. When the server had mod_log_sql it was easy (I have an entire log stats suite built for mod_log_sql), but mod_log_sql doesn’t support Apache 2 yet (a patch to make it do so was released yesterday, but it remains untested), so I had to dust off my extremely limited Perl skillz to write this, a Perl program to send Apache logs to MySQL:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;    # DBI loads the DBD::mysql driver named in the DSN

#Database options:
my $dbUser = "username";
my $dbPass = "password";
my $dbName = "database";
my $database = DBI->connect(
    "dbi:mysql:database=$dbName;host=localhost;port=1114",
    $dbUser, $dbPass
) or die "Can't connect: $DBI::errstr\n";

# Prepare the insert once; the ? placeholders quote every value, $date included.
my $sth = $database->prepare(
    "insert into apachelogs
     (remote_host, remote_user, request_time, request_method,
      request_uri, request_protocol, status, bytes_sent, referer, agent)
     values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
);

while (<>) {
    # Combined Log Format; skip any line that doesn't match.
    my ($client, $identuser, $authuser, $date, $method,
        $url, $protocol, $status, $bytes, $referer, $agent) =
        /^(\S+) (\S+) (\S+) \[(.*?)\] "(\S+) (.*?) (\S+)" (\S+) (\S+) "(.*?)" "(.*?)"$/
        or next;
    $sth->execute($client, $authuser, $date, $method, $url,
                  $protocol, $status, $bytes, $referer, $agent);
}
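Most of the work is in that one regex, so it's worth testing on its own before pointing a live log at the database. A minimal sketch, assuming the standard Combined Log Format; `parse_log_line` and its hash keys are my own hypothetical names, not part of the script, and the regex is written with the `\S` and `\[` escapes it needs:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one Combined Log Format line into named fields.
# Returns a hash reference, or undef if the line doesn't match.
sub parse_log_line {
    my ($line) = @_;
    my @fields = $line =~
        /^(\S+) (\S+) (\S+) \[(.*?)\] "(\S+) (.*?) (\S+)" (\S+) (\S+) "(.*?)" "(.*?)"$/
        or return;
    my %rec;
    @rec{qw(client identuser authuser date method url protocol
            status bytes referer agent)} = @fields;
    return \%rec;
}

# Usage: parse a sample line and print a few of its fields.
my $sample = '204.95.98.252 - - [24/Dec/2003:15:23:38 +0000] '
           . '"GET /archive/writing/2003/08/19 HTTP/1.0" 200 11873 "-" '
           . '"msnbot/0.11 (+http://search.msn.com/msnbot.htm)"';
my $rec = parse_log_line($sample);
print "$rec->{client} $rec->{status} $rec->{url}\n";
```

Lines that don't match come back as undef, so the caller can skip them instead of inserting a row of empty fields.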
Those who spoke on this:
Logging
Aquarionics' logging system was designed to work with mod_log_sql, a module that, er, logs to an SQL database. This worked until we upgraded to Apache 2, which mod_log_sql didn't support until recently. Since part of the logging system is the bit of AqCom that shows who linked here recently, I'd rather not convert it to run off plain text files (though I may convert it to use SQLite at some point), so I wrote a Perl script that feeds the log into the database in mod_log_sql's format. It looks like this:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;    # DBI loads the DBD::mysql driver named in the DSN

#Database options:
my $dbUser = "user";
my $dbPass = "password";
my $dbName = "epistula";
my $database = DBI->connect("dbi:mysql:database=$dbName;host=localhost;port=1114",
                            $dbUser, $dbPass)
    or die "Can't connect: $DBI::errstr\n";

# Prepare the insert once; the ? placeholders quote every value, $date included.
my $sth = $database->prepare(
    "insert into apachelogs (remote_host, remote_user, request_time,
     request_method, request_uri, request_protocol, status, bytes_sent, referer, agent)
     values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
);

# Sample log line (one line in the real log):
# 204.95.98.252 - - [24/Dec/2003:15:23:38 +0000]
#   "GET /archive/writing/2003/08/19 HTTP/1.0" 200 11873 "-"
#   "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"
while (<>) {
    my ($client, $identuser, $authuser, $date, $method,
        $url, $protocol, $status, $bytes, $referer, $agent) =
        /^(\S+) (\S+) (\S+) \[(.*?)\] "(\S+) (.*?) (\S+)" (\S+) (\S+) "(.*?)" "(.*?)"$/
        or next;    # skip lines that don't match the Combined Log Format
    # ...
    #$database->quote($thisdir);
    #print $database->quote($url)."\n";
    $sth->execute($client, $authuser, $date, $method, $url,
                  $protocol, $status, $bytes, $referer, $agent);
}
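Both scripts store Apache's `%t` timestamp verbatim, as a string like `24/Dec/2003:15:23:38 +0000`. If you'd rather have a number you can sort and compare on, the core Time::Local module will convert it. A sketch only: `apache_date_to_epoch` is a hypothetical helper, and which column (if any) gets the result depends on your schema:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviations as Apache writes them, mapped to 0-based month numbers.
my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: turn "24/Dec/2003:15:23:38 +0000" into Unix time.
# Returns undef if the string isn't in that shape.
sub apache_date_to_epoch {
    my ($date) = @_;
    my ($d, $mon, $y, $h, $m, $s, $sign, $oh, $om) =
        $date =~ m{^(\d\d)/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d) ([+-])(\d\d)(\d\d)$}
        or return;
    return unless exists $MONTH{$mon};
    # Read the wall-clock fields as if they were UTC, then remove the zone
    # offset: local time = UTC + offset, so UTC = local time - offset.
    my $epoch  = timegm($s, $m, $h, $d, $MONTH{$mon}, $y);
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '+' ? 1 : -1);
    return $epoch - $offset;
}

print apache_date_to_epoch('24/Dec/2003:15:23:38 +0000'), "\n";    # 1072279418
```

Doing the offset arithmetic by hand keeps it independent of the server's own time zone, so logs written with `+0100` or `-0500` land on the same scale.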
...and is run using this crontab line:
@reboot tail -f /var/log/apache2/www.aquarionics.com | $EPBIN/apache2db.pl &
Now, the important thing to remember is that this table gets pretty big pretty quickly, since it stores every line of the log. It's vitally important that you don't, under any circumstances, forget that you commented out this crontab line:
@daily echo "delete from apachelogs where time_stamp < `date +\%Y\%m\%d --date '1 month ago'`" | mysql epistula
Because otherwise you'll discover that your daily database dumps have started to hit 16MB each... bzip2-compressed... 380MB uncompressed... oh, let's say four months and twelve days later.
For example.
(I ran the above query, or one like it, just before I started this entry, and it has only just finished.)
mysql> delete from apachelogs where time_stamp < 20040825;
Query OK, 913830 rows affected (21 min 44.87 sec)
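The purge is only as good as its cutoff date, and crontab treats an unescaped `%` in a command as a newline, which is easy to trip over with `date +%Y%m%d`. A minimal sketch that sidesteps this by working the cutoff out in Perl with the core POSIX module and printing the query for `mysql` to run; `month_ago_cutoff` is a hypothetical helper, and, roughly like `date --date '1 month ago'`, it lets impossible days such as 31 February roll over into March:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime mktime);

# Hypothetical helper: the YYYYMMDD date one calendar month before the given
# day (month is 1-12). mktime normalises out-of-range fields, so month -1
# rolls back into December of the previous year. Noon avoids DST edge cases.
sub month_ago_cutoff {
    my ($year, $mon, $day) = @_;
    my $t = mktime(0, 0, 12, $day, $mon - 2, $year - 1900);
    return strftime('%Y%m%d', localtime $t);
}

# Usage: build today's purge query and print it.
my @now    = localtime;
my $cutoff = month_ago_cutoff($now[5] + 1900, $now[4] + 1, $now[3]);
print "delete from apachelogs where time_stamp < $cutoff;\n";
```

Run on 25 September 2004, `month_ago_cutoff(2004, 9, 25)` gives `20040825`, the same cutoff as the query above. A crontab entry along the lines of `@daily perl /path/to/purge.pl | mysql epistula` (path hypothetical) then contains no `%` at all.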
Reformatting for the girlymen who don't have 2000px-wide displays and are reading the RSS feed. See? This is why I want to publish only partial content in the feed: that way, when I do something like this, it only fucks up in IE.
- 2004-09-25 13:32:13
- By Aquarion
- From Casarufus, Letchworth
- Filed under Aqcom, Perl & Programming
paul:
I took a look at this and found the regex didn’t quite work: I came up with this one.
(I would have wrapped this in pre tags, but I read the warning: no innocent pair of angle brackets deserves that).
while (<>) {
my ($client, $identuser, $authuser, $date, $method, $url, $protocol, $status, $bytes, $referer, $agent) = /(\S+).*? (\S+) (\S+) \[(.*?)\] "(\S*) (\S+) (\S+)" (\S+) (\S+) "(\S+)" "(.*)?"/;