One hosting service I have hosts several domains. To understand which of these web sites gets the most traffic, I needed to analyze the logs a bit. There are several Apache log analysis solutions out there, but frankly I just needed some basic information, and didn’t want bloated software with a lot of features, 99% of which I didn’t need.

Perl came to the rescue, and in some 15 minutes I wrote a working script.

What I needed to know

Basically, I wanted to know the number of hits and the bytes transferred for each of the web sites, in order to make a ranking.

The Apache access log format is a de facto standard by now, and a line looks something like this:

75.137.96.53 - - [16/Dec/2021:00:01:16 -0500] "GET /assets/js/main.js?v=5.6.0 HTTP/1.1" 200 6696 "https://www.mydomain1.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"

Since this is a shared Apache instance, my log also has an extra field at the beginning of each line, holding the domain name that served the request:

www.mydomain1.com 75.137.96.53 - - [16/Dec/2021:00:01:16 -0500] "GET /assets/js/main.js?v=5.6.0 HTTP/1.1" 200 6696 "https://www.mydomain1.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"
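
To make the field layout concrete, here is a minimal sketch that splits a line like the one above into named fields. The sub name parse_line and the hash keys are my own choices for illustration, and the pattern assumes exactly the ten-field layout shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: split one log line into named fields.
# Assumes the ten-field layout shown above (domain name first).
sub parse_line {
    my ($line) = @_;
    my @fields = $line =~ /^(\S+) (\S+) (\S+) (\S+) \[([^\]]+)\] "(.+)" (\S+) (\S+) "(.*)" "(.*)"$/
        or return;    # not in the expected format
    my %rec;
    @rec{qw(site ip ident user when request status bytes referer agent)} = @fields;
    return \%rec;
}

my $sample = 'www.mydomain1.com 75.137.96.53 - - [16/Dec/2021:00:01:16 -0500] '
           . '"GET /assets/js/main.js?v=5.6.0 HTTP/1.1" 200 6696 '
           . '"https://www.mydomain1.com/" '
           . '"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0"';

my $rec = parse_line($sample);
print "$rec->{site} $rec->{status} $rec->{bytes}\n";
```

The first field is the one I care about for the ranking; status and bytes are the only other two the report really needs.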

The script

The script is meant to be fed a log on its standard input, for instance:

# tail -n 1000 access_log | perl process_log.pl

Here goes the script, with some comments where needed.

use Number::Bytes::Human qw/format_bytes/;
use Text::Table;
# My personal defaults module; among other things it makes 'say' available here.
use Arthas::Defaults::520;

my $sites;
my $totbytes = 0;

while (my $row = <STDIN>) {
    chomp $row;

    # Regex which parses a line of the Apache log.
    my (
       $sitename, $clientip,  $rfc1413, $username,
       $when,     $reqstring, $status,  $bytesout,
       $referer,  $useragent
    ) = $row =~ /^(\S+) (\S+) (\S+) (\S+) \[(.+)\] "(.+)" (\S+) (\S+) "(.*)" "(.*)"/;

    # Skip lines that don't match the expected format.
    next if !defined $sitename;

    $sites->{$sitename} //= {
       hits     => 0,
       okhits   => 0,
       bytesout => 0,
    };
    $sites->{$sitename}->{hits}++;

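    # Don't care about byte count on non-OK hits. Maybe we should add that as well, but... no.
    next if $status != 200;

    # Apache logs "-" instead of 0 when no body was sent (e.g. HEAD requests).
    $bytesout = 0 if $bytesout eq '-';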

    $sites->{$sitename}->{okhits}++;
    $sites->{$sitename}->{bytesout} += $bytesout;
    $totbytes += $bytesout;
}

# Create a nicely-formatted table
my $tb = Text::Table->new('Site','OK Hits','Transfer Out');

# Only display the first 10 entries, sorted by number of 200 OK hits
my $ii = 0;
for my $sk(
    sort { $sites->{$b}->{okhits} <=> $sites->{$a}->{okhits} } keys %$sites
) {
    my $sv = $sites->{$sk};
    my $tout = format_bytes($sv->{bytesout});
    $tb->add($sk, $sv->{okhits}, $tout);
    last if $ii == 9;
    $ii++;
}

say $tb;

# We also want the totals, why not?
my $totbytes_h = format_bytes($totbytes);
say "TOTAL TRANSFER: $totbytes_h";

We use a couple of nice Perl modules here: Number::Bytes::Human, which converts byte counts into something easier to read, and especially Text::Table, which produces a nicely formatted table on the console.
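
If installing Number::Bytes::Human is not an option, a rough stand-in is easy to write in plain Perl. The sketch below is my own approximation, not the module's algorithm: the sub name human_bytes is made up, it assumes 1024-based units, and its rounding may differ from what format_bytes prints.

```perl
use strict;
use warnings;

# Hypothetical stand-in for format_bytes, using 1024-based units.
# Not the module's algorithm: rounding may differ from Number::Bytes::Human.
sub human_bytes {
    my ($bytes) = @_;
    my @units = ('', 'K', 'M', 'G', 'T');
    my $i = 0;
    while ($bytes >= 1024 && $i < $#units) {
        $bytes /= 1024;
        $i++;
    }
    # Whole numbers for plain bytes, one decimal otherwise.
    return $i == 0
        ? sprintf('%d', $bytes)
        : sprintf('%.1f%s', $bytes, $units[$i]);
}

print human_bytes(6696), "\n";
print human_bytes(3 * 1024**3), "\n";
```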

Here’s the output:

Site                             OK Hits Transfer Out
www.domain1.com                  128360  2.6G        
www.mydomain2.com                108854  3.5G        
www.thedomain3.it                98164   4.2G        
www.ourdomain4.com               59108   6.6G        
www.infodomain.info              35417   560M        
www.custdomain.com               29424   520M        
www.shopdomain.it                20547   623M        
www.bardomain.it                 12633   403M        
www.bdm.it                       11330   292M        
basehostx.com                    10030   1.4M   
TOTAL TRANSFER: 29G

Text::Table also supports borders for cells and other niceties, which I didn’t use.

This quick and dirty solution is very easy to adapt and improve, for example to show the IPs which hit the server most, grouped by web site.
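
As a starting point for that last idea, here is a minimal sketch of counting hits per client IP, grouped by site. The sub names count_ips and top_ips are hypothetical, the sample lines are made up (using documentation-only IP addresses), and only the first two fields of each line are parsed.

```perl
use strict;
use warnings;

# Hypothetical helper: count hits per client IP, grouped by site.
# Takes a list of log lines; returns { site => { ip => hits } }.
sub count_ips {
    my @lines = @_;
    my %hits;
    for my $line (@lines) {
        # Only the first two fields (domain name and client IP) are needed.
        my ($site, $ip) = $line =~ /^(\S+) (\S+) / or next;
        $hits{$site}{$ip}++;
    }
    return \%hits;
}

# Hypothetical helper: the $n busiest IPs for one site, busiest first.
sub top_ips {
    my ($by_ip, $n) = @_;
    my @sorted = sort { $by_ip->{$b} <=> $by_ip->{$a} || $a cmp $b } keys %$by_ip;
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}

# Made-up sample lines, truncated after the fields we use.
my @sample = (
    'www.mydomain1.com 192.0.2.10 - - ...',
    'www.mydomain1.com 192.0.2.10 - - ...',
    'www.mydomain1.com 192.0.2.20 - - ...',
    'www.mydomain2.com 192.0.2.30 - - ...',
);

my $hits = count_ips(@sample);
for my $site (sort keys %$hits) {
    print "$site\n";
    printf "    %-15s %d\n", $_, $hits->{$site}{$_}
        for top_ips($hits->{$site}, 3);
}
```

In the real script you would pass <STDIN> instead of the sample array, or simply add the per-IP counter inside the existing while loop.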