gnuplot 2d

Edit /etc/apache2/mod_log_config.conf and add a new LogFormat named plots (mod_log_config.conf is referenced by httpd.conf):

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %{%d.%m.%Y:%H:%M:%S}t %D %U" plots

The "%" directives used in the plots format mean:

  • %h: remote host
  • %{format}t: time and date, in the given format
  • %D: time taken to serve the request, in microseconds
  • %U: URL path requested
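A line written with the plots format has four whitespace-separated fields, so it can be split quite easily. The following is only a minimal sketch in Python; the function name parse_plots_line is made up for illustration and is not part of Apache or gnuplot.

```python
from datetime import datetime

# Timestamp layout taken from the "plots" LogFormat: %d.%m.%Y:%H:%M:%S
TIME_FORMAT = "%d.%m.%Y:%H:%M:%S"


def parse_plots_line(line):
    """Split one "plots" log line into host, time, microseconds and URL path.

    parse_plots_line is a hypothetical helper, named here for illustration.
    """
    # Split at most three times so the URL path keeps any embedded spaces.
    host, stamp, micros, path = line.strip().split(None, 3)
    return {
        "host": host,                                   # %h
        "time": datetime.strptime(stamp, TIME_FORMAT),  # %{format}t
        "micros": int(micros),                          # %D
        "path": path,                                   # %U
    }


if __name__ == "__main__":
    print(parse_plots_line("66.249.111.111 30.08.2009:14:15:17 4372853 /blog/"))
```

The %D value is in microseconds, so 4372853 corresponds to roughly 4.4 seconds.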

Add another CustomLog directive to your VirtualHost section in httpd.conf:

DocumentRoot /home/h/hensler.net/public_html/bernhard/
ServerName bernhard.hensler.net
IndexOptions
DirectoryIndex index.htm index.html index.shtml start.htm start.html start.shtm index.php
CustomLog "/usr/local/visas/logfiles/hensler.net/%Y/%m/%d/access_log" vhost_combined
CustomLog "/usr/local/visas/logfiles/hensler.net/bernhard.access_log" plots

Concatenate the logs from all virtual hosts, e.g.:

$ cat hensler.access_log niko.access_log bernhard.access_log max.access_log > plot_log

A sample line looks like this:

66.249.111.111 30.08.2009:14:15:17 4372853 /blog/

Then start gnuplot from the command line:

$ gnuplot

reset
set terminal png small color
set output "2dplot.png"
set title "average response time"
set style data points
set pointsize 1
set grid
set xlabel "time"
set timefmt "%d.%m.%Y:%H:%M:%S"
set format x "%H:%M\n%d/%b"
set xdata time
set xrange [ "30.08.2009:00:00" : "30.08.2009:23:59" ]
set ylabel "response time"
set yrange [ 0 : 10000 ]
plot "/usr/local/visas/logfiles/hensler.net/plot_log" using 2:3 title "2d"
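The plot is titled "average response time", but it draws one point per request, and the yrange of 0 to 10000 microseconds only shows requests served in under 10 ms. If real averages are wanted, the log could be pre-aggregated before plotting. The following Python sketch assumes the four-field plots format shown above; the function name hourly_average_ms is hypothetical.

```python
from collections import defaultdict


def hourly_average_ms(lines):
    """Return {"DD.MM.YYYY:HH": mean response time in milliseconds}.

    hourly_average_ms is a hypothetical helper; it expects lines in the
    "plots" format: host, timestamp, microseconds, URL path.
    """
    totals = defaultdict(lambda: [0, 0])  # hour -> [sum of microseconds, count]
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        hour = fields[1][:13]  # "30.08.2009:14" from "30.08.2009:14:15:17"
        totals[hour][0] += int(fields[2])
        totals[hour][1] += 1
    # Convert the microsecond sums to mean milliseconds per hour.
    return {hour: total / count / 1000.0
            for hour, (total, count) in totals.items()}


if __name__ == "__main__":
    sample = [
        "66.249.111.111 30.08.2009:14:15:17 4372853 /blog/",
        "66.249.111.112 30.08.2009:14:20:00 1000 /",
    ]
    for hour, avg in sorted(hourly_average_ms(sample).items()):
        print(hour, avg)
```

The output has one line per hour. The hour key lacks minutes and seconds, so plotting it would need a shorter timefmt such as "%d.%m.%Y:%H".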

gnuplot 3d

Then read the excellent article “A New Visualization for Web Server Logs” and create a Perl script:

#
# prepare-for-gnuplot.pl: convert access log files to gnuplot input
# Raju Varghese. 2007-02-03

use strict;

my $tempFilename = "./tmp/temp.dat";
my $ipListFilename = "./tmp/iplist.dat";
my $urlListFilename = "./tmp/urllist.dat";

my (%ipList, %urlList);

# convert a dotted-decimal IP address to an integer so that addresses sort numerically
sub ip2int {
    my ($ip) = @_;
    my @ipOctet = split (/\./, $ip);
    my $n = 0;
    foreach (@ipOctet) {
        $n = $n*256 + $_;
    }
    return $n;
}

# prepare temp file to store log lines temporarily (the ./tmp directory must exist)
open (TEMP, ">$tempFilename") or die "cannot write $tempFilename: $!";

# reads log lines from stdin or files specified on command line

while (<>) {
    chomp;
    my ($ip, $time, $D, $url, $sc) = split;
    $time =~ s/\[//;
    next if ($url =~ /(gif|jpg|png|js|css)$/);
    print TEMP "$ip $time $D $url $sc\n";
    $ipList{$ip}++;
    $urlList{$url}++;
}

# process IP addresses: sort numerically and number them

my @sortedIpList = sort {ip2int($a) <=> ip2int($b)} keys %ipList;
my $n = 0;
open (IPLIST, ">$ipListFilename") or die "cannot write $ipListFilename: $!";
foreach (@sortedIpList) {
    ++$n;
    print IPLIST "$n $ipList{$_} $_\n";
    $ipList{$_} = $n;
}
close (IPLIST);

# process URLs: sort by number of requests, most popular first

my @sortedUrlList = sort {$urlList{$b} <=> $urlList{$a}} keys %urlList;
$n = 0;
open (URLLIST, ">$urlListFilename") or die "cannot write $urlListFilename: $!";
foreach (@sortedUrlList) {
    ++$n;
    print URLLIST "$n $urlList{$_} $_\n";
    $urlList{$_} = $n;
}
close (URLLIST);

# second pass: replace each IP address and URL by its rank

close (TEMP);
open (TEMP, $tempFilename) or die "cannot read $tempFilename: $!";
while (<TEMP>) {
    chomp;
    my ($ip, $time, $D, $url, $sc) = split;
    print "$time $ipList{$ip} $urlList{$url} $sc\n";
}
close (TEMP);
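The ip2int routine matters because plain string sorting puts "66.249.111.111" before "9.0.0.1". The following Python sketch of the same conversion makes the difference visible; the sample addresses other than the first are made up for illustration.

```python
def ip2int(ip):
    """Convert a dotted-decimal IPv4 address to an integer.

    Same arithmetic as the Perl ip2int above: each octet shifts the
    running total by a factor of 256.
    """
    n = 0
    for octet in ip.split("."):
        n = n * 256 + int(octet)
    return n


if __name__ == "__main__":
    # Illustrative addresses, not taken from a real log.
    ips = ["66.249.111.111", "9.0.0.1", "66.9.0.1"]
    print(sorted(ips))              # compared as strings
    print(sorted(ips, key=ip2int))  # compared as integers
```

Sorting by the integer value keeps neighbouring addresses next to each other on the Y axis, which is what makes address ranges show up as bands in the 3d plot.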

Run the Perl script (saved here as gnuplot.pl) from the command line and redirect its output to a file:

$ perl gnuplot.pl "/usr/local/visas/logfiles/hensler.net/bernhard.access_log" > gnuplot.input

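The fields in gnuplot.input, the output file of the Perl script, are date/time, IP rank and URL rank. The script also prints a status code as a fourth field, but it stays empty here because the plots format does not log one.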

Start gnuplot from the command line ($ gnuplot) and enter the following commands:

reset
set terminal png small color
set output "3dplot.png"
set style data dots
set xdata time
set timefmt "%d.%m.%Y:%H:%M:%S"
set zlabel "Content"
set ylabel "IP address"
splot "gnuplot.input" using 1:2:3 title "3d"

Image taken from oreillynet; my website does not produce enough data …

  • X, the time axis–a full day from midnight to midnight of November 16.
  • Y, the requester’s IP address, with the conventional dotted decimal format sorted and given an ordinal number between 1 and 120,000, representing the number of clients that accessed the web server.
  • Z, the URL (or content) sorted by popularity. Of the approximately 60,000 distinct pages on the site, the most popular URLs are near the zero point of the Z-axis and the least popular ones at the top.
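The popularity ranking behind the Z axis can be illustrated with a few lines of Python. This is only a sketch: the function name popularity_ranks is hypothetical, and ties are broken alphabetically here, whereas the Perl script leaves the order of equally popular URLs unspecified.

```python
from collections import Counter


def popularity_ranks(urls):
    """Map each URL to its rank, with 1 being the most requested.

    popularity_ranks is a hypothetical helper; ties are broken
    alphabetically, which is a choice made for this sketch only.
    """
    counts = Counter(urls)
    ordered = sorted(counts, key=lambda url: (-counts[url], url))
    return {url: rank for rank, url in enumerate(ordered, start=1)}


if __name__ == "__main__":
    requests = ["/blog/", "/", "/blog/", "/about/", "/blog/", "/"]
    print(popularity_ranks(requests))
```

A low rank therefore means a frequently requested page, so popular content clusters near the zero point of the Z axis.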

http://www.ibm.com/developerworks/linux/library/lgnuplot
http://www.oreillynet.com/pub/a/sysadmin/2007/02/02/3d-logfile-visualization.html?page=1
http://phasorburn.com/index.php/archive/excel-0-gnuplot-1

A final step will cover load-testing tools like OpenSTA and JMeter.

See also Part I of this tutorial.