_____________________________
MISSA ver. 0.1
Method for Incremental and Scheduled Statistics for Analog
Especially suited to massive virtual hosting environments.
Also covers Report Magic customization of the results.
_____________________________
0. COPYRIGHT
Copyright (c) 2000 by Jaume Teixi.
You are free to distribute this software under the terms of the GNU General Public License.
1. NOTICE
Please consider this method a work in progress. It has not yet been tested on systems other than:
-Debian GNU/Linux 2.2 with Analog 4.01, Report Magic 1.41 and Apache 1.3.12
2. REQUIREMENTS
-Unix-like system with Apache
-Analog and Report Magic
-Perl 5 interpreter
-Cron
3. PURPOSE
We work in a virtual hosting environment with many domains that generate a lot of web traffic, which Apache serves and logs to a separate file for each domain.
We keep a separate Analog configuration file for each domain. Customization is one of Analog's main strengths (the other is speed :), so we set specific analysis options for each domain according to customer needs, such as focusing on some part of the website or excluding the customer's own accesses to its domain.
Because we want nicer-looking reports, we have also set up a Report Magic configuration file for each domain.
Of course, we want to roll over the log files once they have been analyzed.
But we want web traffic analysis to be a task that runs by itself, without wasting our daily work time ;-)
So this method works as follows. A cron task, executed on the last day of each month:
-runs Analog over the log files of each domain, adding the results to the previously cached Analog reports
-moves the analyzed log files to an archive folder
-gracefully restarts Apache so that it starts writing new log files
-runs Report Magic to give the reports a nicer look
-notifies the webmaster and the customer of each domain that the new report has been generated
4. ENVIRONMENT
We are running Apache with many virtual domains.
All customer domains write their logs to files such as /var/www/logs/customerdomain1.com.log, and all logs are rotated every week.
Our own company domains write their logs to files such as /var/www/logs/ourcompanydomains/companydomain1.com.log
We have also set up Apache with an alias rule for each virtual host:
Alias /stats /var/www/reports/customerdomain1.com
Read the Apache documentation on how to handle these things.
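As an illustration, a small shell function can print a virtual host stanza that follows this layout (one log file per domain, /stats pointing at the domain's report folder). It is a sketch under assumptions: the IP address (a documentation address), the DocumentRoot path, the function name and the use of the "combined" log format nickname (defined in Apache's stock httpd.conf) are not part of the original setup, and name-based hosting would also need a matching NameVirtualHost directive.

```shell
#!/bin/sh
# Hypothetical helper: print an Apache 1.3 <VirtualHost> stanza for one
# customer domain. The IP address, DocumentRoot and the "combined"
# LogFormat nickname are assumptions, not part of the original setup.
vhost_stanza() {
    domain="$1"
    cat <<EOF
<VirtualHost 192.0.2.10>
    ServerName www.$domain
    DocumentRoot /var/www/htdocs/$domain
    # One log file per domain, read later by Analog.
    CustomLog /var/www/logs/$domain.log combined
    # Report Magic output, published under http://www.$domain/stats
    Alias /stats /var/www/reports/$domain
</VirtualHost>
EOF
}

vhost_stanza customerdomain1.com
```

The output can be appended to httpd.conf (or an included file) for each new customer.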
5. SCHEDULING CRON
Just edit your /etc/crontab and add the following line:
15 4 28-31 * * root /etc/rmagic/missa > /var/log/missa.log 2>&1
It runs missa at 4:15 am on days 28 to 31 of every month (the 28-31 range is equivalent to four separate entries, one per day) and logs the results to /var/log/missa.log. Running on all four days covers months of every length; missa itself then checks whether today is really the last day of the month.
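Because cron cannot express "last day of the month" directly, the test has to happen inside missa. One way to do it, assuming GNU date as shipped with Debian, is to check whether tomorrow is the 1st. The function name below is hypothetical.

```shell
#!/bin/sh
# Succeeds (exit status 0) when the given day, in YYYY-MM-DD form, is the
# last day of its month; without an argument it checks today.
# Requires GNU date (its -d option), as found on Debian.
is_last_day_of_month() {
    day="${1:-$(date +%Y-%m-%d)}"
    # The day after the last day of any month is always the 1st.
    # TZ=UTC avoids surprises around daylight-saving changes.
    [ "$(TZ=UTC date -d "$day +1 day" +%d)" = "01" ]
}

if is_last_day_of_month; then
    echo "last day of the month: missa would process the logs now"
else
    echo "not the last day of the month: missa would stop here"
fi
```

This also handles leap years for free: 2000-02-29 counts as a last day, 2000-02-28 does not.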
6. MISSA ORGANIZATION
Create the /etc/rmagic/missa file.
This is the main file. It first checks whether today is the last day of the month and, if so, runs our automated Analog and Report Magic files, which keep the processed log data in Analog cache files and move the processed logs away so that they are not processed again.
/etc/rmagic/missa lists the missa files to run:
-/etc/rmagic/missa_clients processes our clients' domains
-/etc/rmagic/missa_ours processes our own domains (probably on a different path or machine)
-/etc/rmagic/missa_total processes the Analog caches to produce global statistics
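A minimal sketch of what /etc/rmagic/missa could look like, assuming each sub-file is a plain list of shell commands run with sh. The last-day check uses GNU date; MISSA_DIR and MISSA_FORCE are hypothetical variables added only so the script can be tried out safely.

```shell
#!/bin/sh
# Sketch of /etc/rmagic/missa (an assumption, not the author's original file).
# MISSA_DIR and MISSA_FORCE are hypothetical knobs added for testing.

# Succeeds only on the last day of the month (GNU date, as on Debian).
last_day_of_month() {
    [ "$(TZ=UTC date -d "$(date +%Y-%m-%d) +1 day" +%d)" = "01" ]
}

missa_main() {
    dir="${MISSA_DIR:-/etc/rmagic}"
    if [ -z "$MISSA_FORCE" ] && ! last_day_of_month; then
        echo "missa: not the last day of the month, nothing to do"
        return 0
    fi
    for part in missa_clients missa_ours missa_total; do
        echo "missa: running $dir/$part at $(date)"
        # Each part is a plain list of shell commands, executed with sh;
        # a failing part is reported but does not stop the others.
        sh "$dir/$part" || echo "missa: $part failed with status $?" >&2
    done
}

missa_main
```

For a manual run on another day, MISSA_FORCE=1 /etc/rmagic/missa would skip the date check.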
7. ANALOG SETUP
We have set up a separate Analog configuration file for each virtual domain (analog_customerdomain1.com, analog_customerdomain2.com, etc.), where we can specify the reports each customer needs.
Important Analog settings for Missa:
...
REFREPEXCLUDE http://www.customerdomain1.com/*
LOGFILE /var/www/logs/customerdomain1.com.log*
CACHEFILE /var/www/reports/cache/customerdomain1.com.cache
OUTPUT COMPUTER
OUTFILE /var/www/reports/output/customerdomain1.com.dat
CACHEOUTFILE /var/www/reports/cache/customerdomain1.com.cache.new
HOSTNAME "customerdomain1.com"
HOSTURL http://www.customerdomain1.com
...
So we run Analog taking the data of the previous statistics from customerdomain1.com.cache, and we process the logs matching customerdomain1.com.log*, which covers the current log customerdomain1.com.log as well as rotated logs such as customerdomain1.com.log.10.gz.
Analog writes its output in computer-readable format to customerdomain1.com.dat and caches the data in customerdomain1.com.cache.new.
Read the Analog docs for more information.
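With many domains, writing these files by hand gets tedious. A hypothetical helper can print a per-domain Analog file from the settings above; the function name is an assumption, and each customer's own options (extra exclusions, report choices) still have to be added afterwards.

```shell
#!/bin/sh
# Hypothetical helper: print the Missa-related Analog settings for one
# domain, using the paths from this document. Customer-specific options
# are not included and must be added by hand.
analog_config() {
    d="$1"
    # Unquoted EOF: $d is expanded, while the * wildcards stay literal
    # (here-documents do not perform filename expansion).
    cat <<EOF
REFREPEXCLUDE http://www.$d/*
LOGFILE /var/www/logs/$d.log*
CACHEFILE /var/www/reports/cache/$d.cache
OUTPUT COMPUTER
OUTFILE /var/www/reports/output/$d.dat
CACHEOUTFILE /var/www/reports/cache/$d.cache.new
HOSTNAME "$d"
HOSTURL http://www.$d
EOF
}

analog_config customerdomain1.com
```

For example, analog_config customerdomain2.com > /etc/rmagic/analog_customerdomain2.com would create the file for a new customer.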
8. REPORT MAGIC SETUP
We also have a Report Magic configuration file for each virtual domain: rmagic_customerdomain1.com, rmagic_customerdomain2.com, and so on.
Important Report Magic settings for Missa:
...
[statistics]
File_In = /var/www/reports/output/customerdomain1.com.dat
...
[reports]
File_Out = /var/www/reports/customerdomain1.com/
...
So Report Magic reads the Analog report from customerdomain1.com.dat, formats it into HTML pages and writes them all to /var/www/reports/customerdomain1.com/, the folder that the /stats alias from section 4 points to.
The Report Magic documentation will help you handle these things.
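In the same spirit, a hypothetical helper can print the Missa-related part of a per-domain Report Magic file. Only File_In and File_Out come from this document; the rest of a real Report Magic configuration is not shown here and has to be merged in from your existing files.

```shell
#!/bin/sh
# Hypothetical helper: print the [statistics] and [reports] settings that
# tie Report Magic to the Analog output of one domain.
rmagic_config() {
    d="$1"
    cat <<EOF
[statistics]
File_In = /var/www/reports/output/$d.dat

[reports]
File_Out = /var/www/reports/$d/
EOF
}

rmagic_config customerdomain1.com
```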
9. THE MISSA PROCESS
You need to create /etc/rmagic/missa_clients.
This file will contain 6 lines for each virtual host:
a: run Analog for this virtual host with its own customized configuration:
analog +G +g/etc/rmagic/analog_customerdomain1.com
b: move the processed logs somewhere else:
mv /var/www/logs/customerdomain1.com.log* /var/oldlgs/
c: gracefully restart Apache so that it starts writing to new, empty log files:
apachectl graceful
d: rename *.cache.new to *.cache, because it becomes the historic data for next month's run:
mv /var/www/reports/cache/customerdomain1.com.cache.new /var/www/reports/cache/customerdomain1.com.cache
e: notify the webmaster (if you are reading this, I guess that's you) and your customer through the missa_clients_email Perl script:
perl -s /etc/rmagic/missa_clients_email -Email="info@customerdomain1.com" -Webmaster="webmaster@ourhostingserver.com" -Servername="customerdomain1.com"
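The text announces six lines per virtual host but lists five steps (a to e). Section 3 also says Report Magic is run for every domain, so the sketch below, which prints the missa_clients commands for each domain, adds a Report Magic step. That step's command (perl /etc/rmagic/rmagic.pl with the domain's config file as argument) is a guess, as are the info@ address pattern, the LOG_DIR variable, the function names and deriving domain names from the *.log files.

```shell
#!/bin/sh
# Hypothetical generator for /etc/rmagic/missa_clients. It prints the
# per-host commands (steps a to e above plus an ASSUMED Report Magic step)
# for the domains given as arguments, or for every *.log file in LOG_DIR.
LOG_DIR="${LOG_DIR:-/var/www/logs}"

# Print one domain name per current log file, e.g.
# /var/www/logs/customerdomain1.com.log -> customerdomain1.com
list_domains() {
    for log in "$LOG_DIR"/*.log; do
        [ -e "$log" ] || continue   # the glob matched nothing
        basename "$log" .log
    done
}

client_block() {
    d="$1"
    cat <<EOF
# --- $d ---
analog +G +g/etc/rmagic/analog_$d
mv /var/www/logs/$d.log* /var/oldlgs/
apachectl graceful
mv /var/www/reports/cache/$d.cache.new /var/www/reports/cache/$d.cache
# ASSUMPTION: Report Magic location and invocation, check your installation
perl /etc/rmagic/rmagic.pl /etc/rmagic/rmagic_$d
perl -s /etc/rmagic/missa_clients_email -Email="info@$d" -Webmaster="webmaster@ourhostingserver.com" -Servername="$d"
EOF
}

if [ $# -gt 0 ]; then domains="$*"; else domains=$(list_domains); fi
for d in $domains; do
    client_block "$d"
done
```

Redirecting the output to /etc/rmagic/missa_clients would regenerate the file whenever domains are added or removed.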
Create /etc/rmagic/missa_ours: it is the same as missa_clients, but with the parts specific to our company domains.
Create /etc/rmagic/missa_total. This file only runs analog_global and rmagic_global, which process the cached Analog reports of all customer virtual hosts, and then analog_total and rmagic_total, which process the cached reports of the customers plus those of our company domains. Finally, of course, it notifies us by email.
Important parts from analog_global:
...
REFREPEXCLUDE http://www.ourhostingserver.com/*
LOGFILE /tmp/nothing_logged.log
CACHEFILE /var/www/reports/cache/*
OUTPUT COMPUTER
OUTFILE /var/www/reports/output/global.dat
...
and rmagic_global:
...
[statistics]
File_In = /var/www/reports/output/global.dat
...
[reports]
File_Out = /var/www/reports/global/
...
Important parts from analog_total:
...
REFREPEXCLUDE http://www.ourhostingserver.com/*
LOGFILE /tmp/nothing_logged.log
CACHEFILE /var/www/reports/cache/*
CACHEFILE /var/www/reports/cache/ourcompanydomains/*
OUTPUT COMPUTER
OUTFILE /var/www/reports/output/total.dat
...
and rmagic_total:
...
[statistics]
File_In = /var/www/reports/output/total.dat
...
[reports]
File_Out = /var/www/reports/total/
...
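A sketch of /etc/rmagic/missa_total follows. The LOGFILE in both Analog files points to /tmp/nothing_logged.log, presumably an empty placeholder so that Analog only reads the cache files. The Report Magic command, the config file locations under /etc/rmagic and the reuse of missa_clients_email for our own notification are assumptions. By default the sketch only prints the commands; MISSA_RUN=1 (a hypothetical variable) makes it execute them.

```shell
#!/bin/sh
# Sketch of /etc/rmagic/missa_total (hypothetical reconstruction).
# Prints the commands by default; set MISSA_RUN=1 to execute them.
run() {
    if [ "$MISSA_RUN" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

missa_total() {
    # ASSUMPTION: the dummy LOGFILE must exist; it stays empty, so only
    # the CACHEFILE data is analyzed.
    run touch /tmp/nothing_logged.log
    # Customer domains only.
    run analog +G +g/etc/rmagic/analog_global
    run perl /etc/rmagic/rmagic.pl /etc/rmagic/rmagic_global
    # Customer domains plus our own company domains.
    run analog +G +g/etc/rmagic/analog_total
    run perl /etc/rmagic/rmagic.pl /etc/rmagic/rmagic_total
    # ASSUMPTION: reuse missa_clients_email to notify ourselves.
    run perl -s /etc/rmagic/missa_clients_email -Email=webmaster@ourhostingserver.com -Webmaster=webmaster@ourhostingserver.com -Servername=ourhostingserver.com
}

missa_total
```

The dry-run default lets you review the exact commands before trusting the file to cron.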
10. FINAL CONSIDERATIONS
Because missa_total runs both analog_global and analog_total, the total report covers all requests to our customers' domains and to our own domains: it is analog_global (all customer domains) plus missa_ours (which runs Analog over our own domains). This produces some wrong figures, for example "Number of Hosts": Analog cannot tell that a host which requested both a customer domain and one of our company domains is the same host, so it counts it twice when it is really one. For bytes and requests, however, we get a correct global summary.
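Assuming the combined host figure behaves like a sum of per-domain host counts, as the example above suggests, the overcount is plain arithmetic: a visitor shared by two domains is counted once per domain. The sketch below does not use Analog at all; with two tiny hypothetical log files it compares the sum of the per-log distinct host counts with the distinct count over both logs. Bytes and requests, by contrast, simply add up.

```shell
#!/bin/sh
# Illustration of the "Number of Hosts" overcount: count distinct client
# hosts per log (first field of the common/combined log format) and
# compare their sum with the distinct count across all logs together.
distinct_hosts() {
    awk '{ print $1 }' "$@" | sort -u | wc -l
}

tmp=$(mktemp -d)
# Hypothetical sample logs: 192.0.2.7 visits both a customer domain and
# one of our company domains.
cat > "$tmp/customer.log" <<'EOF'
192.0.2.7 - - [31/Dec/2000:10:00:00 +0100] "GET / HTTP/1.0" 200 512
198.51.100.3 - - [31/Dec/2000:10:05:00 +0100] "GET / HTTP/1.0" 200 512
EOF
cat > "$tmp/company.log" <<'EOF'
192.0.2.7 - - [31/Dec/2000:11:00:00 +0100] "GET / HTTP/1.0" 200 1024
EOF

per_log_sum=$(( $(distinct_hosts "$tmp/customer.log") + $(distinct_hosts "$tmp/company.log") ))
combined=$(distinct_hosts "$tmp/customer.log" "$tmp/company.log")
echo "sum of per-log host counts: $per_log_sum"   # -> 3
echo "distinct hosts overall: $combined"          # -> 2
rm -r "$tmp"
```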
As stated above, this is a work in progress and you will probably find some easy improvements, so please send them to me.
Thanks.
_____________________________