Using:
- The offline package Google Charts (see other page)
- The access.log from apache webserver (I use a json format version, but with some tweaking you could use the default apache logging)
- A SQL database like mysql/mariadb
- Some scripting and a scheduler (like cron) for having it automated
1. Create a new config for writing the apache logging in json format (which I also use for ingesting into ElasticSearch).
vi /etc/apache2/conf-available/access-log.confLogFormat "{\"index\":{}}\n { \"@timestamp\":\"%{%Y-%m-%d}tT%{%T}t.%{msec_frac}tZ\", \"apache\": { \"date\":\"%{%Y-%m-%d}tT%{%T}t.%{msec_frac}tZ\", \"process\":\"%D\", \"filename\":\"%f\", \"remoteIP\":\"%a
\", \"host\":\"%V\", \"request\":\"%U\", \"query\":\"%q\", \"method\":\"%m\", \"status\":\"%>s\", \"userAgent\":\"%{User-agent}i\", \"referer\":\"%{Referer}i\" }}" jsonlog
CustomLog /var/log/apachelog/jsonlog.log jsonlog
2. [Optional] Create a filesystem in memory (because of the possible high number of IO's)
# Temporary for apache logs (also think of cleanup after a few days)
mount -t tmpfs -o size=50M tmpfs /var/log/apachelog
3. Enable the Apache config
a2enconf access-log
systemctl restart apache2
Check the logfile, you should see something like this:
{"index":{}}
{ "@timestamp":"2022-09-14T16:55:00.463Z", "apache": { "date":"2022-09-14T16:55:00.463Z", "process":"90", "filename":"-", "remoteIP":"2.168.xxx.xxx", "host":"192.168.xxx.xxx", "request":"-", "query":"", "method":"-", "status":"408", "userAgent":"-", "referer":"-" }}
{"index":{}}
{ "@timestamp":"2022-09-14T16:55:16.994Z", "apache": { "date":"2022-09-14T16:55:16.994Z", "process":"61422", "filename":"/usr/local/nagios/sbin/tac.cgi", "remoteIP":"2.168.xxx.xxx", "host":"192.168.xxx.xxx", "request":"/nagios/cgi-bin/tac.cgi", "query":"", "method":"GET", "status":"200", "userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36 Edg/105.0.1343.33", "referer":"https://192.168.xxx.xxx/nagios/cgi-bin/tac.cgi" }}
4. Create the database in mysql/mariadb
mysql -u root --password=<PASSWORD>
CREATE DATABASE apachelog;
CREATE USER 'alread'@'localhost' IDENTIFIED BY '<A SELECT PASSWORD>';
GRANT SELECT ON apachelog.* to 'alread'@'localhost';
use apachelog;
CREATE TABLE apachelog (id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, date DATETIME, ip TINYTEXT, status INT , continent TINYTEXT, country TINYTEXT,province TINYTEXT,city TINYTEXT,postal TINYTEXT,request TEXT );
CREATE TABLE etlapachelog (id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, date DATETIME , ip TINYTEXT, country TINYTEXT, sumip INT, sumipnok INT );
MariaDB [apachelog]> describe apachelog;
+-----------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| date | datetime | YES | | NULL | |
| ip | tinytext | YES | | NULL | |
| status | int(11) | YES | | NULL | |
| continent | tinytext | YES | | NULL | |
| country | tinytext | YES | | NULL | |
| province | tinytext | YES | | NULL | |
| city | tinytext | YES | | NULL | |
| postal | tinytext | YES | | NULL | |
| request | text | YES | | NULL | |
+-----------+------------+------+-----+---------+----------------+
5. Create a script which is scheduled once a day (after midnight at f.e. 04:00)
Copy the next code and make a script (chmod 755 executable;-). Also put in your own password..
I am using mmdblookup from MaxMind for GeoIP location information which I update once a week via:
-------------------------------------------------------------------------------
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
LICENSEKEY="<Your own license code...you get it for free after registration>"
cd /tmp
wget "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-City&license_key="${LICENSEKEY}"&suffix=tar.gz" -O GeoLite2-City.tar.gz 2>/dev/null
wget "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-ASN&license_key="${LICENSEKEY}"&suffix=tar.gz" -O GeoLite2-ASN.tar.gz 2>/dev/null
wget "https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-Country&license_key="${LICENSEKEY}"&suffix=tar.gz" -O GeoLite2-Country.tar.gz 2>/dev/null
for FILE in GeoLite2-City.tar.gz GeoLite2-ASN.tar.gz GeoLite2-Country.tar.gz
do
FILENAME=`tar tvzf ${FILE} |grep mmdb |awk '{ print $6 }'`
tar xvzf ${FILE} --strip-components=1 ${FILENAME} >/dev/null 2>&1
EXTFN=`basename ${FILENAME}`
find /tmp/${EXTFN} -size +5M -exec mv {} /usr/share/GeoIP/${EXTFN} \;
rm ${FILE}
done
exit
----------------------------------------------------------------------------------------------
#!/bin/bash
#
# Description: Script to have apachelog converted to sessions by geo location in database per day
# variables
PATH=/sbin:/usr/sbin:/usr/ccs/bin:/bin:/usr/bin:/etc:/usr/contrib/bin:/users/root:/usr/lib:/usr/lib/acct:/usr/ucb:/usr/dt/bin:/usr/bin/X11
DAY=`date --date=yesterday +%Y-%m-%d`
#DAY="2022-09-09"
DBTABLE=apachelog
ls /var/log/apachelog/jsonlog.log_ES_${DAY}T* >/dev/null 2>&1
if [ "$?" -gt "0" ]
then
echo "The imputfiles do not exist!!!!"
exit 0
fi
zcat /var/log/apachelog/jsonlog.log_ES_${DAY}T* |grep remoteIP | sed "s/\\\/[ESC]/g" |sed "s/'/./g" |sed "s/;/./g" | while read LINE
do
DT=`echo ${LINE} | sed 's/.*@timestamp\":\"//g'| cut -d "," -f 1,1 | sed 's/[TZ]/\ /g' |tr -d '@' | tr -d '"'`
DATE=`date +%Y-%m-%d\ %H:%M:%S --date "${DT}"`
IPLOOKUP=`echo ${LINE} | sed 's/.*remoteIP\":\"//g'| cut -d "," -f 1,1 | tr -d '"'`
STAT=`echo ${LINE} | sed 's/.*status\":\"//g'| cut -d "," -f 1,1 | tr -d '"'`
REQ=`echo ${LINE} | sed 's/.*request\":\"//g'| cut -d '"' -f 1,1 |sed "s/,/./g" | tr -d '"'`
CONT=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb continent names en 2>/dev/null |grep "\"" |cut -d "\"" -f 2,2 `
if [ -z "${CONT}" ]
then
CONT="NULL"
else
CONT="'${CONT}'"
fi
COUNTRY=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb country names en 2>/dev/null |grep "\"" |cut -d "\"" -f 2,2 `
if [ -z "${COUNTRY}" ]
then
COUNTRY="NULL"
else
COUNTRY="'${COUNTRY}'"
fi
PROV=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb subdivisions 2>/dev/null |grep "\"" |grep "utf8_string" |head -n 1 |cut -d "\"" -f 2,2 `
if [ -z "${PROV}" ]
then
PROV="NULL"
else
PROV="'${PROV}'"
fi
CITY=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb city names en 2>/dev/null |grep "\"" |cut -d "\"" -f 2,2 |sed "s/'/\ /g"`
if [ -z "${CITY}" ]
then
CITY="NULL"
else
CITY="'${CITY}'"
fi
POSTAL=`mmdblookup --ip ${IPLOOKUP} --file /usr/share/GeoIP/GeoLite2-City.mmdb postal code 2>/dev/null |grep "\"" |cut -d "\"" -f 2,2 `
if [ -z "${POSTAL}" ]
then
POSTAL="NULL"
else
POSTAL="'${POSTAL}'"
fi
SQLDATE="'${DATE}'"
IPADDRESS="'${IPLOOKUP}'"
STATUS="'${STAT}'"
REQUEST="'${REQ}'"
echo "INSERT INTO ${DBTABLE} VALUES (NULL,$SQLDATE,$IPADDRESS,$STATUS,$CONT,$COUNTRY,$PROV,$CITY,$POSTAL,$REQUEST);" | mysql -u root --password=<PASSWORD> ${DBTABLE} >/dev/null 2>&1
done
sleep 5
# Making the Database ETL exports FOR graphs fast retrieval!!!!
QUERYDB=apachelog
TABLE=apachelog.etlapachelog
CLEANUP="DELETE FROM ${TABLE} "
echo "${CLEANUP} ;" | mysql -u root --password=<PASSWORD> ${QUERYDB}
QUERY="INSERT IGNORE INTO ${TABLE} (date, ip, country, sumip, sumipnok ) SELECT date, ip, country, count(if(status>='200' and status < 300,1,null)) AS sumip, count(if(status>='400',1,null)) AS sumipnok FROM ${QUERYDB} WHERE date_format(date, '%Y-%m-%d') <> CURDATE() AND country IS NOT NULL GROUP BY date_format(date, '%Y-%m-%d'),ip"
exit
6. Now create a index.php on your website
Copy and paste the next code into this index.php
--------------------------------------------------------------------
<?php
$con = mysqli_connect('localhost','alread','<PASSWORD>','apachelog');
$minpv = 0;
$maxpv = 999;
?>
<!DOCTYPE HTML>
<html>
<head>
<meta charset="utf-8">
<link rel="shortcut icon" href="/images/favicon.ico">
<link rel="stylesheet" href="/stylesheets/base.css">
<link rel="stylesheet" href="/stylesheets/skeleton.css">
<link rel="stylesheet" href="/stylesheets/layout.css">
<?php
$query = "SELECT COUNT(*) FROM apachelog";
$exec = mysqli_query($con,$query);
$numsel = mysqli_fetch_row($exec);
?>
<title>
Apache Users at Unix4Life
</title>
<script type="text/javascript" src="/charts/loader.js"></script>
<script type="text/javascript" src="/charts/maps/api/js?key=:293"></script>
<script type="text/javascript">
google.charts.load('current', {packages: ['geochart']});
google.charts.setOnLoadCallback(drawRegionsMap);
function drawRegionsMap() {
var data = google.visualization.arrayToDataTable([
['Country', 'Total unique users'],
<?php
if (isset($_POST['period']))
{
$period = $_POST['period'];
}
else
{
$period = 365;
}
$query = "SELECT country, COUNT( DISTINCT(ip) ) AS cip FROM etlapachelog WHERE sumip > $minpv AND date > DATE_SUB(NOW(), INTERVAL $period DAY) GROUP BY country";
$users = "SELECT COUNT( DISTINCT(ip) ) FROM etlapachelog WHERE sumip > $minpv AND date > DATE_SUB(NOW(), INTERVAL $period DAY)";
$texec = mysqli_query($con,$users);
$tusers = mysqli_fetch_row($texec);
$exec = mysqli_query($con,$query);
while($row = mysqli_fetch_array($exec)){
echo "['".$row['country']."',".$row['cip']."],";
}
?>
]);
var options = {
resolution: 'countries'
};
var chart = new google.visualization.GeoChart(document.getElementById('mapperiod'));
chart.draw(data, options);
}
</script>
<script type="text/javascript">
google.charts.load('current', {packages: ['corechart']});
google.charts.setOnLoadCallback(drawChart);
function drawChart() {
var data = google.visualization.arrayToDataTable([
['Day', 'Total users'],
<?php
$query = "SELECT date_format(date, '%Y-%m-%d') as day, COUNT(ip) AS cip FROM etlapachelog WHERE sumip > $minpv GROUP BY date_format(date, '%Y-%m-%d')";
$exec = mysqli_query($con,$query);
$numrules = 0;
while($row = mysqli_fetch_array($exec)){
$numrules = $numrules + 1;
echo "['".$row['day']."',".$row['cip']."],";
}
?>
]);
var options = {
<?php
echo "title: 'Total website unique users per day',";
?>
titleTextStyle: {
color: '#1a237e',
fontSize: 24,
bold: true
},
hAxis: {
title: 'Date (yyyy-mm-dd)',
textStyle: {
color: '#01579b',
fontSize: 12,
fontName: 'Arial',
bold: true,
italic: true
},
titleTextStyle: {
color: '#01579b',
fontSize: 20,
fontName: 'Arial',
bold: false,
italic: true
}
},
vAxis: {
title: 'Users',
textStyle: {
color: '#1a237e',
fontSize: 20,
bold: false
},
titleTextStyle: {
color: '#1a237e',
fontSize: 20,
bold: true
}
},
colors: ['#a52714']
};
var chart = new google.visualization.LineChart(document.getElementById('totalusers'));
chart.draw(data, options);
}
</script>
</head>
<body>
<div class="container">
<div class="sixteen columns">
<h1 class="remove-bottom" style="margin-top: 40px"><img src="/./images/UnixPrompt.png"> Unix4Life</h1>
<?php
$thisyear = date('Y');
echo "<pre>At this moment..(samples: $numsel[0])</pre>";
?>
<hr/>
</div>
</div><!-- container -->
<?php
echo "<H3><pre>Map of period: $period days from today - Total unique users: $tusers[0]</pre></H3>";
?>
<div id="mapperiod" style="width: 1024px; height: 768px;"></div>
<div id="totalusers" style="width: 1500px; height: 500px;"></div>
</body>
</html>
------------------------------------------------------------------------------------------------
Why use a second table etlapachelog next to the apachelog? =>Think of millions of lines per year in the apachelog and then doing a query for information per day....that will take some time ;-)
With the ETL/extraction (doing the work once a day instead of every time you look at the overview) on the raw table you get less lines to query.
By-the-way: You see some things like sumip&sumipnok being stored..you can use that for making a graph about the pages being viewed per day...