D3.js World Map Real-Time Data Visualizations

Intro

So far the tutorials published on this blog have covered the server side of web development. This time let's do something that includes some client-side programming.

This is a quick introduction to data visualizations using the D3.js JavaScript library, the Node.js open source runtime and the Node Package Manager (npm).

The scenario for this tutorial is that we have a website with good traffic and want to see in real time which countries our visitors come from.

We want to draw a colored dot on the world map for each visitor at their approximate location, and maybe fade these dots out after a short time so they don't fill the map.

The application will run in a virtual machine created with my favourite tool Otto by HashiCorp.

Step 1 - create Node.js development environment

1. $ mkdir d3worldmap

2. $ cd d3worldmap

3. create a 'package.json' file with content:

{
 "name": "d3worldmap",
 "version": "0.0.1",
 "description": "D3.js World Map Real-Time Visualizations",
 "author": "Mihail Juganaru <first.last@example.com>",
 "dependencies": {}
 }

This describes a clean project with no dependencies; there are no script files in the project directory yet.

4. let Otto do its witchery and wait until you are logged into the virtual machine through SSH:

$ otto compile && otto dev && otto dev ssh

Don't forget to note the IP address of the VM, e.g. 100.64.44.204.

5. we should now be in the VM's Bash shell, inside the project folder shared with the host OS; check with:

$ ls -a
 . .. .otto .ottoid package.json

Good, we see our 'package.json' here and Otto's generated file and folder.

How about we also check that we have Node.js installed (it may be a bit outdated but it does its job for this tutorial):

$ node --version
 v4.1.0

And how about npm?

$ npm --version
 2.14.3

This is Otto's power: starting from just a 'package.json' file, the environment was created unattended, so we can begin working on the application quickly.

6. one of the first packages that I like to install for Node.js applications is 'nodemon'. Nodemon is a utility that monitors the source code for changes and automatically restarts the Node.js server. This feature may not be of much use in this tutorial, but I will take the occasion to show you how to install and use it in a virtual machine.

a) install this package globally:

$ sudo npm install -g nodemon

b) find Node.js path:

$ npm config get prefix
 /opt/node-v4.1.0-linux-x64

c) add the path above concatenated with '/bin' subdirectory to $PATH environment variable:

$ export PATH=$PATH:/opt/node-v4.1.0-linux-x64/bin

d) test that nodemon is working:

$ nodemon --version
 1.9.2

e) also add the nodemon path to ~/.bashrc so it is still set in new shell sessions, for example after the VM restarts:

$ echo 'export PATH=$PATH:/opt/node-v4.1.0-linux-x64/bin' | tee -a ~/.bashrc

Step 2 - create the Node.js foundation of our project

1. install Express.js and Socket.IO packages to the Node.js application for the base of our project (if you have a Windows host you need the '--no-bin-links' parameter otherwise it will fail):

$ npm install --save --no-bin-links express socket.io

2. create a file called 'server.js' in our project directory that will take care of routing through Express.js and send random data to the web browser using Socket.IO every 500 milliseconds:

var app = require('express')();
var http = require('http').Server(app);
var io = require('socket.io')(http);

// send index.html file for all requests
app.get('/', function(req, res) {
  res.sendFile(__dirname + '/index.html');
});

// set port 3001 for our web application
http.listen(3001, function(){
  console.log('listening on *:3001');
});

// our dummy function that sends data to the web browser and calls itself every 500ms
setInterval(function() {
  var msg = Math.random();
  console.log(msg);
  io.emit('message', msg);
}, 500);

3. create an 'index.html' file where we will display the random data from our server:

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>D3.js World Map Real-Time Visualizations</title>
<script src="/socket.io/socket.io.js"></script>
</head>
<body>
<div id="message"></div>
<script>
    var socket = io();

    socket.on('message', function(msg) {
      console.log(msg);
      document.getElementById("message").innerHTML = msg; 
    });
</script>
</body>
</html>

4. modify the 'package.json' file to set 'server.js' as the main script and to add an $ npm run dev command that starts our server with the Nodemon we installed in Step 1 (with no 'start' script defined, npm start falls back to running node server.js):

{
  "name": "d3worldmap",
  "version": "0.0.1",
  "description": "D3.js World Map Real-Time Visualizations",
  "author": "Mihail Juganaru <first.last@example.com>",
  "dependencies": {
    "express": "^4.13.4",
    "socket.io": "^1.4.6"
  },
  "main": "server.js",
  "scripts": {
    "dev": "nodemon -L --exec npm start"
  }  
}

We have to use the -L (legacy watch) parameter because the application is running in a virtual machine and otherwise Nodemon cannot see that the files in the shared folder were changed.

5. fire up the Node.js application in development mode:

$ npm run dev

> d3worldmap@0.0.1 dev /vagrant
> nodemon -L --exec npm start

[nodemon] 1.9.2
[nodemon] to restart at any time, enter `rs`
[nodemon] watching: *.*
[nodemon] starting `npm start`

> d3worldmap@0.0.1 start /vagrant
> node server.js

listening on *:3001
0.05353755597025156
0.2247560266405344
0.1977998425718397
...

We can see in the console the random numbers generated by our server every 500 ms.

6. access the application from the host OS web browser on the virtual machine IP address and port 3001:

http://100.64.44.204:3001

Now we should see the same numbers from our server refreshing real-time in the web browser.

Great! We have built the foundation of our client-server application. For now we can stop the Node.js application with [CTRL-C] and go to the next step.
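
The emit/on pairing between 'server.js' and 'index.html' can be tried without a browser. The sketch below is only a stand-in: it uses Node's built-in EventEmitter instead of Socket.IO, so there is no network involved, and the names 'channel', 'received' and 'broadcast' are made up for illustration.

```javascript
// Stand-in for the Socket.IO server/client pair, using Node's built-in EventEmitter.
// This is NOT Socket.IO: nothing crosses the network, only the emit/on shape is the same.
var EventEmitter = require('events').EventEmitter;

var channel = new EventEmitter();
var received = [];

// "client" side: subscribe to the 'message' event, like socket.on('message', ...)
channel.on('message', function (msg) {
  received.push(msg);
});

// "server" side: publish a value, like io.emit('message', msg)
function broadcast(value) {
  channel.emit('message', value);
}

broadcast(0.25);
broadcast(0.75);

console.log(received);
```

Every listener registered for 'message' gets each value, which is the same idea as io.emit reaching every connected browser.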

Step 3 - setting up the data source

In this step we should replace the random number generator in our Node.js application with real-time IP addresses from the Apache web server log that is usually found at '/var/log/apache2/access.log'.

Now obviously setting up Apache on our virtual machine would be pretty much useless since we cannot test properly from many different IP addresses and with a big enough server hit rate to make a proper simulation.

If you are implementing this on a live server with real data you can skip to Step 4 at your own risk. 😛

Below I will show you 2 ways to generate random access logs:

A) use a Ruby script

We are lucky with the fact that Otto's generated VM has Ruby already installed (fun fact: the 'Vagrantfile' generated by Otto is also Ruby 😀 ) and there is a script available on the web that generates pretty good and complete HTTP logs:

1. download the script:

$ wget https://gist.github.com/fetep/2037301/raw/f18fafd5d7dd45765192998d0dd65f61bfea05d7/genhttplogs.rb

2. run the Ruby script:

$ ruby genhttplogs.rb

Press [CTRL-C] to stop it.

3. the script above runs too fast, so let's change the last line to accept the speed as a command line parameter instead of a fixed value:

ipgen = IPGenerator.new(100, 10)
LogGenerator.new(ipgen).write_qps($stdout, ARGV[0].to_f)

(the previous line also has 2 parameters that you can tweak, session count = 100 and session length = 10; tuning these values changes the repeat rate of the IP addresses)

4. run the script with a speed parameter in queries per second (lower is slower); a value of 2 will output a line every half second:

$ ruby genhttplogs.rb 2

5. now run the script but write the output to an 'access.log' file in the same folder:

$ ruby genhttplogs.rb 2 >> access.log

6. and now do something 'crazy', write the output to the file and also run the 'tail' command on the same file (we'll see later why):

$ ruby genhttplogs.rb 2 >> access.log & tail -f -s 0.25 access.log

The & (ampersand) control operator (not to be confused with &&, which runs commands sequentially) means that the first command will run in the background.

Stop the 'tail' command with [CTRL-C], then bring the background Ruby script to the foreground with the $ fg command and finally stop it with [CTRL-C].

Good, this was the Ruby script way to simulate an Apache 'access.log' file. Next we'll see an alternative.

B) use a Node.js script

Inspired by the Ruby script above, we can write a random Apache log generator in Node.js pretty easily. Since we only need the visitors' IP addresses for this tutorial, we can skip generating the other data for now.

1. create a new file called 'iprand.js' in our project directory with this content:

var args = process.argv.slice(2);

var speed = args[0];

function genRandNo() {
    return 1 + Math.floor(Math.random() * 255);
}

function genRandIP() {
    var no = [];
    for (var i = 0; i < 4; i++) {
        no[i] = genRandNo();
    }
    return no.join('.');
}

function writeLine() {
    console.log(genRandIP());

    // add a random factor to the interval
    setTimeout(writeLine, (Math.random() + 1) * 1000 / speed);
}

setTimeout(writeLine, 1000);

2. run the script above with 'node':

$ node iprand.js 2

This will print random IPs one by one, pausing between 0.5 and 1 second after each line. Cancel the script with [CTRL-C].

3. rewrite the last Ruby command above using the Node.js script instead:

$ node iprand.js 2 >> access.log & tail -f -s 0.25 access.log
[1] 2562
251.83.169.39
15.135.82.205
12.138.109.95
..

Don't forget to close the 'node' process after you close 'tail' with [CTRL-C]. You can do it as above bringing it to front with $ fg and then [CTRL-C] or if you have noticed the process id (PID) on the first line you can do $ kill PID.

If you want to find all process ids for running Node.js applications run:

$ ps -ef | grep "[n]ode"

I've promised [myself] to do more client-side programming in this tutorial and I got lost again in Bash... 😀
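
Back to JavaScript for a moment: the pause formula in 'iprand.js' is easier to check when it is pulled out into a pure function. This is a minimal sketch, not a replacement for the script. The helper names 'parseSpeed' and 'delayMs' are hypothetical, and the fallback of 1 line per second for a missing argument is my assumption (the original script does not validate its input).

```javascript
// Hypothetical helper: turn the command-line value into a usable rate.
// Assumption: fall back to 1 line/second when the argument is missing or not positive.
function parseSpeed(arg) {
  var speed = parseFloat(arg);
  return (isFinite(speed) && speed > 0) ? speed : 1;
}

// Same formula as in iprand.js, with the random part passed in (0 <= rnd < 1)
function delayMs(speed, rnd) {
  return (rnd + 1) * 1000 / speed;
}

var speed = parseSpeed('2');
console.log(delayMs(speed, 0));   // shortest pause for speed 2: 500 ms
console.log(delayMs(speed, 0.5)); // middle of the range: 750 ms
```

Because the random factor is between 0 and 1, the pause always falls between 1000 / speed and 2000 / speed milliseconds, which is where the 0.5 to 1 second range for a speed of 2 comes from.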

Step 4 - rewrite our Node.js server to use the data source instead of generating random numbers

Now that you've seen how to set up the data source, a simulated Apache 'access.log' file, let's see how to read it in Node.js and send the results to the host web browser.

1. install Node.js 'tail' dependency:

$ npm install --save --no-bin-links tail

2. edit 'server.js' so it doesn't output to console, comment line:

// console.log(msg);

3. delete 'access.log' for a clean start and run the 'iprand.js' script in the background with a slower IP generation rate:

$ rm access.log && node iprand.js 0.25 >> access.log &
[1] 3037

4. start the Node.js application with Nodemon:

$ npm run dev

5. edit 'server.js' script:

var app = require('express')();
var http = require('http').Server(app);
var io = require('socket.io')(http);
var Tail = require('tail').Tail;

// send index.html file for all requests
app.get('/', function(req, res) { 
  res.sendFile(__dirname + '/index.html');
});

// set port 3001 for our web application
http.listen(3001, function(){
  console.log('listening on *:3001');
});

var tail = new Tail("./access.log");

// regular expression to parse the access.log line
var regExp = /^(\S+) (\S+) (\S+) \[(\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} [\+|-]\d{4})\] \"(\S+ .*? \S+)\" (\d{3}) ([\d|-]+) "([^"]*)" "([^"]*)"/;

tail.on("line", function(line) {
    var matches = line.match(regExp);
    var ip; // declared locally so it does not leak into the global scope
    
    if (matches !== null) { // Ruby generated access.log    - extract IP using the regular expression
        ip = matches[1];
    } else {                 // Node.js generated access.log    - just the IP address on each line
        ip = line;
    }
    
    console.log(ip);
    io.emit('message', ip);  
});

tail.on("error", function(error) {
    console.log('ERROR: ', error);
});

6. save the script and check the console; the application will restart through Nodemon and should print, line by line, the IP addresses generated by 'iprand.js' and written to the 'access.log' file:

[nodemon] restarting due to changes...
[nodemon] starting `npm start`

> d3worldmap@0.0.1 start /vagrant
> node server.js

Tail starting:
filename: ./access.log
options: {}
listening on *:3001
34.103.225.137
187.30.11.168
7.212.216.246
...

7. check in host web browser that it receives the correct data - random IP addresses

http://100.64.44.204:3001/

8. close the Node.js application with [CTRL-C], close the 'iprand.js' script with $ fg, [CTRL-C] or $ kill 3037 if you've been paying attention to Step 4.3.
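
The log parsing from step 5 can also be tried on its own, without Socket.IO or 'tail'. Below is a minimal sketch that reuses the regular expression from 'server.js'. The helper name 'extractIp' and the two sample lines are made up for illustration (the addresses come from the 203.0.113.0/24 documentation range).

```javascript
// Same regular expression as in server.js (Apache combined log format)
var regExp = /^(\S+) (\S+) (\S+) \[(\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} [\+|-]\d{4})\] \"(\S+ .*? \S+)\" (\d{3}) ([\d|-]+) "([^"]*)" "([^"]*)"/;

// Hypothetical helper: return the IP for either kind of log line
function extractIp(line) {
  var matches = line.match(regExp);
  // full Apache line: the first capture group is the client IP
  // plain line (iprand.js output): the whole line is the IP
  return matches !== null ? matches[1] : line;
}

// Made-up sample lines, one per generator
var apacheLine = '203.0.113.7 - - [10/Oct/2016:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"';
var plainLine = '203.0.113.9';

console.log(extractIp(apacheLine)); // 203.0.113.7
console.log(extractIp(plainLine));  // 203.0.113.9
```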

Step 5 - get geolocation data in Node.js from the IP addresses

For this we will use the free GeoLite2 City database.

1. download the binary file to the project folder:

$ wget http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz

2. unpack database:

$ gzip -d GeoLite2-City.mmdb.gz

3. install Node.js 'maxmind-db-reader' package for reading MaxMind DB files:

$ npm install --save --no-bin-links maxmind-db-reader

4. update the 'server.js' script file to send geolocation data instead of the IP address:

var app = require('express')();
var http = require('http').Server(app);
var io = require('socket.io')(http);
var Tail = require('tail').Tail;
var mmdbreader = require('maxmind-db-reader');

// open database
var cities = mmdbreader.openSync('./GeoLite2-City.mmdb');

// send index.html file for all requests
app.get('/', function(req, res) { 
  res.sendFile(__dirname + '/index.html');
});

// set port 3001 for our web application
http.listen(3001, function(){
  console.log('listening on *:3001');
});

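var tail = new Tail("./access.log");

// regular expression to parse the access.log line (same as in Step 4)
var regExp = /^(\S+) (\S+) (\S+) \[(\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} [\+|-]\d{4})\] \"(\S+ .*? \S+)\" (\d{3}) ([\d|-]+) "([^"]*)" "([^"]*)"/;

tail.on("line", function(line) {
    var matches = line.match(regExp);
    var ip;
    
    if (matches !== null) { // Ruby generated access.log    - using the regular expression
        ip = matches[1];
    } else {                 // Node.js generated access.log    - just the IP address
        ip = line;
    }
    console.log(ip);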
    
    cities.getGeoData(ip, function(err, geodata) {
        if (!err && geodata != null && geodata.location != null && geodata.location.latitude != null && geodata.location.longitude != null) { // != null also rejects undefined
            console.log(geodata.location.latitude + ', ' + geodata.location.longitude);
            
            io.emit('message', geodata.location);
        }
    });
});

tail.on("error", function(error) {
    console.log('ERROR: ', error);
});

5. edit the 'index.html' file of your project to write geolocation data:

document.getElementById("message").innerHTML = msg.latitude + ', ' + msg.longitude;

6. fire up the Node.js scripts again, this time with a faster rate:

$ rm access.log && node iprand.js 4 >> access.log & npm run dev

7. refresh the application on the host OS web browser

http://100.64.44.204:3001/

That's all? YES! We now have a working Node.js server that sends location data based on IP addresses read in real time from 'access.log'. In a real-world application you might want to at least check the response codes and ignore crawlers such as Googlebot.
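
Such a filter could look roughly like the sketch below. It is only one possible policy, and an assumption on my part: keep 2xx responses and drop any user agent that contains "bot", "crawler" or "spider". The helper name 'shouldPlot' and the sample lines are hypothetical.

```javascript
// Same regular expression as in server.js (Apache combined log format)
var regExp = /^(\S+) (\S+) (\S+) \[(\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} [\+|-]\d{4})\] \"(\S+ .*? \S+)\" (\d{3}) ([\d|-]+) "([^"]*)" "([^"]*)"/;

// Hypothetical filter: keep only successful (2xx) requests from non-bot user agents
function shouldPlot(line) {
  var m = line.match(regExp);
  if (m === null) {
    return true; // plain IP lines (iprand.js output) carry nothing to filter on
  }
  var status = parseInt(m[6], 10); // 6th capture group: HTTP status code
  var userAgent = m[9];            // 9th capture group: user agent string
  var isBot = /bot|crawler|spider/i.test(userAgent);
  return status >= 200 && status < 300 && !isBot;
}

// Made-up sample lines
var prefix = '203.0.113.7 - - [10/Oct/2016:13:55:36 +0200] "GET / HTTP/1.1" ';
var visitor = prefix + '200 2326 "-" "Mozilla/5.0"';
var notFound = prefix + '404 512 "-" "Mozilla/5.0"';
var crawler = prefix + '200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"';

console.log(shouldPlot(visitor));  // true
console.log(shouldPlot(notFound)); // false
console.log(shouldPlot(crawler));  // false
```

In 'server.js' the check would sit at the top of the tail "line" handler, returning early when it fails.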

Step 6 - front-end D3.js data visualizations

1. add the 'world-110m2.json' file to the project; it contains the JSON data that D3.js will use to draw the world map

2. edit the 'index.html' file from your project, read the comments to understand what I've done!

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>D3.js World Map Real-Time Visualizations</title>
<script src="/socket.io/socket.io.js"></script>
<script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>
<script src="http://d3js.org/topojson.v1.min.js"></script>
<style>
path {
  stroke: white;
  stroke-width: 0.25px;
  fill: grey;
}

circle {
  stroke: white;
  stroke-width: 0.5px;
}
</style>
</head>
<body>
<script>
    // canvas size
    var width = 960,
        height = 500;

    var fadeout = false; // enable/disable dots fadeout
    
    // create Mercator projection
    var projection = d3.geo.mercator()
        .center([0, 5 ])
        .scale(115);
    
    // create svg HTML element
    var svg = d3.select("body").append("svg")
        .attr("width", width)
        .attr("height", height);

    // the path element to draw countries
    var path = d3.geo.path().projection(projection);

    // the group container
    var g = svg.append("g");
    
    // draw all countries shapes read from JSON
    d3.json("world-110m2.json", function(error, topology) {
        g.selectAll("path")
        .data(topojson.feature(topology, topology.objects.countries).features)
        .enter()
        .append("path")
        .attr("d", path);
    });

    // create the Socket.IO client
    var socket = io();

    socket.on('message', function(msg) {
        console.log(msg);
  
        // create a random id for each dot
        var rnd = Math.floor(Math.random() * Number.MAX_SAFE_INTEGER);

        // append dot (circle shape) to map using the Mercator projection
        g.append("circle")
        .attr("cx", function(d) {
            return projection([msg.longitude, msg.latitude])[0];
        })
        .attr("cy", function(d) {
            return projection([msg.longitude, msg.latitude])[1];
        })
        .attr("r", 10) // circle radius
        .attr("id", "dot" + rnd)
        .style("fill", function() {
            return "hsl(" + Math.random() * 360 + ",100%,50%)"; // fill with random color
        });
    
        console.log('plot ' + rnd);
    
        // fade out dot if variable value is true
        if (fadeout) {
            setTimeout(function() {
                g.selectAll("circle#dot" + rnd)
                .style("opacity", 1)
                .transition().duration(500).style("opacity", 0)
                .remove(); // drop the faded dot from the DOM once the transition ends
                
                console.log('fadeout ' + rnd);
            }, 1000);
        }      
    });
</script>
</body>
</html>

3. edit 'server.js' to route the 'world-110m2.json' file:

... skip ...
// open database
var cities = mmdbreader.openSync('./GeoLite2-City.mmdb');

// send world map json file
app.get('/world-110m2.json', function(req, res) {
  res.sendFile(__dirname + '/world-110m2.json');
});
 ... skip ...

4. refresh the application at

http://100.64.44.204:3001/

You should see some pretty colored dots that appear in random locations on the world map.
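
If you are curious how a latitude/longitude pair ends up as a pixel position, here is a rough re-implementation of what the projection in 'index.html' does for a single point. It is a sketch under assumptions: the scale (115) and center ([0, 5]) come from the page, the translate of [480, 250] is D3 v3's documented default (the page does not override it), and clipping and rotation are left out. The names 'mercatorRaw' and 'project' are hypothetical.

```javascript
// Assumed settings: scale and center from index.html, translate is the D3 v3 default
var SCALE = 115;
var CENTER = [0, 5];        // [longitude, latitude] in degrees
var TRANSLATE = [480, 250]; // pixel position where CENTER is drawn

function toRadians(deg) {
  return deg * Math.PI / 180;
}

// Raw spherical Mercator: longitude maps linearly, latitude is stretched toward the poles
function mercatorRaw(lon, lat) {
  return [toRadians(lon), Math.log(Math.tan(Math.PI / 4 + toRadians(lat) / 2))];
}

// Hypothetical equivalent of projection([lon, lat]) for this page's settings
function project(lon, lat) {
  var p = mercatorRaw(lon, lat);
  var c = mercatorRaw(CENTER[0], CENTER[1]);
  return [
    TRANSLATE[0] + SCALE * (p[0] - c[0]),
    TRANSLATE[1] - SCALE * (p[1] - c[1]) // SVG y grows downward, so north is a smaller y
  ];
}

console.log(project(0, 5));        // the map center lands on TRANSLATE
console.log(project(26.1, 44.43)); // roughly Bucharest
```

The two numbers returned are what the page uses for the "cx" and "cy" attributes of each circle.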

Outro

Play with the Ruby and Node.js scripts, change the IP generation rate, change the 'fadeout' variable in 'index.html' to 'true' and make the dots disappear!

Don't forget to close the 'access.log' generation script after you close the Node.js application in your virtual machine or it can grow pretty big.

I hope that you will have the opportunity to use the application on a real Apache 'access.log' file. It can also be adapted very easily to show the traffic of a Node.js web application! 😉

¡Hasta luego!

Links:

https://nodejs.org/

https://www.npmjs.com/

https://d3js.org/

http://nodemon.io/

http://expressjs.com/

http://socket.io/

https://books.google.ro/books?id=6SizCQAAQBAJ

http://dev.maxmind.com/geoip/geoip2/geolite2/

http://www.d3noob.org/2013/03/a-simple-d3js-map-explained.html

https://github.com/mbostock/topojson/releases/v1.0.0
