VisualBlackBox: An example of real-time, push-driven data visualization

by paulmcode

Apologies for the long delay between posts; I have been busy learning about Big Data infrastructure (Hadoop, Spark, etc.) lately… which leads perfectly into this post.  In my experience, data scientists who can create compelling big data visualizations are in extreme demand (not that I consider myself particularly capable in either area yet; I’m just getting started).  Being the curious programmer that I am, I wanted to delve into this area a little more on my own.  Since I am a very goal-driven person, I created a small project to train myself on the new data visualization technologies, with the bonus of helping others who are interested as well.  So along came…

Visual Black Box – https://github.com/PaulMontgomery/VisualBlackBox

VBB is an open source (Apache 2 licensed), push-driven, real-time HTML5 data visualization tool (for line graphs, bar charts, heat maps, US election result maps, etc.).  VBB can logically multicast a streaming SVG to all web browser listeners.  Because it sends only the SVG rather than a whole web page, the visualization widget integrates easily into an existing web page or dashboard and its look and feel.  The relatively new technologies that I wanted to learn and use were:

  • d3.js – Trust me, just browse its examples and gaze in awe at the amazing data visualizations.  This is what really opened my eyes to the current state of web browser visualization abilities.
    • This includes SVG (OK, not very new, but still cool and, in my opinion, underutilized until recently)
    • I also wanted to brush up on my JavaScript
  • Node.js – I’ve used many web frameworks, and the simplicity and ease of use of this project pull me back every time.  While it isn’t a good fit for long-running or compute-intensive server tasks (Node.js runs JavaScript on a single-threaded event loop, so such tasks can hold up incoming connections), I would argue that much of that deficiency can be offset through asynchronous communication (socket.io, for example) and AMQP plus compute farms.
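Because SVG is just markup, a server can build a visualization as a string and push only that fragment to browsers, which is the idea behind sending an SVG instead of a whole web page.  The following is a minimal sketch of that idea, not VBB’s actual code; the function name `renderLineGraphSvg`, its default size, and its styling are hypothetical choices for illustration.

```javascript
// Hypothetical helper (not part of VBB): render a numeric series as a
// self-contained SVG line graph fragment that a page can embed as-is.
function renderLineGraphSvg(values, width = 300, height = 100) {
  const open = `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}">`;
  if (values.length === 0) {
    return open + "</svg>"; // nothing to draw yet
  }

  const min = Math.min(...values);
  const max = Math.max(...values);
  const span = max - min || 1; // avoid dividing by zero when all values are equal
  const stepX = values.length > 1 ? width / (values.length - 1) : 0;

  // SVG's y axis points down, so larger values map to smaller y coordinates.
  const points = values
    .map((v, i) => {
      const x = i * stepX;
      const y = height - ((v - min) / span) * height;
      return `${x.toFixed(1)},${y.toFixed(1)}`;
    })
    .join(" ");

  return open + `<polyline fill="none" stroke="steelblue" points="${points}"/>` + "</svg>";
}
```

A browser receiving this string could place it inside an existing container element (for example by assigning it to that element’s `innerHTML`), so the surrounding page keeps its own styling.  Libraries like d3.js usually build the SVG in the browser instead; generating it on the server here is just one simple way to illustrate pushing only the SVG.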

The idea behind VBB is to take the complexity out of data visualization.  Imagine the skills needed to create scalable big data visualizations:

  • Data Scientist – Understand the structure and meaning of the data results (not to mention create them in the first place)
  • Web Administrator – Host the data visualization web site, create a scalable infrastructure
  • Client Side/Web Developer – Create the web pages, data visualizations and all client side content
    • Note that the HTML developer and the data visualization (e.g. d3.js) developer are often different people, since the skill sets are fairly different and yet, ironically, somewhat related.
  • Server Side Developer/Operations – Create the infrastructure to ingest the data, transform it (optional), send it to a centralized data repository and set up a monitoring system; if they are really good… they’ll do all of this securely.

I won’t claim that VBB can replace all of those roles, but with a full set of “GUI widgets” it has the potential to reduce them considerably (I use that term for an individual GUI/SVG visualization, such as a single line graph or bar graph).  VBB reads a TSV (tab-separated values) file, a simple format supported by Excel and many other tools; broadcasts a message to all web browsers viewing the visualization that an update is ready; and pushes the data updates to those browsers, which then update their real-time graphs.  So if you can write a TSV file, you can make a real-time data visualization.  I tried to add lots of helpful comments in the code to make it easy to learn.  It is also structured so that adding a new GUI widget is simple and self-contained, configured via JSON files.
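The TSV-to-browser flow described above can be sketched in a few lines.  This is a minimal, hypothetical sketch, not VBB’s implementation: it parses TSV text into row objects keyed by the header row (similar in spirit to d3’s TSV parsing) and uses a plain in-memory listener set to stand in for the real-time channel, which in a Node.js stack might be something like socket.io reaching actual browsers.  `parseTsv`, `UpdateBroadcaster`, and the `"update-ready"` message type are all made up for illustration.

```javascript
// Hypothetical sketch (not VBB's actual code): parse TSV text and push
// each update to every subscribed "browser" listener.

// Parse tab-separated text whose first line is a header row into an
// array of objects keyed by column name. Values are kept as strings.
function parseTsv(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) return [];
  const headers = lines[0].split("\t");
  return lines.slice(1).map((line) => {
    const cells = line.split("\t");
    const row = {};
    headers.forEach((header, i) => {
      row[header] = cells[i] !== undefined ? cells[i] : "";
    });
    return row;
  });
}

// In-memory stand-in for a real-time channel (e.g. a WebSocket server):
// each listener represents one connected browser.
class UpdateBroadcaster {
  constructor() {
    this.listeners = new Set();
  }

  // Register a listener; returns a function that unsubscribes it.
  subscribe(listener) {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener);
  }

  // Parse the latest TSV contents, push the rows to every listener,
  // and return how many listeners were notified.
  publishTsv(text) {
    const rows = parseTsv(text);
    for (const listener of this.listeners) {
      listener({ type: "update-ready", rows });
    }
    return this.listeners.size;
  }
}

// Usage: one simulated browser receives the parsed rows.
const hub = new UpdateBroadcaster();
let received = null;
hub.subscribe((msg) => {
  received = msg.rows;
});
hub.publishTsv("date\tclose\n2014-01-01\t42\n");
// received now holds [{ date: "2014-01-01", close: "42" }]
```

Sending the parsed rows in the same message as the notification keeps the sketch short; a real system might instead send a small "update ready" notice and let each browser fetch the data, which is closer to the two-step flow described above.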

Along the way I found several similar technologies and ideas that are obviously much more polished and developed, but may not be quite as easy to learn from as my examples.  So if you want to learn how to create your own real-time visualizations, you might consider giving VisualBlackBox a try as a learning guide.  If you just want to display some data quickly, consider one of the tools listed below.  Disclaimer: I spent very little time examining these technologies in detail, so please take my opinions lightly and do your own research.  At any rate, here are some thoughts on what I studied and found:

  • Google Charts – Really easy to use, has “Dynamic Data” real-time capabilities and nice abstraction layers, and supports most basic chart requirements.  If you don’t have super complex needs, this is a pretty good choice.  I’m really impressed with many of Google’s developer tools and projects.
  • Chart.js – Super configurable charts, fairly good range of complex charts.
  • d3.js – This is the ultimate low-level, infinitely capable data visualization framework.  There are practically no limits to what can be done with this amazing library.  Having said that, I would give a few minor warnings: this library is very unforgiving to debug (few or no warnings or help; it lets you fail relentlessly and sometimes quietly), and it has a steep learning curve (with power comes complexity, and a background in JavaScript, HTML5, SVG, CSS and data theory is required to get started).
  • Snap.svg – This is a low-level library, similar to d3.js, but geared more towards general SVG GUIs than data visualization (though it could easily handle that too).  I just wanted to give this one special mention as a really amazing generic SVG library for those interested.