June 10th, 2009

dbcounter – quick visual database stats


At the moment, I am digging through a couple of databases for an upcoming project. I could not find a tool that gives a quick overview of a large set of categorical data, so I decided to roll my own: a little NodeBox script that walks over a CSV file, determines the unique values of each attribute, counts how often each value occurs, and plots the result as an area chart. It is handy for spotting missing values and seeing at a glance how diverse the data is.
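The counting step can be sketched in plain Python. This is a minimal sketch, not the actual dbcounter script: it assumes the CSV has a header row, treats empty cells as missing values, and swaps the NodeBox area chart for a plain-text bar per value. The function names `count_values` and `summarize` and the four-row sample are made up for illustration; the sample is not the bundled Titanic data set.

```python
import csv
import io
from collections import Counter


def count_values(rows):
    """Count how often each distinct value occurs in each column.

    rows: an iterable of dicts, e.g. from csv.DictReader.
    Returns a dict mapping column name -> Counter of values.
    Empty or absent cells are counted under the label "(missing)".
    """
    counts = {}
    for row in rows:
        for column, value in row.items():
            # DictReader puts surplus fields under the key None; skip them.
            if column is None:
                continue
            value = (value or "").strip()
            label = value if value else "(missing)"
            counts.setdefault(column, Counter())[label] += 1
    return counts


def summarize(counts, width=40):
    """Render the counts as text bars (a stand-in for the area chart)."""
    lines = []
    for column, counter in counts.items():
        total = sum(counter.values())
        lines.append(f"{column} ({len(counter)} distinct values)")
        for value, n in counter.most_common():
            bar = "#" * round(width * n / total)
            lines.append(f"  {value:<12} {n:>5} {bar}")
    return "\n".join(lines)


# Hypothetical sample data; the third passenger has no "survived" value.
sample = """class,sex,survived
first,female,yes
third,male,no
third,male,
crew,male,no
"""

counts = count_values(csv.DictReader(io.StringIO(sample)))
print(summarize(counts))
```

Counting into one `Counter` per column keeps a single pass over the file, which matters for large exports; the missing-value label makes gaps show up as their own band instead of silently disappearing.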

Download the dbcounter script, including a sample data set of the Titanic passengers (requires NodeBox, OS X only).

Sample PDF output

On a related note, you can also use the freshly released Parallel Sets application by Robert Kosara to determine relationships between the attributes. But that’s step 2 :)

On another related note, I cannot stress enough how awesome Python is.

Information aesthetics showcase @ SIGGRAPH

The well-formed.eigenfactor project will be on display at the Information Aesthetics Showcase, curated by Victoria Szabo, at SIGGRAPH 2009, August 3–7 in New Orleans. I will also give a short Monday morning talk on the project, and I am really excited to be part of this first incursion of the information aesthetics scene into the computer graphics conference!