June 10th, 2009

dbcounter – quick visual database stats

titanic-2

At the moment, I am digging through a couple of databases for an upcoming project. I could not find a tool that gives a quick overview of a large set of categorical data, so I decided to roll my own: a little NodeBox script that walks over a CSV file, determines the unique values of each attribute, counts how often they occur, and plots the result as an area chart. It is good for getting a quick impression of categorical data, especially of missing values and data diversity.
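For readers who just want the idea, the counting step can be sketched in plain Python. This is a minimal illustration, not the actual dbcounter code: the function name `count_values`, the `(missing)` label, and the three-row sample are assumptions made up for this example, and the NodeBox plotting part is left out entirely.

```python
import csv
import io
from collections import Counter, defaultdict


def count_values(csv_file, missing_label="(missing)"):
    """Count how often each distinct value occurs in every column of a CSV.

    Hypothetical helper for illustration; empty cells are grouped under
    missing_label so that gaps in the data show up as their own category.
    """
    reader = csv.DictReader(csv_file)
    counts = defaultdict(Counter)
    for row in reader:
        for column, value in row.items():
            if column is None:
                continue  # skip extra cells that have no header
            value = (value or "").strip()
            counts[column][value or missing_label] += 1
    return dict(counts)


# Tiny made-up sample, not the real Titanic data set.
sample = io.StringIO(
    "class,sex,survived\n"
    "1st,female,yes\n"
    "3rd,male,no\n"
    "3rd,male,\n"
)

result = count_values(sample)
for column, counter in result.items():
    # One line per attribute, most frequent value first.
    print(column, counter.most_common())
```

Each per-column counter is the kind of data that could then be drawn as one stacked band per attribute, with the band for missing values making gaps visible at a glance.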

Download the dbcounter script including a sample data set of the Titanic passengers.
(requires NodeBox, which runs on OS X only)

Sample PDF output

On a related note, you can also use the freshly released Parallel Sets application by Robert Kosara to determine relationships between the attributes. But that’s step 2 :)

On another related note, I cannot stress enough how awesome Python is.

4 Responses to 'dbcounter – quick visual database stats'

  1. moritz_stefaner
    June 10th, 2009 at 7:34 pm

    published a new post: dbcounter – quick visual database stats http://is.gd/XFTJ

    This comment was originally posted on Twitter

  2. kurren
    June 10th, 2009 at 9:04 pm

    dbcounter – quick visual database stats http://bit.ly/18GmdF #Stats #Python

    This comment was originally posted on Twitter

  3. [...] dbcounter – quick visual database stats I’m putting this info in my things to remember pile. [...]

  4. francesco
    September 14th, 2009 at 9:01 am

    An alternative to NodeBox exists for Windows and Linux users. It is called Shoebot and is compatible with the NodeBox API:

    http://tinkerhouse.net/shoebot/
