June 10th, 2009

dbcounter — quick visual database stats


At the moment, I am digging through a couple of databases for an upcoming project. I could not find a tool that quickly gives an overview of a large set of categorical data, so I decided to roll my own and write a little NodeBox script. It walks over a CSV file, determines the unique values of each attribute, counts how often they occur, and plots the output as an area chart. The tool is good for getting a quick overview of categorical data, especially missing values and the diversity of the data.
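The counting step can be sketched in a few lines of plain Python. This is not the dbcounter script itself, just a minimal illustration under some assumptions: the CSV has a header row, empty cells count as missing, and the function name `count_values` and the `(missing)` label are made up for this example. It uses only the standard library and leaves out the NodeBox plotting entirely.

```python
import csv
import io
from collections import Counter


def count_values(csv_file, missing_label="(missing)"):
    """Return {column name: Counter of values} for a CSV with a header row.

    Hypothetical helper for illustration; not part of dbcounter.
    """
    reader = csv.DictReader(csv_file)
    counts = {name: Counter() for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            # Short rows yield None for absent cells; treat them as empty.
            value = (row.get(name) or "").strip()
            counts[name][value or missing_label] += 1
    return counts


# Tiny made-up sample, not the real Titanic data set.
sample = io.StringIO(
    "class,sex,survived\n"
    "1st,female,yes\n"
    "3rd,male,no\n"
    "3rd,male,\n"
)
counts = count_values(sample)
for column, counter in counts.items():
    # most_common() sorts values by frequency, highest first.
    print(column, counter.most_common())
```

For a real file, pass an open file object instead of the `io.StringIO` sample, for example `open("passengers.csv", newline="")`, where the file name is a placeholder. The per-column counters are the numbers a chart would then be drawn from.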

Download the dbcounter script, including a sample data set of the Titanic passengers (needs NodeBox, OS X only).

Sample PDF output

On a related note, you can also use the freshly released Parallel Sets application by Robert Kosara to determine relationships between the attributes. But that’s step 2 :)

On another related note, I cannot stress enough how awesome Python is.

2 Responses to 'dbcounter — quick visual database stats'


  1. […] dbcounter – quick visual database stats I’m putting this info in my things to remember pile. […]

  2. francesco
    September 14th, 2009 at 9:01 am

    An alternative to NodeBox exists for Windows and Linux users. It is called Shoebot and is compatible with the NodeBox API:

    http://tinkerhouse.net/shoebot/
