June 10th, 2009

dbcounter — quick visual database stats


At the moment, I am digging through a couple of databases for an upcoming project. I could not find a tool that quickly gives an overview of a large set of categorical data, so I decided to roll my own: a little NodeBox script that walks over a CSV file, determines the unique values of each attribute, counts how often they occur and plots the output as an area chart. The tool is good for getting a quick overview of categorical data, especially missing values and the diversity of the data.
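For readers without NodeBox, the counting step can be sketched in plain Python. This is not the dbcounter script itself, just a minimal illustration under a few assumptions: the CSV has a header row, empty cells count as missing, and the name `count_values` and the tiny inline sample are made up for this example. Instead of an area chart it prints a crude text bar per value.

```python
import csv
import io
from collections import Counter


def count_values(csv_file):
    """Return {column name: Counter of values} for a CSV file object.

    Assumes the first row is a header. Empty cells are counted
    under the label "(missing)" (a convention chosen for this sketch).
    """
    reader = csv.DictReader(csv_file)
    counts = {name: Counter() for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            # Short rows yield None for absent cells, so fall back to "".
            value = (row.get(name) or "").strip()
            counts[name][value or "(missing)"] += 1
    return counts


# Tiny made-up sample, NOT the real Titanic data set.
SAMPLE = """class,sex,survived
1st,female,yes
3rd,male,no
3rd,male,
2nd,female,yes
"""

counts = count_values(io.StringIO(SAMPLE))

# Print one text bar per unique value, most frequent first.
for name, counter in counts.items():
    print(name)
    for value, n in counter.most_common():
        print("  %-10s %s" % (value, "#" * n))
```

To run it on a real file, pass an open file object instead of the `io.StringIO` sample, for example `count_values(open("passengers.csv", newline=""))`, where the file name is only a placeholder.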

Download the dbcounter script, including a sample data set of the Titanic passengers (needs NodeBox, OS X only).

Sample PDF output

On a related note, you can also use the freshly released Parallel Sets application by Robert Kosara to determine relationships between the attributes. But that’s step 2 :)

On another related note, I cannot stress enough how awesome Python is.