Together with Lutz Bornmann, Rüdiger Mutz and Felix de Moya Anegon, I have been looking into which institutions (universities or research-focused institutions) are most active in different subject areas of science and which have published the most excellent papers. Based on my colleagues' data analysis, we produced a small web application which lets you browse and explore the data set. The application is password protected, so you will need to send an email to password-request at excellencemapping.net to request access.
As you know, I was never too fond of awards — until I won two of them in one night :)
We just finished the documentation for emoto – a data art project visualising the online response to the London 2012 Olympics.
In many ways, the crowning piece of the project, and a conceptual counterpoint to the ephemeral web activities, our data sculpture preserved the more than 12 million tweets we collected in physical form. We had 17 plates CNC-milled — one for each day of the games — with a relief heatmap indicating the emotional highs and lows of each day. Overlay projections highlighted individual stories, and visitors could scroll through the most retweeted tweets per hour for each story using a control knob.
The tweets and topics displayed in the installation can also be investigated in interactive heatmaps. Roll over the rows to see a tooltip display of the most retweeted tweet on the given topic at the respective point in time.
Find brief documentation at moritz.stefaner.eu/projects/emoto/
or read more on the project here:
Article and interview on Creators Project
Data Stories podcast episode #11 with Stephan Thiel on emoto
In two current projects, we use images as a data store in parts of our processing pipeline, and that proved quite handy, so I thought I would briefly share the technique with you.
In the emoto project, we use 2D matrices to store how many tweets (brightness) fall in which sentiment category (vertical) over time (horizontal):
This is not exciting per se, but the trick here is that we use this image as an elevation map for the 3D models we produce for the data sculpture. So the images are only a “messenger” between two processing steps — data analysis and the 3D modelling tool. Yet, in this form, it is much easier to detect gaps and to get a glimpse of the data structure immediately. Also, think about it this way — if your database is an image, you can apply image transformation techniques to modify your data! (Think contrast enhancement, minimum/maximum, slicing, blurring, …) Operations that can be very difficult when working only with numbers can be very simple in Photoshop, and, again, the result is immediately inspectable. The catch is that, when working in greyscale, you only have 256 steps available — but in our case, that was enough.
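To make the idea concrete, here is a minimal sketch of writing such a count matrix out as a greyscale image, using toy numbers and the plain-text PGM format so it needs no libraries; the actual emoto pipeline and its data are, of course, more involved:

```python
# Sketch: store a (sentiment x time) count matrix as a greyscale image.
# Toy counts, invented for illustration; rows = sentiment categories,
# columns = time slices, brightness = number of tweets.

counts = [
    [3, 10, 42, 7],
    [0,  5, 30, 12],
    [1,  2, 18,  4],
]

peak = max(max(row) for row in counts)

def to_gray(value, peak):
    """Map a count onto the 256 available greyscale steps (0 = black)."""
    return round(255 * value / peak)

pixels = [[to_gray(v, peak) for v in row] for row in counts]

def write_pgm(path, pixels):
    """Write the matrix as a plain-text PGM image (readable by most tools)."""
    height, width = len(pixels), len(pixels[0])
    with open(path, "w") as f:
        f.write(f"P2\n{width} {height}\n255\n")
        for row in pixels:
            f.write(" ".join(str(p) for p in row) + "\n")

write_pgm("sentiment_heatmap.pgm", pixels)
```

The resulting file can be opened, inspected, and — as described above — manipulated with ordinary image tools before being fed to the next processing step.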
The second trick is to use color as a lookup hash in a 2D matrix. For instance, you might want to check in which country a certain point on earth is. You do have a list of polygons for each country, but how inefficient, error-prone and tedious it is to loop through all of them and calculate a hit test with each polygon… Also, how do you calculate a hit test with a polygon, anyway?
Now here is an incredibly simple way to do it: pick a unique color for each country. Go through all the polygons of a country and draw them on a simple map, mapping lat and long to x and y coordinates at the desired precision.
Now, for any point on earth, you just need to look up the color of the pixel belonging to its map coordinate, and — there you have the code of the corresponding country. Very handy! Again, all the difficult data processing has been taken care of by the image processing algorithm.
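A minimal sketch of the technique, with two toy rectangular “countries” on a tiny grid and a simple even-odd point-in-polygon test standing in for whatever your drawing library does at render time; the point is that the expensive geometry runs once, and every lookup afterwards is a single pixel read:

```python
# Sketch: pre-render country polygons into a lookup grid, one color code
# per country, so point-in-country checks become a single pixel read.
# Toy polygons on a 10x10 "map"; real data would map lat/lon to x/y.

WIDTH, HEIGHT = 10, 10

countries = {
    1: [(0, 0), (5, 0), (5, 9), (0, 9)],   # color code 1: left block
    2: [(5, 0), (9, 0), (9, 9), (5, 9)],   # color code 2: right block
}

def inside(x, y, polygon):
    """Even-odd rule point-in-polygon test (run once per pixel, at build time)."""
    hit = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                hit = not hit
    return hit

# Build the lookup image: 0 = no country.
grid = [[0] * WIDTH for _ in range(HEIGHT)]
for code, polygon in countries.items():
    for y in range(HEIGHT):
        for x in range(WIDTH):
            if inside(x + 0.5, y + 0.5, polygon):
                grid[y][x] = code

def country_at(x, y):
    """All the hard geometry is baked in: a lookup is just a pixel read."""
    return grid[y][x]
```

In practice you would render the real polygons with a drawing library and read pixels from the resulting image, but the principle is the same.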
So, next time you have a tricky data transformation issue to solve — maybe image processing can be part of the solution! I am sure there are many more tricks along these lines to discover.
( + Thanks to Stephan and Steffen from Studio NAND for developing these workflows with me!)
A true mammoth project has finally launched: emoto.
Together with a huge team around Drew Hemment and Studio NAND, and in partnership with the MIT Senseable City Lab, we aim to visualise the online response to the Olympic Games for the London 2012 Festival and Cultural Olympiad in the Northwest.
Basically, the idea is to track Twitter messages for content (which topics, disciplines, athletes etc. they refer to) and emotional tone (are they cheering, swearing, or indifferent?) and make that info available in real time on http://emoto2012.org, as a supplement or even alternative to traditional ways of consuming the Games coverage.
Our goal is to reveal both the big picture as well as the little anecdotes that make up the big, big stream of messages.
After the games, we will turn the collected tweets into an actual physical object, to archive these ephemeral little “things flying by” forever.
And during the games, we are posting insights and in-depth analyses (here is a first post on the Opening Ceremony), so there is also a little data journalistic angle to the whole package.
I have to say, this is probably one of the most ambitious projects I have worked on so far, and despite some small rocks encountered along the way, I am really happy with how it turned out. I hope you like it, too!
I am happy to announce my most out-there infovis-related activity this year: the open data cooking workshop. Organized together with Prozessagenten and Miska Knapek, it will invite 15 participants to explore the data-expressive qualities of food together. Our idea is to cook food with local ingredients that represents local (open) data about the region where the workshop takes place. If you think about it, there are so many ways food can be used to express data: 2D painting/drawing, 3D sculpture, taste dimensions, texture, all the cultural connotations (potato vs. caviar), preparation processes and variables (e.g. automated oven temperature regulation), presentation, … The possibilities are endless!
Together with Stephan Thiel (who did all the heavy lifting) from NAND.io, I am happy to present a small new visualization: Tyne, a visualization of the sensor data generated by ~flowmill, a tide mill floating on the river Tyne, in Newcastle.
Stephan already has a great write-up on the nand.io site, which I recommend reading first, so here are a few comments beyond that project description:
What I find quite interesting about the project is the use of simulation as visualization. Although we used little image thumbnails as icons for each visualization, the actual visualization is in fact a particle simulation which is seeded with the five sensor values measured at a given point in time. Four of these are used as physics parameters – expanding the stream for values greater than the mean, contracting it for values below. The fifth parameter — wheel speed — is directly related to the water speed and thus guides the particle speed. This experiential, process-based, anecdotal, slowly unravelling form of visualization, evoking thoughts of water and wood at the same time, reflects our experience of this highly unusual project after visiting the flowmill ourselves. It became clear that the precise values of the sensors themselves are only side actors in a larger, association-rich and quite unusual system, which we wanted to reflect in our work. Also, the anecdotal nature of the measurements (only every half hour, with quite varying results) called for a treatment of the values beyond a simple line or area chart, so we decided to represent each “data anecdote” in a likewise closed, single anecdotal visualization, representing the situation at a given, but ultimately arbitrary, point in time. Also, the imprecision in visual translation did not happen without thought; in fact, an overly precise display of the values would, ironically, have resulted in “lying about the data”, given the imprecise nature of the system generating them. (Compute that, line chart aficionados.)
Scaling and transforming real-time sensor data in a robust manner is always tricky. To get the data into a form that was manageable and also allowed comparisons across the very different scales of the different variables (with values ranging from tiny fractions to hundreds), we employed z-score scaling, which centers the data around zero (i.e. “usual” data points lie around zero) and scales it such that around two thirds of the values lie between –1 and 1. This helps in using the values in a predictable way, and also allows quick identification of high, low, or average values without having to learn different scales across variables — in the end, who knows if a salinity value of 238 is high or low for a North English river.
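The scaling itself is a one-liner per value; here is a minimal sketch with invented toy readings for two variables on very different scales:

```python
# Sketch of z-score scaling: center each sensor variable around its mean
# and divide by the standard deviation, so "usual" readings sit near zero
# and most values fall between -1 and 1, whatever the raw scale.

from math import sqrt

def z_scores(values):
    mean = sum(values) / len(values)
    std = sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

# Two toy variables on wildly different scales...
salinity = [230, 238, 241, 225, 236]
wheel_speed = [0.12, 0.30, 0.25, 0.18, 0.20]

# ...become directly comparable after scaling.
salinity_z = z_scores(salinity)
wheel_z = z_scores(wheel_speed)
```

For live data you would of course compute mean and standard deviation from a historical window rather than from the batch itself, so that new readings can be scaled as they arrive.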
Finally, I want to share one dismissed approach which was hard to control, and aesthetically not exactly the thing we were after, but quite interesting nevertheless:
Based on Kyle McDonald’s code, which in turn was based on John McCabe’s explorations of multi-scale Turing patterns, we toyed with the idea of trying to force the algorithm to produce blobs of specific sizes by “injecting” black pixels while the algorithm was running. Also, we muted all but the bottom and a few top layers, resulting in a bigger difference between micro- and macro-structure. In the end, the computations turned out too heavy to run directly in the browser, and the code was a wee bit too unpredictable, so we went with a more controllable and visually more fitting approach. Right now, the code is not quite ready for sharing, but I can offer to clean it up and upload it, in case anyone has a strong interest.
I put a little new project online, analyzing which ingredients were ordered together most often in custom muesli mixtures by mymuesli.com customers. Besides chocolate, nuts and oats, the project description also features a little excursion on matrix views of networks and conditional probabilities.
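For the curious, the conditional-probability part of such an analysis boils down to very little code. A minimal sketch with invented toy orders (the real mymuesli data set is obviously much larger): count how often each ingredient and each pair appears, then P(B | A) is just the pair count divided by the count for A.

```python
# Sketch: from a list of orders (sets of ingredients), compute co-order
# counts and the conditional probability P(B | A) = P(A and B) / P(A).
# Toy orders, invented for illustration.

from itertools import combinations
from collections import Counter

orders = [
    {"oats", "chocolate", "nuts"},
    {"oats", "chocolate"},
    {"oats", "raisins"},
    {"chocolate", "nuts"},
]

single = Counter()
pair = Counter()
for order in orders:
    single.update(order)
    # frozenset makes the pair order-independent: {A, B} == {B, A}
    pair.update(frozenset(p) for p in combinations(order, 2))

def p_given(b, a):
    """P(B | A): share of orders containing A that also contain B."""
    return pair[frozenset((a, b))] / single[a]
```

The full pair matrix, with rows and columns sorted by frequency, is exactly the kind of matrix view of a network the project description discusses.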