February 19th, 2007

Emerging topics v2

I am currently working on trends in individual tagging behaviour. You might have seen a first, animated version of my studies based on tag maps. The original animation shows the emergence of previously rarely used tags over time. Now I have dug deeper and made a richer visualization for investigating this topic.

For the impatient:
» Check out the interactive version here

And here’s the explanation:

It has been shown before that tag proportions for resources stabilize over time (for a plausibility argument, check out the marvellous cloudalicious tool, where you can track tag proportions for any website on the web). This means that the tag cloud representing a tag profile for a resource does not change much once a sufficient number of tags has been collected. In a folksonomy, this is generally considered a good sign, since it indicates a certain agreement on how to judge a resource and what vocabulary to use.

For tagging individuals and communities, this might, at first glance, hold true as well. Consider, for example, the following visualization of a tagging community’s evolution:

picture-8_480×250shkl.png

Each tag is assigned a band, with the thickness indicating the cumulative usage of the tag up to that point in time (time runs left to right). Thus, a vertical cut through the graph corresponds to taking a tag cloud snapshot at this time point. The vertical order is based on the overall frequency of the tags. The color is used to give an impression of the long tail distortion – if all tags appeared equally often, you would see a linear transition from red to green instead of the skewed distribution.
So – what do we see? Apparently, most of the bands seem to grow in parallel, indicating a stable growth proportion for all tags. Of course, we cannot see much for the smaller tags, and there are some jagged parts of the graph which might indicate different behavior at specific time points, but overall the impression is pretty stable.
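To make the linear model concrete, here is a minimal Python sketch of how such cumulative bands could be computed. The event format (a list of `(time_step, tag)` pairs), the function names and the toy data are my own assumptions for illustration, not the code behind the actual visualization.

```python
from collections import Counter

def cumulative_bands(events, n_steps):
    """Return {tag: [cumulative usage count at each time step]}.

    `events` is a list of (time_step, tag) pairs (an assumed format).
    """
    per_step = [Counter() for _ in range(n_steps)]
    for step, tag in events:
        per_step[step][tag] += 1

    tags = sorted({tag for _, tag in events})
    bands = {tag: [] for tag in tags}
    running = Counter()
    for counts in per_step:
        running.update(counts)          # add this step's usages to the totals
        for tag in tags:
            bands[tag].append(running[tag])
    return bands

def snapshot_proportions(bands, step):
    """Tag cloud snapshot: each tag's share of all usages up to `step`."""
    total = sum(series[step] for series in bands.values())
    if total == 0:
        return {tag: 0.0 for tag in bands}
    return {tag: series[step] / total for tag, series in bands.items()}

# Toy data, purely illustrative.
events = [(0, "web"), (0, "design"), (1, "web"), (2, "web"),
          (2, "flash"), (3, "web"), (3, "design")]
bands = cumulative_bands(events, n_steps=4)
print(bands)
print(snapshot_proportions(bands, step=3))
```

Because the counts only ever accumulate, a tag that was popular early on keeps its full thickness forever, which is one reason the proportions look so stable.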

However, this does not make much sense. For individuals and communities, the topics of interest evolve over time, so there must be some hidden variability not captured by the visualization and the underlying linear model.

So I decided to provide an alternative visualization for the data based on a decay model, where tags “age” over time and finally get “forgotten” if they are not used anymore. This idea is loosely based on the Yule-Simon memory model for tag generation presented in this paper.
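A rough sketch of what such a decay model could look like in Python is given below. Note the assumptions: I use a plain exponential decay with a `half_life` parameter, which is only one possible choice and not necessarily the weighting used in the visualization or in the paper; the event format and names are hypothetical as well.

```python
def decayed_bands(events, n_steps, half_life=2.0):
    """Return {tag: [decayed weight at each time step]}.

    Assumption: a tag's weight halves every `half_life` time steps
    (exponential decay), and each new usage adds 1 to the weight.
    `events` is a list of (time_step, tag) pairs.
    """
    decay = 0.5 ** (1.0 / half_life)    # per-step multiplication factor
    tags = sorted({tag for _, tag in events})
    weights = {tag: 0.0 for tag in tags}
    bands = {tag: [] for tag in tags}
    for step in range(n_steps):
        for tag in tags:
            weights[tag] *= decay       # everything ages a little
        for event_step, tag in events:
            if event_step == step:
                weights[tag] += 1.0     # fresh usage refreshes the tag
        for tag in tags:
            bands[tag].append(weights[tag])
    return bands

# Toy data: "flash" is used once and then forgotten, "web" is used steadily.
events = [(0, "flash"), (0, "web"), (1, "web"), (2, "web"), (3, "web")]
decayed = decayed_bands(events, n_steps=4, half_life=1.0)
print(decayed)
```

In contrast to the cumulative version, the band of a tag that is no longer used shrinks towards zero, while a steadily used tag levels off at a roughly constant thickness.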

picture-7_480×266shkl.png
A radically different picture emerges. Not only does the overall shape now nicely display phases of community activity over time, but the life cycle of single tags is also much more transparent. You can roll over a single layer to highlight it and display the corresponding tag name. Great fun.

» Check out the interactive version here

What I am now curious about:
– Is there a correlation between time-dependency and overall frequency of tags? In other words, are frequent tags more evenly distributed over time, whilst the low frequency tags tend to be more variable over time?
– Is there a correlation between temporal synchronization and general co-occurrence? Which means – do related tags also appear and disappear together over time?

I think the answer is YES to both questions, but that would definitely need some statistical analysis (any bored neuroscientists around to help me? ;)
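As a starting point for the first question, one could rank-correlate each tag's overall frequency with a measure of how unevenly it is used over time. The sketch below is only an illustration under my own assumptions: per-period usage counts as input, the coefficient of variation as the variability measure, Spearman's rank correlation as the statistic, and made-up toy data.

```python
from statistics import mean, pstdev

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def variability(counts):
    """Coefficient of variation of per-period counts (0 = perfectly even)."""
    return pstdev(counts) / mean(counts)

# Made-up usage counts per time period, one list per tag.
usage = {
    "web":    [5, 6, 5, 6],
    "design": [3, 2, 3, 2],
    "flash":  [1, 3, 0, 0],
    "ajax":   [0, 0, 2, 0],
}
totals = [sum(counts) for counts in usage.values()]
cvs = [variability(counts) for counts in usage.values()]
rho = spearman(totals, cvs)
print(rho)  # negative if frequent tags are used more evenly
```

One caveat: rare tags have sparse counts, so they look "bursty" almost by construction. A proper analysis would need to compare against a baseline, for example by shuffling the time stamps.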

To-dos for the visualization:
– Implement a slider, so you can see what a linear and a decayed tag cloud would have looked like at a specific time point.
– Stamen got it right: maybe I should have plotted from the vertical center, or at least provided an optional inversion of the sorting. Right now, all the top (green) layers are really distorted, making visual analysis really hard.
– Put some numbers on the axis
– Show single tagging events on rollover. Or even “unfold” the layer to improve readability and avoid misconceptions.

2 Responses to 'Emerging topics v2'


  1. bastian
    February 19th, 2007 at 12:47 pm

    Nice piece of work. Though it spontaneously gobbles up 0.5 GB of RAM, the resource-hungry thing ;)
    Are you also planning a non-Flash version, or will this stay more of a browser thing?

  2. Bruce Mason
    March 4th, 2007 at 7:41 pm

    This is a fascinating piece of work. I’m a qualitative social scientist so bits of it are beyond me but it seems like a really useful step into analysing tagging behaviour.

    Bruce