February 19th, 2007

Emerging topics v2

I am currently working on trends in individual tagging behaviour. You might have seen a first, animated version of my studies based on tag maps. The original animation shows the emergence of previously rarely used tags over time. I have now dug deeper and made a richer visualization for investigating this topic.

For the impa­tient: » Check out the inter­ac­tive ver­sion here

And here’s the explanation:

It has been shown before that tag proportions for resources stabilize over time. (For a plausibility argument, check out the marvellous cloudalicious tool, where you can track tag proportions for any website on the web.) This means that the tag cloud representing the tag profile of a resource does not change much once a sufficient number of tags has been collected. In a folksonomy, this is generally considered a good sign, since it indicates a certain agreement on how to judge a resource and what vocabulary to use.

For tagging individuals and communities, this might, at first glance, hold true as well. Consider, for example, the following visualization of a tagging community’s evolution:

picture-8_480×250shkl.png

Each tag is assigned a band whose thickness indicates the summed usage of the tag up to that point in time (time runs left to right). Thus, a vertical cut through the graph corresponds to taking a tag cloud snapshot at that time point. The vertical order is based on the overall frequency of the tags. The color gives an impression of the long tail distortion: if all tags appeared equally often, you would see a linear transition from red to green instead of the skewed distribution. So, what do we see? Apparently, most of the bands grow in parallel, indicating a stable growth proportion for all tags. Of course, we cannot see much for the smaller tags, and there are some edgy parts of the graph which might indicate different behavior at specific time points, but overall the impression is pretty stable.
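
To make the linear model concrete, here is a minimal sketch of how such cumulative snapshots could be computed. The event format (integer time step plus tag name) and the function name `cumulative_snapshots` are my assumptions for illustration, not the code behind the graphic.

```python
from collections import Counter

def cumulative_snapshots(events, num_steps):
    """Return one Counter per time step with the summed usage of every tag
    up to and including that step (the "linear" model: nothing is forgotten).

    events: iterable of (time_step, tag) pairs, an assumed input format.
    """
    per_step = [Counter() for _ in range(num_steps)]
    for step, tag in events:
        per_step[step][tag] += 1

    snapshots = []
    running = Counter()
    for counts in per_step:
        running.update(counts)              # add this step's tag uses
        snapshots.append(Counter(running))  # copy = vertical cut at this step
    return snapshots

# Hypothetical toy data: (time step, tag)
events = [(0, "design"), (0, "web"), (1, "design"), (2, "maps"), (2, "design")]
snapshots = cumulative_snapshots(events, num_steps=3)
```

Each entry of `snapshots` corresponds to one vertical cut through the graph, and the band thickness of a tag at that cut is its count in the snapshot.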

How­ever, this does not make much sense. For indi­vid­u­als and com­mu­ni­ties, the top­ics of inter­est evolve over time, so there must be some hid­den vari­abil­ity not cap­tured by the visu­al­iza­tion and the under­ly­ing lin­ear model.

So I decided to provide an alternative visualization of the data based on a decay model, where tags “age” over time and are finally “forgotten” if they are no longer used. This idea is loosely based on the Yule-Simon memory model for tag generation presented in this paper.
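
One simple way to sketch such a decay model is shown below. Note the assumptions: exponential aging with a fixed factor per time step, a hard cut-off for forgetting, and the names `decayed_snapshots`, `decay` and `forget_below` are all mine. This is not the Yule-Simon model itself, just an illustration of tags that age and get forgotten.

```python
def decayed_snapshots(events, num_steps, decay=0.5, forget_below=0.01):
    """Return one dict per time step mapping tag -> decayed weight.

    Assumed model: every step, all existing weights are multiplied by
    `decay`; each new use of a tag adds 1; tags whose weight drops below
    `forget_below` are removed ("forgotten").
    """
    per_step = [{} for _ in range(num_steps)]
    for step, tag in events:
        per_step[step][tag] = per_step[step].get(tag, 0) + 1

    weights = {}
    snapshots = []
    for counts in per_step:
        # Age everything that was already there.
        weights = {tag: w * decay for tag, w in weights.items()}
        # Add this step's fresh tag uses at full weight.
        for tag, n in counts.items():
            weights[tag] = weights.get(tag, 0.0) + n
        # Forget tags that have faded below the threshold.
        weights = {tag: w for tag, w in weights.items() if w >= forget_below}
        snapshots.append(dict(weights))
    return snapshots

# Hypothetical toy data: "web" is used once and then never again.
events = [(0, "web"), (1, "design"), (2, "design")]
decayed = decayed_snapshots(events, num_steps=8)
```

With these toy values, "web" fades step by step and drops out of the last snapshot, while "design" peaks at the step of its second use.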

picture-7_480×266shkl.png

A radically different picture emerges. Not only does the overall shape now nicely display phases of community activity over time, but the life cycle of single tags is also much more transparent. You can roll over single layers to highlight them and display the corresponding tag name. Great fun.

» Check out the inter­ac­tive ver­sion here

What I am now curious about:
– Is there a correlation between time-dependency and overall frequency of tags? In other words, are frequent tags more evenly distributed over time, whilst low-frequency tags tend to be more variable over time?
– Is there a correlation between temporal synchronization and general co-occurrence? In other words, do related tags also appear and disappear together over time?

I think the answer is YES to both ques­tions, but that would def­i­nitely need some sta­tis­ti­cal analy­sis (any bored neu­ro­sci­en­tists around to help me? ;)
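
As a starting point for the first question, here is a rough sketch of one possible analysis. The choice of measures is my assumption: normalized Shannon entropy of a tag's usage over time steps as "evenness", and a plain Pearson correlation against overall frequency. All function names are hypothetical, and `num_steps` must be at least 2.

```python
import math
from collections import defaultdict

def temporal_evenness(step_counts, num_steps):
    """Normalized Shannon entropy of one tag's usage over time:
    1.0 = used equally in every step, 0.0 = all uses in a single step."""
    total = sum(step_counts)
    entropy = -sum((c / total) * math.log(c / total)
                   for c in step_counts if c > 0)
    return entropy / math.log(num_steps)

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def frequency_vs_evenness(events, num_steps):
    """Correlate each tag's overall frequency with its temporal evenness."""
    per_tag = defaultdict(lambda: [0] * num_steps)
    for step, tag in events:
        per_tag[tag][step] += 1
    freqs = [sum(counts) for counts in per_tag.values()]
    evenness = [temporal_evenness(counts, num_steps)
                for counts in per_tag.values()]
    return pearson(freqs, evenness)

# Hypothetical toy data: "a" is used in every step, "b" and "c" in one each.
events = [(0, "a"), (1, "a"), (2, "a"), (3, "a"),
          (0, "b"), (0, "b"), (1, "c")]
r = frequency_vs_evenness(events, num_steps=4)
```

One caveat: a tag used only once has an evenness of zero by construction, so a positive correlation is partly built into this measure. A fairer test would compare the result against the same data with shuffled time stamps.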

To-dos for the visualization:
– Implement a slider, so you can see what a linear and a decayed tag cloud would have looked like at a specific time point.
– Stamen got it right: maybe I should have plotted from the vertical center, or at least provided an optional inversion of the sorting. Right now, all the top (green) layers are really distorted, which makes visual analysis really hard.
– Put some numbers on the axis.
– Show single tagging events on rollover. Or even “unfold” the layer to improve readability and avoid misconceptions.

December 10th, 2006

Emerging topics

picture-8_480x336shkl.png

You might have seen the tag clouds posted below. I calculate tag positions based on co-occurrence, such that tags used together are placed closer to each other. Additionally, tags are scaled according to frequency.
A general problem I have with the resulting representation (and common tag clouds as well) is the fact that every tag occurrence is weighted equally. As a result, these tag clouds never represent the current state of interest, but a very sluggishly changing summary of your archive. However, your interests and the corresponding vocabulary keep moving on. So I am currently investigating trends in tag clouds and how groups of related tags emerge and disappear again.
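
For illustration, the two inputs to such a tag map (pairwise co-occurrence and per-tag frequency) could be counted roughly like this. The input format, a list of tag lists with one per bookmark, and the function names are assumptions, and the layout step itself is not shown. Note how every occurrence counts exactly once, no matter how old it is, which is the equal weighting described above.

```python
from collections import Counter
from itertools import combinations

def tag_frequencies(posts):
    """Count in how many posts each tag appears (used for scaling)."""
    return Counter(tag for tags in posts for tag in set(tags))

def cooccurrence_counts(posts):
    """Count how often each unordered pair of tags appears on the same post.
    Pairs are stored as alphabetically sorted tuples."""
    pairs = Counter()
    for tags in posts:
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical toy archive: one list of tags per bookmark.
posts = [["web", "design"], ["design", "maps", "web"], ["maps"]]
freqs = tag_frequencies(posts)
pairs = cooccurrence_counts(posts)
```

Tags with a high pair count would then be placed close together, and a tag's size would follow its entry in `freqs`.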

A first glimpse into the dynam­i­cal nature of tag structures.