… visualizing your tags :)
The “Tagging” book by Gene Smith is out. I am still awaiting a paper copy, but already had a chance to look at the online version. It looks really comprehensive and concise, and covers all the important tagging concepts, which is not easy for such a moving-target topic.
Besides, my elastic tag maps visualization is featured on p.102f. Nice!
My thesis is due pretty soon, so I am currently writing A LOT and making some on-the-go beautifications to my experiments.
First one is the emerging topics histogram. I followed my own advice and vertically centered the stacked histogram. Additionally, I never liked those sharp edges, so now I not only “fade out” tags visually, but also fade them in, resulting in a much more organic picture and much improved readability of the chart. New color scheme: old tags are in cold colors, freshly introduced ones in warm colors. This is all very much inspired by the fabulous last.fm charts by Lee Byron. Thanks!
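As a rough illustration of what vertical centering means for a stacked histogram, here is a minimal Python sketch. It is not the code behind the chart; the function name `centered_layers` and the input layout (one list of values per tag, one value per time step) are assumptions made for this example.

```python
def centered_layers(series):
    """Stack layers so that every column is centered on y = 0.

    series: list of layers (hypothetical layout), each a list of
    non-negative values, one per time step.
    Returns, per layer, a list of (bottom, top) pairs.
    """
    n_steps = len(series[0])
    result = [[] for _ in series]
    for t in range(n_steps):
        # Total column height at this time step.
        total = sum(layer[t] for layer in series)
        # Start half the total below zero instead of at zero.
        bottom = -total / 2.0
        for i, layer in enumerate(series):
            top = bottom + layer[t]
            result[i].append((bottom, top))
            bottom = top
    return result


if __name__ == "__main__":
    for band in centered_layers([[1, 2], [3, 2]]):
        print(band)
```

Compared with stacking from a zero baseline, the distortion is shared between the upper and lower layers instead of piling up in the top ones.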
and some pix:
Another visual experiment on tagging: How do individuals use tags — compared to the community? Do you use private language or are you a mainstream tagger? When you tag with “design” — do the others think it is “art”?
On the left: the personal tags for the personal bookmarks, ordered by frequency. The containing box per tag is log-scaled, so you get an impression of the long-tail position of a tag. Which means: frequently used tags are large, bright and go to the top.
On the right: community tags for the tagged resources.
In the middle: tags with the same name are connected. If a line is horizontal, the individual and the community essentially agree on the relevance of the tag for the resources. The steeper it is, the larger the disagreement. If no line starts at a tag, it means it is not present in the other list.
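The idea behind the connecting lines can be sketched in a few lines of Python. This is only an approximation: it uses the plain frequency rank as a stand-in for the vertical position (the actual boxes are log-scaled), and the names `rank_by_frequency` and `compare` are made up for this example.

```python
from collections import Counter


def rank_by_frequency(tags):
    """Map each tag to its rank (0 = most frequent); ties broken alphabetically."""
    counts = Counter(tags)
    ordered = sorted(counts, key=lambda tag: (-counts[tag], tag))
    return {tag: pos for pos, tag in enumerate(ordered)}


def compare(personal, community):
    """Compare a personal tag list with the community tag list.

    Returns (shared, only_personal, only_community), where shared maps a tag
    to community rank minus personal rank. Zero corresponds to a horizontal
    line, larger absolute values to steeper lines.
    """
    p = rank_by_frequency(personal)
    c = rank_by_frequency(community)
    shared = {tag: c[tag] - p[tag] for tag in p if tag in c}
    only_personal = sorted(set(p) - set(c))
    only_community = sorted(set(c) - set(p))
    return shared, only_personal, only_community


if __name__ == "__main__":
    mine = ["thesis", "thesis", "thesis", "design", "news", "news"]
    theirs = ["design", "design", "design", "art", "news"]
    print(compare(mine, theirs))
```

Tags that end up in `only_personal` or `only_community` are the ones without a connecting line.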
So in the picture above, you can see my tags. Some facts you can read from the picture:
• “m.a.thesis” is a very frequently used, but private tag of mine.
• The resources I tag are mostly tagged with “design” by the community. I, however, use the tag “design” much less often.
• “news”, “semantic web” and “webdev” are tags I use often, but the community does not.
And so on.
And of course, you can click individual tags to see what the comparison is like for subsets of the bookmarks. That’s especially interesting for obscure tags like “guru”: you can see what the tagger “means” by looking at the distribution of the community bookmarks (in this case “design, art, programmer, artist”). Interesting!
Some more shots:
I wish I could say “click here for the interactive version” as usual. Unfortunately, del.icio.us offers a JSON API but did not put a crossdomain.xml file on their server, which means the visualization (which runs nicely from my hard disk) cannot load data when put on the web. Bummer. I hope I can figure something out.
So for now, I can only offer a download link. Open the index.html. You might have to adjust your Flash Player security settings in order to load the community tags. Caveat: the application is still a bit buggy and pretty heavy on processor resources.
I am currently working on trends in individual tagging behaviour. You might have seen a first, animated version of my studies based on tag maps. The original animation shows the emergence of previously rarely used tags over time. Now I dug deeper and made a richer visualization for investigating this topic.
For the impatient: » Check out the interactive version here
And here’s the explanation:
It has been shown before (for a plausibility argument, also check out the marvellous cloudalicious tool, where you can track tag proportions for any website on the web) that tag proportions for resources stabilize over time. This means that the tag cloud representing a tag profile for a resource does not change much once a sufficient number of tags has been collected. In a folksonomy, this is generally considered a good sign, since it indicates a certain agreement on how to judge a resource and what vocabulary to use.
For tagging individuals and communities, this might, at first glance, hold true as well. Consider the following visualization of a tagging community’s evolution, for example:
Each tag is assigned a band, with the thickness indicating the overall summed usage of a tag over time (time runs left to right). Thus, a vertical cut through the graph corresponds to taking a tag cloud snapshot at that point in time. The vertical order is based on the overall frequency of the tags. The color is used to give an impression of the long-tail distortion: if all tags appeared equally often, you would see a linear transition from red to green instead of the skewed distribution. So, what do we see? Apparently, most of the bands seem to grow in parallel, indicating a stable growth proportion for all tags. Of course, we cannot see much for the smaller tags, and there are some edgy parts of the graph which might indicate different behavior at specific time points, but overall the impression is pretty stable.
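The band thickness in this linear view is just a running total per tag. A minimal Python sketch, assuming tagging events arrive as `(time_step, tag)` pairs; both that layout and the name `cumulative_counts` are assumptions for illustration only:

```python
from collections import Counter


def cumulative_counts(events, n_steps):
    """Running total of tag usage per time step.

    events: iterable of (time_step, tag) pairs (hypothetical layout).
    Returns {tag: [summed usage up to and including each step]}.
    """
    events = list(events)
    per_step = [Counter() for _ in range(n_steps)]
    for t, tag in events:
        per_step[t][tag] += 1

    tags = sorted({tag for _, tag in events})
    totals = {tag: [] for tag in tags}
    running = Counter()
    for t in range(n_steps):
        running.update(per_step[t])
        for tag in tags:
            # A vertical cut at step t is the tag cloud snapshot at that time.
            totals[tag].append(running[tag])
    return totals


if __name__ == "__main__":
    print(cumulative_counts([(0, "a"), (0, "b"), (1, "a"), (2, "a")], 3))
```

Since the totals can only grow, a tag that has not been used for a long time keeps its full thickness, which is why this view looks so stable.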
However, this does not make much sense. For individuals and communities, the topics of interest evolve over time, so there must be some hidden variability not captured by the visualization and the underlying linear model.
So I decided to provide an alternative visualization for the data based on a decay model, where tags “age” over time and finally get “forgotten” if they are not used anymore. This idea is loosely based on the Yule-Simon memory model for tag generation presented in this paper.
A radically different picture emerges. Not only does the overall shape now nicely display phases of community activity over time, but the life cycle of single tags is also much more transparent. You can roll over single layers to highlight them and display the corresponding tag name. Great fun.
What I am now curious about:
– Is there a correlation between time-dependency and overall frequency of tags? In other words, are frequent tags more evenly distributed over time, whilst the low-frequency tags tend to be more variable over time?
– Is there a correlation between temporal synchronization and general co-occurrence? In other words, do related tags also appear and disappear together over time?
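To make the “aging” idea concrete, here is a minimal Python sketch. It assumes a simple exponential decay with a fixed factor per time step; the text above does not specify the decay function actually used, so the factor, the `(time_step, tag)` event layout and the name `decayed_weights` are all illustrative assumptions.

```python
def decayed_weights(events, n_steps, decay=0.5):
    """Tag weights that fade over time unless the tag is used again.

    events: iterable of (time_step, tag) pairs (hypothetical layout).
    decay:  factor applied to every weight once per time step (assumed value).
    Returns {tag: [weight at each step]}.
    """
    events = list(events)
    tags = sorted({tag for _, tag in events})
    by_step = {}
    for t, tag in events:
        by_step.setdefault(t, []).append(tag)

    current = {tag: 0.0 for tag in tags}
    weights = {tag: [] for tag in tags}
    for t in range(n_steps):
        for tag in tags:
            current[tag] *= decay      # every tag ages a little
        for tag in by_step.get(t, []):
            current[tag] += 1.0        # a fresh use refreshes the tag
        for tag in tags:
            weights[tag].append(current[tag])
    return weights


if __name__ == "__main__":
    print(decayed_weights([(0, "a"), (0, "b"), (1, "a"), (2, "a")], 3))
```

With `decay=1.0` this reduces to the plain running total of the linear view; smaller values forget unused tags faster.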
I think the answer is YES to both questions, but that would definitely need some statistical analysis (any bored neuroscientists around to help me? ;)
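One possible starting point for the second question, sketched in Python: compute the Pearson correlation between the per-period usage counts of two tags. Whether Pearson correlation is the right measure here is an open assumption, and the helper name `pearson` is made up for this example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long usage series.

    Returns 0.0 if either series is constant (correlation is undefined
    there; returning 0.0 is a convention chosen for this sketch).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical usage counts per month for two tags.
    print(pearson([1, 2, 3], [2, 4, 6]))
    print(pearson([1, 2, 3], [3, 2, 1]))
```

Values close to 1 would mean the two tags rise and fall together; these could then be compared against how often the same two tags are used on the same bookmark.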
To-dos for the visualization:
– Implement a slider, so you can see what a linear and a decayed tag cloud would have looked like at a specific point in time.
– Stamen got it right: maybe I should have plotted from the vertical center. Or at least provide an optional inversion of the sorting. Because right now, all the top (green) layers are really distorted, making visual analysis really hard.
– Put some numbers on the axis.
– Show single tagging events on rollover. Or even “unfold” the layer to improve readability and avoid misconceptions.
A very nice paper on the “laissez-faire librarianship” often associated with tagging vs. more structured semantic web approaches. Most notable is that the discussion is put in the context of Husserl’s theory of reflections, intentionality and intersubjectivity.
D. Grant Campbell
London, Ontario N6A 5B7, Canada
Abstract This paper uses Husserl’s theory of phenomenology to provide a model for the relationship between user-centered tagging systems, such as del.icio.us, and the more highly structured systems of the Semantic Web. Using three aspects of phenomenological theory—the movement of the mind out towards an entity and then back in an act of reflection, multiplicities within unity, and the sharing of intentionalities within a community—the discussion suggests that both tagging systems and the Semantic Web foster an intersubjective domain for the sharing and use of information resources. The Semantic Web, however, resembles traditional library systems, in that it relies for this intersubjective domain on the conscious implementation of domain-centered standards which are then encoded for machine processing, while tagging systems work on implied principles of emergence.
You might have seen the tag clouds posted below. I calculate tag positions based on co-occurrence, such that tags used together are placed closer to each other. Additionally, tags are scaled according to frequency.
A general problem I have with the resulting representation (and common tag clouds as well) is the fact that every tag occurrence is weighted equally. As a result, these tag clouds never represent the current state of interest, but a very sluggishly changing summary of your archive. However, your interests and the corresponding vocabulary keep moving on. So I am currently investigating trends in tag clouds and how groups of related tags emerge and disappear again.
I am currently organizing my literature on tagging. When I started with research on that topic about a year ago, you could count the number of substantial contributions on one hand. Over the current year, however, the number of papers on this topic has sky-rocketed, which makes the whole area hard to keep an overview of at the moment.
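The input for such a layout is a table of how often two tags appear on the same bookmark. A minimal Python sketch of that counting step follows; the layout algorithm itself is not described here and is not part of the sketch, and the name `cooccurrence` and the list-of-tag-lists input are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(bookmarks):
    """Count how often each pair of tags is used on the same bookmark.

    bookmarks: iterable of tag lists, one per bookmark (hypothetical layout).
    Returns a Counter keyed by alphabetically ordered (tag_a, tag_b) pairs.
    """
    pairs = Counter()
    for tags in bookmarks:
        # set() ignores duplicate tags on one bookmark, sorted() fixes pair order.
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    sample = [["design", "art"], ["design", "art", "web"], ["web"]]
    print(cooccurrence(sample))
```

A layout could then turn high counts into short distances, so that pairs like “art” and “design” in the sample end up next to each other.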
I will start with an overview of my subjective must-reads for now (all linked to citeulike):