Well-formed data

May 23rd, 2007

Emerging topics update

My thesis is due pretty soon, so I am currently writing A LOT and making some on-the-go beautifications to my experiments.

First one is the emerging topics histogram. I followed my own advice and vertically centered the stacked histogram. Additionally, I never liked those sharp edges, so now I do not only “fade out” tags visually, but also fade them in, resulting in a much more organic picture and largely improved readability of the chart. New color scheme: old tags are in cold colors, freshly introduced ones in warm colors. This is all very much inspired by the fabulous last.fm charts by Lee Byron – thanks!
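For readers who want to try the centering idea themselves, here is a minimal Python sketch. It is not the Flash code behind the chart; the function names and the 3-point moving average used for the fade-in and fade-out are assumptions made for illustration only.

```python
def centered_baseline(layers):
    """Return, per time step, the y offset that centers the whole stack on zero.

    `layers` is a list of equally long lists; layers[i][t] is the thickness
    of tag i at time step t.
    """
    steps = len(layers[0])
    totals = [sum(layer[t] for layer in layers) for t in range(steps)]
    return [-total / 2.0 for total in totals]


def stack(layers):
    """Return one (bottom, top) pair of y-coordinate lists per layer."""
    baseline = centered_baseline(layers)
    bands = []
    for layer in layers:
        top = [b + v for b, v in zip(baseline, layer)]
        bands.append((baseline, top))
        baseline = top  # the next layer sits on top of this one
    return bands


def smooth(values, passes=1):
    """Soften sharp onsets and drop-offs with a 3-point moving average.

    This is an assumed stand-in for the visual fade-in and fade-out.
    """
    for _ in range(passes):
        padded = [values[0]] + list(values) + [values[-1]]
        values = [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
                  for i in range(len(values))]
    return values
```

Smoothing each layer before stacking spreads a sudden spike over its neighbouring time steps, which is one simple way to get rid of the sharp edges.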

> new interactive version here

and some pix:
picture-12_480×229shkl.png
picture-3_480×354shkl.png
picture-13_480×388shkl.png
picture-6_480×361shkl.png

emerging-topics, tagging, temporal-dynamics, vocabulary-evolution

6 Comments

May 10th, 2007

You say… We say…

Another visual experiment on tagging:
How do individuals use tags – compared to the community? Do you use private language or are you a mainstream tagger? When you tag with “design” – do the others think it is “art”?

On the left: the personal tags for the personal bookmarks, ordered by frequency. The containing box per tag is log-scaled, so you get an impression of the long-tail position of a tag. Which means: Often used tags are large, bright and go to the top.

On the right: community tags for the tagged resources.

In the middle: tags with the same name are connected. If a line is horizontal, the individual and the community essentially agree on the relevance of the tag for the resources. The steeper it is, the larger the disagreement. If no line starts at a tag, it means it is not present in the other list.
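The layout described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the exact scaling is not stated here, so the sketch uses log(1 + count) for the box heights, and the function names `layout` and `connections` are made up.

```python
import math


def layout(tag_counts, total_height=100.0):
    """Map each tag to the (top, bottom) y-extent of its box.

    Tags are ordered by frequency, most frequent first. Box heights are
    proportional to log(1 + count), an assumed form of the log scaling.
    """
    ordered = sorted(tag_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    weights = [math.log1p(count) for _, count in ordered]
    scale = total_height / sum(weights)
    boxes, y = {}, 0.0
    for (tag, _), weight in zip(ordered, weights):
        boxes[tag] = (y, y + weight * scale)
        y += weight * scale
    return boxes


def connections(personal, community, total_height=100.0):
    """Return, for every tag present in both lists, the vertical offset
    between the box centers on the right and on the left.

    An offset near zero corresponds to a horizontal line (agreement);
    a large absolute offset corresponds to a steep line (disagreement).
    """
    left = layout(personal, total_height)
    right = layout(community, total_height)
    offsets = {}
    for tag in left.keys() & right.keys():
        left_mid = sum(left[tag]) / 2.0
        right_mid = sum(right[tag]) / 2.0
        offsets[tag] = right_mid - left_mid
    return offsets
```

Tags that appear in only one of the two dictionaries get no entry in the result, which matches the case where no line starts at a tag.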

So in the picture above, you can see my tags. Some facts you can read from the picture:
• “m.a.thesis” is a very often used, but private tag of mine.
• The resources I tag in general are mostly tagged with “design” by the community. I, however, use the tag “design” much less often.
• “news”, “semantic web”, “webdev” are tags I use often, but the community does not.
etc.

And of course, you can click individual tags to see what the comparison is like for subsets of the bookmarks. That’s especially interesting for obscure tags like “guru” – you can see what the tagger “means” by looking at the distribution of the community bookmarks (in this case “design – art – programmer – artist”). Interesting!

Some more shots:

I wish I could say “click here for the interactive version” as usual – but unfortunately, although del.icio.us offers a JSON API, they did not put a crossdomain.xml file on their server. Which means the visualization (which runs nicely on my hard disk) cannot load data when put on the web. Bummer. I hope I can figure something out.

So for now – I can only offer a download link. Click the index.html. You might have to adjust your Flash player security settings in order to load the community tags. Caveat: The application is still a bit buggy and pretty heavy on processor resources.

community, intersubjectivity, long-tail, Personal, tagging, visualization

19 Comments

May 6th, 2007

Hourly shots + delicious + twitter = fun

I am having fun here with a little custom made Flash app that reads
• hourly shots from my built-in webcam
• my twitter posts
• and my delicious bookmarks

and puts it all together. Bit messy at the moment, but I am working on it.

Bigger picture here.

del.icio.us, hourly, mashup, twitter, webcam

2 Comments

April 26th, 2007

Forrester Research: Social Technographics

Just got my hands on the quite fascinating “Social Technographics” study from Forrester Research. They take a close look at the social and demographic structure of the social web population – unlike Technorati’s statistics, which mostly focus on raw blog growth numbers and structural features of the blogosphere. The study is based on two surveys, each including close to 5000 North American individuals.

Interesting facts:

22% of adults now read blogs at least monthly, and 19% are members of a social networking site. Even more amazingly, almost one-third of all youth publish a blog at least weekly, and 41% of youth visit a social networking site daily.

Based on an analysis of online participation and consumption practices, the authors identify six segments of users, ordered by degree of participation:

14 Comments

April 13th, 2007

Hans Rosling / gapminder

Watch Myths about the developing world, a talk by Hans Rosling from gapminder.org
Intense, thrilling, passionate. 10/10.

gapminder, open-data, rosling, talk, ted, visualization

1 Comment

March 11th, 2007

Elastic lists

Just a short post, but another demo is online.

It is a demonstration of the “elastic list” principle for browsing multi-faceted data structures. Click any number of list entries to query the database for a combination of the selected attributes. If you create an “impossible” configuration, your selection will be reduced until a match is possible.
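To make the “reduce until a match is possible” behaviour concrete, here is a small Python sketch. How the demo actually decides which selections to drop is not described here, so dropping the oldest choices first is purely an assumption, as are the function names.

```python
def matches(records, selection):
    """Return the records whose attributes equal every selected (facet, value) pair."""
    return [r for r in records if all(r.get(f) == v for f, v in selection)]


def select(records, selection, facet, value):
    """Add (facet, value) to the selection and return the new selection.

    A previous choice for the same facet is replaced. If the combined
    selection matches nothing, earlier choices are dropped, oldest first
    (an assumed policy), until at least one record matches or only the
    newest choice is left.
    """
    new = [(f, v) for f, v in selection if f != facet] + [(facet, value)]
    while len(new) > 1 and not matches(records, new):
        new.pop(0)
    return new
```

With two toy records, one tagged field “peace” and country “Switzerland” and one tagged field “physics” and country “Germany”, selecting “peace” and then “Germany” is impossible, so the sketch drops “peace” and keeps only “Germany”.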

The example data is based on the Nobel Prize winners dataset used in the Flamenco facet browser.

Elastic lists enhance traditional facet browsing approaches by
• visualizing relative proportions (weights) of metadata values by size
• visualizing unusualness of a metadata weight by brightness
• and animating filtering transitions.

In the unfiltered view, the brightness shows a trend measure: brighter values indicate a rising number of prizes over the last years.

In filtered views, a brighter background indicates a higher weight of the metadata value compared to the overall distribution.

If, for instance, you click “peace” as in the example above, you will see that “female” and “Switzerland” are much brighter, indicating that the proportion of women and Swiss is much higher in this context than in the whole data set. That’s interesting information and could also be used to characterize the result set of a keyword query or any other collection in terms of its “characteristic” metadata values. Besides that, it fosters understanding of how metadata values are correlated with each other, which is often interesting information itself.
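One simple way to quantify “higher weight compared to the overall distribution” is the ratio between a value’s share in the filtered set and its share in the whole data set. The Python sketch below shows that ratio; the exact brightness formula of the demo is not given here, so treat the ratio and the function names as assumptions.

```python
from collections import Counter


def proportions(records, facet):
    """Return value -> share of records carrying that value for the given facet."""
    counts = Counter(r[facet] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def relative_weight(all_records, filtered, facet):
    """Return value -> (share in filtered set) / (share in whole data set).

    `filtered` is assumed to be a subset of `all_records`. A ratio above 1
    means the value is over-represented in the filtered set and would be
    drawn brighter; below 1 it is under-represented.
    """
    overall = proportions(all_records, facet)
    local = proportions(filtered, facet)
    return {value: share / overall[value] for value, share in local.items()}
```

If a value makes up a quarter of the whole data set but half of the filtered set, its ratio is 2, so it would stand out clearly.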

You can also switch on little sparklines to see the temporal distribution of each metadata value:

elastic-lists, facet-browser, facets, nobel-prize, sparklines, statistics, Thesis

41 Comments

February 19th, 2007

Emerging topics v2

I am currently working on trends in individual tagging behaviour. You might have seen a first, animated version of my studies based on tag maps. The original animation shows the emergence of previously rarely used tags over time. Now I dug deeper and made a richer visualization for investigating this topic.

For the impatient:
» Check out the interactive version here

And here’s the explanation:

It has been shown before (for a plausibility argument, also check out the marvellous cloudalicious tool, where you can track tag proportions for any website on the web) that tag proportions for resources stabilize over time. Which means that the tag cloud representing a tag profile for a resource does not change much once a sufficient number of tags has been collected. In a folksonomy, this is generally considered a good sign, since it indicates a certain agreement on how to judge a certain resource and what vocabulary to use.

For tagging individuals, and communities, this might – at first glance – hold true as well. Consider the following visualization of a tagging community’s evolution, for example:

picture-8_480×250shkl.png

Each tag is assigned a band, with the thickness indicating the overall summed usage of a tag over time (time runs left to right). Thus, a vertical cut through the graph corresponds to taking a tag cloud snapshot at this time point. The vertical order is based on the overall frequency of the tags. The color is used to give an impression of the long tail distortion – if all tags appeared equally often, you would see a linear transition from red to green instead of the skewed distribution.
So – what do we see? Apparently, most of the bands seem to grow in parallel, indicating a stable growth proportion for all tags. Of course, we cannot see much for the smaller tags, and there are some edgy parts of the graph which might indicate different behavior at specific time points, but overall – pretty stable impression.

However, this does not make much sense. For individuals and communities, the topics of interest evolve over time, so there must be some hidden variability not captured by the visualization and the underlying linear model.

So I decided to provide an alternative visualization for the data based on a decay model, where tags “age” over time and finally get “forgotten” if they are not used anymore. This idea is loosely based on the Yule-Simon memory model for tag generation presented in this paper.

picture-7_480×266shkl.png
A radically different picture emerges. Not only does the overall shape now nicely display phases of community activity over time, but also the life cycle of single tags is much more transparent. You can roll over single layers to highlight them and display the corresponding tag name. Great fun.
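The aging idea can be illustrated with a few lines of Python. Note the assumptions: the sketch uses a plain exponential decay with a half-life, which is not necessarily the decay function used for the chart or the memory kernel of the cited model, and `decayed_weights` and `half_life` are hypothetical names.

```python
import math


def decayed_weights(events, now, half_life=30.0):
    """Return tag -> decayed weight at time `now`.

    `events` is an iterable of (tag, timestamp) pairs, one per tagging
    event. Each event contributes 1 at the moment it happens and loses
    half of its weight every `half_life` time units (assumed exponential
    decay). Events later than `now` are ignored.
    """
    rate = math.log(2) / half_life
    weights = {}
    for tag, t in events:
        if t <= now:
            weights[tag] = weights.get(tag, 0.0) + math.exp(-rate * (now - t))
    return weights
```

Evaluating this at every time step gives one band thickness per tag and time point. A tag that is no longer used shrinks towards zero instead of keeping its accumulated thickness forever, as it does in the linear model.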

» Check out the interactive version here

What I am now curious about:
– Is there a correlation between time-dependency and overall frequency of tags? In other words, are frequent tags more evenly distributed over time, whilst the low frequency tags tend to be more variable over time?
– Is there a correlation between temporal synchronization and general co-occurrence? Which means – do related tags also appear and disappear together over time?

I think the answer is YES to both questions, but that would definitely need some statistical analysis (any bored neuroscientists around to help me? ;)
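A first, rough check of the first question could look like the Python sketch below. It is only one possible operationalization and rests on assumptions: temporal variability is measured as the coefficient of variation of per-period counts, the relationship is measured with a plain Pearson correlation, and all names are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def variability(series):
    """Coefficient of variation of per-period counts (0 = perfectly even).

    Assumes the tag was used at least once, so the mean is not zero.
    """
    mean = sum(series) / len(series)
    variance = sum((v - mean) ** 2 for v in series) / len(series)
    return math.sqrt(variance) / mean


def frequency_vs_variability(tag_series):
    """Correlate overall tag frequency with temporal variability.

    `tag_series` maps tag -> list of usage counts per time period.
    A clearly negative result would support the guess that frequent
    tags are spread more evenly over time.
    """
    tags = sorted(tag_series)
    frequencies = [sum(tag_series[t]) for t in tags]
    variabilities = [variability(tag_series[t]) for t in tags]
    return pearson(frequencies, variabilities)
```

A proper analysis would still need significance testing and probably a rank-based measure, since tag frequencies are heavily skewed.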

To-dos for the visualization:
– Implement a slider, so you can see what a linear and a decayed tag cloud would have looked like at a specific time point.
– Stamen got it right: Maybe I should have plotted from the vertical center. Or at least provide an optional inversion of the sorting. Because right now, all the top (green) layers are really distorted, making visual analysis really hard.
– Put some numbers on the axis
– Show single tagging events on rollover. Or even “unfold” the layer to improve readability and avoid misconceptions.

stack-chart, tagging, temporalDynamics

2 Comments

February 7th, 2007

Indexed

I have been subscribed to the indexed blog for a couple of weeks now and really, it never ceases to amaze me. Hands down, this is one of the funniest, most original and yet deepest blogs I have seen.

The concept is simple: little stories or facts about life are told with infographics drawn on index cards (which I love anyways). It’s amazing how many laughs or “true, true”s you can get out of little Venn or axis diagrams:

card648jpg_480×288shkl.jpg

Reminds me also of the wonderful Facts of life by Pippo Lionni.

facts-of-life, indexed, lionni, review, story-telling

Comments Off

January 31st, 2007

Husserl and tagging

A very nice paper on the “laissez-faire librarianship” often associated with tagging vs. more structured semantic web approaches. Most notable is that the discussion is put in the context of Husserl’s theory of reflections, intentionality and intersubjectivity.

A PHENOMENOLOGICAL FRAMEWORK FOR THE RELATIONSHIP BETWEEN THE
SEMANTIC WEB AND USER-CENTERED TAGGING SYSTEMS

D. Grant Campbell
Faculty of Information and Media Studies
University of Western Ontario
London, Ontario N6A 5B7, Canada

Abstract

This paper uses Husserl’s theory of phenomenology to provide a model for the relationship
between user-centered tagging systems, such as del.icio.us, and the more highly structured
systems of the Semantic Web. Using three aspects of phenomenological theory – the movement
of the mind out towards an entity and then back in an act of reflection, multiplicities within unity,
and the sharing of intentionalities within a community – the discussion suggests that both tagging
systems and the Semantic Web foster an intersubjective domain for the sharing and use of
information resources. The Semantic Web, however, resembles traditional library systems, in
that it relies for this intersubjective domain on the conscious implementation of domain-centered
standards which are then encoded for machine processing, while tagging systems work on
implied principles of emergence.

community, husserl, intersubjectivity, Personal, semantic-web, tagging

4 Comments

January 22nd, 2007

Tag maps update

As promised, here is an update to the tag maps application I introduced below along with some explanations.

For the impatient: HERE’S THE LINK

(Update again: *The latest version can be found here*)

And for the curious: Here are the explanations: → read more

CCA, elastic, PCA, similarity, statistics, tagging, tagMaps