Saturday, March 6, 2010

WORDLE: diagnostic tool or party trick?

Wordle is an online tool that has been growing in popularity over the past few years. Even on their own website, they describe their creation as "a toy for generating word clouds", yet I have seen it used more and more often as some sort of accurate diagnostic tool for analyzing discourse. The assumption, often unstated, is that if a word stands out clearly in a word cloud, then somehow it is of greater importance in the source document, and as such reflects a stronger opinion or point of view.

Nonsense. There are multiple reasons that this is anything but a good assumption, and using a Wordle in this manner is inaccurate and, at best, nothing more than Pop Science. First of all, the larger words are disproportionately larger than the others; a typical 'lying with statistics' trick is to take something that appears twice as frequently and make the representative indicator not only twice as tall, but twice as wide. This gives it FOUR times the impact, which is neither accurate nor representative. Additionally, synonyms are listed separately; an author who uses one single word for an idea will see that word outstrip those of the more eloquent author who uses multiple words with shades of meaning. The role of color comes in; brightly colored words will stand out while yellows and pinks will blend into the cloud, regardless of their size. Orientation comes into play; words aligned horizontally are more readable than those that are vertical. The sheer mass of smaller words is dismissed into the 'cloud' while only a few large words are reduced to being representative of the entire article. And worst of all, word clouds completely ignore context, phrases or meanings... they only focus on individual words and the number of times they are used. What is accurate about that?
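
To put rough numbers on two of these complaints, here is a minimal Python sketch. It assumes that a word's height and width both grow in direct proportion to its count, which is the scenario described above and not necessarily how Wordle itself sizes words. The function names word_counts and visual_area_ratio are my own, made up purely for illustration.

```python
import re
from collections import Counter


def word_counts(text):
    """Count individual lowercase words, with no regard for context or synonyms."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def visual_area_ratio(count_a, count_b):
    """Ratio of on-screen areas if height AND width both scale with the count.

    Assumption: font size is directly proportional to word count.
    """
    linear_ratio = count_a / count_b
    return linear_ratio ** 2  # area grows with the square of the linear size


# Three near-synonyms are tallied as three unrelated words.
counts = word_counts("big big large huge big large")
print(counts["big"], counts["large"], counts["huge"])

# A word used twice as often takes up four times the area.
print(visual_area_ratio(2, 1))
```

Under that assumption, a word that appears three times as often would cover nine times the area, and an author who wrote "big" six times would have one dominant word where this sample text has three smaller ones.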

At best, a Wordle word cloud is an interesting party trick or a device for initiating conversation, but to use it as some sort of diagnostic or enlightening tool is folly.

As an exercise, consider carefully the message in this post, then follow this link to see the word cloud generated by this post: is this accurate? Does it represent the message here? You decide.