Wordle (www.wordle.net) is an online tool that has been growing in popularity over the past few years. Even on its own website, its creators describe it as "a toy for generating word clouds", yet I have seen it used more and more often as some sort of accurate diagnostic tool for analyzing discourse. The assumption, often unstated, is that if a word stands out clearly in a word cloud, then it is somehow of greater importance in the source document, and as such reflects a stronger opinion or point of view.
There are multiple reasons why this is a poor assumption, and why using
a Wordle in this manner is inaccurate and nothing more than pop science
at best. First of all, the larger words are disproportionately larger
than the others; a typical 'lying with statistics' trick is to take
something that appears twice as frequently and make the representative
indicator not only twice as tall, but also twice as wide. This gives it
FOUR times the impact, which is neither accurate nor representative.
Additionally, synonyms are listed separately; if an author uses one
single word for an idea, that word will outstrip those of the more
eloquent author who uses multiple words with shades of meaning. Color
plays a role too; brightly colored words will stand out while yellows
and pinks will blend into the cloud, regardless of their size.
Orientation comes into
play; words aligned horizontally are more readable than those that are
vertical. The sheer mass of smaller words is dismissed into the 'cloud'
while only a few large words are taken as representative of the
entire article. And worst of all, word clouds completely ignore context,
phrases, and meanings; they focus only on individual words and the
number of times they are used. What is accurate about that?
At best, a Wordle word cloud is an interesting party trick or a device for
initiating conversation, but to use it as some sort of diagnostic or
enlightening tool is folly.
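
To put rough numbers on two of the complaints above (the squared scaling and the one-word-per-idea bias), here is a minimal Python sketch. It assumes font height grows in direct proportion to word count, which is an assumption made purely for illustration and not a description of how Wordle actually sizes words. The function names and the two sample "authors" are hypothetical.

```python
import re
from collections import Counter


def word_counts(text):
    """Count individual words, ignoring case, order, and context.

    This is all a basic word cloud has to work with.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))


def visual_area_ratio(count_a, count_b):
    """Ratio of drawn areas for two words.

    ASSUMPTION: font height is proportional to count. Width grows with
    height, so the area on screen grows with the square of the count ratio.
    """
    return (count_a / count_b) ** 2


# Two hypothetical authors expressing the same idea four times.
plain_author = "good good good good"
varied_author = "good fine great excellent"

top_plain = word_counts(plain_author).most_common(1)[0]
top_varied = max(word_counts(varied_author).values())

print(top_plain)                 # ('good', 4)
print(top_varied)                # 1
print(visual_area_ratio(2, 1))   # 4.0: twice as frequent, four times the area
print(visual_area_ratio(top_plain[1], top_varied))  # 16.0
```

Under these assumptions, the author who repeats one word ends up with a single word drawn sixteen times larger in area than any word from the author who varies their vocabulary, even though both made the same point the same number of times.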
As an exercise, consider carefully the message of this post, then follow this link to see the word cloud generated from it: is it accurate? Does it represent the message here? You decide.