Structure, Sign, and Word Cloud

Word clouds are an increasingly familiar online tool.  Blogs, library catalogs, even The New York Times . . . the whole world’s using them.  Is modern periodical studies using them too?  Has anyone out there done a word cloud for any little magazines?  What would a cloud look like for The Little Review, The Egoist, or The Masses? Here’s one for The Waste Land.

And another one for Woolf’s Jacob’s Room.

And finally, my favorite, Derrida’s “Structure, Sign, and Play.”

Now, this is largely a lot of fun.  No big surprise that “Jacob” shows up a lot in Woolf’s novel, or that “center,” “discourse,” and “always” (what, no “already?”)* are, er, central to Derrida’s essay, but I was surprised to see “water,” “mountains,” and “rock” dominate Eliot’s quintessentially urban poem.  With short and/or familiar works we might not get big surprises all that often, but what about with big texts, like a volume of a magazine or a full run of a little magazine? Could we discover surprising and/or evolving priorities in them?

The problem is that few modern periodicals are easily accessible in plain text or other formats that lend themselves to word cloud programs like Wordle, which I used to make the three images in this post.  As we go forward with the digitization of periodicals, we want to make sure that we make them as open as possible, not just so they can be read but so their data can be mined for other purposes.

*I think that Wordle excluded “already” as a non-lexical word, which is a problem in this case.  Silly Wordle, did you forget the iterability of the signifier?
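For anyone curious about what sits underneath a cloud, here is a minimal Python sketch of the counting step, assuming the text is already available as a plain string. It is not Wordle's actual method: the function name `word_frequencies` and the tiny `STOP_WORDS` list are made up for illustration, and "already" is placed on that list only to mimic the exclusion guessed at in the footnote.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration only; Wordle's real list is not reproduced here.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "already"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercase words in text, skipping anything in stop_words."""
    # Match runs of letters, allowing one internal apostrophe (e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(w for w in words if w not in stop_words)

# A made-up sentence, not a quotation from Derrida.
sample = "The center is always already the center of the discourse."
counts = word_frequencies(sample)

# The most frequent words are the ones a cloud would draw largest.
print(counts.most_common(3))
```

Passing an empty stop list (`stop_words=set()`) brings "already" back into the counts, which is the sort of control a researcher would want before reading too much into what a cloud leaves out.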


3 responses to “Structure, Sign, and Word Cloud”

  1. Not quite what you’re looking for, but here’s a tag cloud that represents my students’ critical writing on magazines (Summer 2009):

    http://macaulay.cuny.edu/seminars/material-modernism/content/tag-cloud

  2. There are lots of ways to derive plain text from XML or PDF files; do a Google search on PDF extract plain text. For example, I used Acrobat to extract the (semi) corrected OCR from the MJP edition of BLAST 1 and ran it through Wordle; I’ve uploaded the cloud to the blog (but I don’t know how to link to it here).
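As one illustration of the XML route mentioned in the second comment, here is a rough Python sketch that strips the markup from a small XML fragment and keeps only the text, using the standard library's `xml.etree.ElementTree`. The function name `xml_to_plain_text` and the sample fragment are invented for the example; real periodical files (TEI, for instance) would likely need more care with namespaces, headers, and page furniture.

```python
import xml.etree.ElementTree as ET

def xml_to_plain_text(xml_string):
    """Return the text content of an XML string with tags removed."""
    root = ET.fromstring(xml_string)
    # itertext() yields every piece of text in document order.
    pieces = root.itertext()
    # Join the pieces, then collapse runs of whitespace into single spaces.
    return " ".join(" ".join(pieces).split())

# A made-up fragment, not taken from any actual edition.
sample_xml = "<div><head>Sample title</head><p>Some <hi>marked-up</hi> words.</p></div>"
plain = xml_to_plain_text(sample_xml)
print(plain)
```

Joining the pieces with spaces keeps headings from running into the paragraphs that follow them, at the cost of splitting any word that has a tag in the middle of it. The resulting string can then be pasted into a word cloud program.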
