10 Voyant Tools Corpora of 9 MJP Magazines

I spent the day preparing Voyant Tools corpora for an in-class lab tomorrow. The links below lead to a chronological corpus of all 9 magazines that currently have TEI XML files on the MJPLab Sourceforge site. I also split that corpus into individual corpora, one per magazine, to facilitate comparative analysis.

To make the datasets, I used a regular expression in TextWrangler to strip all the tags out of the XML files, and then used a command-line script to batch rename them. My first attempt at the comprehensive corpus produced odd results because Voyant orders files alphabetically, so I manually renamed all 508 of them, placing the publication date (yyyy-mm-dd) at the beginning of each file name to keep the materials in chronological order. The individual magazine corpora are already chronological because the volume and issue numbers were part of the naming convention first used by Mark Gaipa.
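
For anyone who wants to script the same two steps rather than do them by hand, here is a rough Python sketch. It is not the TextWrangler expression or the rename script I actually used; the pattern, the function names, and the example file name are all assumptions made up for illustration. A plain regex like this will also stumble on comments or CDATA sections that contain a ">", so treat it as a starting point.

```python
import re
from pathlib import Path

# Assumed pattern: anything between "<" and the next ">" counts as a tag.
TAG_RE = re.compile(r"<[^>]+>")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def strip_tags(xml_text: str) -> str:
    """Replace tags with spaces, then collapse runs of whitespace."""
    text = TAG_RE.sub(" ", xml_text)
    return re.sub(r"\s+", " ", text).strip()


def dated_name(pub_date: str, original_name: str) -> str:
    """Put a yyyy-mm-dd date first so alphabetical order is chronological."""
    if not DATE_RE.fullmatch(pub_date):
        raise ValueError(f"expected yyyy-mm-dd, got {pub_date!r}")
    # Hypothetical convention: date, underscore, original stem, .txt
    return f"{pub_date}_{Path(original_name).stem}.txt"
```

Because the year, month, and day are zero-padded and run from largest unit to smallest, sorting the resulting names alphabetically gives the same order as sorting by date, which is what Voyant needs.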

MJP Corpora at Voyant Tools
