At this point you’ve likely seen the Times article on Google Labs’ Books Ngrams Viewer. If not, the short version is that a couple of postdocs at Harvard saw the potential of Google Books early on and started working on a way to track the frequency with which words appear in books over time. They published their results today in Science. Using a corpus of 5,195,769 books published between 1500 and 2000 (approximately 4 percent of all books ever published), including 361 billion English words, they calculated the frequency of usage by dividing the number of instances of a word (an n-gram) in a given year by the total number of words in the corpus in that year. It makes for a funny way of thinking about word usage. Who would have thought that “damn” would account for just under 0.0008% of the words appearing in English books in 1920?
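To make the arithmetic concrete, here is a minimal sketch of that calculation as I understand it from the description above: the count of a word in a year divided by the total words in that year. The function names and the numbers are made up for illustration; this is not the authors’ code or their data.

```python
def relative_frequency(word_count: int, total_words: int) -> float:
    """Share of all words in a year that are the given word."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return word_count / total_words


def frequency_by_year(word_counts: dict, totals: dict) -> dict:
    """Relative frequency per year; years with no hits count as zero."""
    return {
        year: relative_frequency(word_counts.get(year, 0), total)
        for year, total in totals.items()
    }


# Hypothetical counts, chosen only so the result lands near the figure in the post.
damn_counts = {1920: 8}
yearly_totals = {1919: 900_000, 1920: 1_000_000}

freqs = frequency_by_year(damn_counts, yearly_totals)
percent_1920 = freqs[1920] * 100  # express as a percentage
```

With these invented numbers, 8 hits in a million words works out to roughly 0.0008 percent, which is the scale of figure the viewer reports.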
There’s good reason to be suspicious of the data. First, we’re talking about Google Books, which has a host of problems, chief among them its unsophisticated understanding of the book itself. It gets publication dates wrong all the time, doesn’t seem to know what an edition is, and often counts bound volumes of magazines as books. So the numbers have to be off to some degree.
Second, from what I can tell, it’s counting all appearances of words as independent events, rather than as incidents within a text, and not adjusting for hyper-usage in books that might skew the count. It’s once again the problem with Google’s “bag-of-words” approach and lack of metadata–it’s counting items in the bag rather than thinking about the ways those items are sorted and what role the sorting might play. For instance, what happens to the count for “motherfucker” in 1989 when Miles Davis’s autobiography Miles comes out, with its 160 pages containing “motherfucker,” most of those pages with more than one “motherfucker” per page? There’s no major spike for the word in 1989 (although there is growth), so even Miles’s prodigious use of the word didn’t throw off the graph, but would the subtraction of that book from 1989 (and 1990, which is also listed as a year of publication in Google Books, sigh–so, it’s being double-counted?) make a big difference? Conversely, might the bio’s appearance have liberated writers and publishers to let their “motherfuckers” free? To be fair, the authors have released their data sets, so it would be possible to see just how much a couple hundred “motherfuckers” in one book could skew the results. Anyone downloaded the data-set with “motherfucker” in it?
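Here is a rough sketch of the kind of check I have in mind, assuming you had per-book counts for a given year. As far as I can tell the released data sets are aggregated by year, so the per-book structure below is hypothetical, as are the titles, the numbers, and the function names. It compares the raw token frequency with and without one heavy-use book, and sets both against a simple "share of books containing the word" measure, which a single book cannot inflate.

```python
def token_frequency(books: list, word: str) -> float:
    """Bag-of-words measure: all hits divided by all words in the year."""
    total = sum(b["total_words"] for b in books)
    hits = sum(b["counts"].get(word, 0) for b in books)
    return hits / total


def without_book(books: list, title: str) -> list:
    """Same year, minus one title."""
    return [b for b in books if b["title"] != title]


def document_frequency(books: list, word: str) -> float:
    """Share of books using the word at least once."""
    using = sum(1 for b in books if b["counts"].get(word, 0) > 0)
    return using / len(books)


# Invented toy "year" of three equally long books.
year_books = [
    {"title": "Heavy User", "total_words": 100_000, "counts": {"damn": 300}},
    {"title": "Light User", "total_words": 100_000, "counts": {"damn": 2}},
    {"title": "Non User", "total_words": 100_000, "counts": {}},
]

with_all = token_frequency(year_books, "damn")
minus_one = token_frequency(without_book(year_books, "Heavy User"), "damn")
skew = with_all / minus_one  # how much one book multiplies the yearly figure
share = document_frequency(year_books, "damn")
```

In a toy year this small, one book dominates the count completely. In a real year with thousands of books the effect would be far more diluted, which is exactly the thing worth measuring.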
Third, and I quote from the Science article, “Periodicals were excluded.” That is the single mention of periodicals in the article. There is no explanation for the exclusion of periodicals. Here’s another quotation: “‘Culturomics’ extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.” Periodicals remain outside those boundaries, although they represent a huge portion of the printed material of the nineteenth and twentieth centuries (recall Peter Stallybrass’s claim that books account for only 13% of printed matter). Since books take much longer to produce and cost more, encouraging a conservative attitude toward cultural change, they provide a less accurate index of cultural change than periodicals do. Or, at least, periodicals typically show a stronger impulse to register short-term change than books do. In the Times article, Louis Menand laments the lack of humanists among the authors of the Science article, noting the absence of an historian of the book. Maybe, going forward, they’ll get a periodical specialist too.
Having said all that, this is pretty cool stuff, and as the authors of the paper acknowledge, the quant work is not enough; we need interpretation. That’s our challenge–that and getting Google Books to realize magazines exist as their own category.
For now, here are some screenshots of some searches I did on the Ngrams site.
So, you can see that the action really begins in 1880 or so, as both graphs begin a steady climb. Let’s zero in on those years.
The growth of “modernism” looks positively anemic in comparison, but that’s in part a question of scale. So, here’s what “modernism” looks like by itself.
Two things stand out to me here.
1. The boom in appearances of the word “modernism” starting in 1978 or so–a reflection of academic publication, which Google Books is long on, no? Other interpretations?
2. The slight dip and then recovery from 1940-60. This looks surprising, given this is just the moment that the word “modernism” gains serious leverage in and out of the academy as a term designating a phenomenon in the arts and architecture. It’s the moment when Eliot, Corbusier, Berg, etc. start routinely getting called modernists. So why the decline? Could it be the fading of modernism from another discourse–theology? This is where looking at the actual texts matters.
Finally, recalling Eurie Dahn’s post about mentions of magazines in literature, here’s what the site gives us for mentions of “magazine” and of “modernism” in fiction in English.