Shakespeare by numbers

In November of last year, I attended a seminar given by Dr Jonathan Hope entitled Visualizing English Print from 1470–1800. He outlined the aims of the Text Creation Partnership and gave several illustrations of how printed texts could be quantified – for example, to reveal that A Midsummer Night’s Dream has a higher proportion of concrete nouns than any other Shakespeare play. Considering the expansion of this word-counting platform, the discussion afterwards turned to the direction of the humanities in light of a statistical revolution – a question of growing concern for those working in the field. Will all English Literature undergraduates need to be schooled in statistics in future years? What does it even mean for A Midsummer Night’s Dream to have a higher proportion of concrete nouns, or that, amongst Shakespeare’s plays, the two most linguistically distant are A Midsummer Night’s Dream and Measure for Measure? You can experiment with such figures by visiting the WordHoard website.

These questions have been approached sceptically by a number of voices in the press, including a recent article in Scientific American that claims:

When we relegate the humanities to a bunch of trends and statistics and frequencies, we get exactly that disconcerting and incongruous dystopia of Italo Calvino’s If on a Winter’s Night a Traveler: books that have been reduced to nothing but words frequencies and trends, that tell you all you need to know about the work without your ever having to read it—and machines that then churn out future fake (or are they real?) books that have nothing to do with their supposed author. It’s a chilling thought.

It would be a chilling thought, but such a dystopia seems unlikely to come into being. English departments have been accused of courting a similar self-destruction in exactly the opposite manner during and after the 1980s: denying concrete quantification (to the point of rejecting ‘fact’) in favour of the cloudy, meaningless and ‘decentred’ pontificating that characterises a lot of postmodern theory. Nevertheless, literary criticism weathered the storm, and a great deal of worthy criticism, criticism that acknowledges the value of literature, survived the decline of extreme theory. Whatever scars the theory obsession inflicted on academia, most departments clearly were, and still are, probing worthwhile questions about our shared past and its products. Amid the ‘scientism’ and scientific rationalism proving popular in the media, it is unsurprising that the humanities, too, should move toward some sort of mathematical ‘credibility’, especially at a time of funding cuts, when departments must ‘prove’ their importance.

Despite these political pressures, linguistic quantification poses little ‘scientistic’ threat to the humanities proper, and it need not reduce literature to spreadsheets and graphs. Indeed, WordHoard has been defended by those involved with its creation; in his recent blog post, ‘What Happens in Hamlet’, Dr Hope argues that the statistical analysis afforded by these powerful digital tools aids rather than replaces literary analysis. Explaining the somewhat surprising scarcity (in relative terms) of first-person pronouns in Hamlet, he notes:

Digital analysis can’t explain the cause of the drop: the only question it is answering here is, ‘How frequently does Shakespeare use “I” in Hamlet compared to his other plays?’. On its own, this is not a very interesting question. But the analysis provokes the much more interesting question, ‘Why does Shakespeare use “I” far less frequently in Hamlet than normal?’.
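The question the software answers here is, at bottom, simple arithmetic: count a word’s occurrences and divide by the length of the text, so that plays of different lengths can be compared fairly. The following is a minimal Python sketch of that idea, not WordHoard’s actual method. It assumes a crude letters-only tokenizer, and the function name relative_frequency and the two snippets are hypothetical stand-ins for whole plays, not real corpus figures.

```python
import re
from collections import Counter


def relative_frequency(text: str, word: str, per: int = 1000) -> float:
    """Return occurrences of `word` per `per` tokens, ignoring case.

    Assumption: a crude tokenizer that treats runs of letters (with an
    optional internal apostrophe) as words, so "I'm" is NOT counted as "I".
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Normalising by length is what makes a long play comparable to a short one.
    return counts[word.lower()] * per / len(tokens)


# Hypothetical snippets standing in for whole plays (illustration only).
plays = {
    "Play A": "I know not why I am so sad, I think",
    "Play B": "The wind blows through the trees and I wait",
}

for title, text in plays.items():
    print(title, round(relative_frequency(text, "I"), 1))
```

A real study would run this over complete, consistently edited texts; the figure it produces only tells us that a difference exists, which is exactly where Dr Hope’s more interesting ‘why’ question begins.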

Essentially, these digital analyses perform what diligent critics would otherwise do manually: close reading. It is still up to the reader to determine the importance of ‘I’, or the significance of Athens’s many ‘fairy toys’. Does the imaginative experience of concrete ‘bushes’ and ‘bears’ make A Midsummer Night’s Dream an altogether different comedy from the verb-al actions of Measure for Measure?

Naturally, statistics pose problems of comparison and sampling – as any Psychology student will confirm. If such statistics are to be cited, they will no doubt demand more mathematically sound methods of literary criticism; but as a place to start thinking, and as a point of comparison, they are surely not as insidious as some ‘humanists’ fear.



