“He was right after all, and the scholars who for a generation now have ignored or sneered at his evidence, sometimes—when they have condescended to mention it—printing the word evidence itself between inverted commas, have not turned out to be our most reliable guides.”

So wrote R.H. Barker in 1958. Here he was speaking of E.H.C. Oliphant, who attributed The Revenger’s Tragedy to Thomas Middleton and, despite the dismissiveness of less reliable scholars, was eventually vindicated.

When I first ventured into authorship attribution studies, I spent a long time trying to develop an old-school ear and eye for early modern dramatists’ verse styles. I was struck, time and time again, by individual voices breaking through the hybrid passages of certain anonymous texts. It wasn’t long before I realised that some of the conclusions I reached during my MA studies had been anticipated by scholars for over two centuries, the most recent claimant being Sir Brian Vickers.

In a general essay published in the Times Literary Supplement in 2008, Vickers argued for a new Kyd canon, ascribing to him King Leir (1589), Arden of Faversham (1590), Fair Em (1590) and parts of Henry VI Part One (1592) and Edward III (1593). Vickers’ attributions were rejected by several scholars using different systems, largely arithmetico-statistical, based on word frequencies. Vickers studies authors’ self-repetition as seen in their use of N-grams (contiguous sequences of words), using evidence produced by modern anti-plagiarism software. The only scholar to realise the validity of this approach was Martin Mueller, co-author of The Chicago Homer, which allows direct study of the thousands of N-grams repeated in the corpus of early Greek epic (the famous ‘Homeric formulae’). In two blog posts published on his then site in 2009 (‘N-grams and the Kyd canon: a crude test’ and ‘Vickers is right about Kyd’), Mueller applied statistical tests which convinced him that ‘Vickers is right about the Leir play, Fair Em, and Arden’.

My own researches over the last few years have collected a wide range of evidence – encompassing verbal parallels (both common and rare), feminine endings, exclamations, intensifiers, colloquialisms, prefixes, suffixes (according to manual counts of complete play texts, as opposed to samples), rhyme forms, linguistic idiosyncrasies, pause patterns, internal self-repetition (utilising a new methodology I have devised; my results for early Shakespeare texts are forthcoming), compound formations, patterns of influence, plot, characterisation and overall dramaturgy. In the course of this study I have scrutinised the Vickers ascriptions and the arguments against them and have been surprised at some of the scholarly deficiencies in authorship attribution studies. It appears that many attribution scholars are not weighing up other empirical evidence, nor considering the falsification conditions for their assumptions, nor considering the possibility that their results might be ambiguous or misleading. There has been considerable misuse of computational stylistics, and no acknowledgement of the inherent subjectivity involved in interpretations of Principal Component Analysis. Statistically significant findings have been concealed by scholars and editors who wish to add plays to modern editions of Shakespeare’s works. Most alarming of all, however, is the fact that the findings of scholars such as Charles Crawford, Philip Timberlake, Ants Oras and Paul V. Rubow, who provided the groundwork for many accepted attributions today, have been simply ignored. But not mentioning something, or paying something scant attention, does not erase it from history. The wheel of criticism will come full circle.

What is most refreshing about Martin Mueller’s database Shakespeare His Contemporaries, which consists of 548 plays dated between 1552 and 1662, is that we are afforded a corpus that has been created to facilitate the study of N-grams. It will produce objective, automated results, which will create, as Mueller puts it, ‘a framework of expectations’ within which the evidentiary value of shared N-grams can be evaluated. We now know for sure that ‘Roughly speaking, plays by the same author are likely to share twice as many dislegomena’ (that is, sequences of words that appear only twice in Mueller’s corpus) ‘as plays by different authors’ and that ‘it is quite rare for two plays–texts that are typically between 15,000 and 25,000 words long–to share more than one or two of the dislegomena analyzed here’. Mueller and his colleagues have developed an electronic corpus that could revolutionise not only attribution studies but early modern literary and textual studies as a whole. I have profited much from Mueller’s Excel document ‘SHCSharedTetragramsPlus’, which lists plays that share large numbers of unique tetragrams (contiguous sequences of four or more words that appear no more than twice within the corpus of 548 early modern plays) and is available at: https://scalablereading.northwestern.edu/category/shakespeare-his-contemporaries/
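To make the notion of a shared dislegomenon concrete, here is a minimal sketch in Python. It is not Mueller’s procedure: it assumes plain word forms rather than lemmata, fixes the sequence length at exactly four words, and counts overlapping tetragrams separately instead of merging them into longer sequences. The function names and the three toy ‘plays’ are invented for illustration.

```python
from collections import Counter, defaultdict
import re


def tetragrams(text, n=4):
    """Return the list of contiguous n-word sequences in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def shared_dislegomena(corpus, n=4):
    """Map each pair of plays to the n-grams that occur exactly twice
    in the whole corpus, once in each play of the pair."""
    total = Counter()              # occurrences across the corpus
    plays_with = defaultdict(set)  # which plays contain each n-gram
    for title, text in corpus.items():
        for gram in tetragrams(text, n):
            total[gram] += 1
            plays_with[gram].add(title)

    shared = defaultdict(list)
    for gram, count in total.items():
        # Exactly two occurrences, split across two different plays.
        if count == 2 and len(plays_with[gram]) == 2:
            pair = tuple(sorted(plays_with[gram]))
            shared[pair].append(" ".join(gram))
    return dict(shared)


# Invented toy "plays"; these are not quotations from any real text.
toy_corpus = {
    "Play A": "the moon is pale tonight and the river runs cold",
    "Play B": "she said the moon is pale tonight my lord",
    "Play C": "the river runs cold and deep",
}

result = shared_dislegomena(toy_corpus)
for pair, grams in result.items():
    print(pair, len(grams), grams)
```

In this toy corpus, Play A and Play B share two tetragrams that occur nowhere else, Play A and Play C share one, and Play B and Play C share none. On a real corpus the interesting step is the one Mueller describes: comparing each pair’s count against what same-author and different-author pairs typically share.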

So, what do Mueller’s results say about Vickers’ Kyd attributions? Well, let’s consider the fact that Shakespeare is the currently favoured candidate for the authorship of the anonymous Arden of Faversham, which has been attributed to Kyd since 1891. The play at the top of the list of matching N-grams is Kyd’s Soliman and Perseda (1588), with eighteen unique N-grams of four or more words. Given that ‘out of ~130,000 pairwise combinations of plays by (putatively) different authors, there are only 119 that share more than a dozen dislegomena of our type’, there is clearly some plausibility in Vickers’ ascription. (Not to mention Mueller’s other results for the top 1,500 play-pair combinations with the densest networks of N-grams, or his excellent work applying Discriminant Analysis to common lemma trigrams.) One cannot accuse Kyd of being entered into a one-horse race here. Kyd’s The Spanish Tragedy (1587) and Soliman and Perseda share eight unique N-grams of four or more words. On this measure, Arden of Faversham resembles Soliman and Perseda more closely (eighteen shared N-grams) than Kyd’s two traditionally accepted plays resemble each other (eight). What of King Leir? The old historical romance (if one can call it that) shares eleven word sequences with Arden and eight with Soliman and Perseda. Mueller notes that ‘If we look more closely at shared dislegomena by same-author play pairs, we discover that on average plays by the same author share five dislegomena, and the median is four. Roughly speaking, plays by the same author are likely to share twice as many dislegomena as plays by different authors’.

Among the list of texts sharing large numbers of unique word sequences with Arden of Faversham are three Shakespeare plays. Here the importance of chronology must be emphasised. Plays are not merely sources of data, as some attribution scholars treat them, but real documents produced in a historical context: the narrow and intensely competitive world of the London theatres. Playwrights were very aware of what rival companies were putting on, and some of them – including Shakespeare – were also actors, and would develop an intimate knowledge of texts in which they themselves had performed. The marvellous chronology currently being produced by Martin Wiggins (British Drama 1533-1642: A Catalogue) allows us to give more precise dating than has yet been available. Wiggins assigns Arden of Faversham to 1590, and thus, as Professor MacDonald P. Jackson has acknowledged, it antedated the whole of Shakespeare’s corpus. This fact enables us to see that the Shakespeare matches with Arden are indicative of its influence on him, rather than his authorship. Martin Mueller’s database shows that Richard III (1593) shares eight tetragrams, The Merchant of Venice (1597) shares seven tetragrams, and Troilus and Cressida (1602) also shares seven.

Mueller’s database records shared repetitions between The Spanish Tragedy and sixteen plays by different authors. This is hardly surprising, for the play was enormously popular and many of its phrases seem to have been embedded in the minds of Kyd’s contemporaries. Kyd’s tragedy was parodied by dramatists such as Nashe, Heywood, Marston, Dekker, Jonson, Field, Beaumont and Shirley. It is notable, however, that in a corpus of 548 plays, six of the sixteen plays that make the list of matches with The Spanish Tragedy are Shakespeare’s. These texts span Shakespeare’s entire career, right up to Henry VIII (1611).

Arthur Freeman observed that Kyd’s Soliman and Perseda ‘never attained the popularity of The Spanish Tragedy’, which ‘is evident, both from its scant printing history and the paucity of allusions to it in its own time’. Nevertheless, we find nine matches between the Turkish tragedy and Henry VI Part Three (1591), according to Mueller’s spreadsheet, and seven with The Two Gentlemen of Verona (1594). As Lukas Erne puts it: ‘Shakespeare, perhaps more than anyone else, seems to have specifically profited from Kyd’s works’.

Fascinatingly, Shakespeare His Contemporaries validates the claim made by scholars such as Hardin Craig, Thomas H. McNeal, Meredith Skura and Mueller himself, that King Leir exerted a considerable influence over Shakespeare’s drama. Shakespeare was borrowing phrases from the play from Henry VI Part Three until Much Ado About Nothing (1598). Whether his memory failed him due to the passage of time, or whether he made a conscious effort to avoid capping lines in his tragic version (Shakespeare utterly transformed the source play, as noted by Alfred Hart and Geoffrey Bullough), King Leir and King Lear (1605) do not make Mueller’s list. Mueller’s database thus poses some very interesting questions about Shakespeare’s habits of verbal borrowing…

Soliman and Perseda is assumed to have been printed shortly after it was registered in November 1592. King Leir doesn’t appear to have been printed until 1605. Did Shakespeare somehow have access to copies prior to publication? Had Shakespeare acted in these plays? Are these repetitions the result of Shakespeare’s capacious memory, which was certainly required as an actor-dramatist? Once again, Mueller’s results pose some fascinating questions. One thing we can be sure of, however: the repetitions of phrases from Arden of Faversham are hardly unusual in the context of Vickers’ ‘extended’ Kyd canon. If one were to collect the parallels with Leir (which had a far more pervasive influence over Shakespeare, with five Shakespeare plays providing large numbers of matches with it), throw in some impressionistic evaluations of certain passages, along with some misleading data drawn from assumptions about the distribution of linguistic items, it would not be too difficult to provide a superficially impressive case for Shakespeare’s part authorship.

I’ve scrutinised the overall patterns in these plays and found that, contrary to Arthur Kinney and Jackson’s ascriptions, there is no greater concentration of Shakespeare matches in the supposedly ‘Shakespearean’ scenes (scenes Four to Nine, the middle portion of the play, or Act Three in older editions) of Arden. Six of the eight matches with Richard III are in ‘non-Shakespeare’ scenes; six of the seven matches with The Merchant of Venice are in ‘non-Shakespeare’ scenes, and four of the seven matches with Troilus and Cressida are in scenes not ascribed to Shakespeare by Kinney or Jackson. Nor are the matches with the ‘Shakespeare’ scenes in Arden of a greater quality.

It is curious that Professor MacDonald P. Jackson has argued in his recent monograph Determining the Shakespeare Canon that there is a disparity in the quantity of verbal parallels between the ‘Shakespeare’ scenes of Arden and the rest of the play. In ‘New Research on the Dramatic Canon of Thomas Kyd’, Jackson was able to detect more unique triples between the domestic tragedy and Shakespeare’s Henry VI Part Two (1591) and The Taming of the Shrew (1592) than Vickers had (at that time) discovered with Kyd’s plays. Jackson lists (according to my count) forty unique parallels between Henry VI Part Two and scenes that he does not attribute to Shakespeare in Arden. He lists ten matches between Shakespeare’s play and the middle portion of the domestic tragedy. He also lists thirty-eight matches between The Taming of the Shrew and scenes outside of the middle portion of Arden, with just six matches between Shakespeare’s early comedy and the scenes he assigns to Shakespeare.

If we take Jackson’s figures and adjust them to overall word counts, we are given an average of 0.03 matches per hundred words between scenes Four to Nine of Arden and Henry VI Part Two (combining the overall word count for these scenes in Arden with the total word count for Shakespeare’s play gives us a total of 30972) and 0.09 per hundred words with the ‘non-Shakespeare’ scenes (which have a composite count of 42480 words). Similarly, The Taming of the Shrew averages 0.02 matches per hundred words with the middle portion of the play (with a combined total of 26720 words) and 0.10 with the ‘non-Shakespeare’ scenes (38228 words). According to Jackson’s own data, there is no disparity whatsoever in favour of the ‘Shakespeare’ scenes; if anything, the rate of matches is higher in the rest of the play.
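The arithmetic behind these averages can be checked in a few lines. The sketch below uses only the match counts and combined word counts quoted above, and assumes (an inference from the numbers, since the unit is not otherwise stated) that each average is the number of matches divided by the combined word count, expressed per hundred words. The dictionary keys and the function name are illustrative labels only.

```python
# (matches, combined word count) as quoted in the surrounding paragraphs
figures = {
    ("Henry VI Part Two", "scenes 4-9"): (10, 30972),
    ("Henry VI Part Two", "other scenes"): (40, 42480),
    ("The Taming of the Shrew", "scenes 4-9"): (6, 26720),
    ("The Taming of the Shrew", "other scenes"): (38, 38228),
}


def matches_per_hundred_words(matches, words):
    """Normalise a raw match count by the combined length of the texts."""
    return 100 * matches / words


rates = {
    key: round(matches_per_hundred_words(matches, words), 2)
    for key, (matches, words) in figures.items()
}

for (play, portion), rate in rates.items():
    print(f"{play}, Arden {portion}: {rate:.2f} matches per 100 words")
```

Read this way, the rounded rates reproduce the four averages given above, with the scenes outside the middle portion of Arden showing the higher rate in both comparisons.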

Judging by these patterns, one might hypothesise that either Shakespeare wrote the entire play, or he borrowed from it as a whole, just as he did with The Spanish Tragedy, Soliman and Perseda and King Leir.

Jackson has also argued in his monograph that matches with Shakespeare in Scene Six and Scene Eight of Arden must be the result of authorship, for he could not have acted in both. Conjecturing that Shakespeare had indeed acted in these plays (I suspect for Pembroke’s Men, as proposed by Halliwell-Phillipps during the nineteenth century), I recently attempted to assign certain roles to him, as part of a research project with Marcus Dahl and Lene B. Petersen. Interestingly, many of the matches with Soliman and Perseda suggest Shakespeare could have played Erastus, while the matches with King Leir suggest he played either Perillus or the Messenger. But Erastus and Perillus are principal roles, and the Messenger is probably the most memorable character in King Leir. We can’t rule out the possibility that Shakespeare had simply seen these plays during performance, and retained a great many lines via his aural memory. Significantly, many matches co-occur in scenes during which none of these characters are present on stage, so, once again, the distribution of parallels in Arden is hardly unusual.

Proper acknowledgement of Shakespeare’s verbal indebtedness to plays such as Soliman and Perseda and King Leir will take us one step closer to understanding Shakespeare’s dramatic development. In authorship attribution studies, there is much emphasis placed on authorial self-borrowing but, in my view, an author’s patterns of influence can also serve as useful authorship markers. One finds that dramatists such as Jonson and Shirley generally borrowed from particular contemporaries, while early Shakespeare shares many unique N-grams with Marlowe and Kyd. Older scholars recognised the potential for identifying authors according to their primary influences.

In 1903 Charles Crawford observed that ‘Boas and others have pointed out Kyd’s frequent imitations of John Lyly’, so ‘we should not be surprised to find Lyly’s similes and his Euphuistic mannerisms appearing also in Arden of Faversham’. Mueller’s document shows us that Arden shares seven unique phrases with Lyly’s Endymion, The Man in the Moon (1588), while King Leir also shares seven N-grams with Lyly’s play. Two of these borrowings from Lyly in Arden feature in the ‘Shakespeare’ portion of the play. However, we do not find a high quantity of matches between Shakespeare and Lyly’s comedy until Cymbeline (1609). It would be interesting to determine the probability that Shakespeare and a co-author were both recycling verbal details from Lyly’s comedy in their individual contributions to the domestic tragedy.

If Arden and King Leir are, as Vickers contends, both by Kyd, one might surmise that the dramatist was influenced by Endymion when he wrote King Leir, and that these influences, being at the forefront of his verbal memory, found their way into Arden of Faversham shortly afterwards.

Statistical analysis, like literary analysis, can aspire to objectivity, but it always relies upon an interpretative position. In this post I have discussed my observations but, of course, others might interpret the data differently. I am most grateful that I was invited to offer my thoughts for Mueller’s blog.

There have been considerable advancements in identifying authors through the development of electronic corpora, but I dare say we have much more to learn. Shakespeare His Contemporaries has wonderful potential for researchers, and will almost certainly contribute to a revaluation of early modern literary studies as a whole.

As for where Mueller’s database leaves many of the current consensuses in attribution studies, I am compelled to invoke the Roman poet Phaedrus:
Non semper ea sunt quae videntur… (Things are not always what they seem.)