We are proud to announce the launch of our prototype of the Hoccleve Lexicon (the HOCCLEX). This is a major milestone on our path to a critical edition, and an important moment in the long and tangled history of Hoccleve and the digital humanities. The new lexicon feature is based on the HOCCLEX files, a set of computer files created over three decades ago. For a bit of perspective, consider that the work stations for cutting edge humanities computing projects at their creation looked like this.

By Autopilot – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=39098108

First created in the early 1980’s as part of what was at the time a pioneering effort to bring computing power to humanist research, the HOCCLEX files are also a fascinating piece of the history of what we now call the Digital Humanities. The files were developed in the early 1980’s by Peter Farley, working under the auspices of an editorial team lead by D.C. Greetham. They contain semi-diplomatic transcriptions of the poetry found in Hoccleve’s three surviving holograph manuscripts (Huntington MS HM 111, Huntington MS HM 744, and Durham MS Cosin V.III.9). Each word in the transcription has been marked with a Middle English root form and tagged for grammatical and syntactical data including person, number, and part of speech. The original purpose of the HOCCLEX files was to create the raw data for a lexicon of the holograph manuscripts. Greetham proposed to use this lexicon to identify preferences in Hoccleve’s holograph manuscripts that could be used to normalize spelling variants and resolve accidentals in a critical edition of the Regiment of Princes, Hoccleve’s major poetic work. In a very real sense, the HOCCLEX files were the core of Greetham’s proposed edition, the key piece of the puzzle that he hoped would allow him to combine a Lachmannian base-text approach with a copy-text editorial approach. Moreover, they were the medium through which Hoccleve’s authorial intentions, as displayed in the holograph poems, could be discerned and transferred to the Regiment, which survives in numerous manuscripts, but not in Hoccleve’s hand.

Greetham’s work proceeded far enough for him to publish several articles outlining how the HOCCLEX files would be used, and to inform Charles Blyth’s 1999 TEAMS teaching-edition of the Regiment.


However, Greethams’ proposed critical edition failed to materialize. After 1999, the HOCCLEX files, their purpose seemingly spent, might easily have been lost. Fortunately, Blyth kept not only the computer files but a wealth of materials, including microfilmed copies of most of the Hoccleve manuscripts and over 6000 handwritten collation sheets. In 2009, Blyth donated these items to Elon Lang, and they now serve as the core archival resources of the Hoccleve Archives project.

Over the past several years, Student Innovation Fellows at GSU have been working to make the HOCCLEX files accessible to scholars and to recreate the Hoccleve Lexicon in a more robust and, most importantly, public form. Since the creation of the original files, their potential utility has been amplified by the subsequent coming of the digital age. Greetham’s original conception for the lexicon was to create a tool that would largely work behind the scenes to inform a printed critical edition. In contrast, our aim is to take advantage of the internet to bring the Lexicon to life in a digital format where it can serve as a public tool. As a web-based resource, the Lexicon, populated with data from the original HOCCLEX files, can function as a fully searchable and browseable research tool of use to Hoccleve scholars, students of the Middle English language in general, and serve as a key archival resource available to collaborators on the largest goal of the Hoccleve Archive, a digital critical edition of the Regiment.
A stand-alone interface with the Lexicon is yet to come, but the HOCCLEX files have now been integrated into our edition of the holograph poems. To use it, simply right-click on any word in any poem. This will open a dialogue box allowing you to search for instances of the word within the poem or across all the poems of the holograph manuscripts, and detailed grammatical information about the word. For instance, a search of the term “womman” will pull the following information.

The initial prototype will be limited in some respects – most notably, it will display an untranslated version of the grammatical mark-up, so parts of speech searches will unfortunately display in such user-friendly forms as ‘v1#adj%prp’ or ‘n[??FORM?CHK]’. We are still working to understand the nearly 250 different abbreviations for parts of speech contained in the HOCCLEX files. Some are easy enough to understand, but many are ambiguous ( ‘n#propn[NOT-MED]’), seem to be human errors ( ‘ende’) or are just plain head scratchers. Unless we can find the key to them, we may need the services of a seriously talented linguist/grammar nerd to help us replace the abbreviations with more user-friendly terminology. If anyone using the Lexicon has information that would help us identify some of the more obscure corners of this mark-up language, please contact Robin Wharton.

The lexicon will become more robust in the coming months. Even at this initial stage, however, it is remarkable step forward for a project that began over thirty years ago, and a reminder of how much the internet has changed the horizon of possibility for computing projects in the humanities.What began as an essentially private database, certainly one that would inform public documents and which gave rise to peer-reviewed articles, but which nevertheless stayed behind the scenes because there was no easy means to make it public, is now accessible to students and scholars around the globe.