I recently discovered Early Modern Print (‘discovered’ is perhaps the wrong word: I noticed Brodie Waddell and David Hitchcock talking about it on Facebook). This website provides easy-to-use programmes, including the EEBO Spelling Browser, for analysing text available from Early English Books Online and the related Text Creation Partnership. The Spelling Browser, as designer Anupam Basu explains in this post, measures the frequency of ‘n-grams’, which are ‘contiguous sequences of tokens’, i.e. words or letters, and can be set to different levels of complexity to search for phrases as well as individual words. Naturally, the first thing I did was drop in a search for the terms ‘seaman’, ‘seamen’, ‘mariner’ and ‘mariners’. The results surprised me.
In my Ph.D. thesis, I claimed – because I thought so at the time – that ‘mariner’ was a more technical term, appearing in legal documents like court records and wills alongside its Latin counterpart nauta, but that in general use ‘seaman’ was the more common word, summing up the key ideas of masculinity and separation from society ashore which characterised this stereotype. This was based, I admit, on a fairly impressionistic reading of the available evidence, especially printed sources like pamphlets, newsbooks, ballads, and government publications. I toyed with the idea of statistical analysis, perhaps using the ballads in the English Broadside Ballad Archive (and I wonder if integrating the text from that archive, or others like the Burney Newspapers Collection, would change these results), but I did not have the time to pursue what was, for that project, a pretty minor point. It seems that I was wrong. The graph shows that, for most of the sixteenth and seventeenth centuries, ‘mariner’ and ‘mariners’ were more commonly printed words – only towards the turn of the eighteenth century do ‘seaman’ and ‘seamen’ overtake them.
There are, as Basu explains in his post (and also in this one), some inherent difficulties in these graphs which mean the results cannot be taken at face value. They do not show the raw ‘hits’ for any given word, because the increase in printed material across the period would result in considerably higher numbers for later years that might dwarf and conceal the significance of earlier appearances. Instead, they show each word as a percentage of the total text for that year – none of these words rises above 0.00006 per cent of the printed words in any given year. The drawback is that a few appearances of a word in a year with a low number of publications can create a potentially unrepresentative spike. On this graph, the period roughly 1580-1600 shows a higher percentage for ‘mariners’ than the period after 1640, but the total number of texts before 1600 was much smaller than in the later period, as shown in this graph, also by Basu. Basu therefore emphasizes that it is always important to check the text behind these visualizations.
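The normalisation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the threshold, and all the counts are invented for the example, and it is not Basu's code or real EEBO-TCP data. It shows how a year with very little printed text can produce a higher percentage from fewer hits.

```python
from typing import Dict, List


def relative_frequency(hits: Dict[int, int], totals: Dict[int, int]) -> Dict[int, float]:
    """Express each year's hits as a percentage of all words printed that year."""
    return {
        year: 100.0 * hits.get(year, 0) / totals[year]
        for year in sorted(totals)
        if totals[year] > 0  # skip years with no text at all
    }


def sparse_years(totals: Dict[int, int], min_words: int) -> List[int]:
    """List years whose total word count is below a chosen threshold.

    Percentages for these years rest on little text, so a handful of
    hits can create an unrepresentative spike.
    """
    return [year for year in sorted(totals) if totals[year] < min_words]


# Invented figures, purely for illustration (not taken from EEBO or the TCP).
totals = {1590: 2_000_000, 1650: 40_000_000}  # words printed per year
hits = {1590: 1, 1650: 12}                    # appearances of one word

percentages = relative_frequency(hits, totals)
flagged = sparse_years(totals, min_words=5_000_000)

for year, pct in percentages.items():
    note = " (sparse year, check the texts)" if year in flagged else ""
    print(f"{year}: {pct:.5f} per cent{note}")
```

With these made-up numbers, the single hit in 1590 yields a higher percentage than the twelve hits in 1650, which is exactly the kind of result that needs checking against the underlying texts.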
There are deeper problems with the collections behind them, too. Variable early modern spellings, preserved in EEBO and the TCP, can make word-searches pretty complex. That is assuming the text is accurate: although the TCP text (‘transcribed by hand’, as its website says) is fairly reliable, the text-recognition in other collections could raise problems if this sort of technique were applied elsewhere (I have experienced problems text-searching in the Burney Newspapers, for example). More importantly, EEBO is not a complete collection of printed material from this period: the graph linked in the previous paragraph shows numbers both for EEBO and the English Short Title Catalogue, demonstrating just how small a proportion of early modern English texts is available on EEBO. Even if all of the contents of the ESTC (or any other catalogue) were uploaded and transcribed, there would still be the question of published works that have not survived in any library or private collection, the number of which we can only guess at. These caveats make the conclusions drawn from such graphs rather tentative – but in the absence of more complete evidence, they are probably the best we can achieve.
Which brings me, finally, to the problem of interpretation. What does this visualization mean, if it means anything at all? This graph might show quite a minor point, but then it depends on how much meaning you read into the use of different words. It certainly means I need to rethink my ideas – I have been using ‘seaman’ as a shorthand for the professional stereotype of seafarers, on the assumption that this was how contemporaries would have described them, but now it seems that was not the case. Explaining the trends revealed in the graph will require working between the statistics and the original documents. What sort of texts do these words appear in? Pamphlets, newsbooks, official proclamations? Who is using these words, and why? I would guess that the 1580-1600 rise is due to war between England and Spain, and that the shift from ‘mariner/mariners’ to ‘seaman/seamen’ at the end of the period is due to the increasingly public debates at that time about recruitment into the growing Royal Navy. When you plot ‘mariners’, ‘seamen’ and ‘navy’, the percentage for ‘navy’ (the green line on the graph) rises after 1620, and ‘seamen’ begins to rise a little later, with both continuing to increase from about 1680 onwards. What exactly the relationship is between these two linguistic changes, and whether they have any deeper significance, is something I will have to think about.