Recently, for the first time and with the help of patient friends (especially Simon Abernethy and Edmond Smith), I’ve been working on some research that involves quite a bit of statistical analysis. As a result I have started thinking more about applying this kind of approach to other problems, so when I came across the Data for Research (DfR) page of the journal website JSTOR, it seemed well worth investigating.
DfR allows you to run searches, as you would when looking for articles on the main JSTOR site, but as well as providing a list of search results, it then lets you visualise those results and, more usefully, export them in a variety of formats, depending on which search criteria you are most interested in. For example, setting the subject as ‘History’ and searching for ‘mariner’ (of course I’m going to search for ‘mariner’) produces this graph:
Exporting the data means you can then analyse it yourself in the format of your choice. I thought I’d try this out by looking at a specific issue of historical terminology: what exactly we should call the conflicts in Britain and Ireland during the 1640s and 1650s. For a long time, these were called ‘the English civil war’ or ‘the English revolution’, the latter carrying rather more political connotations. However, at the end of the twentieth century historians began to give more attention to the relations between the different kingdoms ruled by Charles I. War broke out in Scotland and Ireland before it began in England; Scottish and Irish soldiers fought in England; and after the conflict in England ended, the new parliamentarian regime invaded both Ireland and Scotland. ‘English civil war’ doesn’t really cut it as a description of all these events. Two terms have since emerged amongst historians favouring a broader understanding: ‘the British civil wars’, and the rather more unwieldy ‘wars of the three kingdoms’.
Searching for these terms using DfR, exporting the results in CSV format, and putting it all together in Excel gives you this graph (the series is only continuous from 1900 onwards):
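If you’d rather not do the combining step by hand in Excel, the same thing can be sketched in a few lines of Python. This is only a minimal illustration, not how the graph above was made, and it rests on an assumption about the export: that each term’s CSV has the year in the first column and the count in the second (the real DfR layout may differ, so check your files first). The names `read_year_counts` and `combine_terms` are hypothetical helpers. Years with no hits for a term are filled with zero so that the combined series is continuous from the chosen start year.

```python
import csv
import io


def read_year_counts(csv_text):
    """Parse one exported CSV into {year: count}.

    Assumption (hypothetical layout): year in the first column, count in
    the second. Header lines, blank lines and other non-numeric rows are skipped.
    """
    counts = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[0].strip().isdigit():
            continue  # skip headers and blank rows
        year = int(row[0])
        counts[year] = counts.get(year, 0) + int(row[1])
    return counts


def combine_terms(term_csvs, start_year=1900):
    """Merge per-term exports into one table: year -> {term: count}.

    Only years from start_year onwards are kept, and missing years are
    filled with 0 so every term has a value in every year.
    """
    per_term = {term: read_year_counts(text) for term, text in term_csvs.items()}
    later_years = [y for counts in per_term.values() for y in counts if y >= start_year]
    if not later_years:
        return {}
    return {
        year: {term: per_term[term].get(year, 0) for term in per_term}
        for year in range(start_year, max(later_years) + 1)
    }
```

You would pass in a dictionary mapping each search phrase (for example ‘English civil war’) to the text of its exported file; the result can then be written back out or plotted with whatever tool you prefer.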
You can see how the new phrases arrived on the scene in the 1990s but, it would appear, haven’t made all that much of a dent. Taking just the results from 1990 onwards shows this in a bit more detail:
Things can move pretty slowly in scholarship anyway, so it’s not all that surprising that the older and more familiar terms have remained in widespread use. There is no way to be certain about the most recent developments, either, because of the steep drop in JSTOR articles in the subject ‘History’ since 2006, which could be due to a delay in uploading new items or perhaps to journals moving away from JSTOR to other platforms. You can see the decline clearly in this graph (clicking on it will take you to the DfR search, where you can zoom in on the relevant years):
Nevertheless, it seems that the arguments by some historians for a ‘British’ interpretation of the mid-seventeenth century have not been widely accepted. If we use DfR to look at journals instead of change over time, the same impression emerges. The term ‘English civil war’ appeared in 164 different journals; ‘English revolution’ in 156; ‘British civil wars’ in 29; ‘wars of the three kingdoms’ in 24. This could simply reflect the relatively recent appearance of the new terms. The 24 journals in which ‘wars of the three kingdoms’ does appear include well-known history publications which regularly feature early modern essays, like Journal of British Studies, The American Historical Review, The English Historical Review, The Historical Journal, and Past & Present, but the two newer terms appear in very small numbers. Only in seven journals did they together make up more than 10% of the total appearances of all four terms. Four of these seven focus upon Scotland and Ireland – Archivium Hibernicum, Analecta Hibernica, History Ireland, The Scottish Historical Review – and the other three are The Journal of Military History, Transactions of the American Philosophical Society, and The International History Review. It looks like this new interpretation has gained most traction amongst those interested in Ireland and Scotland, has made a little headway with historians who write about the period, but has gone pretty much unnoticed by everyone else.
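The ‘more than 10%’ test is simple arithmetic: for each journal, add up the appearances of the two newer terms and divide by the appearances of all four terms. A minimal Python sketch of that calculation follows, assuming you have already tallied, for each journal, a count per search term from the DfR exports (however those counts are reported). The helper names `new_term_share` and `journals_above` are hypothetical, and this illustrates the idea rather than reproducing the exact procedure behind the figures above.

```python
# The two newer, 'British' terms discussed above.
NEW_TERMS = ("British civil wars", "wars of the three kingdoms")


def new_term_share(counts):
    """Fraction of all term appearances in one journal that use the newer terms.

    `counts` maps each search term to its number of appearances in that journal.
    A journal with no appearances at all gets a share of 0.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts.get(term, 0) for term in NEW_TERMS) / total


def journals_above(by_journal, threshold=0.10):
    """Sorted list of journals where the newer terms exceed `threshold` of the total."""
    return sorted(
        journal
        for journal, counts in by_journal.items()
        if new_term_share(counts) > threshold
    )
```

Note that the comparison is strictly ‘more than’, so a journal sitting at exactly 10% would not be counted.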
These conclusions are fairly predictable, so is this kind of approach all that useful, and do the results matter? Well, in this specific case it reminds us of the need to continue drawing attention to the British dimension of the mid-seventeenth century crisis – even amongst historians and academics, let alone in public discussion – and if you just read the specialist journals this broader context would be missing. More generally, ‘big data’ is something more and more historians are flirting with or at least talking about, and online tools like DfR and Early Modern Print (about which see this post) can only make it easier for those of us who aren’t statistically savvy or digital experts. Naturally, there are problems. Search results are confined to the digital corpus (although as JSTOR’s articles include book reviews, I reckon DfR is pretty good at tracking trends in what historians are writing about, at least in the languages featured on the site). More fundamentally, the frequency of certain phrases can only tell us so much about what scholars are researching. In other words, it’s a pretty blunt analysis. Still, as problematic as it is, I think it has a lot of potential as a complement to the close reading that is the conventional training of most history students.