Counting Historians

Recently, for the first time and with the help of patient friends (especially Simon Abernethy and Edmond Smith), I’ve been working on some research that involves quite a bit of statistical analysis. As a result I have started thinking more about this kind of approach to other problems and so, when I came across the Data for Research page of the journal website JSTOR, it seemed well worth investigating.

DfR allows you to run searches, as you would when looking for articles on the main JSTOR site, but as well as providing a list of search results, it then lets you visualise those results and, more usefully, export them in a variety of formats, depending on which search criteria you are most interested in. For example, setting the subject as ‘History’ and searching for ‘mariner’ (of course I’m going to search for ‘mariner’) produces this graph:

Appearance of ‘mariner’ in JSTOR Data for Research, subject ‘History’.

Exporting the data means you can then analyse it yourself in the format of your choice. I thought I’d try this out by looking at a specific issue of historical terminology: what exactly we should call the conflicts in Britain and Ireland during the 1640s and 1650s. For a long time, these conflicts were called ‘the English civil war’ or ‘the English revolution’, the latter carrying rather more political connotations. However, at the end of the twentieth century historians began to give more attention to the relations between the different kingdoms ruled by Charles I. War broke out in Scotland and Ireland before it began in England; Scottish and Irish soldiers fought in England; and after the conflict in England ended, the new parliamentarian regime invaded both Ireland and Scotland. ‘English civil war’ doesn’t really cut it to describe all of these events. Two terms have since emerged amongst historians favouring a broader understanding of these events: ‘the British civil wars’, and the rather less wieldy ‘wars of the three kingdoms’.

Searching for these terms using DfR, exporting the results in CSV format, and putting it all together in Excel gives you this graph (which is only a continuous series from 1900 onwards):

Appearance of terms in JSTOR Data for Research, subject ‘History’.
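For anyone who would rather script the combining step than do it in Excel, here is a minimal sketch in Python. It assumes each exported CSV has two columns named `year` and `count`; those column names, the function names, and the sample numbers are all made up for illustration, and the real DfR export layout may well differ.

```python
import csv
import io


def read_year_counts(csv_text):
    """Parse one export into {year: count}. Column names are an assumption."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["year"]): int(row["count"]) for row in reader}


def merge_term_counts(exports, start_year=None):
    """Combine several {term: csv_text} exports into one table keyed by year.

    A year missing from one term's export is filled with 0, so every term
    has a value for every year that appears in any of the exports.
    """
    per_term = {term: read_year_counts(text) for term, text in exports.items()}
    years = sorted({year for counts in per_term.values() for year in counts})
    if start_year is not None:
        years = [year for year in years if year >= start_year]
    return {
        year: {term: counts.get(year, 0) for term, counts in per_term.items()}
        for year in years
    }


# Invented numbers, purely to show the shape of the data.
SAMPLE_EXPORTS = {
    "English civil war": "year,count\n1989,12\n1990,15\n1991,14\n",
    "British civil wars": "year,count\n1990,1\n1991,3\n",
}

if __name__ == "__main__":
    table = merge_term_counts(SAMPLE_EXPORTS, start_year=1990)
    for year, counts in table.items():
        print(year, counts)
```

Filling gaps with zero is a choice rather than a given: it treats a year with no row as a year with no hits, which seems reasonable for search counts but is worth checking against the actual export.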

You can see how the new phrases arrived on the scene in the 1990s but, it would appear, haven’t made all that much of a dent. Taking just the results from 1990 onwards shows this in a bit more detail:

Appearance of terms since 1990 in JSTOR Data for Research, subject ‘History’.

Things can move pretty slowly in scholarship anyway, so it’s not all that surprising that the older and more familiar terms have remained in widespread use. There is no way to be certain about the most recent developments, either, because of the steep drop in JSTOR articles in the subject ‘History’ since 2006, which could be due to a delay in uploading new items or perhaps to journals moving away from JSTOR to other platforms. You can see the decline clearly in this graph (clicking on it will take you to the DfR search, where you can zoom in on the relevant years):

Articles since 2000 in JSTOR Data for Research, subject ‘History’.

Nevertheless, it seems that the arguments by some historians for a ‘British’ interpretation of the mid-seventeenth century have not been widely accepted. If we use DfR to look at journals instead of change over time, the same impression emerges. The term ‘English civil war’ appeared in 164 different journals; ‘English revolution’ in 156; ‘British civil wars’ in 29; ‘wars of the three kingdoms’ in 24. This could be due just to the relatively recent appearance of the new terms. The 24 journals in which ‘wars of the three kingdoms’ does appear include well-known history publications which regularly feature early modern essays, like Journal of British Studies, The American Historical Review, The English Historical Review, The Historical Journal, and Past & Present, but in these the two newer terms appear in very small numbers. In only seven journals did the two together make up more than 10% of the total appearances of all four terms. Four of these seven focus upon Scotland and Ireland – Archivium Hibernicum, Analecta Hibernica, History Ireland, The Scottish Historical Review – and the other three are The Journal of Military History, Transactions of the American Philosophical Society, and The International History Review. It looks like this new interpretation has gained most traction amongst those interested in Ireland and Scotland, has made a little headway with historians who write about the period, but has gone pretty much unnoticed by everyone else.
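The ‘more than 10%’ figure is just a per-journal share: the appearances of the two newer terms divided by the appearances of all four. A rough sketch of that calculation in Python follows; the journal names and counts are invented placeholders, not the DfR results, and the function names are my own.

```python
NEW_TERMS = ("British civil wars", "wars of the three kingdoms")


def newer_term_share(counts, new_terms=NEW_TERMS):
    """Fraction of a journal's total appearances that use the newer terms."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts.get(term, 0) for term in new_terms) / total


def journals_above(journal_counts, threshold=0.10):
    """Names of journals where the newer terms exceed the threshold share."""
    return sorted(
        name
        for name, counts in journal_counts.items()
        if newer_term_share(counts) > threshold
    )


# Placeholder journals with invented counts, for illustration only.
SAMPLE_JOURNALS = {
    "Journal A": {
        "English civil war": 40,
        "English revolution": 30,
        "British civil wars": 2,
        "wars of the three kingdoms": 1,
    },
    "Journal B": {
        "English civil war": 5,
        "English revolution": 2,
        "British civil wars": 2,
        "wars of the three kingdoms": 1,
    },
}

if __name__ == "__main__":
    print(journals_above(SAMPLE_JOURNALS))
```

In the sample, Journal A uses the newer terms in 3 of 73 appearances and Journal B in 3 of 10, so only Journal B clears the threshold.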

These conclusions are fairly predictable, so is this kind of approach all that useful, and do the results matter? Well, in this specific case it reminds us of the need to continue drawing attention to the British dimension of the mid-seventeenth-century crisis – even amongst historians and academics, let alone in public discussion – and if you just read the specialist journals this broader context would be missing. Generally, ‘big data’ is something more and more historians are flirting with or at least talking about, and online tools like DfR and Early Modern Print (about which see this post) can only make it easier to do for those of us who aren’t statistically savvy or digital experts. Naturally, there are problems. Search results are confined to the digital corpus (although as JSTOR’s articles include book reviews, I reckon DfR is pretty good at tracking trends in what historians are writing about, at least in the languages featured on the site). More fundamentally, the frequency of certain phrases can only tell us so much about what scholars are researching. In other words, it’s a pretty blunt analysis. Still, as problematic as it is, I think it has a lot of potential as a complementary alternative to the close reading that is the conventional training of most history students.


5 thoughts on “Counting Historians”

  1. Hello – I’m Kristen Garlock, part of the Data for Research team at JSTOR. It’s great to see use cases for DfR, and we really enjoyed your post. I can shed a little more light on the drop off of articles after 2006. There are a few things that can impact the availability of more recent data. The biggest factor is the JSTOR “moving wall.” Since JSTOR began as an archival project to digitize journal backfiles, it’s always had a concept of a gap between where our coverage ends and the date of the most recent issue. This can be anywhere from 0-7 years, but most gaps are in the 2-5 year range. In these cases, we add another year of content to the backfile in January and the available content increases.

    However, it is also the case that the content on DfR lags a bit behind the content on the main platform. Right now, it’s several months behind. Only a handful of journals have ceased participation in JSTOR over the last 20 years, so that’s unlikely to have a significant impact on the DfR data. Since the addition of the Current Scholarship program, we have actually been adding more current issues (so far, for about 300 journals out of 2,000) and that content can be explored on DfR.

    I hope this is helpful. Just let us know if you have any questions. We are actually preparing for an overhaul of DfR infrastructure and interface beginning later this year, and are eager for feedback to incorporate into our planning work. One of the things we hope to do is to reduce or eliminate the lag time in updating the content available via DfR.

    • Hi Kristen, thanks for the comment! My wondering about journals leaving JSTOR was pure speculation, so it’s encouraging to hear that’s not the case.

      I do have one thought on DfR; perhaps I just failed to figure this out, but I couldn’t run different search terms simultaneously to compare with one another (as you can in the Google Ngram viewer) or set more than one journal in a search. Obviously, you can export the data and make those comparisons, but it would speed things up if there were some comparative function on the site. If there is and I missed it, then forgive me!

  2. That’s correct; unfortunately, DfR doesn’t support that type of visualization at the moment. I sent a note to the developer and we’ll be sure to track this in our list of improvements for 2015. Thanks for the feedback!

  3. I’m a bit late to this, but thanks for bringing the JSTOR data tool to my attention. I hadn’t come across it before and it is great fun. As you say, it doesn’t help much for the most recent years, but it is interesting for getting a sense of 20th century trends.

    The lack of use of anything except ‘English civil war’ and ‘English revolution’ is a bit surprising: to tell you the truth I’m surprised ‘English civil war’ could even make it through peer-review these days. I’m a little disappointed that ‘Great Rebellion’ turns up so rarely.

    I tried a couple queries that seem to reflect historiographical trends rather well. In history journals, ‘working-class’ rises steadily from 1945 before dropping precipitously in the mid-1990s. I wonder how much of that is due to the fall of the USSR and how much due to the rise of the culture wars? On the other hand, the rise of ‘the people’ seems to begin at about the same time and doesn’t suffer the same drop in the 1990s.

    • Thanks Brodie! Intriguing to see ‘working-class’ disappearing in the 1990s – I wonder if the same thing happened with ‘middle-’ and ‘upper-class’ (and how they compare with the trajectories of terms like ‘plebeians’, ‘proletariat’, ‘bourgeois’, ‘aristocracy’ and so on)?

      The resilience of the ‘English civil war’ is indeed disappointing, and even if we were to suppose a more radical shift hidden in the lack of data for the last four years, it would still be pretty late in coming. It would be interesting to break the stats down even further and see whether the terms appear in book reviews or articles. For example, three relatively recent books by Mike Braddick (2008), John Miller and Blair Worden (both 2009) all have an English focus and title, although they also use the plural ‘wars’. Miller wrote ‘I concentrate unashamedly on England’, and in all three the ‘British’ dimension is introduced only where it bears upon England (though it appears more in Braddick’s than the other two).

      These three were all aimed at bridging academic/popular markets – both Miller’s and Worden’s are relatively brief, introductory overviews – so it’s not surprising that they or their publishers went with phrases that are familiar to more people. They may not actually represent academic attitudes, and perhaps the new terminology has broken through in articles but not in book titles or book reviews. Yet the example does highlight the difficulty in shifting such embedded ideas. A small group of specialists may have enthusiastically jumped onto the new bandwagon, but it seems to be passing by everyone else.

      One thing that emerges from all of this is that once you start playing with statistics in this way, there’s always another query that springs to mind…
