Historyonics: 2012

Preface

The talk that forms the basis for this post was written for the annual Gerald Aylmer seminar run by the Royal Historical Society and the National Archives, and was delivered on 29 February 2012. The day was given over to a series of great projects, most of which came out of historical geography, and I was charged with providing a capstone to the event, and presenting a more general overview of the relationship between history and geography. There was a good audience of academics, archivists and librarians, all with a strong digital cast of mind and I very much enjoyed it. Unusually, I am also pretty sure I still agree with the majority of it even some six months after I sat down to write it. At the same time, I have put off posting it until now because it just did not quite feel like a blog post - too much history, too much text, too much of an internal discussion among academics. I have also recently found myself wary of blogging, having discovered (who knew?) that blogs form a sort of publication that people occasionally read. But, the work of people like Andrew Prescott has also reminded me of just how important it is to continue having the discussion. In the nature of a public talk, the text is rather informal, the notations slapdash, and the links non-existent.

Place and the Intellectual Politics of the Past

Currently there is a rather wonderful rapprochement between historical geographers and historians, with archivists and librarians (as usual) providing the meat, gristle and spicy practical critique. This is brilliant. These are cognate disciplines which need to be in constant dialogue. The habits of mind and analytical tools of geographers need to inform our understanding of the past; while the mental tics of the historian, and the authority of history as a literary genre, are necessary tools for communicating all kinds of memory to a wider audience.

Frustratingly, over the last half century, this cross-fertilisation hasn’t always flourished. Geography Departments (and even historical geographers) have not done a lot of talking to history departments, and vice versa. Many geographers, through the 70s and 80s, in particular, became ever more engrossed in the technical manifestations of their field; while many historians have fallen for the joys of the linguistic turn.

In part, this has been about funding and the structures of higher education. In the great taxonomy of knowledge inherent in the creation of the notion of STEM subjects vs the Humanities, geographers naturally gravitated towards the areas with more secure funding. Historians, meanwhile, frustrated by the changing politics of their field – the collapse of ‘modernity’ as a form of historical explanation (both right looking and left looking) – pursued theory, gender and language, to the exclusion of positivist measurable change. They chose places of endless debate and disagreement – usually in the form of various versions of identity politics – as a route to a politicised audience. Similarly, whereas historical geographers looked to Europe and a thriving institutional base, historians tended more frequently to look westward to North America, where historical geography is almost unknown – or at least denied the security of an extensive network of independent academic departments.

All of which is simply to say, that we are confronted with two fields that should be in constant dialogue, but which simply have not passed much more than the odd civil word in the last few decades. In the process, they have developed different technologies of knowing and different systems of training and analysis. It strikes me that an event celebrating Gerald Aylmer’s cross genre engagement with history and archives, with the structures of the archive, and the stories that can be told with them, is just the right place to bring these disciplines back onto the same map, or at least to reconsider where in a rapidly changing technical environment, that necessary dialogue might take place.

Of course, for most of the last decade or so we have had the ‘spatial turn’ in history; and for longer than that, the creation of a post-modern geography. Historians have struggled to define ‘space’ and context in ever more material (if still rather flabby) terms; while some historical geographers have taken theories of discourse and language seriously and extended them to the clear air enclosed by the mind-forged boundaries symbolically represented on every map.

But this has been little more than a casual rapprochement, driven largely by academic fashion, and it has not fundamentally changed the centre ground of either discipline. Most historians still trade in text mediated by uncertainty and theory; while historical geographers strive to tie data to a knowable and certain fragment of the world’s surface.

What I want to suggest today, is that something rather more profound than the ‘spatial turn’ is also happening in the background, and that it promises to force these disciplines (and several others) back into a more direct relationship.

And this change, this possibility, is being driven by technology; both in the form of the ‘infinite archive’ – the Western Text Archive second edition; and also through the direct public access to a newly usable online version of GIS-like tools, in the form of Google Maps and its many imitators.

To deal with historians and the infinite archive first - I don’t think that historians have quite twigged it yet – though librarians and archivists certainly have - but the rise of the ‘infinite archive’ has fundamentally changed the nature of text. It has turned text into ‘data’, with profound implications for how we read it, and deploy it as evidence. My guesstimate is that between fifty and sixty per cent of all non-serial publications in English produced between Caxton and 1923 – between the first English press and Mickey Mouse – has been digitised to one standard or another; with a smaller percentage of serial materials thrown in.

This has ensured that the standard of scholarship has in many ways improved. It is now possible to consult a wider body of literature before setting out to analyse it. But, it has also pushed us to the point where it is no longer feasible to read all the material you might want to consult in a classic immersive fashion. Instead we are moving towards what Franco Moretti has dubbed ‘distant reading’, and towards the development of new methodologies for ‘text mining’ – or the statistical analysis of large bodies of text/data. Stephen Ramsay’s new book, Reading Machines, illustrates four or five examples of what he describes as a new form of ‘Algorithmic Literary Criticism’, but is simply a taster for a wider series of practical methodologies. That Tony Grafton, president of the American Historical Association, could recently and hyperbolically claim that textmining was simply the ‘future’ of history, and that it was already here, reflects a truth most digital humanists (whose ranks are dominated by librarians and archivists) have been struggling with for the last three or four years; but which most historians are only now becoming aware of.

The Google Ngram viewer, which allows you to rapidly chart the changing use of words and phrases as a percentage of the total published per year is just the most high-profile online tool in a wider technological landscape. The associated, and ill-named, ‘culturomics’ movement being built on the back of the ngram viewer is another. I love the ngram viewer, and spend my Sundays charting the changing use of dirty words decade by decade. But it also forms the basis for a newly statistical approach to language. And lest we forget, recorded language is the only evidence most historians use.
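The underlying measure is simple enough to sketch. What follows is a minimal illustration, not the Ngram viewer's own method: it counts a word as a percentage of all the words published in each year. The function name `yearly_share` and the miniature corpus are invented for the example.

```python
from collections import Counter
import re

def yearly_share(docs, term):
    """Return {year: occurrences of term per 100 words published that year}."""
    totals = Counter()  # all words seen per year
    hits = Counter()    # occurrences of the term per year
    for year, text in docs:
        words = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(words)
        hits[year] += words.count(term.lower())
    return {y: 100.0 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}

# Hypothetical miniature corpus: (year, text) pairs invented for illustration.
corpus = [
    (1800, "he went in and sat inside the house"),
    (1800, "she stood outside the door"),
    (1850, "outside the gate and outside the wall"),
]
shares = yearly_share(corpus, "outside")
for year, share in shares.items():
    print(year, round(share, 2))
```

Dividing by the yearly total is what makes decades comparable: a raw count would rise simply because more was printed.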

I am not entirely convinced by the culturomics work, which focuses on describing social and linguistic change through consistent mathematical formulae, but in related studies by people such as Ben Schmidt, Tim Sherratt and Rob Newman, one can find the beginnings of a pioneering analysis of large scale texts that promises to remodel how we understand cultural change, and the relative influence of events.

These graphs simply illustrate that the word ‘outside’ both grew in commonality over the course of the nineteenth century (perhaps understandably as people’s lives migrated inside); and that if this phenomenon were a reflection of naturally evolving language (embedded in people’s vocabulary in youth), its adoption according to the age of the authors whose work has been published, would look like the first graph; but that in fact it looks like the second. In other words, these visualisations created by Schmidt suggest a history of the adoption of the use of the word ‘outside’ in response to events; and in the process give us a way of measuring the cultural impact of specific happenings – its import as measured by individual responses to them.

Or look at Tim Sherratt’s visualisations of the use of the terms ‘Great War’ and ‘First World War’ in 20th century Australian newspapers. While entirely commonsensical, the detailed results of the 1940s in particular mark out the evidence for a month by month reaction to events; allowing both more directed immersive reading (drilling down to the finest detail), and a secure characterisation of large scale collections.

Or to bring this back to a British perspective, we can look at work Bill Turkel and I have done on the Old Bailey Online – simply charting the distribution of the 125 million words reported in 197,000 trials, to analyse both the nature of the Old Bailey Proceedings as a publication, and their relationship to words spoken in court – in this instance to illustrate, among a few other things, that serious crimes like killing were more fully reported than others in the 18th century proceedings, and that this pattern changed in the 19th century.

This stuff works and is important; and will necessarily form a standard component of the research of anyone who claims to understand the past through reading.

But it points up a further issue. If each paragraph in the infinite archive, all the trillions of words, is simply a collection of data, it immediately becomes something that can be tied to a series of other things – to any other bit of data. A name, a date, a selection of words, or a phrase, or most importantly in this context, a place – defined as a polygon on the surface of the earth. In other words, the texts that form the basis for western history can now be geo-referenced and tied directly to a historical/geographical understanding of spatial distribution, which can in turn be cross analysed with any other series of measures of text – textmining makes text available for embedding within a geographical frame.

I can’t emphasise this enough: the creation of a digital edition of the western print archive means that it can be collated against all the other datasets we possess. The technology of words, and how we engage with them, has changed; creating a new world of analysis. With a bit of Natural Language Processing, and XML tagging; and a shed load of careful work, a component of text that hitherto has been restricted to human understanding becomes subject to precise definition: ‘he walked for twenty minutes from St Paul’s westward, coming first to Covent Garden, and then onwards to Trafalgar Square’, changes from a complex narrative statement reflecting an individual’s experience, into three individual locations, a journey’s route, and a rate of travel; each capable of being expressed as a polygon, a line, a formula – a specific bit of translatable data.
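To make that translation concrete, here is a rough sketch of the walking sentence turned into three locations, a route and a rate of travel. Everything in it is an assumption made for illustration: the gazetteer coordinates are approximate modern positions, rounded by hand, each place is reduced to a point rather than a polygon, and the ‘twenty minutes’ is entered by hand rather than parsed. Real work would need Natural Language Processing and a historical gazetteer.

```python
import math

# Hypothetical gazetteer: approximate modern (lat, lon) values, for illustration only.
GAZETTEER = {
    "St Paul's": (51.5138, -0.0984),
    "Covent Garden": (51.5117, -0.1240),
    "Trafalgar Square": (51.5080, -0.1281),
}

def find_places(sentence, gazetteer):
    """Return gazetteer names in the order they first occur in the sentence."""
    found = [(sentence.find(name), name) for name in gazetteer if name in sentence]
    return [name for _, name in sorted(found)]

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

sentence = ("he walked for twenty minutes from St Paul's westward, coming first "
            "to Covent Garden, and then onwards to Trafalgar Square")

stops = find_places(sentence, GAZETTEER)      # the three locations
route = [GAZETTEER[name] for name in stops]   # the journey as a line
length_km = sum(haversine_km(p, q) for p, q in zip(route, route[1:]))
minutes = 20                                  # assumed: read off "twenty minutes" by hand
speed_kmh = length_km / (minutes / 60)        # the rate of travel
print(stops, round(length_km, 2), round(speed_kmh, 1))
```

The straight-line distances understate the real walk, which followed streets, so the rate is a lower bound.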

What I want to say next might not sound quite right in this context. But, I am hoping this development of text as data, and by extension, text tied to place, will have a more profound impact on our understanding of the past precisely because, for the most part, it has not emerged from historical geography. It has been driven by people interested first in text, and only then, in data such as place.

As a long-time admirer of the work of historical geographers, and avid reader of it; I believe that the rise of a highly sophisticated form of desktop GIS, requiring substantial training and expertise to make work, has contributed to the evolution of a widening gap between disciplines, and has in some ways distanced historical geographers from the kind of audience historians have traditionally courted. The rise of the geographer as ‘expert’ has been both impressive and excluding.

But, in the last few years a real alternative has emerged. I understand geographers are sniffy about it, and I know full well that it doesn’t provide the kind of powerful analytical environment that a fully functioning GIS Editor, Analyst and Viewer package can generate in combination with a Spatial database management system. But it is usable and it is continually getting better. And I mean Open Street Map, Google Earth and Google Maps, and the range of open source browser side services that build on them, like BatchGeo.

Together they make available to everyone a good and growing proportion of the tools previously only available to a technocratic elite. In the process, and in combination with the transition of text into data, we are suddenly in a position to do something different.

My favourite exploration of what can actually be done online in an intuitive and accessible way, is Richard Rodger’s collaborative project with the National Library of Scotland: Visualising Urban Geographies, and the associated Addressing History sites:

The important thing about these projects is that they allow a wider audience to use historical maps in the way a historical geographer would, and to upload their own KML files, and to explore the data, and relate it to a modern map. It is historical geography made user friendly. And that is important.

I am also a great fan of the New York Public Library Labs’ Map Rectifier project, which crowdsources the kind of warping of maps that people just could not previously do.

And more recently, the British Library’s adaptation of the same methodology to their own map collections.

As much as anything projects like these educate a wider public (including historians) about the methodologies and issues traditionally faced by historical geographers, and generally hidden behind a beautifully presented set of final maps designed to make a point, rather than to allow an intellectual journey. These sites form open invitations to discover all the issues associated with the underlying maps and all the problems with the data.

Together, text as data and user friendly GIS make it newly possible to imagine an environment in which geographical information, and display, form a natural and unproblematic component of every other analytical process. It makes possible a situation in which historians cease to be mere text merchants, obsessed with the perfect quote, and compelling (if largely un-evidenced) argument; and where geographers have a new access to the subtle mappings of the marks of ‘culture’ in its broadest sense – a new way of thinking about the geographical distribution of behaviours and ideas, that bring within a geographical fold questions traditionally preserved for others.

By extension, I want to suggest that it is very much the moment for a bonfire of the disciplines, and that while history and geography can now begin to speak in new terms, the same forces are also making it possible for literature, and art history; for all the disciplines of memory and explanation, to speak in new ways to each other. We quite suddenly share a new culture of data – and data can be translated.

I will return to both these developments in a minute. But, by way of illustrating the kinds of things that we are now able to do as a result of this newly open and analytical framework – through the mash-up of text and space - I want to spend a little time discussing a project that Bob Shoemaker, Matthew Davies and I, and a large team of other people, recently completed, called Locating London’s Past. Please excuse me for spending the next couple of minutes on something that sounds a little bit too much like ‘me and my database’ to be entirely appropriate.

In itself, Locating London’s Past is not particularly important, but it illustrates one naïve attempt to play with these new possibilities; to take text/data and accessible online GIS, and make something that facilitates mapping words.

This project grew from the rich soil that is failure. Five or six years ago, as a final component of the original Old Bailey project, we struggled to incorporate a mapping feature on to the site that could be delivered online. But in the last few years, we realised something was changing; and inspired in particular by the Edinburgh project, we decided to try again. The outcome - Locating London’s Past – does three things that are new. First, it makes available a fully rasterised and warped version of both John Rocque’s 1746 map of London; and the first ‘accurate’ OS map of the capital created between 1869 and 1880 – both of which have been fully ‘polygonised’, and related to a modern Google Maps representation of London. Second, it brings together around 40 million words of text, and a raft of established datasets – a couple of hundred million lines of data - in a newly geo-coded form that can be ‘mapped’ against both area and local population, at the level of streets, parishes and wards. And finally, it relates both these resources to the first comprehensive, parish level population estimates for the 18th century.

In the process it brings together text and maps in a new way, delivered in a cut down Google maps container that even a historian can understand. For the maps and GIS, we turned to Peter Rauxloh of the Museum of London Archaeological Service (MOLA), who worked with scans and an index of place names drawn from Rocque’s map created by Patrick Mannix, to develop the kind of resource that underpins the best sort of traditional desk bound GIS project.

The 24 sheets of the original map were turned into a single image, and then warped onto the first reliable Ordnance Survey map from 1869-1880, creating a direct geo-referenced relationship between the first accurate modern representation of London and Rocque’s eighteenth-century version.

Rocque 1746 After Georeferencing.

The geo-referencing operation involved identifying some 48 common points between Rocque's original map and a modern OS map; leaving us the task of defining all the streets, courts, parishes and wards that made up 18th century London. In the end, this amounted to some 29,000 separate defined polygons.
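For readers who have never warped a map, the principle behind matching common points can be sketched in a few lines. This is not the method used for Rocque's map, which is not described here; it is a stand-in that fits the simplest possible transform (an affine one) to a handful of invented control points by least squares. Real georeferencing of an eighteenth-century survey would normally need a more flexible warp and far more points.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def fit_affine(src, dst):
    """Least-squares affine transform taking src (scan) points onto dst (map) points."""
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations: (A^T A) p = A^T b, solved once per target coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in range(2):  # k=0 fits the target x coordinate, k=1 the target y
        atb = [sum(r[i] * d[k] for r, d in zip(rows, dst)) for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs

def apply_affine(coeffs, point):
    """Map a scan position to map coordinates using the fitted coefficients."""
    x, y = point
    return tuple(a * x + b * y + c for a, b, c in coeffs)

# Hypothetical control points: pixel positions on a scan and matching map coordinates.
pixels = [(0, 0), (100, 0), (0, 100), (100, 100)]
targets = [(500, 1000), (700, 1000), (500, 800), (700, 800)]
warp = fit_affine(pixels, targets)
centre = apply_affine(warp, (50, 50))
print(centre)
```

With more control points than unknowns, the fit averages out small errors in any one point, which is why dozens of common points are worth the labour.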

Parish boundaries which intersect with the street of Cheapside, London.

Completed street network for main area covered by Rocque's map.

Street lines expanded to polygons based on recorded width.

All of which gave us something rather cool – a proper, interactive and accurate map of 18th century London, that among a lot else, lets you go from here:

To here:

And more importantly, lets you go here – all those parishes securely defined:

And all those streets and cul-de-sacs:

In the process it makes each parish and street, ward and cul-de-sac newly available as an analytical category – defined as a specific area, and location – defined in terms of its distance from any other place, and the route between them, its size and importance in a hierarchy of streets – and defined securely against the earth’s surface.

All of which left us with just one more task – the, to us, more familiar job of providing the text/data to put into these analytical polygons.

And for that, we brought in the material available from the Old Bailey Online for 18th century crime, from London Lives, fire insurance records, voting records for Westminster, Hearth Tax returns and plague deaths; and finally a bunch of archaeological material from MOLA.

Along with the more structured data, we ended up with a couple of hundred million words of text, primarily reflecting crime and events – descriptions of behaviours given under oath to magistrates, in court, at sessions and before a coroner; which we then processed using a combination of automated methodologies, including Natural Language Processing, and manual checking, to identify some 4.9 million place-name instances, each tied to its own polygon.

All of this data was then made available for search and mapping – including both structured and keyword searches – so both the ability to search on the crime of ‘murder’, and the word: teapot.
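As a rough illustration of what a keyword search mapped against population involves, the sketch below counts the records in each parish that mention a word and then expresses the count per thousand inhabitants. It is not the site's own code; the records, parish assignments and population figures are all invented for the example.

```python
from collections import Counter

def hits_by_place(records, keywords):
    """Count records per place whose text contains any of the keywords."""
    keys = [k.lower() for k in keywords]
    counts = Counter()
    for place, text in records:
        lowered = text.lower()
        if any(k in lowered for k in keys):
            counts[place] += 1
    return counts

def per_thousand(counts, population):
    """Express raw counts as a rate per 1,000 inhabitants of each place."""
    return {p: 1000.0 * counts[p] / population[p] for p in population if population[p]}

# Hypothetical geo-coded records: (parish, text) pairs invented for illustration.
records = [
    ("St Bride", "stole a silver teapot from the shop"),
    ("St Bride", "a watch taken in Fleet Street"),
    ("St Giles", "one teapot and two spoons"),
    ("St Giles", "a Teapot, value five shillings"),
]
population = {"St Bride": 4000, "St Giles": 20000}  # invented figures

counts = hits_by_place(records, ["teapot"])
rates = per_thousand(counts, population)
print(dict(counts), rates)
```

The normalisation matters: in this toy example the parish with more mentions has the lower rate once its larger population is taken into account.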

Inevitably, there are problems with the data and the map; but, it nevertheless allows us to map things like the number of small houses in the 1690s – defined as having one or two hearths as recorded in the Hearth Tax Returns.

Or more contentiously, to map the distribution of suicide cases in the coroners’ inquests found by a keyword search of 5000 inquests on ‘felo’ – as in felo de se – and ‘suicide’:

Or the distribution of the mention of a horse, mare or gelding, in the Old Bailey.

We have not even begun to explore what the data tells us, nor was this created in the expectation that we would be able to do so – but the important thing is that it does allow us and everyone else, to explore this material in a new way – and do so quickly enough to facilitate the testing of new hypotheses, and random midnight thoughts. And to quickly test words against spaces, text/data against spatial data.

Of course, all of this is contentious, and I suspect will leave historical geographers rather dissatisfied. The original data is variable, the percentage securely geo-referenced is inconsistent and I am waiting for a few proper demographers to critique the population figures. As worrying, the data is not currently available for the more subtle analytical approaches that have been so fruitful in historical geography. We can’t easily define networks, for instance. In other words, this is a rough starting point, and all the critical skills of a true sceptic are needed when using it. But this site does allow us to play with all this data in a new way, and to come up with insights and hypotheses for further investigation. To, for instance, map all the instances of the words for the industrial colours of ‘blue, red and yellow’ against the natural hues of ‘brown and green’ to explore an urban environment and to suggest different ways of thinking about a wider cityscape.

It also means that we have 40 million words of geo-referenced text that we can use as the basis for a new kind of text mining – that incorporates space with linguistic change; and which will add to the geographers’ toolbox all the rather wonderful methodologies of corpus linguistics: Measures of Text Frequency and Topic Modelling to name just two.

We are, of course, nowhere near where we really want to be. For that, we will need to have a lot more text, and a lot more subtlety. I want to be able to map all the places in a newspaper by subject and category of article – to have a scrolling representation of places mentioned in a text as I read (either immersively or distantly). I want to be able to use corpus linguistics, semantic search and syntactic analysis (ontologies and all the methodologies designed for text/data) in combination with both secure place name data, and historically sensitive boundary and population data. There is not much point in comparing 18th century text with the modern road network or county boundaries; or wondering why there is not a lot of text coming from Greenland in the absence of population density figures. I want to be able to map networks defined by individuals, defined in turn by the words they use; and networks defined by geographical measures such as road width. What percentage of London was made up of parkland at different stages? And what words and crimes are dominant in those different parks (Hyde Park vs Moorfields?). And as importantly, I want to be able to test the results against secure measures of statistical significance.

All of the components to make this happen are in place – we all now work with data and data is interchangeable – subject to unending automated translation - making the main technical hurdle essentially unproblematic. But there is still a long way to go. And there is also a clear and present danger in the process. And it is that danger, that I now want to turn to.

I spent last year co-directing one geographical project – Locating London’s Past - and one text mining project – Datamining with Criminal Intent. Both projects were intellectually engaging beyond measure. I learned more about history – and sources I already thought I knew well - doing something else with them, than I could possibly have done in a year of reading. But I also found myself struggling against the run of the data I was helping to produce. I came to count myself among those who Lewis Mumford had in mind in 1962 when he warned urban geographers that:

‘… minds unduly fascinated by computers carefully confine themselves to asking only the kind of question that computers can answer and are completely negligent of the human contents or the human results.’ Lewis Mumford, “The Sky Line "Mother Jacobs Home Remedies",” The New Yorker, December 1, 1962, p. 148

In other words, I found myself limping uncomfortably towards a positivist abstraction in which there were few people, but much data; beautiful graphs and compelling trends, but few of the moments of empathetic engagement that make history so powerful and which form a little discussed component of its authority as a genre of literature.

So, at this point, having waxed on the joys of technology and what it allows you to do, I want to stand back for a minute and remember the individual in the landscape. And in this instance, just one individual – a man named Charles McGee or Mckay, who stood just here for over forty years, from at least 1809, until his death in 1854; making a living as a one-eyed crossing sweeper – a black Jamaican refugee from Britain’s wars of colonial expansion:

Or to put it differently, he stood just here, on a map created just before he arrived:

Or, for a map that should have included him, here:

Or if we want to get down to street level, just here – the obelisk he stood in front of, itself visible on the map:

MacKay became a part of the image of this cityscape almost immediately. William Bennet was the first to record his presence, placing him before the obelisk dedicated to John Wilkes that stood at the top of New Bridge Street:

He was already missing one eye, but had not yet started sporting his shock of white hair:

A year later, in 1810, McKay was still there:

As he was when Ackermann published the same vista in 1812. Recognisable, even though his face has been scratched white by some later owner of this image, clearly made uncomfortable by his presence in the landscape:

Five years later, in 1817, John Thomas Smith, the keeper of prints and drawings at the British Museum, gave us our first detailed portrait, and our first biography.

Smith claims Mckay, or McGee as he styles him, was already old beyond credibility in 1817, though another account would put his age as 50 in that year. His hair ‘almost white’, was tied back in a tail and Smith firmly locates him at his ‘stand… at the Obelisk, at the foot of Ludgate-Hill’. He also claims (as do most commentaries on well known street figures), that he was secretly wealthy; and that he attended Rowland Hill’s Methodist Tabernacle on Sundays; that he was lately seen wearing a ‘smart coat’ the gift of a city pastry chef, and finally that his portrait, made in October of 1815, hung in the Twelve Bells public house on Fleet Street – around the corner from the obelisk.

Two years later, George Cruikshank includes him, broom in hand, with Billy Waters, King George III, and a host of abolitionists, in ‘The new Union-Club’.

And again, in 1821, in his depiction of Tom and Jerry, ‘Masquerading it among the Cadgers’:

And finally, in the same year, Cruikshank includes McKay in his ‘Slap at Slop’, suggesting along the way that McKay was involved on the edges of radical London:

And so to John Dempsey’s portrait from sometime in the 1820s – which seems to me to speak of a man and a place, of a life lived in a landscape, more powerfully than any other.

At the beginning of the next decade, he was still there, depicted this time, from a different perspective:

And three years later, Mackay also became the model upon which Charles Matthews based his depiction of a modern Othello in ‘the Moor of Fleet Street’, first performed to disastrous reviews, at the Adelphi in 1833.

In the play, Mackay is depicted as engaged in a battle of jealousy and rage among the low characters of London; and is described as ‘the Moor who for many a day hath swept Waithman’s crossing over the way’ from Ludgate Hill.[i] His spotted red bandana, clearly visible in Dempsey’s depiction, is invested with gypsy lore, and gypsy power, to ‘keep woman honest, or cure the worst cold’; it is given a history steeped in London’s boxing lore, and serves in the play the role of Desdemona’s lost handkerchief.

The best account of his later life is from Charles Diprose’s authoritative history of St Clement Danes, where McKay lived, off Stanhope Street. Diprose describes McKay as ‘a short, thick-set man, with his white-grey hair carefully brushed up into a toupee, the fashion of his youth; … he was found in his shop, as he called his crossing, in all weathers, and was invariably civil. At night, after he … swept mud over his crossing… he carried round a basket of nuts and fruit to places of public entertainment...’ And according to Diprose, ‘He died in Chapel Court, St Giles, in 1854, in his eighty-seventh year.’ A later historian, William Purdie Treloar claims McKay was then replaced at his stand by a drunken soldier who ‘sometimes made 8s to 10s a day’, and drank as much each evening.

All of which is simply to say, that some people stand in the same place longer than many buildings; and have a greater right to appear on a map, than many landmarks. As we move towards that new data rich environment of text/data and intuitive GIS; as naïve historians and the wider public, come to use the ideologically laden genre that is a map as an interface for trillions of words of text; and as they step back from their own text to view text/data from afar, I just think it is important to remember that landscapes and cityscapes only exist between the ears of their denizens – that we cannot map the subtleties of Ludgate Hill and New Bridge Street without trying to know Charles Mackay. With Lewis Mumford, we need to ensure that we are not ‘completely negligent of the human contents or the human results’ of asking the questions only computers can answer.

[i] Note that Waithman is a wealthy linen draper with a shop on Fleet Street. His daughter is reputed to have been especially kind to McKay, and to have received a legacy from him on his death of £7000. Treloar gives a more detailed account of Waithman’s role as alderman and MP, and suggests his daughter regularly took out soup and warm food to McKay, p.124.

Wednesday, 11 July 2012

Place and the Politics of the Past
