In recent years I find myself using the terms Stuff and Dead
People in talks and titles more and more. And as a historian I find
myself conceptualising my work as being about Stuff inherited from Dead
People. Both expressions just sound right. But it occurs to me
that while I have a relatively clear sense of what I am intending to convey
when I use these terms, their meanings might not be entirely apparent to
others. For this reason I thought I would have a stab at providing a
couple of definitions, and a brief explanation of why I find these terms so useful.
In my usage Stuff encompasses all the different varieties of
artefact that can be used in practising history. The term is in some
respects an attempt to avoid saying that our object of study is text or image, the
manmade landscape or a piece of furniture, or indeed even data in its broadest
form. Instead, the use of Stuff is intended to signify that my
practice as a historian actively seeks to make use of all of these
things. In terms of an epistemology, it is an attempt to distance myself
from the categories of knowing that I (we) have inherited. Stuff denies
the taxonomies of knowing that define a museum object as being different to a
pamphlet; a hedgerow different to a teapot. In part this usage
reflects a profound disillusion with the narrow practice of textual comparison
that lies at the heart of the Rankean tradition of historical analysis; but it
is also a recognition that new technologies allow us to encompass new types of
evidence in new ways. When all Stuff is data it can be
interrogated across boundaries that seemed natural and unbreachable just a few
decades ago (between a hedgerow and a teapot). And while data itself is also a
form of Stuff, and the transition from varieties of stuff to data is itself
a process of creating a new taxonomy, there remains a rather wonderful
transition involved. There is an opportunity to rethink the
meanings of Stuff, and without a new vocabulary it is all that much more
difficult to do so.
In other words, Stuff is a simple rejection of post-Enlightenment categorisation.
In some respects Dead People serves a similar function. The use
of Dead People avoids the traps of both identity and social modelling;
while at the same time giving some shape to the object of historical study
(human culture in the past). Ironically some Dead People are still
alive. Henry Kissinger is apparently still breathing, but is nevertheless
a figure of substantial historical analysis. In my view he is undeniably Dead
People. At the same time, because cultural history seems to take
longer to turn journalism into books, Amy Winehouse and Michael Jackson may be
dead, but they are not yet Dead People.
The term Dead People implies a refusal to describe the people of the past
as men or women, workers or citizens, artists or authors. And in doing
so, like Stuff, it signals that I do not find the traditional
categories and boundaries that comprise social science very helpful.
Stuff we inherit from Dead People is my object of study as a
historian.
One could convey these ideas using other words. The results
might be a bit long winded, but could certainly point up
my intention. At the same time, the use of these terms serves a slightly wider
function. They form an attempt to de-centre the language of historical
and social science authority that underpins the professional claims of
academic historians as a whole. By refusing to use the categories and languages of authority we inherited, I am
self-consciously rejecting the systems that underpin the professional
academic practice of history.
It is perhaps a ridiculous comparison, but I like to think of the
use of these terms as akin to the transition in thinking brought about by the
evolution of labelling in quantum theory between the proposal of the Eightfold Way in 1961 and the November Revolution of 1974. Like most people
of my generation and education, I was raised in an Einsteinian universe in
which unusual phenomena were described in the most secure of scientific jargon
- we believed in the physics because it was expressed in the language of authority.
But in the 1970s, in particular, a whole new language of strangeness and
charm was broadcast to a popular audience. As a teenager schooled in an older tradition, this challenged me to rethink. By
using everyday words to describe complex phenomena I was forced to interrogate what I believed more closely than I would otherwise have done. I don't understand quantum, but suspect I understand Einsteinian physics better as a result! I use the terms Stuff and Dead People in the hope
that their use will challenge listeners to question the labels and phenomena
they think I am talking about.
This blog is a space for me to rant in that most seventeenth-century sense of the word; and to cut and paste the ideas and comments that don't seem to fit in more traditional forms of academic publication.
Wednesday, 22 May 2013
Thursday, 11 April 2013
Hearing the Dead - Ten years of the Old Bailey Online
[Image: The Old Bailey Homepage, 2003]
The Old Bailey Online has now been around for a decade - and we are celebrating. But it also seems a good moment to take stock of what went right and what went wrong and just reflect for a moment on fifteen years of project work.
My most vivid memories are of the team of people involved. They were amazing, and I feel hugely privileged to have worked with them all; most especially with Bob Shoemaker, who has been a constant collaborator and friend for over three decades, through at least five projects and two books. But also a whole crew of people who came together to make something good happen. Just in relation to the period before the initial launch in 2003, there was Jamie McLaughlin and Simon Tanner, Geoffrey Laycock, Louise Henson, John Black, Edwina Newman, Kay O'Flaherty, and Gwen Smithson. What they produced was groundbreaking, and that was a simple result of their hard work and good sense. Since then a dozen more people have been involved - most importantly Sharon Howard.
I suppose the real question is: if we had to do it over again, what would we have done differently? Speaking for myself, rather than for the project, I can't think of all that much I would change. Perhaps we could have chosen better software to begin with, and been generally more technically savvy, and worried less about IP. But these seem small things from this distance. I guess I regret the six months of my time, and several thousand pounds of project money spent licensing the images we used for the background pages on the site. The experience demonstrated to me that the copyright system in online images is broken, but it seems a hard lesson won at a relatively high cost!
I also regret over-egging some of the early discussion about the project. The whole 'new history from below' narrative which we developed in thinking about the site simply raised the ire of a group of historians offended by the hubris; or who felt their own expertise was somehow threatened. I hold my hands up to the charge of hubris (though I still stand by the need for a 'new' history from below). Digitisation and the Internet, and Digital Humanities more broadly, do lend themselves to hyperbole, and I am far from immune to its attractions. But mainly it was unnecessary, as the site has affected the wider historical agenda and the kinds of history people write with no need for anyone to lead the way or yell about it.
And what we got right - however fortuitously - seems to me to outweigh these issues. Looking back on the decisions we made in the late 1990s, the choice of a double-entry rekeyed text, in combination with XML tagging turned out to be perfect, even if it was through luck rather than expertise (though Michael Pidd's experience and initial steer helped!). We also got the timing right. Because it was 2003, because we started the project in 1998/9, the launch received a lot of news coverage, and generated what seemed at the time a ridiculous amount of usage - without us having to pursue the kinds of detailed 'impact' plans the funding councils now demand.
But mainly, I like to think what we got right was our decision and commitment to digitise the most compelling source of social history I know; of history from below. I am continuously moved by the fact that lots of people now read eighteenth-century trials who would never previously have thought to seek them out. Professional historians have always known what powerful voices the Proceedings contain, but putting them online in a form that is easy to use and free has meant that millions of people who would not otherwise have been minded to read this stuff, have done. They have used what they found in the Proceedings in novels and on television, in endless undergraduate dissertations and in more books than I want to read; and I take neither credit nor blame for their work. But, I believe that the decision to make freely available a source that prior to 2003 could only be read by a small and privileged group of academics, was an unproblematic good thing.
[Image: Bob Shoemaker (right) and Tim Hitchcock, 2003. Clock the monitor!]
In the next ten years I very much hope that the Proceedings will form the basis of a growing body of more technically sophisticated analyses, using all the techniques of datamining, corpus linguistics and information science, all made easier by the API. But most importantly, I hope that people continue to hear the voices of anger and pain it contains; and that for just a second they let their imaginations take them to that brutal theatre of judgement where working people were forced to negotiate with power.
Monday, 4 March 2013
OA in the UK
A recent one-day colloquium sponsored by the Institute of Historical Research and the Royal Historical Society was called precisely to bring together major institutional players (Scholarly Societies, Journals, and Publishers) for a conversation about the best ways forward (the Tweet stream is here). The general feeling seems to be that while every well-meaning historian is keen to promote Open Access (a show of hands at the conference confirmed this), the Gold Route, whereby authors and institutions are asked to shoulder the cost of peer review and publishing, is just not workable in the humanities.
There are also the beginnings of what many feel is an apparent solution to the problem. Both the past and present presidents of the Royal Historical Society, and some 21 editors of major humanities journals, signed a letter proposing the imposition of an increased embargo period on the articles in their journals - essentially suggesting they be allowed to have three years in which to make money on their publications before being forced to make them available through Open Access. They also proposed maintaining a two-track system to ensure that overseas and non-academic authors are excluded from the government led requirement for Open Access.
To me this feels like Saint Augustine's plaint: "Grant
me chastity and continence, but not yet." (Confessions 8:17).
Let me be clear, though.
I understand completely the anxieties motivating these institutions and
commentators. A narrowly defined Gold
Route process of the sort privileged by the Finch Report is not workable in the
humanities. The 'author pays' model is
predicated on the direct funding of research by government, and on the
assumption that the consumers of research outputs are the same as the
producers. In the case of history this
is not true.
The vast majority of
historical research and publication is not funded by project grants; and while
a higher proportion is funded through the Universities, and through QR, there
is still a large body of excellent work that is undertaken by independent
scholars, or as part of a self-funded PhD, or by staff in institutions which do
not receive QR funding or participate in the REF. And similarly, all historians seek to reach a
wider audience than most scientists, and imagine their work in the light of a successful 'trade
monograph'; which itself forms a recognised academic achievement.
In other words, I largely agree with the diagnosis that the main
thrust of the Finch Report is unworkable.
Though, of course, the Report does not restrict academics to the single
route to Open Access, and makes it clear that other types of OA (Green Route)
are entirely consistent with the objective of making publicly funded research
available to the public. Following on
from this, I believe the RCUK policy to cover the new costs entailed through
grants is also largely unworkable, and if poorly executed in pursuit of a
narrow Gold Route form of publication will create issues of fair access, with institutional
meddling in academic decision making and serious problems for post-graduates
and early career academics.
What is missing in all this is any positive model of Open
Access publishing that takes seriously the fundamental interests and values of history
as a discipline, as opposed to the interests of the collection of institutions
and journals that purport to speak for it.
For myself I have a clear sense of what I would like Open Access in
history (and more broadly in the academy) to look like in ten years' time; and
it would have the following characteristics:
- It would be built on the deposit of articles and research data (including notes) in institutional repositories, linked to APIs that allow their content to be re-'published', mashed up and re-used (with acknowledgment).
- The route to 'publication' would include the initial deposit of research materials, followed by the posting of a 'rough draft' for comment and revision, leading to a post-publication peer review system. The author would then be allowed to specify at what revision the 'article' is complete, with perhaps a six month norm for revisions. See for instance the History Working Papers Project.
- Metrics for downloads, re-use and citations for the now online article will be used to generate a measure of scholarly importance. These metrics can include the kind of complex systems for assessing 'authority' (i.e. whose post peer review assessments are worth most) implicit in the Altmetrics movement.
- 'Journals' will be made up of adopted 'articles' that fit their theme and which are moulded to a particular house style in the open peer review process. In this ecology of scholarship journals will take on a new intellectual role in shaping debate and argument, and in defining academic communities, and will have a 'promoting' role rather than a 'publishing' role.
- Academic monographs will be seen as a simple extension of article publication - i.e. either in the form of long articles, or perhaps as a collection of pieces created as 'articles'.
- Genuine 'Trade History' will continue to be sold in the generic forms of biography and narrative etc., but the underlying academic historical content will be available in institutional repositories, while the revisions and adaptions for a popular audience are dealt with separately.
- The co-archiving of secondary writing and notes and research materials will allow for the creation of an increasingly vertically integrated form of writing, in which source material and commentary are connected.
- The costs of maintaining, curating and archiving the system will be borne by the Universities, with savings from the journal and book purchasing costs. A separate tNA or British Library repository will support and archive the work of otherwise unaffiliated scholars.
Getting to this point is not straightforward. But the universities are fully empowered to
use the RCUK funding to beef up their repositories, rather than paying journal
fees. The repositories could also take a first step
towards a more rigorous ecology of scholarship by archiving and making
available the underlying research data we all endlessly collect (and jealously
guard as the capital of an ego driven system of professional advancement).
At the moment most public debate and effort seems to be
devoted to preserving the current business model that underpins the
public/private partnership that lies at the heart of academic publishing. The
journals worry that their main income stream (allowing them to provide studentships
etc) will be eliminated; while the publishers worry that their privileged
position between subsidised creators of content and subsidised buyers of
content will be squeezed. Both these
anxieties are justified.
But, we need to
ask ourselves whether we really want to use the roundabout and expensive route
of generating income from University Library budgets via the publication of
materials produced by academic staff, to take money from the providers of
education - the Universities - in order to give it to the journals and scholarly
societies, in order to allow them, in turn, to purchase education from the
Universities. It is ridiculous.
As for the academic presses, they have spent thirty years
squeezing the 'added value' from their operation. In-house copy-editing and proof reading for
the most part went ages ago. And many
presses now demand what amounts to 'camera ready' copy. If the presses do not want to serve their
traditional role in an ecology of scholarship (sifting and polishing its
products), then it is not clear what their profits are based on. At the moment, the greatest input on the
part of the presses lies in advertising and licensing content, policing its
re-use and in producing hard copy versions of books and articles that are largely
unwanted (ask any librarian). A
thoroughgoing Open Access model eliminates the need for selling, licensing, and
policing, while time will take care of the romantic attachment to wood pulp.
Current debate seems most fully motivated by a reactionary
and defensive fear that a change in the nature of academic publication will
unravel the systems of authority and organisational finance that used to
deliver public debate. But, if we have
faith in the importance of the academy and of scholarship, then we need to
continually re-invent the process. Open
Access provides a perfect opportunity to reconnect with the founding principles
of the academy.
Wednesday, 27 February 2013
Drills and holes
I have just spent a couple of days at the annual conference of the NFAIS (National Federation of Advanced Information Services) in Philadelphia. I was giving a brief paper at the request of the good people at the British Library about developments in text mining in the Humanities, and I was happy to be invited and to participate.
But it was only when I had sat through a day or two of presentations that I realised just how out of place I was, and how irrelevant my comments were. It turns out the NFAIS is the trade organisation for all the companies (and some libraries) that have been building a commercial operation for a hundred years by placing themselves between information and those who need it. Thomson Reuters were heavily represented, as was Cengage/Gale, several medical abstracting services, and a host of companies providing data to particular sectors of the economy such as the building trades and architects. And they were all presenting their well articulated models of data gathering and manipulation designed to deliver a pablum of stuff to the desktops of America's commercial movers, shakers and capitalists. Interestingly, the people who weren't there were Google, Facebook, Twitter, the Creative Commons or representatives of the Open Access movement. Neither new model capitalists, nor Open Access evangelists were present. And while there was a strand of discussion focussed on research and academic library services, this was a small corner of an essentially old style commercial ecology. Nor were the real innovations in data modelling and analysis coming out of CS on display. A constant sub-theme of the conference seemed to be a tetchy criticism of Google for having done a half-arsed job of inter-mediating between data and users, while the Twitter stream for this event was almost non-existent. It was clear that these data professionals were having their conversation somewhere else - though I never did find out where.
There was a lot of talk about Altmetrics as a way of adding value to the data people already had, prior to selling it to the managers of research and education. And the theme of the event appeared to be a call to extend a hundred year old business model to ensure that these companies were delivering precisely the data that people needed (rather than what they thought they wanted), in a form that allowed them to use that data without thinking. The controlling metaphor was - people buy a drill, because they really want a hole.
I was bemused by this. I don't want a hole. I want a drill, a hammer, a saw and a workshop to make stuff in. And I certainly don't want anyone else to second guess what it is I am making (perhaps wonky, but original).
And then it occurred to me where my disconnect came from. The NFAIS and the companies they represent derive from a long and largely American tradition of late Enlightenment data processing. Their origins lie in Union Catalogues and abstracting services via microfilm; in creating a pre-digested, post-Enlightenment world of understood data, that could be packaged and catalogued and sold as yard after yard of uniform reference volumes. NFAIS used to stand for the National Federation of Abstracting and Information Services. I have always put the American obsession with this kind of thing down to its inability to get over the European Enlightenment long after the rest of us got bored.
My overwhelming impression was that all these companies were anxious to widen the gap between data and its users, to ensure that they continued to have a role and an income - tapping the stream between the two for annual profit. Some of the work was reasonably sophisticated (though most of it felt more 'relational database' than anything more innovative), and it was clear that many saw the way forward as providing faster access to real-time data in a form that would become normalised in a business context (or well funded, close to market STEM).
In retrospect, and after having heard the presentations which came after my own, what I really should have said more forcefully, is get out of the way, this is boring, and it misses the point entirely. We are rapidly approaching the stage when the devil's contract between private companies and the public sector, which has governed data delivery in both the humanities and STEM for the last fifteen years on line (and a hundred years off-line) is going to break down. Open Access, for example, is just a wedge issue for a wider re-thinking of how research and data, and its users will interact. And the fact that both the British and American governments were there first, is an indication that this particular community is not paying sufficient attention.
I am not overly exercised by the profiteering of these companies (if people want to sell their souls for a health plan and a cheap suit, that is OK by me). Nor do I really want to castigate them for their place in an information food chain. Companies like Cengage/Gale have corralled a useful amount of money for data processing. I am just struck by the lack of serious engagement with the real changes in the fundamental relationship between the production and consumption of data that the last fifteen years have wrought. More than anything else, my three days in Philly have brought home to me that I simply don't inhabit the same data universe that I did fifteen years ago.
Monday, 29 October 2012
A Five Minute Rant for the Consortium of European Research Libraries
I have been asked to participate in a panel at the annual CERL conference - and to speak for no more than five minutes or so. Initially, I was just going to wing it, but then, in writing up a couple of notes, five minutes' worth of text found its way on to the screen. In the spirit of never wasting a grammatical sentence, the text is below. I probably won't follow it at the conference, but it reflects what I wanted to say.
CERL - British Library, 31 October 2012:
We all know just how transformative
the digitisation of the inherited print archive has been. Between Google Books, ECCO, EEBO, Project
Gutenberg, the Burney Collection, the British Library's 19th century
newspapers, the Parliamentary Papers, the Old Bailey Online, and on and on,
something new has been created. And it
is a testament to twenty years of seriously hard graft. But, it is one of the great ironies of the
minute that the most revolutionary technical change in the history of the human
ordering of information since writing - the creation of the infinite archive,
with all its disruptive possibilities - has resulted in a markedly conservative
and indeed reactionary model of human culture.
For both technical and legal reasons,
in the rush to the on line, we have given to the oldest of Western canons a new
hyper-availability, and a new authority. With the exception of the genealogical sites,
which themselves reflect the Western bias of their source materials and
audience, the most common sort of historical web resource is dedicated to
posting the musings of some elite, dead, white, western male - some scientist,
or man of letters; or more unusually, some equally elite, dead white woman of
letters. And for legal reasons as much
as anything else, it is now much easier to consult the oldest forms of
humanities scholarship instead of the more recent and fully engaged
varieties. It is easier to access work from
the 1890s, imbued with all the contemporary relevance of the long dead, than it
is to use that of the 1990s.
Without serious intent and political
will - a determination to digitise the more difficult forms of the
non-canonical, the non-Western, the non-elite and the quotidian - the materials
that capture the lives and thoughts of the least powerful in society - we will
have inadvertently turned a major area of scholarship into a fossilised
irrelevance.
And this is all the more important
because just at the same moment that we have allowed our cultural inheritance
to be sieved and posted in a narrowly canonical form; the siren voices of the
information scientists: the Googlers, coders and Culturomics wranglers, have
discovered in that body of digitised material, a new object of study. All digital text is now data, that data is
now available for new forms of analysis, and that data is made up of the stuff
we chose to digitise. All of which embeds a subtle bias towards a
particular subset of the human experience.
Using measures derived from Ngrams, and topic modelling, natural
language processing, and TF-IDF similarity measures; scientists are beginning
to use this text/data as the basis for a new search for mathematically
identifiable patterns. And in the
process, the information scientists are beginning to carve out what is being
presented as 'natural' patterns of change, that turn the products of human
culture into a simple facet of a natural, and scientifically intelligible
world. The only problem with this is
that the analysts undertaking this work are not overly worried by the nature of
the data they are using. For most, the
sheer volume of text makes its selective character irrelevant.
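For readers who have not met these measures, here is a minimal sketch of one of them, TF-IDF similarity, written in plain Python. The three sentences are invented toy documents, the function names are my own, and the weighting is just one common variant of TF-IDF; it is not a description of what any particular research group actually runs. The point to notice is that the score is computed entirely from whatever happens to be in the collection.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case a text and split it into alphabetic tokens."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Weight = relative term frequency * log(N / document frequency).
    This is one common TF-IDF variant among several.
    """
    tokenised = [tokenize(d) for d in docs]
    n_docs = len(tokenised)
    doc_freq = Counter(term for toks in tokenised for term in set(toks))
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy documents, for illustration only.
docs = [
    "the bishop preached a sermon on charity",
    "the bishop published a sermon on faith",
    "the sweeper swept the crossing in all weathers",
]
vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # two texts with shared vocabulary
sim_02 = cosine(vecs[0], vecs[2])  # texts sharing only the word "the"
```

In this toy collection the two sermons score as similar and the crossing sweeper scores as unrelated to either. Add or remove documents and every weight changes, which is exactly why the selective character of what has been digitised matters.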
But if we are not careful, we will
see the creation of a new 'naturalisation' of human thought based on the
narrowest sample of the oldest of dead white males. And to this particular audience I just want
to suggest that we need to be much more critical about what it is that we
digitise; what we allow to represent the cultures libraries and collections
stand in for; and that we need to engage more comprehensively and intelligently
with the simple fact that we are in the middle of a selective recreation of
inherited culture.
Wednesday, 11 July 2012
Place and the Politics of the Past
Preface
The talk that forms the basis for this post was written for the annual Gerald Aylmer seminar run by the Royal Historical Society and the National Archives, and was delivered on 29 February 2012. The day was given over to a series of great projects, most of which came out of historical geography, and I was charged with providing a capstone to the event, and presenting a more general overview of the relationship between history and geography. There was a good audience of academics, archivists and librarians, all with a strong digital cast of mind and I very much enjoyed it. Unusually, I am also pretty sure I still agree with the majority of it even some six months after I sat down to write it. At the same time, I have put off posting it until now because it just did not quite feel like a blog post - too much history, too much text, too much of an internal discussion among academics. I have also recently found myself wary of blogging, having discovered (who knew?) that blogs form a sort of publication that people occasionally read. But, the work of people like Andrew Prescott has also reminded me of just how important it is to continue having the discussion. In the nature of a public talk, the text is rather informal, the notations slapdash, and the links non-existent.
Two years later, George Cruikshank includes him, broom in hand, with Billy
Waters, King George III, and a host of abolitionists, in ‘The New
Union-Club’.

Place and the Intellectual Politics of the Past
Currently there is a rather wonderful rapprochement between historical
geographers and historians; with archivists and librarians (as usual) providing
the meat, gristle and spicy practical critique. This is brilliant. These are cognate disciplines
which need to be in constant dialogue. The
habits of mind and analytical tools of geographers need to inform our understanding
of the past; while the mental tics of the historian, and the authority of
history as a literary genre, are necessary tools for communicating all kinds of
memory to a wider audience.
Frustratingly, over the last half century, this
cross-fertilisation hasn’t always flourished.
Geography Departments (and even historical geographers) have not done a
lot of talking to history departments, and vice versa. Many geographers, through the 70s and 80s, in
particular, became ever more engrossed in the technical manifestations of their
field; while many historians have fallen for the joys of the linguistic turn.
In part, this has been about funding and the structures of
higher education. In the great taxonomy
of knowledge inherent in the creation of the notion of STEM subjects vs the Humanities,
Geographers naturally gravitated towards the areas with more secure
funding. While historians, frustrated by
the changing politics of their field – the collapse of ‘modernity’ as a form of
historical explanation (both right looking and left looking) – pursued theory,
gender and language, to the exclusion of positivist measurable change – they
chose places of endless debate and disagreement – usually in the form of
various versions of identity politics - as a route to a politicised audience. Similarly, whereas historical geographers
looked to Europe and a thriving institutional base; historians tended to more
frequently look westward to North America, where historical geography is almost
unknown – or at least denied the security of an extensive network of independent
academic departments.
All of which is simply to say, that we are confronted with
two fields that should be in constant dialogue, but which simply have not
passed much more than the odd civil word in the last few decades. In the process, they have developed different
technologies of knowing and different systems of training and analysis. It strikes me that an event celebrating
Gerald Aylmer’s cross genre engagement with history and archives, with the
structures of the archive, and the stories that can be told with them, is just
the right place to bring these disciplines back onto the same map, or at least
to reconsider where in a rapidly changing technical environment, that necessary
dialogue might take place.
Of course, for most of the last decade or so we have had the
‘spatial turn’ in history; and for longer than that, the creation of a
post-modern geography. Historians have
struggled to define ‘space’ and context in ever more material (if still rather
flabby) terms; while some historical geographers have taken theories of
discourse and language seriously and extended them to the clear air enclosed by
the mind-forged boundaries symbolically represented on every map.
But this has been little more than a casual rapprochement –
driven largely by academic fashion; and has not fundamentally changed the centre
ground of either discipline. Most historians still trade in text mediated by
uncertainty and theory; while historical geographers strive to tie data to a
knowable and certain fragment of the world’s surface.
What I want to suggest today, is that something rather more
profound than the ‘spatial turn’ is also
happening in the background, and that it promises to force these disciplines
(and several others) back into a more direct relationship.
And this change, this possibility, is being driven by
technology; both in the form of the ‘infinite archive’ – the Western Text
Archive second edition; and also through the direct public access to a newly
usable online version of GIS-like tools, in the form of Google Maps and its
many imitators.
To deal with historians and the infinite archive first - I
don’t think that historians have quite twigged it yet – though librarians and
archivists certainly have - but the rise of the ‘infinite archive’ has
fundamentally changed the nature of text.
It has turned text into ‘data’, with profound implications for how we
read it, and deploy it as evidence. My
guesstimate is that between fifty and sixty per cent of all non-serial
publications in English produced between Caxton and 1923 – between the first
English press and Mickey Mouse – has been digitised to one standard or another;
with a smaller percentage of serial materials thrown in.
This has ensured that the standard of scholarship has in
many ways improved. It is now possible
to consult a wider body of literature before setting out to analyse it. But, it has also pushed us to the point where
it is no longer feasible to read all the material you might want to consult in
a classic immersive fashion. Instead we
are moving towards what Franco Moretti has dubbed ‘distant reading’, and towards
the development of new methodologies for ‘text mining’ – or the statistical
analysis of large bodies of text/data. Stephen
Ramsay’s new book, Reading Machines, illustrates four or five examples of what
he describes as a new form of ‘Algorithmic Literary Criticism’, but it is simply a
taster for a wider series of practical methodologies. That Tony Grafton, president of the American
Historical Association, could recently and hyperbolically claim that textmining
was simply the ‘future’ of history, and that it was already here; reflects a
truth most digital humanists (whose ranks are dominated by librarians and archivists)
have been struggling with for the last three or four years; but which most
historians are only now becoming aware of.
The Google Ngram viewer, which allows you to rapidly chart
the changing use of words and phrases as a percentage of the total published
per year, is just the most high-profile online tool in a wider technological landscape. The associated, and ill-named, ‘culturomics’
movement being built on the back of the ngram viewer is another. I love the ngram viewer, and spend my Sundays
charting the changing use of dirty words decade by decade. But it also forms the basis for a newly statistical
approach to language. And lest we
forget, recorded language is the only evidence most historians use.
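The calculation behind a chart of this kind is simple enough to sketch. What follows is a minimal illustration in Python of "use of a word as a percentage of the total published per year"; the tiny corpus is invented, the function name is my own, and I am not claiming this is how Google's viewer is actually implemented.

```python
from collections import defaultdict


def yearly_share(corpus, word):
    """Return {year: percentage of that year's tokens equal to `word`}.

    `corpus` is an iterable of (year, text) pairs. Tokenisation here is a
    naive whitespace split, which is an assumption made for brevity.
    """
    target = word.lower()
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, text in corpus:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    return {
        year: 100.0 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


# Invented example texts, for illustration only.
corpus = [
    (1800, "he stayed inside the house all day"),
    (1800, "she went outside once"),
    (1850, "they walked outside and sat outside the inn"),
]
shares = yearly_share(corpus, "outside")
```

Dividing by the yearly total is what makes decades comparable, since far more was printed in 1850 than in 1800. It also means the result is only ever as representative as the texts that were scanned.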
I am not entirely convinced by the culturomics work, which
focuses on describing social and linguistic change through consistent mathematical
formulae, but in related studies by people such as Ben Schmidt, Tim Sherratt
and Rob Newman, one can find the beginnings of a pioneering analysis of large
scale texts that promises to remodel how we understand cultural change, and the
relative influence of events.
These graphs simply illustrate that the word ‘outside’ both
grew in commonality over the course of the nineteenth century (perhaps
understandably as people’s lives migrated inside); and that if this phenomenon
were a reflection of naturally evolving language (embedded in people’s
vocabulary in youth), its adoption according to the age of the authors whose
work has been published, would look like the first graph; but that in fact it
looks like the second. In other words,
these visualisations created by Schmidt suggest a history of the adoption of
the use of the word ‘outside’ in response to events; and in the process give us
a way of measuring the cultural impact of specific happenings – their import as
measured by individual responses to them.
Or look at Tim Sherratt’s visualisations of the use of the
terms ‘Great War’ and ‘First World War’ in 20th century Australian
newspapers. While entirely
commonsensical, the detailed results of the 1940s in particular mark out the
evidence for a month by month reaction to events; allowing both more directed
immersive reading (drilling down to the finest detail), and a secure
characterisation of large scale collections.
Or to bring this back to a British perspective, we can look
at work Bill Turkel and I have done on the Old Bailey Online – simply charting
the distribution of the 125 million words reported in 197,000 trials, to
analyse both the nature of the Old Bailey Proceedings as a publication, and
their relationship to words spoken in court – in this instance to illustrate,
among a few other things, that serious crimes like killing were more fully
reported than others in the 18th century proceedings, and that this
pattern changed in the 19th century.
This stuff works and is important; and will necessarily form
a standard component of the research of anyone who claims to understand the
past through reading.
But it points up a further issue. If each paragraph in the infinite archive,
all the trillions of words, is simply a collection of data, it immediately
becomes something that can be tied to a series of other things – to any other
bit of data. A name, a date, a selection
of words, or a phrase, or most importantly in this context, a place – defined
as a polygon on the surface of the earth.
In other words, the texts that form the basis for western history can
now be geo-referenced and tied directly to a historical/geographical
understanding of spatial distribution, which can in turn be cross analysed with
any other series of measures of text – textmining makes text available for
embedding within a geographical frame.
I can’t emphasise this enough: the creation of a digital
edition of the western print archive means that it can be collated against all
the other datasets we possess. The
technology of words, and how we engage with them, has changed; creating a new
world of analysis. With a bit of Natural
Language Processing, and XML tagging; and a shed load of careful work, a
component of text that hitherto has been restricted to human understanding
becomes subject to precise definition: ‘he walked for twenty minutes from St
Paul’s westward, coming first to Covent Garden, and then onwards to Trafalgar
Square’, changes from a complex narrative statement reflecting an individual’s
experience, into three individual locations, a journey’s route, and a rate of
travel; each capable of being expressed as a polygon, a line, a formula – a
specific bit of translatable data.
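To make that translation concrete, here is a rough sketch in Python of what the sentence becomes once its places have been tagged. Everything specific in it is an assumption for illustration: the gazetteer is hypothetical, the coordinates are approximate modern points rather than the polygons a real project would use, and the route is treated as straight lines between stops.

```python
import math

# Hypothetical gazetteer: approximate (latitude, longitude) points.
# A real project would hold polygons for each place, not single points.
GAZETTEER = {
    "St Paul's": (51.5138, -0.0984),
    "Covent Garden": (51.5117, -0.1240),
    "Trafalgar Square": (51.5080, -0.1281),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def journey(stops, minutes):
    """Turn an ordered list of place names and a duration into data."""
    points = [GAZETTEER[name] for name in stops]
    length_km = sum(haversine_km(p, q) for p, q in zip(points, points[1:]))
    return {
        "points": points,                             # the three locations
        "length_km": length_km,                       # the route, as a line
        "speed_kmh": length_km / (minutes / 60.0),    # the rate of travel
    }


trip = journey(["St Paul's", "Covent Garden", "Trafalgar Square"], minutes=20)
```

The straight-line distance will understate a walk through real streets, so the resulting speed is a lower bound at best. The narrative has nonetheless become something that can be compared with any other journey in any other text.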
What I want to say next might not sound quite right in this
context. But, I am hoping this
development of text as data, and by extension, text tied to place, will have a
more profound impact on our understanding of the past precisely because, for
the most part, it has not emerged from historical geography. It has been driven by people interested first
in text, and only then, in data such as place.
As a long-time admirer of the work of historical
geographers, and avid reader of it; I believe that the rise of a highly
sophisticated form of desktop GIS, requiring substantial training and expertise
to make work, has contributed to the evolution of a widening gap between
disciplines, and has in some ways distanced historical geographers from the
kind of audience historians have traditionally courted. The rise of the geographer as ‘expert’ has
been both impressive and excluding.
But, in the last few years a real alternative has
emerged. I understand geographers are
sniffy about it, and I know full well that it doesn’t provide the kind of
powerful analytical environment that a fully functioning GIS Editor, Analyst and Viewer package
can generate in combination with a Spatial database management system. But it is usable and it is continually
getting better. And I mean
OpenStreetMap, Google Earth and Google Maps, and the range of open source browser side
services that build on them, like BatchGeo.
Together they make available to everyone, a good and growing
proportion of the tools previously only available to a technocratic elite. In the process and in combination with the
transition of text into data, we are suddenly in a position to do something
different.
My favourite exploration of what can actually be done online
in an intuitive and accessible way, is Richard Rodger’s collaborative project
with the National Library of Scotland: Visualising Urban Geographies, and the associated
Addressing History sites:
The important thing about these projects is that they allow
a wider audience to use historical maps in the way a historical geographer
would, and to upload their own KML files, and to explore the data, and relate
it to a modern map. It is historical
geography made user friendly. And that
is important.
I am also a great fan of the New York Public Library Labs’
Map Rectifier project, which crowdsources the kind of warping of maps that
people just could not previously do.
And more recently, the British Library’s adaptation of the
same methodology to their own map collections.
As much as anything projects like these educate a wider
public (including historians) about the methodologies and issues traditionally
faced by historical geographers, and generally hidden behind a beautifully
presented set of final maps designed to make a point, rather than to allow an
intellectual journey. These sites form
open invitations to discover all the issues associated with the underlying maps
and all the problems with the data.
Together, text as data and user friendly GIS make it newly
possible to imagine an environment in which geographical information, and
display, form a natural and unproblematic component of every other analytical
process. It makes possible a situation
in which historians cease to be mere text merchants, obsessed with the perfect
quote, and compelling (if largely un-evidenced) argument; and where geographers
have a new access to the subtle mappings of the marks of ‘culture’ in its
broadest sense – a new way of thinking about the geographical distribution of
behaviours and ideas, that bring within a geographical fold questions
traditionally preserved for others.
By extension, I want to suggest that it is
very much the moment for a bonfire of the disciplines, and that while history
and geography can now begin to speak in new terms, the same forces are also making
it possible for literature, and art history; for all the disciplines of memory
and explanation, to speak in new ways to each other. We quite suddenly share a new culture of data
– and data can be translated.
I will return to both these developments in a minute. But, by way of illustrating the kinds of
things that we are now able to do as a result of this newly open and analytical
framework – through the mash-up of text and space - I want to spend a little time
discussing a project that Bob Shoemaker, Matthew Davies and I, and a large team
of other people, recently completed, called Locating London’s Past. Please excuse me for spending the next couple
of minutes on something that sounds a little bit too much like ‘me and my
database’ to be entirely appropriate.
In itself, Locating London’s Past is not particularly
important, but it illustrates one naïve attempt to play with these new possibilities;
to take text/data and accessible online GIS, and make something that facilitates
mapping words.
This project grew from the rich soil that is failure. Five or six years ago, as a final component
of the original Old Bailey project, we struggled to incorporate a mapping
feature on to the site that could be delivered online. But
in the last few years, we realised something was changing; and inspired in
particular by the Edinburgh project, we decided to try again. The outcome - Locating London’s Past – does three things that are new. First, it makes available a fully rasterised
and warped version of both John Rocque’s 1746 map of London; and the first
‘accurate’ OS map of the capital created between 1869 and 1880 – both of which
have been fully ‘polygonised’, and related to a modern Google maps
representation of London. And second it
brings together around 40 million words of text, and a raft of established
datasets – a couple of hundred million lines of data - in a newly geo-coded
form that can be ‘mapped’ against both area and local population, at the level
of streets, parishes and wards. And
finally, it relates both these resources to the first comprehensive, parish
level population estimates for the 18th century.
In the process it
brings together text and maps in a new way, delivered in a cut down Google maps
container that even a historian can understand.
For the maps and GIS, we turned to Peter Rauxloh of the Museum of London
Archaeological Service (MOLA), who worked with
scans and an index of place names drawn from Rocque’s map created by
Patrick Mannix, to develop the kind of resource that underpins the best sort of
traditional desk bound GIS project.
The 24 sheets of
the original map were turned in to a single image, and then warped onto
the first reliable Ordnance Survey map from 1869-1880, creating a direct
geo-referenced relationship between the first accurate modern representation of
London and Rocque’s eighteenth-century version.
Rocque 1746 After Georeferencing.
The geo-referencing
operation involved identifying some 48 common points between Rocque's original
map and a modern OS map; leaving us the task of defining all the streets,
courts, parishes and wards that made up 18th century London. In the end, this amounted to some 29,000
separate defined polygons.
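For anyone curious what using common points to tie one map to another involves, here is a deliberately simplified sketch in Python. It fits an affine transformation (shift, scale, rotation and shear) by least squares. That is my own substitute for illustration, not the warping method used in the project, which is not described here, and the control points below are invented rather than taken from Rocque.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(src, dst):
    """Least-squares affine fit from scanned-map points to reference points.

    Returns [[a, b, c], [d, e, f]] such that
    X = a*x + b*y + c and Y = d*x + e*y + f.
    Needs at least three control points that are not all on one line.
    """
    rows = [(x, y, 1.0) for x, y in src]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):
        atb = [sum(r[i] * d[k] for r, d in zip(rows, dst)) for i in range(3)]
        params.append(solve3(ata, atb))
    return params


def apply_affine(params, point):
    """Map a point on the scanned sheet into reference coordinates."""
    x, y = point
    (a, b, c), (d, e, f) = params
    return (a * x + b * y + c, d * x + e * y + f)


# Invented control points: pixel positions on the scan and the matching
# positions on the reference map.
scan_points = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 2)]
reference_points = [(100, -50), (120, -40), (90, -20), (110, -10), (108, -39)]

transform = fit_affine(scan_points, reference_points)
located = apply_affine(transform, (3, 4))
```

An eighteenth-century survey is distorted unevenly across its sheets, so a single affine fit like this would leave visible errors; more control points and a more flexible warp reduce them. The principle of matching known points and solving for the mapping is the same.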
Parish boundaries which intersect with the
street of Cheapside, London.
Completed street network for main area
covered by Rocque's map.
Street lines expanded to polygons based on
recorded width.
All of which gave
us something rather cool – a proper, interactive and accurate map of 18th
century London, that among a lot else, lets you go from here:
To here:
And more
importantly, lets you go here – all those parishes securely defined:
And all those
streets and cul-de-sacs:
In the process it
makes, each parish and street, ward and cul-de-sac newly available as an
analytical category – defined as a specific area, and location – defined in
terms of its distance from any other place, and the route between them, its
size and importance in a hierarchy of streets – and defined securely against
the earth’s surface.
All of which left
us with just one more task – the, to us, more familiar job of providing the text/data
to put into these analytical polygons.
And for that, we
brought in the material available from the Old Bailey Online for 18th
century crime, from London Lives, fire insurance records, voting records for
Westminster, Hearth Tax returns and plague deaths; and finally a bunch of
archaeological material from MOLA.
Along with the more
structured data, we ended up with a couple of hundred million words of text,
primarily reflecting crime and events – descriptions of behaviours given under
oath to magistrates, in court, at sessions and before a coroner; which we then
processed using a combination of automated methodologies, including Natural
Language Processing, and manual checking, to identify some 4.9 million place-name
instances, each tied to its own polygon.
All of this data
was then made available for search and mapping – including both structured and
keyword searches – so both the ability to search on the crime of ‘murder’, and
the word: teapot.
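The logic of mapping a keyword against population can be sketched in a few lines of Python. This is only an illustration of the idea: the records, parish names and population figures below are all invented, the function names are my own, and the site itself is not built from this code.

```python
from collections import Counter


def hits_per_parish(records, keyword):
    """Count records mentioning `keyword`, grouped by parish.

    Each record is a dict with a 'parish' and a 'text' field; matching is
    on whole lower-cased words, a simplification chosen for brevity.
    """
    target = keyword.lower()
    counts = Counter()
    for rec in records:
        if target in rec["text"].lower().split():
            counts[rec["parish"]] += 1
    return counts


def rate_per_thousand(counts, population):
    """Normalise raw counts by parish population (hits per 1,000 people)."""
    return {
        parish: 1000.0 * counts.get(parish, 0) / people
        for parish, people in population.items()
        if people
    }


# Invented records and populations, for illustration only.
records = [
    {"parish": "Parish A", "text": "stole a silver teapot from the shop"},
    {"parish": "Parish A", "text": "a teapot and two spoons"},
    {"parish": "Parish B", "text": "a horse was taken"},
    {"parish": "Parish B", "text": "one teapot"},
]
population = {"Parish A": 2000, "Parish B": 500}

counts = hits_per_parish(records, "teapot")
rates = rate_per_thousand(counts, population)
```

Here the larger parish has more teapots in absolute terms but fewer per head, which is why the population estimates matter as much as the text. Without them a map of hits is largely a map of where people lived.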
Inevitably, there
are problems with the data and the map; but, it nevertheless allows us to map
things like the number of small houses in the 1690s – defined as having one or
two hearths as recorded in the Hearth Tax Returns.
Or more
contentiously, to map the distribution of suicide cases in the coroners’
inquests found by a keyword search of 5000 inquests on ‘felo’ – as in felo de
se – and ‘suicide’:
Or the distribution
of the mention of a horse, mare or gelding, in the Old Bailey.
We have not even
begun to explore what the data tells us, nor was this created in the
expectation that we would be able to do so – but the important thing is that it
does allow us and everyone else, to explore this material in a new way – and do
so quickly enough to facilitate the testing of new hypotheses, and random
midnight thoughts. And to quickly test
words against spaces, text/data against spatial data.
Of course, all of
this is contentious, and I suspect will leave historical geographers rather dissatisfied. The original data is variable, the percentage
securely geo-referenced is inconsistent and I am waiting for a few proper
demographers to critique the population figures. As worrying, the data is not currently
available for the more subtle analytical approaches that have been so fruitful
in historical geography. We can’t easily
define networks, for instance. In other words, this is a rough starting
point, and all the critical skills of a true sceptic are needed when using
it. But this site does allow us to play
with all this data in a new way, and to come up with insights and
hypotheses for further investigation. To,
for instance, map all the instances of the words for the industrial colours of ‘blue, red and yellow’ against the natural
hues of ‘brown and green’ to explore an urban environment and to suggest different
ways of thinking about a wider cityscape.
It also means that we have 40 million words of
geo-referenced text that we can use as the basis for a new kind of text mining
– that incorporates space with linguistic change; and which will add to the
geographers’ toolbox all the rather wonderful methodologies of corpus
linguistics: Measures of Text Frequency and Topic Modelling to name just two.
We are, of course, nowhere near where we really want to
be. For that, we will need to have a lot
more text, and a lot more subtlety. I
want to be able to map all the places in a newspaper by subject and category of
article – to have a scrolling representation of places mentioned in a text as I
read (either immersively or distantly).
I want to be able to use corpus linguistics, semantic search and
syntactic analysis (ontologies and all the methodologies designed for
text/data) in combination with both secure place name data, and historically sensitive
boundary and population data. There is
not much point in comparing 18th century text with the modern road
network or county boundaries; or wondering why there is not a lot of text
coming from Greenland in the absence of population density figures. I want to be able to map networks defined by
individuals, defined in turn by the words they use; and networks defined by
geographical measures such as road width.
What percentage of London was made up of parkland at different stages? And what words and crimes are dominant in
those different parks (Hyde Park vs Moorfields?). And as importantly, I want to be able to test
the results against secure measures of statistical significance.
All of the components to make this happen are in place – we
all now work with data and data is interchangeable – subject to unending
automated translation - making the main technical hurdle essentially unproblematic. But there is still a long way to go. And there is also a clear and present danger
in the process. And it is that danger,
that I now want to turn to.
I spent last year co-directing one geographical project –
Locating London’s Past - and one text mining project – Datamining with Criminal
Intent. Both projects were
intellectually engaging beyond measure.
I learned more about history – and sources I already thought I knew well
- doing something else with them, than I could possibly have done in a year of
reading. But I also found myself
struggling against the run of the data I was helping to produce. I came to count myself among those who Lewis
Mumford had in mind in 1962 when he warned urban geographers that:
‘… minds
unduly fascinated by computers carefully confine themselves to asking only the
kind of question that computers can answer and are completely negligent of the
human contents or the human results.’
Lewis Mumford, “The Sky Line: Mother Jacobs’ Home Remedies,” The
New Yorker, December 1, 1962, p. 148.
In other words, I found myself limping uncomfortably towards
a positivist abstraction in which there were few people, but much data;
beautiful graphs and compelling trends, but few of the moments of empathetic
engagement that make history so powerful and which form a little discussed
component of its authority as a genre of literature.
So, at this point, having waxed on the joys of technology
and what it allows you to do, I want to stand back for a minute and remember
the individual in the landscape. And in
this instance, just one individual – a man named Charles McGee or McKay, who
stood just here for over forty years, from at least 1809, until his death in
1854; making a living as a one-eyed crossing sweeper – a black Jamaican refugee
from Britain’s wars of colonial expansion:
Or to put it differently, he stood just here, on a map
created just before he arrived:
Or, for a map that should have included him, here:
Or if we want to get down to street level, just here – the
obelisk he stood in front of, itself visible on the map:
McKay became a part of the image of this cityscape almost
immediately. William Bennet was the
first to record his presence, placing him before the obelisk dedicated to John
Wilkes that stood at the top of New Bridge Street:
He was already missing one eye, but had not yet started
sporting his shock of white hair:
A year later, in 1810, McKay was still there:
As he was when Ackermann published the same vista in 1812. Recognisable, even though his face has been
scratched white by some later owner of this image, clearly made uncomfortable
by his presence in the landscape:
Five years later, in 1817, John Thomas Smith, the keeper of
prints and drawings at the British Museum, gave us our first detailed portrait,
and our first biography.
Smith claims McKay, or McGee as he styles him, was already
old beyond credibility in 1817, though another account would put his age as 50
in that year. His hair ‘almost white’,
was tied back in a tail and Smith firmly locates him at his ‘stand… at the
Obelisk, at the foot of Ludgate-Hill’.
He also claims (as do most commentaries on well known street figures),
that he was secretly wealthy; and that he attended Rowland Hill’s Methodist Tabernacle
on Sundays; that he was lately seen wearing a ‘smart coat’ the gift of a city
pastry chef, and finally that his portrait, made in October of 1815, hung in
the Twelve Bells public house on Fleet Street – around the corner from the
obelisk.
And again, in 1821, in his depiction of Tom and Jerry,
‘Masquerading it among the Cadgers’:
And finally, in the same year, Cruikshank includes McKay in
his ‘Slap at Slop’, suggesting along the way that McKay was involved on the
edges of radical London:
And so to John Dempsey’s portrait from sometime in the 1820s
– which seems to me to speak of a man and a place, of a life lived in a
landscape, more powerfully than any other.
At the beginning of the next decade, he was still there,
depicted this time, from a different perspective:
And three years later, McKay also became the model upon
which Charles Matthews based his depiction of a modern Othello in ‘the Moor of
Fleet Street’, first performed to
disastrous reviews, at the Adelphi in 1833.
In the play, McKay is depicted as engaged in a battle of jealousy
and rage among the low characters of London; and is described as ‘the Moor who
for many a day hath swept Waithman’s crossing over the way’ from Ludgate Hill.[i]
His spotted red bandana, clearly visible in
Dempsey’s depiction, is invested with gypsy lore, and gypsy power, to ‘keep woman
honest, or cure the worst cold’, and given a history steeped in London’s boxing
lore; in the play it serves the role of Desdemona’s lost handkerchief.
The best account of his later life is from Charles Diprose’s
authoritative history of St Clement Danes, where McKay lived, off Stanhope
Street. Diprose describes McKay as ‘a short, thick-set man,
with his white-grey hair carefully brushed up into a toupee, the fashion of his
youth; … he was found in his shop, as he called his crossing, in all weathers,
and was invariably civil. At night, after he … swept mud over his crossing… he
carried round a basket of nuts and fruit to places of public
entertainment...’ And according to Diprose,
‘He died in Chapel Court, St Giles, in 1854, in his eighty-seventh year.’ A later historian, William Purdie Treloar
claims McKay was then replaced at his stand by a drunken soldier who ‘sometimes
made 8s to 10s a day’, and drank as much each evening.
All of which is simply to say, that some people stand in the
same place longer than many buildings; and have a greater right to appear on a
map, than many landmarks. As we move
towards that new data rich environment of text/data and intuitive GIS; as naïve
historians and the wider public, come to use the ideologically laden genre that
is a map as an interface for trillions of words of text; and as they step back
from their own text to view text/data from afar, I just think it is important
to remember that landscapes and cityscapes only exist between the ears of their
denizens – that we cannot map the subtleties of Ludgate Hill and New Bridge
Street without trying to know Charles McKay.
With Lewis Mumford, we need to ensure that we are not ‘completely negligent of the human contents or the human results’ of
asking the questions only computers can answer.
[i]
Note that Waithman was a wealthy linen draper with a shop on Fleet Street. His daughter is reputed to have been
especially kind to McKay, and to have received a legacy from him on his death
of £7000. Treloar gives a more detailed
account of Waithman’s role as alderman and MP, and suggests his daughter
regularly took out soup and warm food to McKay, p.124.