Tuesday, 15 July 2014

Doing it in public: Impact, blogging, social media and the academy

The text below is derived from a short talk I gave in February for the Library at the University of Sussex.  At the time (and in the text) I promised to post it as a blog, but never quite found the time. 

Impact is an awkward thing in British Higher Education.  Most of the time it feels like just one more bludgeon used to batter hapless academics into submission.  It is frequently shorthand for an agenda handed down from on high, privileging near-market research and the agendas of government.  And yet no one spends a lifetime researching, teaching and writing about something if they don't believe it is important - if they don't believe that what they do contributes to a better world. We all want to have 'impact'.  The question is how can we do so in a way that reflects our own values, rather than those of whatever government happens to be in power this week?

This question is all the more important because our traditional assumptions about how our work affects a broader social discourse seem increasingly threadbare.  When the print run of most monographs numbers just a few hundred copies (most of which disappear into American research libraries, never to be read or used), and when journal articles proliferate beyond number because they serve the needs of big publishing, rather than academic dialogue - we need to think harder about how we do the job of the humanities.  If we simply continue in an older vein - having small (vociferous) conversations amongst ourselves, in professional seminars and at conferences, through book reviews and in the specialist hard copy press - we will lose our place in the broader social dialogue. If there is a 'crisis' in the humanities, it lies in how we have our public debates, rather than in their content.

It seems to me that the solution to this problem is all around us, and that in order to address it, we need to remember that the role of the academic humanist has always been a public one - however mediated through teaching and publication.   By building blogging, and Twitter, Flickr, and shared libraries in Zotero, into our research programmes - into the way we work anyway - we both get more research done, and build a community of engaged readers for the work itself.  We can do what we have always done, but do it better; as a public performance, in dialogue amongst ourselves, and with a wider public.

The best (and most successful) academics are the ones who are so caught up in the importance of their work, so caught up with their simple passion for a subject, that they publicise it with every breath. Twitter and blogs, and embarrassingly enthusiastic drunken conversations at parties, are not add-ons to academic research, but a simple reflection of the passion that underpins it.

A lot of early career scholars, in particular, worry that exposing their research too early, in too public a manner, will either open them to ridicule, or allow someone else to 'steal' their ideas.  But in my experience, the most successful early career humanists have already started building a form of public dialogue into their academic practice - building an audience for their work, in the process of doing the work itself.

Perhaps the best example of this is Ben Schmidt, and his hugely influential blog: Sapping Attention.  His blog posts contributed to his doctorate, and will form part of his first book.  In doing this, he has crafted one of the most successful academic careers of his generation - not to mention the television consultation business, and world-wide intellectual network.

Or Helen Rogers, who maintains two blogs: Conviction: Stories from a Nineteenth-Century Prison - on her own research; and also the collaborative blog, Writing Lives, created as an outlet for the work of her undergraduates. They bring together research and teaching, and in the process are building a substantial community of interest.

Or Adam Crymble and his blog - Thoughts on Public & Digital History - where he melds practical posts addressing straightforward DH problems, with substantial interventions in policy.  Crymble's recent appointment to a lectureship in digital history rested in large measure on his blog.  

The list could go on.  The Many Headed Monster, the collective blog authored by Brodie Waddell, Mark Hailwood,  Laura Sangha and Jonathan Willis, is rapidly emerging as one of the sites where 17th century British history is being re-written.   While Jennifer Evans is writing her next book via her blog, Early Modern Medicine.

The most impressive thing about these blogs (and the academic careers that generate them), is that there is no waste - what starts as a blog, ends as an academic output, and an output with a ready-made audience, eager to cite it.

For myself the point is that these scholars don't waste text, and neither do I.  If I give a talk, I turn it into a blog. Not everything is blogged, but the vast majority of the public presentations I make as part of my job, will be.  And while many of these texts will never contribute to an academic article, about half of them do.   As a result blogging has become part of my own contribution to what I think of as an academic public sphere.  It becomes a way of thinking in public and revising one's work, to make it better, in public.  And knowing that there is an audience (whatever its size), changes how one does it - forcing you to think a little harder about the reader, and to think a little harder about the standards of record keeping and attribution that underpin your research.

One of my favourite blogging experiences involves embedding blogs in undergraduate assessment.  By forcing students to write 'publicly', their writing rapidly improves.  From being characterized by the worst kind of bad academic prose - all passive voice pomposity - undergraduate writing in blogs is frequently transformed into something more engaging, simply written, and to the point.  From writing for the eyes of an academic or two, students are forced to imagine (or actually confront) a real audience.  Blogging has the same effect on more professional academic writers - many of whom assume that if the content is good, the writing somehow doesn't matter.

But as importantly, blogs are part of establishing a public position, and contributing to a debate.  

Twitter is in some ways the same - or at least, like blogging, Twitter is good for making communities, and finding collaborators; and letting other people know what you are doing.  But, it also has another purpose.

Dan Cohen - the director of the Digital Public Library of America - always says about Twitter that the important thing is that at the end of the week, it makes you aware of all the publications  and developments, calls for papers, and conferences, you need to know about in order to keep up with your corner of the academy. It is not about what you had for breakfast.  It is about being on top of your field.

Between them, Twitter and blogging just make good academic sense.  And while you need to avoid all the kittens and trolls, clickbait and self-promoting gits, these forms of social media are rapidly evolving into the places where the academic community is embodied.  They are doing the job of the seminar, and the letters page.  They are where our conversation is happening.

And on participating in this 'academic public sphere', there are only a few rules.  First - be yourself.  If you want credit, you have to own your material.  In other words, never be anonymous.  And second, remember that everything from Academia.edu, to Twitter, to Facebook and Flickr, is a form of publication, and should be taken seriously as such.  If you would not say it in an academic review, or in the questions following a public lecture, don't say it on Twitter.

And finally, keep track of it.  Use Google Analytics, or something similar.  Know who you are talking to.  This involves nothing more challenging than cutting and pasting four lines of code, but provides more data, at a more granular level than you can possibly need.

All of which is simply to say, that social media are building into what feels like an increasingly coherent environment reflecting communities of interest - allowing us, online, to be just what we claim to be the rest of the time - a community of scholars.  The objections - which usually come down to the fear that someone will steal your ideas, your work, your credit - are best addressed by doing it in public.  And in the process there is every hope that we can rebuild the humanities as a wider public discussion, able to more effectively reach beyond the academy - to have 'impact'.

Friday, 3 January 2014

Judging a book by its URLs

It will sound odd, but I have recently had a great time editing URLs.  Robert Shoemaker and I have just finished a book for CUP, derived from the London Lives project, and called - London Lives: Poverty, Crime and the Making of a Modern City, 1690-1800. It is a long book (170,000 words) and each quote and reference in it is linked via a URL to the original document or article, book or web-resource used as evidence or to contextualize the argument.  It will be published as both an ebook and in hard copy, and the links need to be robust, and secure.  My estimate is that there are in the region of 4,000 URLs included in the manuscript (which was written collaboratively in PMWiki).  In the end, I found that I could identify an appropriate link for 98% of all footnote references, but then had to eliminate around 10% of these, as the relevant URL was just not useable.  The book took some nine years, and I am glad it is finished.

One of my final jobs was editing those 4000 URLs.   It took about three months' work, spread over the last year, and I have just finished spending a week or so confirming what I hope will be their final form.  When I have told people about this work many have looked incredulous and suggested that this is the sort of technical implementation process that should be left to others.  A couple of otherwise nice people have suggested I dump this job on the shoulders of the nearest PhD student.  But for myself, it is precisely the kind of thing that an author should do for themselves.  And in doing it, two things kept coming to mind.  First was how the role of the scholar in creating a rigorous academic apparatus is a central part of the intellectual journey that academic writing involves - and that we should see the implementation of the online version of this in the light of the precise writing of footnotes and references that mark out good scholarship.  And second, that URLs encode a system of design and intent, online architecture and system of access, that signal the quality and permanence (the academic credibility and perceived audience) of historical materials online.  And that just as we have always sorted and judged scholarship by its form, we should think a bit harder about how the form of a URL can let us interrogate online materials.

On the first point, I do not know of much discussion of the joys of this kind of academic slog.  There is a lot of good writing on research and archives (by Carolyn Steedman and Arlette Farge among many others), on writing and thinking, but no-one talks much about the painstaking labour that goes into turning a rough draft into a final finished piece of scholarship.  And here I am really talking about generating accurate and fully comprehensive footnotes that reflect both the material cited, and the research journey that resulted in the main text.   This has become much easier with online catalogs and citation management packages, but nevertheless remains laborious and a reflection of our collective and individual commitment to a particular kind of evidenced discussion.  But for me it also represents my favourite compromise.  The writing of history is a wonderfully imaginative and creative process.  And in some respects we wish to judge the product of history writing as art.  Is it enjoyable to read? Is it convincing?  Does it do the job of good writing in liberating the readers' imagination?  In making these judgements we tend to appeal to a notion of 'value' that is cultural and that privileges dominant forms of authority.  This aspect of judgement is essentially romantic; with all the implications for western and elite hegemony embedded in that idea.  At the same time history writing is the result of simple hard work of a more technical kind - in the archives, in collating and collecting, re-ordering and interrogating data.  And it is valuable because it encompasses that hard work.  The beauty of the academic apparatus is that it evidences this and in the process generates a different measure of value.  In other words it is where quality is tied to a 'labour theory of value'.  I love the academic slog because it is where un-moored judgement is tied down to hard labour; and where value can be universalized in a common human experience (work).
In other words I really enjoyed editing 4000 URLs precisely because in them and their associated footnotes lies a claim to and evidence of the hard labour that underpins the book itself.

At the same time, the process also taught me to read URLs differently.  Clearly coders and web designers do this as a matter of course.  But I am a historian and want to read URLs as a scholar, rather than as a programmer or designer.  And for me, the important thing is that URLs embed the structure of a site, making it plain to see for anyone willing to look hard; and that they are made up of both the character of a library reference, and a command directed at the new technology of discovery - the Internet.  There are just lots of different types of URL.

There are 'Search URLs' that include all the elements that  take the user past a collection to a specific object, but don't let you go directly there without the query.  And there are URLs that encode a cataloging hierarchy.  There are URLs that sift data, or work in your browser to change the data delivered, highlighting phrases or sifting material.  And there are URLs that encode licensing, passwords, and access information.  It is easy enough to find that the whole search journey that took you from a library catalog to an individual item is encoded directly in the URL, and even personalized to you, the machine you are using, or the forms of access you can deploy.  It is easy to find URLs that run on for hundreds of characters, each element divided by a '&' or a '%', or such.

But in creating robust reproducible links to credible historical materials most of these URLs are at least problematic if not useless.  If they include details for institutional access, or session information, they cannot be re-used by someone else.  These URLs are friable and fragile things and not fit for scholarly purposes.  And as a result, for the London Lives book we have been forced to eliminate all the links we originally hoped to include to forty or fifty different sites.  To take a single example, most archives structure their online collections with search in mind, making it difficult to link to a single item.  I spent a lot of time finding the catalog entry for every manuscript we cited in the London Metropolitan Archives, and Westminster Archives Centre, only to regretfully strip out the links when confronted by a complex URL that just did not look credible as a long term citation of the item itself.
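
For anyone facing the same job, the first pass of this sifting can be sketched in a few lines of Python. This is only an illustration of the kind of judgement described above, not the process I actually followed: the function name, the length and parameter thresholds, and the list of 'session' parameter names are all assumptions made up for the example, and real sites use many other names.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical parameter names that usually signal session or access state.
# This list is an assumption for illustration; real sites vary widely.
SESSION_HINTS = {"sessionid", "sid", "jsessionid", "token", "auth", "userid"}


def fragility_reasons(url, max_length=80, max_params=2):
    """Return a list of reasons why a URL looks unsuitable as a long-term citation."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    reasons = []
    # Very long URLs tend to carry a whole search journey with them.
    if len(url) > max_length:
        reasons.append("long")
    # Many '&'-separated elements suggest a search URL rather than an item URL.
    if len(params) > max_params:
        reasons.append("many query parameters")
    # Session or login details cannot be re-used by another reader.
    if any(name.lower() in SESSION_HINTS for name, _ in params):
        reasons.append("session or access parameter")
    if ";jsessionid=" in parts.path.lower():
        reasons.append("session id in path")
    return reasons


# Both addresses below are invented examples on a placeholder domain.
print(fragility_reasons("https://example.org/item/12345"))
print(fragility_reasons(
    "https://example.org/search?q=durrant&sessionid=abc123&page=3&sort=date"))
```

A check like this can only flag candidates for a second look; whether a link is credible as a citation remains a judgement for the author.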

Even in its simplest, and in the form recommended by the site for sharing a link, a London Metropolitan Archives URL looks like this:


Since we had consulted these items in their physical form in any case, it did not seem too problematic to leave out these links, but a shame nevertheless.  And likewise, with paywall material there seemed little point in dangling real access, and the promise of credible evidence, before the eyes of readers who would not be able to go beyond the login screen.  It seemed better to cite a specific item in combination with a general (unlinked) URL and date of consultation as reflecting our own research journey, rather than to promise access when we could not deliver it.

With few exceptions the URLs that have been retained (and there are still 4000 of them) address specific items with a specific ID, and usually run to 20 to 40 characters.  DOIs are not bad once you figure out their structure and reformulate them as they should be, rather than the way they are normally cited on journal web pages.
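
By way of illustration, here is a minimal Python sketch of that reformulation. It assumes that the DOI appears somewhere in the citation string in the usual '10.prefix/suffix' form, and that the link should point at the doi.org resolver; the pattern is a common approximation rather than a full statement of DOI syntax, and the function name and the example DOI are invented for the purpose.

```python
import re

# Approximate pattern for a DOI: '10.', a registrant prefix, '/', then a suffix.
# This is a widely used approximation, not the complete DOI grammar.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def doi_to_url(citation):
    """Pull a DOI out of a citation string and return a resolver URL, or None."""
    match = DOI_PATTERN.search(citation)
    if match is None:
        return None
    # Drop sentence punctuation that often trails a DOI in running text.
    doi = match.group(0).rstrip(".,;")
    return "https://doi.org/" + doi


# The DOI below is a made-up example, not a real article.
print(doi_to_url("doi:10.1234/example.5678."))
```

DOIs containing unusual characters may need percent-encoding before they are used as links, which this sketch does not attempt.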


And Google Books creates a very nice URL once you strip out all the complex formatting instructions that are normally generated as part of a search and inserted after the main ID.  This is what a Google Books' URL looks like if you were to use the 'search' version:


And this URL will take you to the same book:
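
The stripping step itself is mechanical. A minimal Python sketch, assuming the book's identifier sits in an `id` query parameter (as it does in Google Books links) and using a made-up identifier, might look like this; the function name is my own invention.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode


def strip_to_id(url, id_param="id"):
    """Keep only the identifying query parameter and drop everything else."""
    parts = urlsplit(url)
    values = parse_qs(parts.query).get(id_param)
    if not values:
        # No identifier to anchor on, so leave the URL untouched.
        return url
    query = urlencode({id_param: values[0]})
    # Rebuild the URL without the other parameters or the '#' fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


# 'EXAMPLEID' is a placeholder, not a real Google Books identifier.
print(strip_to_id(
    "https://books.google.com/books?id=EXAMPLEID&pg=PA12&dq=london+lives&hl=en#v=onepage"))
```

Dropping the page and search parameters means the link opens the book rather than a particular page, so the page reference has to be carried in the footnote itself.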


 And the Eighteenth-century Short Title Catalog generates some of the most elegant URLs I have found:


And to a lesser extent, so does the Ethos collection of doctoral theses at the British Library.


And London Lives and the Old Bailey Online do pretty well on this score:


In part, I suspect that these issues would all disappear if I had a better sense of the layer of structure that lies beneath the WWW.  But for the moment I am keen to have a short, human-readable URL that looks like it will last longer than the session I am currently logged on for.   All of which simply takes me back to the joy of academic slogging and the importance of the academic apparatus as something that evidences hard work and opens up scholarship to credible criticism that goes beyond simple romantic appreciation and prejudice.

I know all too well that one of the skills of an academic is the ability to judge a book by its cover and the form of the text it contains.   For the online we need to embed URLs into precisely this process - and the joy of all that editing was that at the end of it, I feel I have learned to do just that.

Monday, 9 December 2013

Big Data for Dead People: Digital Readings and the Conundrums of Positivism

The following post is drawn from the text of a keynote talk I delivered at the CVCE conference on 'Reading Historical Sources in the Digital Age', held in Luxembourg on the 4th and 5th of December 2013. In the nature of these kinds of texts the writing is designedly rough, the proof reading rudimentary, and the academic apparatus largely absent.

This talk forms a quiet reflection on how the creation of new digital resources has changed the ways in which we read the past; and an attempt to worry at the substantial impact it is having on the project of the humanities and history more broadly. In the process it asks if the collapse of the boundaries between types of data - inherent in the creation of digital simulacra - is not also challenging us to rethink the 'humanities' and all the sub-disciplines of which it is comprised. I really just want to ask, if new readings have resulted in new thinking? And if so, whether that new thinking is of the sort we actually want?

As Lewis Mumford suggested some fifty years ago, most of the time:

‘… minds unduly fascinated by computers carefully confine themselves to asking only the kind of question that computers can answer...’

Lewis Mumford, “The Sky Line: Mother Jacobs’ Home Remedies,” The New Yorker, December 1, 1962, p. 148.

It seems to me that we can do better than that, but that in the process we need to think a bit harder than we have about the nature of the Digital History project.

Perhaps the obvious starting point is with the concept of the distant reading of text, and that wonderful sense that millions of words can be consumed in a single gulp. Emerging largely from literary studies, and in the work of Franco Moretti and Stephen Ramsay, the sense that text – or at least literature – can be usefully re-read with the tools of the digital humanities has been regularly re-stated with all the hyperbole for which the Digital Humanities is so well known. And, within reason, that hyperbole is justified.

My favourite example of this approach is Ben Schmidt’s analysis of the dialogue in Mad Men, in which he compares the language deployed by the scriptwriters against the corpus of text published in that particular year drawn from Google Books. In the process he illustrates that early episodes over-state the ‘performative’ character of the language, particularly in relation to masculinity – that the scriptwriters chose to depict male characters talking about the outside world and objects, more frequently than did the writers of the early fifties. And that in the later episodes of the series, they depict male characters over-using words associated with interiority, emotions and personal perceptions. What I like about this is that it forms one of the first times I have been really surprised by ‘distant reading’. I just had not clocked that the series was developing a theme along these lines – that it embedded a story of the evolution of masculinity from a performative to an interiorised variety. But once Schmidt used a form of distant reading to expose the transition it felt right, obvious and insightful. In Schmidt’s words: ‘the show's departures from the past… let us see just how much everything has changed, even more than its successes.’… at mimicking past language. The same could be done with the works of George Eliot or Tolstoy (who both wrote essentially ‘historical’ novels), and with them too, I look forward to being surprised. In other words, the existence of something like Google Books and the Ngram viewer - which Schmidt's work depends upon - actually can change the character of how we ‘read’ a sentence, a word, a phrase, a genre – by giving a norm against which to compare it. Is it a ‘normal’ word, for the date? or more challenging, for the genre? for the place of publication? for the word's place in the long string of words that make up an article or a book?
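
To make the idea of a 'norm' concrete, here is a toy Python sketch of one way to ask whether a word is over-used in a text relative to a reference corpus. It is not Schmidt's method, which I have not reproduced here; the smoothed log-ratio measure, the function names and the two snippets of text are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def overuse_scores(sample_text, reference_text):
    """Score each word in the sample by the log of its relative frequency
    in the sample divided by its relative frequency in the reference.
    Positive scores mean the sample over-uses the word against the norm."""
    sample = Counter(tokens(sample_text))
    reference = Counter(tokens(reference_text))
    vocab = set(sample) | set(reference)
    # Add-one smoothing, so that words absent from the reference do not
    # cause a division by zero.
    sample_total = sum(sample.values()) + len(vocab)
    reference_total = sum(reference.values()) + len(vocab)
    scores = {}
    for word in sample:
        p_sample = (sample[word] + 1) / sample_total
        p_reference = (reference[word] + 1) / reference_total
        scores[word] = math.log(p_sample / p_reference)
    return scores


# Invented snippets standing in for a script and a year's worth of print.
scores = overuse_scores("feel feel feel the car", "the car the car the road")
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(word, round(score, 2))
```

With real material the reference counts would come from something on the scale of Google Books for the relevant year, and the interesting words are those at the two extremes of the ranking.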

But having lauded this example, I think we also have to admit that most stabs at distant reading seem to tell us what we already know.

There was an industrial revolution involving iron. There was a war in the 1860s and so on.

What surprises me most, is that I am not more surprised.

In part, I suspect the banal character of most ngrams and network analyses is a reflection of the extent to which books, indexes, and text, have themselves been a very effective technology for thinking about words. And that as long as we are using digital technology to re-examine text, we are going to have a hard time competing with two hundred years of library science, and humanist enquiry. Our questions are still largely determined by the technology of books and library science, so it is little wonder that our answers look like those found through an older technology.

But, the further we move away from the narrow literary canon, and more importantly from the code that is text, to include other types of readings of other types of data - sound, objects, spaces - I hope the more unusual and surprising our readings – both close and distant - might become. And it is not just text and objects, but also cultures. The current collection of digital material that forms the basis for most of our research is composed of the maudlin leavings of rich dead white men (and some rich dead white women). Until we get around to including the non-canonical, the non-Western, the non-textual and the non-elite, we are unlikely to be very surprised.

For myself, I am wondering how we might relate non-text to text more effectively; and how we might combine - for historical purposes - close and distant reading into a single intellectual practice; how we might identify new objects of study, rather than applying new methodologies to the same old bunch of stuff. And just by way of a personal starting point, I want to introduce Sarah Durrant. She is not important. Her experience does not change anything, but she does provide a slightly different starting point from all the rich dead white men. And for me, she represents a different way of thinking through how to ask questions of computers, without simply asking questions we know computers can answer.

Sarah claimed to have found two bank notes on the floor of the coffee house she ran in the London Road, on the Whitsun Tuesday, 1871; at which point she pocketed them. In fact they had been lifted from the briefcase of Sydney Tomlin, in the entrance way of the Birkbeck Bank, Chancery Lane, a few days earlier.

We know what Sarah looked like. This image is part of the record of her imprisonment at Wandsworth Gaol for two years at hard labour, and is readily available through the website of the UK's National Archives. We have her image, her details, her widowed status, the existence of two moles - one on her nose and the other on her chin. We have her scared and resentful eyes staring at us from a mug shot. I don't have the skill to interpret this representation in the context of the history of portraiture, or the history of photography - but it creates a powerful if under-theorised alternative starting point from which to read text - and has the great advantage of not being ‘text’; or at least not being words.

But, we also have the words recorded in her trial.

And because we have marked up this material to within an inch of its life in XML to create layer upon layer of associated data, we also have something more.

In other words, for Sarah, we can locate her words, and her image, her imprisonment and experience, both in ‘text’ and in the leavings of the administration of a trial, as marked up in the XML. And because we have studiously been giving this stuff away for a decade, there is a further ‘reading’ that is possible, via an additional layer of XML provided by Magnus Huber and his team at the University of Giessen. He has marked up all the text that purports to encompass a ‘speech act’. And so we also have a further ‘reading’ of Sarah as a speaker, and not just any speaker, but a working class female speaker in her 60s.

And of course, this allows us to compare what she says, to other women of the same age and class, using the same words; with a bit of context for the usage.

So, we already have a few ‘readings’, including text, bureaucratic process, and purported speech.

From all of which we know that Sarah, moles and all, was convicted of receiving; and that she had been turned in by a Mrs Seyfert - a drunk, to whom Durrant had refused a hand-out. And we know that she thought of her days in relation to the Anglican calendar, which by 1871 was becoming less and less usual – and reflects the language of her childhood.

And, of course, we have an image of the original page on which that report was published – a ghost of the material leavings of an administrative process.

And just in case, we can also read the newspaper report of the same trial.

So far, so much text, with a couple of layers of XML, and the odd image. But we also know who was in Wandsworth Gaol with her on the census day in 1871.

And we know where Durrant had been living when the crime took place – in Southwark, at No 1 London Road.

We know that she was a little uncertain about her age, and we know who lived up one flight of stairs, and down another. Almost randomly, we can now know an awful lot about most nineteenth century Londoners, allowing us to undertake a new kind of 'close reading'.

From which it is a small step to The Booth Archive site posted by the London School of Economics, which in turn lets us know a bit more about the street and its residents.

‘a busy shopping street', with the social class of the residents declining sharply to the West - coded Red for lower middle class.

But we can still do a bit better than this. We can also do what linguists and literary scholars are doing to their own objects of study - we can take apart the trial, for instance, as a form of generic text using facilities such as Voyant Tools. Turning a ‘historical reading’ into a linguistic one:

And, if the OCR of the Times Digital Archive were sufficiently good (which it isn’t) - we could have compared the trial account, with the newspaper account as a measurable body of text.

And as with Magnus Huber’s Corpus mark-up, using that linguistic reading of an individual trial as a whole, in relation to Google Books, we could both identify the words that make this trial distinctive, and start the process of contextualising them. We could worry, for instance, at the fact that the trial includes a very early appearance of a 'Detective' giving evidence, suggesting that Sarah’s experience was unusual and new - providing a different reading again:

In other words, our ability to do a bit of close reading - of lives, of people, of happenstance, and text, with a bit of context thrown in, has become much deeper than it was fifteen years ago.

But we can go further still. We could contextualise Sarah's experience among that of some 240,000 defendants like her, brought to trial over 239 years at the Old Bailey, and reported in 197,475 different accounts. We can visualise these trials by length, and code them for murder and manslaughter, or we could just as easily do it by verdict, or gender, punishment, or crime location. The following material is the outcome of a joint research project with William Turkel at the University of Western Ontario.

Sarah Durrant is here:

And in the process we can locate her experience in relation to the rise of ‘plea bargaining’ and the evolution of a new bureaucracy of judgement and punishment, as evidenced here:

Sarah’s case stood in the middle of a period during which, for the first time, large numbers of trials were being determined in negotiation with the police and the legal profession – all back-rooms and truncheons – resulting in a whole new slew of trials that were reported in just a few words. Read in conjunction with the unusual appearance of a ‘detective’ in the text, and her own use of the language of her youth, the character of her experience becomes subtly different, subtly shaded.

To put this differently, one of the most interesting things we can know about Sarah, is that she was confronted by a new system of policing, and a new system of trial and punishment, which her own language somehow suggests she would have found strange and hard to navigate. We also know that she was desperate to enter a plea bargain. "I know I have done wrong; but don't take me ... [to the station], or I shall get ten years"— pleading to be let go, in exchange for the two bank notes.

And in the end, it was the court's choice to refuse Durrant's plea for a bargain:

"THE COURT would not withdraw the case from the Jury, and stated, the case depended entirely upon the value of the things stolen. GUILTY of receiving— Two Years' Imprisonment."

In other words, Sarah’s case exemplifies the implementation of a new system of justice in which the state – the police and the court – took to themselves a new power to impose their will on the individual. And, it also exemplifies the difficulty that many people – both the poor and the old – must have had in knowing how to navigate that new system.

But it also places her in a new system designed to ensure an ever more certain and rising conviction rate. And of course, we can see Sarah’s place in that story as well:

Even without the plea bargain, Sarah’s conviction was almost certain – coming as it did in a period during which a higher proportion of defendants were found guilty than at almost any other time before or since. Modern British felony conviction rates are in the mid-70 percent range.

Or alternatively, we can go back to the trial text and use it to locate similar trials – ‘More like this’ – using a TF-IDF (term frequency/inverse document frequency) methodology, to find the ten or hundred most similar trials.

In fact these seem to be noteworthy mainly for the appearance of bank-notes and female defendants, and the average length of the trials – none, for instance, can be found among the shorter plea bargain trials at the bottom of the graph, and instead are scattered across the upper reaches, and are restricted to the second half of the nineteenth century - sitting amongst the trials involving the theft of 'bank notes'; and theft more generally, which were themselves, much more likely than crimes of violence, to result in a guilty verdict. At a time when theft resulted in conviction rates of between 78% and 82%, killings had a conviction rate of between 41% and 57%.

In other words, applying TF-IDF methodologies provides a kind of bridge between the close and distant readings of Sarah's trial.
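For readers who want to see what a ‘More like this’ search involves, here is a minimal sketch in Python. It is an illustration only, and not the code behind the Old Bailey site: the three toy 'trials' are invented, the function names are hypothetical, and the weighting (raw term frequency multiplied by the log of inverse document frequency, compared by cosine similarity) is just one common variant of TF-IDF.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {word: weight} dictionary per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in token_lists for word in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Words found in every document get a weight of zero (log 1 = 0).
        vectors.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(documents, query_index, top_n=10):
    """Rank every other document by similarity to the chosen one."""
    vectors = tfidf_vectors(documents)
    query = vectors[query_index]
    scored = [
        (cosine(query, vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scored.sort(reverse=True)
    return [(i, round(score, 3)) for score, i in scored[:top_n]]


# Invented toy 'trials', standing in for the real trial texts.
trials = [
    "she was found with two bank notes taken from the bag",
    "the prisoner received the stolen bank notes knowing them to be stolen",
    "he struck the deceased with a knife in the street",
]

ranked = more_like_this(trials, query_index=0, top_n=2)
print(ranked)
```

In this toy example the second text, which shares 'bank' and 'notes' with the first, ranks above the third, which shares only 'with'. A word such as 'the', which appears in every text, contributes nothing at all, and that is the whole point of the inverse document frequency weighting.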

And of course, while I don’t do topic modelling, you could equally apply this technique to the text, by simply thinking of the trials as ‘topics’; and I suspect you would find similar results.

But we can read it in other ways as well. We can measure, for instance, whether the trial text has a consistent relationship with the trial outcome - did the evidence naturally lead to the verdict? This work is the result of a collaboration between myself and Simon DeDeo and Sara Klingenstein at the Santa Fe Institute (see DeDeo et al., 'Bootstrap Methods for the Empirical Study of Decision-Making and Information Flows in Social Systems', for a reflection of one aspect of this work). And in fact, trial texts by the 1870s did not have a consistent relationship to verdicts - probably reflecting again the extent to which legal negotiations were increasingly being entered into outside the courtroom itself, in police cells, and judges’ chambers - meaning the trials themselves become less useful as a description of the bureaucratic process:

Or, coming out of the same collaboration, we can look to alternative measures of the semantic content of each trial - in this instance, a measure of the changing location of violent language. This analysis is based on a form of ‘explicit semantics’, using the categories of Roget’s thesaurus to group words by meaning. Durrant's trial was significantly, but typically for 1871, unencumbered with the language of violence. Seventy years earlier, by contrast, it would have been equally likely to contain descriptions of violence – even though it was a trial for that most white collar of crimes, receiving.
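The basic idea of grouping words by meaning and then measuring how much of a text falls into one group can be sketched very simply. What follows is an assumption-laden illustration in Python, not the method used in the collaboration: the short word list is a hypothetical stand-in for a real thesaurus category, the function name is made up, and both example sentences are invented.

```python
import re

# Hypothetical stand-in for a thesaurus category of violent language;
# a real analysis would draw its word lists from Roget's categories.
VIOLENCE_WORDS = {
    "struck", "blow", "knife", "blood",
    "wound", "beat", "kicked", "stabbed",
}


def category_share(text, category_words):
    """Return the fraction of words in the text that belong to the category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in category_words)
    return hits / len(tokens)


# Two invented snippets, standing in for trial texts.
violent_text = "he struck me a blow and I saw blood upon the knife"
quiet_text = "I found two bank notes in her bag"

print(category_share(violent_text, VIOLENCE_WORDS))
print(category_share(quiet_text, VIOLENCE_WORDS))
```

Calculated for every trial and plotted by year and by offence, a measure of this sort is what allows one to ask where in the record the language of violence is concentrated, and how that location shifts over time.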

In other words, the creation of new tools and bodies of data has allowed us to 'read' this simple text and the underlying bureaucratic event that brought it into existence, and arguably some of the social experience of a single individual, in a series of new ways. We can do ‘distant reading’, and see this trial account in the context of 127 million words - or indeed the billions of words in Google Books; and we can do a close reading, seeing Sarah herself in her geographical and social context.

In this instance, each of these readings seems to reinforce a larger story about the evolution of the court, of a life, of a place - a story about the rise of the bureaucracy of the modern state, and of criminal justice. But it was largely by starting from a picture, a face, a stare of fear, that the story emerged.

But the point is wider than this. Reading text – close, distant, computationally, or immersively - is the vanilla sex of the digital humanities and digital history. It is all about what might be characterised as the 'textual humanities'. And for all the fact that we have mapped and photographed her, Sarah remains most fully represented in the text of her trial. But, if you want something with a bit more flavour we need to move beyond what was deliberately coded to text – or photographs – and be more adventurous in what we are reading.

In performance art, in geography and archaeology, in music and linguistics, new forms of reading are emerging with each passing year that seem to significantly challenge our sense of the ‘object of study’. In part, this is simply a reflection of the fact that all our senses and measures are suddenly open to new forms of analysis and representation - when everything is digital, everything can be read in a new way.

Consider for a moment:

This is the ‘LIVE’ project from the Royal Veterinary College in London, and their ‘Haptic Interface’. In this instance they have developed a full scale ‘haptic’ representation of a cow in labour, facing a difficult birth, which allows students to physically engage and experience the process of manipulating a calf in situ. I haven’t had a chance to try this, but I am told that it is a mind altering experience. But for the purpose of understanding Sarah’s world, it also presents the possibility of holding the banknotes, of diving surreptitiously into the briefcase, of feeling the damp wall of her cell, and the worn wooden rail of the bar at the court. It suggests that reading can be different; and should include the haptic - the feel and heft of a thing in your hand. This is being coded for millions of objects through 3d scanning; but we do not yet have an effective way of incorporating that 3d text in to how we read the past.

The same could be said of the aural - that weird world of sound on which we continually impose the order of language, music and meaning; but which is in fact a stream of sensations filtered through place and culture.

Projects like the Virtual St Paul's Cross, which allows you to ‘hear’ John Donne’s sermons from the 1620s from different vantage points around the square, change how we imagine them, and move from ‘text’ to something much more complex, and powerful. And they begin to navigate that normally unbridgeable space between text and the material world.

For Sarah, my part of a larger project to digitise and link the records of nineteenth-century criminal transportation and imprisonment, is to create a soundscape of the courtroom where Sarah was condemned; and to re-create the aural experience of the defendant - what it felt like to speak to power, and what it felt like to have power spoken at you from the bench. And in turn, to use that knowledge, to assess who was more effective in their dealings with the court, and whether, having a bit of shirt to you, for instance, affected your experience of transportation or imprisonment.

All of which is to state the obvious. There are lots of new readings that change how we connect with historical evidence – whether that is text, or something more interesting. In creating new digital forms of inherited culture - the stuff of the dead - we naturally innovate, and naturally enough, discover ever changing readings.

And in the process it feels that we are slowly creating an environment like Katy Börner's notion of a Macroscope - that set of tools, and digital architecture, that allows us to see small and large, at one and the same time; to see Sarah Durrant's moles, while looking at 127 million words of text.

But, before I descend in to that somewhat irritating, Digital Humanities cliché where every development is greeted as both revolutionary, and life enhancing - before I become a fully paid up techno-utopian, I did want to suggest that perhaps all of these developments still leave us with the problem I started with - that the technology is defining the questions we ask. And it is precisely here, that I start to worry at the second half of my title: the 'conundrums of positivism'.

About four years ago - in 2009 or so, I was confronted by something I had not expected. At that time, Bob Shoemaker and I had been working on digitising trial records and eighteenth-century manuscripts for the Old Bailey and London Lives projects for about ten years. In the Old Bailey we had some 127 million words of accurately transcribed text and in the London Lives site, we had 240,000 pages of manuscript materials reflecting the administration of poverty and crime in eighteenth-century London - all transcribed and marked up for re-use and abuse by a wider community of scholars. It all felt pretty cool to me.

But for all the joys of discovery and search digitisation made possible, and the joys of representing the underlying data statistically; none of it had really changed my basic approach to historical scholarship. I kept on doing what I had always done - which basically involved reading a bunch of stuff, tracing a bunch of people and decisions across the archives of eighteenth-century London, and using the resulting knowledge to essentially commentate on the wider historiography of the place and period. My work was made easier, the publications more fully evidenced, and new links and associations were created, that did substantially change how one might look at communities and agency. But, intellectually, digitisation, the digital humanities, did not feel different to me from the history writing of twenty years before – to that point, I found myself remarkably unsurprised. But then something happened.

About that time, Google Earth was beginning to impact on geography. With its light, browser based approach to GIS, it had allowed a number of people to create some powerful new sites. Just in my own small intellectual backyard, people like Richard Rogers and a team of collaborators at the National Library of Scotland, were building sites that allowed historical maps to be manipulated, and populated with statistical evidence, online, and in a relatively intuitive Google maps interface. And this was complemented by others, such as the New York Public Library warping site.

It was an obvious thing to want to do something similar for London. And it was a desire to recreate something like this, that led to Locating London's Past, a screenshot of which I have used already a couple of times. The site used a warped and rectified version of John Rocque's 1746 map of London, in association with the first 'accurate' OS map of the same area, all tied up in a Google Maps container, to map 32,000 place names, and 40,000 trials, and a bunch of other stuff.

But this was where I had my comeuppance. Because in making this project happen, I found myself working with Peter Rauxloh at the Museum of London Archaeological Service, and several of his colleagues - all archaeologists of one sort or another. And from the moment we sat down at the first project meeting, I realised that I was confronted with something that fundamentally challenged my every assumption about history and the past. What shocked me was that they actually believed it.

Up till then it had been a foundational belief of my own, that while we can know and touch the leavings of the dead, the relationship between a past 'reality' and our understanding of it was essentially unknowable - that while we used the internal consistency of the archive to test our conclusions, and in order to build ever more compelling descriptions and explanations of change - actually, we were studying something that was internally consistent, but detached from a knowable reality. In most cases, we were studying 'text', and text alone - with its at least ambiguous relationship to either the mind of the author (whatever that is), and certainly an ambiguous relationship to the world the author inhabited.

Confronted by people happy to define a point on the earth's surface as three simple numbers, and to claim that it was always so, was a shock. This is not to say that the archaeologists were being naïve, far from it, but that having been trained up as a text historian - essentially a textual critic - in those meetings I came face to face with the existence of a different kind of knowing. And, of course, this was also about the time that 'culturomics' was gaining extensive international attention; with its claim to be able to 'read' history from large scale textual change, and to create a 'scientific' analysis of the past. Lieberman Aiden and Michel claim that the process of digitisation has suddenly made the past available for what they themselves describe as 'scientific purposes'.

In some respects, we have been here before. In the demographic and cliometric history so popular through the 1970s and 80s, extensive data sets were used to explore past societies and human behaviour. The aspirations of that generation of historians were just as ambitious as are those of the creators of culturomics. But, demography and cliometrics started from a detailed model of how societies work, and sought to test that model against the evidence; revising it in light of each new sample and equation.

The difference with most 'big data' approaches and culturomics is that there is no pretence to a model. Instead, their practitioners seek to discover patterns in the entrails of human leavings hoping to find the inherent meanings encoded there. What I think the scientific community - and quite frankly most historians - finds so compelling is that like quantitative biology and DNA analysis, big data is using one of the controlling metaphors of 20th-century science, 'code breaking' and applying it to a field that has hitherto resisted the siren call of analytical positivism.

Since the 1940s the notion that 'codes' can be cracked to reveal a new understanding of 'nature' has formed the main narrative of science. With the re-description of DNA as just one more code in the 1950s, wartime computer science became a peacetime biological frontier. In other words, what both textual ‘big data’, and the spatial turn, bring to the table is a different set of understandings about the relationship between the historical 'object of study', and a knowable human history; all expressed in the metaphor of the moment - code.

We can all agree that text and objects and landscape form the stuff of historical scholarship, and I suspect that none of us would want to put an exclusionary boundary around that body of stuff. But simply because the results of big data analysis are represented in the grammar of maths (and in 'shock and awe' graphics); or in hyper-precise locations referenced against the modern earth's surface, there is an assumption about the character of the 'truth' the data gives us access to. One need look no further than the use of 'power law' distributions - and the belief that their emergence from raw data reflects an inherently 'natural' phenomenon - to begin to understand how fundamentally at odds traditional forms of historical analysis - certainly in the humanities - are with the emerging 'scientific' histories associated with 'big data'.

But, it is not really my purpose to criticise either the Culturomics team, or archaeologists and geographers (who are themselves engaged in their own form of auto-critique). Rather I just want to emphasise that in choosing to move towards a 'big data' approach - new ways of reading the past - and in adopting the forms of representation and analysis that come with big data, all of us are naturally being pushed subtly towards a kind of social science, and a kind of positivism, which has been profoundly out of favour for at least the last thirty years.

In other words, there seems to me to be a real tension between the desire on the one hand to include the 'reading' of a whole new variety of data into the process of writing history; and, on the other, the extent to which each attempt to do so, tends to bring to the fore a form of understanding that is at odds with much of the scholarship of the last forty years. We are in danger of giving ourselves over to what sociologists refer to as 'problem closure' - the tendency to reinvent the problem to pose questions that available tools and data allow us to answer - or in Lewis Mumford's words, ask questions we know that computers can answer.

It feels to me as if our practise as humanists and historians is being driven by the technology, rather than being served by it. And really, the issue is that while we have a strong theoretical base from which to critique the close reading of text - we know how complex text is - we do not have the same theoretical framework within which to understand how to read a space, a place, an object, or the inside of a pregnant cow - all suddenly mediated and brought together by code - or to critique the reading of text at a distance. And as importantly, even if there are bodies of theory directed individually at each of these different forms of stuff (and there are); we certainly do not have a theoretical framework of the sort that would allow us to relate our analysis of the haptic, with the textual, the aural and the geographical. Having built our theory on the sands of textuality, we need to re-invent it for the seas of data.

But to come to some kind of conclusion: history is not the past, it is a genre constructed by us from practises first delineated during the enlightenment. Its forms of textual criticism, its claims to authority, its literary conventions, the professional edifice which sifts and judges the product; its very nature and relationship with a reading and thinking public; its engagement with memory and policy, literature and imagination, are ours to make and remake as seems most useful.

For myself, I will read anew, and use all the tools of big data, of ngrams and power laws; and I will publish the results with graphs, tables and GIS; but I refuse to forget that my object of study, my objective, is an emotional, imaginative and empathetic engagement with Sarah Durrant, and all the people like her.

Sunday, 8 September 2013

Wood and Memory

 The post that follows is personal.  It is in the tradition of memoir-like and self-revealing blogging; and I am posting it here because this blog started off as a hybrid that was intended to encompass both my professional and private personae.  In the last eighteen months or so it has rather evolved in to a more professional beast, with digital history and policy overwhelming more personal topics.  But this transition was not intended.  In other words, if you are looking for a nice post on 'Big Data', or the politics of Open Access, please look away now.

I did not know my uncle very well. Richard Pozzini ('Dicky') was my mother's much younger brother, and I only knew him for a couple of years when I was ten and eleven, and he was in his late twenties. At the age of 28, in the late 1960s, he killed himself. As children, my brother and sisters and I were protected from the full impact of his death, and it was seldom spoken of. A couple of years later, Virgil and Gina, Dicky's mother and father, my grandparents, moved into a small apartment above a garage attached to my parents' house in San Francisco, and for the next eight or nine years, I lived in and out of their home. I never heard my grandfather, Virgil (Nonno), mention Dicky in all those years, nor did Dicky's mother, Gina (Nonni), then, or in the thirty years to her death in 2001, ever voluntarily discuss him without being prompted. Nor has my mother, Dicky's older sister, talked much about him - few fond memories of a shared childhood. He was always present, in carved objects and panels, in the models he made as a child, but mainly in the silence he left behind.
Gina, Florence, Virgil and Richard 'Dicky' Pozzini. c.1949.

But I remember Dicky, and I want to record my memories of him, and just take some time to think about him, and to do something to memorialise a wonderful and creative life that nevertheless ended in despair.

Dicky did many things - he was a farmer, and a cyclist, a photographer, a student of Italian literature, a member of the Peace Corps in Brazil and an artist; but my memories of him are as a student of nature and a maker of things.

A series of bird sculptures made around 1968 or 69.

He grew up on the smallholding his parents Virgil and Gina bought in the early 1930s, as immigrants trying to recapture the rural lifestyle they had left behind in Northern Italy: fourteen acres of grapes and walnut trees in El Campo, Woodbridge, just outside Lodi in the Central Valley of California.  The area is now famous for its wine, but then it felt like the dry, hot factory floor of California agriculture.  Dicky's parents made a hard scrabble living combining wine making and market gardening with whatever work came to their very skillful hands.  In my memory they seemed to be experimenting with a new crop, a new strategy to make money every year.  Virgil made all their furniture, and Gina made all their clothes.  Dinner was as likely to feature song birds or hare, as meat from a butcher. And while there was always food on the table, it never felt like there was a lot of money in the bank. 

The 'Ranch' was really just a classic California subdivision, with a small bungalow, a tank house, a windmill (which had been replaced by an electric pump by the 1960s), a well and a few out-buildings.  And it was never big enough to support them as an exclusively agricultural concern.  Gina went to work in the fruit (processing peaches and cherries through the hot nights of summer) and Virgil made furniture and fitted kitchens for his better-off neighbours in the workshop.

Dicky grew up driving tractors, and working in the shop, helping with the wine; fishing and hunting the river trout and small birds of the neighbourhood; and just getting on with all the gathering and processing that a small holding required. It was a life of making and repairing, processing and planning for the next season; and it required huge imagination as well as simple hard work. One of my earliest memories of Dicky is helping him make sausages in the tank house kitchen (reserved for processing large batches of food). While I was allowed to turn the handle, he stood over the meat grinder expertly tying off each link as the meat filled the casing. But it was not all work. He also took us on adventures to the river, and taught us how to make whistles from a bit of reed. He was the first person I ever saw use a cast-net for river fishing.

But more than anything, from an early age he was a wonderful craftsman.  My brother still has the perfect miniature model cars Dicky made as a child (reflecting designs from the 1910s and 1920s), and my parents have the carvings of birds, and of a bunch of grapes and a rooster that he made.  The panels always hung on the wall in my grandparents' living room.

El Campo was anything but an easy life.  Virgil was a disappointed man who had spent twenty years living in all-male quarters in mining towns, and lodging houses - a victim of that largely male diaspora suffered by the Irish, Italians and Poles in the first decades of the last century.  He was given to alcohol  and socialising; and drank a fair portion of the wine he made.  He always hankered after a return to Italy, but following his marriage in 1929, he never went back.  He could be violent to his wife and children, and has always been held up by my mother as the canker at the heart of that particular knotty world.  She herself was largely protected from him by her own mother, but as a boy Dicky was expected to tough it out.  But, Dicky also had good friends, and a varied life at the heart of a small community.  His childhood certainly did not leave him fearful, or unambitious, frightened of the world, or unable to make friends.

After high school, Dicky went to Italy, cycling through the North - to visit relatives, and I suspect, just to get out of rural America. By all accounts the trip was a hard one - with little money and less support - but it must have been enough to spark a desire to study Italian literature, which he did for a couple of years at San Francisco State. I don't know how long this lasted, though he never finished the degree. And soon enough, he was off to Brazil with the Peace Corps, and a couple of years later, following a short stay back at the Ranch, returned to Brazil to help set up an agricultural co-operative in the rain forest.

The co-operative eventually failed, and Dicky found himself back in San Francisco in his late twenties, and in need of a living. With three or four false starts behind him, I suspect returning to the US was very hard. In that particular world, there was not much sympathy for failure.

I remember driving up to Lodi with him in what must have been one of his first visits to see his parents after returning for good from South America.  We drove the eighty miles or so in the most dilapidated pick-up you can imagine - there was no key, and Dicky had to hot wire it every time we stopped.  Virgil and Gina had given up the Ranch the year before and had moved in to a small house in Lodi; and the afternoon was punctuated with Virgil and Dicky arguing, while Gina and I listened from the kitchen.   I will always remember Virgil saying, 'If I thought you were going to return, we would never have sold the Ranch'.

Dicky and me (aged 10) around 1967 or 68.

I doubt Dicky could ever have gone back to El Campo, but it seemed a powerful accusation to me at the age of nine or ten, still shocked by the loss of a much loved childhood haunt.

Dicky did not talk to me about the argument, and we drove back to San Francisco pretty much in silence (I was a kid).  And in the next month or so, Dicky went about setting up a wood working shop on Valencia Street, and tried to figure out how to make a living.

For the next year or so he was a frequent visitor to my parents' house, and a regular presence in our lives. I remember visiting Muir Woods with him once, and being allowed to use his Rolex camera for a few close ups of plants and insects. At weekends there were the craft fairs where he sold the small pieces he was making by then. It was San Francisco in 1968 and 1969, so all the work was psychedelic, loud and colourful. As kids we were encouraged to make 'God's Eyes' and scented candles.

I don't remember how long this lasted - probably not more than eighteen months. My parents heard of Dicky's suicide late on a Sunday night, on our return from a long weekend trip to somewhere I don't remember. He had shot himself.

From here my memories are about how not to deal with a death like Dicky's.  The silence, the mismanaged attempts to protect children from hard adult emotions.  The awkward attempts to explain the truth; and to work through the guilt that wracked my parents' and grandparents' lives for years. 

But that tragedy and experience is not the point of this blog.  Instead, I just wanted to remember a person, Dicky Pozzini.  He was a craftsman and an artist, a socialist, and a generous soul.  Lives are made of people and experience, and Dicky is one of the people who has shaped my life, and whom I  simply want to remember.

These days I am doing as much woodworking as time allows; making stuff, because making stuff is important. And every time I cut into a piece of black walnut, with its evocative smell so familiar from my childhood, or let my gouge reveal the curve of a bowl on the lathe, I think of Dicky.

Two fish sculptures made by my son Nick and me in 2011. The walnut was laid down to cure by Dicky Pozzini in 1968.

Wednesday, 22 May 2013

Stuff and Dead People

In recent years I find myself using the terms Stuff and Dead People in talks and titles more and more. And as a historian I find myself conceptualising my work as being about Stuff inherited from Dead People. Both expressions just sound right. But it occurs to me that while I have a relatively clear sense of what I am intending to convey when I use these terms, their meanings might not be entirely apparent to others. For this reason I thought I would have a stab at providing a couple of definitions, and a brief explanation of why I find these terms so useful.

In my usage Stuff encompasses all the different varieties of artefact that can be used in practising history. The term is in some respects an attempt to avoid saying that our object of study is text or image, the manmade landscape or a piece of furniture, or indeed even data in its broadest form. Instead, the use of Stuff is intended to signify that my practise as a historian actively seeks to make use of all of these things. In terms of an epistemology, it is an attempt to distance myself from the categories of knowing that I (we) have inherited. Stuff denies the taxonomies of knowing that define a museum object as being different to a pamphlet; a hedgerow different to a teapot. In part this usage reflects a profound disillusion with the narrow practise of textual comparison that lies at the heart of the Rankean tradition of historical analysis; but it is also a recognition that new technologies allow us to encompass new types of evidence in new ways. When all Stuff is data it can be interrogated across boundaries that seemed natural and unbreachable just a few decades ago (between a hedgerow and a teapot). And while data itself is also a form of Stuff, and the transition from varieties of stuff to data is itself a process of creating a new taxonomy, there remains a rather wonderful transition involved. There is an opportunity to rethink the meanings of Stuff, and without a new vocabulary it is all that much more difficult to do so.

In other words, Stuff is a simple rejection of post-enlightenment categorisation.

In some respects Dead People serves a similar function.  The use of Dead People avoids the traps of both identity and social modelling; while at the same time giving some shape to the object of historical study (human culture in the past).  Ironically some Dead People are still alive.  Henry Kissinger is apparently still breathing, but is nevertheless a figure of substantial historical analysis.  In my view he is undeniably Dead People.  At the same time, because cultural history seems to take longer to turn journalism in to books, Amy Winehouse and Michael Jackson may be dead, but they are not yet Dead People.

The term Dead People implies a refusal to describe the people of the past as men or women, workers or citizens, artists or authors. And in doing so, like Stuff, it is used to signal that I do not find the traditional categories and boundaries that comprise social science very helpful.

Stuff we inherit from Dead People is my object of study as a historian. 

One could convey these ideas using other words. The results might be a bit long winded, but could certainly point up my intention. At the same time, the use of these terms serves a slightly wider function. They form an attempt to de-centre the language of historical and social science authority that underpins the professional claims of academic historians as a whole. By refusing to use the categories and languages of authority we inherited, I am self-consciously rejecting the systems that underpin the professional academic practise of history.

It is perhaps a ridiculous comparison, but I like to think of the use of these terms as akin to the transition in thinking brought about by the evolution of labelling in quantum theory between the proposal of the eight-fold-way in the early 1960s and the November Revolution of 1974. Like most people of my generation and education, I was raised in an Einsteinian universe in which unusual phenomena were described in the most secure of scientific jargon - we believed in the physics because it was expressed in the language of authority. But in the 1970s, in particular, a whole new language of strangeness and charm was broadcast to a popular audience. As a teenager schooled in an older tradition, this challenged me to rethink. By using everyday words to describe complex phenomena I was forced to interrogate what I believed more closely than I would otherwise have done. I don't understand quantum, but suspect I understand Einsteinian physics better as a result! I use the terms Stuff and Dead People in the hope that their use will challenge listeners to question the labels and phenomena they think I am talking about.