Sunday 9 November 2014

Big Data, Small Data and Meaning

This post was originally written as the text for a talk I gave at a British Library Labs event in London in early November 2014. In the nature of these things, the rhythms of speech and the verbal tics of public speaking remain in the prose. It has been lightly edited, but the point remains the same.

In recent months there has been a lot of talk about big stuff. Between 'Big Data' and calls for a return to ‘Longue durée’ history writing, lots of people seem to be trying to carve out their own small bit of 'big data'. This post represents a reflection on what feels to me to be an important emerging strategy for information interrogation driven by the arrival of 'big data' (a 'macroscope'); and a tentative step beyond that, to ask what is lost by focusing exclusively on the very large. 

And the place I need to start is with the emergence of what feels to me like an increasingly commonplace label – a ‘macroscope’ - for a core aspiration of a lot of people working in the Digital Humanities. 

As far as I can tell, the term ‘macroscope’ was coined in 1969 by Piers Anthony (the pen name of Piers Anthony Dillingham Jacob), and used as the title of his science fiction/fantasy novel of the same year – in which the ‘macroscope’, a large crystal able to focus on any location in space-time with profound clarity, is used to produce something like a telescope of infinite resolution. In other words, a way of viewing the world that encompasses both the minuscule and the massive. The term was also taken up by Joël de Rosnay and deployed as the title of a provocative book on systems analysis, published in English in 1979. The label has also had a long and undistinguished afterlife as the trademark for a suite of project management tools – a ‘methodology suite’ - supported by the Fujitsu Corporation.

But I think the starting point for interest in the possibility of creating a ‘macroscope’ for the Digital Humanities comes out of computer science, and the work of Katy Börner from around 2011. Her designs and advocacy for the development of a ‘Plug and Play Macroscope’ seem to have popularised the idea among a wider group of Digital Humanists and developers. To quote Börner:

'Macroscopes provide a "vision of the whole," helping us "synthesize" the related elements and detect patterns, trends, and outliers while granting access to myriad details. Rather than make things larger or smaller, macroscopes let us observe what is at once too great, slow, or complex for the human eye and mind to notice and comprehend.' (Katy Börner, ‘Plug-and-Play Macroscopes’, Communications of the ACM, Vol. 54 No. 3, pages 60-69, doi:10.1145/1897852.1897871)

In other words, for Börner, a macroscope is a visualisation tool that allows a single data point to be both visualised at scale in the context of a billion other data points, and drilled down to its smallest compass. This was not a vision or project initially developed in the humanities. Instead it was a response to the conundrums of ‘Big Data’ in both STEM academic disciplines and the wider commercial world of information management. But more recently, a series of ‘macroscope’ projects have begun to emerge from within the humanities, tied to their own intellectual agendas, and subtly recreating the idea with a series of distinct emphases.

Perhaps the project most heavily promoted recently is Paper Machines, created by Jo Guldi and Chris Johnson-Roberson – and the metaLAB at Harvard. It comprises a series of visualisation tools, built to work with Zotero, ideally allowing the user both to curate a large-scale collection of works and to explore its characteristics through space, time and word usage. In other words, it is designed to allow you to build your own Google Books, and explore. There are problems with Paper Machines, and most people I know have struggled to make it work consistently. But it builds rather nicely on the functionality made available through Zotero, and effectively illustrates what might be described as a tool for ‘distant reading’ that encompasses elements of a ‘macroscope’.

What is most interesting about it, however, is the use its creators make of it in seeking to shift a wider humanist discussion from one scale of enquiry to another. Last month, to great fanfare, CUP published Jo Guldi and David Armitage’s The History Manifesto, which argues that, once armed with a ‘macroscope’ (Paper Machines, in their estimation), historians should pursue an analysis of how ‘big data’ might be used to re-negotiate the role of the historian – and of the humanities more generally. Basically, what Guldi and Armitage are calling for, through both the Manifesto and Paper Machines, is the re-invention of ‘Longue durée’ history – telling ever larger narratives about grand sweeps of historical change, encompassing millennia of human experience. And they want to do this in order to take on the mantle of the public intellectual, able to speak with greater authority to ‘power’.

In the process they explicitly denigrate notions of ‘micro-history’ as essentially irrelevant. At one and the same time, they seem to me to celebrate the possibility of creating a ‘macroscope’ while abjuring half its purpose. What we see in this particular version of a ‘macroscope’ is a tool that privileges only one setting on the scale between a single data point and the sum of the largest data set we can encompass. In other words, by seeking the biggest of big stories, it misses the rest.

Perhaps the other most eloquent advocate for a ‘macroscope’ at the moment is Scott Weingart. With Shawn Graham and Ian Milligan, he is writing a collective online ‘book’ entitled Big Digital History: Exploring Big Data through a Historian’s Macroscope. The book is a nice run-through of digital humanist tools, but the important text from my perspective is a blog post Weingart published on 14 September 2014, called ‘The moral role of DH in a data-driven world’. In it, Weingart advocates a very specific vision of a ‘macroscope’, in which the largest scale of reference and view is made intelligible through the application of a formal version of network analysis.

Weingart is a convincing advocate for network analysis, performed in light of some serious and sophisticated automated measures of distance and direction. And his work is a long way ahead of much of the naïve and unconvincing use of network visualisations current in large parts of the Digital Humanities. Weingart also makes a powerful case for how a limited number of DH tools – primarily network analysis and topic modelling - could be deployed in re-engaging the ‘humanities’ with a broader social discussion.

Again, like Guldi and Armitage, Weingart seeks in 'Big Data' a means through which the Humanities can ‘speak to power’. As with the work of Armitage and Guldi, the pressing need to turn Digital Humanities to political account appears to motivate a search for large scale results that can be deployed in competition with the powerful voices of a positivist STEM tradition. My sense is that Weingart, Armitage and Guldi are all essentially scanning the current range of digital tools, and selectively emphasising those that feel familiar from the more ‘Social Science’ end of the Humanities. And that having located a few of them, they are advocating we adopt them in order to secure our place at the table. 

In other words, there is a cultural/political negotiation going on in these developments and projects that is driven by a laudable desire for ‘relevance’, but which effectively moves the Humanities in the direction of a more formal variety of Social Science. 

Others still are arguably doing some of the same work, but using a different language, or at least seeking a different kind of audience. Jerome Dobson, for example, has recently begun to describe the use of Geographical Information Systems (GIS) in historical geography as a form of ‘macroscope’. This usage doesn’t come freighted with the same political claims as are current in Digital Humanities, but it seems to me an entirely reasonable way of highlighting some of the global ambitions – and sensitivity to scale - that are inherent in GIS. The notion - perhaps fostered most fully by Google Earth - that you can both see the world in its entirety and zoom in to the smallest detail seems at one with a data-driven ‘macroscope’. But, again, the scale most geographers want to work with is large – patterns derived from billions of data points. And again, the siren call of GIS tends to pull humanist enquiry towards a specific form of social science.

And finally, we might also think of the approach exemplified in the work of Ben Schmidt as another example of a ‘macroscope’ approach – particularly his ‘prochronism’ projects. These take individual words in modern cinema and television scripts that purport to represent past events – things like Downton Abbey and Mad Men - and compare them to every word published in the year they are meant to represent.

Building on Google Books and Google Ngrams, Schmidt is effectively mixing scales of analysis at the extremes of ‘big data’, on the one hand – all words published in a single year – and small data, on the other. Of all the examples mentioned so far, it is only Schmidt who is actually using the functionality of a ‘macroscope’ effectively, making it all the more ironic that he doesn’t adopt the term. 
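To make the prochronism idea a little more concrete, here is a minimal sketch under stated assumptions. It is not Schmidt's own code. It assumes we already hold word counts for everything published in a script's target year (in practice something like the Google Books Ngram data; here a tiny invented dictionary), and it flags any word in the script whose rate in that year falls below a threshold per million words. The function names, the threshold and the counts are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case a text and split it into word tokens (keeping internal apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def flag_prochronisms(script: str, year_counts: dict[str, int],
                      per_million: float = 0.5) -> list[str]:
    """Return script words that are rare or absent in the reference year's corpus.

    year_counts: hypothetical word -> count for everything published in the target year.
    per_million: assumed cut-off; words used less often than this are flagged.
    """
    total = sum(year_counts.values())
    if total == 0:
        raise ValueError("reference corpus is empty")
    flagged = []
    for word in sorted(set(tokenize(script))):
        rate = year_counts.get(word, 0) / total * 1_000_000
        if rate < per_million:
            flagged.append(word)
    return flagged


# Invented counts standing in for a real reference year.
reference_year = Counter({"the": 600_000, "is": 300_000, "about": 80_000,
                          "client": 1_500, "meeting": 2_000, "office": 3_000})
print(flag_prochronisms("The client meeting is about impactful messaging.", reference_year))
# → ['impactful', 'messaging']
```

A flagged word is only a candidate anachronism; a real check would also need to cope with spelling variants, proper names and phrases longer than a single word.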

And almost uniquely in the Digital Humanities – a field equally remarkable for its febrile excitement and its lack of demonstrable results – Schmidt’s results have been starkly revealing. My favourite example is his analysis of the scripts of Mad Men, which illustrates that early episodes referencing the 1950s overuse language associated with the ‘performance’ of masculinity – words that reflect ‘behaviour’ – while later episodes, located in the 1970s, overuse words reflecting the internalised emotional experience of masculinity. For me this beautifully revealed the larger narrative arc of the programme in a way that had not been obvious prior to his work. Schmidt has little of the wider agenda to influence policy and politics evident in the work of Armitage, Guldi and Weingart, but ironically, it is his work that is having some of the greatest extra-academic impact, via the anxiety it has created among the scriptwriters of the shows he analyses.

All of which is simply to say that playing with and implementing ideas around a ‘macroscope’ is quite popular at the moment. And it is a direction of travel which, with caveats, I wholly support. But it also leaves me in something of a conundrum.

Each of these initiatives, with the possible exception of Schmidt’s work, seems to locate itself somewhere other than the Humanities I am familiar with. And this seems odd. Issues of scale are central to this. Claiming to be doing ‘big history’ sounds exciting, while claiming that more formal ‘network analysis’ will answer the questions of a humanist enquiry appears to create a bridge between disciplines – allowing Humanists and more data-driven parts of the Social Sciences to share a methodology and a conversation. But with the exception of Schmidt’s work, these endeavours seem to be privileging particular types of analysis – Social Science types of analysis – over more traditionally Humanist ones.

In some ways, this is fine. I have discovered to my own benefit, that working with ‘Big Data’ at scale and sharing methodologies with other disciplines is both hugely productive, and hugely fun. To the extent that ‘big stories’ and new methodologies provide the justification for collaborating with researchers from a variety of disciplines – statisticians, mathematicians and computer scientists – they are wholly positive, and a simple ‘good thing’. 

And yet… I find myself feeling that in the rush to define how we use a ‘macroscope’, we are losing touch with what humanist scholars have traditionally done best. 

I end up feeling that in the rush to new tools and ‘Big Data’, Humanist scholars are forgetting what they spent much of the second half of the twentieth century discovering – that language and art, cultural construction, human experience, and representation are hugely complex – but can be made to yield remarkable insight through close analysis. In other words, while the Humanities and ‘Big Data’ absolutely need to have a conversation, the subject of that conversation needs to change, and to encompass close reading and small data.

The Stanford Humanities Center defines the ‘Humanities’ as:

'…the study of how people process and document the human experience. Since humans have been able, we have used philosophy, literature, religion, art, music, history and language to understand and record our world.'

Which makes the Humanities sound like the most unexciting, ill-defined, unsalted, intellectual porridge ever. And yet, when I think about the scholarly works that have shaped my life, there is none of this intellectual cream of wheat.

Instead, there are a series of brilliant analyses that build from beautifully observed detail at the smallest of scales. I look back to the British Marxist tradition in history – to Raphael Samuel and Edward Thompson – and what I see are closely described lives, built from fragments and details, made emotionally compelling by being woven into ever more precise fabrics of explanation. 

A gesture, a phrase, a word, an aching back, a distinctive tattoo. 'My dearest …. Remember when…' 

The real power of work in this tradition lay in its ability to deploy emotive and powerful detail in the context of the largest of political and economic stories. And the political project that underpinned it was not to ‘speak to power’, but to mobilise the powerless, and democratise identity and belonging. With Thompson’s liquid prose, a single poor, long-dead framework knitter effected more change than any amount of more formal economic history.

Or I think of the work of Pierre Bourdieu, Arlette Farge and Michel de Certeau, and the ways in which they again use the tiny fragments of everyday life - the narratives of everyday experience - to build a compelling framework illustrating the currents and sub-structures of power.

Or I think of Michel Foucault, who was able to turn on its head every phrase and telling line – to let us see patterns in language – discourses – that controlled our thoughts. Foucault profoundly challenged us to escape the limits of the very technologies of communication and analysis we used; and to see in every language act, every phrase and word, something of politics. 

By locating the use of a ‘macroscope’ at the larger scale, seeking the Longue durée, and the ear of policy makers, recent calls for how we choose to deploy the tools of the Digital Humanities appear to deny the most powerful politics of the Humanities. If today we have a public dialogue that gives voice to the traditionally excluded and silenced – women, and minorities of ethnicity, belief and dis/ability – it is in no small part because we now have beautiful histories of small things. In other words, it has been the close and narrow reading of human experience that has done most to give voice to people excluded from ‘power’ by class, gender and race. 

Besides simply reflecting a powerful form of analysis, when I return to those older scholarly projects I also see the yearning for a kind of ‘macroscope’. Each of these writers strives to locate the minuscule in the massive; the smallest gesture in its largest context; to encompass the peculiar and eccentric in the average and statistically significant.

What I don’t see in modern macroscope projects is a recognition of the power of the particular; or as William Blake would have it: 

To see a World in a grain of sand, 
And a Heaven in a wild flower...
                               Auguries of Innocence (1803, 1863).

Current iterations of the idea of a macroscope, with all their flashy, shock and awe visualisations, probably score over these older technologies of knowing in their sure grasp of data at scale, but in the process they seem to lose the ability to refocus effectively. 

For all the promise of balancing large and small scales, the smaller and particular seem to have been ignored. Ever since Apollo 17 sent back its pictures of earth as a distant blue marble, our urge towards the all-inclusive, global and universal has been irresistible. I guess my worry is that in the process we are losing the ability to use fine detail in the ways that make the work of Thompson and Bourdieu, Foucault and Samuel, so compelling.

So, by way of wending towards some kind of inconclusive conclusion, I just want to suggest that if we are to use the tools of 'Big Data' to capture a global image, it needs to be balanced with the view from the other end of the macroscope (along with every point in between).

In part this is just about having self-confidence as humanist scholars, and ironically serving a specific role in the process of knowing, that people in STEM are frequently not very good at. 

Several recent projects I was privileged to participate in involved some hugely fun work with mathematicians and information scientists exploring the changing linguistic patterns found in the Old Bailey trials – all 127 million words’ worth. And after a couple of years of working closely with a bunch of brilliant people, what I gradually realised was that while mathematicians do a lot of ‘close reading’ – of formulae and algorithms - like most scientists, they are less interested than I am in the close reading of a single datum. In STEM, cleaning data is a chore. Geneticists don’t read the human genome base by base; and our knowledge of the Higgs Boson is built on a probability only discovered after a million rolls of the dice, with no one really looking too carefully at any single one.

In many respects ‘big data’ actually reinforces this tendency, as the assumption is that the ‘signal’ will come through, despite the noise created by outliers and weirdness. In other words, ‘Big Data’ supposedly lets you get away with dirty data.  In contrast, humanists do read the data; and do so with a sharp eye for its individual rhythms and peculiarities – its weirdness. 

In the rush towards 'Big Data' – the Longue durée, and automated network analysis; towards a vision of Humanist scholarship in which Bayesian probability is as significant as biblical allusion, the most urgent need seems to me to be to find the tools that allow us to do the job of close reading of all the small data that goes to make the bigger variety. This is not a call to return to some mythical golden age of the lone scholar in the dusty archive – going gradually blind in pursuit of the banal. This is not about ignoring the digital; but a call to remember the importance of the digital tools that allow us to think small; at the same time as we are generating tools to imagine big. 

In relation to text, you would think this is easy enough. Easy enough to, like Ben Schmidt, test each word against its chronological bed-fellows; or measure its distance from an average for its genre. When I am reading a freighted phrase from the 1770s, like ‘pursuit of happiness’, I want to know that until then, ‘happiness’ was almost exclusively used in a religious context – ‘Eternal Happiness’ - and that its use in a secular setting would have caught in a reader’s mind as odd and different - new. We should be able to mark the moment when Thomas Jefferson allowed a single word to escape from one ‘discourse’ and enter another – to read that word in all its individual complexity, while seeing it both close and far.
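The ‘happiness’ example can be sketched in a few lines, with the caveat that everything here is an assumption for illustration. The dated sentences are invented (apart from Jefferson's phrase) rather than drawn from any real corpus, the list of religious context words is hand-picked, and the hypothetical function `religious_share` simply reports, decade by decade, what fraction of a target word's occurrences fall within a few words of that religious vocabulary. A falling share is the kind of signal that might mark a word moving from one ‘discourse’ into another.

```python
import re
from collections import defaultdict


def tokens(text: str) -> list[str]:
    """Lower-case a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def religious_share(sentences, target, context_words, window=3, bucket=10):
    """Per period of `bucket` years, the fraction of `target` occurrences that have
    at least one of `context_words` within `window` tokens on either side."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, sentence in sentences:
        period = year - year % bucket          # e.g. 1776 -> 1770
        toks = tokens(sentence)
        for i, tok in enumerate(toks):
            if tok != target:
                continue
            totals[period] += 1
            # Tokens just before and just after this occurrence.
            nearby = toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]
            if any(w in context_words for w in nearby):
                hits[period] += 1
    return {p: hits[p] / totals[p] for p in sorted(totals)}


# Invented sample (apart from Jefferson's phrase) and a hand-picked word list.
corpus = [
    (1752, "They hope for eternal happiness in heaven."),
    (1758, "The soul seeks the happiness of God."),
    (1776, "Life, liberty and the pursuit of happiness."),
    (1779, "Eternal happiness awaits the faithful."),
]
RELIGIOUS = {"eternal", "heaven", "god", "soul", "faithful", "salvation"}
print(religious_share(corpus, "happiness", RELIGIOUS))
# → {1750: 1.0, 1770: 0.5}
```

A fixed word window is the crudest possible notion of ‘context’; it stands in here for the much richer contextualisation the argument calls for.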

I know of no work designed to define the content of a ‘discourse’ and map it back into inherited texts. I know of no projects designed with this notion in mind. And if you want a take-home message from this post, it is a simple call for ‘radical contextualisation’.

To do justice to the aspirations of a macroscope, and to use it to perform the Humanities effectively – and politically – we need to be able to contextualise every single word in a representation of every word, ever. Every gesture contextualised in the collective record of all gestures; and every brushstroke, in the collective knowledge of every painting.

Where is the tool and data set that lets you see how a single stroll along a boulevard, compares to all the other weary footsteps? And compares it in turn to all the text created along that path, or connected to that foot through nerve and brain and consciousness. Where is the tool and project that contextualises our experience of each point on the map, every brush stroke, and museum object? 

This is not just about doing the same old thing – of trying to outdo Thompson as a stylist, or Foucault for sheer cultural shock. My favourite tiny fragment of meaning – the kind of thing I want to find a context for - comes out of Linguistics. It is new to me, and seems a powerful thing: Voice Onset Time (VOT) – the breathy gap between the release of a consonant such as ‘p’ or ‘t’ and the moment the vocal cords begin to vibrate. This apparently changes depending on who you are speaking to – a figure of authority, a friend, a lover. It is as if the gestures of everyday life can also be seen as encoded in the lightest breath. Different VOTs mark racial and gender interactions, insider talk, and public talk.

In other words, in just a couple of milliseconds of empty space there is a new form of close reading that demands radical contextualisation (I am grateful to Norma Mendoza-Denton for introducing me to VOT). And the same kind of thing could be extended to almost anything. The mark left by a chisel is certainly, by definition, unique, but it is also freighted with information about the tool that made it, the wood and the forest from which it emerged; the stroke, the weather on the day, and the craftsman. 

One of the great ironies of the moment is that in the rush to big data – in the rush to encompass the largest scale, we are excluding 99% of the data that is there. And if we are going to build a few macroscopes, I just want to suggest that, along with the blue marble views, we keep hold of the smallest details. And if we do so, looking ever more closely at the data itself – remembering that close reading can be hugely powerful - Humanists will have something to bring to the table, something they do better than any other discipline. They can provide a world of ‘small data’ and more importantly, of meaning, to balance out the global and the universal – to provide counterpoint in the particular, to the ever more banal world of the average.

Tuesday 15 July 2014

Doing it in public: Impact, blogging, social media and the academy


The text below is derived from a short talk I gave in February for the Library at the University of Sussex.  At the time (and in the text) I promised to post it as a blog, but never quite found the time. 


Impact is an awkward thing in British Higher Education.  Most of the time it feels like just one more bludgeon used to batter hapless academics into submission.  It is frequently shorthand for an agenda handed down from on high, privileging near-market research and the agendas of government.  And yet no one spends a lifetime researching, teaching and writing about something if they don't believe it is important - if they don't believe that what they do contributes to a better world. We all want to have 'impact'.  The question is how can we do so in a way that reflects our own values, rather than those of whatever government happens to be in power this week?

This question is all the more important because our traditional assumptions about how our work affects a broader social discourse seem increasingly threadbare. When the print runs of most monographs number just a few hundred copies (most of which disappear into American research libraries, never to be read or used), and when journal articles proliferate beyond number because they serve the needs of big publishing rather than academic dialogue - we need to think harder about how we do the job of the humanities. If we simply continue in an older vein - having small (vociferous) conversations amongst ourselves, in professional seminars and at conferences, through book reviews and in the specialist hard copy press - we will lose our place in the broader social dialogue. If there is a 'crisis' in the humanities, it lies in how we have our public debates, rather than in their content.

It seems to me that the solution to this problem is all around us, and that in order to address it, we need to remember that the role of the academic humanist has always been a public one - however mediated through teaching and publication. By building blogging, Twitter, Flickr and shared libraries in Zotero into our research programmes - into the way we work anyway - we both get more research done, and build a community of engaged readers for the work itself. We can do what we have always done, but do it better; as a public performance, in dialogue amongst ourselves, and with a wider public.

The best (and most successful) academics are the ones who are so caught up in the importance of their work, so caught up with their simple passion for a subject, that they publicise it with every breath. Twitter and blogs, and embarrassingly enthusiastic drunken conversations at parties, are not add-ons to academic research, but a simple reflection of the passion that underpins it.

A lot of early career scholars, in particular, worry that exposing their research too early, in too public a manner, will either open them to ridicule, or allow someone else to 'steal' their ideas. But in my experience, the most successful early career humanists have already started building a form of public dialogue into their academic practice - building an audience for their work, in the process of doing the work itself.

Perhaps the best example of this is Ben Schmidt, and his hugely influential blog: Sapping Attention.  His blog posts contributed to his doctorate, and will form part of his first book.  In doing this, he has crafted one of the most successful academic careers of his generation - not to mention the television consultation business, and world-wide intellectual network.

Or Helen Rogers, who maintains two blogs: Conviction: Stories from a Nineteenth-Century Prison - on her own research; and also the collaborative blog, Writing Lives, created as an outlet for the work of her undergraduates. They bring together research and teaching, and in the process are building a substantial community of interest.

Or Adam Crymble and his blog - Thoughts on Public & Digital History - where he melds practical posts addressing straightforward DH problems with substantial interventions in policy. Crymble's recent appointment to a lectureship in digital history rested in large measure on his blog.

The list could go on. The Many Headed Monster, the collective blog authored by Brodie Waddell, Mark Hailwood, Laura Sangha and Jonathan Willis, is rapidly emerging as one of the sites where 17th-century British history is being re-written. Meanwhile, Jennifer Evans is writing her next book via her blog, Early Modern Medicine.

The most impressive thing about these blogs (and the academic careers that generate them), is that there is no waste - what starts as a blog, ends as an academic output, and an output with a ready-made audience, eager to cite it.

For myself the point is that these scholars don't waste text, and neither do I. If I give a talk, I turn it into a blog. Not everything is blogged, but the vast majority of the public presentations I make as part of my job will be. And while many of these texts will never contribute to an academic article, about half of them do. As a result, blogging has become part of my own contribution to what I think of as an academic public sphere. It becomes a way of thinking in public, and of revising one's work to make it better, in public. And knowing that there is an audience (whatever its size) changes how one does it - forcing you to think a little harder about the reader, and a little harder about the standards of record keeping and attribution that underpin your research.

One of my favourite blogging experiences involves embedding blogs in undergraduate assessment. By forcing students to write 'publicly', their writing rapidly improves. From being characterized by the worst kind of bad academic prose - all passive voice pomposity - undergraduate writing in blogs is frequently transformed into something more engaging, simply written, and to the point. From writing for the eyes of an academic or two, students are forced to imagine (or actually confront) a real audience. Blogging has the same effect on more professional academic writers - many of whom assume that if the content is good, the writing somehow doesn't matter.

But as importantly, blogs are part of establishing a public position, and contributing to a debate.  

Twitter is in some ways the same - or at least, like blogging, twitter is good for making communities, and finding collaborators; and letting other people know what you are doing.  But, it also has another purpose.  

Dan Cohen - the director of the Digital Public Library of America - always says about Twitter that the important thing is that at the end of the week, it makes you aware of all the publications  and developments, calls for papers, and conferences, you need to know about in order to keep up with your corner of the academy. It is not about what you had for breakfast.  It is about being on top of your field.

Between them, Twitter and blogging just make good academic sense. And while you need to avoid all the kittens and trolls, clickbait and self-promoting gits, these forms of social media are rapidly evolving into the places where the academic community is embodied. They are doing the job of the seminar, and the letters page. They are where our conversation is happening.

As for participating in this 'academic public sphere', there are only a few rules. First - be yourself. If you want credit, you have to own your material. In other words, never be anonymous. And second, remember that everything from Academia.edu, to Twitter, to Facebook and Flickr, is a form of publication, and should be taken seriously as such. If you would not say it in an academic review, or in the questions following a public lecture, don't say it on Twitter.

And finally, keep track of it. Use Google Analytics, or something similar. Know who you are talking to. This involves nothing more challenging than cutting and pasting four lines of code, but provides more data, at a more granular level, than you can possibly need.

All of which is simply to say that social media are building into what feels like an increasingly coherent environment reflecting communities of interest - allowing us, online, to be just what we claim to be the rest of the time - a community of scholars. The objections - which usually come down to the fear that someone will steal your ideas, your work, your credit - are best addressed by doing it in public. And in the process there is every hope that we can rebuild the humanities as a wider public discussion, able to reach more effectively beyond the academy - to have 'impact'.


Friday 3 January 2014

Judging a book by its URLs

It will sound odd, but I have recently had a great time editing URLs. Robert Shoemaker and I have just finished a book for CUP, derived from the London Lives project, and called London Lives: Poverty, Crime and the Making of a Modern City, 1690-1800. It is a long book (170,000 words) and each quote and reference in it is linked via a URL to the original document or article, book or web resource used as evidence or to contextualize the argument. It will be published both as an ebook and in hard copy, and the links need to be robust and secure. My estimate is that there are in the region of 4,000 URLs included in the manuscript (which was written collaboratively in PMWiki). In the end, I found that I could identify an appropriate link for 98% of all footnote references, but then had to eliminate around 10% of these, as the relevant URL was just not usable. The book took some nine years, and I am glad it is finished.

One of my final jobs was editing those 4,000 URLs.  It took about three months' work, spread over the last year, and I have just spent a week or so confirming what I hope will be their final form.  When I have told people about this work, many have looked incredulous and suggested that this is the sort of technical implementation that should be left to others.  A couple of otherwise nice people have suggested I dump the job on the shoulders of the nearest PhD student.  But for me, it is precisely the kind of thing that authors should do for themselves.  And in doing it, two things kept coming to mind.  First, that creating a rigorous academic apparatus is a central part of the intellectual journey that academic writing involves - and that we should see building its online version in the same light as the precise writing of footnotes and references that marks out good scholarship.  And second, that URLs encode systems of design and intent, of online architecture and access, that signal the quality and permanence (the academic credibility and perceived audience) of historical materials online.  Just as we have always sorted and judged scholarship by its form, we should think a bit harder about how the form of a URL can help us interrogate online materials.

On the first point, I do not know of much discussion of the joys of this kind of academic slog.  There is a lot of good writing on research and archives (by Carolyn Steedman and Arlette Farge among many others), on writing and thinking, but no-one talks much about the painstaking labour that goes into turning a rough draft into a finished piece of scholarship.  And here I am really talking about generating accurate and fully comprehensive footnotes that reflect both the material cited and the research journey that resulted in the main text.  This has become much easier with online catalogs and citation management packages, but nevertheless remains laborious, and a reflection of our collective and individual commitment to a particular kind of evidenced discussion.  But for me it also represents my favourite compromise.  The writing of history is a wonderfully imaginative and creative process.  And in some respects we wish to judge the product of history writing as art.  Is it enjoyable to read?  Is it convincing?  Does it do the job of good writing in liberating the reader's imagination?  In making these judgements we tend to appeal to a notion of 'value' that is cultural and that privileges dominant forms of authority.  This aspect of judgement is essentially romantic, with all the implications for western and elite hegemony embedded in that idea.  At the same time, history writing is the result of simple hard work of a more technical kind - in the archives, in collating and collecting, re-ordering and interrogating data.  And it is valuable because it encompasses that hard work.  The beauty of the academic apparatus is that it evidences this, and in the process generates a different measure of value.  In other words, it is where quality is tied to a 'labour theory of value'.  I love the academic slog because it is where unmoored judgement is tied down to hard labour, and where value can be universalized in a common human experience (work).  
In other words I really enjoyed editing 4000 URLs precisely because in them and their associated footnotes lies a claim to and evidence of the hard labour that underpins the book itself.

At the same time, the process also taught me to read URLs differently.  Clearly coders and web designers do this as a matter of course.  But I am a historian and want to read URLs as a scholar, rather than as a programmer or designer.  And for me, the important thing is that URLs embed the structure of a site, making it plain to see for anyone willing to look hard; and that they combine the character of a library reference with a command directed at the new technology of discovery - the Internet.  There are just lots of different types of URL.

There are 'Search URLs' that include all the elements that take the user past a collection to a specific object, but don't let you go directly there without the query.  And there are URLs that encode a cataloging hierarchy.  There are URLs that filter data, or work in your browser to change what is delivered, highlighting phrases or sifting material.  And there are URLs that encode licensing, passwords, and access information.  It is easy enough to find that the whole search journey that took you from a library catalog to an individual item is encoded directly in the URL, and even personalized to you, the machine you are using, or the forms of access you can deploy.  It is easy to find URLs that run on for hundreds of characters, their elements divided by '&' and studded with '%' codes.
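To see these layers for yourself, a URL can be taken apart with Python's standard `urllib.parse` module. This is only an illustrative sketch: `describe_url` is a hypothetical helper, and the example is a shortened version of the Google Books search URL quoted further on. It separates the host and path from the '&'-divided query instructions, and decodes the '%' codes, which stand for characters such as quotation marks and spaces.

```python
from urllib.parse import urlsplit, parse_qsl, unquote


def describe_url(url: str) -> dict:
    """Split a URL into host, path, query parameters and fragment."""
    parts = urlsplit(url)
    return {
        "host": parts.netloc,
        "path": parts.path,
        # parse_qsl splits the query on '&' and decodes '%XX' escapes and '+'
        "query": parse_qsl(parts.query),
        # the fragment after '#' is read by the browser, not sent to the server
        "fragment": unquote(parts.fragment),
    }


if __name__ == "__main__":
    shortened = (
        "http://books.google.co.uk/books?id=1sMJGt7_rTAC&printsec=frontcover"
        "&hl=en#v=onepage&q=%22Prosecution%20and%20Punishment%22&f=false"
    )
    for name, value in describe_url(shortened).items():
        print(f"{name}: {value}")
```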

But in creating robust reproducible links to credible historical materials most of these URLs are at least problematic if not useless.  If they include details for institutional access, or session information, they cannot be re-used by someone else.  These URLs are friable and fragile things and not fit for scholarly purposes.  And as a result, for the London Lives book we have been forced to eliminate all the links we originally hoped to include to forty or fifty different sites.  To take a single example, most archives structure their online collections with search in mind, making it difficult to link to a single item.  I spent a lot of time finding the catalog entry for every manuscript we cited in the London Metropolitan Archives, and Westminster Archives Centre, only to regretfully strip out the links when confronted by a complex URL that just did not look credible as a long term citation of the item itself.
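One rough way to screen candidate links for these problems is to check their length and look for query parameters that sound like sessions or logins. This is a hedged sketch, not the method used for the book: the helper name `looks_fragile`, the 100-character threshold and the list of marker words are all assumptions, and a clean result says nothing about whether a site will keep its URLs stable.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed marker words for session or access information; real sites use
# many other parameter names, so this list is illustrative only.
SESSION_MARKERS = ("session", "sid", "token", "auth", "login", "ticket")


def looks_fragile(url: str, max_length: int = 100) -> list[str]:
    """Return reasons why a URL may not work as a lasting citation.

    An empty list only means no obvious problem was detected."""
    reasons = []
    parts = urlsplit(url)
    if len(url) > max_length:
        reasons.append(f"long ({len(url)} characters)")
    # keep_blank_values=True also catches bare flags such as '?SESSIONSEARCH'
    for key, _ in parse_qsl(parts.query, keep_blank_values=True):
        if any(marker in key.lower() for marker in SESSION_MARKERS):
            reasons.append(f"query parameter '{key}' may be session-specific")
    # Java servlet containers can append ';jsessionid=...' to the path
    if ";jsessionid=" in parts.path.lower():
        reasons.append("session id embedded in the path")
    return reasons


if __name__ == "__main__":
    for url in (
        "http://search.lma.gov.uk/scripts/mwimain.dll/144/LMA_OPAC/web_detail/"
        "REFD+P69~2FBRI~2FB~2F001~2FMS06554~2F004?SESSIONSEARCH",
        "http://www.oldbaileyonline.org/browse.jsp?ref=t17910413-19",
    ):
        print(url, looks_fragile(url) or "no obvious problems")
```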

Even in its simplest, and in the form recommended by the site for sharing a link, a London Metropolitan Archives URL looks like this:

http://search.lma.gov.uk/scripts/mwimain.dll/144/LMA_OPAC/web_detail/REFD+P69~2FBRI~2FB~2F001~2FMS06554~2F004?SESSIONSEARCH
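The archival reference itself is hiding in that URL. Assuming that `~2F` is a hexadecimal escape for '/' (0x2F is the code for a slash) and that '+' stands for a space, a few lines of Python recover it. The function name `lma_reference` is hypothetical, and the escaping scheme is inferred from this single example rather than from any documentation of the LMA catalogue software.

```python
import re
from urllib.parse import urlsplit


def lma_reference(url: str) -> str:
    """Recover the catalogue reference from the last segment of the path.

    Assumes '+' encodes a space and '~XX' encodes the character with
    hexadecimal code XX (so '~2F' is '/')."""
    segment = urlsplit(url).path.rsplit("/", 1)[-1]
    segment = segment.replace("+", " ")  # before decoding, so '~2B' survives as '+'
    return re.sub(r"~([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), segment)


if __name__ == "__main__":
    url = ("http://search.lma.gov.uk/scripts/mwimain.dll/144/LMA_OPAC/"
           "web_detail/REFD+P69~2FBRI~2FB~2F001~2FMS06554~2F004?SESSIONSEARCH")
    print(lma_reference(url))  # → REFD P69/BRI/B/001/MS06554/004
```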


Since we had consulted these items in their physical form in any case, it did not seem too problematic to leave out these links, but a shame nevertheless.  And likewise, with paywall material there seemed little point in dangling real access, and the promise of credible evidence, before the eyes of readers who would not be able to go beyond the login screen.  It seemed better to cite a specific item in combination with a general (unlinked) URL and date of consultation as reflecting our own research journey, rather than to promise access when we could not deliver it.

With few exceptions the URLs that have been retained (and there are still 4000 of them) address specific items with a specific ID, and usually run to 20 to 40 characters.  DOIs are not bad once you figure out their structure and reformulate them as they should be, rather than the way they are normally cited on journal web pages.

dx.doi.org/10.1353/sec.2010.0268
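The reformulation can be sketched as a small function that pulls the DOI out of however a page displays it ('doi:', 'DOI:', or a full link) and reattaches it to the resolver. The pattern is a simplification of commonly used DOI-matching patterns (a '10.' prefix, a registrant code of digits, a slash, then the publisher's suffix). `doi_link` is a hypothetical name, and the resolver defaults to the `dx.doi.org` form used here, though `doi.org` also resolves DOIs.

```python
import re

# Simplified DOI pattern: '10.', a 4-9 digit registrant code, '/', a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def doi_link(citation: str, resolver: str = "dx.doi.org") -> str:
    """Extract a DOI from a citation string and return a resolver link."""
    match = DOI_PATTERN.search(citation)
    if match is None:
        raise ValueError(f"no DOI found in {citation!r}")
    # Drop trailing punctuation, assuming it belongs to the surrounding sentence
    doi = match.group(0).rstrip(".,;")
    return f"{resolver}/{doi}"


if __name__ == "__main__":
    for cited in (
        "doi:10.1353/sec.2010.0268",
        "DOI: 10.1353/sec.2010.0268.",
        "http://dx.doi.org/10.1353/sec.2010.0268",
    ):
        print(doi_link(cited))
```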

And Google Books creates a very nice URL once you strip out all the complex formatting instructions that are normally generated as part of a search and inserted after the main ID.  This is what a Google Books URL looks like if you use the 'search' version:

 http://books.google.co.uk/books?id=1sMJGt7_rTAC&printsec=frontcover&dq=%22Prosecution+and+Punishment:+Petty+Crime+and+the+Law%22&hl=en&sa=X&ei=rrzGUq_aDsSy7Aa_9YGQCg&redir_esc=y#v=onepage&q=%22Prosecution%20and%20Punishment%3A%20Petty%20Crime%20and%20the%20Law%22&f=false

And this URL will take you to the same book:

books.google.co.uk/books?id=1sMJGt7_rTAC
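Stripping a Google Books link down in this way can be sketched with `urllib.parse`: keep the host, the path and the `id` parameter, and discard everything else, including the fragment after '#'. This assumes, as in the pair of examples above, that `id` alone identifies the volume; the function name is hypothetical, and a link meant to point at a particular page would need more than this.

```python
from urllib.parse import urlsplit, parse_qs, urlencode


def canonical_google_books(url: str) -> str:
    """Reduce a Google Books URL to host, path and the volume 'id' only."""
    parts = urlsplit(url)  # separates the query and '#' fragment from the path
    ids = parse_qs(parts.query).get("id")
    if not ids:
        raise ValueError(f"no 'id' parameter in {url!r}")
    return f"{parts.netloc}{parts.path}?{urlencode({'id': ids[0]})}"


if __name__ == "__main__":
    search_url = (
        "http://books.google.co.uk/books?id=1sMJGt7_rTAC&printsec=frontcover"
        "&dq=%22Prosecution+and+Punishment:+Petty+Crime+and+the+Law%22&hl=en"
        "&sa=X&ei=rrzGUq_aDsSy7Aa_9YGQCg&redir_esc=y#v=onepage"
        "&q=%22Prosecution%20and%20Punishment%3A%20Petty%20Crime%20and%20the%20Law%22"
        "&f=false"
    )
    print(canonical_google_books(search_url))
    # → books.google.co.uk/books?id=1sMJGt7_rTAC
```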

And the English Short Title Catalogue (ESTC) generates some of the most elegant URLs I have found:

estc.bl.uk/T174945

And to a lesser extent, so does the EThOS collection of doctoral theses at the British Library.

ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.354762

And London Lives and the Old Bailey Online do pretty well on this score:

www.londonlives.org/browse.jsp?div=LMSMPS501980014
http://www.oldbaileyonline.org/browse.jsp?ref=t17910413-19


In part, I suspect that these issues would all disappear if I had a better sense of the layer of structure that lies beneath the WWW.  But for the moment I am keen to have a short, human-readable URL that looks like it will last longer than the session I am currently logged on for.   All of which simply takes me back to the joy of academic slogging and the importance of the academic apparatus as something that evidences hard work and opens up scholarship to credible criticism that goes beyond simple romantic appreciation and prejudice.

I know all too well that one of the skills of an academic is the ability to judge a book by its cover and the form of the text it contains.  For online materials we need to build URLs into precisely this process - and the joy of all that editing is that, at the end of it, I feel I have learned to do just that.