I have been asked to participate in a panel at the annual CERL conference - and to speak for no more than five minutes or so. Initially, I was just going to wing it, but then, in writing up a couple of notes, five minutes' worth of text found its way onto the screen. In the spirit of never wasting a grammatical sentence, the text is below. I probably won't follow it at the conference, but it reflects what I wanted to say.
CERL - British Library, 31 October 2012:
We all know just how transformative
the digitisation of the inherited print archive has been. Between Google Books, ECCO, EEBO, Project
Gutenberg, the Burney Collection, the British Library's 19th century
newspapers, the Parliamentary Papers, the Old Bailey Online, and on and on,
something new has been created. And it
is a testament to twenty years of seriously hard graft. But it is one of the great ironies of the
minute that the most revolutionary technical change in the history of the human
ordering of information since writing - the creation of the infinite archive,
with all its disruptive possibilities - has resulted in a markedly conservative
and indeed reactionary model of human culture.
For both technical and legal reasons,
in the rush to go online, we have given to the oldest of Western canons a new
hyper-availability, and a new authority. With the exception of the genealogical sites,
which themselves reflect the Western bias of their source materials and
audience, the most common sort of historical web resource is dedicated to
posting the musings of some elite, dead, white, Western male - some scientist,
or man of letters; or more unusually, some equally elite, dead white woman of
letters. And for legal reasons as much
as anything else, it is now much easier to consult the oldest forms of
humanities scholarship instead of the more recent and fully engaged
varieties. It is easier to access work from
the 1890s, imbued with all the contemporary relevance of the long dead, than it
is to use that of the 1990s.
Without serious intent and political
will - a determination to digitise the more difficult forms of the
non-canonical, the non-Western, the non-elite and the quotidian - the materials
that capture the lives and thoughts of the least powerful in society - we will
have inadvertently turned a major area of scholarship into a fossilised
irrelevance.
And this is all the more important
because just at the same moment that we have allowed our cultural inheritance
to be sieved and posted in a narrowly canonical form, the siren voices of the
information scientists - the Googlers, coders and Culturomics wranglers - have
discovered in that body of digitised material a new object of study. All digital text is now data, that data is
now available for new forms of analysis, and that data is made up of the stuff
we chose to digitise. All of which embeds a subtle bias towards a
particular subset of the human experience.
Using measures derived from Ngrams, topic modelling, natural
language processing and TF-IDF similarity measures, scientists are beginning
to use this text/data as the basis for a new search for mathematically
identifiable patterns. And in the
process, the information scientists are beginning to carve out what are being
presented as 'natural' patterns of change, patterns that turn the products of human
culture into a simple facet of a natural and scientifically intelligible
world. The only problem with this is
that the analysts undertaking this work are not overly worried by the nature of
the data they are using. For most, the
sheer volume of text makes its selective character irrelevant.
But if we are not careful, we will
see the creation of a new 'naturalisation' of human thought based on the
narrowest sample of the oldest of dead white males. And to this particular audience I just want
to suggest that we need to be much more critical about what it is that we
digitise; what we allow to represent the cultures that libraries and collections
stand in for; and that we need to engage more comprehensively and intelligently
with the simple fact that we are in the middle of a selective recreation of
inherited culture.