Sunday 19 June 2011

Culturomics, Big Data, Code Breakers and the Casaubon Delusion

Suddenly it seems as if 'big data' humanities is all the crack, with quantitative biologists and mathematicians diving in where previously only historians, literary critics and linguists dared to swim. Digital humanists have been slowly engineering a new field from history and linguistics (aided and abetted by library science) for over a decade, gradually building new bodies of evidence, and road testing new methodologies. But in just the last year or so, the biologists and mathematicians, with Google's help, have stolen a march on all their puny efforts. In particular, it seems that Science and Nature have fallen head over heels in love with 'culturomics' and the heady enthusiasms of Erez Lieberman Aiden and Jean-Baptiste Michel, and their Google ngram viewer. To read the most recent issue of Nature is to be confronted with a potent mix of big science and gushing Hello Magazine prose that works to mythologise the new 'science' of culturomics and its creators. It feels like the birth of a myth and of a brand.


This is all rather wonderful, and I am a huge fan of the Google ngram viewer, and the playful way it allows scholars and students to engage with the 'infinite archive' of inherited texts.  I think Aiden and Michel (and Google) have done the humanities a huge service.   But their real achievements do not quite explain the cloud of hyperbole that seems to be rising around them.


And this made me wonder what is really at issue here. What is it about culturomics that turns on the reporters from Nature? At its heart, the use of word frequency with a reasonably sized (if problematic) data set simply provides one more form of evidence to be added to all the rest. Knowing that the term 'electricity' peaks between 1870 and 1900 is useful evidence, but does not provide either an explanation for why, or a description of how it is being used. Historians will no doubt look this particular gift horse in the mouth, and worry at the condition of its teeth; but they will also happily use the ngram viewer as one more component in a complex landscape of evidence. This use may be delayed by the peculiar lack of any guidance on how to cite the results of a search, but it will be normalised in due course.


But simply providing a new body of evidence is not what seems to get Nature going. Instead, it is the claim that the ngram viewer lays the basis for a new 'science', and that the results make other forms of historical analysis redundant. In the words of Aiden and Michel, somehow this data is uniquely available for 'scientific purposes', in contrast to other forms of evidence.

It is not, therefore, the mechanics of the ngram viewer that is at issue. Instead it is the underlying intellectual paradigm that Aiden and Michel bring to its use. They appear to claim to be able to read history from the patterns the ngram viewer exposes - to decipher significant patterns from the data itself. Their great party tricks (and they are particularly impressive in live performance) include the reduction of the decline of irregular verbs to a describable mathematical pattern, an equation, and the rise of 'celebrity' as measured by the number of times an individual is mentioned in print. These imply that all historical development can, like irregular verbs, be described in mathematical terms, and that 'human nature', like the desire for fame, can be used as a constant to measure the changing technologies of culture.

In some respects, we have been here before. In the demographic and cliometric history so popular through the 1970s and 80s, extensive data sets were used to explore past societies and human behaviour. The aspirations of that generation of historians were just as ambitious as are those of the parents of culturomics. But demography and cliometrics started from a detailed model of how societies work, and sought to test that model against the evidence, revising it in light of each new sample and equation.

The difference with culturomics is that there is no pretence to a model. Instead, its practitioners will simply seek to discover patterns in the entrails of human speech, hoping to find the inherent meanings encoded there. What I think the scientific community finds so compelling is that, like quantitative biology and DNA analysis, Aiden and Michel are using one of the controlling metaphors of 20th-century science, 'code breaking', and applying it to a field that has hitherto resisted the siren call of analytical positivism.


Since the 1940s the notion that 'codes' can be cracked to reveal a new understanding of 'nature' has formed the main narrative of science. With the re-description of DNA as just one more code in the 1950s, wartime computer science became a peacetime biological frontier (cashing in on big-pharma, as military expenditure declined). That Aiden comes from a background in DNA analysis should clue us in to the fact that culturomics is an attempt to apply the same kind of code breaking to human society as a whole.



I strongly suspect that the project will fail, just as naive readings of DNA as a code for life have largely failed to fulfil their promise. But much more importantly, this attempt to repurpose a 'scientific' approach to historical analysis simply misunderstands the function of history itself. These large-scale visualisations of language may be the raw material of history, the basis for an argument, the foundation for a narrative, the evidence put in the appendix in support of a subtle point, but they do not serve as a work of history.

Historians interpret the past to the present. They marshal evidence and use all the tools of genre writing to allow a modern reader to engage with the past. And the questions they ask are not driven by the evidence, but by the needs of a modern society. Gender history, the history of sexuality, and of race, have been created by two generations of historians not because the archives are groaning under the weight of relevant evidence, but because our society needs to understand the role of these forces in the present. The fundamental flaw with culturomics is that it assumes that history is about the past; that what historians seek to achieve is an ever more accurate description of everything. Instead, it is about the present. Ironically, Aiden and Michel have rediscovered the 'Casaubon delusion', and believe, like George Eliot's tragic figure, that they can create a new 'Key to all Mythologies'. They need to listen to the Dorotheas of this world.

48 comments:

Ernesto said...

All I can say is "bravo". And thank you.

jbmichel said...

Hey Tim -

Interesting thoughts. It's great to see people diving into the discussion, and especially how much the discussion has changed in only six short months!

A few comments, in three parts.

---
You wrote:
"This use may be delayed by the peculiar lack of any guidance on how to cite the results of a search, but it will be normalised in due course."

Actually, in the "About the Ngram Viewer" (http://ngrams.googlelabs.com/info) section, we write: 

"If you're going to use this data for an academic publication, please cite:

Jean-Baptiste Michel*, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, William Brockman, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden*. Quantitative Analysis of Culture Using Millions of Digitized Books. Science (Published online ahead of print: 12/16/2010)"

jbmichel said...

(Part 2 of 3)

You wrote:
"At its heart, the use of word frequency with a reasonably sized (if problematic) data set simply provides one more form of evidence to be added to all the rest...But simply providing a new body of evidence is not what seems to get Nature going.  Instead, it is the claim that the ngram viewer lays the basis for a new 'science', and that the results make other forms of historical analysis redundant."

We don't understand why this straw-man keeps coming up, as we have been unambiguous about this point in the past: n-gram results are a new form of evidence. They do not make any extant method of historical analysis redundant. Here are a few primary sources. 

From the paper:

"Culturomic results are a new type of evidence in the humanities."

From the very first line of the Culturomics FAQ:

"1. Is this supposed to replace close reading of texts?
Absolutely not. Anyone who has appreciated the work of a great artist - say, Shakespeare - or an insightful scholar - say, Michael Walzer's Exodus and Revolution - couldn't possibly think that quantitative approaches can replace close reading. 

Quite the opposite is true: quantitative methods can be a great source of ideas that can then be explored further by studying primary texts.

2. How does this relate to other methods in the humanities?
Our hope is that the culturomic approach will be able to supplement existing techniques."

From the current Nature piece:

"
A.
And yet [Erez] doesn’t think that the old approaches will ever disappear. “I think you should use the best methods available — and all of them,” he says. “And I think that includes carefully reading texts and trying to get behind what authors think.”

B.
"…[Erez] tells the story of Isaac Casaubon, a sixteenth-century Protestant scholar, who undermined the presumed Egyptian provenance of a set of religious texts by identifying a reference to a Greek play on words — something that could only have been written hundreds of years later. “That point is as objective an interpretive remark as any remark a scientist might make,” says Lieberman Aiden. “So the methods of humanists are very, very formidable. And I think the degree of insecurity they have over whether these methods are here to stay is not really befitting.”
"

We think these texts make it unambiguously clear that we have absolutely no intention of replacing existing methods. There is no viable alternative reading of our statements in this area, and the attribution of this attitude to us is simply incorrect - a Casaubon delusion.

Our goal is no more than to enable such data to provide - in your exact words - "one more form of evidence to be added to all the rest".

---

jbmichel said...

(Part 3 of 3)

You wrote:
"Historians interpret the past to the present.  They marshal evidence and use all the tools of genre writing to allow a modern reader to engage with the past.  And the questions they ask are not driven by the evidence, but by the needs of a modern society...The fundamental flaw with culturomics is that it assumes that history is about the past..."

We make absolutely no such assumption. We are agnostic about what motivates a person's questions. What interests us is the process by which scholars 'marshal evidence'. 

Data is evidence. Just as those who read Akkadian can skillfully marshal Akkadian primary sources, those with quantitative skills can skillfully marshal data. Our goal is to contribute data and methods so that this new form of evidence can thrive.

All the best,
Jean-Baptiste Michel
Erez Lieberman Aiden

Elijah Meeks said...

I wish the DH community could be as openly critical of work done by self-identified DH practitioners as they are of Culturomics.

Arno Bosse said...

Elijah: bingo.

Tim Hitchcock said...

Dear Jean-Baptiste, Thanks for your comments on the post. On the issue of citations - the point I was trying to make was that there are no directions for citing a graph generated using the ngram viewer, as opposed to citing your articles. This is a wider problem than just with the ngram viewer, and until humanists figure out how to cite a search and its results in a repeatable and credible manner, we will be practising increasingly poor scholarship.

On the wider issue of your own and Erez's engagement with the humanities and history: the situation is somewhat out of your control, as the science press has emphasised those aspects of your work that imply a new and newly 'scientific' approach. This is an emphasis that has a powerful appeal to those given to a crude factology, and reflects the continuing distance between approaches to knowledge. There is clearly a sizeable chasm between your own practice and how it is represented. But while you remain at the stage of demonstrating a methodology, rather than using it to write history, the issue of how your practice works with other and older forms of scholarship will remain - and is worth interrogating.

Let me also re-iterate how much I admire your work.

What I very much hope will emerge in due course is some great history that uses your techniques and methodologies to evidence society's understanding of the past. And I very much hope that you and Erez will be the ones to write it.

My experience of the Digital Humanities community as a whole is that it tends to enthuse over new tools and ways of visualising data, without being sufficiently concerned to critique the usefulness and purpose of the wider project, or to relate that project to the functions of humanist and social science scholarship. In many respects the criticisms I have made of your project are just as true of the wider Digital Humanities community, and form a topic with which I am continually struggling in my own work.

All the best, Tim
