Friday, 3 January 2014

Judging a book by its URLs

It will sound odd, but I have recently had a great time editing URLs.  Robert Shoemaker and I have just finished a book for CUP, derived from the London Lives project, and called London Lives: Poverty, Crime and the Making of a Modern City, 1690-1800. It is a long book (170,000 words) and each quote and reference in it is linked via a URL to the original document or article, book or web-resource used as evidence or to contextualize the argument.  It will be published as both an ebook and in hard copy, and the links need to be robust and secure.  My estimate is that there are in the region of 4,000 URLs included in the manuscript (which was written collaboratively in PMWiki).  In the end, I found that I could identify an appropriate link for 98% of all footnote references, but then had to eliminate around 10% of these, as the relevant URL was just not useable.  The book took some nine years, and I am glad it is finished.

One of my final jobs was editing those 4,000 URLs.  It took about three months' work, spread over the last year, and I have just finished spending a week or so confirming what I hope will be their final form.  When I have told people about this work many have looked incredulous and suggested that this is the sort of technical implementation process that should be left to others.  A couple of otherwise nice people have suggested I dump this job on the shoulders of the nearest PhD student.  But to my mind, it is precisely the kind of thing that authors should do for themselves.  And in doing it, two things kept coming to mind.  First was how the role of the scholar in creating a rigorous academic apparatus is a central part of the intellectual journey that academic writing involves - and that we should see the implementation of the online version of this in the light of the precise writing of footnotes and references that mark out good scholarship.  And second, that URLs encode a system of design and intent, online architecture and system of access, that signal the quality and permanence (the academic credibility and perceived audience) of historical materials online.  And that just as we have always sorted and judged scholarship by its form, we should think a bit harder about how the form of a URL can let us interrogate online materials.

On the first point, I do not know of much discussion of the joys of this kind of academic slog.  There is a lot of good writing on research and archives (by Carolyn Steedman and Arlette Farge among many others), on writing and thinking, but no-one talks much about the painstaking labour that goes into turning a rough draft into a final finished piece of scholarship.  And here I am really talking about generating accurate and fully comprehensive footnotes that reflect both the material cited, and the research journey that resulted in the main text.  This has become much easier with online catalogs and citation management packages, but nevertheless remains laborious and a reflection of our collective and individual commitment to a particular kind of evidenced discussion.  But for me it also represents my favourite compromise.  The writing of history is a wonderfully imaginative and creative process.  And in some respects we wish to judge the product of history writing as art.  Is it enjoyable to read? Is it convincing?  Does it do the job of good writing in liberating the readers' imagination?  In making these judgements we tend to appeal to a notion of 'value' that is cultural and that privileges dominant forms of authority.  This aspect of judgement is essentially romantic; with all the implications for western and elite hegemony embedded in that idea.  At the same time history writing is the result of simple hard work of a more technical kind - in the archives, in collating and collecting, re-ordering and interrogating data.  And it is valuable because it encompasses that hard work.  The beauty of the academic apparatus is that it evidences this and in the process generates a different measure of value.  In other words it is where quality is tied to a 'labour theory of value'.  I love the academic slog because it is where un-moored judgement is tied down to hard labour; and where value can be universalized in a common human experience (work).

Put simply, I really enjoyed editing 4,000 URLs precisely because in them and their associated footnotes lies a claim to and evidence of the hard labour that underpins the book itself.

At the same time, the process also taught me to read URLs differently.  Clearly coders and web designers do this as a matter of course.  But I am a historian and want to read URLs as a scholar, rather than as a programmer or designer.  And for me, the important thing is that URLs embed the structure of a site, making it plain to see for anyone willing to look hard; and that they combine the character of a library reference with a command directed at the new technology of discovery - the Internet.  There are just lots of different types of URL.

There are 'Search URLs' that include all the elements that take the user past a collection to a specific object, but don't let you go directly there without the query.  And there are URLs that encode a cataloging hierarchy.  There are URLs that sift data, or work in your browser to change the data delivered, highlighting phrases or sifting material.  And there are URLs that encode licensing, passwords, and access information.  It is easy enough to find that the whole search journey that took you from a library catalog to an individual item is encoded directly in the URL, and even personalized to you, the machine you are using, or the forms of access you can deploy.  It is easy to find URLs that run on for hundreds of characters, each element divided by a '&' or a '%', or such.
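For readers who want to see that anatomy laid out, here is a minimal sketch in Python, using only the standard library. The address is invented for illustration (archive.example.org is not a real collection) and the function name dissect is my own; the point is only that the host, the path hierarchy and the '&'-separated query elements can be pulled apart and read one by one.

```python
from urllib.parse import urlsplit, parse_qsl


def dissect(url):
    """Split a URL into the parts a reader can inspect by eye."""
    parts = urlsplit(url)
    return {
        "host": parts.netloc,
        # The path often mirrors a cataloging hierarchy.
        "path_segments": [seg for seg in parts.path.split("/") if seg],
        # Each '&'-separated element of the query; '%' escapes are decoded.
        "query": parse_qsl(parts.query, keep_blank_values=True),
    }


# Hypothetical search-style URL, invented for illustration only.
example = (
    "https://archive.example.org/search/results"
    "?q=pauper%20letters&page=3&session=abc123"
)
info = dissect(example)
for name, value in info["query"]:
    print(name, "=", value)
```

Read this way, the query shows the search journey (q, page) sitting alongside something tied to a single visit (session), which is exactly the mixture that makes such links hard to reuse.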

But in creating robust reproducible links to credible historical materials most of these URLs are at least problematic if not useless.  If they include details for institutional access, or session information, they cannot be re-used by someone else.  These URLs are friable and fragile things and not fit for scholarly purposes.  And as a result, for the London Lives book we have been forced to eliminate all the links we originally hoped to include to forty or fifty different sites.  To take a single example, most archives structure their online collections with search in mind, making it difficult to link to a single item.  I spent a lot of time finding the catalog entry for every manuscript we cited in the London Metropolitan Archives, and Westminster Archives Centre, only to regretfully strip out the links when confronted by a complex URL that just did not look credible as a long term citation of the item itself.
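The kind of judgement described here can be roughly approximated in code. What follows is a sketch under stated assumptions, not the procedure used for the book, where the checking was done by eye. The list of suspect parameter names and the 100-character threshold are my own guesses, and the example addresses are invented.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed names for session or access parameters; real sites use many others.
SUSPECT_PARAMS = {"session", "sessionid", "sid", "jsessionid", "token", "auth", "user"}


def fragility_warnings(url, max_length=100):
    """Return a list of reasons a URL might not survive as a citation."""
    parts = urlsplit(url)
    warnings = []
    if len(url) > max_length:
        warnings.append("long")
    names = {
        name.lower()
        for name, _ in parse_qsl(parts.query, keep_blank_values=True)
    }
    suspects = sorted(names & SUSPECT_PARAMS)
    if suspects:
        warnings.append("session-or-access:" + ",".join(suspects))
    return warnings


# Both addresses are hypothetical.
print(fragility_warnings("https://catalogue.example.org/item/12345"))
print(fragility_warnings("https://catalogue.example.org/view?id=12345&sessionid=XYZ"))
```

An empty list is no guarantee of permanence; it only means that none of these particular warning signs were found.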

Even in its simplest form, the one recommended by the site for sharing a link, a London Metropolitan Archives URL looks like this:

Since we had consulted these items in their physical form in any case, it did not seem too problematic to leave out these links, but a shame nevertheless.  And likewise, with paywall material there seemed little point in dangling real access, and the promise of credible evidence, before the eyes of readers who would not be able to go beyond the login screen.  It seemed better to cite a specific item in combination with a general (unlinked) URL and date of consultation as reflecting our own research journey, rather than to promise access when we could not deliver it.

With few exceptions the URLs that have been retained (and there are still 4,000 of them) address specific items with a specific ID, and usually run to 20 to 40 characters.  DOIs are not bad once you figure out their structure and reformulate them as they should be, rather than the way they are normally cited on journal web pages.
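One reading of 'reformulating' a DOI is to extract the bare identifier from however a journal page happens to print it, and attach it to the DOI resolver address. The sketch below assumes that reading, and that https://doi.org/ followed by the identifier is the target form. The pattern is a simplification, the function name doi_to_url is my own, and the DOI shown is hypothetical.

```python
import re

# Simplified pattern: '10.', a registrant number, '/', then a suffix.
DOI_PATTERN = re.compile(r"(10\.\d{4,9}/\S+)")


def doi_to_url(citation):
    """Find a DOI in a citation string and return it as a resolver URL."""
    match = DOI_PATTERN.search(citation)
    if match is None:
        return None
    # Drop punctuation that belongs to the sentence, not the identifier.
    return "https://doi.org/" + match.group(1).rstrip(".,;")


# Hypothetical identifier, written the way journal pages often print it.
print(doi_to_url("doi:10.1234/example.5678"))
```

The result is a short address with a single identifier after the host, much like the item-level URLs described above.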

And Google Books creates a very nice URL once you strip out all the complex formatting instructions that are normally generated as part of a search and inserted after the main ID.  This is what a Google Books URL looks like if you were to use the 'search' version:

And this URL will take you to the same book:

And the Eighteenth-century Short Title Catalog generates some of the most elegant URLs I have found:

And to a lesser extent, so does the Ethos collection of doctoral theses at the British Library.

And London Lives and the Old Bailey Online do pretty well on this score:

In part, I suspect that these issues would all disappear if I had a better sense of the layer of structure that lies beneath the WWW.  But for the moment I am keen to have a short, human-readable URL that looks like it will last longer than the session I am currently logged on for.   All of which simply takes me back to the joy of academic slogging and the importance of the academic apparatus as something that evidences hard work and opens up scholarship to credible criticism that goes beyond simple romantic appreciation and prejudice.
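The trimming described earlier for Google Books, keeping the main ID and discarding what a search adds after it, can be sketched in a few lines. This is an illustration rather than the method actually used on the manuscript, which was edited by hand. The volume ID is invented, the function name keep_only is my own, and the extra parameters (pg, dq, hl) are assumed to be typical of what a search appends.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def keep_only(url, keep=("id",)):
    """Rebuild a URL keeping only the named query parameters."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in keep]
    # The fragment is dropped along with the unwanted parameters.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Hypothetical volume ID and search residue.
long_url = (
    "https://books.google.com/books"
    "?id=EXAMPLEID123&pg=PA45&dq=london+poverty&hl=en"
)
short_url = keep_only(long_url)
print(short_url)
```

Whether the shortened address still resolves has to be checked by visiting it; the code only removes what looks like residue from the search.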

I know all too well that one of the skills of an academic is the ability to judge a book by its cover and the form of the text it contains.  For online materials we need to embed URLs into precisely this process - and the joy of all that editing was that at the end of it, I feel I have learned to do just that.


Becky said...

I'm an academic librarian and spend a good bit of time explaining to students the value of a book's apparatus--references, index, table of contents--in descending order of importance, for the most part. This piece fits right in, and I appreciate your expressing so well how all the work of creating a book affects its quality.

I'll be searching your blog for any posts on indexes, my favorite scholarly tool.

Becky Kornegay

Tim Hitchcock said...

Hi Becky, Thanks for your comment - very much appreciated. And please keep up teaching students about book structures! I am continually depressed by the extent to which historians largely fail to do this.

Indexes are interesting - though I haven't blogged about them in particular (this blog is a bit random!). But what I have always wanted to try is turning traditional book indexes on their head, and using them to model reader response - essentially assuming that text that attracts human-created index entries comprises text that the reader's eye gravitated towards (even if it is only the eye of the indexer).

Tim Hitchcock

Janice said...

What a smart analysis. (And may I say that I'm excited for the book project which has inspired this post?)

It's difficult to get tech-phobic colleagues and students to understand that just copying the URL from their browser bar isn't going to create a durable URL that others can use. It's also maddening when the software or database choices for a project hamper the URL reusability - this should be a basic concern for anyone building a web project these days!

John Muccigrosso said...

I worry whenever there's a ? in a URL (URI, really). That usually means that the underlying search mechanism, which is liable to change, is being included. So when they switch out the engine in an upgrade, the reference won't work anymore.

For example, your London Lives URI is clearly using a little Java on the server. How long will that last?

I think the Eighteenth-century Short Title Catalog has the best URI: just an ID number after the /.

Sebastian Heath has got a few good posts on this. For example,

Or his list of Very Clean URIs at

(Hmm, can I tag him here with @sebth?)

Kristopher Nelson said...

As a former web developer who is now a PhD student in history, this was fascinating--it's so ingrained in me to see URLs as reflections of what goes on behind the scenes that I often forget that this isn't how most people see them!

Anyway, just one thought: I am always suspicious of the durability of URLs that include a "?," even though this is extremely standard on so many sites. The "?" indicates that the bit after that is being passed on to another layer of the system that parses it as, effectively, a search. In my experience, if the backend technology changes, this part is the hardest to redirect to the proper resource.

I always prefer URLs that have only path information in them ("/"), because even though these still require backend tech to handle they tend to be the most stable into the future.

Of course, this is all very nice in theory! You do what you can, as you have, to reduce the complexity as much as possible.

This is an area where web devs could stand to pay attention to historians/librarians/archivists, instead of just their immediate concerns.

Thanks for all the hard work on the project!

Tim Hitchcock said...

Dear All, Thanks for your comments. I particularly wanted to thank John for the link to Sebastian Heath's blog on this - I loved his line: If a URL looks unstable, it is.

But thanks also to both John and Kristopher for raising the issue of ? and the difficulties it creates. I am looking forward to discussing how to avoid this issue on the Old Bailey and London Lives sites.

One area that continues to interest me is the effect that locating more and more functionality on the browser side of the equation will have. One site I have been largely unable to reference effectively is Locating London's Past, just because the things I want to cite - maps - are generated on the fly, don't exist as 'objects' and don't actually show up in the URL.

Unknown said...

I wonder if you have tried using the Internet Archive's Wayback Machine's "Save Page Now" tool to capture a page as it appeared when you accessed it for use as a trusted citation in the future?

Tim Hitchcock said...

I hadn't seen the 'Save Page Now' function - thanks very much for pointing me in this direction!

Phil said...

Just read this on the LSE Website. It reminded me of this old post; it expresses my own pleasure in reference-hunting, as well as a similar sense of how academic writing can combine the most unmoored and speculative creativity with a Gradgrindian level of groundedness ("Now, what I want is, References!"). Nothing like it, when it works.
