Historyonics: Can we make a thesaurus of meanings for digital humanities?

Friday, 2 July 2010

In an idle moment I have been re-reading the introduction to Roget's Thesaurus, and have been struck by what a thesaurus actually does. It sorts all words into one of eight categories and assigns a number to each category. "Affections", for instance, is class eight. It then subdivides these categories into sub-categories, with "cheerfulness" falling under "Pleasure and Pleasurableness", and being assigned the number 868 - after discontent (867) and before solemnity (869). If you then look up 868 - cheerfulness, it is divided again into 20 further sub-categories, with around 15 words in each. So "gladden" falls under 868.6 and, as it is the second word in that sub-category, could be expressed as 868.6.2. The second half of the thesaurus simply lists all these words in alphabetical order, to allow the user access to the hierarchy of meaning in the first half.
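To make the numbering concrete, here is a minimal sketch of how a code such as 868.6.2 could be held as a small structured value. The field names (`category`, `paragraph`, `position`) and the helper `parse_code` are labels invented for this illustration, not terms taken from Roget.

```python
from typing import NamedTuple, Optional


class Code(NamedTuple):
    """One thesaurus code, split into its levels (field names are assumptions)."""
    category: int              # e.g. 868, cheerfulness
    paragraph: Optional[int]   # e.g. 6, the sub-category within 868
    position: Optional[int]    # e.g. 2, the word's place within 868.6


def parse_code(text: str) -> Code:
    """Turn a dotted code such as '868.6.2' into its separate levels."""
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected one to three levels, got {text!r}")
    # Pad shorter codes (e.g. '239.5') so every code has three fields.
    padded = parts + [None] * (3 - len(parts))
    return Code(*padded)


gladden = parse_code("868.6.2")
print(gladden.category, gladden.paragraph, gladden.position)
```

Keeping the levels separate, rather than treating the code as one opaque number, is what lets you later compare words at whichever depth of the hierarchy you choose.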

In other words, a thesaurus assigns every word it contains a numerical value that equates to one or more categories of meaning.

If you took a large body of text - ten years of the Times, or everything published in 1840 - and broke it up into a set of words, and assigned each word a number from the thesaurus's hierarchy, you would end up with a unique numerical representation of the collected meaning of the words that make up the text.

If you take the sentence 'the sunlight warmed his forehead' and convert it into its thesaurus equivalents, you end up with 334.10.7; 328.17.3; 239.5.

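It can be equated with "The midday sun cooked his brow", which is 334.10.11; 328.17.20; 239.5. The two sentences share the same numbers down to the sub-category level, and differ only in the final digits that mark each word's position within its sub-category.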
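As a rough sketch of that comparison, the snippet below looks words up in a toy lexicon and checks whether two sentences agree once the final level of each code is dropped. The codes are the ones quoted above; pairing each code with an individual word ("sunlight", "warmed" and so on) is an assumption read off from the example, and `LEXICON`, `encode`, `truncate` and `same_meaning` are hypothetical names.

```python
import re

# Toy lexicon. The codes come from the example sentences; the word-by-word
# pairing is an assumption made for this illustration.
LEXICON = {
    "sunlight": "334.10.7",
    "warmed": "328.17.3",
    "forehead": "239.5",
    "sun": "334.10.11",
    "cooked": "328.17.20",
    "brow": "239.5",
}


def encode(text, lexicon=LEXICON):
    """Return the codes of the words that appear in the lexicon, in order."""
    words = re.findall(r"[a-z']+", text.lower())
    return [lexicon[w] for w in words if w in lexicon]


def truncate(code, depth=2):
    """Keep only the first `depth` levels, so '334.10.7' becomes '334.10'."""
    return ".".join(code.split(".")[:depth])


def same_meaning(a, b, depth=2):
    """True if both texts give the same code sequence at the chosen depth."""
    return ([truncate(c, depth) for c in encode(a)]
            == [truncate(c, depth) for c in encode(b)])


first = "the sunlight warmed his forehead"
second = "The midday sun cooked his brow"
print(encode(first))
print(encode(second))
print(same_meaning(first, second))
```

Words missing from the lexicon ("the", "his", "midday") are simply skipped here; a real lookup would need a full thesaurus index and some way of handling words with several entries.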

This would allow a kind of semantic search that did not depend on direct context. It would not be universal, but it would work perfectly well for historical text in English (all the better suited to historical material, as Roget was a late Enlightenment figure, and his categories map well onto historical text). There would remain an issue of disambiguation (making a distinction between "clip" as a noun and "clip" as a verb); but this could be mitigated either through mathematical approximations (you could create a third unique number that essentially averaged the two or more meanings assigned to any single word), or by simply living with the errors generated, on the assumption that historians are used to filtering their own reading. You could also apply the chronological data contained in the Oxford Historical Thesaurus to map how close (or distant) an individual text is to standard usage for a particular period; or how different genres relate to a standard evolving language (how literature and law treatises map onto accepted usage in the decade they were written).

As we are confronted by massive text objects (I think the linguists' notion of a corpus is less useful for historians, who are seeking to find information rather than define bodies of text), the ability to locate related or similar text across genres and texts is important. It would also be another way of approaching the measurement of "distance" between texts.
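One possible reading of "distance", sketched below, is to count how often each top-level thesaurus number occurs in a text and compare the two sets of counts with cosine distance. Cosine distance is a choice made for this illustration, not something the scheme itself dictates, and `category_profile` and `cosine_distance` are hypothetical names.

```python
import math
from collections import Counter


def category_profile(codes):
    """Count how often each top-level number (e.g. '334') occurs."""
    return Counter(code.split(".")[0] for code in codes)


def cosine_distance(a, b):
    """0.0 for identical category profiles, 1.0 for no shared categories."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())
                     * sum(v * v for v in b.values()))
    if norm == 0:
        return 1.0  # at least one text has no coded words at all
    # Clamp to guard against tiny floating-point overshoot.
    return min(1.0, max(0.0, 1.0 - dot / norm))


sunlight = category_profile(["334.10.7", "328.17.3", "239.5"])
midday = category_profile(["334.10.11", "328.17.20", "239.5"])
cheerful = category_profile(["868.6.2"])

print(cosine_distance(sunlight, midday))
print(cosine_distance(sunlight, cheerful))
```

Counting at the top level throws away the finer sub-category information; the same functions would work on codes truncated to two levels if you wanted a more sensitive measure.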

Alternatively, and this is closer to Roget's original scheme, you could use this numerical labelling as a basic form of computer aided reading. You could, for instance, assign a colour to each broad category, and a shade of that colour to each sub-category, allowing you to identify the work that different parts of the text are doing through a simple visual examination. When skimming through a large body of text, at perhaps 40 pages on a single screen, the colour coding would allow you to identify areas in which "affections" are directly discussed, or any of the other thesaurus categories - "Space", or "Physics" or "Matter".
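A minimal sketch of such a colour scheme, assuming the eight broad categories described above: each category gets its own hue, and each sub-category a lighter or darker shade of it. The function name `colour_for`, the 0-based sub-category index, and the particular lightness and saturation values are all arbitrary choices made for illustration.

```python
import colorsys

N_CLASSES = 8  # the eight broad categories described in this post


def colour_for(class_no, sub_index, sub_count):
    """Return a hex colour: hue from the class, shade from the sub-category.

    class_no is 1-based (1..N_CLASSES); sub_index is 0-based.
    """
    hue = (class_no - 1) / N_CLASSES
    # Lightness runs from 0.35 (first sub-category) to 0.75 (last one).
    lightness = 0.35 + 0.40 * (sub_index / max(sub_count - 1, 1))
    r, g, b = colorsys.hls_to_rgb(hue, lightness, 0.8)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


# Class eight ("Affections"), first and last of 20 sub-categories.
print(colour_for(8, 0, 20))
print(colour_for(8, 19, 20))
```

The resulting hex strings could be used as background colours when rendering each word, so that a passage dense in one category shows up as a block of related shades.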

Have I missed something? Does anyone know why we don't use a thesaurus-based numerical hierarchy to code meaning in large texts? It would give you "word" = "a set of numbers" (a unique number for each major word) in a paragraph or sentence or text division, which could then be compared statistically, or colour coded to reflect the breadth of meaning found. Or you could colour code for types of words and locate relevant sections in a large text in a particular colour. It just seems dead obvious as a way of moving towards the ideal at the heart of the semantic web, while avoiding the creation of 'triples', and the ever retreating promise of universality. My guess is either that librarians have been doing this for the last thirty years and not telling me (librarians are cruel that way), or that the rest of the world forgot to read the introduction to Roget's Thesaurus, which is also possible. Of course, the final possibility is that I don't understand the semantic web; and that ontologies in aggregate are already a form of thesaurus.

