New algorithms for tracking changes in word usage and comparing the meanings of sentences
Computer scientist Tom Kenter from the University of Amsterdam's Informatics Institute (IvI) has developed an algorithm that enables computers to work out whether two sentences mean the same thing. Kenter has also developed a method to automatically track changes in word usage. The results will be presented at the Conference on Information and Knowledge Management 2015 (CIKM 2015) in Melbourne, Australia, on 19-23 October 2015.
It is important for search engines to know what information can be found on a website, so that the search results that are displayed are as accurate as possible. Because the same information can be worded in a variety of different ways, an effective automated method that is able to discover whether two web pages contain the same information is essential.
The traditional way of finding out whether two sentences contain the same information is to look at how many words they have in common. In simple cases such as 'We're going on a bicycle ride tomorrow' versus 'We're going on a long bicycle ride tomorrow', this method is fairly successful.
However, comparing words literally doesn't always work. For example, the sentences 'I want to go play basketball' and 'I'd like to go shoot some hoops' have barely any words in common, even though a few of their words are related in meaning, such as 'want' and 'like'.
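The word-overlap approach, and the point at which it breaks down, can be illustrated with a minimal Python sketch. It assumes a simple Jaccard-style measure (shared words divided by all distinct words); the function names `tokens` and `word_overlap` are hypothetical and chosen for this illustration only.

```python
import re


def tokens(sentence):
    """Lowercase a sentence and return its set of distinct words."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def word_overlap(a, b):
    """Jaccard overlap: number of shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# The two bicycle sentences differ by a single word.
near_identical = word_overlap(
    "We're going on a bicycle ride tomorrow",
    "We're going on a long bicycle ride tomorrow",
)

# The two basketball sentences share only 'to' and 'go'.
paraphrase = word_overlap(
    "I want to go play basketball",
    "I'd like to go shoot some hoops",
)

print(f"near-identical pair: {near_identical:.2f}")
print(f"paraphrase pair:     {paraphrase:.2f}")
```

The bicycle pair shares 7 of 8 distinct words, while the basketball pair shares only 2 of 11, even though a human reader would judge both pairs to be close in meaning.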
Tom Kenter's research demonstrates that comparing two sentences word by word based on similar meanings, instead of the recurrence of the exact same words, yields valuable information. New ways of representing this information have achieved excellent results on a standard evaluation set.
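As a rough illustration of what comparing sentences by word meaning can look like, the Python sketch below matches every word in one sentence to its closest word in the other, using cosine similarity between word vectors. The three-dimensional vectors are invented for this example, and the averaging scheme is an assumption; this is not the algorithm from the research, which is not described in detail here.

```python
import math

# Hypothetical word vectors, invented for illustration only.
# A real system would use vectors learned from large text collections.
VECTORS = {
    "want": (0.9, 0.1, 0.0),
    "like": (0.8, 0.2, 0.1),
    "go": (0.5, 0.5, 0.0),
    "play": (0.1, 0.9, 0.2),
    "shoot": (0.2, 0.8, 0.3),
    "basketball": (0.0, 0.3, 0.9),
    "hoops": (0.1, 0.2, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def best_match_similarity(words_a, words_b):
    """Average, over the words in A, of the best similarity to any word in B."""
    known_a = [w for w in words_a if w in VECTORS]
    known_b = [w for w in words_b if w in VECTORS]
    if not known_a or not known_b:
        return 0.0
    best = [
        max(cosine(VECTORS[a], VECTORS[b]) for b in known_b)
        for a in known_a
    ]
    return sum(best) / len(best)


sentence_a = ["want", "go", "play", "basketball"]
sentence_b = ["like", "go", "shoot", "hoops"]
score = best_match_similarity(sentence_a, sentence_b)
print(f"meaning-based similarity: {score:.2f}")
```

With these toy vectors, 'want' pairs with 'like' and 'basketball' with 'hoops', so the two sentences score as highly similar although they share almost no literal words.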
Together with humanities scholars from the University of Amsterdam and Utrecht University, Tom Kenter has also developed an algorithm to track changes in word usage over time. The algorithm compares the contexts of words over a number of years and tracks the relationship between word meanings. This makes it possible to detect gradual changes in word usage. Historians have reviewed the results of this method to allow for a systematic evaluation.
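The idea of comparing the contexts of a word across periods can be sketched in Python as follows. The tiny corpus, the stop-word list, the window size and the example word 'mouse' are all invented for this illustration; the sketch counts neighbouring words per period and compares the counts with cosine similarity, which is one common way to do this and not necessarily the method used in the research.

```python
import math
from collections import Counter

# Invented mini-corpus, keyed by period, for illustration only.
CORPUS = {
    "1950s": ["the mouse ate the cheese", "a mouse hid in the barn"],
    "1980s": ["a mouse ate the cable", "click the mouse button"],
    "2000s": ["click the mouse button", "the mouse and keyboard are wireless"],
}
STOPWORDS = {"a", "the", "in", "and", "are"}


def context_counts(sentences, target, window=2):
    """Count the non-stop-words found within `window` positions of `target`."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            if word == target:
                neighbours = words[max(0, i - window):i] + words[i + 1:i + window + 1]
                counts.update(w for w in neighbours if w not in STOPWORDS)
    return counts


def cosine(c1, c2):
    """Cosine similarity between two context-count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(
        sum(v * v for v in c2.values())
    )
    return dot / norm if norm else 0.0


contexts = {period: context_counts(texts, "mouse") for period, texts in CORPUS.items()}
periods = list(contexts)

# Compare each period with the one that follows it.
for earlier, later in zip(periods, periods[1:]):
    similarity = cosine(contexts[earlier], contexts[later])
    print(f"{earlier} vs {later}: {similarity:.2f}")
```

In this toy data, neighbouring periods still share some context words, while the first and last periods share none, which is the kind of gradual shift such a comparison is meant to surface.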
One of the problems with research based on historical written material (such as the National Library of the Netherlands’ digitised archive of Dutch newspapers from the past few centuries) is the difference between our current vocabulary and word usage in the past. For example, the research revealed that where we talk about abortion today, people used to speak of 'family planning'.
The results of both studies will be presented at the Conference on Information and Knowledge Management 2015 (CIKM 2015) in Melbourne, Australia, on 19-23 October 2015.