‘Divide and conquer’ for faster self learning search engines
Search engines such as Google often use hundreds of different search algorithms, all of which aim to find the best possible match between a user's search query and web page content. Continually expanding and improving these search algorithms is an important reason for Google's leading position among search engines. But in order to improve the results it is vital to know which algorithms produce the best ranking of web pages to a query.
Masrour Zoghi, PhD student at ILPS (Informatics Institute at the Faculty of Science), developed a new method that allows search engines to quickly and efficiently identify the best performing search algorithms. The results of his research, which is supported by the Netherlands Organisation for Scientific Research (NWO), will be presented at the leading international ‘Conference on Web Search and Data Mining’ (WSDM 2015) in Shanghai in February.
One way to compare the results of different algorithms is called interleaving, a method whereby the search engine analyses users' click behaviour to determine the algorithm that yields the most satisfactory results. The results of two search algorithms (A and B) are mixed, and the search engine then keeps track of which pages users click. If the user clicks on a page found by algorithm A, the search engine knows that A produces better results than B in this particular case. By scaling up this type of analysis (using millions of users), the search engine automatically learns which algorithm yields the best results.
Divide and conquer
A key challenge in interleaving is the fact that many hundreds of comparisons between different ranking algorithms have to be made before a search engine can learn which algorithm yields the best results for a given query. To address this challenge, Zoghi developed a new method that pre-selects which algorithms to compare according to a ‘divide and conquer’ strategy. This new method, titled ‘mergeRUCB,’ groups together similar algorithms, thus reducing the number of comparisons to be made and ensuring a quicker way for the search engine to learn which algorithms are best.
Power to the user
His experimental results show that his method enables search engines to more quickly identify the best algorithms, while at the same time giving the users a greater role in shaping the search results they obtain.