Computer predicts popularity of pictures on Twitter
An interview with Thomas Mensink
Photos of cats are extremely popular on websites like Twitter, Facebook and Instagram. But why pictures are popular online is actually not very clear. Thomas Mensink and his colleagues Cees Snoek and Spencer Capallo from the UvA Informatics Institute attempt to unravel this mystery with a new computer algorithm.
Where does this fascination with Internet pictures come from?
'It's still difficult to predict whether a picture will become popular on social networks based purely on its visual content,' says Thomas Mensink (33), computer scientist. 'We are trying to unravel the properties of a picture that cause it to go viral online. To do this, we have created a computer model that automatically searches for patterns, both in pictures that turn out to be popular and in images that hardly anyone looks at. We already know that pictures of cats, cars and athletes score well. But I'm looking for other, hidden properties that we don't yet know about.'
Is the computer model already working to some extent?
'To my own amazement, we have already made significant progress. I see that our computer model reveals factors that determine whether or not a picture is popular. We don't tell the computer model in advance what makes the pictures it is analysing popular. The aim is for the software to discover by itself the objective properties that popular pictures have in common. We run the computer model on a dataset of a million photos from Twitter, Flickr and Facebook. For some of these, we don't know at that moment which pictures are popular. We only check afterwards whether the images the computer suggests are in fact popular or not.'
How do you measure a subjective concept like popularity?
'Popularity is indeed subjective. But we can objectively count the number of "likes" and "shares" on a photo. I use two objective measures here. One is whether a photo is actively popular, meaning that it is shared with other people and commented on. The other is whether a picture is passively popular, meaning that people just look at it or give it a "like". We now know that actively popular pictures have different content properties than pictures viewed passively. But as a researcher, I'm remaining cautious, because of course we don't know whether our definition of popularity is correct. So we don't know whether our computer model is good enough either. This is all still at a basic research stage. The study is financed by COMMIT/, the public-private ICT research programme, and falls under the SealincMedia project.'
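As a rough illustration of the two measures described in this answer, the sketch below counts active and passive engagement for a single photo. It is not the researchers' actual definition: the field names (`views`, `likes`, `shares`, `comments`), the simple unweighted sums and the sample numbers are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PhotoStats:
    """Hypothetical engagement counts for one photo (field names are assumptions)."""
    views: int
    likes: int
    shares: int
    comments: int


def active_popularity(photo: PhotoStats) -> int:
    # "Active" popularity: people share the photo or comment on it.
    return photo.shares + photo.comments


def passive_popularity(photo: PhotoStats) -> int:
    # "Passive" popularity: people only look at the photo or give it a like.
    return photo.views + photo.likes


# Made-up numbers, purely for illustration.
photo = PhotoStats(views=1200, likes=85, shares=14, comments=6)
print("active:", active_popularity(photo))
print("passive:", passive_popularity(photo))
```

Keeping the two scores separate, rather than merging them into one number, mirrors the observation in the interview that actively and passively popular pictures differ in content.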
Which hidden properties have you revealed so far?
'Cats and dogs do feature. But our computer model also points to pictures of a starry sky and beautiful landscapes. Cartoon characters score well too, as do celebrities, fashion photos and old-fashioned black and white photographs. Manga animations are also shared constantly.'
How does a computer model like this work?
'Our model teaches itself, as we continuously feed feedback into the software. The model has now identified between ten and twenty factors for popularity. We divide the image dataset in two. One set is called the training set. For these images, we know which are popular and which are not. We use this set to teach our model what scores well and what doesn't. The second set is the test set. We use this to test the computer model, to check whether the software can guess correctly. So our question is: can the model accurately predict whether a picture will be liked and shared? We do the same with pictures that are not popular. Can the model select these too? It seems that photos showing food, like bread, pancakes or a smoothie, do not score well on social media.'
'We then compare the model's predictions to the actual measured popularity. This allows us to refine the computer model. The results are much better than could be expected by chance. Of course this is also due to the vastly improved computing power of computers.'
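The procedure described in this answer (split the data in two, learn from the training set, then compare predictions on the test set with the measured popularity and with chance) can be sketched roughly as follows. This is a toy stand-in, not the researchers' model: each image is reduced to a single hypothetical content tag, the "model" is a per-tag majority vote, and the data are invented for illustration.

```python
import random
from collections import defaultdict


def split_dataset(items, train_fraction=0.5, seed=0):
    """Shuffle a copy of the data and cut it into a training and a test set."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train(train_set):
    """Learn, for each tag, whether most training images with that tag were popular."""
    counts = defaultdict(lambda: [0, 0])  # tag -> [popular count, total count]
    for tag, popular in train_set:
        counts[tag][0] += int(popular)
        counts[tag][1] += 1
    return {tag: pop / total > 0.5 for tag, (pop, total) in counts.items()}


def predict(model, tag):
    # Tags never seen during training default to "not popular" (an assumption).
    return model.get(tag, False)


def accuracy(model, test_set):
    """Fraction of test images whose predicted popularity matches the measured one."""
    correct = sum(predict(model, tag) == popular for tag, popular in test_set)
    return correct / len(test_set)


def majority_baseline(test_set):
    """Accuracy of always guessing the more common label, used as a chance-level reference."""
    popular = sum(1 for _, is_popular in test_set if is_popular)
    return max(popular, len(test_set) - popular) / len(test_set)


# Invented toy data: (tag, was_popular) pairs.
data = (
    [("cat", True)] * 40 + [("cat", False)] * 10
    + [("food", True)] * 10 + [("food", False)] * 40
)

train_set, test_set = split_dataset(data)
model = train(train_set)
print("model accuracy:", accuracy(model, test_set))
print("baseline accuracy:", majority_baseline(test_set))
```

The comparison only means something because the test images were never used for training. A model scoring close to the baseline would have learned nothing beyond guessing.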
'We are rapidly approaching almost faultless prediction of what is in a picture. I didn't expect that myself, but I did hope for it. That's why I began this study. I expect that, within five years, the computer model will be able to predict the popularity of images on the Internet reasonably well. This could lead to practical applications, in the advertising world, for example. Or perhaps when you take a photo, the software in your smartphone will soon be able to advise you how to optimise it for popularity.'