The Spotlight introduces a different Data Science Centre Affiliate Member every month. This month: Erkan Karabulut, PhD student at the Intelligent Data Engineering Lab (INDElab) of the Informatics Institute, Faculty of Science.

Tell us more about your role and how you apply data science to your projects.

I am developing new methods for discovering human-understandable patterns from data. We do this for knowledge discovery, that is, finding previously unknown, interesting, and non-trivial patterns in the data. These patterns can also be used in interpretable decision-making models for high-stakes scenarios. Biomedical use cases are an example, such as patient diagnosis, where models are required to provide explicit reasoning. One pattern in that domain, from a hepatitis dataset, is “if the patient has taken antivirals and shows no signs of anorexia (loss of appetite), then the survival rate is X percent”.

Traditionally, this has been done with data mining algorithms, where people have developed smart ways of scanning the data to discover patterns based on the co-occurrence of values. We are now trying to improve on these algorithmic methods with more efficient neural network-based models. The advantage is twofold. First, we can learn patterns much faster because neural networks scale better, which also makes it affordable to apply knowledge discovery to bigger datasets. Second, we can pick out the non-trivial patterns more effectively, rather than the trivial ones.
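As a rough illustration of the traditional, co-occurrence-based approach described above (not the neural network-based method), the sketch below scans a small table for "if ... then ..." rules and scores each one by support (how often the whole rule holds in the data) and confidence (how often the outcome holds when the conditions do). The records are made up for illustration, and the names `RECORDS` and `mine_rules` and the threshold values are assumptions, not part of any published tool.

```python
from itertools import combinations

# Made-up toy records for illustration only; not taken from any real dataset.
RECORDS = [
    {"antivirals": "yes", "anorexia": "no", "outcome": "live"},
    {"antivirals": "yes", "anorexia": "no", "outcome": "live"},
    {"antivirals": "yes", "anorexia": "yes", "outcome": "die"},
    {"antivirals": "no", "anorexia": "no", "outcome": "live"},
    {"antivirals": "no", "anorexia": "yes", "outcome": "die"},
    {"antivirals": "yes", "anorexia": "no", "outcome": "die"},
]


def mine_rules(records, target, min_support=0.3, min_confidence=0.6, max_len=2):
    """Brute-force search for rules of the form 'conditions -> target value'.

    Returns a list of (antecedent, outcome, support, confidence) tuples,
    where antecedent is a tuple of (attribute, value) pairs.
    """
    n = len(records)
    # All attribute=value pairs that can appear on the "if" side of a rule.
    features = sorted({(k, v) for r in records for k, v in r.items() if k != target})
    outcomes = sorted({r[target] for r in records})
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(features, size):
            # Skip antecedents that use the same attribute twice.
            if len({k for k, _ in antecedent}) < size:
                continue
            matching = [r for r in records if all(r.get(k) == v for k, v in antecedent)]
            if not matching:
                continue
            for outcome in outcomes:
                both = sum(1 for r in matching if r[target] == outcome)
                support = both / n                 # rule holds in this share of all records
                confidence = both / len(matching)  # outcome share among matching records
                if support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, outcome, support, confidence))
    return rules


if __name__ == "__main__":
    for antecedent, outcome, support, confidence in mine_rules(RECORDS, target="outcome"):
        conditions = " and ".join(f"{k}={v}" for k, v in antecedent)
        print(f"if {conditions} then outcome={outcome} "
              f"(support={support:.2f}, confidence={confidence:.2f})")
```

The exhaustive enumeration of condition combinations is what makes this style of search expensive as the number of attributes grows, which is the scaling problem mentioned above.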

Is there a project from this past year that you are most proud of? 

During a research visit to Amsterdam University Medical Center (UMC), I had the opportunity to put our methods to the test in a real-world use case. The standard way of evaluating methods like ours would be statistical validation. In this case, however, we are learning patterns from patients’ blood count measurements to identify previously unknown relations between blood components and diagnoses, which lets domain experts judge the results as well.

I am collaborating with clinical chemists and have developed user interfaces that give them easy access to our methods. An initial sanity check quickly showed that our methods are able to recover many patterns that are used in practice, such as the association between high white blood cell counts and infections. We are now looking into the other patterns we discovered and trying to make sense of them from a biomedical perspective.

What do you like most about being a DSC member? 

DSC brings together a diverse set of people with unique perspectives on solving real problems. I am taking inspiration from how people in different communities use data science to address issues that I would never have known existed before. Knowing that every novel idea is a fusion of existing ideas, this always gives me new perspectives on how to approach my research! 

What is your favourite data science method? 

As I have worked on knowledge discovery methods since the beginning of my PhD, I would say that such methods are my favourites. This is simply because of the value that lies in developing more efficient and effective knowledge discovery models. They can be applied to any domain where people work with tabular data, graph-shaped data, sequential data, or text, with healthcare being the most critical one to me.

Are you camp Python, R, or something else?

For every method we developed, I made sure that there was an easy-to-use Python interface for practitioners. That is simply because Python is more widely used, and we wanted to make our methods available to a wider audience. However, I am not necessarily camp Python. I would use whatever language best suits the task at hand.