7 May 2026
I am developing new methods for discovering human-understandable patterns from data. We do this for knowledge discovery, that is, finding previously unknown, interesting, and non-trivial patterns in the data. These patterns can also be used in interpretable decision-making models for high-stakes scenarios. Biomedical use cases such as patient diagnosis are one example, since models there are required to provide explicit reasoning. One pattern in that domain, from a hepatitis dataset, is “if the patient has taken antivirals and shows no signs of anorexia (loss of appetite), then the survival rate is X percent”.
Traditionally, this has been done with data mining algorithms, where people have developed smart ways of scanning the data to discover patterns based on the co-occurrence of values. We are now trying to improve on these algorithmic methods with more efficient neural network-based models. The advantage is twofold. First, we can learn patterns much faster because neural networks scale better, which also makes it affordable to apply knowledge discovery to bigger datasets. Second, we can identify non-trivial patterns more effectively than the trivial ones.
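To make the traditional, co-occurrence-based approach concrete, here is a minimal Python sketch of brute-force rule mining with support and confidence thresholds. It is an illustration only, not our neural method: the records are made-up toy data, and the names `mine_rules`, `min_support`, and `min_confidence` are hypothetical choices for this example.

```python
from itertools import combinations

# Toy records, invented for illustration (not real patient data).
records = [
    {"antivirals=yes", "anorexia=no", "outcome=live"},
    {"antivirals=yes", "anorexia=no", "outcome=live"},
    {"antivirals=yes", "anorexia=yes", "outcome=die"},
    {"antivirals=no", "anorexia=no", "outcome=live"},
    {"antivirals=no", "anorexia=yes", "outcome=die"},
]


def count(itemset, rows):
    """Number of rows that contain every item in itemset."""
    return sum(itemset <= row for row in rows)


def mine_rules(rows, target_prefix="outcome=", min_support=0.2,
               min_confidence=0.8, max_len=2):
    """Enumerate rules 'antecedent -> target' by exhaustive co-occurrence counting."""
    items = {item for row in rows for item in row}
    features = sorted(i for i in items if not i.startswith(target_prefix))
    targets = sorted(i for i in items if i.startswith(target_prefix))
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(features, size):
            n_antecedent = count(frozenset(antecedent), rows)
            if n_antecedent == 0:
                continue  # this combination of values never co-occurs
            for target in targets:
                n_rule = count(frozenset(antecedent) | {target}, rows)
                support = n_rule / len(rows)        # how often the rule holds overall
                confidence = n_rule / n_antecedent  # how often it holds given the antecedent
                if support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, target, support, confidence))
    return rules


rules = mine_rules(records)
for antecedent, target, support, confidence in rules:
    print(f"if {' and '.join(antecedent)} then {target} "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

The exhaustive enumeration over value combinations is what makes this kind of scanning expensive as the number of columns grows, which is the scaling problem described above.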
During a research visit to Amsterdam University Medical Center (UMC), I had the opportunity to put our methods to the test in a real-world use case. The standard way of evaluating methods like ours would be a statistical validation. In this case, however, we are learning patterns from patients’ blood count measurements to identify previously unknown relations between substances in the blood and diagnoses.
I am collaborating with clinical chemists and have developed user interfaces that give them easy access to our methods. In an initial sanity check, we quickly saw that our methods are able to recover many patterns that are used in practice, such as a high white blood cell count being associated with infections. Now we are looking into the other patterns we discovered and trying to make sense of them from a biomedical perspective.
DSC brings together a diverse set of people with unique perspectives on solving real problems. I am taking inspiration from how people in different communities use data science to address issues that I would never have known existed before. Knowing that every novel idea is a fusion of existing ideas, this always gives me new perspectives on how to approach my research!
As I have worked on knowledge discovery methods since the beginning of my PhD, I would say that such methods are my favorites. This is simply because of the value that lies in developing more efficient and effective knowledge discovery models. They can be applied to any domain where people work with tabular data, graph-shaped data, sequential data, or text, with healthcare being the most critical one to me.
For every method we developed, I made sure that there was an easy-to-use Python interface for practitioners. That is simply because Python is so widely used, and we wanted to make our methods available to a wider audience. However, I am not necessarily in camp Python. I would use whatever language best suits the task at hand.