Liang Telkamp, alumna of the Master’s programme Artificial Intelligence at the University of Amsterdam, has won the Amsterdam AI Thesis Prize for her thesis “Beyond PII: Contextually Sensitive Data Detection in Tabular Datasets. An LLM-powered framework for detecting sensitive information.” The thesis was supervised by Madelon Hulsebos of the Centrum Wiskunde & Informatica (CWI).

Protecting sensitive data

As more organisations share data online, it is increasingly important to detect and protect sensitive information. Many tools focus only on traditional personal data, such as names or email addresses. Liang's thesis goes further: it also looks at non-personal data that could still be risky, and it takes the surrounding context into account to determine whether data is sensitive.

UN Humanitarian Data Exchange use-case

The framework, developed in collaboration with the United Nations on the Humanitarian Data Exchange (HDX) platform, helps detect sensitive information in humanitarian datasets, such as locations of hospitals or shelters, before the data is published. This makes data sharing safer and reduces the risk of misuse. It is now being implemented to support ongoing UN data-protection workflows.

How the framework works

Telkamp’s framework uses large language models (LLMs) in two ways:

  1.  Detect-then-reflect: The LLM first identifies columns that may contain sensitive data entities, such as email addresses, phone numbers, or ID numbers. It then examines the whole table to determine whether the data is truly sensitive. For example, numbers that look like IDs could turn out to be product codes, in which case they are not marked as sensitive.
  2.  Retrieve-then-detect: The LLM gathers relevant external knowledge, such as UN guidelines for handling humanitarian data, and uses this knowledge to identify sensitive information that is not traditional personal data, like locations of shelters or hospitals in conflict zones.
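To make the two mechanisms more concrete, here is a minimal Python sketch of how such a two-stage pipeline could be wired up. It is not Telkamp's implementation. `toy_llm` is a hypothetical rule-based stand-in for a real LLM call, the keyword lookup stands in for real knowledge retrieval, and the tables and the guideline sentence are made-up examples rather than HDX data or actual UN guidance.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: an "LLM" is modelled as any function prompt -> answer text.
# toy_llm below is a hypothetical rule-based stand-in so the sketch runs offline.
LLM = Callable[[str], str]


@dataclass
class Table:
    title: str
    columns: dict[str, list[str]]  # column header -> sample cell values


def detect_then_reflect(llm: LLM, table: Table) -> list[str]:
    # Detect: ask about each column in isolation, using a few sample values.
    candidates = [
        col for col, vals in table.columns.items()
        if llm(f"DETECT|{col}|{' '.join(vals[:3])}") == "yes"
    ]
    # Reflect: re-check each candidate with the whole table's headers as context.
    headers = ",".join(table.columns)
    return [col for col in candidates if llm(f"REFLECT|{col}|{headers}") == "yes"]


def retrieve_then_detect(llm: LLM, table: Table, guidelines: dict[str, str]) -> list[str]:
    # Retrieve: naive keyword lookup over guideline snippets (stand-in for real retrieval).
    text = " ".join([table.title, *table.columns]).lower()
    notes = [rule for topic, rule in guidelines.items() if topic in text]
    if not notes:
        return []
    # Detect: ask about each column with the retrieved guidance attached to the prompt.
    context = " ".join(notes)
    return [col for col in table.columns if llm(f"GUIDED|{col}|{context}") == "yes"]


def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM, answering with simple string rules."""
    mode, col, rest = prompt.split("|", 2)
    if mode == "DETECT":  # looks like a classic personal-data entity?
        return "yes" if "@" in rest or col.endswith("_id") else "no"
    if mode == "REFLECT":  # ID-like columns next to prices are probably product codes
        return "no" if col.endswith("_id") and "price" in rest.split(",") else "yes"
    if mode == "GUIDED":  # retrieved guidance says locations are sensitive
        return "yes" if "location" in rest and col in ("latitude", "longitude") else "no"
    return "no"


# Made-up example tables and guideline text, for illustration only.
catalogue = Table("Relief item catalogue", {
    "item_id": ["1001", "1002", "1003"],
    "price": ["5", "7", "2"],
    "contact_email": ["x@example.org", "y@example.org", "z@example.org"],
})
shelters = Table("Shelter list", {
    "shelter_name": ["Camp A", "Camp B"],
    "latitude": ["12.10", "12.30"],
    "longitude": ["44.00", "44.20"],
})
guidelines = {"shelter": "Exact location of shelters in crisis areas is sensitive."}

print(detect_then_reflect(toy_llm, catalogue))               # ['contact_email']
print(detect_then_reflect(toy_llm, shelters))                # []
print(retrieve_then_detect(toy_llm, shelters, guidelines))   # ['latitude', 'longitude']
```

The catalogue shows reflection removing a false positive: `item_id` is flagged in isolation but dropped once the `price` column suggests product codes. The shelter table shows why retrieval matters: none of its columns look like classic personal data, yet the retrieved guidance marks the coordinates as sensitive.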

Together, these two mechanisms make detection more accurate and data sharing safer. Reflecting on the full table reduces false positives, while the retrieved knowledge lets the model take into account not only the table content but also the rules and risks that apply to this type of data.

Read the thesis here

Amsterdam AI Thesis Award

The Amsterdam AI Thesis Award is awarded to Bachelor's and Master's students who present exciting and innovative work in the field of AI and Data Science research. The award aims to promote new AI students and AI research, encourage diversity, and foster collaboration within the AI and Data Science communities.