12 June 2026
Modern research produces huge volumes of data. To reuse these datasets, others need good metadata: clear descriptions of what the data contains, how it was collected, and what its limitations are. In practice, this documentation is often incomplete, very technical, or written only for a narrow group of experts.
The new two‑year project, led by the UvA Intelligent Data Engineering Lab (INDElab), will use generative AI (GenAI) – AI systems that can automatically generate text – to create tailored metadata and explanations for different audiences: specialists in the same field, researchers from other disciplines, students, and policy makers.
These AI tools will be integrated into widely used open‑source data repositories, so that richer metadata and clearer descriptions become available exactly where people search for data.
The approach will be tested first in three areas: construction materials, machine learning, and plant sciences. The construction materials community, with its mix of material scientists, process engineers and civil engineers, is a prime example of how diverse methods and vocabularies can block data reuse – even when everyone studies closely related questions.
"This project will make datasets produced in one domain significantly more reusable across scientific fields and beyond, including for students, policy makers, and other societal stakeholders. By harnessing state-of-the-art AI, it enhances the value of existing data, supports ongoing investments in FAIR data."
Paul Groth, Professor of Data Science and project coordinator
The project has a total budget of €538,756, of which €260,000 goes to the University of Amsterdam. At the UvA, Prof. Paul Groth (Professor of Data Science) and Dr Daphne Miedema (assistant professor) are involved. Partners include TU/e, TU Delft, Wageningen University & Research, Utrecht University Library, the UvA University Library, and SURF.
Read the journal article that inspired this project.