Imagine a research community where everyone studies concrete but can’t easily use each other’s data. Material scientists test new forms of concrete, process engineers design more sustainable production methods, and civil engineers analyse how the latest materials affect bridges and buildings. They all generate valuable datasets – but each in different formats, with different methods, and using different technical jargon. Even within this single domain, reusing each other’s datasets is surprisingly difficult. A new NWO-funded project, BridgeMD, aims to bridge these gaps using generative AI, so that data can flow more easily between disciplines – and beyond academia.

Turning AI into a “data explainer”

Modern research produces huge volumes of data. To reuse these datasets, others need good metadata: clear descriptions of what the data contains, how it was collected, and what its limitations are. In practice, this documentation is often incomplete, very technical, or written only for a narrow group of experts.

The new two‑year project, led by the UvA Intelligent Data Engineering Lab (INDElab), will use generative AI (GenAI) – AI systems that can automatically generate text – to create tailored metadata and explanations for different audiences: specialists in the same field, researchers from other disciplines, students, and policy makers.

These AI tools will be integrated into widely used open‑source data repositories, so that richer metadata and clearer descriptions become available exactly where people search for data.

Beyond concrete: connecting machine learning and plant science data

The approach will be tested first in three areas: construction materials, machine learning, and plant sciences. The construction materials community, with its mix of material scientists, process engineers and civil engineers, is a prime example of how diverse methods and vocabularies can block data reuse – even when everyone studies closely related questions.

“This project will make datasets produced in one domain significantly more reusable across scientific fields and beyond, including for students, policy makers, and other societal stakeholders. By harnessing state-of-the-art AI, it enhances the value of existing data and supports ongoing investments in FAIR data.”
Paul Groth, Professor of Data Science and project coordinator

The project has a total budget of €538,756, of which €260,000 goes to the University of Amsterdam. From the UvA, Prof. Paul Groth (Professor of Data Science) and Dr Daphne Miedema (assistant professor) are involved. Partners include TU/e, TU Delft, Wageningen University & Research, Utrecht University Library, the UvA University Library, and SURF.

More information

Read the journal article that was the inspiration for this project