For best experience please turn on javascript and use a modern browser!
You are using a browser that is no longer supported by Microsoft. Please upgrade your browser. The site may not present itself correctly if you continue browsing.
Two Master AI students, Konstantinos Papakostas and Irene Papadopoulou, have recently published a short paper at the highly regarded Association for Computational Linguistics (ACL'23) conference. Working under the guidance of Professor Evangelos Kanoulas of the Information Retrieval Lab (IRLab), the students developed the project initially during the "Information Retrieval 2" elective course in their second year.
ASQA evaluation, Wikipedia
ASQA evaluation, Wikipedia

How Do Language Models Handle Ambiguous Question Answering?

Previous research (Krishna et al. 2021) has demonstrated that language models can face difficulties when generating long-form answers that are both coherent and accurate, which indicates the need for further research before they can be reliably deployed in production. To gain insights into the limitations of current approaches, the authors examined how both dataset and model scaling affect the fluency and the comprehensiveness of question-answering (QA) systems. The findings suggest that larger models are more capable of handling complex questions, but due to the limited resources available for ambiguous QA, it remains a challenging domain.

The authors also investigated the alignment of automated metrics with human judgement. While simple string-matching metrics can be easily manipulated, the use of recently proposed metrics (Stelmakh et al. 2022) that rely on neural models to evaluate performance provides a more reliable assessment. Finally, the authors addressed the issue of hallucinations, where models generate responses that is not grounded in facts. They found that this phenomenon is not as prevalent due to the complexity of the task, which makes it difficult for fabricated answers to emerge from a medium-sized model's internal knowledge representation.

Overall, the paper contributes to the ongoing research on the limitations of language modeling and emphasizes the need for further improvements in the performance of these systems.