Malevolent Dialogue Response Detection and Evaluation
Abstract
Dialogue systems have been adopted in a range of domains and interact with users in daily life. Dialogue generation methods have evolved from early rule-based and retrieval-based methods to corpus-based methods. Corpus-based conversational agents can generate more diverse and natural responses than template-based or retrieval-based agents. With the increased generative capacity of corpus-based conversational agents comes the need to detect and evaluate malevolent responses, i.e., responses that are inappropriate in terms of content and dialogue acts. In this thesis, we first analyze the malevolence problem of state-of-the-art dialogue generation models, including both pre-trained generation models and sequence-to-sequence (S2S)-based generation models. Second, we advance research on the malevolent dialogue response detection and classification (MDRDC) task. We define the task, build a hierarchical malevolent dialogue taxonomy (HMDT), create a labeled multi-turn dialogue dataset, and formulate MDRDC as a hierarchical classification task. We present a confidence-based classification model that outperforms the baselines for single-label dialogue malevolence detection. Third, we propose the task of multi-label dialogue malevolence detection, i.e., detecting multiple malevolence labels from a training set with only single labels, and crowdsource a multi-label dataset, multi-label dialogue malevolence detection (MDMD), for this task. Experiments conducted on MDMD show that our MCRF method outperforms the best-performing baseline by a large margin. Finally, we propose a human-machine collaborative evaluation (HMCEval) framework for dialogue malevolence evaluation. Our experimental results show that HMCEval achieves around 99% evaluation accuracy while sparing half of the human effort, demonstrating that HMCEval provides reliable evaluation outcomes while substantially reducing human effort.