PACIS 2020 Proceedings

Mining Conversation Data for Reward Estimation in Dialog Policy Learning

Anh Duy Nguyen, Swinburne University of Technology
Minyi Li, RMIT University
Bao Quoc Vo, Swinburne University of Technology

Abstract

Reinforcement learning (RL) approaches are commonly used for dialog policy learning. The reward function is an important part of an RL algorithm, affecting both the training and the quality of the policy. Recent approaches have replaced handcrafted reward functions with machine-learned ones, with promising results. Such reward models compare agent actions with human actions: more human-like agent actions receive higher rewards. Reward models so far consider only the latest dialog turn when computing the reward for an agent action. In this paper, we hypothesize that using a sequence of turns to decide the next agent action is more beneficial. To support this hypothesis, we mine human-human task-oriented dialog data for common patterns. The experimental results suggest that there are clear patterns, i.e., human-human communication in task-oriented dialogs follows some common sequences of actions. Such patterns could potentially be incorporated into reward models to train agents that better imitate human behavior.
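
To make "common sequences of actions" concrete, the following is a minimal sketch of one simple way to look for such patterns: counting contiguous action n-grams and keeping those that occur in enough dialogs. The abstract does not state which mining algorithm the authors used, so this is an illustrative assumption rather than their method. The function name `frequent_action_ngrams`, its parameters, and the toy dialog-act labels are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def frequent_action_ngrams(
    dialogs: Iterable[Sequence[str]], n: int = 3, min_support: int = 2
) -> Dict[Tuple[str, ...], int]:
    """Return contiguous action n-grams that occur in at least min_support dialogs."""
    support: Counter = Counter()
    for actions in dialogs:
        # A set ensures each dialog contributes at most once per pattern,
        # so the count is the number of dialogs containing the pattern.
        seen = {tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)}
        support.update(seen)
    return {pattern: count for pattern, count in support.items() if count >= min_support}


# Hypothetical dialog-act sequences, invented purely for illustration.
toy_dialogs = [
    ["greet", "request", "inform", "confirm", "bye"],
    ["greet", "request", "inform", "bye"],
    ["request", "inform", "confirm", "bye"],
]

patterns = frequent_action_ngrams(toy_dialogs, n=3, min_support=2)
for pattern, count in sorted(patterns.items()):
    print(" -> ".join(pattern), count)
```

In a reward model of the kind the abstract envisions, the support of the pattern ending in a candidate agent action could serve as one signal of how human-like that action is given the preceding turns. How such a signal would actually be weighted or combined is left open here.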

Recommended Citation

Nguyen, Anh Duy; Li, Minyi; and Vo, Bao Quoc, "Mining Conversation Data for Reward Estimation in Dialog Policy Learning" (2020). PACIS 2020 Proceedings. 109.
https://aisel.aisnet.org/pacis2020/109
