2009 Volume 24
RESEARCH ARTICLE   Open Access    

Recent research advances in Reinforcement Learning in Spoken Dialogue Systems

  • Corresponding authors: Matthew Frampton ;  Oliver Lemon

The Knowledge Engineering Review 24 (2009) | doi: 10.1017/S0269888909990166

Abstract: This paper summarizes and analyzes the work of the different research groups who have recently made significant contributions in using Reinforcement Learning techniques to learn dialogue strategies for Spoken Dialogue Systems (SDSs). This use of stochastic planning and learning has become an important research area in the past 10 years, since it promises automatic, data-driven optimization of SDS behavior that was previously hand-coded by expert developers. We survey the most important developments in the field, compare and contrast the different approaches, and describe current open problems.

    • For in-depth discussions of technical details such as temporal difference learning, Monte Carlo learning, eligibility traces, and Q-values, we refer the reader to Sutton and Barto (1998).

    • Supervised Learning (SL) algorithms are machine-learning algorithms that generate a function mapping inputs to desired outputs.

    • A CL is a number between 0 and 1, based on acoustic measurements, that indicates how confident the system is that it has recognized the user's utterance correctly.

    • Multivariate linear regression (see page 1433 of Sheskin (2007)) models numerical data with a least-squares function that is a linear combination of the model parameters and depends on more than one independent variable. A least-squares function fits a model so that the sum of the squared residuals takes its least value, a residual being the difference between an observed value and the value predicted by the model.
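
      Such a least-squares fit can be sketched in a few lines of NumPy. The data below (two hypothetical dialogue features predicting a satisfaction score) are invented purely for illustration:

      ```python
      import numpy as np

      # Hypothetical data: 5 dialogues, 2 independent variables
      # (e.g. task completion, number of turns) and one dependent
      # variable (e.g. a user satisfaction score).
      X = np.array([[1.0, 20.0],
                    [1.0, 12.0],
                    [0.0, 30.0],
                    [1.0, 15.0],
                    [0.0, 25.0]])
      y = np.array([4.5, 4.8, 2.0, 4.6, 2.5])

      # Prepend an intercept column, then solve for the coefficients
      # that minimise the sum of squared residuals.
      A = np.hstack([np.ones((len(X), 1)), X])
      coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)

      # A residual is the difference between an observed value and
      # the value predicted by the fitted model.
      residuals = y - A @ coef
      sse = float(np.sum(residuals ** 2))
      print(coef, sse)
      ```

      By construction, no other choice of coefficients for this design matrix yields a smaller sum of squared residuals.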

    • R2, the ‘coefficient of determination’ (see page 1230 of Sheskin (2007)), is the proportion of the variability in a data set that is accounted for by a statistical model. R2 = 1 indicates that the fitted model explains all of the variability; R2 = 0 indicates no ‘linear’ relationship between the dependent and independent variables; and R2 = 0.39 indicates that approximately 39% of the variation in the dependent variable can be explained by the independent variables, with the remaining 61% due to unknown variables or inherent variability.
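
      Concretely, R2 is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch, using made-up observed and predicted values:

      ```python
      # Hypothetical observed values and model predictions.
      observed  = [2.0, 2.5, 4.5, 4.6, 4.8]
      predicted = [2.2, 2.4, 4.3, 4.7, 4.9]

      mean_obs = sum(observed) / len(observed)

      # Residual sum of squares: unexplained variability.
      ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
      # Total sum of squares: overall variability about the mean.
      ss_tot = sum((o - mean_obs) ** 2 for o in observed)

      r_squared = 1 - ss_res / ss_tot
      print(round(r_squared, 3))  # 0.984
      ```

      Here the model accounts for roughly 98% of the variability in the observed values.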

    • In an n-fold cross-validation, the data is first divided into n (usually equal-sized) portions, and then in each of n folds, a different one of these portions is used for testing, while the remainder of the data is used for training. Results are averaged across the n folds.
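
      The bookkeeping of n-fold cross-validation can be sketched as follows (the dataset and "score" are placeholders; a real use would train and evaluate a model in each fold):

      ```python
      # Hypothetical dataset of 12 items; 4-fold cross-validation.
      data = list(range(12))
      n = 4
      fold_size = len(data) // n

      scores = []
      for fold in range(n):
          # A different portion is held out for testing in each fold;
          # the remainder of the data is used for training.
          test = data[fold * fold_size:(fold + 1) * fold_size]
          train = data[:fold * fold_size] + data[(fold + 1) * fold_size:]
          assert set(test).isdisjoint(train)
          # Stand-in for a real evaluation score.
          scores.append(len(train))

      # Results are averaged across the n folds.
      print(sum(scores) / n)
      ```

      Every item is used for testing exactly once, and the averaged score is less sensitive to any single train/test split.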

    • A speech recognizer may provide a list, in order, of its top n hypotheses for a user utterance according to their CLs.

    • This is also a very important issue for evaluating the accuracy of user simulations.

    • If i = 1, then i − 1 is considered to be the final slot, and if i is the final slot, then i + 1 is considered to be the first slot; for example, ‘So you want to fly from Edinburgh to where?’.
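
      This circular neighbour convention can be sketched as follows (the function name and 1-indexed slot numbering are illustrative assumptions):

      ```python
      def neighbours(i, num_slots):
          """Return the (previous, next) slots of 1-indexed slot i,
          wrapping around at the ends of the slot list."""
          prev_slot = num_slots if i == 1 else i - 1
          next_slot = 1 if i == num_slots else i + 1
          return prev_slot, next_slot

      print(neighbours(1, 4))  # (4, 2): slot 1's predecessor is the final slot
      print(neighbours(4, 4))  # (3, 1): the final slot's successor is slot 1
      ```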

    • Utterances that cannot be handled by the system.

    • If there are no unfilled/unconfirmed slots to switch focus to, the strategy continues to follow DA2.

    • Copyright © 2009 Cambridge University Press
  • About this article
    Cite this article
    Matthew Frampton, Oliver Lemon. 2009. Recent research advances in Reinforcement Learning in Spoken Dialogue Systems. The Knowledge Engineering Review. 24:166 doi: 10.1017/S0269888909990166