doi:10.1017/S0269888912000343

Ai H., Litman D. 2008. Assessing dialog system user simulation evaluation measures using human judges. In Proceedings of the 46th Meeting of the Association for Computational Linguistics, Columbus, OH, USA, 622–629.

Ai H., Litman D. 2009. Setting up user action probabilities in user simulations for dialog system development. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL), Singapore.

Anderson T. 1962. On the distribution of the two-sample Cramér-von Mises criterion. Annals of Mathematical Statistics 33(3), 1148–1159.

Carletta J. 1996. Assessing agreement on classification tasks: the kappa statistic. Computational Linguistics 22(2), 249–254.

Chandramohan S., Geist M., Lefèvre F., Pietquin O. 2011. User Simulation in Dialogue Systems using Inverse Reinforcement Learning. In Proceedings of Interspeech 2011, Florence, Italy.

Cramer H. 1928. On the composition of elementary errors. Second paper: statistical applications. Skandinavisk Aktuarietidskrift 11, 171–180.

Cuayahuitl H., Renals S., Lemon O., Shimodaira H. 2005. Human–computer dialogue simulation using hidden Markov models. In Proceedings of ASRU, 290–295. Cancun, Mexico.

Cuayahuitl H. 2009. Hierarchical Reinforcement Learning for Spoken Dialogue Systems. PhD thesis, University of Edinburgh, UK.

Doddington G. 2002. Automatic evaluation of machine translation quality using n-gram cooccurrence statistics. In Proceedings of the Human Language Technology Conference (HLT), San Diego, CA, USA, 128–132.

Eckert W., Levin E., Pieraccini R. 1997. User modeling for spoken dialogue system evaluation. In Proceedings of ASRU'97. Santa Barbara, USA.

Frampton M., Lemon O. 2010. Recent research advances in reinforcement learning in spoken dialogue systems. The Knowledge Engineering Review 24(4), 375–408.

Georgila K., Henderson J., Lemon O. 2005. Learning user simulations for information state update dialogue systems. In Proceedings of Interspeech 2005. Lisboa, Portugal.

Georgila K., Henderson J., Lemon O. 2006. User simulation for spoken dialogue systems: learning and evaluation. In Proceedings of Interspeech'06. Pittsburgh, USA.

Janarthanam S., Lemon O. 2009a. A data-driven method for adaptive referring expression generation in automated dialogue systems: maximising expected utility. In Proceedings of PRE-COGSCI 09. Boston, USA.

Janarthanam S., Lemon O. 2009b. A two-tier user simulation model for reinforcement learning of adaptive referring expression generation policies. In Proceedings of SIGDIAL. London, UK.

Janarthanam S., Lemon O. 2009c. Learning adaptive referring expression generation policies for spoken dialogue systems using reinforcement learning. In Proceedings of SEMDIAL. Stockholm, Sweden.

Janarthanam S., Lemon O. 2009d. A Wizard-of-Oz environment to study referring expression generation in a situated spoken dialogue task. In Proceedings of ENLG, 2009. Athens, Greece.

Jung S., Lee C., Kim K., Jeong M., Lee G. G. 2009. Data-driven user simulation for automated evaluation of spoken dialog systems. Computer Speech & Language 23(4), 479–509.

Kullback S., Leibler R. 1951. On information and sufficiency. Annals of Mathematical Statistics 22, 79–86.

Levin E., Pieraccini R., Eckert W. 1997. Learning dialogue strategies within the Markov decision process framework. In Proceedings of ASRU'97. Santa Barbara, USA.

Levin E., Pieraccini R., Eckert W. 2000. A stochastic model of human–machine interaction for learning dialog strategies. IEEE Transactions on Speech and Audio Processing 8(1), 11–23.

López-Cózar R., de la Torre A., Segura J., Rubio A. 2003. Assessment of dialogue systems by means of a new simulation technique. Speech Communication 40(3), 387–407.

Ng A. Y., Russell S. 2000. Algorithms for inverse reinforcement learning. In Proceedings of 17th International Conference on Machine Learning. Morgan Kaufmann, 663–670.

Paek T., Pieraccini R. 2008. Automating spoken dialogue management design using machine learning: an industry perspective. Speech Communication 50, 716–729.

Papineni K., Roukos S., Ward T., Zhu W. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, 311–318.

Pietquin O., Dutoit T. 2006. A probabilistic framework for dialog simulation and optimal strategy learning. IEEE Transactions on Audio, Speech and Language Processing 14(2), 589–599.

Pietquin O., Rossignol S., Ianotto M. 2009. Training Bayesian networks for realistic man–machine spoken dialogue simulation. In Proceedings of the 1st International Workshop on Spoken Dialogue Systems Technology, Irsee, Germany, 4.

Pietquin O. 2004. A Framework for Unsupervised Learning of Dialogue Strategies. PhD thesis, Faculté Polytechnique de Mons (FPMs), Belgium.

Pietquin O. 2006. Consistent goal-directed user model for realistic man–machine task-oriented spoken dialogue simulation. In Proceedings of ICME'06. Toronto, Canada.

Rieser V. 2008. Bootstrapping Reinforcement Learning-based Dialogue Strategies from Wizard-of-Oz data. PhD thesis, Saarland University, Department of Computational Linguistics.

Rieser V., Lemon O. 2006. Simulations for learning dialogue strategies. In Proceedings of Interspeech 2006, Pittsburgh, USA.

Rieser V., Lemon O. 2008. Learning effective multimodal dialogue strategies from Wizard-of-Oz data: bootstrapping and evaluation. In Proceedings of ACL, 2008. Columbus, Ohio.

Russell S. 1998. Learning agents for uncertain environments (extended abstract). In COLT'98: Proceedings of the 11th Annual Conference on Computational Learning Theory. ACM, 101–103. Madison, USA.

Schatzmann J., Georgila K., Young S. 2005a. Quantitative evaluation of user simulation techniques for spoken dialogue systems. In Proceedings of SIGdial'05. Lisbon, Portugal.

Schatzmann J., Stuttle M. N., Weilhammer K., Young S. 2005b. Effects of the user model on simulation-based learning of dialogue strategies. In Proceedings of ASRU'05. Cancun, Mexico.

Schatzmann J., Thomson B., Weilhammer K., Ye H., Young S. 2007a. Agenda-based user simulation for bootstrapping a POMDP dialogue system. In Proceedings of ICASSP'07. Honolulu, USA.

Schatzmann J., Thomson B., Young S. 2007b. Statistical user simulation with a hidden agenda. In Proceedings of SigDial'07. Antwerp, Belgium.

Schatzmann J., Weilhammer K., Stuttle M., Young S. 2006. A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies. The Knowledge Engineering Review 21(2), 97–126.

Scheffler K., Young S. 2001. Corpus-based dialogue simulation for automatic strategy learning and evaluation. In Proceedings of NAACL Workshop on Adaptation in Dialogue Systems. Pittsburgh, PA, USA.

Singh S., Kearns M., Litman D., Walker M. 1999. Reinforcement learning for spoken dialogue systems. In Proceedings of NIPS'99. Vancouver, Canada.

Sutton R., Barto A. 1998. Reinforcement Learning: An Introduction. MIT Press.

van Rijsbergen C. J. 1979. Information Retrieval, second edn. Butterworths.

Walker M., Hindle D., Fromer J., Fabbrizio G. D., Mestel C. 1997a. Evaluating competing agent strategies for a voice email agent. In Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech'97), Rhodes, Greece.

Walker M., Litman D., Kamm C., Abella A. 1997b. Paradise: a framework for evaluating spoken dialogue agents. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, 271–280. Madrid, Spain.

Williams J. D., Young S. 2007. Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language 21(2), 393–422.

Williams J., Poupart P., Young S. 2005. Partially Observable Markov Decision Processes with Continuous Observations for Dialogue Management. In Proceedings of the SigDial Workshop (SigDial'06). Sydney, Australia.

Williams J. 2008. Evaluating user simulations with the Cramer-von Mises divergence. Speech Communication 50, 829–846.

Zukerman I., Albrecht D. 2001. Predictive statistical models for user modeling. User Modeling and User-Adapted Interaction 11, 5–18.