2012 Volume 27
RESEARCH ARTICLE   Open Access    

A review of machine learning for automated planning

  • Abstract: Recent discoveries in automated planning are broadening the scope of planners, from toy problems to real applications. However, applying automated planners to real-world problems is far from simple. On the one hand, the definition of accurate action models for planning is still a bottleneck. On the other hand, off-the-shelf planners fail to scale up and to provide good solutions in many domains. In these problematic domains, planners can exploit domain-specific control knowledge to improve their performance in terms of both speed and quality of the solutions. However, the manual definition of control knowledge is quite difficult. This paper reviews recent techniques in machine learning for the automatic definition of planning knowledge. It is organized according to the target of the learning process: the automatic definition of planning action models and the automatic definition of planning control knowledge. In addition, the paper reviews advances in the related field of reinforcement learning.

  • Cite this article

    Sergio Jiménez, Tomás De La Rosa, Susana Fernández, Fernando Fernández, Daniel Borrajo. 2012. A review of machine learning for automated planning. The Knowledge Engineering Review 27(4), 433–467. doi: 10.1017/S026988891200001X


    • http://www.icaps-conference.org/index.php/Main/Competitions

    • Here, the concept of logic program is used in a narrower sense than is generally applied, that is, as a conjunction of disjunctions of literals where each disjunction contains at most one positive literal. This type of disjunction is called a Horn clause.
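    To make the restriction concrete, here is a minimal sketch (illustrative, not from the paper) that checks the Horn property, at most one positive literal per disjunction, over clauses represented as sets of literal strings, with a leading "~" marking negation:

    ```python
    # Illustrative sketch only: a clause is a set of literal strings, and a
    # literal starting with "~" is negative. A Horn clause is a disjunction
    # with at most one positive literal.

    def is_horn(clause):
        positives = [lit for lit in clause if not lit.startswith("~")]
        return len(positives) <= 1

    # ~parent(x,y) v ~parent(y,z) v grandparent(x,z): a definite (Horn) clause
    assert is_horn({"~parent(x,y)", "~parent(y,z)", "grandparent(x,z)"})
    # p v q has two positive literals, so it is not a Horn clause
    assert not is_horn({"p", "q"})
    ```

    Clauses with no positive literal (goal clauses) also count as Horn under this definition, which is why the check is `<= 1` rather than `== 1`.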

    • The concept of well-placed must be defined in the domain theory. A block is considered well-placed when it is on the right block (or the table) with respect to the goal, and all blocks beneath it are well-placed too.
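    The recursive definition above can be sketched as follows for a Blocksworld-style state; the state encoding and the names `on`, `goal_on`, and `well_placed` are hypothetical choices for this sketch, not from the paper:

    ```python
    # Hypothetical sketch of the "well-placed" predicate described above.
    # `on` maps each block to what it rests on ("table" or another block);
    # `goal_on` is the same mapping for the goal state.

    def well_placed(block, on, goal_on):
        """A block is well-placed when it rests on the object the goal
        requires, and everything beneath it is well-placed too."""
        if on.get(block) != goal_on.get(block):
            return False
        below = on[block]
        if below == "table":        # the table is trivially well-placed
            return True
        return well_placed(below, on, goal_on)

    # Example: goal is C on B on A on the table; currently B is on A on the
    # table, but C sits on the table, so A and B are well-placed and C is not.
    on      = {"A": "table", "B": "A", "C": "table"}
    goal_on = {"A": "table", "B": "A", "C": "B"}
    assert well_placed("B", on, goal_on)
    assert not well_placed("C", on, goal_on)
    ```

    The recursion bottoms out at the table, mirroring the footnote's requirement that every block beneath a well-placed block is itself well-placed.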

    • Copyright © Cambridge University Press 2012