RESEARCH ARTICLE   Open Access    

Combining reward shaping and hierarchies for scaling to large multiagent systems

Abstract

Coordinating the actions of agents in multiagent systems presents a challenging problem, especially as the size of the system increases and predicting the agent interactions becomes difficult. Many approaches to improving coordination within multiagent systems have been developed, including organizational structures, shaped rewards, coordination graphs, heuristic methods, and learning automata. However, each of these approaches still has inherent limitations with respect to coordination and scalability. We explore the potential of synergistically combining existing coordination mechanisms so that they offset each other's limitations. More specifically, we are interested in combining existing coordination mechanisms in order to achieve improved performance, increased scalability, and reduced coordination complexity in large multiagent systems.

In this work, we discuss and demonstrate the individual limitations of two well-known coordination mechanisms. We then provide a methodology for combining the two coordination mechanisms to offset their limitations and improve performance over either method individually. In particular, we combine shaped difference rewards and hierarchical organization in the Defect Combination Problem with up to 10 000 sensing agents. We show that combining hierarchical organization with difference rewards can improve both coordination and scalability by decreasing information overhead, structuring agent-to-agent connectivity and control flow, and improving the individual decision-making capabilities of agents. We show that by combining hierarchies and difference rewards, the information overheads and computational requirements of individual agents can be reduced by as much as 99% while simultaneously increasing the overall system performance. Additionally, we demonstrate the robustness of this approach in handling up to 25% agent failures under various conditions.
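The shaped difference rewards referred to above are those of Wolpert & Tumer (2001), cited below: agent i is rewarded with D_i(z) = G(z) − G(z_{−i}), the change in the global utility G when agent i's contribution is removed or replaced by a fixed counterfactual. The following is a minimal sketch of that idea, not the paper's implementation; the function name, the list representation of the joint action, and the None counterfactual are illustrative assumptions.

    def difference_reward(G, joint_action, i, null_action=None):
        """Sketch of a difference reward D_i = G(z) - G(z_{-i}).

        G            -- global utility function over a joint action
        joint_action -- list of the actions chosen by all agents
        i            -- index of the agent being rewarded
        null_action  -- counterfactual standing in for agent i
                        (None here means "agent i absent", so G
                        must know how to score an absent agent)
        """
        counterfactual = list(joint_action)
        counterfactual[i] = null_action
        return G(joint_action) - G(counterfactual)

Because every term of G that does not depend on agent i cancels in the subtraction, D_i stays aligned with G (an action that raises D_i also raises G) while being far more sensitive to agent i's own choice; this sensitivity is the improved individual decision-making the abstract refers to.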
References
Agogino A., HolmesParker C. & Tumer K. 2012. Evolving large scale UAV communication system. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Philadelphia, PA, July.
Agogino A. & Tumer K. 2008. Analyzing and visualizing multi-agent rewards in dynamic and stochastic domains. Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS) 17(2), 320–338.
Barrett S., Stone P. & Kraus S. 2011. Empirical evaluation of ad hoc teamwork in the pursuit domain. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011), May.
Bharathidasan A. & Ponduru V. 2003. Sensor networks: an overview. IEEE Potentials.
Challet D. & Johnson N. 2002. Optimal combination of imperfect objects. Physical Review Letters 89, 028701.
Devlin S. & Kudenko D. 2011. Theoretical considerations of potential-based reward shaping for multi-agent systems. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Farinelli A., Rogers A. & Jennings N. 2008. Maximising sensor network efficiency through agent-based coordination of sense/sleep schedules. In Workshop on Energy in Wireless Sensor Networks.
Grzes M. & Kudenko D. 2010. Online learning of shaping rewards in reinforcement learning. Neural Networks 23, 541–550.
Hayden S., Carrick C. & Yang Q. 1999. A catalog of agent coordination patterns. In Proceedings of the 3rd Annual Conference on Autonomous Agents.
HolmesParker C., Agogino A. & Tumer K. 2012. Evolving distributed resource sharing for CubeSat constellations. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Philadelphia, PA, July.
HolmesParker C., Agogino A. & Tumer K. 2013. Exploiting structure and utilizing agent-centric rewards to promote coordination in large multiagent systems (extended abstract). In Proceedings of the 12th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Horling B. & Lesser V. 2005. A survey of multiagent organizational paradigms. Knowledge Engineering Review 19(4), 281–316.
Horling B., Mailler R. & Lesser V. 2004. A case study of organizational effects in a distributed sensor network. In Proceedings of the International Conference on Intelligent Agent Technology.
Howley E. & Duggan J. 2011. Investing in the commons: a study of openness and the emergence of cooperation. Advances in Complex Systems 14.
Knudson M. & Tumer K. 2010. Coevolution of heterogeneous multi-robot teams. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO).
Kok J. & Vlassis N. 2006. Collaborative multiagent reinforcement learning by payoff propagation. Journal of Machine Learning Research (JMLR) 7, 1789–1828.
Mehta N., Ray S., Tadepalli P. & Dietterich T. 2008. Automatic discovery and transfer of MAXQ hierarchies. In Proceedings of the 25th International Conference on Machine Learning (ICML).
Ng A., Harada D. & Russell S. 1999. Policy invariance under reward transformations: theory and application to reward shaping. In Proceedings of the International Conference on Machine Learning (ICML).
Panait L. & Luke S. 2005. Cooperative multi-agent learning: the state of the art. Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS) 11(3), 387–434.
Rogers A., Farinelli A. & Jennings N. 2010. Self-organising sensors for wide area surveillance using the max-sum algorithm. In Self-Organizing Architectures, Lecture Notes in Computer Science 6090, 84–100. Springer.
Sutton R. & Barto A. 1998. Reinforcement Learning: An Introduction. MIT Press.
Tambe M., Bowring E., Jung H., Kaminka G., Maheswaran R., Marecki J., Modi P., Nair R., Okamoto S., Pearce J., Paruchuri P., Pynadath D., Scerri P., Schurr N. & Varakantham P. 2005. Conflicts in teamwork: hybrids to the rescue. In Proceedings of the 4th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Tham C. & Renaud J. 2005. Multi-agent systems on sensor networks: a distributed reinforcement learning approach. In Intelligent Sensors, Sensor Networks and Information Processing Conference (ISSNIP).
Tumer K. 2005. Designing agent utilities for coordinated, scalable, and robust multiagent systems. In Challenges in the Coordination of Large Scale Multiagent Systems, P. Scerri, R. Mailler & R. Vincent (eds). Springer, 173–188.
Vinyals M., Rodriguez-Aguilar J. & Cerquides J. 2010. A survey on sensor networks from a multiagent perspective. The Computer Journal.
Vrancx P., Verbeeck K. & Nowe A. 2008. Decentralized learning in Markov games. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 38(4), 976–981.
Williamson S., Gerding E. & Jennings N. 2009. Reward shaping for valuing communications during multi-agent coordination. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Wolpert D. H. & Tumer K. 2001. Optimal payoff functions for members of collectives. Advances in Complex Systems 4(2/3), 265–279.
Xu Y., Scerri P., Yu B., Okamoto S., Lewis M. & Sycara K. 2005. An integrated token-based algorithm for scalable coordination. In Proceedings of the 4th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Zhang C., Abdallah S. & Lesser V. 2009. Integrating organizational control into multi-agent learning. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).

Cite this article

Chris HolmesParker, Adrian K. Agogino, Kagan Tumer. 2016. Combining reward shaping and hierarchies for scaling to large multiagent systems. The Knowledge Engineering Review 31(1), 3–18. doi: 10.1017/S0269888915000156


Notes

  • This work was partially supported by the National Science Foundation under grant 0931591 and the National Energy Technology Laboratory under grant DE-FE0000857.

  • Allowing all agents to begin learning simultaneously created a “spike” in the system which significantly slowed down learning. Gradually introducing the learning agents softens this discontinuity in learning (Tumer, 2005); see the sketch following these notes.

  • © Cambridge University Press, 2016
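
To make the note on gradual introduction concrete, here is a minimal sketch of one way a linear ramp-in of learners could look; the ValueAgent class, the ramp schedule, and the global_utility callback are illustrative assumptions rather than the paper's implementation.

    import random

    class ValueAgent:
        """Minimal epsilon-greedy learner over a discrete action set."""
        def __init__(self, n_actions, epsilon=0.1, alpha=0.1):
            self.values = [0.0] * n_actions
            self.epsilon, self.alpha = epsilon, alpha

        def select_action(self, learning):
            # Agents not yet introduced as learners act randomly.
            if not learning or random.random() < self.epsilon:
                return random.randrange(len(self.values))
            return max(range(len(self.values)), key=self.values.__getitem__)

        def update(self, action, reward):
            # Exponential moving average of the reward for each action.
            self.values[action] += self.alpha * (reward - self.values[action])

    def train(agents, global_utility, n_episodes, warmup_episodes):
        for ep in range(n_episodes):
            # The fraction of agents permitted to learn ramps linearly
            # to 1.0, softening the learning "spike" described above.
            n_learning = int(min(1.0, (ep + 1) / warmup_episodes) * len(agents))
            joint = [a.select_action(learning=(i < n_learning))
                     for i, a in enumerate(agents)]
            G = global_utility(joint)
            for i in range(n_learning):        # only introduced agents update
                agents[i].update(joint[i], G)  # or a shaped difference reward D_i

For instance, train([ValueAgent(4) for _ in range(100)], my_utility, 500, 100), where my_utility is any global utility function over the joint action list, would introduce roughly one new learner per episode over the first 100 episodes.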