Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 2, Brussels1050, Belgium, e-mails: roxana.radulescu@vub.be, ann.nowe@vub.be"/> School of Computer Science, National University of Ireland Galway, GalwayH91 TK33, Ireland, e-mail: patrick.mannion@nuigalway.ie"/> Universiteit van Amsterdam, Amsterdam, The Netherlands, e-mail: yijie.zhang@student.uva.nl"/> Microsystems Technology, HU University of Applied Sciences Utrecht, Heidelberglaan 15, 3584CSUtrecht, The Netherlands, e-mail: diederik.yamamoto-roijers@hu.nl"/>
Search
2020 Volume 35
Article Contents
ORIGINAL RESEARCH   Open Access    

A utility-based analysis of equilibria in multi-objective normal-form games

  • This article extends an earlier unpublished paper (Rădulescu et al., 2019) that was originally presented at the Adaptive and Learning Agents Workshop 2019.

More Information
  • Abstract: In multi-objective multi-agent systems (MOMASs), agents explicitly consider the possible trade-offs between conflicting objective functions. We argue that compromises between competing objectives in MOMAS should be analyzed on the basis of the utility that these compromises have for the users of a system, where an agent’s utility function maps their payoff vectors to scalar utility values. This utility-based approach naturally leads to two different optimization criteria for agents in a MOMAS: expected scalarized returns (ESRs) and scalarized expected returns (SERs). In this article, we explore the differences between these two criteria using the framework of multi-objective normal-form games (MONFGs). We demonstrate that the choice of optimization criterion (ESR or SER) can radically alter the set of equilibria in a MONFG when nonlinear utility functions are used.
  • 加载中
  • Arifovic , J., Boitnott , J. F. & Duffy , J.2016. Learning correlated equilibria: an evolutionary approach. Journal of Economic Behavior & Organization 157, 171–190.

    Google Scholar

    Aumann , R. J.1974. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics1(1), 67–96.

    Google Scholar

    Aumann , R. J.1987. Correlated equilibrium as an expression of bayesian rationality. Econometrica: Journal of the Econometric Society 1, 1–18.

    Google Scholar

    Bergstresser , K. and Yu , P.1977. Domination structures and multicriteria problems in n-person games. Theory and Decision8(1), 5–48.

    Google Scholar

    Blackwell , D.et al.1956. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics6(1), 1–8.

    Google Scholar

    Borm , P., Tijs , S. & van den Aarssen , J.1990. Pareto equilibria in multi-objective games. Methods of Operations Research60, 303–312.

    Google Scholar

    Colby , M. & Tumer , K.2015. An evolutionary game theoretic analysis of difference evaluation functions. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, 1391–1398. ACM.

    Google Scholar

    Devlin , S. & Kudenko , D.2011. Theoretical considerations of potential-based reward shaping for multi-agent systems. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 225–232.

    Google Scholar

    Foster , D. P. & Vohra , R.1999. Regret in the on-line decision problem. Games and Economic Behavior29 (1–2), 7–35.

    Google Scholar

    Fudenberg , D. & Kreps , D. M.1993. Learning mixed equilibria. Games and Economic Behavior5 (3), 320–367. ISSN 0899-8256.

    Google Scholar

    Hart , S. & Schmeidler , D.1989. Existence of correlated equilibria. Mathematics of Operations Research14(1), 18–25.

    Google Scholar

    Igarashi , A. & Roijers , D. M.2017. Multi-criteria coalition formation games. In International Conference on Algorithmic DecisionTheory, 197–213. Springer.

    Google Scholar

    Jensen , J. L. W. V.et al.1906. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Actamathematica30, 175–193.

    Google Scholar

    Lozan , V. & Ungureanu , V.2013. Computing the pareto-nash equilibrium set in finite multi-objective mixed-strategy games. Computer Science Journal of Moldova, 21 (2).

    Google Scholar

    Lozovanu , D., Solomon , D. & Zelikovsky , A.2005. Multiobjective games and determining pareto-nashequilibria. Buletinul Academiei de Ştiinţe a Republicii Moldova. Matematica, (3), 115–122.

    Google Scholar

    Mannion , P., Devlin , S., Duggan , J. & Howley , E.2018. Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning. The Knowledge Engineering Review33, e23.

    Google Scholar

    Mannion , P., Devlin , S., Mason , K., Duggan , J. & Howley , E.2017a. Policy invariance under reward transformations for multi-objective reinforcement learning. Neurocomputing263, 60–73.

    Google Scholar

    Mannion , P., Duggan , J. & Howley , E.2016a. An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In Autonomic Road Transport Support Systems, McCluskey , L. T., Kotsialos , A., Müller , P. J., Klügl , F., Rana , O. & Schumann , R. (eds), 47–66. Springer International Publishing.

    Google Scholar

    Mannion , P., Duggan , J. & Howley , E.2017b. A theoretical and empirical analysis of reward transformations in multi-objective stochastic games. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2017b.

    Google Scholar

    Mannion , P., Mason , K., Devlin , S., Duggan , J. & Howley , E. 2016b. Multi-objective dynamic dispatch optimisation using multi-agent reinforcement learning. In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2016b.

    Google Scholar

    Mossalam , H., Assael , Y. M., Roijers , D. M. & Whiteson , S.2016. Multi-objective deep reinforcement learning. In NIPS Workshop on Deep Reinforcement Learning.

    Google Scholar

    Nash , J.1950. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences36(1), 48–49. ISSN 0027-8424.

    Google Scholar

    Nash , J.1951. Non-cooperative games. Annals of Mathematics54(2), 286–295.

    Google Scholar

    Papadimitriou , C. H. & Roughgarden , T.2008. Computing correlated equilibria in multi-player games. Journal of the ACM (JACM)55(3), 14.

    Google Scholar

    Rădulescu , R., Legrand , M., Efthymiadis , K., Roijers , D. M. & Nowé , A.2018. Deep multi-agent reinforcement learning in a homogeneous open population. In Proceedings of the 30th Benelux Conference on Artificial Intelligence (BNAIC 2018), 177–191.

    Google Scholar

    Rădulescu , R., Mannion , P., Roijers , D. & Nowé , A.2019. Equilibria in multi-objective games: a utility-based perspective. In Adaptive and Learning Agents Workshop (at AAMAS 2019), May 2019.

    Google Scholar

    Rădulescu , R., Mannion , P., Roijers , D. M. and Nowé , A.2020. Multi-objective multi-agent decision making: a utility-based analysis and survey. Autonomous Agents and Multi-Agent Systems34 (10).

    Google Scholar

    Reymond , M., Patyn , C., Rădulescu , R., Deconinck , G. & Nowé , A.2018. Reinforcement learning for demand response of domestic household appliances. In Proceedings of the Adaptive and Learning Agents Workshop at FAIM 2018.

    Google Scholar

    Roijers , D. M.2016. Multi-Objective Decision-Theoretic Planning. PhD thesis, University of Amsterdam.

    Google Scholar

    Roijers , D. M., Steckelmacher , D. & Nowé , A.2018. Multi-objective reinforcement learning for the expected utility of the return. In Proceedings of the Adaptive and Learning Agents Workshop at FAIM 2018.

    Google Scholar

    Roijers , D. M., Vamplew , P., Whiteson , S. & Dazeley , R.2013. A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research48, 67–113.

    Google Scholar

    Roijers , D. M. & Whiteson , S.2017. Multi-objective decision making. Synthesis Lectures on Artificial Intelligence and Machine Learning11(1), 1–129.

    Google Scholar

    Shapley , L. S. & Rigby , F. D.1959. Equilibrium points in games with vector payoffs. Naval Research Logistics Quarterly6 (1), 57–61.

    Google Scholar

    Talpert , V., Sobh , I., Kiran , B. R., Mannion , P., Yogamani , S., El-Sallab , A. & Perez , P.2019. Exploring applications of deep reinforcement learning for real-world autonomous driving systems. In International Conference on Computer Vision Theory and Applications (VISAPP), February 2019.

    Google Scholar

    Vamplew , P., Dazeley , R., Berry , A., Issabekov , R. & Dekker , E.2011. Empirical evaluation methods for multiobjective reinforcement learning algorithms. Machine Learning84 (1–2), 51–80.

    Google Scholar

    Van Moffaert , K. & Nowé , A.2014. Multi-objective reinforcement learning using sets of pareto dominating policies. The Journal of Machine Learning Research15(1), 3483–3512.

    Google Scholar

    Virtanen , P., Gommers , R., Oliphant , T. E., Haberland , M., Reddy , T., Cournapeau , D., Burovski , E., Peterson , P., Weckesser , W., Bright , J., van der Walt , S. J., Brett , M., Wilson , J., Jarrod Millman , K., Mayorov , N., Nelson , A. R. J., Jones , E., Kern , R., Larson , E., Carey , C., Polat , İ., Feng , Y., Moore , E. W., Vand erPlas, J., Laxalde , D., Perktold , J., Cimrman , R., Henriksen , I., Quintero , E. A., Harris , C. R., Archibald , A. M., Ribeiro , A. H., Pedregosa , F., van Mulbregt , P. & Contributors , S.2019. SciPy 1.0–Fundamental Algorithms for Scientific Computing in Python. arXiv e-prints, art. arXiv:1907.10121, July 2019.

    Google Scholar

    Voorneveld , M., Vermeulen , D. & Borm , P.1999. Axiomatizations of paretoequilibria in multicriteria games. Games and Economic Behavior280 (1), 146–154.

    Google Scholar

    Walraven , E. & Spaan , M. T. J.2016. Planning under uncertainty for aggregated electric vehicle charging with renewable energy supply. In Proceedings of the European Conference on Artificial Intelligence, 904–912.

    Google Scholar

    Watkins , C. J. C. H. Learning from Delayed Rewards. PhD thesis, King’s College, Cambridge, UK, 1989.

    Google Scholar

    Wierzbicki , A. P.1995. Multiple criteria games – theory and applications. Journal of Systems Engineering and Electronics60 (2), 65–81.

    Google Scholar

    Wiggers , A. J., Oliehoek , F. A. & Roijers , D. M.2016. Structure in the value function of two-player zero-sum games of incomplete information. In Proceedings of the Twenty-second European Conference on Artificial Intelligence, 1628–1629. IOS Press.

    Google Scholar

    Yliniemi , L., Agogino , A. K. & Tumer , K.2015. Simulation of the introduction of new technologies in air traffic management. Connection Science270 (3), 269–287.

    Google Scholar

    Yliniemi , L. & Tumer , K.2016. Multi-objective multiagent credit assignment in reinforcement learning and nsga-ii. Soft Computing200 (10), 3869–3887.

    Google Scholar

    Zhang , Y., Rădulescu , R., Mannion , P., Roijers , D. M. & Nowé , A.2020. Opponent modelling for reinforcement learning in multi-objective normal form games. In Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020), May 2020.

    Google Scholar

    Zinkevich , M., Greenwald , A. & Littman , M. L.2006. Cyclic equilibria in markov games. In Advances in Neural Information Processing Systems, 1641–1648.

    Google Scholar

    Zintgraf , L. M., Kanters , T. V., Roijers , D. M., Oliehoek , F. A. & Beau , P.2015. Quality assessment of MORL algorithms: a utility-based approach. In Benelearn 2015: Proceedings of the Twenty-Fourth Belgian-Dutch Conference on Machine Learning.

    Google Scholar

    Zintgraf , L. M., Roijers , D. M., Linders , S., Jonker , C. M. & Nowé , A.2018. Ordered preference elicitation strategies for supporting multi-objective decision making. In Proceedings of the 17th International Conference on Autonomous Agents and Multi-Agent Systems, 1477–1485. International Foundation for Autonomous Agents and Multiagent Systems.

    Google Scholar

  • Cite this article

    Roxana Rădulescu, Patrick Mannion, Yijie Zhang, Diederik M. Roijers, Ann Nowé. 2020. A utility-based analysis of equilibria in multi-objective normal-form games. The Knowledge Engineering Review 35(1), doi: 10.1017/S0269888920000351
    Roxana Rădulescu, Patrick Mannion, Yijie Zhang, Diederik M. Roijers, Ann Nowé. 2020. A utility-based analysis of equilibria in multi-objective normal-form games. The Knowledge Engineering Review 35(1), doi: 10.1017/S0269888920000351

Article Metrics

Article views(64) PDF downloads(110)

ORIGINAL RESEARCH   Open Access    

A utility-based analysis of equilibria in multi-objective normal-form games

Abstract: Abstract: In multi-objective multi-agent systems (MOMASs), agents explicitly consider the possible trade-offs between conflicting objective functions. We argue that compromises between competing objectives in MOMAS should be analyzed on the basis of the utility that these compromises have for the users of a system, where an agent’s utility function maps their payoff vectors to scalar utility values. This utility-based approach naturally leads to two different optimization criteria for agents in a MOMAS: expected scalarized returns (ESRs) and scalarized expected returns (SERs). In this article, we explore the differences between these two criteria using the framework of multi-objective normal-form games (MONFGs). We demonstrate that the choice of optimization criterion (ESR or SER) can radically alter the set of equilibria in a MONFG when nonlinear utility functions are used.

    • We thank Bart Bogaerts for his useful feedback and discussions on this work. This research received funding from the Flemish Government under the ‘Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaandere’ program.

    • A vector whose coordinates are all non-negative and sum up to 1.

    • While the original paper refers to this type of equilibrium as ‘Pareto’, we note that Pareto is a too loose domination concept when considering only linear utility functions and would prefer ‘Convex’ in this case. For consistency, however, we keep the original term.

    • As is the case for single-agent decision problems (Roijers et al.2013; Roijers, 2016).

    • Please note that this is a monotonically increasing payoff function for positive-only payoffs. In the case of negative payoffs, we can set the utility to 0 as soon as the payoff value for one of the objectives becomes negative.

    • A complete implementation can be found here: https://github.com/rradules/equilibria_monfg.

    • We note that specialized algorithms exist to learn mixed strategy Nash equilibria (e.g., Fudenberg and Kreps, 1993) or correlated equilibria (e.g., Arifovic et al., 2016) in single-objective MAS. We leave the design and empirical evaluation of versions of these algorithms for learning or approximating equilibria in MOMAS under SER for future work.

    • This nonlinear optimization problem is solved using the ‘optimize’ module of the Scipy Python package (Virtanen et al., 2019).

    • This article extends an earlier unpublished paper (Rădulescu et al., 2019) that was originally presented at the Adaptive and Learning Agents Workshop 2019.

    • © The Author(s), 2020. Published by Cambridge University Press2020Cambridge University Press
References (48)
  • About this article
    Cite this article
    Roxana Rădulescu, Patrick Mannion, Yijie Zhang, Diederik M. Roijers, Ann Nowé. 2020. A utility-based analysis of equilibria in multi-objective normal-form games. The Knowledge Engineering Review 35(1), doi: 10.1017/S0269888920000351
    Roxana Rădulescu, Patrick Mannion, Yijie Zhang, Diederik M. Roijers, Ann Nowé. 2020. A utility-based analysis of equilibria in multi-objective normal-form games. The Knowledge Engineering Review 35(1), doi: 10.1017/S0269888920000351
  • Catalog

      /

      DownLoad:  Full-Size Img  PowerPoint
      Return
      Return