doi:10.1017/S0269888920000351

Arifovic , J., Boitnott , J. F. & Duffy , J.2016. Learning correlated equilibria: an evolutionary approach. Journal of Economic Behavior & Organization 157, 171–190.

Aumann , R. J.1974. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics1(1), 67–96.

Aumann , R. J.1987. Correlated equilibrium as an expression of bayesian rationality. Econometrica: Journal of the Econometric Society 1, 1–18.

Bergstresser , K. and Yu , P.1977. Domination structures and multicriteria problems in n-person games. Theory and Decision8(1), 5–48.

Blackwell , D.et al.1956. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics6(1), 1–8.

Borm , P., Tijs , S. & van den Aarssen , J.1990. Pareto equilibria in multi-objective games. Methods of Operations Research60, 303–312.

Colby , M. & Tumer , K.2015. An evolutionary game theoretic analysis of difference evaluation functions. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, 1391–1398. ACM.

Devlin , S. & Kudenko , D.2011. Theoretical considerations of potential-based reward shaping for multi-agent systems. In Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 225–232.

Foster , D. P. & Vohra , R.1999. Regret in the on-line decision problem. Games and Economic Behavior29 (1–2), 7–35.

Fudenberg , D. & Kreps , D. M.1993. Learning mixed equilibria. Games and Economic Behavior5 (3), 320–367. ISSN 0899-8256.

Hart , S. & Schmeidler , D.1989. Existence of correlated equilibria. Mathematics of Operations Research14(1), 18–25.

Igarashi , A. & Roijers , D. M.2017. Multi-criteria coalition formation games. In International Conference on Algorithmic DecisionTheory, 197–213. Springer.

Jensen , J. L. W. V.et al.1906. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Actamathematica30, 175–193.

Lozan , V. & Ungureanu , V.2013. Computing the pareto-nash equilibrium set in finite multi-objective mixed-strategy games. Computer Science Journal of Moldova, 21 (2).

Lozovanu , D., Solomon , D. & Zelikovsky , A.2005. Multiobjective games and determining pareto-nashequilibria. Buletinul Academiei de Ştiinţe a Republicii Moldova. Matematica, (3), 115–122.

Mannion , P., Devlin , S., Duggan , J. & Howley , E.2018. Reward shaping for knowledge-based multi-objective multi-agent reinforcement learning. The Knowledge Engineering Review33, e23.

Mannion , P., Devlin , S., Mason , K., Duggan , J. & Howley , E.2017a. Policy invariance under reward transformations for multi-objective reinforcement learning. Neurocomputing263, 60–73.

Mannion , P., Duggan , J. & Howley , E.2016a. An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In Autonomic Road Transport Support Systems, McCluskey , L. T., Kotsialos , A., Müller , P. J., Klügl , F., Rana , O. & Schumann , R. (eds), 47–66. Springer International Publishing.

Mannion , P., Duggan , J. & Howley , E.2017b. A theoretical and empirical analysis of reward transformations in multi-objective stochastic games. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2017b.

Mannion , P., Mason , K., Devlin , S., Duggan , J. & Howley , E. 2016b. Multi-objective dynamic dispatch optimisation using multi-agent reinforcement learning. In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2016b.

Mossalam , H., Assael , Y. M., Roijers , D. M. & Whiteson , S.2016. Multi-objective deep reinforcement learning. In NIPS Workshop on Deep Reinforcement Learning.

Nash , J.1950. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences36(1), 48–49. ISSN 0027-8424.

Nash , J.1951. Non-cooperative games. Annals of Mathematics54(2), 286–295.

Papadimitriou , C. H. & Roughgarden , T.2008. Computing correlated equilibria in multi-player games. Journal of the ACM (JACM)55(3), 14.

Rădulescu , R., Legrand , M., Efthymiadis , K., Roijers , D. M. & Nowé , A.2018. Deep multi-agent reinforcement learning in a homogeneous open population. In Proceedings of the 30th Benelux Conference on Artificial Intelligence (BNAIC 2018), 177–191.

Rădulescu , R., Mannion , P., Roijers , D. & Nowé , A.2019. Equilibria in multi-objective games: a utility-based perspective. In Adaptive and Learning Agents Workshop (at AAMAS 2019), May 2019.

Rădulescu , R., Mannion , P., Roijers , D. M. and Nowé , A.2020. Multi-objective multi-agent decision making: a utility-based analysis and survey. Autonomous Agents and Multi-Agent Systems34 (10).

Reymond , M., Patyn , C., Rădulescu , R., Deconinck , G. & Nowé , A.2018. Reinforcement learning for demand response of domestic household appliances. In Proceedings of the Adaptive and Learning Agents Workshop at FAIM 2018.

Roijers , D. M.2016. Multi-Objective Decision-Theoretic Planning. PhD thesis, University of Amsterdam.

Roijers , D. M., Steckelmacher , D. & Nowé , A.2018. Multi-objective reinforcement learning for the expected utility of the return. In Proceedings of the Adaptive and Learning Agents Workshop at FAIM 2018.

Roijers , D. M., Vamplew , P., Whiteson , S. & Dazeley , R.2013. A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research48, 67–113.

Roijers , D. M. & Whiteson , S.2017. Multi-objective decision making. Synthesis Lectures on Artificial Intelligence and Machine Learning11(1), 1–129.

Shapley , L. S. & Rigby , F. D.1959. Equilibrium points in games with vector payoffs. Naval Research Logistics Quarterly6 (1), 57–61.

Talpert , V., Sobh , I., Kiran , B. R., Mannion , P., Yogamani , S., El-Sallab , A. & Perez , P.2019. Exploring applications of deep reinforcement learning for real-world autonomous driving systems. In International Conference on Computer Vision Theory and Applications (VISAPP), February 2019.

Vamplew , P., Dazeley , R., Berry , A., Issabekov , R. & Dekker , E.2011. Empirical evaluation methods for multiobjective reinforcement learning algorithms. Machine Learning84 (1–2), 51–80.

Van Moffaert , K. & Nowé , A.2014. Multi-objective reinforcement learning using sets of pareto dominating policies. The Journal of Machine Learning Research15(1), 3483–3512.

Virtanen , P., Gommers , R., Oliphant , T. E., Haberland , M., Reddy , T., Cournapeau , D., Burovski , E., Peterson , P., Weckesser , W., Bright , J., van der Walt , S. J., Brett , M., Wilson , J., Jarrod Millman , K., Mayorov , N., Nelson , A. R. J., Jones , E., Kern , R., Larson , E., Carey , C., Polat , İ., Feng , Y., Moore , E. W., Vand erPlas, J., Laxalde , D., Perktold , J., Cimrman , R., Henriksen , I., Quintero , E. A., Harris , C. R., Archibald , A. M., Ribeiro , A. H., Pedregosa , F., van Mulbregt , P. & Contributors , S.2019. SciPy 1.0–Fundamental Algorithms for Scientific Computing in Python. arXiv e-prints, art. arXiv:1907.10121, July 2019.

Voorneveld , M., Vermeulen , D. & Borm , P.1999. Axiomatizations of paretoequilibria in multicriteria games. Games and Economic Behavior280 (1), 146–154.

Walraven , E. & Spaan , M. T. J.2016. Planning under uncertainty for aggregated electric vehicle charging with renewable energy supply. In Proceedings of the European Conference on Artificial Intelligence, 904–912.

Watkins , C. J. C. H. Learning from Delayed Rewards. PhD thesis, King’s College, Cambridge, UK, 1989.

Wierzbicki , A. P.1995. Multiple criteria games – theory and applications. Journal of Systems Engineering and Electronics60 (2), 65–81.

Wiggers , A. J., Oliehoek , F. A. & Roijers , D. M.2016. Structure in the value function of two-player zero-sum games of incomplete information. In Proceedings of the Twenty-second European Conference on Artificial Intelligence, 1628–1629. IOS Press.

Yliniemi , L., Agogino , A. K. & Tumer , K.2015. Simulation of the introduction of new technologies in air traffic management. Connection Science270 (3), 269–287.

Yliniemi , L. & Tumer , K.2016. Multi-objective multiagent credit assignment in reinforcement learning and nsga-ii. Soft Computing200 (10), 3869–3887.

Zhang , Y., Rădulescu , R., Mannion , P., Roijers , D. M. & Nowé , A.2020. Opponent modelling for reinforcement learning in multi-objective normal form games. In Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020), May 2020.

Zinkevich , M., Greenwald , A. & Littman , M. L.2006. Cyclic equilibria in markov games. In Advances in Neural Information Processing Systems, 1641–1648.

Zintgraf , L. M., Kanters , T. V., Roijers , D. M., Oliehoek , F. A. & Beau , P.2015. Quality assessment of MORL algorithms: a utility-based approach. In Benelearn 2015: Proceedings of the Twenty-Fourth Belgian-Dutch Conference on Machine Learning.

Zintgraf , L. M., Roijers , D. M., Linders , S., Jonker , C. M. & Nowé , A.2018. Ordered preference elicitation strategies for supporting multi-objective decision making. In Proceedings of the 17th International Conference on Autonomous Agents and Multi-Agent Systems, 1477–1485. International Foundation for Autonomous Agents and Multiagent Systems.