doi:10.1017/S0269888921000163

Barrett, L. & Narayanan, S. 2008. Learning all optimal policies with multiple criteria. In Proceedings of the 25th International Conference on Machine Learning, New York, NY, USA, pp. 41–47.

Gabor, Z., Kalmar, Z. & Szepesvari, C. 1998. Multi-criteria reinforcement learning. In The Fifteenth International Conference on Machine Learning, San Francisco, CA, USA, pp. 197–205.

Geibel, P. 2006. Reinforcement learning for MDPs with constraints. In Machine Learning: ECML 2006, Lecture Notes in Computer Science, vol. 4212.

Hwang, C. L. & Yoon, K. 1981. Multiple Attribute Decision Making: Methods and Applications, Lecture Notes in Economics and Mathematical Systems. Springer-Verlag.

Issabekov, R. & Vamplew, P. 2012. An empirical comparison of two common multiobjective reinforcement learning algorithms. In AI 2012: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol. 7691, pp. 626–636.

Keeney, R. L. & Raiffa, H. 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley.

MacCrimmon, K. R. & Toda, M. 1969. The experimental determination of indifference curves. The Review of Economic Studies 36(4), 433–450.

MacCrimmon, K. R. & Wehrung, D. A. 1977. Trade-off analysis: The indifference and preferred proportions approaches. In Conflicting Objectives in Decisions. Wiley, pp. 123–147.

Moffaert, K. V. 2014. Multi-criteria reinforcement learning for sequential decision making problems, Ph.D. dissertation, Dept. Comput. Sci., Vrije Universiteit Brussel, Brussels, Belgium.

Moffaert, K. V., Drugan, M. M. & Nowé, A. 2013. Scalarized multi-objective reinforcement learning: Novel design techniques. In IEEE ADPRL, Singapore, pp. 191–199.

Moffaert, K. V. & Nowé, A. 2014. Multi-objective reinforcement learning using sets of Pareto dominating policies. Journal of Machine Learning Research 15, 3483–3512.

Nguyen, T. T., Nguyen, N. D., Vamplew, P., Nahavandi, S., Dazeley, R. & Lim, C. P. 2020. A multi-objective deep reinforcement learning framework. Engineering Applications of Artificial Intelligence 96.

Roijers, D. M., Röpke, W., Nowé, A. & Radulescu, R. 2021. On following Pareto-optimal policies in multi-objective planning and reinforcement learning. Paper presented at Multi-Objective Decision Making Workshop 2021.

Roijers, D. M., Vamplew, P., Whiteson, S. & Dazeley, R. 2013. A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research 48(1), 67–113.

Roijers, D. M., Zintgraf, L. M., Libin, P. & Nowé, A. 2018. Interactive multi-objective reinforcement learning in multi-armed bandits for any utility function. In ALA Workshop at FAIM, vol. 8.

Roijers, D. M., Zintgraf, L. M., Libin, P., Reymond, M., Bargiacchi, E. & Nowé, A. 2020. Interactive multi-objective reinforcement learning in multi-armed bandits with Gaussian process utility models. In ECML-PKDD 2020: Proceedings of the 2020 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.

Roijers, D. M., Zintgraf, L. M. & Nowé, A. 2017. Interactive Thompson sampling for multi-objective multi-armed bandits. In Algorithmic Decision Theory, ADT 2017, Lecture Notes in Computer Science, vol. 10576. Springer.

Sutton, R. S. & Barto, A. G. 1998. Reinforcement Learning: An Introduction. Adaptive Computation and Machine Learning. MIT Press.

Tsitsiklis, J. N. 1994. Asynchronous stochastic approximation and Q-learning. Machine Learning 16(3), 185–202.

Vamplew, P., Dazeley, R., Berry, A., Issabekov, R. & Dekker, E. 2011. Empirical evaluation methods for multiobjective reinforcement learning algorithms. Machine Learning 84, 51–80.

Vamplew, P., Dazeley, R. & Foale, C. 2017. Softmax exploration strategies for multiobjective reinforcement learning. Neurocomputing 263, 74–86.

Vamplew, P., Issabekov, R., Dazeley, R., Foale, C., Berry, A., Moore, T. & Creighton, D. 2017. Steering approaches to Pareto-optimal multiobjective reinforcement learning. Neurocomputing 263, 26–38.

Vamplew, P., Yearwood, J., Dazeley, R. & Berry, A. 2008. On the limitations of scalarization for multi-objective reinforcement learning of Pareto fronts. In AI 2008: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol. 5360, pp. 372–378.

Watkins, C. 1989. Learning from delayed rewards, Ph.D. thesis, University of Cambridge, England.

Yoon, K. 1980. Systems selection by multiple attribute decision making, Ph.D. dissertation, Kansas State University, Manhattan, Kansas.